A Unified Theory of Genetic Genealogy

Bottom Line Up Front (BLUF):

Triangulated Groups = Clusters = Common Ancestors

Brief overview: Each of us has a specific genealogy Tree of Ancestors and a fixed arrangement of DNA segments from those Ancestors. I believe our DNA segments are reflected in our Triangulated Groups (TGs) of shared DNA segments, each of which comes from a specific Common Ancestor (CA), and that each CA is represented by a specific Cluster of Shared Matches. I believe the TG CA and the Cluster CA align, which can be very helpful. Put another way, each of our Ancestors will have a specific TG/Cluster combination, and at some point in our Tree there will be one TG and a corresponding Cluster for each Ancestor.

TRIANGULATED GROUPS

In these blog posts I’ve often stated that each segment Triangulated Group (TG) is from a specific Common Ancestor (CA) – in other words, the DNA segment identified by a TG came from a specific Ancestor and passed down a line of descent to you. The Matches in a TG will be relatives (usually cousins) along one of your ancestral lines. For example, if a TG is from a 6xG grandparent (7th cousin (7C) level), some of the Matches may be cousins from 1C to 7C, and some (usually those with shared segments below 15cM) may descend from Ancestors somewhat beyond the 6xG grandparent.

Because of the random nature of DNA, and the wide range of cMs for cousins beyond 3C, there is no set of parameters (short of complete chromosome mapping) that will get you only TGs at one generation. For instance, I know of no cM parameter that will get you only TGs at, say, the 6C level – or any other level. So we usually wind up with a mix of TGs at different cousinship levels.

TG Outliers

Like with most things DNA, there may be some outliers, and not every Match in a TG will be found to share an IBD segment (in other words, some Matches with small shared DNA segments – under 15cM – may be false Matches). But the important take-away is that the TG will represent a CA, even if a few Matches are false.

TG Bottom Line

Your DNA has fixed crossover points. Depending on the cM threshold you use for comparing shared DNA segments, your data will have natural break points between TGs. I used a 7cM threshold and got 372 TGs covering over 98% of my 45 Chromosomes. It was hard work doing all the comparisons and culling out the false shared segments. I have Matches who are 2C to 9C for about 80% of these TGs.

SHARED MATCH CLUSTERS

Recently, I’ve been blogging about Clustering. Clusters appear to come from a specific CA down the line of descent of your Ancestors to you (just like the description of a TG).

When I did a Clustering run of all my 5732 Matches at Family Finder, I got 352 Clusters which had a very high correlation to my 372 TGs.

Well… duh! When we consider that each of us has fixed segments in our DNA and fixed ancestors in our Tree, we understand that each of us, in our own unique ways, has a specific “solution” (Ancestors linked to DNA segments). So if we look at grouping by Clusters, it should reflect that “solution”. And when we form segment TGs, they should reflect that “solution”. And in combination, the Clusters and TGs should reflect the same “solution”. In other words, the Clusters and TGs should align.

In my opinion, Clustering with Shared Matches is a sophisticated way of grouping Matches, based on the probability that a number of Shared Matches who mostly match each other will be from the same CA.
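To make the idea of "Shared Matches who mostly match each other" concrete, here is a minimal Python sketch. It is a toy greedy grouping, not the algorithm used by any actual Clustering program. The function name `cluster_shared_matches`, the `shared` dictionary layout and the `min_fraction` cutoff are all assumptions made for illustration.

```python
def cluster_shared_matches(shared, min_fraction=0.5):
    """Greedily group Matches that share most of a group's members.

    shared: dict mapping each Match name to the set of Match names it
            appears with as a Shared Match (hypothetical input layout).
    min_fraction: a Match joins a Cluster when it is a Shared Match with
            at least this fraction of the Cluster's current members
            (an assumed cutoff, not a value from any real tool).
    """
    clusters = []
    for match in sorted(shared):
        for cluster in clusters:
            # Count how many current members this Match is shared with.
            hits = sum(1 for other in cluster if other in shared[match])
            if hits / len(cluster) >= min_fraction:
                cluster.append(match)
                break
        else:
            # No existing Cluster fits, so start a new one.
            clusters.append([match])
    return clusters


# Made-up example data: two families that do not match each other.
example = {
    "Ann": {"Bob", "Cy"},
    "Bob": {"Ann", "Cy"},
    "Cy": {"Ann", "Bob"},
    "Dee": {"Eve"},
    "Eve": {"Dee"},
}
print(cluster_shared_matches(example))
```

Real Clustering programs are more careful than this (the result here depends on the order in which Matches are processed), but the principle is the same: Matches that mostly match each other end up in the same group, and that group is a candidate for a single CA.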

Clustering Outliers

Like with most things DNA, there may be some outliers, and not every Match in a Cluster will be found to share the same CA. But the important take-away is that most do share the same CA, and the Cluster will represent that CA, even if a few Matches don’t.

Cluster Bottom Line

Your Shared Matches will tend to Cluster on CAs. Depending on the cM threshold you use for comparing shared DNA segments, your data will divide into different numbers of Clusters. See my experience here; and the process here. I used a 6cM threshold and got 350 to 382 Clusters, covering at least all of my 4xG grandparents and some out to 8xG grandparents. It was relatively easy to run the Cluster programs to get the Match/SharedMatch data, and relatively little work to determine a consensus of a CA for each Cluster, for each run at different cM levels (smaller thresholds result in more Matches and Clusters, and more work). I can see CAs out to 8C for some Clusters. [NB: Clustering does not find the CAs – this is homework you have to do before Clustering: find as many CAs as possible and put that information in the Notes, so it’s available for analysis at each Cluster run].

ACTION – USE CLUSTERS to form TRIANGULATED GROUPS

I’ve put a lot of work over the past 8 years into determining my 372 TGs (your number of TGs may vary, but I believe that with a 7cM threshold for Shared Segments it will come out at this order of magnitude). Triangulation, even with the tools at 23andMe, MyHeritage and GEDmatch, takes time and work. In contrast, Clustering is relatively simple – pretty close to a “click” process. If Clusters are the same as TGs, we should be able to run a Cluster report on all of our Matches (at a company which also provides segment data), sort on the DNA segment data (by Chr and Start), and then relatively easily scroll down the several thousand Matches and group them into TGs. Yes, this scrolling will take some work, but it’s a whole lot easier than comparing each shared DNA segment pair in a browser. I believe the combination of Cluster numbers and segment data will easily define the TGs – maybe with just a little “quality control” at the end, depending on how the data looks.
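The sort-and-scroll step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a finished tool: the `Segment` fields, the function name `group_candidate_tgs`, and the rule "same Cluster number plus overlapping segment on the same chromosome" are hypothetical choices. The output is only a list of candidate TGs. Overlapping segments can still sit on opposite parental chromosomes, so the "quality control" step remains necessary.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    match: str    # Match name (hypothetical field names throughout)
    cluster: int  # Cluster number taken from the Cluster report
    chrom: int    # chromosome number (assumption: X coded as 23)
    start: int    # segment start position
    end: int      # segment end position


def group_candidate_tgs(segments):
    """Sort by (Chr, Start), then group overlapping same-Cluster segments."""
    groups = []
    open_groups = {}  # (chrom, cluster) -> group currently being extended
    for seg in sorted(segments, key=lambda s: (s.chrom, s.start)):
        key = (seg.chrom, seg.cluster)
        group = open_groups.get(key)
        if group is not None and seg.start < group["end"]:
            # Overlaps the open group for this Cluster: extend it.
            group["members"].append(seg.match)
            group["end"] = max(group["end"], seg.end)
        else:
            # No overlap (or first segment for this Cluster): new group.
            group = {
                "chrom": seg.chrom,
                "cluster": seg.cluster,
                "start": seg.start,
                "end": seg.end,
                "members": [seg.match],
            }
            groups.append(group)
            open_groups[key] = group
    return groups


# Made-up example rows, as they might look after merging a Cluster
# report with a segment download.
rows = [
    Segment("A", 1, 1, 100, 200),
    Segment("B", 1, 1, 150, 250),
    Segment("C", 2, 1, 160, 300),
    Segment("D", 1, 1, 400, 500),
]
for g in group_candidate_tgs(rows):
    print(g["chrom"], g["start"], g["end"], g["cluster"], g["members"])
```

Grouping on the Cluster number as well as the overlap is the design choice that matters here: Matches A and B overlap Match C on the same chromosome, but C is in a different Cluster, so it is kept apart as a likely segment from the other parent's side.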

I have my brother’s DNA at FTDNA and 23andMe – I’m going to try this process on his results, and will report back.

The Bottom Line

Once you determine your TGs and the CAs that go with them, you have a Chromosome Map!

My Bottom Line

I’m trying to demonstrate:

  1. TG=CL=CA
  2. The CA will be in the 7C-9C range*

 

*I recognize that my belief that our DNA tests can accurately determine our CAs out to 8C, or so, is not held by most genetic genealogists. But based on my experience, particularly using Walking The Clusters Back, I believe this is a realistic range – easily and accurately obtained – and confirmed by both TGs and Clustering.

With our fixed Ancestry and DNA crossover points, each process should give us the same “solution” – whether we use DNA Painter, Kitty Cooper’s Chromosome Mapping, GenomeMatePro, Visual Phasing, Double Match Triangulator, etc., etc. We are just using different tools to “see” the chromosome map.

 

[19G] Segment-ology: A Unified Theory of Genetic Genealogy by Jim Bartlett 20191216

15 thoughts on “A Unified Theory of Genetic Genealogy”

  1. I’m not a scientist, but I’m trying to learn a little about genetics. This post, as much as I have understood it, is fascinating. I’m going to follow you for a while.

  2. Pingback: Friday's Family History Finds | Empty Branches on the Family Tree

  3. Jim, very interesting, as all your articles are. Have you given any thought to how many of your TGs will be from very distant ancestors, 10 generations back or even more?

      • A nice web site, HAPI-DNA, has an interactive tool to calculate the number of segments that will be shared. It was created by Amy Williams, a professor in computational biology at Cornell University. 1C and 2C (1st and 2nd cousins) relatives have essentially 100% probability of sharing at least five segments, each at least 7cM long. 4C relatives will share segments about 50% of the time, and most of them will match on just one shared segment >= 7cM. 8C relatives (8th cousins) will have a shared segment >= 7cM just 0.286% of the time, and practically all will be matches on just one segment. 8C relatives are ten generations back, so the probability of getting 8C matches to form a TG is virtually nil, as Jim explained. The HAPI-DNA calculator is a great tool for exploring the probability of matching segments as a function of relationship distance. https://hapi-dna.org/ibd-sharing-rates/

      • Thanks for the post, Andy. I’ve watched Amy’s talks with some interest. However, I’m coming to the conclusion that many tools are just “pushing our vegetables around on our plate”. I’m all for grouping Matches (by Shared Matches, Clustering, and/or Triangulated Groups). But at the end of the day, we have to do the genealogy to find and confirm our Common Ancestors. It’s good to know the background of how segments work (I’ve posted a number of these posts), but I think the name of the game is linking Common Ancestors to DNA segments – and doing enough of them to be confident we have the correct links.

  4. I use dnagedcom.com to produce clusters using the ADSA tool (Autosomal DNA Segment Analyzer). Each report shows results for whichever chromosome you chose to study. One can easily see the clusters in the report. The names of matches are on the left, and mousing across each one shows the ancestral surnames (if they bothered to show them). On the far right one can see each match’s chromosomal info, and mousing across that section shows the names in common with that individual on this segment. And in the middle one can clearly see colorful ‘clusters’ of matches whose segments happen to fall on that particular chromosome. In my case, most of the clusters tend to alternate between paternal and maternal clusters. I have tested many known relatives on both sides of my family, which helps tremendously. Most helpful in my experience has been tracking down descendants of a great- or great great-grandparent who was married twice. Or three times! That way, when you and a known cousin from the ‘other’ marriage are matching on that segment, you know you are matching on the known CA.

    • A G, I too have used DNAGedcom; and had good results. The Shared Clustering program lets me add Tags to the Notes in the Cluster report, and then upload that information to AncestryDNA. I can edit my Notes the same way – pretty easy in the Excel spreadsheet – and upload them back to Ancestry. Each of the Clustering programs should provide roughly the same Clusters, so my recommendation is to try each one and pick the one you are most comfortable with.
