Using a Group Common Ancestor

A Triangulation (and grouping) Concept

We have spent a lot of time and effort to describe *how* to group our Matches: segment Triangulation, DNA Painting, Shared Match Clustering. Each of these processes results in a group of Matches that should have a Common Ancestor (CA). This is an important concept.

But the main thing is to *use* this concept – to use the information found in these groups. If a group is formed around a CA, then all of the Matches in the group should share a CA. Once a CA is found, each Match in the group should also have that group CA, or be a closer cousin with an MRCA that descends from the group CA, or have a more distant MRCA which is ancestral to the group CA. In other words, all the Matches in a group should have the same distant CA.

So… if we find a CA for a group, the other Matches in the group should have the same CA line. This is a powerful focus – let’s *use* it. We should be able to look at other Matches in the group (who have Trees) and find that CA – either directly through a search, or indirectly by building out their Tree.

I illustrated this in Case 3 of Chapter 1 (Lessons Learned from Triangulating a Genome) of “Advanced Genetic Genealogy: Techniques and Case Studies” – here or here. This was all about one of my TGs which I call [04P36]. At Ancestry, I found a few cousins (who had uploaded to GEDmatch) in that TG who shared my HIGGINBOTHAM ancestry. Armed with that hint, I searched for HIGGINBOTHAMs in other Matches (in that TG) who had trees. I also contacted Matches from FTDNA, 23andMe and MyHeritage – and several replied that they had the same HIGGINBOTHAM Ancestry. In the end I found 14 different Matches ranging from 4C to 8C on this HIGGINBOTHAM line in TG [04P36].

Because TG [04P36] came down a line of descent with the HIGGINBOTHAM surname in 5 generations, this case was an easier example – searching for one distinct surname. If a group represents a CA with a male-female zig-zag line of descent to me, it will be harder – the surname will change often. However, each line of descent (from a given Ancestor) is fixed – and we may find Match cousins with MRCAs of different surnames, but they will all be on the same ancestral line. This is akin to “Genealogy Triangulation” – getting an alignment of multiple cousins on one line.

Finding one Match with a CA in a group is not the end of the story – it’s a clue to the beginning of more research. If we find a CA for a group, but no other Match seems to have that CA, maybe we need to look for a different CA. The “correct” CA for each group should lead to Genealogy Triangulation – agreement by other Matches on the same ancestral line. If you find a CA in a group, *use* it to find more Matches on that same line. Seek CA agreement among Matches in each group.
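Seeking CA agreement within a group can be pictured as a simple set comparison. The following is a minimal sketch under stated assumptions: each Match's Tree is reduced to a set of ancestor names, the ancestral line is a set of names (the group CA plus known descendants and ancestors on that line), and the function name `ca_agreement` and all data are hypothetical.

```python
def ca_agreement(group_trees, candidate_line):
    """Split a group's Matches by whether their Tree touches the candidate line.

    group_trees: dict of Match name -> set of ancestor names in that Match's Tree.
    candidate_line: set of ancestor names on one ancestral line.
    """
    agree, unresolved = [], []
    for match, ancestors in group_trees.items():
        if ancestors & candidate_line:
            agree.append(match)
        else:
            # No overlap yet: build the Tree out, or reconsider the CA.
            unresolved.append(match)
    return sorted(agree), sorted(unresolved)


# Hypothetical data, for illustration only.
line = {"Ancestor P", "Child of P", "Parent of P"}
trees = {
    "Match 1": {"Ancestor P", "Someone Else"},
    "Match 2": {"Parent of P"},
    "Match 3": {"Unrelated Name"},
    "Match 4": set(),  # no Tree posted
}

agree, unresolved = ca_agreement(trees, line)
print(agree)       # ['Match 1', 'Match 2']
print(unresolved)  # ['Match 3', 'Match 4']
```

If only one Match ever lands in the first list, that is the signal described above to look for a different CA.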


[08D] Segment-ology: Using a Group Common Ancestor Concept by Jim Bartlett 20200620

Using Ethnicity to Identify a Cluster

A Segmentology TIDBIT

My Ancestor 14M was John William CAMPBELL, born 1856 NY; died 1916 WV. His parents were Samuel CAMPBELL and Ann CLARK who were married 1851 in Scotland and immigrated to the US in 1853. This 1/8 of my ancestry is the only known part to come from Scotland. Several cousins have done Y-DNA testing and the CAMPBELL line is the Argyll CAMPBELLs.

I have over 125,000 Matches at AncestryDNA. I have identified Common Ancestors with over 4,500 Matches – only 5 of them are on my CAMPBELL line. About 12.5% of my DNA is from my CAMPBELL line, and, all other things being equal, about 12.5% of my Matches should come from my CAMPBELL line.  But all things are not equal – this CAMPBELL line is relatively small, and there are no known Ancestors before 1850, and there are no known links to any Ancestors in Scotland.

This doesn’t mean that none of my other Matches are cousins from this CAMPBELL line. However, it does result in me not being able to find any more links. I have tens of thousands of Matches with no Trees; I’ve even found some with a CAMPBELL surname – but no way to determine if I am related to them (other than the few who have matching Y-DNA at FamilyTreeDNA).

So, I drop back and relook at the big picture: exactly 1/8 of my Ancestry came from Scotland (well, maybe not going way back, but probably within a genealogy timeframe); roughly 1/8 of my DNA came from/through Scotland; and if not 1/8, perhaps 10,000 of my Matches should be on this part of my Ancestry – certainly more than the five close cousins I already knew about.

I decided to turn this lemon into lemonade. The lemon is a recent Scottish immigrant ancestor – the lemonade is Scotland ethnicity. If this is the only part of my ancestry from Scotland, maybe I could use that information. When I Cluster my AncestryDNA Matches at the 20cM Threshold (the lowest cM amount with Shared Matches to each other) I get about 160 Clusters. 1/8 of those is 20 Clusters – a manageable number. So when I see some solid looking Clusters without any hints of other ancestry, maybe they are from my Scotland line.
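The proportional reasoning here is simple arithmetic. As a quick check, assuming Matches and Clusters are spread across ancestral lines in proportion to the DNA inherited (the "all other things being equal" case), the numbers from the text work out as follows.

```python
from fractions import Fraction

share = Fraction(1, 8)     # one great-grandparent's expected share of ancestry
total_matches = 125_000    # "over 125,000 Matches" in the text
total_clusters = 160       # "about 160 Clusters" at the 20cM Threshold

expected_matches = share * total_matches
expected_clusters = share * total_clusters

print(float(share))           # 0.125
print(int(expected_matches))  # 15625
print(int(expected_clusters)) # 20
```

The strictly proportional figure of 15,625 Matches is an upper-end estimate; the text's "perhaps 10,000" allows for all things not being equal.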

Here is one such Cluster. I clicked on the link for each Match and checked their ethnicity:

Every Match in this Cluster has 14% to 62% Scotland ethnicity. A few scattered Matches with Scotland ethnicity might be expected randomly, but for all of them to have significant amounts of Scotland ethnicity is a strong clue.
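The check described above can be sketched as a concordance score: the fraction of Matches in a Cluster whose ethnicity estimate for a region meets some minimum. This is only an illustration. The 10% cut-off, the function name, and the sample percentages (other than the 14% and 62% endpoints) are assumptions, and ethnicity values still have to be collected by hand from each Match page.

```python
def ethnicity_concordance(cluster, region="Scotland", min_pct=10):
    """Fraction of Matches in a Cluster with at least min_pct of the region.

    cluster: dict of Match name -> dict of region -> percent.
    """
    if not cluster:
        return 0.0
    hits = sum(1 for pcts in cluster.values() if pcts.get(region, 0) >= min_pct)
    return hits / len(cluster)


# Hypothetical Clusters, for illustration only.
scottish_cluster = {
    "Match A": {"Scotland": 14, "England": 50},
    "Match B": {"Scotland": 62},
    "Match C": {"Scotland": 33, "Ireland": 20},
}
mixed_cluster = {
    "Match D": {"Scotland": 40},
    "Match E": {"England": 90},
}

print(ethnicity_concordance(scottish_cluster))  # 1.0
print(ethnicity_concordance(mixed_cluster))     # 0.5
```

A score of 1.0 corresponds to the "all of them" situation described above; a score near the background rate would be the "few scattered Matches" case.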

I think I can safely assume this CL149/14/[Scotland…] Cluster represents my Ancestor, John CAMPBELL – Ahnentafel 14M. If I knew the DNA segment, I could Paint this Cluster. I have several others that also show a pretty clear Cluster “picture”. Next I’ll be looking at some other Clusters which may even have a ThruLines Common Ancestor in them, but also have a lot of Scotland ethnicity – the ThruLines CA may be the outlier… With only one ThruLines CA I don’t have a high confidence that it’s right. But with high concordance of Scottish ethnicity, that’s a strong clue the Cluster is on my CAMPBELL line.

The next step is studying any Trees in these Scotland Clusters to see if those Matches have some Common Ancestors among themselves… That will be the sweetest lemonade of all.


[22AU] Segment-ology: Using Ethnicity to Identify a Cluster TIDBIT by Jim Bartlett 20200612

Clusters at Brick Walls

A Segmentology TIDBIT

Finding Common Ancestors with Matches in a Cluster sometimes “stops” at a specific generation – for example at the 3xGreat grandparent [4C] level. In other words, I’ve found cousins up to that generation, but not beyond. When one of these 3xGreat grandparents is a Brick Wall (or an “iffy” Ancestor), that’s probably the reason. The Cluster really goes back farther, but I don’t recognize any Common Ancestor further back.

It’s time to research and take notes.

I see three courses of action:

  1. If a surname is known or suspected, look in the Cluster for Matches with Trees and search them for that surname. Often, when I find one, I can build the Match’s ancestry out from there – looking for a link to my line.
  2. If a surname is unknown, jot down each Match’s surnames and try to find a Common Ancestor among them. Then I build the family around that Ancestor – looking for a link to my line.
  3. Alternatively, look for a common place and time approximately where the Cluster stops. Noodle around for any likely links. Check other Matches in the Cluster for those same links.
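The second course of action, jotting down surnames and looking for ones that recur, can be sketched as a tally. This is a rough illustration only: the surname lists are assumed to have been typed in by hand from each Match's Tree, and the function name and placeholder surnames are hypothetical.

```python
from collections import Counter


def shared_surnames(match_surnames, min_matches=2):
    """List surnames that appear in the Trees of at least min_matches Matches.

    match_surnames: dict of Match name -> list of surnames from that Match's Tree.
    """
    counts = Counter()
    for surnames in match_surnames.values():
        # Upper-case and de-duplicate so each surname counts once per Match.
        counts.update({s.upper() for s in surnames})
    keep = [(s, n) for s, n in counts.items() if n >= min_matches]
    return sorted(keep, key=lambda item: (-item[1], item[0]))


# Hypothetical Cluster, for illustration only.
cluster = {
    "Match 1": ["Alpha", "Beta"],
    "Match 2": ["alpha", "Gamma"],
    "Match 3": ["Beta", "ALPHA", "Alpha"],
    "Match 4": [],  # no Tree
}

print(shared_surnames(cluster))  # [('ALPHA', 3), ('BETA', 2)]
```

A recurring surname is only a lead; the Common Ancestor still has to be confirmed by building the family out, as described above.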

I use the Shared Clustering program which shows me the Matches for each Cluster, Common Ancestors from ThruLines, the number of people in their Tree, my Notes, and a hyperlink back to their AncestryDNA Profile. For each Cluster it’s easy to see potential CAs, then click on Match links, and see the surnames in common or call up their Tree for a more in depth review. It goes pretty quickly.

The results of these courses of action have ranged from easy “low hanging fruit” to “Mission Impossible”. In other words – sometimes it works, sometimes it doesn’t. I try these alternatives because they work in enough cases to encourage me to try more. I hope they will help you.


[22AT] Segment-ology: Clusters at Brick Walls TIDBIT by Jim Bartlett 20200507

Are Overlapping Segments Triangulated?

This question comes up often. The answer is: we cannot tell from just the fact that two shared DNA segments overlap in a chromosome browser.  Here is the picture we see:

11D Figure 1 Browser

In this picture, you are normally A and you have two Matches, B and C, which show as overlapping on Chromosome 6. Because they overlap, is this Triangulation? Do A, B and C share the same Common Ancestor? We cannot tell from this picture.

Assuming the shared DNA segments are Identical By Descent (IBD) – generally true for all such shared segments over 15cM – there are two possibilities:

  1. They are on different Chromosome 06’s in A. Remember we have two of each Chromosome – one from our mother and one from our father.


11D Figure 2a Two Chr

In this case, we are (somehow) looking at just A’s two Chromosome 06’s and showing where the shared DNA segments are on A’s DNA. It looks just like the picture we saw in the browser – two overlapping DNA segments. But in this case A & B are sharing on A’s maternal Chromosome 06; and A & C are sharing on A’s paternal Chromosome 06. These two Chromosome 06’s are physically separate (think of two strands of spaghetti). Because A & B have a shared DNA segment, they have a Common Ancestor (CA) who passed that DNA down to them. Because we know in this example that it’s on the maternal Chromosome (the one from A’s mother), we know the CA is on A’s maternal side. Similarly, we know the CA with C is on A’s paternal side. Yes, there is a very unlikely chance that these two CA’s could be the same person, and the DNA segment came down two very different paths to A’s mother and father. I’ll not be sarcastic here – you can decide for yourself if you think that is possible (or what the probability is) in your case.* In general, in genetic genealogy, we conclude that B & C are probably not related to each other – at least not on this segment.

  2. Alternatively, the two shared segments are on the same Chromosome 06 in A – let’s say, for example, they are both on the maternal side (imagine the two bars below on one Chromosome).


11D Figure 3a One Chr

In this case, we are (somehow) looking at just A’s one maternal Chromosome 06, and showing where the shared DNA segments are. Again, it looks just like the picture we saw in the browser – two overlapping DNA segments. But in this case A & B and A & C are sharing on A’s maternal Chromosome 06 (they are both on the same strand of spaghetti). From the beginning of the A & C shared segment to the end of the A & B shared segment, we are looking at the exact same place on A’s Chromosome 06. For there to be a match, all the tested markers (SNPs) are the same. In general, in genetic genealogy, we take this to mean that this DNA came from the same Common Ancestor. It came from that CA down to A and to B and to C. Because both B and C share this same segment of DNA found on one Chromosome 06 in A, both B and C should themselves show up as a Match to each other. After all they have the same DNA over this area of their own Chromosome 06.

You may have noticed that I started each explanation of the two possibilities with: “In this case, we are (somehow) looking at…” Well, we can’t look at just one chromosome in a browser and compare it to someone else’s DNA. We don’t have that technology for genealogy DNA testing. But if we could, that is what we would see (probably without the color coding). But we cannot! We can only visualize it. So what can we do?

We use reverse logic. In the first possibility, we noted that B & C wouldn’t match each other; and in the second possibility, we noted that B & C should match each other. That is information we often can determine (at 23andMe, MyHeritage and GEDmatch – and round-aboutly at FTDNA). So, we say that if A matches B, and A matches C on the same/overlapping DNA segment, AND B matches C there too, it indicates the second possibility above – the three of them share the same Common Ancestor. This case of A=B=C=A is called segment Triangulation, and the three Matches are in a Triangulated Group [TG]. There is more about Triangulation here.
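The A=B=C=A test can be written as an overlap check. What follows is a minimal sketch, not any company's algorithm: segments are given as chromosome plus start and end positions, the positions are hypothetical, and the minimum overlap size that would normally be required (in cM) is left out for brevity.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    chrom: int
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the region two segments have in common, or None."""
    if a.chrom != b.chrom:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chrom, start, end) if start < end else None


def is_triangulated(ab: Segment, ac: Segment, bc: Optional[Segment]) -> bool:
    """ab, ac: segments A shares with B and with C.
    bc: segment B shares with C, or None if B and C do not match there."""
    common = overlap(ab, ac)
    if common is None or bc is None:
        return False
    # B and C must also match each other inside the overlapping region.
    return overlap(common, bc) is not None


# Hypothetical positions on Chromosome 6, for illustration only.
ab = Segment(6, 30_000_000, 60_000_000)
ac = Segment(6, 45_000_000, 80_000_000)
bc = Segment(6, 46_000_000, 59_000_000)

print(is_triangulated(ab, ac, bc))    # True: the second possibility
print(is_triangulated(ab, ac, None))  # False: overlap alone is not Triangulation
```

The second call is the browser picture on its own: A matches both B and C on overlapping segments, but without a B to C match there is no basis for calling it a Triangulated Group.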

In my case, I have close to 20,000 Match/segments – each shared DNA segment is in one of 372 TGs which cover all of my DNA. In other words, these 372 TGs form a segment map of my 45 Chromosomes. The objective now is to determine the Ancestors who passed these TGs down to my parents and then to me.

*If you want to check to see if you have the same segment from your mother and your father, upload your DNA to GEDmatch and use the “Are Your Parents Related” program. It will show you any such segments, which is good information to have in any case.


[11D] Segment-ology: Are Overlapping Segments Triangulated? by Jim Bartlett 20200414

Download Your AncestryDNA Matches in 10 Minutes!

A Segmentology TIDBIT


UPDATE: AncestryDNA has issued a cease and desist order, and this process is no longer available to download your Matches. Sorry about that.

That is, download: all your Matches, a hyperlink [to their Page as a Match to you], Shared cM, Shared Segments, Tree Type, Tree Size, Common Ancestors [per ThruLines], a tic for each Dot and Star, and your Notes! This fast download does NOT include your Shared Matches, which may take days to download.

Here’s the process:

  1. Before running this program, I set up a separate folder with today’s date [e.g. 20200409] for each download; the Shared Clustering program will give you a chance to select this folder and to rename the download file.
  2. Download the Shared Clustering program. See my review of this program here. The link to download this program is:
  3. Click on Download TAB
  4. Enter your Ancestry user name and password [stored on your PC only]
  5. Click on Sign In
  6. Select your Test (if you have access to more than one)
  7. Click the button for Fast but incomplete
  8. Open Advanced options
  9. Lowest centimorgans to retrieve: 6 [this includes all of your Matches]
  10. Lowest centimorgans of shared matches: 4000 [this means don’t download any Shared Matches]
  11. Click on: Get DNA Matches


Here’s a picture of the message when the download is complete:

So 125,000+ Matches in 6 minutes – your results may vary.

After the download, Export the downloaded txt file to Excel. Click on the Export TAB, and follow the prompts to create an Excel file – takes about 4 min.

You can then use/manipulate the Excel file. You can sort on any field, and you can edit any Notes and then Upload those revisions back to AncestryDNA. I use this as an opportunity to do a Quality check of my Notes, and to ensure I have a Note for each Match with a ThruLines Common Ancestor. I find it’s much easier to edit Notes in the spreadsheet, than to jump around to each Match at AncestryDNA. NB: Don’t edit Notes in AncestryDNA when you are also editing Notes in the spreadsheet. If you do any edits in AncestryDNA, you need to do a new Download (it only takes 10 minutes!)
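The Notes quality check can also be done with a few lines of code on a spreadsheet saved as CSV. This is a sketch under loud assumptions: the column names `Name`, `SharedCm`, `CommonAncestors` and `Note` are hypothetical placeholders, not the actual headers of the Shared Clustering export, and the sample rows are invented.

```python
import csv
import io


def missing_notes(rows, ca_field="CommonAncestors", note_field="Note"):
    """Names of Matches that list a Common Ancestor but have an empty Note."""
    return [
        row["Name"]
        for row in rows
        if (row.get(ca_field) or "").strip() and not (row.get(note_field) or "").strip()
    ]


# Hypothetical file contents; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "Name,SharedCm,CommonAncestors,Note\n"
    "Match 1,42.0,Ancestor P,4C on P line\n"
    "Match 2,9.5,Ancestor Q,\n"
    "Match 3,8.1,,\n"
)
rows = list(csv.DictReader(sample))

print(missing_notes(rows))  # ['Match 2']
```

Only Match 2 is flagged: it has a Common Ancestor but no Note, while Match 3 has neither.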


[22AS] Segment-ology: Download Your AncestryDNA Matches in 10 Minutes! TIDBIT by Jim Bartlett 20200409 EDITED 20200808

AncestryDNA ThruLines Missing Out

A Segment-ology TIDBIT

ThruLines is based on genealogy – it finds Common Ancestors based on your Tree and the Trees of others. However, it only reports Common Ancestors with your DNA Matches. So, in a sense it has a DNA component. But the connections TL finds are not based on shared DNA cMs, Chromosome location, segment Triangulation, Clustering or Shared Matching – it is based only on connections found through Trees (only on genealogy). And ThruLines only reports Common Ancestors with your DNA Matches.

This is a two-edged sword:

  1. If you only want to work with DNA Matches, it’s a good thing.
  2. However, if you are a genealogist looking for cousins who might share records, pictures, stories, analysis, new branches, etc., it leaves something out. Remember that roughly half of our 4th cousins (4C) don’t share DNA with us, and roughly 90% of our true 5C don’t share DNA with us, and the vast majority of our more distant true cousins don’t share any DNA with us. This means that, although a program like ThruLines could find those non-DNA-sharing cousins for us, it doesn’t. Think of all that we are missing – think of all the lost opportunities.

Well… looking back on the #1 cutting edge of the sword – I’ve got to be a happy camper. I’m finding more ThruLines Matches than I can keep up with. By adding children and grandchildren of my Ancestors in my Tree, ThruLines is finding more Matches with Common Ancestors. And these Matches and their Trees are reinforcing my Tree (and pointing out a few soft spots…)

Back to work… Stay safe!

[22AR] Segment-ology: AncestryDNA ThruLines Missing Out – TIDBIT by Jim Bartlett 20200326

In Defense of Small Segments

Do you remember genealogy before atDNA? Pre-2010?

There was a time when we didn’t know about atDNA segments. We researched records, and looked at other people’s Trees/records. We developed our Trees and found cousins, without any knowledge of whether we shared any DNA or not.

So what’s changed?

We got a great new tool called atDNA that told us who we “matched” based on one or more shared atDNA segments. Each company developed an algorithm and reported Matches based on at least 6-8cM of matching DNA. The concept was that a person who shared a DNA segment of at least the minimum “threshold” size was probably related. Early on we learned that a shared DNA segment of at least 15cM was “always” a true match – it was Identical By Descent (IBD); and those IBD shared segments came from a Common Ancestor (CA) to us and our Matches. We also learned that from the company threshold (6-8cM) up to 15cM, some of the shared segments were false – the lower the cM, the more likely that the shared segment was false. Generally, about half of the 7cM shared segments were true and half were false; 6cM shared segments were false most of the time, and 8-15cM shared segments were true most of the time – we just couldn’t tell which were true and which were false. Some of the companies had other ways to improve the probabilities, but many of the experts admonished us to generally avoid using the segments below 15cM. A huge debate grew up about the use of 6-15cM shared DNA segments.

To get some data on shared DNA segments, Blaine Bettinger developed the Shared cM Project which showed our collective experience in finding cousins with various amounts of shared cM. His chart is in this article. The Shared cM Project showed that many had found 3rd cousins (3C) to 6C with ranges of cMs down to the threshold amounts. And at the testing companies and GEDmatch, we were finding 3C to 8C with shared segments in the 6-15cM range. AncestryDNA reported Circles (with CAs) out to 8C. The genetic genealogy community was finding cousins with these small shared segments – we just didn’t know if the DNA segments were true or false.

We also heard about scientific studies that showed that most of the IBD (true) shared segments in the 5 to 20cM range were from ancestors greater than 10 generations back – at least 8xG grandparents (or 9C level). This is usually beyond a genealogy time frame for many of us. For instance, see the Speed and Balding chart in this article. But even this data showed that within the 5 to 20cM range there were some 3C to 8C.

However, we continue to be admonished to avoid, or discard, Matches in this 6-15cM range. Such small segments were branded as “suspicious”, “dangerous”, “poison”, “a fool’s errand”, etc.

I don’t deny that some of the 6-15cM shared segments are false, and that many of them are beyond a genealogical time frame. But on the other hand, some of them are true and within a genealogical time frame. I’m unwilling to discard all of them, because some of them are false or too distant. As I will show below, many of my Matches with these small segments are very useful.

What’s at stake?

So, before we adopt a hard rule one way or the other, let’s look at small segments from a different viewpoint. At AncestryDNA, I have 120,000 Matches. Their ThruLines (TL) program has identified over 2,000 Matches who share a specific CA with me. The shared DNA segments range from 208cM down to 6cM; and from 2C to 6C. In fact about 2/3 of these TL CAs are with Matches who share 6-15cM segments with me. Based on my 45 years of true genealogy research, I’ve determined that only about 5% of these TL Matches are incorrect (the Matches and I may still be cousins somehow, but not on the CA identified by TL). So… over 1,900 of these TL Match cousins and CAs are “keepers”. I don’t want to throw away 2/3 of these easily identified CAs.

This genealogy analysis had nothing to do with the size of shared DNA segments. I believe these 1,900 people (Matches) are my true cousins – even if we didn’t share any DNA! As a genealogist, I’m a happy camper.  I very much want to share records, stories, pictures, research, and other descendants, or maybe test a Y-DNA or mtDNA line, with each of these new-found cousins. Even if I could eventually determine that our shared DNA segment was false, this person is still a cousin.

Most of our true Cousins won’t be DNA Matches

Over half of our 4C wouldn’t show up as a DNA Match; only about 10% of our 5C would show up as a Match; and only a very small fraction of our deeper cousins will show up as Matches. So when someone does show up as a DNA Match (at any level), and there is a valid paper trail showing they are an 8C – why not accept that? At least accept the 8C part, if not the DNA link. Later, in Triangulated Groups or Clusters, we’ll see if that person “groups” with others on the same line. This would indicate to me that the genealogy was true.

Between 1974 (when I started researching genealogy in earnest) and 2010 (when atDNA testing became available), I found many cousins with no knowledge of any shared DNA. Some of them probably shared DNA with me, but most did not. But they were all my cousins.

I hope I’ve made two key points so far: 1. atDNA is just a tool we’ve used over the past 10 years – it’s not our master; and 2. atDNA does not find everything in genealogy – we have many cousins, and indeed many Ancestors, we will never find with shared atDNA.  Ponder these points for a moment….

So back to small shared segment (6-15cM) Matches – are they worth it? Well as discussed above, of course they are!

Are they useful in Genetic Genealogy – beyond just as cousins? I think the answer is often they are useful… Let’s look at a few situations.

ThruLines and Clusters

Suppose, using ThruLines at AncestryDNA, you found 20 Matches in the 6-15cM range, who were all cousins (3C to 6C) on a line back to a 5xG grandparent couple. [NB: I have 64 5xG grandparent couples, and over 2,000 TL Matches – an average of 30 TL Matches (with a CA) per couple, so 20 TL Matches is a reasonable number]. At AncestryDNA we don’t have shared segment info for Triangulation, but we can do Clustering. Let’s Cluster on a 6cM threshold (all my 120,000 Matches, including the 1,900 good TL Matches). If the above 20 6-15cM Matches were sprinkled all over the Matrix (in different Clusters) – then nothing special. But if 11 Matches (of the 20) are in one Cluster, and 6 are in another Cluster, I’d sit up and take notice! There is nothing “random” about that. Clusters are formed on Common Ancestors, so we’d expect to see most of these 20 TL Matches in a Cluster, or two, or three. I have mostly Colonial Virginia ancestry, and some of my Matches have multiple CAs – so some of the 20 TL Matches may well wind up in a different Cluster. But, whenever your Matches form a strong* group (Cluster, Triangulated Group, DNA Painting, etc), they are very likely to have the same CA and share IBD segments. At least this is a good hypothesis.  At this point I am not claiming a “proof”, but I am claiming a lot of evidence that points in one direction. [*strong group does not mean 2 or 3 Matches in a Cluster; nor 10 Matches in a Cluster, but each one only matches 2 or 3 others. A strong group would be 10 Matches in a Cluster with each one matching about 8 of the others. Use judgment here.]
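The "strong group" footnote can be turned into a rough measure: the average number of other Cluster members that each Match also matches. This is a sketch, assuming Shared Match lists are available as a dictionary; the function name `average_links` and both sample Clusters are hypothetical.

```python
def average_links(cluster, shared):
    """Average number of other Cluster members each Match shares a match with.

    cluster: iterable of Match identifiers in one Cluster.
    shared: dict of Match identifier -> set of identifiers it matches.
    """
    members = set(cluster)
    if len(members) < 2:
        return 0.0
    total = sum(len(shared.get(m, set()) & (members - {m})) for m in members)
    return total / len(members)


# Hypothetical 10-Match Clusters, for illustration only.
ids = range(10)
# Dense: every Match matches all others except the one "opposite" it.
dense = {i: {j for j in ids if j != i and j != (i + 5) % 10} for i in ids}
# Sparse: every Match matches only its two neighbours.
sparse = {i: {(i - 1) % 10, (i + 1) % 10} for i in ids}

print(average_links(ids, dense))   # 8.0
print(average_links(ids, sparse))  # 2.0
```

The first Cluster is the "10 Matches, each matching about 8 of the others" case; the second is the weak case where each Match links to only 2 or 3 others. Where to draw the line between them is the judgment call mentioned in the footnote.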

Finding more CAs in Clusters

In the big picture, all of our Matches can be divided into two groups – those with true shared DNA segments, and those with false DNA segments. I believe most of my 120,000 Matches at AncestryDNA have true shared DNA segments with me (although as outlined above, I don’t really care if some are not DNA cousins: since AncestryDNA doesn’t show shared DNA segment info, I cannot Triangulate them anyway). Therefore, if Clustering groups these Matches, I have every reason to believe they are valid when they point to the same ancestral line. And if some of those Clustered Matches (with a CA on the same ancestral line) have small (6-15cM) shared DNA segments with me – so what? It’s close enough for a second look. Recently I’ve been looking through my Clusters for Matches who have over 1,000 people in a Public Tree. Most of my Clusters have a hypothetical Ancestor in them, so I look for surnames in that line.  Sometimes, I find a clue and I’m able to build the Match’s Tree out to connect with my line. This adds even further evidence that this Cluster is based on that line.

Genealogy vs Genetic Genealogy

Another aspect of this whole discussion is genealogy vs. genetic genealogy. If you are just interested in genealogy, it doesn’t matter what the size of the shared DNA segment is. In fact, while looking at Hints, I run across a lot of helpful Trees, where the owners are not DNA Matches at all. Only in certain circumstances (Chromosome Mapping; bio parents/Ancestry; “proof” where genealogy records are insufficient; etc.), do you need to ensure a shared DNA segment is true (IBD) and cannot be from a different Ancestor. So, unless you need to “prove” a genetic link, don’t worry about the size of the shared DNA segment. There is a lot to learn from many of your DNA Matches, even those with small segments, and even from other people (with no DNA match) at Ancestry.

Breaking through a Brick Wall

Even breaking through a brick wall is primarily a genealogy exercise. To be clear, this process is often aided by starting with a group of DNA Matches (Painted, Clustered and/or Triangulated), and looking for Matches with Trees that have Common Ancestors among themselves – beyond where your brick wall is. In these cases you are using DNA Matches who are probably related to you and who group with other Matches. You use this cadre of Matches to find a CA among them. This is basically a genealogy exercise – and, again, it doesn’t make a lot of difference how much shared DNA you have. In fact, to find a CA beyond your brick wall, you are probably looking for a distant CA – often found with smaller DNA segments. So don’t discard those Matches with small segments who have a Common Ancestor with you – use them.

Use caution with isolated small segments

My discussions above about using small segments are in the context of clues and grouping (Painting, Clustering, Triangulating, etc). IMO, it is reckless, and wrong, to find a 9C Match, sharing 10cM, and declare that “proves” the Ancestral line by itself. Such a “find” is one clue (and by itself, a very shaky one), and much more corroborating evidence is needed even to form a hypothesis. The “rule-of-thumb” I’ve been using is to have at least G independent Matches (at least cousins to each other) who all agree on the same Common Ancestor – where G is the number of Gs of the Common Ancestor. At the 7xG grandparent level (9 generations back – 8th cousin level) this means 8 Matches in agreement.  It’s relatively easy to get that many Matches in a Cluster or Triangulated Group – it’s much harder to find Common Ancestors with each of them. So be sure to include those CAs from Matches who share small DNA segments with you!
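The rule of thumb can be expressed as a small function. This sketch rests on one loudly stated assumption: that the "Gs" being counted are the Greats plus the Grand, which is the reading that reproduces the example of 8 Matches at the 7xG grandparent level. Whether the agreeing Matches are truly independent of each other is not modelled here and still has to be judged from their Trees.

```python
def matches_needed(greats: int) -> int:
    """Independent agreeing Matches wanted for a CA who is a (greats)xG grandparent.

    Assumption: count the Greats plus the Grand, so 7xG -> 8 Matches,
    matching the worked example in the text.
    """
    return greats + 1


def enough_agreement(greats: int, agreeing_matches: int) -> bool:
    """True if the number of agreeing Matches meets the rule of thumb."""
    return agreeing_matches >= matches_needed(greats)


print(matches_needed(7))       # 8
print(enough_agreement(7, 5))  # False
print(enough_agreement(3, 4))  # True
```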

Bottom Line

Use as many of your DNA Matches as you can, to learn more about your own genealogy. IMO, Matches with small shared DNA segments often provide the clues and evidence you are looking for. But use extreme caution with small shared DNA segments in isolation – they are much more credible when they are part of a group. Small segments in context and groups can be very helpful.


[06C] Segment-ology: In Defense of Small Segments by Jim Bartlett 20200131

20200202: Edited 10th paragraph to change “DNA segments” to “genealogy”

How Many TGs From Distant Ancestors?

I was recently asked if I’d thought about this question. The quick answer is YES – the answer to this question is at the core of my belief that genetic genealogy is valid out to 9 generations back. And I think this question is really two questions: one about the Triangulated Groups (TGs) themselves; and one about the Matches with shared DNA segments within each TG.

How far back do our TGs go?

Using a 7cM threshold for shared DNA segments, I’ve documented 372 TGs, covering over 98% of my DNA. These TGs have natural breaks [recombination crossover points] between them. These TGs represent actual DNA segments, on my chromosomes, which are from my Ancestors down to a parent to me.  So how far back do they probably go?

The number of segments we have at each generation of our ancestors is fairly easy to estimate. Using a female to make it easier, she gets 46 segments from her two parents – in the form of 46 chromosomes. Pretty big segments…  Using the average recombination rate of 34 crossovers per genome (per parent), she would get 68 additional segments one generation back. In other words she would have a total of 46+68=114 segments from her grandparents. And she would get 114+68=182 segments from her Great grandparents.  Here is a handy table I made up for my reference:

This table starts with me at the bottom and shows the generations back, the number of Ancestors at each generation back, the generic name of those Ancestors, the relationship of my cousins who share a Common Ancestor with me at that level, the calculated percentage and cM amount of DNA I got from each of those Ancestors (at any given number of generations back), the calculated average number of segments in my DNA from all the Ancestors in any given generation, the average cMs per TG; and in the last two columns the average and range of cMs collected in Blaine’s cM study. The first column is just for a very rough estimate of the birth year of my Ancestors at any given generation (it helps me).
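The segment arithmetic behind the table follows a simple pattern: 46 chromosomes to start, plus 68 new segments for each further round of recombination. The sketch below only reproduces that arithmetic from the figures in the text (46, 34 crossovers per parent, 114, 182); it does not reproduce the table itself, and the function name and the "rounds" counter are illustrative choices rather than the table's own row labels.

```python
CHROMOSOMES = 46           # chromosomes received from the two parents
CROSSOVERS_PER_ROUND = 68  # 34 crossovers per parent, two parents


def expected_segments(rounds: int) -> int:
    """Average number of segments after `rounds` rounds of recombination
    beyond the 46 chromosomes received from the parents."""
    return CHROMOSOMES + CROSSOVERS_PER_ROUND * rounds


for rounds in range(7):
    print(rounds, expected_segments(rounds))
# 0 46
# 1 114
# 2 182
# 3 250
# 4 318
# 5 386
# 6 454
```

These are averages only; actual crossover counts vary from one person and one generation to the next.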

Highlighted in yellow is the 386 segments expected (roughly) from my 3xG grandparents. That’s roughly the same as my 372 TGs. So I expect some kind of distribution curve around that point. Matches who share the full DNA segment represented by a TG would probably be 4th cousins (4C). Due to the random nature of DNA, I expect a range from 2C to 7C or 8C. My TGs range in size from a few just over 7cM to some around 50cM – it all depends on several variables.

Another aspect of this discussion has to do with what I call “sticky” segments. Per the Table above at 5 generations back we would see 386 segments – or 386 TGs – of about 18cM each. But going back one more generation – one more round of 68 crossover points would result in 454 segments. This means that 64 of the 386 segments were subdivided, and 322 segments were not! This means that 322 segments (TGs) were passed down intact (no recombination). The effect of this is that many TGs will persist, at the same size, for several generations. We could well see the same size TG from a 6xG grandparent to a 5xG to a 4xG to a 3xG grandparent. So it would be possible for a 7C, 6C, 5C and 4C to all share the full size DNA segment represented by the TG. Clearly the probabilities of that decrease as the cousinship increases.
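The "sticky" segment idea can be checked with a back-of-the-envelope estimate. This is a simplified sketch, not a genetic model: it assumes 386 equal-length segments and 68 crossovers that fall independently and uniformly, so that each crossover hits any given segment with probability 1/386. Real segments differ in length and crossovers are not uniform, so treat the result as a rough guide only.

```python
def sticky_estimate(segments: int, crossovers: int):
    """Return (total segments after, minimum intact, expected intact).

    Assumes segments > 0, equal-length segments, and crossovers placed
    independently and uniformly (a simplification).
    """
    total_after = segments + crossovers           # each crossover adds one segment
    min_intact = max(segments - crossovers, 0)    # every crossover hits a different segment
    # Chance a given segment is missed by all crossovers, times the number of segments.
    expected_intact = segments * (1 - 1 / segments) ** crossovers
    return total_after, min_intact, expected_intact


total_after, min_intact, expected_intact = sticky_estimate(386, 68)
print(total_after)  # 454
print(min_intact)   # 318
print(round(expected_intact))
```

Under these assumptions at least 318 segments pass through the extra generation untouched, and the expected number is a little higher because some crossovers land in the same segment. That is in the same neighbourhood as the 322 intact segments quoted above.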

Bottom line from my experience: I think we’ll find most of our TGs to be within a genealogical time frame of, say, 9 or 10 generations. And there is always the opportunity for closer cousins to share a DNA segment within any of our TGs.

How far back do the Matches go?

This is a different, but related, question. The above discussion was all about the full DNA segment represented by a TG. Most of our Matches in a TG will not share the full DNA segment. They overlap us or are wholly included within the TG segment. For example, the Matches in a 20cM TG can range from sharing 7cM up to 20cM. And, in fact, some of our closer cousins may share 35cM and span across more than one TG. It’s very random. However, to the point of the question – many of our Matches who share, say, 7 to 15cM may well be cousins beyond the Ancestors who passed down the full TG. To be sure, the Common Ancestors in this case would be ancestral to the TG Ancestor, but it could be 10, 20, or more generations back.

Bottom line: Matches in a TG are limited to a narrow range of your Ancestors, but they are not limited by how close or how distant they could be. And Matches who share small segments may well be beyond a genealogical timeframe; but some will be within a genealogical timeframe. Witness the Ancestry ThruLines Common Ancestors down to 6cM.

Summary: I think most TGs will be within a genealogical timeframe (using a 7cM threshold for shared DNA segments). The Matches in a TG will range from close Matches, out to Matches on the fringes of our genealogy and on out to Matches who will be beyond our genealogy.


[19H] Segment-ology: How Many TGs From Distant Ancestors? By Jim Bartlett 20191217

A Unified Theory of Genetic Genealogy

Bottom Line Up Front (BLUF):

Triangulated Groups = Clusters = Common Ancestors

Brief overview: Each of us has a specific genealogy Tree of Ancestors; and a fixed arrangement of our DNA segments from those Ancestors. I believe our DNA segments are reflected in our Triangulated Groups (TGs) of shared DNA segments, which are from specific Common Ancestors (CAs), and that each CA is represented by a specific Cluster of Shared Matches. I believe there is alignment between the TG CA and the Cluster CA, which can be very helpful. Put another way, each of our Ancestors will have a specific TG/Cluster combination, and at some point in our Tree there will be one TG and a corresponding Cluster for each Ancestor.


In these blog posts I’ve often stated that each segment Triangulated Group (TG) is from a specific Common Ancestor (CA) – in other words the DNA segment identified by a TG came from a specific Ancestor down the line of descent of your Ancestors to you. The Matches in a TG will be relatives (usually cousins) along one of your ancestral lines. For example, if a TG is from a 6xG grandparent (7th cousin (7C) level), some of the Matches may be cousins from 1C to 7C; and some (usually with shared segments below 15cM) may be from Ancestors somewhat beyond the 6xG grandparent.

Because of the random nature of DNA, and the wide range of cMs for cousins beyond 3C, there is no set of parameters (short of complete chromosome mapping) that will get you only TGs at one generation. For instance, I know of no cM parameter that will get you only TGs at, say, the 6C level – or any other level. So we usually wind up with a mix of TGs at different cousinship levels.

TG Outliers

Like with most things DNA, there may be some outliers, and not every Match in a TG will be found to share an IBD segment (in other words, some Matches with small shared DNA segments  – under 15cM – may be false Matches). But the important take-away is that the TG will represent a CA, even if a few Matches are false.

TG Bottom Line

Your DNA has fixed crossover points. Depending on the cM threshold you use for comparing shared DNA segments, your data will have natural break points between TGs. I used a 7cM threshold and got 372 TGs covering over 98% of my 45 Chromosomes. It was hard work doing all the comparisons and culling out the false shared segments. I have Matches who are 2C to 9C for about 80% of these TGs.


Recently, I’ve been blogging about Clustering. Clusters appear to come from a specific CA down the line of descent of your Ancestors to you (just like the description of a TG).

When I did a Clustering run of all my 5732 Matches at Family Finder, I got 352 Clusters which had a very high correlation to my 372 TGs.

Well… duh! When we consider that each of us has fixed segments in our DNA and fixed ancestors in our Tree, we understand that each of us, in our own unique ways, has a specific “solution” (Ancestors linked to DNA segments). So if we look at grouping by Clusters, it should reflect that “solution”. And when we form segment TGs, they should reflect that “solution”. And in combination, the Clusters and TGs should reflect the same “solution”.  In other words the Clusters and TGs should align.

In my opinion, Clustering with Shared Matches is a sophisticated way of grouping Matches based on the probability that a number of Shared Matches who mostly match each other will be from the same CA.

Clustering Outliers

Like with most things DNA, there may be some outliers, and not every Match in a Cluster will be found to share the same CA. But the important take-away is that most do share the same CA, and the Cluster will represent that CA, even if a few Matches don’t.

Cluster Bottom Line

Your Shared Matches will tend to Cluster on CAs. Depending on the cM threshold you use for comparing shared DNA segments, your data will divide into different numbers of Clusters. See my experience here; and the process here. I used a 6cM threshold and got 350 to 382 Clusters, covering at least all of my 4xG grandparents and some out to 8xG grandparents. It was relatively easy to run the Cluster programs to get the Match/SharedMatch data, and relatively little work to determine a consensus of a CA for each Cluster, for each run at different cM levels (smaller thresholds result in more Matches and Clusters, and more work). I can see CAs out to 8C for some Clusters. [NB: Clustering does not find the CAs – this is homework you have to do before Clustering: find as many CAs as possible and put that information in the Notes, so it’s available for analysis at each Cluster run].


I’ve put a lot of work over the past 8 years into determining my 372 TGs (your number of TGs may vary, but I believe that, using a 7cM threshold for Shared Segments, it will come out at this order of magnitude). Triangulation, even with the tools at 23andMe, MyHeritage and GEDmatch, takes time and work. In contrast, Clustering is relatively simple – pretty close to a “click” process. If Clusters are the same as TGs, we should be able to run a Cluster report on all of our Matches (at a company which also provides segment data), and then easily sort on the DNA segment data (sort by Chr and Start), and then relatively easily scroll down the several thousand Matches and group them into TGs. Yes, this scrolling will take some work, but it’s a whole lot easier than comparing each shared DNA segment pair in a browser. I believe the combination of Cluster numbers and segment data will easily define the TGs – maybe just a little “quality control” at the end, depending on how the data looks.
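The sort-and-scroll idea could be roughed out in Python along the following lines. This is a sketch under stated assumptions, not a tested Triangulation tool: the `Segment` fields, the `candidate_tgs` name and the sample rows are all invented for illustration, and “same Cluster plus overlapping segment on the same chromosome” is used as a stand-in for true Triangulation, so the output is only a list of candidate TGs that still needs the “quality control” mentioned above.

```python
from collections import namedtuple

# Hypothetical row layout for a Cluster report merged with segment data.
Segment = namedtuple("Segment", "match cluster chrom start end")


def candidate_tgs(segments):
    """Group overlapping segments that share a chromosome and a Cluster number."""
    groups = []
    open_group = {}  # (chrom, cluster) -> most recent group for that pair
    for seg in sorted(segments, key=lambda s: (s.chrom, s.start)):
        key = (seg.chrom, seg.cluster)
        group = open_group.get(key)
        if group is not None and seg.start < group["end"]:
            # Overlaps the running group: add the Match and extend the span.
            group["matches"].append(seg.match)
            group["end"] = max(group["end"], seg.end)
        else:
            # New chromosome/Cluster pair, or a gap: start a new candidate TG.
            group = {"chrom": seg.chrom, "cluster": seg.cluster,
                     "start": seg.start, "end": seg.end, "matches": [seg.match]}
            groups.append(group)
            open_group[key] = group
    return groups


# Invented sample rows (positions in Mbp), only to show the mechanics.
rows = [
    Segment("Match A", 12, 4, 36.0, 58.0),
    Segment("Match B", 12, 4, 40.0, 55.0),
    Segment("Match C", 31, 4, 38.0, 52.0),   # overlaps, but a different Cluster
    Segment("Match D", 12, 4, 90.0, 104.0),  # same Cluster, no overlap
]
for tg in candidate_tgs(rows):
    print(tg["chrom"], tg["cluster"], tg["start"], tg["end"], tg["matches"])
```

Keeping the Cluster number in the grouping key is what separates overlapping segments that may sit on opposite (maternal and paternal) chromosomes; segment overlap by itself cannot do that.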

I have my brother’s DNA at FTDNA and 23andMe – I’m going to try this process on his results, and will report back.

The Bottom Line

Once you determine your TGs and the CAs that go with them, you have a Chromosome Map!

My Bottom Line

I’m trying to demonstrate:

  1. TG=CL=CA
  2. The CA will be in the 7C-9C range*


*I recognize that my belief that our DNA tests can accurately determine our CAs out to 8C, or so, is not held by most genetic genealogists. But based on my experience, particularly using Walking The Clusters Back, I believe this is a realistic range – easily and accurately obtained – and confirmed by both TGs and Clustering.

With our fixed Ancestry and DNA crossover points, each process should give us the same “solution” – whether we use DNA Painter, Kitty Cooper’s Chromosome Mapping, GenomeMatePro, Visual Phasing, Double Match Triangulator, etc., etc. We are just using different tools to “see” the chromosome map.


[19G] Segment-ology: A Unified Theory of Genetic Genealogy by Jim Bartlett 20191216

Walking The Clusters Back III

Progress Report – Observations…

Main benefits, so far:

  1. Impute Cluster Common Ancestor (CA) to other Matches in the Cluster – this lets us focus on individual Matches – look at their Tree with a CA in mind, and/or communicate with the Matches and ask about a specific Surname or Ancestral line.
  2. Compare Cluster CA to ThruLines CA – if the same, we have reinforcing evidence; if different, the ThruLines CA may be wrong, or it may be correct genealogy, but the Match has another CA linked to the DNA (and the Cluster).
  3. Link some Clusters (and the CA) to a Triangulated Group (TG) – this will strengthen the evidence of the Ancestral line of a TG. Often the Cluster CA is more distant than the CA found in TGs at 23andMe, FTDNA, MyHeritage or GEDmatch.
  4. As the threshold decreases, there are more Matches included in the Clustering process, and those Matches tend to have more distant CAs with us. Clusters will start with only 2C and 3C; and grow to include 4C and 5C, etc. We can see the Walking The Clusters Back happening within each Cluster. Eventually each Cluster will begin to show several generations of CAs – they should all be on the same Ancestral line [if not, check with the correlated Clusters].
  5. Clustering reduces the range of possibilities. If a Cluster has a CA of A18 [Ahnentafel number for a specific 2xGreat grandparent = father’s father’s mother’s father], there are only two possibilities for the next generation: A36 and A37 (although a Match may share a CA another generation back: A72, A73, A74, or A75). If a new Match in the next (lower threshold) Cluster run has CA = A74 – this is reinforcing evidence. If the new Match has CA = A88 – something is amiss [check for another CA, check for a correlated Cluster which is A44, or A176, etc.].
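The Ahnentafel check in item 5 is easy to mechanize, because in Ahnentafel numbering the father of person n is 2n and the mother is 2n + 1. Here is a minimal Python sketch; the function names are made up for this illustration and are not taken from any Clustering program.

```python
def parents(ahnentafel):
    """Father is 2n and mother is 2n + 1 in Ahnentafel numbering."""
    return 2 * ahnentafel, 2 * ahnentafel + 1


def line_of_descent(ahnentafel):
    """Spell out an Ahnentafel number as a path, starting from me (number 1)."""
    steps = []
    while ahnentafel > 1:
        steps.append("father" if ahnentafel % 2 == 0 else "mother")
        ahnentafel //= 2
    return list(reversed(steps))


def on_same_line(a, b):
    """True if the two numbers are equal, or one is an ancestor of the other."""
    near, far = sorted((a, b))
    while far > near:
        far //= 2  # step one generation down toward me
    return far == near


print(parents(18))           # the two possibilities for the next generation
print(line_of_descent(18))   # father, father, mother, father
print(on_same_line(18, 74))  # A74 is ancestral to A18: reinforcing evidence
print(on_same_line(18, 88))  # A88 is on a different line: something is amiss
print(on_same_line(88, 44), on_same_line(88, 176))  # Clusters worth checking
```

A check like this could be run over the Ahnentafel numbers entered in the Notes of every Match in a Cluster, to flag any Match whose CA falls off the Cluster’s Ancestral line.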

Main issues, so far:

  1. It’s been hard to specifically find Clusters which split into two Clusters a generation further out. Many Clusters have included CAs which span several generations on the same line. I’m inclined to “go with the flow” [accept the Clusters with CAs on the same line]; and not try to force Clusters into a “genealogy Tree” structure. The data is just too variable. Maybe when I get down to a 6cM threshold, it may play out that way – but I have a feeling the vagaries of random DNA will make that a wild goose chase.
  2. Several BIG variables combine to give us trends, rather than a uniform picture/pattern:
    1. Our Ancestry (size of families, documentation, probable NPEs at some level, etc.)
    2. Our random DNA from different Ancestors
    3. Which of our cousins have DNA tested.
  3. Homework is needed – Clusters can be formed, but some genealogy is needed to identify the CAs. I recommend building a Tree of Ancestors out 7 generations, wherever possible – with that AncestryDNA will find ThruLines CAs for you. Enter those CAs (or their Ahnentafel) into the Match’s Notes, so that information will be available in the different Clustering runs.

My status so far is summarized in this Table of different Cluster runs:

The first column shows the decreasing thresholds I used (basically every 5cM) – the top line is the original download: 6cM threshold, 119,068 Matches (and all their Shared Matches) which took 9 hours and is in a .txt file.

The # Matches and # Clusters are for the various cluster runs – which take negligible time to produce an Excel Cluster report.

The 3C, 4C, 5C, etc. columns show how many Clusters I got with CAs at those levels. (There were some 2C, but they were in Clusters that also had 3C – I counted each Cluster with the most distant cousinship which had a consensus.)

The larger threshold Clusters had multiple TGs in them. Beginning at about the 35cM threshold, some of the Clusters started showing a single, or consensus, TG – so I counted them.

Starting after the 45cM threshold, the number of included Matches about doubled with each decrease of 5cM in the threshold, and the number of Clusters began increasing dramatically. This means the amount of work for scrolling down the entire report, analyzing the data in each Cluster and determining the consensus, also increased a lot. Sometimes the CA and/or TG of a Cluster is very clear; sometimes a Match’s correlated Clusters must be reviewed and the Match assigned to another Cluster. And all new Matches need to be “Tagged”; and often other Matches need to have their “Tag” adjusted [what I alluded to in the Iterative WTCM Process], as new Matches and their new information are added to the Clusters.

The good news is in the TG column – where about 1/4 of the Clusters (and CAs) can be linked to TGs [I have TGs for over 300 Matches at AncestryDNA].

More good news: The Shared Clustering program, below 20cM, will first Cluster on the 4,515 Matches, basically keeping the 382 Clusters constant, and then go back and add in the new Matches. Therefore, all the Matches below 20cM (including about 300 with TGs, and over 1,500 with CAs) will be added to the existing Clusters, and very probably push the Cluster CA out even farther and add a TG to many of them.

Note the trend in the 35, 30 and 25cM Cluster runs to more distant cousinships. As I find the time to analyze the 20cM Cluster run, and then runs at 15cM and 10cM, I expect this trend to continue, giving me many more CAs in the 6C, 7C and 8C range. Of course these are all clues, but I believe they are very strong clues. Time will tell as I investigate each Cluster/CA/TG more deeply.


[19F] Segment-ology: Walking The Clusters Back III by Jim Bartlett 20191214