Easy Manual Clustering at AncestryDNA

Auto-Clustering at AncestryDNA is in a pause mode now. But we can still look at and analyze our own Matches any way we want. We can even form our own Clusters. Here is a modest process that may produce Clusters that are very helpful to us. AncestryDNA does not provide segment information that would allow grouping by Triangulated Groups, so Clustering is the best way to group Matches. And there are several advantages to using Clusters.

Manual Clustering Process at AncestryDNA

  1. Start with your ThruLines. These are Matches who share a Common Ancestor (usually a couple) with us. The ThruLines process looks for obvious CAs; it looks in Private (but searchable) Trees for CAs; it sometimes “fills in the blanks” with information from other (even non-DNA Match) Trees to create a link between you and a DNA Match back to a CA. This “fill in the blanks” process may happen in your Tree or your Match’s Tree or both. The ThruLines process works out to 5xG grandparents on both sides – if either side is more than 7 generations back, it will not be reported. In any case, you should review the information provided by Ancestry and decide if the ThruLines CA is correct, or not.
  2. Enter the CA information in your Match’s Note box [see “Add note”] – I use a combination of Ahnentafel Number/side; relationship; and surnames. Example: A0140P/6C: WELCH/SPENCE – the Match and I are 6C, sharing ancestors Sylvester WELCH Jr and Anne SPENCE; Sylvester WELCH is my Ahnentafel Number 140 on my Paternal side. I like using Ahnentafel numbers as they are easy to compare and determine relationships. Just divide by 2 (dropping any remainder) to get 140>70>35>17>8>4>2>1 (me), so A0008P/2C: BARTLETT/NEWLON is on this same ancestral line. Do this for all your valid or suspected* ThruLines CAs. NB: Some Matches will share more than one CA with you – enter them both. [* I include suspected ThruLines CAs – if they are incorrect, they almost never Cluster and can thus be culled out.] Anyway – use whatever system works for you, just enter something in the Note box. You’ll be looking at these Notes of Shared Matches to form Clusters, and you want to know who shares the same CA.
  3. After going through one or all of the ThruLines CAs, call up one of these Matches and review their Shared Matches. I count the number of SMs and the number on the same ancestral line, and record this in the Note box. Example: SM: 17/25xA0140P. This means that out of 25 total Shared Matches, 17 of them had a Note indicating the A0140P CA. NB: a Shared Match with a Note indicating A0070P would be included. An SM with A0034P would also be included because 34P is really a shortcut for 34P/35P, and 35P is in the same ancestral line. Likewise, 8P is in the same ancestral line and would be counted as also having A0140P ancestry. Repeat for all ThruLines Matches. It doesn’t take that long for such a powerful tool as Clustering.
  4. Use judgement to decide who is in a Cluster. In some cases, it’s crystal clear – virtually every Shared Match has an “SM: note” with the same CA. Other cases are not so clear, so you need to decide if there is sufficient evidence to include a Match in a Cluster. In some cases, a Match with a ThruLines CA will actually have several Shared Matches with a “different CA” – the Clustering process dictates such a Match be Clustered with the Matches with a “different CA”. And I would certainly review that Match again to see if there isn’t some clue that indicates the “different CA” is in their tree, too.
  5. Cluster ID – you can use any system you want to name your Clusters. One way is CL001 to CL200. Another way is to use the CA – Example: CL0140P1. This is the Ahnentafel Number preceded by CL. NB: I added a 1 at the end because some of your Ancestors may be linked to more than one Cluster. [I have Ancestor A0556M, a 7xG grandparent couple, who are in three large Clusters.] Add this Cluster ID to the Match notes. Example: A0140P-CL047/6C: WELCH/SPENCE. Or use whatever system you want.
  6. Once you have determined Clusters based on ThruLines Matches and CAs, you can go back and look at a Match in a Cluster and look at his/her Shared Matches who aren’t in a Cluster. Do some have several Shared Matches themselves who are in a Cluster? If so, add these Shared Matches to the Cluster. NB: You can also look at Matches under 20cM – many of them have Shared Matches. If several Shared Matches are in one particular Cluster, add the under 20cM Match to the Cluster.
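
For readers who keep their Notes in a file, the Ahnentafel halving and the "SM:" count described in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up, the Note pattern assumes the "A0140P" style shown above, and "same ancestral line" is taken to mean that one couple sits on the other's line of descent.

```python
import re

def line_to_me(n):
    """Ahnentafel numbers from ancestor n down to 1 (me), halving each step."""
    path = []
    while n >= 1:
        path.append(n)
        n //= 2  # integer division drops the remainder, e.g. 35 -> 17
    return path

def couple(n):
    """A Note such as A0034P stands for the couple 34/35 (father even, mother odd)."""
    father = n - (n % 2)
    return {father, father + 1}

def same_line(a, b):
    """True if couple a is on b's line of descent, or couple b is on a's."""
    return bool(couple(a) & set(line_to_me(b))) or bool(couple(b) & set(line_to_me(a)))

# Assumed Note style: "A" + digits + P or M, e.g. "A0140P/6C: WELCH/SPENCE"
NOTE_ID = re.compile(r"\bA(\d+)[PM]")

def sm_tally(target, side, shared_match_notes):
    """Build an 'SM: hits/total' string for one Match's list of Shared Match Notes."""
    hits = sum(
        any(same_line(int(num), target) for num in NOTE_ID.findall(note))
        for note in shared_match_notes
    )
    return f"SM: {hits}/{len(shared_match_notes)}xA{target:04d}{side}"

# Made-up Notes for four Shared Matches (the last one has no Note yet)
notes = ["A0070P/5C", "A0034P/4C", "A0036P/4C", ""]
print(sm_tally(140, "P", notes))  # SM: 2/4xA0140P
```

A Shared Match with no Note still counts toward the total, which matches the 17-out-of-25 style of count in the example above.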

Clusters are one of the best tools I’ve found for grouping AncestryDNA Matches and finding more CAs.


[19I] Segment-ology: Easy Manual Clustering at AncestryDNA by Jim Bartlett 20200701

Let the Matches Tell Us the Cluster Common Ancestor

Using a 20cM threshold at AncestryDNA, I got 156 Clusters. That’s roughly one Cluster for each of my 128 5xG grandparents – or two Clusters per 5xG grandparent couple – often with valuable Common Ancestor (CA) hints from ThruLines. I don’t know 50 of my 128 5xG grandparents (they are brick walled) – so I would expect (50×156/128=) 61 of my 156 Clusters to be blank. What’s a body to do?

Well… in the first place the above calculation is based on finding a CA at the 5xG grandparent level. ThruLines provides clues for all the Ancestors I know – but, clearly, they cannot help with Clusters (or TGs) beyond a brick wall. For almost all of the Clusters, I know the parent; and for roughly 80% I know the grandparent; and for many I know the CA out to the brick wall. So I’ve got a start. But, for many of my Clusters, there is very little otherwise to go on – just a lot of Matches in a Cluster. What’s a body to do?

As I’ve said before, let’s think about lemonade…  In my last post (Using a Group Ancestor), I noted that grouping (segment Triangulation and Shared Match Clustering) results in a group of Matches with the same Common Ancestor (CA). This is the concept, even if we don’t have any clue as to who the CA is. But let’s make “the certainty that there is a CA” work for us… Let’s have the Matches tell us who the CA is for a Cluster. Seems like lemonade to me.

Here is a process for AncestryDNA: [I hope you’ve saved your last Cluster report]

  1. Select a large Cluster for which you have no known CAs (or only a few which are in conflict with each other).
  2. Make a spreadsheet with three columns: Match Name, Surname, and Notes.
  3. Select a Match in the Cluster who has a Tree with more than 99 people.
  4. Type the Match name in the spreadsheet.
  5. Go to that Match in AncestryDNA (either from the URL in the Cluster; or by searching AncestryDNA).
  6. Type the surnames for that Match (both Shared Surnames & Match’s Tree Only) in Surname column.
  7. Copy the Match name down the spreadsheet for each surname.
  8. Repeat for each Match in the Cluster with a Tree over 99 people.
  9. Sort the spreadsheet on the Surname column.
  10. Scroll down the list and highlight likely Surname groups [it would be great to find a clear winner – repeated multiple times. If not pick the top few surnames].
  11. Go back to the Matches with most likely surname(s) and put in the Notes column the Patriarch or any other identifying information (birth, location, ethnicity, etc). The expectation (hope) is that you’ll find a Common Ancestor or two in this process.
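
If the Match Name and Surname columns are typed into a file instead of a spreadsheet, the sort-and-highlight part of the steps above can be approximated with a short script. This is a rough sketch, not part of the original process: the function name, the sample rows and the choice to count each Match only once per surname are all assumptions.

```python
from collections import defaultdict

def surname_groups(rows, top=3):
    """rows: (match_name, surname) pairs, one per spreadsheet line.

    Returns the `top` surnames ranked by how many different Matches list them.
    """
    matches_by_surname = defaultdict(set)
    for match_name, surname in rows:
        # Normalise case and spacing so "Adams" and "ADAMS " group together
        matches_by_surname[surname.strip().upper()].add(match_name)
    ranked = sorted(matches_by_surname.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(surname, sorted(names)) for surname, names in ranked[:top]]

# Made-up rows for illustration only
rows = [
    ("Match 1", "Adams"), ("Match 1", "Craft"),
    ("Match 2", "ADAMS"), ("Match 2", "Craft"),
    ("Match 3", "adams "), ("Match 3", "Smith"),
]
for surname, names in surname_groups(rows, top=2):
    print(surname, len(names), names)
```

Counting distinct Matches rather than raw rows keeps one very large Tree from dominating the list. The Notes column still has to be filled in by hand from the Trees of the Matches that surface.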

I can almost hear the collective groan at step #6. Yes, it’s an onerous task. I sat down with a favorite beverage and typed non-stop the 660 surnames for Matches in one Cluster; 750 in another Cluster. But, think about this another way: would you spend a half-day of work to find a new Ancestor? That would be a nice glass of lemonade.

In my first Cluster try, I found three Surnames (ADAMS, CAUDILL and CRAFT) repeated several times. A quick and dirty Tree quickly determined John ADAMS married 1769 Loudoun Co, VA Nancy CAUDILL; and their daughter, Elizabeth, married Archelous CRAFT – and 5 of my Matches in the Cluster descended from these two couples!! I already had some clues that this Cluster was on my father’s father’s side. This includes my NEWLON line which had a brick wall born c1774 Loudoun Co, VA which I determined was Susan CUMMINGS – blogpost here. Her father is strongly suspected to be John CUMMINGS born c1746, but nothing is known about John’s first wife, the mother of Susan CUMMINGS and my Ancestor – a new brick wall. If John’s first wife was an ADAMS, all of this would fall into place as a hypothesis.

By the nature of Shared Match Clustering, this Cluster must have a CA. With five widely separated Matches agreeing on the same CA (and no other surnames turning out any hints at all), I think this is a strong clue. But, more research is needed.

The other Cluster had several repeated surnames, but none that I have been able to link together, yet. I may drop down and look at the surnames of Matches with Trees in the 50-99 people range… maybe another hour of typing… If I find a clue it will all be worthwhile.

Bottom Line: A Cluster (or a TG) has a CA. The Matches in a Cluster should all share this CA. Let the Matches Tell Us the Cluster Common Ancestor.  The process above is one way to do this. A particular advantage to me is that this process is comprehensive, and with no bias – the data from the Matches is treated evenly.

Post Script: By its nature genealogy is an ego-centric hobby. We tend to focus on ourselves as the center of the universe. Or, if we are professionals, we treat the Client as the center of the universe. Everything revolves around our Ancestors and what we can find out about them. But each of us is a small part of the human race, and our Matches – our cousins – are part of this larger picture. They fit in, too. They are an interlocking part of the whole jigsaw puzzle, and in some (many?) cases, some of them know more than I do. The process above draws on the data they have provided. Often, they have clues to the solutions we seek. Often, they know what’s on the other side of our brick walls.

Edit 6/22/20: I’ve been asked to add a photo of my spreadsheet. Here it is – showing the top two surnames.

Spreadsheet of Cluster Common Ancestors

The 3rd column is Match Names and it has been narrowed for Match privacy. When I started, I had columns for Company and Where (the name of the Cluster run – 20cMCL63: Cluster 63 of the Shared Match run using a 20cM threshold), but it turns out this is a Quick and Dirty spreadsheet, and I didn’t need those columns. The objective is to get started on a Quick and Dirty Tree, and work from there. As soon as I saw the last line – a CRAFT married to an ADAMS – I started the Q&D Tree and found the five Matches who all tied together. Since then, I’ve used the previous blogpost on Searching and have found over a dozen more Matches who descend from this same line. All of the Cluster Matches were over 20cM. However, now knowing what I’m looking for, the Search process let me drop below 20cM and find many more – and most of them have above-20cM Shared Matches from the same Cluster. This is added evidence that I tie into this line somehow.

[19H] Segment-ology: Let the Matches Tell Us the Cluster Common Ancestor by Jim Bartlett 20200620

Using a Group Common Ancestor

A Triangulation (and grouping) Concept

We have spent a lot of time and effort to describe *how* to group our Matches: segment Triangulation, DNA Painting, Shared Match Clustering. Each of these processes results in a group of Matches that should have a Common Ancestor (CA). This is an important concept.

But the main thing is to *use* this concept – to use the information found in these groups. If a group is formed around a CA, then all of the Matches in the group should share a CA. Once a CA is found, each Match in the group should also have that group CA, or be a closer cousin with an MRCA that descends from the group CA, or have a more distant MRCA which is ancestral to the group CA. In other words, all the Matches in a group should have the same distant CA.

So… if we find a CA for a group, the other Matches in the group should have the same CA line. This is a powerful focus – let’s *use* it. We should be able to look at other Matches in the group (who have Trees) and find that CA – either directly through a search, or indirectly by building out their Tree.

I illustrated this in Case 3 of Chapter 1 (Lessons Learned from Triangulating a Genome) of “Advanced Genetic Genealogy: Techniques and Case Studies” – here or here. This was all about one of my TGs which I call [04P36]. At Ancestry, I found a few cousins (who had uploaded to GEDmatch) in that TG who shared my HIGGINBOTHAM ancestry. Armed with that hint, I searched for HIGGINBOTHAMs in other Matches (in that TG) who had trees. I also contacted Matches from FTDNA, 23andMe and MyHeritage – and several replied that they had the same HIGGINBOTHAM Ancestry. In the end I found 14 different Matches ranging from 4C to 8C on this HIGGINBOTHAM line in TG [04P36].

Because TG [04P36] came down a line of descent with the HIGGINBOTHAM surname in 5 generations, this case was an easier example – searching for one distinct surname. If a group represents a CA with a male-female zig-zag line of descent to me, it will be harder – the surname will change often. However, each line of descent (from a given Ancestor) is fixed – and we may find Match cousins with MRCAs of different surnames, but they will all be on the same ancestral line. This is akin to “Genealogy Triangulation” – getting an alignment of multiple cousins on one line.

Finding one Match with a CA in a group is not the end of the story – it’s a clue to the beginning of more research. If we find a CA for a group, but no other Match seems to have that CA, maybe we need to look for a different CA. The “correct” CA for each group should lead to Genealogy Triangulation – agreement by other Matches on the same ancestral line. If you find a CA in a group, *use* it to find more Matches on that same line. Seek CA agreement among Matches in each group.


[08D] Segment-ology: Using a Group Common Ancestor Concept by Jim Bartlett 20200620

Using Ethnicity to Identify a Cluster

A Segmentology TIDBIT

My Ancestor 14M was John William CAMPBELL, born 1856 NY; died 1916 WV. His parents were Samuel CAMPBELL and Ann CLARK who were married 1851 in Scotland and immigrated to the US in 1853. This 1/8 of my ancestry is the only known part to come from Scotland. Several cousins have done Y-DNA testing and the CAMPBELL line is the Argyll CAMPBELLs.

I have over 125,000 Matches at AncestryDNA. I have identified Common Ancestors with over 4,500 Matches – only 5 of them are on my CAMPBELL line. About 12.5% of my DNA is from my CAMPBELL line, and, all other things being equal, about 12.5% of my Matches should come from my CAMPBELL line.  But all things are not equal – this CAMPBELL line is relatively small, and there are no known Ancestors before 1850, and there are no known links to any Ancestors in Scotland.

This doesn’t mean that none of my other Matches are cousins from this CAMPBELL line. However, it does result in me not being able to find any more links. I have tens of thousands of Matches with no Trees; I’ve even found some with a CAMPBELL surname – but no way to determine if I am related to them (other than the few who have matching Y-DNA at FamilyTreeDNA).

So, I drop back and relook at the big picture: exactly 1/8 of my Ancestry came from Scotland (well, maybe not going way back, but probably within a genealogy timeframe); roughly 1/8 of my DNA came from/through Scotland; and if not 1/8, perhaps 10,000 of my Matches should be on this part of my Ancestry – certainly more than the five close cousins I already knew about.

I decided to turn this lemon into lemonade. The lemon is a recent Scottish immigrant ancestor – the lemonade is Scotland ethnicity. If this is the only part of my ancestry from Scotland, maybe I could use that information. When I Cluster my AncestryDNA Matches at the 20cM Threshold (the lowest cM amount with Shared Matches to each other) I get about 160 Clusters. 1/8 of those is 20 Clusters – a manageable number. So when I see some solid looking Clusters without any hints of other ancestry, maybe they are from my Scotland line.

Here is one such Cluster. I clicked on the link for each Match and checked their ethnicity:

Every Match in this Cluster has 14% to 62% Scotland ethnicity. A few scattered Matches with Scotland ethnicity might be expected randomly, but for all of them to have significant amounts of Scotland ethnicity is a strong clue.
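
The same check can be written down as a tiny script once the ethnicity percentages have been copied out by hand. This is a sketch under assumptions: the function name, the 10% cut-off for "significant" and the sample figures are all invented for illustration, and AncestryDNA does not supply this data in this form.

```python
def scotland_summary(cluster, minimum=10.0):
    """cluster: {match_name: Scotland ethnicity percent} for one Cluster.

    `minimum` is an assumed cut-off for a 'significant' amount.
    """
    values = list(cluster.values())
    if not values:
        raise ValueError("cluster has no Matches")
    share = sum(v >= minimum for v in values) / len(values)
    return {
        "lowest": min(values),
        "highest": max(values),
        "share_at_or_above_minimum": share,
    }

# Made-up percentages for three Matches in one Cluster
example = {"Match 1": 14, "Match 2": 62, "Match 3": 30}
print(scotland_summary(example))
```

A share close to 1.0 across a whole Cluster is the pattern described above. A share near what scattered, random Scotland ethnicity would give is not much of a clue.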

I think I can safely assume this CL149/14/[Scotland…] Cluster represents my Ancestor, John CAMPBELL – Ahnentafel 14M. If I knew the DNA segment, I could Paint this Cluster. I have several others that also show a pretty clear Cluster “picture”. Next I’ll be looking at some other Clusters which may even have a ThruLines Common Ancestor in them, but also have a lot of Scotland ethnicity – the ThruLines CA may be the outlier… With only one ThruLines CA I don’t have a high confidence that it’s right. But with high concordance of Scottish ethnicity, that’s a strong clue the Cluster is on my CAMPBELL line.

The next step is studying any Trees in these Scotland Clusters to see if those Matches have some Common Ancestors among themselves… That will be the sweetest lemonade of all.


[22AU] Segment-ology: Using Ethnicity to Identify a Cluster TIDBIT by Jim Bartlett 20200612

Clusters at Brick Walls

A Segmentology TIDBIT

Finding Common Ancestors with Matches in a Cluster sometimes “stops” at a specific generation – for example at the 3xGreat grandparent [4C] level. In other words, I’ve found cousins up to that generation, but not beyond. When one of these 3xGreat grandparents is a Brick Wall (or an “iffy” Ancestor), that’s probably the reason. The Cluster really goes back farther, but I don’t recognize any Common Ancestor further back.

It’s time to research and take notes.

I see three courses of action:

  1. If a surname is known or suspected, look in the Cluster for Matches with Trees and search them for that surname. Often, when I find one, I can build the Match’s ancestry out from there – looking for a link to my line.
  2. If a surname is unknown, jot down each Match’s surnames and try to find a Common Ancestor among them. Then I build the family around that Ancestor – looking for a link to my line.
  3. Alternatively, look for a common place and time approximately where the Cluster stops. Noodle around for any likely links. Check other Matches in the Cluster for those same links.

I use the Shared Clustering program which shows me the Matches for each Cluster, Common Ancestors from ThruLines, the number of people in their Tree, my Notes, and a hyperlink back to their AncestryDNA Profile. For each Cluster it’s easy to see potential CAs, then click on Match links, and see the surnames in common or call up their Tree for a more in depth review. It goes pretty quickly.

The results of these courses of action have ranged from easy “low hanging fruit” to “Mission Impossible”. In other words – sometimes it works, sometimes it doesn’t. I try these alternatives because they work in enough cases to encourage me to try more. I hope they will help you.


[22AT] Segment-ology: Clusters at Brick Walls TIDBIT by Jim Bartlett 20200507

Are Overlapping Segments Triangulated?

This question comes up often. The answer is: we cannot tell from just the fact that two shared DNA segments overlap in a chromosome browser.  Here is the picture we see:

11D Figure 1 Browser

In this picture, you are normally A and you have two Matches, B and C, which show as overlapping on Chromosome 6. Because they overlap, is this Triangulation? Do A, B and C share the same Common Ancestor? We cannot tell from this picture.

Assuming the shared DNA segments are Identical By Descent (IBD) – generally true for all such shared segments over 15cM – there are two possibilities:

  1. They are on different Chromosome 06’s in A. Remember we have two of each Chromosome – one from our mother and one from our father.


11D Figure 2a Two Chr

In this case, we are (somehow) looking at just A’s two Chromosome 06’s and showing where the shared DNA segments are on A’s DNA. It looks just like the picture we saw in the browser – two overlapping DNA segments. But in this case A & B are sharing on A’s maternal Chromosome 06; and A & C are sharing on A’s paternal Chromosome 06. These two Chromosome 06’s are physically separate (think of two strands of spaghetti). Because A & B have a shared DNA segment, they have a Common Ancestor (CA) who passed that DNA down to them. Because we know in this example that it’s on the maternal Chromosome (the one from A’s mother), we know the CA is on A’s maternal side. Similarly, we know the CA with C is on A’s paternal side. Yes, there is a very unlikely chance that these two CA’s could be the same person, and the DNA segment came down two very different paths to A’s mother and father. I’ll not be sarcastic here – you can decide for yourself if you think that is possible (or what the probability is) in your case.* In general, in genetic genealogy, we conclude that B & C are probably not related to each other – at least not on this segment.

  2. Alternatively, the two shared segments are on the same Chromosome 06 in A – let’s say, for example, they are both on the maternal side (imagine the two bars below on one Chromosome).


11D Figure 3a One Chr

In this case, we are (somehow) looking at just A’s one maternal Chromosome 06, and showing where the shared DNA segments are. Again, it looks just like the picture we saw in the browser – two overlapping DNA segments. But in this case A & B and A & C are sharing on A’s maternal Chromosome 06 (they are both on the same strand of spaghetti). From the beginning of the A & C shared segment to the end of the A & B shared segment, we are looking at the exact same place on A’s Chromosome 06. For there to be a match, all the tested markers (SNPs) are the same. In general, in genetic genealogy, we take this to mean that this DNA came from the same Common Ancestor. It came from that CA down to A and to B and to C. Because both B and C share this same segment of DNA found on one Chromosome 06 in A, both B and C should themselves show up as a Match to each other. After all, they have the same DNA over this area of their own Chromosome 06.

You may have noticed that I started each explanation of the two possibilities with: “In this case, we are (somehow) looking at…” Well, we can’t look at just one chromosome in a browser and compare it to someone else’s DNA. We don’t have that technology for genealogy DNA testing. But if we could, that is what we would see (probably without the color coding). But we cannot! We can only visualize it. So what can we do?

We use reverse logic. In the first possibility, we noted that B & C wouldn’t match each other; and in the second possibility, we noted that B & C should match each other. That is information we often can determine (at 23andMe, MyHeritage and GEDmatch – and round-aboutly at FTDNA). So, we say that if A matches B, and A matches C on the same/overlapping DNA segment, AND B matches C there too, it indicates the second possibility above – the three of them share the same Common Ancestor. This case of A=B=C=A is called segment Triangulation, and the three Matches are in a Triangulated Group [TG]. There is more about Triangulation here.
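
The reverse logic can be written out as a small check. This is a minimal sketch, assuming each shared segment is known by chromosome, start and end position (the `Segment` class and the function names are made up here), and it ignores the segment size thresholds a real comparison would also apply.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    chromosome: int
    start: int  # start position on the chromosome
    end: int    # end position on the chromosome

def overlap(s1, s2):
    """Return the overlapping part of two segments, or None if there is none."""
    if s1.chromosome != s2.chromosome:
        return None
    start, end = max(s1.start, s2.start), min(s1.end, s2.end)
    return Segment(s1.chromosome, start, end) if start < end else None

def is_triangulated(ab, ac, bc_segments):
    """A=B and A=C overlap, AND B matches C in that same overlapping area."""
    shared = overlap(ab, ac)
    if shared is None:
        return False
    return any(overlap(shared, bc) is not None for bc in bc_segments)

# Made-up positions on Chromosome 6
ab = Segment(6, 10_000_000, 40_000_000)
ac = Segment(6, 25_000_000, 60_000_000)
print(is_triangulated(ab, ac, [Segment(6, 26_000_000, 39_000_000)]))  # True
print(is_triangulated(ab, ac, []))  # False: B and C do not match each other
```

The second call is the first possibility above: the two segments overlap in the browser, but with no B-to-C match there is nothing to say they sit on the same one of A's two chromosomes.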

In my case, I have close to 20,000 Match/segments – each shared DNA segment is in one of 372 TGs which cover all of my DNA. In other words, these 372 TGs form a segment map of my 45 Chromosomes. The objective now is to determine the Ancestors who passed these TGs down to my parents and then to me.

*If you want to check to see if you have the same segment from your mother and your father, upload your DNA to http://www.GEDmatch.com and use the “Are Your Parents Related” program. It will show you any such segments, which is good information to have in any case.


[11D] Segment-ology: Are Overlapping Segments Triangulated? by Jim Bartlett 20200414

Download Your AncestryDNA Matches in 10 Minutes!

A Segmentology TIDBIT


UPDATE: AncestryDNA has issued a cease and desist order, and this process is no longer available to download your Matches. Sorry about that.

The download includes: all your Matches, a hyperlink [to their Page as a Match to you], Shared cM, Shared Segments, Tree Type, Tree Size, Common Ancestors [per ThruLines], a tic for each Dot and Star, and your Notes! This fast download does NOT include your Shared Matches, which may take days to download.

Here’s the process:

  1. Before running this program, I set up a separate folder with today’s date [e.g. 20200409] for each download; the Shared Clustering program will give you a chance to select this folder and to rename the download file.
  2. Download the Shared Clustering program. See my review of this program here. The link to download this program is: https://github.com/jonathanbrecher/sharedclustering/wiki
  3. Click on Download TAB
  4. Enter your Ancestry user name and password [stored on your PC only]
  5. Click on Sign In
  6. Select your Test (if you have access to more than one)
  7. Click the button for Fast but incomplete
  8. Open Advanced options
  9. Lowest centimorgans to retrieve: 6 [this includes all of your Matches]
  10. Lowest centimorgans of shared matches: 4000 [this means don’t download any Shared Matches]
  11. Click on: Get DNA Matches


Here’s a picture of the message when the download is complete:

So 125,000+ Matches in 6 minutes – your results may vary.

After the download, Export the downloaded txt file to Excel. Click on the Export TAB, and follow the prompts to create an Excel file – takes about 4 min.

You can then use/manipulate the Excel file. You can sort on any field, and you can edit any Notes and then Upload those revisions back to AncestryDNA. I use this as an opportunity to do a Quality check of my Notes, and to ensure I have a Note for each Match with a ThruLines Common Ancestor. I find it’s much easier to edit Notes in the spreadsheet than to jump around to each Match at AncestryDNA. NB: Don’t edit Notes in AncestryDNA when you are also editing Notes in the spreadsheet. If you do any edits in AncestryDNA, you need to do a new Download (it only takes 10 minutes!)
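
For anyone who still has an export from before the cease and desist, the Quality check can also be scripted. The sketch below assumes the Excel file has been saved as CSV and that the columns are headed "Name", "Common Ancestors" and "Notes". Those column names are guesses for illustration, not the program's documented headings, so adjust them to whatever your file shows.

```python
import csv
import io

def missing_notes(rows, ca_column="Common Ancestors", note_column="Notes"):
    """Return the rows that list a ThruLines Common Ancestor but have an empty Note.

    The default column names are assumptions; pass the headings used in your file.
    """
    return [
        row for row in rows
        if (row.get(ca_column) or "").strip() and not (row.get(note_column) or "").strip()
    ]

# Made-up sample standing in for a saved CSV file
sample = io.StringIO(
    "Name,Common Ancestors,Notes\n"
    "Match 1,Sylvester WELCH,A0140P/6C: WELCH/SPENCE\n"
    "Match 2,Sylvester WELCH,\n"
    "Match 3,,\n"
)
for row in missing_notes(csv.DictReader(sample)):
    print(row["Name"])  # Match 2
```

With a real file, replace the `io.StringIO` sample with `open("your_file.csv", newline="", encoding="utf-8")`.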


[22AS] Segment-ology: Download Your AncestryDNA Matches in 10 Minutes! TIDBIT by Jim Bartlett 20200409 EDITED 20200808

AncestryDNA ThruLines Missing Out

A Segment-ology TIDBIT

ThruLines is based on genealogy – it finds Common Ancestors based on your Tree and the Trees of others. However, it only reports Common Ancestors with your DNA Matches. So, in a sense it has a DNA component. But the connections TL finds are not based on shared DNA cMs, Chromosome location, segment Triangulation, Clustering or Shared Matching – it is based only on connections found through Trees (only on genealogy). And ThruLines only reports Common Ancestors with your DNA Matches.

This is a two-edged sword:

  1. If you only want to work with DNA Matches, it’s a good thing.
  2. However, if you are a genealogist looking for cousins who might share records, pictures, stories, analysis, new branches, etc., it leaves something out. Remember that roughly half of our 4th cousins (4C) don’t share DNA with us, and roughly 90% of our true 5C don’t share DNA with us, and the vast majority of our more distant true cousins don’t share any DNA with us. This means that, although a program like ThruLines could find those non-DNA-sharing cousins for us, it doesn’t. Think of all that we are missing – think of all the lost opportunities.

Well… looking back on the #1 cutting edge of the sword – I’ve got to be a happy camper. I’m finding more ThruLines Matches than I can keep up with. By adding children and grandchildren of my Ancestors in my Tree, ThruLines is finding more Matches with Common Ancestors. And these Matches and their Trees are reinforcing my Tree (and pointing out a few soft spots…)

Back to work… Stay safe!

[22AR] Segment-ology: AncestryDNA ThruLines Missing Out – TIDBIT by Jim Bartlett 20200326

In Defense of Small Segments

Do you remember genealogy before atDNA? Pre-2010?

There was a time when we didn’t know about atDNA segments. We researched records, and looked at other people’s Trees/records. We developed our Trees and found cousins, without any knowledge of whether we shared any DNA or not.

So what’s changed?

We got a great new tool called atDNA that told us who we “matched” based on one or more shared atDNA segments. Each company developed an algorithm and reported Matches based on at least 6-8cM of matching DNA. The concept was that a person who shared a DNA segment of at least the minimum “threshold” size was probably related. Early on we learned that a shared DNA segment of at least 15cM was “always” a true match – it was Identical By Descent (IBD); and those IBD shared segments came from a Common Ancestor (CA) to us and our Matches. We also learned that from the company threshold (6-8cM) up to 15cM, some of the shared segments were false – the lower the cM, the more likely that the shared segment was false. Generally, about half of the 7cM shared segments were true and half were false; 6cM shared segments were false most of the time, and 8-15cM shared segments were true most of the time – we just couldn’t tell which were true and which were false. Some of the companies had other ways to improve the probabilities, but many of the experts admonished us to generally avoid using the segments below 15cM. A huge debate grew up about the use of 6-15cM shared DNA segments.

To get some data on shared DNA segments, Blaine Bettinger developed the Shared cM Project which showed our collective experience in finding cousins with various amounts of shared cM. His chart is in this article. The Shared cM Project showed that many had found 3rd cousins (3C) to 6C with ranges of cMs down to the threshold amounts. And at the testing companies and GEDmatch, we were finding 3C to 8C with shared segments in the 6-15cM range. AncestryDNA reported Circles (with CAs) out to 8C. The genetic genealogy community was finding cousins with these small shared segments – we just didn’t know if the DNA segments were true or false.

We also heard about scientific studies that showed that most of the IBD (true) shared segments in the 5 to 20cM range were from ancestors greater than 10 generations back – at least 8xG grandparents (or 9C level). This is usually beyond a genealogy time frame for many of us. For instance, see the Speed and Balding chart in this article. But even this data showed that within the 5 to 20cM range there were some 3C to 8C.

However, we continue to be admonished to avoid, or discard, Matches in this 6-15cM range. Such small segments were branded as “suspicious”, “dangerous”, “poison”, “a fool’s errand”, etc.

I don’t deny that some of the 6-15cM shared segments are false, and that many of them are beyond a genealogical time frame. But on the other hand, some of them are true and within a genealogical time frame. I’m unwilling to discard all of them, because some of them are false or too distant. As I will show below, many of my Matches with these small segments are very useful.

What’s at stake?

So, before we adopt a hard rule one way or the other, let’s look at small segments from a different viewpoint. At AncestryDNA, I have 120,000 Matches. Their ThruLines (TL) program has identified over 2,000 Matches who share a specific CA with me. The shared DNA segments range from 208cM down to 6cM; and from 2C to 6C. In fact about 2/3 of these TL CAs are with Matches who share 6-15cM segments with me. Based on my 45 years of true genealogy research, I’ve determined that only about 5% of these TL Matches are incorrect (the Matches and I may still be cousins somehow, but not on the CA identified by TL). So… over 1,900 of these TL Match cousins and CAs are “keepers”. I don’t want to throw away 2/3 of these easily identified CAs.

This genealogy analysis had nothing to do with the size of shared DNA segments. I believe these 1,900 people (Matches) are my true cousins – even if we didn’t share any DNA! As a genealogist, I’m a happy camper.  I very much want to share records, stories, pictures, research, and other descendants, or maybe test a Y-DNA or mtDNA line, with each of these new-found cousins. Even if I could eventually determine that our shared DNA segment was false, this person is still a cousin.

Most of our true Cousins won’t be DNA Matches

Over half of our 4C wouldn’t show up as a DNA Match; only about 10% of our 5C would show up as a Match; and only a very small fraction of our deeper cousins will show up as Matches. So when someone does show up as a DNA Match (at any level), and there is a valid paper trail showing they are an 8C – why not accept that? At least accept the 8C part, if not the DNA link. Later, in Triangulated Groups or Clusters, we’ll see if that person “groups” with others on the same line. This would indicate to me that the genealogy was true.

Between 1974 (when I started researching genealogy in earnest) and 2010 (when atDNA testing became available), I found many cousins with no knowledge of any shared DNA. Some of them probably shared DNA with me, but most did not. But they were all my cousins.

I hope I’ve made two key points so far: 1. atDNA is just a tool we’ve used over the past 10 years – it’s not our master; and 2. atDNA does not find everything in genealogy – we have many cousins, and indeed many Ancestors, we will never find with shared atDNA.  Ponder these points for a moment….

So back to small shared segment (6-15cM) Matches – are they worth it? Well, as discussed above, of course they are!

Are they useful in Genetic Genealogy – beyond just as cousins? I think the answer is that they often are useful… Let’s look at a few situations.

ThruLines and Clusters

Suppose, using ThruLines at AncestryDNA, you found 20 Matches in the 6-15cM range, who were all cousins (3C to 6C) on a line back to a 5xG grandparent couple. [NB: I have 64 5xG grandparent couples, and over 2,000 TL Matches – an average of 30 TL Matches (with a CA) per couple, so 20 TL Matches is a reasonable number]. At AncestryDNA we don’t have shared segment info for Triangulation, but we can do Clustering. Let’s Cluster on a 6cM threshold (all my 120,000 Matches, including the 1,900 good TL Matches). If the above 20 6-15cM Matches were sprinkled all over the Matrix (in different Clusters) – then nothing special. But if 11 Matches (of the 20) are in one Cluster, and 6 are in another Cluster, I’d sit up and take notice! There is nothing “random” about that. Clusters are formed on Common Ancestors, so we’d expect to see most of these 20 TL Matches in a Cluster, or two, or three. I have mostly Colonial Virginia ancestry, and some of my Matches have multiple CAs – so some of the 20 TL Matches may well wind up in a different Cluster. But, whenever your Matches form a strong* group (Cluster, Triangulated Group, DNA Painting, etc), they are very likely to have the same CA and share IBD segments. At least this is a good hypothesis.  At this point I am not claiming a “proof”, but I am claiming a lot of evidence that points in one direction. [*strong group does not mean 2 or 3 Matches in a Cluster; nor 10 Matches in a Cluster, but each one only matches 2 or 3 others. A strong group would be 10 Matches in a Cluster with each one matching about 8 of the others. Use judgment here.]
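
The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Match names, the Cluster labels, and the 80% cutoff used for a "strong" group are hypothetical, chosen to mirror the 11-and-6 example and the "10 Matches, each matching about 8 of the others" guideline. AncestryDNA does not export data in this form, so the inputs would have to be assembled by hand from Shared Match lists.

```python
from collections import Counter

def cluster_counts(tl_matches, cluster_of):
    """Count how many ThruLines Matches for one CA land in each Cluster."""
    return Counter(cluster_of[m] for m in tl_matches if m in cluster_of)

def is_strong_group(members, shared_with, min_size=10, min_share=0.8):
    """True if the group is large enough and every member matches most others.

    min_size and min_share are assumed cutoffs, not the author's exact rule.
    """
    members = set(members)
    if len(members) < min_size:
        return False
    others = len(members) - 1
    return all(
        len(shared_with.get(m, set()) & (members - {m})) >= min_share * others
        for m in members
    )

# Hypothetical data: 20 ThruLines Matches on one 5xG grandparent couple.
tl_matches = [f"M{i:02d}" for i in range(20)]
cluster_of = {}
for i, m in enumerate(tl_matches):
    if i < 11:
        cluster_of[m] = "Cluster 7"
    elif i < 17:
        cluster_of[m] = "Cluster 12"
    else:
        cluster_of[m] = f"Cluster {30 + i}"  # the stragglers

counts = cluster_counts(tl_matches, cluster_of)
print(counts.most_common(2))  # [('Cluster 7', 11), ('Cluster 12', 6)]

# Ten Matches where each one matches 8 of the other 9 (dense),
# versus ten Matches where each one matches only 2 others (sparse).
ten = tl_matches[:10]
dense = {m: set(ten) - {m, ten[(i + 5) % 10]} for i, m in enumerate(ten)}
sparse = {m: {ten[(i - 1) % 10], ten[(i + 1) % 10]} for i, m in enumerate(ten)}
print(is_strong_group(ten, dense), is_strong_group(ten, sparse))  # True False
```

As in the text, a concentrated count is evidence for a hypothesis, not a proof, and the cutoffs still call for judgment.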

Finding more CAs in Clusters

In the big picture, all of our Matches can be divided into two groups – those with true shared DNA segments, and those with false DNA segments. I believe most of my 120,000 Matches at AncestryDNA have true shared DNA segments with me (although as outlined above, I don’t really care if some are not DNA cousins: since AncestryDNA doesn’t show shared DNA segment info, I cannot Triangulate them anyway). Therefore, if Clustering groups these Matches, I have every reason to believe they are valid when they point to the same ancestral line. And if some of those Clustered Matches (with a CA on the same ancestral line) have small (6-15cM) shared DNA segments with me – so what? It’s close enough for a second look. Recently I’ve been looking through my Clusters for Matches who have over 1,000 people in a Public Tree. Most of my Clusters have a hypothetical Ancestor in them, so I look for surnames in that line.  Sometimes, I find a clue and I’m able to build the Match’s Tree out to connect with my line. This adds even further evidence that this Cluster is based on that line.

Genealogy vs Genetic Genealogy

Another aspect of this whole discussion is genealogy vs. genetic genealogy. If you are just interested in genealogy, it doesn’t matter what the size of the shared DNA segment is. In fact, while looking at Hints, I run across a lot of helpful Trees, where the owners are not DNA Matches at all. Only in certain circumstances (Chromosome Mapping; bio parents/Ancestry; “proof” where genealogy records are insufficient; etc.), do you need to ensure a shared DNA segment is true (IBD) and cannot be from a different Ancestor. So, unless you need to “prove” a genetic link, don’t worry about the size of the shared DNA segment. There is a lot to learn from many of your DNA Matches, even those with small segments, and even from other people (with no DNA match) at Ancestry.

Breaking through a Brick Wall

Even breaking through a brick wall is primarily a genealogy exercise. To be clear, this process is often aided by starting with a group of DNA Matches (Painted, Clustered and/or Triangulated), and looking for Matches with Trees that have Common Ancestors among themselves – beyond where your brick wall is. In these cases you are using DNA Matches who are probably related to you and who group with other Matches. You use this cadre of Matches to find a CA among them. This is basically a genealogy exercise – and, again, it doesn’t make a lot of difference how much shared DNA you have. In fact, to find a CA beyond your brick wall, you are probably looking for a distant CA – often found with smaller DNA segments. So don’t discard those Matches with small segments who have a Common Ancestor with you – use them.

Use caution with isolated small segments

My discussions above about using small segments are in the context of clues and grouping (Painting, Clustering, Triangulating, etc). IMO, it is reckless, and wrong, to find a 9C Match, sharing 10cM, and declare that “proves” the Ancestral line by itself. Such a “find” is one clue (and by itself, a very shaky one), and much more corroborating evidence is needed even to form a hypothesis. The “rule-of-thumb” I’ve been using is to have at least G independent Matches (at least cousins to each other) who all agree on the same Common Ancestor – where G is the number of Gs of the Common Ancestor. At the 7xG grandparent level (9 generations back – 8th cousin level) this means 8 Matches in agreement.  It’s relatively easy to get that many Matches in a Cluster or Triangulated Group – it’s much harder to find Common Ancestors with each of them. So be sure to include those CAs from Matches who share small DNA segments with you!
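
The rule-of-thumb can be written down as a small Python sketch. It is one reading of the example in the text, not a definitive formula: it assumes that "the number of Gs" counts the G in "Grandparent" as well, which is what makes a 7xG grandparent call for 8 Matches. The function names are hypothetical, and the code cannot check whether the Matches are truly independent of each other. That part still has to be judged from their Trees.

```python
def generations_back(greats):
    """Generations from me to an n-times-great grandparent (parents = 1)."""
    return greats + 2

def cousin_level(greats):
    """Cousinship of a Match whose line joins mine at that Ancestor."""
    return greats + 1

def required_matches(greats):
    """Minimum independent Matches who should agree on that Ancestor.

    Assumption: count every G, including the one in 'Grandparent',
    so a 7xG grandparent needs 8 Matches, as in the text.
    """
    return greats + 1

def meets_rule(greats, independent_matches):
    """True if the number of agreeing Matches satisfies the rule-of-thumb."""
    return independent_matches >= required_matches(greats)

# The example from the text: a 7xG grandparent.
print(generations_back(7), cousin_level(7), required_matches(7))  # 9 8 8
print(meets_rule(7, 5), meets_rule(7, 8))  # False True
```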

Bottom Line

Use as many of your DNA Matches as you can, to learn more about your own genealogy. IMO, Matches with small shared DNA segments often provide the clues and evidence you are looking for. But use extreme caution with small shared DNA segments in isolation – they are much more credible when they are part of a group. Small segments in context and groups can be very helpful.


[06C] Segment-ology: In Defense of Small Segments by Jim Bartlett 20200131

20200202: Edited 10th paragraph to change “DNA segments” to “genealogy”

How Many TGs From Distant Ancestors?

I was recently asked if I’d thought about this question. The quick answer is YES – the answer to this question is at the core of my belief that genetic genealogy is valid out to 9 generations back. And I think this question is really two questions: one about the Triangulated Groups (TGs) themselves; and one about the Matches with shared DNA segments within each TG.

How far back do our TGs go?

Using a 7cM threshold for shared DNA segments, I’ve documented 372 TGs, covering over 98% of my DNA. These TGs have natural breaks [recombination crossover points] between them. These TGs represent actual DNA segments, on my chromosomes, which are from my Ancestors down to a parent to me.  So how far back do they probably go?

The number of segments we have at each generation of our ancestors is fairly easy to estimate. Using a female to make it easier, she gets 46 segments from her two parents – in the form of 46 chromosomes. Pretty big segments…  Using the average recombination rate of 34 crossovers per genome (per parent), she would get 68 additional segments one generation back. In other words she would have a total of 46+68=114 segments from her grandparents. And she would get 114+68=182 segments from her Great grandparents.  Here is a handy table I made up for my reference:

This table starts with me at the bottom and shows the generations back, the number of Ancestors at each generation back, the generic name of those Ancestors, the relationship of my cousins who share a Common Ancestor with me at that level, the calculated percentage and cM amount of DNA I got from each of those Ancestors (at any given number of generations back), the calculated average number of segments in my DNA from all the Ancestors in any given generation, the average cMs per TG; and in the last two columns the average and range of cMs collected in Blaine’s cM study. The first column is just for a very rough estimate of the birth year of my Ancestors at any given generation (it helps me).
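
The running totals quoted above (46, then 114, then 182) follow one simple pattern, sketched here in Python. The constant and function names are hypothetical, and the average of 68 crossovers per round is taken from the text as a fixed number, although real recombination varies from person to person. The step index only counts rounds of recombination added after the first 46 segments; which named generation each total belongs to is left to the table.

```python
CHROMOSOMES = 46           # segments received from the two parents
CROSSOVERS_PER_ROUND = 68  # 34 per parent, the average used in the text

def segments_after(extra_rounds):
    """Estimated segment count after a number of added recombination rounds."""
    return CHROMOSOMES + CROSSOVERS_PER_ROUND * extra_rounds

totals = [segments_after(r) for r in range(7)]
print(totals)  # [46, 114, 182, 250, 318, 386, 454]
```

Each step adds the same 68 segments, so the totals grow in a straight line while the number of Ancestors doubles every generation.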

Highlighted in yellow is the 386 segments expected (roughly) from my 3xG grandparents. That’s roughly the same as my 372 TGs. So I expect some kind of distribution curve around that point. Matches who share the full DNA segment represented by a TG would probably be 4th cousins (4C). Due to the random nature of DNA, I expect a range from 2C to 7C or 8C. My TGs range in size from a few just over 7cM to some around 50cM – it all depends on several variables.

Another aspect of this discussion has to do with what I call “sticky” segments. Per the Table above at 5 generations back we would see 386 segments – or 386 TGs – of about 18cM each. But going back one more generation – one more round of 68 crossover points would result in 454 segments. This means that 64 of the 386 segments were subdivided, and 322 segments were not! This means that 322 segments (TGs) were passed down intact (no recombination). The effect of this is that many TGs will persist, at the same size, for several generations. We could well see the same size TG from a 6xG grandparent to a 5xG to a 4xG to a 3xG grandparent. So it would be possible for a 7C, 6C, 5C and 4C to all share the full size DNA segment represented by the TG. Clearly the probabilities of that decrease as the cousinship increases.
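
A rough way to sanity-check the "sticky" segment idea is to ask how many of 386 segments would be missed entirely by 68 crossovers. The Python sketch below uses a deliberately simplified model that is an assumption and not the author's method: all segments are treated as equal in size, and each crossover lands on a segment at random, independently of the others. Real crossovers depend on chromosome length and position.

```python
def expected_intact(segments, crossovers):
    """Expected number of segments hit by no crossover at all.

    Simplified model: each crossover falls on one of the equal-sized
    segments uniformly at random, independently of the others.
    """
    return segments * (1 - 1 / segments) ** crossovers

intact = expected_intact(386, 68)
subdivided = 386 - intact
print(f"intact: {intact:.0f}, subdivided: {subdivided:.0f}")
```

Under these assumptions the result is roughly 324 intact and 62 subdivided, close to the 322 and 64 given above. Some segments are hit more than once, which is why fewer than 68 are subdivided.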

Bottom line from my experience: I think we’ll find most of our TGs to be within a genealogical time frame of, say, 9 or 10 generations. And there is always the opportunity for closer cousins to share a DNA segment within any of our TGs.

How far back do the Matches go?

This is a different, but related, question. The above discussion was all about the full DNA segment represented by a TG. Most of our Matches in a TG will not share the full DNA segment. They overlap us or are wholly included within the TG segment. For example, the Matches in a 20cM TG can range from sharing 7cM up to 20 cM. And, in fact, some of our closer cousins may share 35cM and span across more than one TG. It’s very random. However, to the point of the question – many of our Matches who share, say, 7 to 15cM may well be cousins beyond the Ancestors who passed down the full TG. To be sure, the Common Ancestors in this case would be ancestral to the TG Ancestor, but it could be 10, 20, or more generations back.
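
The three situations described above (a Match wholly inside the TG, a Match sharing the full TG segment, and a closer cousin spanning beyond it) can be sketched as a simple interval comparison in Python. The positions are hypothetical cM coordinates along one chromosome, and the function name and labels are made up for illustration. The sketch assumes the Match has already been Triangulated into the TG; overlap alone does not establish that.

```python
def position_in_tg(match_start, match_end, tg_start, tg_end):
    """Describe where a Match's shared segment sits relative to a TG segment."""
    if match_end <= tg_start or match_start >= tg_end:
        return "outside"
    if match_start == tg_start and match_end == tg_end:
        return "full TG"
    if match_start >= tg_start and match_end <= tg_end:
        return "within TG"
    return "extends beyond TG"

# Hypothetical 20cM TG running from position 40 to 60.
tg = (40, 60)
print(position_in_tg(45, 52, *tg))  # 7cM inside the TG   -> within TG
print(position_in_tg(40, 60, *tg))  # the whole 20cM      -> full TG
print(position_in_tg(30, 65, *tg))  # 35cM, closer cousin -> extends beyond TG
```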

Bottom line: Matches in a TG are limited to a narrow range of your Ancestors, but they are not limited by how close or how distant they could be. And Matches who share small segments may well be beyond a genealogical timeframe; but some will be within a genealogical timeframe. Witness the Ancestry ThruLines Common Ancestors down to 6cM.

Summary: I think most TGs will be within a genealogical timeframe (using a 7cM threshold for shared DNA segments). The Matches in a TG will range from close Matches, out to Matches on the fringes of our genealogy and on out to Matches who will be beyond our genealogy.


[19H] Segment-ology: How Many TGs From Distant Ancestors? By Jim Bartlett 20191217