In Defense of Small Segments

Do you remember genealogy before atDNA? Pre-2010?

There was a time when we didn’t know about atDNA segments. We researched records, and looked at other people’s Trees/records. We developed our Trees and found cousins, without any knowledge of whether we shared any DNA or not.

So what’s changed?

We got a great new tool called atDNA that told us who we “matched” based on one or more shared atDNA segments. Each company developed an algorithm and reported Matches based on at least 6-8cM of matching DNA. The concept was that a person who shared a DNA segment of at least the minimum “threshold” size was probably related. Early on we learned that a shared DNA segment of at least 15cM was “always” a true match – it was Identical By Descent (IBD); and those IBD shared segments came from a Common Ancestor (CA) to us and our Matches. We also learned that from the company threshold (6-8cM) up to 15cM, some of the shared segments were false – the lower the cM, the more likely that the shared segment was false. Generally, about half of the 7cM shared segments were true and half were false; 6cM shared segments were false most of the time, and 8-15cM shared segments were true most of the time – we just couldn’t tell which were true and which were false. Some of the companies had other ways to improve the probabilities, but many of the experts admonished us to generally avoid using the segments below 15cM. A huge debate grew up about the use of 6-15cM shared DNA segments.

To get some data on shared DNA segments, Blaine Bettinger developed the Shared cM Project which showed our collective experience in finding cousins with various amounts of shared cM. His chart is in this article. The Shared cM Project showed that many had found 3rd cousins (3C) to 6C with ranges of cMs down to the threshold amounts. And at the testing companies and GEDmatch, we were finding 3C to 8C with shared segments in the 6-15cM range. AncestryDNA reported Circles (with CAs) out to 8C. The genetic genealogy community was finding cousins with these small shared segments – we just didn’t know if the DNA segments were true or false.

We also heard about scientific studies that showed that most of the IBD (true) shared segments in the 5 to 20cM range were from ancestors greater than 10 generations back – at least 8xG grandparents (or 9C level). This is usually beyond a genealogy time frame for many of us. For instance, see the Speed and Balding chart in this article. But even this data showed that within the 5 to 20cM range there were some 3C to 8C.

However, we continue to be admonished to avoid, or discard, Matches in this 6-15cM range. Such small segments were branded as “suspicious”, “dangerous”, “poison”, “a fool’s errand”, etc.

I don’t deny that some of the 6-15cM shared segments are false, and that many of them are beyond a genealogical time frame. But on the other hand, some of them are true and within a genealogical time frame. I’m unwilling to discard all of them just because some of them are false or too distant. As I will show below, many of my Matches with these small segments are very useful.

What’s at stake?

So, before we adopt a hard rule one way or the other, let’s look at small segments from a different viewpoint. At AncestryDNA, I have 120,000 Matches. Their ThruLines (TL) program has identified over 2,000 Matches who share a specific CA with me. The shared DNA segments range from 208cM down to 6cM, and the relationships from 2C to 6C. In fact about 2/3 of these TL CAs are with Matches who share 6-15cM segments with me. Based on my 45 years of true genealogy research, I’ve determined that only about 5% of these TL Matches are incorrect (the Matches and I may still be cousins somehow, but not on the CA identified by TL). So… over 1,900 of these TL Match cousins and CAs are “keepers”. I don’t want to throw away 2/3 of these easily identified CAs.

This genealogy analysis had nothing to do with the size of shared DNA segments. I believe these 1,900 people (Matches) are my true cousins – even if we didn’t share any DNA! As a genealogist, I’m a happy camper.  I very much want to share records, stories, pictures, research, and other descendants, or maybe test a Y-DNA or mtDNA line, with each of these new-found cousins. Even if I could eventually determine that our shared DNA segment was false, this person is still a cousin.

Most of our true Cousins won’t be DNA Matches

Over half of our 4C wouldn’t show up as a DNA Match; only about 10% of our 5C would show up as a Match; and only a very small fraction of our deeper cousins will show up as Matches. So when someone does show up as a DNA Match (at any level), and there is a valid paper trail showing they are an 8C – why not accept that? At least accept the 8C part, if not the DNA link. Later, in Triangulated Groups or Clusters, we’ll see if that person “groups” with others on the same line. This would indicate to me that the genealogy was true.

Between 1974 (when I started researching genealogy in earnest) and 2010 (when atDNA testing became available), I found many cousins with no knowledge of any shared DNA. Some of them probably shared DNA with me, but most did not. But they were all my cousins.

I hope I’ve made two key points so far: 1. atDNA is just a tool we’ve used over the past 10 years – it’s not our master; and 2. atDNA does not find everything in genealogy – we have many cousins, and indeed many Ancestors, we will never find with shared atDNA.  Ponder these points for a moment….

So back to small shared segment (6-15cM) Matches – are they worth it? Well as discussed above, of course they are!

Are they useful in Genetic Genealogy – beyond just as cousins? I think the answer is that they often are… Let’s look at a few situations.

ThruLines and Clusters

Suppose, using ThruLines at AncestryDNA, you found 20 Matches in the 6-15cM range, who were all cousins (3C to 6C) on a line back to a 5xG grandparent couple. [NB: I have 64 5xG grandparent couples, and over 2,000 TL Matches – an average of 30 TL Matches (with a CA) per couple, so 20 TL Matches is a reasonable number]. At AncestryDNA we don’t have shared segment info for Triangulation, but we can do Clustering. Let’s Cluster on a 6cM threshold (all my 120,000 Matches, including the 1,900 good TL Matches). If the above 20 6-15cM Matches were sprinkled all over the Matrix (in different Clusters) – then nothing special. But if 11 Matches (of the 20) are in one Cluster, and 6 are in another Cluster, I’d sit up and take notice! There is nothing “random” about that. Clusters are formed on Common Ancestors, so we’d expect to see most of these 20 TL Matches in a Cluster, or two, or three. I have mostly Colonial Virginia ancestry, and some of my Matches have multiple CAs – so some of the 20 TL Matches may well wind up in a different Cluster. But, whenever your Matches form a strong* group (Cluster, Triangulated Group, DNA Painting, etc), they are very likely to have the same CA and share IBD segments. At least this is a good hypothesis.  At this point I am not claiming a “proof”, but I am claiming a lot of evidence that points in one direction. [*strong group does not mean 2 or 3 Matches in a Cluster; nor 10 Matches in a Cluster, but each one only matches 2 or 3 others. A strong group would be 10 Matches in a Cluster with each one matching about 8 of the others. Use judgment here.]
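As a rough illustration of the “strong group” test described above, here is a minimal Python sketch. The function name, the data layout (each Match mapped to the set of other Matches it shares DNA with), and the thresholds of 10 Matches and 80% of the others are assumptions made for illustration, loosely based on the “10 Matches, each matching about 8 of the others” example; they are not part of any clustering tool, and judgment still applies.

```python
def is_strong_cluster(shared, min_size=10, min_fraction=0.8):
    """Return True if the cluster looks like a 'strong' group.

    shared: dict mapping each Match in the cluster to the set of
            Matches it also matches (hypothetical layout).
    min_size and min_fraction are illustrative assumptions only.
    """
    members = set(shared)
    if len(members) < min_size:
        return False  # 2 or 3 Matches do not make a strong group
    needed = min_fraction * (len(members) - 1)
    # Every member must match most of the other members of the cluster.
    return all(len(shared[m] & (members - {m})) >= needed for m in members)


names = [f"M{i}" for i in range(10)]

# Dense cluster: each Match matches 8 of the other 9 members.
dense = {
    n: {m for j, m in enumerate(names) if j != i and j != (i + 5) % 10}
    for i, n in enumerate(names)
}

# Sparse cluster: 10 Matches, but each one only matches 2 others.
sparse = {
    n: {names[(i - 1) % 10], names[(i + 1) % 10]}
    for i, n in enumerate(names)
}

print(is_strong_cluster(dense))   # True
print(is_strong_cluster(sparse))  # False
```

Requiring every member to be well connected is a deliberately strict choice; a looser version might allow one or two weakly connected Matches.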

Finding more CAs in Clusters

In the big picture, all of our Matches can be divided into two groups – those with true shared DNA segments, and those with false DNA segments. I believe most of my 120,000 Matches at AncestryDNA have true shared DNA segments with me (although as outlined above, I don’t really care if some are not DNA cousins: since AncestryDNA doesn’t show shared DNA segment info, I cannot Triangulate them anyway). Therefore, if Clustering groups these Matches, I have every reason to believe they are valid when they point to the same ancestral line. And if some of those Clustered Matches (with a CA on the same ancestral line) have small (6-15cM) shared DNA segments with me – so what? It’s close enough for a second look. Recently I’ve been looking through my Clusters for Matches who have over 1,000 people in a Public Tree. Most of my Clusters have a hypothetical Ancestor in them, so I look for surnames in that line.  Sometimes, I find a clue and I’m able to build the Match’s Tree out to connect with my line. This adds even further evidence that this Cluster is based on that line.

Genealogy vs Genetic Genealogy

Another aspect of this whole discussion is genealogy vs. genetic genealogy. If you are just interested in genealogy, it doesn’t matter what the size of the shared DNA segment is. In fact, while looking at Hints, I run across a lot of helpful Trees, where the owners are not DNA Matches at all. Only in certain circumstances (Chromosome Mapping; bio parents/Ancestry; “proof” where genealogy records are insufficient; etc.), do you need to ensure a shared DNA segment is true (IBD) and cannot be from a different Ancestor. So, unless you need to “prove” a genetic link, don’t worry about the size of the shared DNA segment. There is a lot to learn from many of your DNA Matches, even those with small segments, and even from other people (with no DNA match) at Ancestry.

Breaking through a Brick Wall

Even breaking through a brick wall is primarily a genealogy exercise. To be clear, this process is often aided by starting with a group of DNA Matches (Painted, Clustered and/or Triangulated), and looking for Matches with Trees that have Common Ancestors among themselves – beyond where your brick wall is. In these cases you are using DNA Matches who are probably related to you and who group with other Matches. You use this cadre of Matches to find a CA among them. This is basically a genealogy exercise – and, again, it doesn’t make a lot of difference how much shared DNA you have. In fact, to find a CA beyond your brick wall, you are probably looking for a distant CA – often found with smaller DNA segments. So don’t discard those Matches with small segments who have a Common Ancestor with you – use them.

Use caution with isolated small segments

My discussion above about using small segments is in the context of clues and grouping (Painting, Clustering, Triangulating, etc). IMO, it is reckless, and wrong, to find a 9C Match, sharing 10cM, and declare that “proves” the Ancestral line by itself. Such a “find” is one clue (and by itself, a very shaky one), and much more corroborating evidence is needed even to form a hypothesis. The “rule-of-thumb” I’ve been using is to have at least G independent Matches (at least cousins to each other) who all agree on the same Common Ancestor – where G is the number of Gs of the Common Ancestor. At the 7xG grandparent level (9 generations back – 8th cousin level) this means 8 Matches in agreement.  It’s relatively easy to get that many Matches in a Cluster or Triangulated Group – it’s much harder to find Common Ancestors with each of them. So be sure to include those CAs from Matches who share small DNA segments with you!
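The rule-of-thumb arithmetic can be sketched in a few lines of Python. This assumes that “the number of Gs” counts the “greats” plus the “grand” in the ancestor’s title, which is the reading that gives 8 Matches at the 7xG grandparent level; the function names are hypothetical and only illustrate the counting.

```python
def count_gs(greats):
    """Number of Gs in 'NxG grandparent': N greats plus the 'grand' (assumed reading)."""
    return greats + 1


def generations_back(greats):
    """Parents are 1 generation back, grandparents 2, so NxG grandparents are N + 2."""
    return greats + 2


def cousin_level(greats):
    """Descendants of NxG grandparents in your generation are (N + 1)th cousins."""
    return greats + 1


def enough_agreeing_matches(greats, agreeing_matches):
    """Rule of thumb: at least G independent Matches agree on the same CA."""
    return agreeing_matches >= count_gs(greats)


# The 7xG grandparent example from the text:
print(count_gs(7))                    # 8 Matches needed
print(generations_back(7))            # 9 generations back
print(cousin_level(7))                # 8th cousin level
print(enough_agreeing_matches(7, 5))  # False: 5 Matches is not enough
```

The check only counts Matches; whether they are truly independent (at least cousins to each other) still has to be judged from their Trees.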

Bottom Line

Use as many of your DNA Matches as you can, to learn more about your own genealogy. IMO, Matches with small shared DNA segments often provide the clues and evidence you are looking for. But use extreme caution with small shared DNA segments in isolation – they are much more credible when they are part of a group. Small segments in context and groups can be very helpful.


[06C] Segment-ology: In Defense of Small Segments by Jim Bartlett 20200131

20200202: Edited paragraph 10 to change “DNA segments” to “genealogy”