In Defense of Small Segments

Do you remember genealogy before atDNA? Pre-2010?

There was a time when we didn’t know about atDNA segments. We researched records, and looked at other people’s Trees/records. We developed our Trees and found cousins, without any knowledge of whether we shared any DNA or not.

So what’s changed?

We got a great new tool called atDNA that told us who we “matched” based on one or more shared atDNA segments. Each company developed an algorithm and reported Matches based on at least 6-8cM of matching DNA. The concept was that a person who shared a DNA segment of at least the minimum “threshold” size was probably related. Early on we learned that a shared DNA segment of at least 15cM was “always” a true match – it was Identical By Descent (IBD); and those IBD shared segments came from a Common Ancestor (CA) to us and our Matches. We also learned that from the company threshold (6-8cM) up to 15cM, some of the shared segments were false – the lower the cM, the more likely that the shared segment was false. Generally, about half of the 7cM shared segments were true and half were false; 6cM shared segments were false most of the time, and 8-15cM shared segments were true most of the time – we just couldn’t tell which were true and which were false. Some of the companies had other ways to improve the probabilities, but many of the experts admonished us to generally avoid using the segments below 15cM. A huge debate grew up about the use of 6-15cM shared DNA segments.

To get some data on shared DNA segments, Blaine Bettinger developed the Shared cM Project which showed our collective experience in finding cousins with various amounts of shared cM. His chart is in this article. The Shared cM Project showed that many had found 3rd cousins (3C) to 6C with ranges of cMs down to the threshold amounts. And at the testing companies and GEDmatch, we were finding 3C to 8C with shared segments in the 6-15cM range. AncestryDNA reported Circles (with CAs) out to 8C. The genetic genealogy community was finding cousins with these small shared segments – we just didn’t know if the DNA segments were true or false.

We also heard about scientific studies that showed that most of the IBD (true) shared segments in the 5 to 20cM range were from ancestors greater than 10 generations back – at least 8xG grandparents (or 9C level). This is usually beyond a genealogy time frame for many of us. For instance, see the Speed and Balding chart in this article. But even this data showed that within the 5 to 20cM range there were some 3C to 8C.

However, we continue to be admonished to avoid, or discard, Matches in this 6-15cM range. Such small segments were branded as “suspicious”, “dangerous”, “poison”, “a fool’s errand”, etc.

I don’t deny that some of the 6-15cM shared segments are false, and that many of them are beyond a genealogical time frame. But on the other hand, some of them are true and within a genealogical time frame. I’m unwilling to discard all of them just because some of them are false or too distant. As I will show below, many of my Matches with these small segments are very useful.

What’s at stake?

So, before we adopt a hard rule one way or the other, let’s look at small segments from a different viewpoint. At AncestryDNA, I have 120,000 Matches. Their ThruLines (TL) program has identified over 2,000 Matches who share a specific CA with me. The shared DNA segments range from 208cM down to 6cM, and the relationships from 2C to 6C. In fact about 2/3 of these TL CAs are with Matches who share 6-15cM segments with me. Based on my 45 years of true genealogy research, I’ve determined that only about 5% of these TL Matches are incorrect (the Matches and I may still be cousins somehow, but not on the CA identified by TL). So… over 1,900 of these TL Match cousins and CAs are “keepers”. I don’t want to throw away 2/3 of these easily identified CAs.
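
As a rough sketch of the arithmetic in the paragraph above (the function names are illustrative only, and the inputs are the approximate figures quoted in the text, not exact counts):

```python
# Approximate figures quoted in the text above.
thrulines_matches = 2000      # "over 2,000" TL Matches with a specific CA
incorrect_rate = 0.05         # "about 5%" judged incorrect by genealogy research
small_segment_share = 2 / 3   # "about 2/3" share only 6-15cM


def keepers(total, error_rate):
    """TL Matches left after removing those judged incorrect."""
    return round(total * (1 - error_rate))


def lost_to_cutoff(total, share_below_cutoff):
    """TL Matches that a hard 15cM cutoff would throw away."""
    return round(total * share_below_cutoff)


print(keepers(thrulines_matches, incorrect_rate))              # 1900
print(lost_to_cutoff(thrulines_matches, small_segment_share))  # 1333
```

In other words, a blanket 15cM rule would set aside roughly 1,300 TL Matches in order to avoid an error rate of about 5%.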

This genealogy analysis had nothing to do with the size of shared DNA segments. I believe these 1,900 people (Matches) are my true cousins – even if we didn’t share any DNA! As a genealogist, I’m a happy camper.  I very much want to share records, stories, pictures, research, and other descendants, or maybe test a Y-DNA or mtDNA line, with each of these new-found cousins. Even if I could eventually determine that our shared DNA segment was false, this person is still a cousin.

Most of our true Cousins won’t be DNA Matches

Over half of our 4C wouldn’t show up as a DNA Match; only about 10% of our 5C would show up as a Match; and only a very small fraction of our deeper cousins will show up as Matches. So when someone does show up as a DNA Match (at any level), and there is a valid paper trail showing they are an 8C – why not accept that? At least accept the 8C part, if not the DNA link. Later, in Triangulated Groups or Clusters, we’ll see if that person “groups” with others on the same line. This would indicate to me that the genealogy was true.

Between 1974 (when I started researching genealogy in earnest) and 2010 (when atDNA testing became available), I found many cousins with no knowledge of any shared DNA. Some of them probably shared DNA with me, but most did not. But they were all my cousins.

I hope I’ve made two key points so far: 1. atDNA is just a tool we’ve used over the past 10 years – it’s not our master; and 2. atDNA does not find everything in genealogy – we have many cousins, and indeed many Ancestors, we will never find with shared atDNA.  Ponder these points for a moment….

So back to small shared segment (6-15cM) Matches – are they worth it? Well as discussed above, of course they are!

Are they useful in Genetic Genealogy – beyond just as cousins? I think the answer is that they often are… Let’s look at a few situations.

ThruLines and Clusters

Suppose, using ThruLines at AncestryDNA, you found 20 Matches in the 6-15cM range, who were all cousins (3C to 6C) on a line back to a 5xG grandparent couple. [NB: I have 64 5xG grandparent couples, and over 2,000 TL Matches – an average of 30 TL Matches (with a CA) per couple, so 20 TL Matches is a reasonable number]. At AncestryDNA we don’t have shared segment info for Triangulation, but we can do Clustering. Let’s Cluster on a 6cM threshold (all my 120,000 Matches, including the 1,900 good TL Matches). If the above 20 6-15cM Matches were sprinkled all over the Matrix (in different Clusters) – then nothing special. But if 11 Matches (of the 20) are in one Cluster, and 6 are in another Cluster, I’d sit up and take notice! There is nothing “random” about that. Clusters are formed on Common Ancestors, so we’d expect to see most of these 20 TL Matches in a Cluster, or two, or three. I have mostly Colonial Virginia ancestry, and some of my Matches have multiple CAs – so some of the 20 TL Matches may well wind up in a different Cluster. But, whenever your Matches form a strong* group (Cluster, Triangulated Group, DNA Painting, etc), they are very likely to have the same CA and share IBD segments. At least this is a good hypothesis.  At this point I am not claiming a “proof”, but I am claiming a lot of evidence that points in one direction. [*strong group does not mean 2 or 3 Matches in a Cluster; nor 10 Matches in a Cluster, but each one only matches 2 or 3 others. A strong group would be 10 Matches in a Cluster with each one matching about 8 of the others. Use judgment here.]
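
One way to put a number on a “strong” group is sketched below. This is only an illustration: the input format (each Match mapped to its Shared Matches within the Cluster), the function names, and the cut-off values are all assumptions of mine, not something the Clustering tools report. Judgment still applies.

```python
def cluster_density(shared_matches):
    """Average fraction of the other Cluster members that each member matches.

    shared_matches maps each Match in the Cluster to the set of Matches
    it is a Shared Match with (hypothetical input format).
    """
    members = set(shared_matches)
    n = len(members)
    if n < 2:
        return 0.0
    links = sum(len((set(others) & members) - {m})
                for m, others in shared_matches.items())
    return links / (n * (n - 1))


def is_strong(shared_matches, min_size=10, min_density=0.8):
    """Illustrative cut-offs only; adjust them to your own judgment."""
    return (len(shared_matches) >= min_size
            and cluster_density(shared_matches) >= min_density)


# 10 Matches, each matching 8 of the other 9.
strong = {i: {j for j in range(10) if j != i and j != (i + 5) % 10}
          for i in range(10)}
# 10 Matches, each matching only 2 of the others.
weak = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
# 3 Matches that all match each other: dense, but too few to count.
tiny = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}

print(round(cluster_density(strong), 2))  # 0.89
print(round(cluster_density(weak), 2))    # 0.22
print(is_strong(strong), is_strong(weak), is_strong(tiny))  # True False False
```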

Finding more CAs in Clusters

In the big picture, all of our Matches can be divided into two groups – those with true shared DNA segments, and those with false DNA segments. I believe most of my 120,000 Matches at AncestryDNA have true shared DNA segments with me (although as outlined above, I don’t really care if some are not DNA cousins: since AncestryDNA doesn’t show shared DNA segment info, I cannot Triangulate them anyway). Therefore, if Clustering groups these Matches, I have every reason to believe they are valid when they point to the same ancestral line. And if some of those Clustered Matches (with a CA on the same ancestral line) have small (6-15cM) shared DNA segments with me – so what? It’s close enough for a second look. Recently I’ve been looking through my Clusters for Matches who have over 1,000 people in a Public Tree. Most of my Clusters have a hypothetical Ancestor in them, so I look for surnames in that line.  Sometimes, I find a clue and I’m able to build the Match’s Tree out to connect with my line. This adds even further evidence that this Cluster is based on that line.

Genealogy vs Genetic Genealogy

Another aspect of this whole discussion is genealogy vs. genetic genealogy. If you are just interested in genealogy, it doesn’t matter what the size of the shared DNA segment is. In fact, while looking at Hints, I run across a lot of helpful Trees, where the owners are not DNA Matches at all. Only in certain circumstances (Chromosome Mapping; bio parents/Ancestry; “proof” where genealogy records are insufficient; etc.), do you need to ensure a shared DNA segment is true (IBD) and cannot be from a different Ancestor. So, unless you need to “prove” a genetic link, don’t worry about the size of the shared DNA segment. There is a lot to learn from many of your DNA Matches, even those with small segments, and even from other people (with no DNA match) at Ancestry.

Breaking through a Brick Wall

Even breaking through a brick wall is primarily a genealogy exercise. To be clear, this process is often aided by starting with a group of DNA Matches (Painted, Clustered and/or Triangulated), and looking for Matches with Trees that have Common Ancestors among themselves – beyond where your brick wall is. In these cases you are using DNA Matches who are probably related to you and who group with other Matches. You use this cadre of Matches to find a CA among them. This is basically a genealogy exercise – and, again, it doesn’t make a lot of difference how much shared DNA you have. In fact, to find a CA beyond your brick wall, you are probably looking for a distant CA – often found with smaller DNA segments. So don’t discard those Matches with small segments who have a Common Ancestor with you – use them.

Use caution with isolated small segments

My discussion above about using small segments is in the context of clues and grouping (Painting, Clustering, Triangulating, etc). IMO, it is reckless, and wrong, to find a 9C Match, sharing 10cM, and declare that “proves” the Ancestral line by itself. Such a “find” is one clue (and by itself, a very shaky one), and much more corroborating evidence is needed even to form a hypothesis. The “rule-of-thumb” I’ve been using is to have at least G independent Matches (at least cousins to each other) who all agree on the same Common Ancestor – where G is the number of Gs of the Common Ancestor (counting the G in “grandparent”). At the 7xG grandparent level (9 generations back – 8th cousin level) this means 8 Matches in agreement. It’s relatively easy to get that many Matches in a Cluster or Triangulated Group – it’s much harder to find Common Ancestors with each of them. So be sure to include those CAs from Matches who share small DNA segments with you!
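
The rule-of-thumb above can be written out as a small sketch. The function names are hypothetical, and the counting convention (N “greats” plus the G in “grandparent”) is my reading of the 7xG example, which gives 8 Matches, 9 generations and the 8C level.

```python
def gs_of_ancestor(greats):
    """Number of Gs for an NxG grandparent: N greats plus the G in grandparent."""
    return greats + 1


def generations_back(greats):
    """Parent is 1 generation back, grandparent 2, 1xG grandparent 3, and so on."""
    return greats + 2


def cousin_level(greats):
    """Descendants of an NxG grandparent in my generation are (N+1)th cousins."""
    return greats + 1


def enough_agreement(greats, agreeing_matches):
    """Rule-of-thumb check: at least G independent Matches agree on the CA."""
    return agreeing_matches >= gs_of_ancestor(greats)


# The 7xG grandparent example from the text.
print(gs_of_ancestor(7), generations_back(7), cousin_level(7))  # 8 9 8
print(enough_agreement(7, 8))  # True
print(enough_agreement(7, 1))  # False: one isolated small-segment Match
```

This only counts Matches; whether they are truly independent (at least cousins to each other) still has to be checked in their Trees.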

Bottom Line

Use as many of your DNA Matches as you can, to learn more about your own genealogy. IMO, Matches with small shared DNA segments often provide the clues and evidence you are looking for. But use extreme caution with small shared DNA segments in isolation – they are much more credible when they are part of a group. Small segments in context and groups can be very helpful.

 

[06C] Segment-ology: In Defense of Small Segments by Jim Bartlett 20200131

20200202: Edited 10 paragraph to change “DNA segments” to “genealogy”

25 thoughts on “In Defense of Small Segments”

  1. Hi Jim,

    This is not necessarily the right article for this question, but here goes. What is your method for resolving the small segment length differences between matches who test at more than one company? Do you personally just merge them together as is, or do you judge the start and stop points better at some companies than at others and adjust your database accordingly? I have read that it doesn’t really matter much, but then again some locations on some chromosomes can be quite different between companies.

    • Chris,
      I’ve found the results from 23andMe and FTDNA to be very close. GEDmatch used to be very close, too, but they do some imputing which means they add the “probable” SNP value when they are needed for “coverage” between the different testing companies. MyHeritage also imputes and some shared segments run long. I have a 20,000 row spreadsheet to list my shared segments. I start a TG where there appears to be a natural break point (the actual crossover point may be slightly different). I then look for the next natural break point and start a new TG. Some of the shared segments in the previous TG may slip past this point a Mbp or two or three – I don’t let this bother me. The main point is that there are a bunch of shared segments which overlap each other, indicating they all share a lot of the same SNPs – THAT’s the key feature which means that the bulk of that TG segment came from a Common Ancestor. If you are searching for a particular gene that is near a crossover point, you may have to use real phased data. But for genealogy with Matches who all got the same overlapping segment from the same CA, this works fine.
      Hope this helps, Jim

  2. Hi, Jim,

    I was very happy to read your article because of a recent comment from a match to me who is a well known professional genealogist. We triangulate with 6 others who range in centimorgans from 16cM to 10cM. The individual in question told me that his 10 cM match to me was too small and he didn’t want to participate with the 7 of us in finding our common ancestor or in zeroing in on a location where we must have matched at some time! I had read in another blog that small segments were sometimes valuable and in this case, I found very good results in connecting 6 of us. This CA was in Ireland which is exceedingly difficult to research. I was proud of my success in connecting the dots here and disappointed in the “professional’s” comments.

    • Roberta, This blog post on Small Segments was focused primarily on AncestryDNA Matches where we don’t know the detailed segment information. At AncestryDNA we have to rely on tools like Clustering to group our Matches. “When” a Cluster shows a number of Matches with the same Ancestry (which is expected in a Cluster), I think this is a very strong clue that that *is* the Ancestry for the Cluster. However two things: 1) this applies to a *strong* Cluster where most of the Matches are Shared Matches with most of the other Matches in the Cluster; and 2) not everyone in the Cluster may have the same Ancestry (but I’ve found it to be such a strong clue, that I often find that Ancestry in other Matches in the Cluster when I look at their genealogy in depth).
      In the cases with the other DNA companies, we can do true segment Triangulation. I feel each Triangulated Group presents a much stronger case for a Common Ancestor. If there is any doubt that some of the segments in a TG may be false, those segments can be Triangulated in their own right. In your case, if the 7 of you are at least 1C or more to each other, you have a good TG. I think there is enough evidence to pursue finding a Common Ancestor. Good luck. Jim

  3. I guess great minds think alike Jim. 😉 I have been using some of the techniques you described for some time now in my own research. Also, I have both parents tested and a well-proven/researched tree has really made the smaller segments a very useful tool if used properly. Now that I have all our family results imported to other sites with browser tools, it has opened new avenues of research and has really expedited my mapping at DNA Painter. Well done and I shared this page with our local FB genealogy group and some of my genealogy students at the local college where I teach. Now we need to compare our Virginia colonial genealogies sometime. I have very deep VA roots back to Jamestown.

    • Larry,
      Thanks for your comments. My experience is much like yours – with years of genealogy research already documented, it’s much easier to find others whose Trees agree, and we have Common Ancestors. To me, when I find the grouping techniques agree with these Common Ancestors, that’s an added level of confirmation. Very similar to finding marriage, census, tax, and other paper records all in agreement. In each case where I disagree with a ThruLines CA, I check back and find that the Clustering was also way different – confirming to me that the TL was wrong, and that the Clustering tool is a powerful one. However care is needed, as Clustering is not as rigid as Triangulation. And all require extra care under 15cM. However, finding new Matches with CAs is a very important aspect of these tools. I’ve got the segments (TGs) for all of my DNA identified and locked in. What remains is finding CAs for each TG. For me, that is the major issue now.
      Thanks again for posting.

      • Larry, In further reply – I have my Dad’s DNA only (my bio-mom died in 1961). It took me many months of hard work to Triangulate all my segments from FTDNA, 23andMe, MyHeritage and a few AncestryDNA at GEDmatch. But it was worth it (to me) – I now have 372 TGs that cover all of my tested DNA. Every single new segment I get falls easily into one of my extant TGs (a close cousin may span over more than one TG). Grouping by TGs is done, and adding new ones is easy. I know all the boundaries of my personal jigsaw puzzle pieces – I just don’t have all the pictures (my Ancestors) for each TG back as far as I want it. So I’m using Clustering at AncestryDNA to group the Matches and find more Ancestors – the trick is, now, how do I link the Cluster Ancestors to my extant TGs?
        I don’t have either parent’s DNA at AncestryDNA – If you do Clustering on yourself and your parents at AncestryDNA, I’d be very interested in learning if all your Clusters were validated by one parent or the other.
        Jim

  4. Excellent article and I have found similar results. Prior to Ancestry’s change in the Algorithm, I had similar results with being able to prove about 70% of my matches, some of which went back to 7th and 8th grandparents. Luckily I had family members who had kept and published ancestry books since the 1800s, so I had records back to these ancestors, and the DNA matches to family members on these lines seemed logical, and I too had similar results with only being able to prove out about 70% of the matches. After Ancestry changed the algorithm, the matches only go back to the 5th great grandparents. With the new algorithm I lost all my matches beyond 5th great grandparents, luckily I created a spread sheet with all the matches prior to the change. After the change my confirmation went to almost 95% as well. Steve Coker is not a DNA Match, however, I am a DNA match to a number of the individuals whom Steve is the administrator, as well as we have a number of relations by marriage. As such I often come across his records.

    One advantage that I have over many is that my Great Grandfather was born 124 years before I was born, which in many families would be 4 to 6 generations, but for me was only 3 generations, so as a result I match back 2 to 3 generations more than many individuals my age, and another couple of generations more than a 20-year-old.

    • Stuart, great to hear about your success. And I think what you are saying is that you *concur* with a high percentage of your Matches on the Common Ancestors. It’s what we did in genealogy before DNA. Finding others with the same CAs often led to finding new information about them, and made our genealogy stronger. In these days with DNA, we rarely use the word *prove* because that involves other criteria. Thanks for your story! Jim

  5. This resonates so much with my own experience. Recently I have been worrying about the possibility of IBC (I don’t have a parent to test), and it was reassuring to read your thoughtful comments about the usefulness of building our trees using a small match as a hint – regardless of whether the match is IBC or IBD. If we can build our tree from that connection, using traditional genealogy methods, then it doesn’t matter whether the DNA match is valid. Thank you.

    • Jean – Thank you! You got my message perfectly, and worded it well: if it helps us in traditional genealogy, it doesn’t matter if the DNA is valid. Precisely. Genealogy is a great hobby. We should not become a slave to only working on it with large-segment Matches. Jim

  6. Hi Jim,

    Thank you for putting this information into basic English. So easy to read and understand. Wonderful article.

    D. Reeves

  7. My 4th gGrandfather had 4 well-documented children. ThruLines shows I match each of my 3rd gGrandfather’s siblings (A: 7 descendants, 21,8,7,7,8,7,11 cM; B: 2 descendants, 11,7 cM; C: 4 descendants, 8,11,7,13 cM; D: 2 descendants, 8,9 cM). Of course, no segment details. But, all these “small” segments are collectively suggestive in context. Leave no data behind!

    • Joe, Thanks for your feedback. I see a bumper sticker in this: Leave No Data Behind!
      What if we found a 7cM Match who had the Family Bible with birth dates for everyone? Do we discard it because it’s from a small segment Match? Of course not. Out of all the Matches you’ve found as cousins, I bet you’ve learned some new things from some of them. Jim

  8. Hi Jim,
    First – I really enjoy reading your posts when I receive them. I truly appreciate the perspective and logic you provide with BOTH genetic genealogy and old fashioned genealogy. I’ve been testing DNA on my family, when I can, for only 5 years or so. Why I really appreciated this post is because of the small segment to CAs information you shared in it.

    I conducted an “experiment” within Ancestry recently that would probably make some “gasp”–and I don’t mean to be controversial, but between the results I found and your article, it gives me more confidence that the information I found was supportive of a “theory” I’ve tried to prove for many, many years: My ancestor “Nancy” was b 1781 and was married to “John” b. 1774. They lived next to a family that I believe was “Nancy’s” brother. I have no proof of “Nancy’s” maiden name, but because I suspected it was the same as “Henry”, I extended my Ancestry tree as if “Nancy’s” parents were the same as “Henry’s”. Because I tested at Ancestry, I then watched the change in my “Thru Lines” report. Once I selected “Henry’s parents” (now listed as my ancestors thru Nancy), it now showed 9 cousins with rates of 6-15 cM matching segments. These cousins were from multiple children of “Henry’s” parents.

    While I know none of this is conclusive evidence–the rate of small segments being true IBD in your article (with your experience) gives me more confidence to try to continue to find proof for my theory.

    Finally—I would never advocate for anyone to place FALSE information in their trees–it was my experiment and I had every intention of changing it back after I was able to see how the “Thru Lines” report changed or didn’t change.
    Tom Spradling

    • Tom – Thanks for relaying your experience. I, too, worry about putting “iffy” info in my Tree as a test (a lot of folks do it regularly). I try to label it as “my guess” in the suffix field, so others will be aware – it doesn’t always work… But when I broke through a brick wall and found my CUMMIN/GS ancestry, I put it in my Tree and got lots of “hits”. In my case I also Triangulate segments and run Cluster reports – in both cases the CUMMIN/GS Matches stayed in a few TGs and a few Clusters, so I’m now more confident than ever. And, like with other surnames beyond 4C, I get more Matches with that Ancestry, just with small segments (as expected). And, as in your case, when it involves genealogy Triangulation, that is even more evidence that we are on the right track. Jim
