Bad Segments – Good Segments

A Segment-ology TIDBIT

There are two ways of looking at small segments. But first, please remember that ALL of your own DNA is true: even the very smallest part of your DNA came from a parent as a true segment. When we discuss “small segments”, we mean small shared DNA segments with a Match, which a computer algorithm determines by comparing your (true) DNA with a Match’s (true) DNA. Below about 15cM, some of those comparisons report a false shared DNA segment, and the smaller the segment, the more likely it is to be false. The rate of false reports starts near 0% at about 15cM, rises to about 50% at 6-7cM, and climbs fairly dramatically below that.

In this post “segment” means a computer generated shared DNA segment.
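As a rough illustration only, the rule of thumb above (about 0% false reports at about 15cM, about 50% at 6-7cM) can be turned into a small Python lookup. The straight line between those two points, the 6.5cM midpoint, and the name `approx_false_rate` are assumptions made for this sketch, not a published model. Because the false rate climbs steeply below about 6cM, the sketch declines to guess there.

```python
from typing import Optional

# Anchor points from the rule of thumb in the text (approximate).
# ASSUMPTION: 6.5 cM stands in for the "6-7cM" range.
NO_FALSE_CM = 15.0    # at or above this size, false reports are ~0%
HALF_FALSE_CM = 6.5   # around this size, false reports are ~50%


def approx_false_rate(segment_cm: float) -> Optional[float]:
    """Rough chance that a reported shared segment of this size is false.

    ASSUMPTION: a straight line between the two anchor points. The real
    curve is not linear; this only illustrates the trend.
    Returns None below HALF_FALSE_CM, where the false rate climbs
    steeply and a straight line would understate it.
    """
    if segment_cm >= NO_FALSE_CM:
        return 0.0
    if segment_cm < HALF_FALSE_CM:
        return None
    # Fraction of the way from 15 cM down to 6.5 cM, scaled to 0.5.
    fraction = (NO_FALSE_CM - segment_cm) / (NO_FALSE_CM - HALF_FALSE_CM)
    return 0.5 * fraction


for cm in (20, 15, 10.75, 6.5, 5):
    print(cm, approx_false_rate(cm))
```

For example, `approx_false_rate(10.75)` gives 0.25, halfway between the two anchor points. Actual false-match rates depend on the matching algorithm and the data, so treat any number from this sketch as a trend, not a probability for a particular segment.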

1. Bad Segments: Small segments have a high probability of being false, and there is no easy way to tell whether a given one is a valid shared segment. And even if it is a true segment, it probably comes from a very distant Ancestor, likely beyond your genealogy. These small segments have been called names, even labeled POISON – DO NOT USE! In that derogatory sense, we are talking about NOT using these small segments as evidence: NOT as the basis of a hypothesis, NOT as part of a “proof”. However, these segments may not be worthless…

2. Good Segments: Each company uses shared segments to identify DNA Matches and report them to us. As noted above, small segments may be true or false. But what if they lead us to a person who really is related to us, a cousin? If the “Match” has a Tree, we can check it out and look at the information presented. Finding a Common Ancestor is only one of the possibilities. Maybe this Match-cousin has more information about our Common Ancestor than we do. Maybe they’ve found records we don’t have, written an interesting story, or uploaded pictures we didn’t have. Maybe we can establish a dialog (message, email, phone, in person…). I have made lasting friendships with some of my Matches, including some where we still don’t know how we are related. The possibilities and opportunities are endless.

At AncestryDNA, ThruLines finds cousins with a Common Ancestor down to 8cM (it used to go down to 6cM). I checked every one of them and often found new information. With each DNA Match, keep your genealogy cap on. A small segment may in fact be false, but that doesn’t mean there isn’t a true relationship. Remember, about half of your true 4th cousins won’t share any DNA with you. My advice: don’t ignore a true cousin just because you share only a small segment. Genetic Genealogists, myself included, have long stated that a Match with a Common Ancestor and a shared DNA segment does not necessarily mean the shared DNA segment came from that Common Ancestor. By the same logic, a relative who shares a Common Ancestor with you may or may not have a true shared DNA segment from that Common Ancestor.

If you are trying to prove a bio-ancestor, or a brick wall Ancestor, or some other relationship using DNA, don’t use small segments. If they cannot be proved to be true segments, they must be ignored as part of a proof. But, on the other hand, don’t ignore a paper-trail relationship just because you share a small segment. Learn what you can from a genealogy perspective and ignore the DNA.
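The advice above amounts to a simple triage: a shared segment can serve as DNA evidence only when it is large enough, while a Match with a documented Common Ancestor is worth reviewing genealogically regardless of segment size. The Python sketch below encodes that. The 15cM cutoff comes from the rule of thumb at the top of this post; the `Match` record, its fields, and the `triage` function are hypothetical and do not reflect any testing company's data format.

```python
from dataclasses import dataclass
from typing import List

# Rule-of-thumb size at or above which reported shared segments are rarely false.
EVIDENCE_CM = 15.0


@dataclass
class Match:
    # HYPOTHETICAL record; not an Ancestry, GEDmatch, or other company format.
    name: str
    largest_segment_cm: float
    has_common_ancestor: bool  # e.g. a ThruLines or paper-trail Common Ancestor


def triage(match: Match) -> List[str]:
    """Return the actions the post's advice suggests for one Match."""
    actions = []
    if match.largest_segment_cm >= EVIDENCE_CM:
        actions.append("may use the segment as DNA evidence")
    else:
        # Small segments stay out of any proof unless proved true some other way.
        actions.append("do NOT use the segment in a proof")
    if match.has_common_ancestor:
        # Segment size does not matter here: a true cousin is still a cousin.
        actions.append("review Tree, records, and contact the cousin")
    return actions


print(triage(Match("small-segment 5C", 7.0, True)))
print(triage(Match("no Tree", 22.0, False)))
```

Note that even the "may use" branch is only a starting point: a large shared segment with a Match who has a Common Ancestor still does not prove the segment came from that Ancestor.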

Just my perspective as a long-time genealogist…

[22BD] Segment-ology: Bad Segments – Good Segments TIDBIT by Jim Bartlett 20210930

8 thoughts on “Bad Segments – Good Segments”

  1. I would like to present data on my short-segment (6 cM & 7 cM) matches from Ancestry that also have ThruLines (TL), along with my ideas about them. The results tabulated below cover all of my apparently valid TL matches, broken down by their shared matches, if any. After the segment size (cM), the columns are (from left to right): 1. Total number of TL matches; 2. Number of TL matches with no shared matches; 3. Number of TL matches with one or more shared matches, where the majority of the shared matches seem to be from a different ancestral line than the TL match (referred to as misleading); and 4. Number of TL matches with at least one shared match, where 50% or more of the shared matches seem to be from the same ancestral line as the TL match (potentially useful).
    Number of TL matches, by shared-match category
    cM   Total   None   Misleading   Useful
    6    32      7      3            22
    7    49      8      6            35

    My primary genealogical goal here is to use the information from short-segment TL matches to help identify long-segment matches that are part of a cluster. The far-right column (Useful) could help, although in some instances the genealogy would be complex; but the column just to its left (Misleading) would yield bad or mostly bad information for my purpose. In my case, the Useful column outnumbered the Misleading column by about 6.3:1 (57 vs. 9). I think that the 6 cM and 7 cM TL matches help me reach my genealogical goal. The 6 cM row has fewer matches than the 7 cM row mainly because of the 6.00 cM bar (previously used by Ancestry instead of a 5.5 cM bar). Shared matches are subject to the 20 cM bar used by Ancestry (a lower bar would be better for my purpose).

    The trend in the cM data listed above, although limited, indicates that still shorter segments would probably be useful. The shorter segments would be expected to rely more on indirect detection of shared matches, which can occur on the same chromosome or on a different chromosome than the short segment match. Since the indirect detection of shared matches (previously introduced) can occur with an invalid match (or a valid match), it should even be possible with a 0 cM match, although the relative usefulness of the shared matches of 0 cM TL matches cannot be estimated from the limited data above. Conventional concepts in genetic genealogy (e.g., triangulation), or consideration of short segment matches in isolation, are not as valuable for evaluating the potential usefulness of the shared matches of short or very short segment TL matches.

    Ancestry is unmatched in data size, quantity of trees, and their TL matches, but they do not have a chromosome browser. Use of small segment TL matches is an alternative way to help extract information about long segment matches, and Ancestry is in the best position to do this. Management of data size due to the large quantity of small segment matches (or maybe very small segment matches) would seem to be feasible by periodically filtering out those that do not have TL matches.


    • Fred, I’m glad to present your data and analysis. As a genealogist of over 45 years, I’ve worked most of that time without DNA. So small segments are not a problem for me (I mostly ignore their size and focus on the Match’s Tree and records). However, I also focus on the Shared Matches – SMs are a strong indicator and a valuable tool. When I find a small-segment Match who appears to be a 5C, I think of several things: a 5C has a fairly wide range of observed cMs, down to 0 and including 6 and 7cM, so it’s in the range. I then look at the SMs. In some cases a clear majority of the SMs are from the same line as the 5C CA, which is very good news; in some cases a clear majority of the SMs are from a very different CA line (go back to the Match and look for that line, including extending their Tree to do so); and in some cases the SMs are all over the place, on both sides of my ancestry. In each of these cases, I type the SM indication in the Notes box (usually with an X next to the 5C CA as probably not correct) and I use an Orange Dot (my way of “filtering out” this Match until I have more data). Jim


  2. Thank you for the statistics and insights!

    With respect to your comments above that “I have concluded that 7cM shared segments which Triangulate are usually true. I’m pretty confident that 7cM segments which don’t Triangulate are false,” what statistics or further insights do you have regarding segments, as reported from the GEDmatch Tier 1 ($) Segment Triangulation tool, below 7 cM?

    7 cM is the current default for general use of the Triangulation tool, but lower values can be used via Multiple Kit Analysis. I understand from above that one-to-one matches at 6-7 cM are about 50% false, but are triangulated segment matches at 6-7 cM also about 50% false? Stated another way, at what cM range are triangulated segments usually true?


    • Perry, I have not done any simulations or tried to pin it down. As you know, DNA is fairly random. It is like deciding on an umbrella when there is a 50% chance of rain: in the end, it either rains or it doesn’t. I will say this: with larger segments creating a solid TG, the underlying SNPs are equivalent to phased data; they have to line up with the SNPs on one side of your DNA. *If* a 6-7cM shared segment shows as a Match to all the other overlapping segments in the TG at GEDmatch, then I would conclude it has the phased SNPs and is true. But I have no *proof* of that. At some point (and I don’t know that point), there are true small segments that are the same in most humans; they often show up as pile-ups and could come from many different Ancestors. In this case we shouldn’t call them IBD (Identical By Descent, i.e., inherited from a CA), because they may not be. I have some small segments from Neanderthals… So a word of caution: there is no absolute, and the further you deviate from solid footing, the more reinforcing evidence you need.
      And always check – one-to-one – any results from the Multiple Kit Analysis…
      Jim


  3. A related question. Suppose you have a match with multiple segments: a 40 cM segment and a 7.4 cM segment. Does the 40 cM segment change the odds that the 7.4 cM segment is IBS? Or are the odds independent of any other segment?
    Additionally, if the 7.4 cM segment has no other matches, or if it forms a group with other matches, how does that affect the determination of a good versus bad match?


    • Cryptoref, the events are independent! Think of it from the DNA’s point of view: the DNA goes through the recombination process with no “knowledge” of the host, or potential Matches, or anything else. And what happens on Chr 06 is physically separated from Chr 05. Each segment has an independent journey down to you. And the “segments” we are talking about here are really *shared* segments as determined by a computer algorithm, which compares your DNA (which is all true) with a Match’s DNA (which also is all true) and finds overlaps, which are the *shared* DNA. The process is completely independent at each event.
      I have worked with a lot of 7cM shared segments; some Triangulate with other shared segments, some do not. We know that about 50% of 7cM shared segments are true and 50% false. So I have concluded that 7cM shared segments which Triangulate are usually true. I’m pretty confident that 7cM segments which don’t Triangulate are false. NB: As with almost all segments, having a Match with a CA and a shared segment (or TG) is OK, but it does NOT necessarily mean the shared segment (TG) came from *that* CA. The shared segment could come from almost any CA; you cannot tell by the two facts (CA, TG) alone, and you also need other evidence… Jim

      Like
