Identifying False Shared DNA Segments

A Segment-ology TIDBIT

I contend that segment Triangulation will identify most of the false shared DNA segments reported from your DNA test. This includes a Match with one segment which is false, as well as a Match with multiple segments, some of which are false. I have Triangulated DNA segments at FTDNA, 23andMe, MyHeritage, and GEDmatch, and found many false segments (segments which did not Triangulate with other overlapping segments). In almost all cases these false segments are under 15cM. I cannot guarantee that all the false segments can be identified this way, but I am confident that most can. Triangulation is a time-consuming process – starting with a download of all your segments from one company at a time, sorting them by Chr and Start, and then working down the list to see which ones Triangulate. I did a blog post using MyHeritage as an example for segment Triangulation: https://segmentology.org/2020/12/29/triangulating-your-genome/
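
As a rough illustration of the sort-and-compare step described above, here is a minimal Python sketch. The field names (match, chr, start, end) and the `mutual` set are assumptions made for this example, not any company's download format. `mutual` stands for pairs of Matches you have already confirmed, with a company's comparison tool, to match each other over the overlapping region. A segment that overlaps others but Triangulates with none of them is only a candidate false segment: it may simply sit on the other parent's chromosome with nobody yet to Triangulate with.

```python
def seg_key(seg):
    """Identify a segment by Match name, chromosome and start position."""
    return (seg["match"], seg["chr"], seg["start"])

def overlapping_pairs(segments):
    """Yield pairs of segments on the same chromosome whose ranges overlap."""
    ordered = sorted(segments, key=lambda s: (s["chr"], s["start"]))
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Sorted by Chr and Start, so once b starts past a's end
            # (or on another chromosome) no later segment can overlap a.
            if b["chr"] != a["chr"] or b["start"] >= a["end"]:
                break
            yield a, b

def candidate_false_segments(segments, mutual):
    """Segments that overlap others but Triangulate with none of them."""
    has_overlap, confirmed = set(), set()
    for a, b in overlapping_pairs(segments):
        has_overlap.update((seg_key(a), seg_key(b)))
        if frozenset((a["match"], b["match"])) in mutual:
            confirmed.update((seg_key(a), seg_key(b)))
    ordered = sorted(segments, key=lambda s: (s["chr"], s["start"]))
    return [s for s in ordered
            if seg_key(s) in has_overlap and seg_key(s) not in confirmed]

# Hypothetical data: chromosome numbers are integers, positions are base pairs.
segments = [
    {"match": "Ann", "chr": 1, "start": 100, "end": 500},
    {"match": "Bob", "chr": 1, "start": 300, "end": 700},
    {"match": "Cy",  "chr": 1, "start": 350, "end": 600},
    {"match": "Dee", "chr": 2, "start": 100, "end": 400},
]
mutual = {frozenset(("Ann", "Bob"))}  # Ann and Bob match each other here

suspects = [s["match"] for s in candidate_false_segments(segments, mutual)]
print(suspects)
```

In this made-up data, Dee overlaps nobody, so that segment is untested (a bare spot) rather than suspect.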

Warnings: this takes weeks; there are some Triangulations that are difficult – just skip over these; a few might slip through the cracks; there may be some bare spots in your DNA.

Special Note: Although a Match’s segment(s) may be false, that does not mean the Match is not a cousin. This looks like a double negative, so let’s phrase it this way: a Match may be a cousin and not share any DNA segment with you, or they may share a false segment with you. In fact, about half of your true fourth cousins (4C) will not share a DNA segment with you.

Recently, at the Genetic Genealogy Tips & Techniques Facebook group, there was a post looking for ways to identify Matches at MyHeritage which are random junk. Segment Triangulation would identify a lot of false segments. However, at MyHeritage, it might be efficient to just download all segments, focus on those below 15cM (where most false segments would be) and work down the list to see which Match segments don’t have a TG Icon. Still a lot of work… If anyone tries this, please post about your experience – we can learn from each other.
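
The shortcut above could be sketched in Python as below. This is only a sketch under assumptions: the column names are invented for the example, and the `tg_icon` column is assumed to be a flag you record by hand while checking each Match on the website, not something known to be in the download.

```python
import csv
import io

# Hypothetical data: column names and the hand-entered tg_icon flag are assumptions.
SAMPLE = """match,chr,start,end,cm,tg_icon
Ann,1,1000000,9000000,12.4,yes
Bob,1,3000000,8000000,8.1,no
Cy,2,500000,4000000,6.3,no
Dee,2,700000,30000000,41.0,no
Eve,3,100000,5000000,14.9,yes
"""

def small_without_tg(rows, max_cm=15.0):
    """Return segments below max_cm with no TG Icon, sorted by Chr and Start."""
    picked = [r for r in rows
              if float(r["cm"]) < max_cm
              and r["tg_icon"].strip().lower() != "yes"]
    return sorted(picked, key=lambda r: (int(r["chr"]), int(r["start"])))

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for r in small_without_tg(rows):
    print(r["match"], r["chr"], r["cm"])
```

The segments it lists are the ones to review first, not a verdict that they are false.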

[22BV] Segment-ology: Identifying False Shared DNA Segments TIDBIT by Jim Bartlett 20230618

15 thoughts on “Identifying False Shared DNA Segments”

  1. Pingback: Genea-Musings: Best of the Genea-Blogs - Week of 18 to 24 June 2023

  2. I don’t have a science background, and I’ve only been working with DNA mapping for the past year and a half.

    Anyways.. Not having a lot of experience has never stopped me from wanting to test ideas out for myself … and forming an opinion.

    And a few weeks back, I spent the better part of a few days in an attempt to objectively measure whether or not segment based triangulation is an effective way to identify very small false matches, especially if the triangulation group contains a parent and child who match on the same segment. (I don’t usually have any reason to use small segments, but sometimes, in the right location they can seem to be important clues.)

    While a shared triangulated match between a parent, and a child, and a third party would seem to make a false match unlikely, there seems to be some uncertainty whether a combination of maternal and paternal DNA on the side of the other match, might have the ability to form consistent false matches with valid matches in a larger triangulation group.

    This small study was an attempt to test this out, and the result seems to demonstrate what many of us have observed. False matches usually don’t have the ability to repeatedly pull this off, and they tend to be chance matches that are not consistently repeated.

    Methodology
    I used the kit of an adult child who is on GEDmatch, and collected all of their 5 to 7 cM matches found in their 100 top matches. I then compared all these matches to their 2 parents who are also on GEDmatch. All 3 were tested through FTDNA.

    I put matches of the child in the category of “Most Likely Valid” if this match on the same small segment, was shared by one of the child’s parents.

    And I put matches of the child in the “Most Likely False” category, if neither parent shared this match.

    Using this method I was able to collect 24 seemingly false matches, and 18 seemingly valid matches, in the 5 cM to 7 cM range.
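
    The sorting rule described above can be sketched in Python roughly as follows. The field names and sample segments are hypothetical, and "shared by a parent" is assumed here to mean that the same match has an overlapping segment on the same chromosome with either parent.

```python
def overlaps(a, b):
    """True if two segments lie on the same chromosome and their ranges overlap."""
    return a["chr"] == b["chr"] and a["start"] < b["end"] and b["start"] < a["end"]

def classify(child_segments, parent_segments, lo=5.0, hi=7.0):
    """Sort the child's small segments by whether either parent shares them."""
    result = {"most likely valid": [], "most likely false": []}
    for seg in child_segments:
        if not lo <= seg["cm"] <= hi:
            continue  # only the 5 to 7 cM range is examined
        shared = any(p["match"] == seg["match"] and overlaps(seg, p)
                     for p in parent_segments)
        label = "most likely valid" if shared else "most likely false"
        result[label].append(seg["match"])
    return result

# Hypothetical data: positions are base pairs, cm is the reported segment size.
child = [
    {"match": "M1", "chr": 2, "start": 100, "end": 400, "cm": 5.3},
    {"match": "M2", "chr": 3, "start": 200, "end": 600, "cm": 6.0},
    {"match": "M3", "chr": 5, "start": 100, "end": 900, "cm": 9.0},
]
# Segments of both parents, pooled into one list.
parents = [
    {"match": "M2", "chr": 3, "start": 150, "end": 650, "cm": 6.4},
]

groups = classify(child, parents)
print(groups)
```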

    I then did everything I could to find matches shared by both the parents and the child, in the same location as the child’s valid and false matches.

    To do this I used both the GEDmatch Tier One segment search tool, and various searches using the paid version of GEDmatch’s Matches One or Both tool, which can generate a compact match map showing segment locations. I then checked all the segments found by these methods, to see if they matched both the parent and child. Ideally I would have liked to find at least 5 maternal and paternal matches that completely encompassed the area of each false segment. I wasn’t able to do this, though if this was a personal project, where I had full access to matches on GEDmatch, 23andMe, MyHeritage and FTDNA, I would be much better able to fill the gaps with enough matches to be sure of a consistent pattern for each triangulation group.

    I was able to find 95 valid matches in the same location as one of these 24 matches I had identified as most likely false, as the match in the child was not shared by either of the child’s parents.

    All 95 of these valid matches had an unambiguous overlap with the location I had identified as the child’s false match with the other party. Of these 95 valid matches, shared by the parent and child, only 7 of these seemingly false matches were able to match one other valid segment in the triangulated group in that location.

    But this was almost always a one off, and the other segments in the triangulated group couldn’t replicate this match with the seemingly false segment.

    And the small false segments that did manage to make a match, came with some really really obvious red flags.

    So here is a tour of the 7 false matches I found that had the ability to form a false triangulated match…

    On chromosome 2 the child had a 5.3 cM false match, and there was a 5 cM match between the false match of the child, and one of the Mother/child’s valid matches, BUT… the Mother has 3 other valid triangulated matches in the same location that do not match the false match, which flags this match as unreliable.

    On chromosome 3, the child’s 6 cM false match also matched one of the Father/child’s 5 valid triangulated matches, at 6.1 cM, BUT again, the child’s false match only matched one of the Father’s matches, and didn’t match the other 4 valid triangulated parental matches, which again shows there is a problem with this match.

    On chromosome 4, the child’s 5.2 cM false match also matched one of the Father/child’s 5 valid triangulated matches, at 5.6 cM, BUT the other 4 valid matches did not match the false match, again showing this tiny standalone match as unreliable.

    Another one of the child’s seemingly false matches was on chromosome 15, and this false 5.2 cM match in the child managed to match 3 of the Mother’s matches if I dropped the requirement down to 3 to 3.5 cM, BUT… it also matches the Mother at 3.3 cM, suggesting maybe this match in the child isn’t entirely false? (And I would not normally even consider using such a small match.)

    Last but not least, in a well known pile up area on chromosome 22 that frequently appears to be linked to some other dimension, the child had a 6.8 cM ? false match, to both a brother and sister. Neither parent had this match, even if I drop the cM down to 3, but 2 of the Father’s 4 valid matches in this spot also match the brother, one of the Father’s valid matches in this spot matches the sister but not the brother, and another one of the Father’s valid matches in this spot matches both the brother and sister…. But the lack of consistency in the false matches’ ability to form matches in this location, and the fact that this spot generally has a lot of very odd matches, again seem to be big clues for any researchers who want to avoid false matches.

    The child also had two 5.6 cM false matches that begin and end in the well known pile up area on chromosome 1, and another two 5 cM false matches that begin and end in the well known pile up area on chromosome 9. I couldn’t find where either parent had any legitimate matches in this area, and neither of these 2 pairs of false matches match each other.

    In searching for valid paternal/maternal/child matches in specific locations, one of the interesting things I noticed is there were locations where the child appeared to have many many many more false matches than valid ones, and many of these false matches were larger 7 cM to 12 cM segments. One of these locations was the pile up area with the false ? brother and sister match on chromosome 22, and the child appears to have another spot like this at the beginning of chromosome 12. Which may suggest specific locations may have as much to do with false matches as the size of the segments?

    16 of the 18 seemingly valid parent/ child matching segments had sufficient overlap to match 40 other valid Father/ child segments and 13 Mother/child matching segments. And all of these valid Father/child or Mother/child segments matched each other.

    In conclusion, this small, easily replicated study seems to support the hypothesis that although false matches between 5 cM and 7 cM are very common, and they are false matches more often than not, these false matches usually do not have the ability to consistently match all of the other triangulated matches required to form a valid triangulated match group.

    I am still learning and if there is something in my methodology or reasoning and interpretation that is flawed, I would appreciate learning about this, as I know others reading here have skills I don’t, and also a lot more experience using DNA as a genealogical tool!

    • D. First, you are a kindred spirit – willing to do the work to develop and test a methodology. Thank you!
      Second, I have to mention that a parent and child and a Match cannot be used to form a Triangulated Group. As in surveying and navigating, segment Triangulation relies, in part, on having three widely separated individuals. A parent and child don’t meet this criterion. Almost by definition the child has the same DNA as a parent. The child cannot be used to form a Triangulation because the child and parent are essentially both the same side of the triangle. The general rule is at least 3 cousins must be in the Triangulation (after that you can add children, grandchildren, etc. to the TG, but the base segment must be formed with some separation).
      If your main conclusion is that a false segment will not, generally, match others, I agree. These false segments in small segments are often formed by “zigzagging” between maternal and paternal SNPs along an area of a chromosome to match the SNPs of someone else (who has a true segment of SNPs along one chromosome, totally maternal OR paternal). The person with the “zigzag” segment appears to match someone else, but the segment is not real, and it did not come, intact, from an Ancestor – it’s not Identical By Descent (IBD). The false (zigzag) segment is not real – it’s a fabrication of a computer matching algorithm. Hope this helps, Jim
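
      A toy Python illustration of the zigzag idea, under heavy simplification: the allele strings are invented, and real matching algorithms compare two unphased genotypes with tolerances for errors, so this is a sketch of the concept and not any company's method. The test person's unphased data contains both the maternal and the paternal allele at each SNP, so a comparison is free to pick whichever one fits.

```python
def half_identical(genotype, haplotype):
    """True if, at every SNP, the haplotype's allele is one of the two
    alleles in the unphased genotype (either parental side may supply it)."""
    return all(allele in pair for pair, allele in zip(genotype, haplotype))

# Invented alleles at 8 consecutive SNPs.
maternal = "AAGGTTCC"   # tester's maternal chromosome
paternal = "CCTTAAGG"   # tester's paternal chromosome
other    = "AATTTTGG"   # one true chromosome of the other person

# Unphased data: the pair of alleles at each SNP, with no record of which side.
genotype = list(zip(maternal, paternal))

print(half_identical(genotype, other))          # apparent match
print(other == maternal or other == paternal)   # match on a single side?
```

      Here the apparent match succeeds only by switching sides every two SNPs; neither the maternal nor the paternal chromosome alone matches the other person.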

      • Jim, thanks for your feedback! I should have made it clear the match or non-match between the parent, child and “other” was just my preliminary way of sorting out matches that were most likely false, so I could see if the false matches had obvious problems triangulating. None of the false matches could consistently triangulate with all the other matches in the group. But I agree that true triangulation requires a larger group and knowing how several of the people in the group are related is always a good start to figuring out the rest of the puzzle!

    • D.
      Again – an interesting study. To paraphrase your conclusion: a presumed false segment in this study remained a false segment in Triangulation – implying that, perhaps, false segments found through Triangulation are truly false segments – they are not from a Common Ancestor. waddayathink? Jim

      • I have been mapping out the matches of my great grandmother’s brother’s daughter, and the first thing I had to do was sort out her maternal side matches from her paternal side matches, as her paternal side all come from my gr gr grandparents. I now have triangulated groups covering 97% of her paternal side chromosomes, and most of her maternal side too. And for most of these groups I know which side is paternal and which is maternal and which branch of the family the TG groups are related to. I figured this out using triangulation or the lack of it. What I began to notice doing this, is occasionally there would be a match that didn’t match either side, and often I knew the tree of this match, that was valid on other segments, had no known connection to either the paternal or maternal triangulated groups in that location. I am reasonably sure these are just random glitch false matches. On the other hand, I have a group of small interconnected matches that are part of a larger group, and these do all match others in the group and also fit with what I know about the trees, in a meaningful way. Leading me to believe triangulation, or the lack of it, is usually a reliable indicator of whether a small match is false or valid. This small study was an attempt to objectively measure this, and seems to confirm this is usually true.

  3. Agree wholeheartedly Jim. It’s an ongoing exercise to sort out those false segments and can be disappointing when you figure out someone you expected to match shares a false segment. But as you say it’s all about expanding the genealogy! Common ancestors at Ancestry can be great clues but many of those more distant ones aren’t always true genetic cousins – but it doesn’t matter unless someone is trying to argue that because they have a genealogical connection they must also have a genetic connection! I continue to work through all my TG segments for my Mum and her 3 siblings – hoping to eventually piece together their missing GGP b1841. The only way to do it when you are working from the unknown that far back! I use GDAT so have all my data loaded in one place. I started with MH as you suggest but then work on the other sites for the same chromosome. Often there are clues lurking in bridge matches which can be very helpful.

    • Thanks for the feedback – you have a great insight into segmentology. I’m not sure how DNA almost hijacked our genealogy hobby to focus only on DNA-linked relatives. We should all take a step back and think about our real objectives. Although most of our Ancestors back 7 or 8 generations will provide part of our DNA, somewhere beyond that level some of our true Ancestors will not be DNA contributors to ourselves. In the meantime, our TG segments cover a lot of our Ancestry, and there is a lot of work in “painting” that full picture. If I were starting over, I’d be using MS Access instead of several MS Excel spreadsheets. I’ve tried GDAT, twice, but am too impatient; and I want to venture beyond what others have programmed.
      I like the term “bridge Matches” – or maybe “bridge Segments” which highlight two TGs which are probably from a husband and wife… Jim

      • Understand what you mean by a bridge segment being from an ‘ancestral couple’. However, when I said ‘bridge match’ I meant a match in a MH TG that is also in a GEDmatch TG etc. Whilst not true Triangulation you can theorise that the TG in the same segment area at the other site would probably be the same group if they were all on the same site. Clues generated by one can help the other!

  4. https://isogg.org/wiki/Cousin_statistics gives stats for 4th cousins. We have about 1500. Enough to make searching for segment sharing 4th cousins worthwhile… if you are interested in such. Triangulation is too time consuming for me. I will accept that a documented 4th cousin might have a shared segment with me that shows up as an IBD match. If it is false, that person is still a 4th cousin… and will likely share a segment with another 4th cousin.

    • Miles – you are a true genealogist at heart. I, too, don’t really care if a new cousin shares DNA with me, or not – it’s the records and family stories they may share that is important. And you are 100% correct – even though we don’t share a DNA segment with about half our true 4Cs, each of them will share with about half of their 4Cs (if they’ve tested). I think that’s why we should be encouraging much more collaboration among cousins. Jim

    • Miles
      Also…. thanks for the link. Two items of interest from ISOGG – (1) the number of cousins increases dramatically. (2) however, with the decreasing amount of shared DNA with more distant cousins, the number of “detectable” increases modestly. The “detectable” estimate is 700 5C; 943 6C; 1320 7C; and 1416 8C – interesting that the number continues to increase. I just checked my Common Ancestor spreadsheet: 1360 5C; 2313 6C; 260 7C; and 19 8C. ThruLines is doing a great job at the 5C and 6C level; and I’ve got a lot of digging to do at 7C and beyond – those cousins are out there, “detectable” among the small segments! [NB: my totals are slightly inflated due to duplicate Matches from different companies; and a few that are iffy and I haven’t culled out yet – but the orders of magnitude are valid. I think the “detectable” estimates listed at ISOGG are low. I’m now curious as to the numbers others have for say 6C at ThruLines (only the cousins at the 5XG grandparent couple level)] Jim

  5. I have shared your MyHeritage post with others. But not all these people are using MH. I have received comments that understanding triangulation is somewhat complicated and that they wish you did a post for 23andMe, FamilyTreeDNA and GEDmatch. The basic concept is the same but the website differences are throwing some people off. So if you have time you could write how you would do it on the other websites.
