How small can we go with triangulation?
We have anecdotal information to indicate “almost all” shared segments above 15cM are Identical By Descent (IBD). There is always a tail on the distribution curve of random events, so we cannot say 100%.
From my experience (mapping over 90% of my chromosomes) I am confident that triangulation can tighten the distribution curve so that “almost all” segments down to 7cM in a Triangulated Group (TG) are IBD. I say this because I find some 7-10cM shared segments which do not triangulate with the TG on either the maternal or paternal sides. Although several segments in each TG triangulate with each other, some shared segments, with the same “address”, do not. To me this is proof positive that these shared segments which do not triangulate must be Identical by State (IBS), meaning, in this case, not-IBD. And the number of such 7-10cM shared segments which don’t triangulate, and are thus IBS, seems to generally agree with the percent IBS in the ISOGG/Wiki: http://www.isogg.org/wiki/Identical_by_descent
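The triangulation test described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Segment` class, the `triangulates` function and the positions are hypothetical, overlap is measured in base pairs rather than cM, and real tools apply minimum-length thresholds that are omitted here.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chrom: str
    start: int  # base-pair start position
    end: int    # base-pair end position


def triangulates(me_a: Segment, me_b: Segment, a_b_segments: list[Segment]) -> bool:
    """True if Matches A and B also match each other where both overlap with me."""
    if me_a.chrom != me_b.chrom:
        return False
    # Region where my segment with A and my segment with B share an "address".
    lo = max(me_a.start, me_b.start)
    hi = min(me_a.end, me_b.end)
    if lo >= hi:
        return False
    # A and B must share a segment with each other that covers part of that region.
    return any(
        s.chrom == me_a.chrom and min(s.end, hi) > max(s.start, lo)
        for s in a_b_segments
    )


# Hypothetical positions, for illustration only.
me_a = Segment("7", 10_000_000, 20_000_000)
me_b = Segment("7", 12_000_000, 22_000_000)
print(triangulates(me_a, me_b, [Segment("7", 11_000_000, 21_000_000)]))  # True
print(triangulates(me_a, me_b, []))  # False: same address, but A and B do not match
```

A `False` result against one TG is not enough to call a segment IBS. As described above, the segment has to fail against the TGs on both the maternal and paternal sides.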
However, the fact that the segments in a TG do triangulate does not, in my mind, provide a 100% guarantee that they are all IBD. The same is true for a random shared segment in the 10-15cM range – most, but not all, are IBD. But in the aggregate, when we have say 20 shared segments in a TG, usually of various cMs, this pretty much defines that area of the chromosome as coming from an ancestor. If 1 or 2 of those triangulated shared segments turn out to be IBS, it’s not harmful in the grand scheme – we are looking for a Common Ancestor (CA) for the TG, and generally find only a few Matches in the TG who have a robust enough Ancestral Tree to help with this goal. We are looking for several such Matches to confirm the same CA. Having a close cousin in the TG increases our confidence in the CA. As our Match list doubles over the next 12 months, so too should the number of Matches in each TG, adding to the preponderance of evidence for both the TG and CA. The key is that several distant cousins all agree on the same CA for the TG – this, too, adds to our confidence level.
My chromosome mapping has resulted in about 350 defined TGs which are adjacent to each other (“heel and toe”) – covering long stretches of each of my 45 chromosomes, with only a few bare spots over 10cM. All new Matches have shared segments which easily “fit” into, and triangulate with, existing TGs – except a small percentage in the 7-10cM range which don’t and are then labeled IBS. This has also added to my confidence that triangulation, down to 7cM shared segments, is a good process. The outline of my chromosome map is coming into sharper focus, with fairly well defined crossover points, and ambiguities are fading away.
With this “success”, I’ve been including shared segments in my analysis down to 500 SNPs and 5cM by adjusting the thresholds at GEDmatch. Almost all of the 5-7cM segments do NOT triangulate, and are thus IBS. A few do triangulate – my guess is somewhere in the 5-10% range. This seems reasonable to me, as there are 5cM shared segments which are IBD. I’m adding these into my TGs, but color coding the small cM value to highlight it. To date I cannot recall any which have resulted in a confirming CA. Most of these 5-7cM IBD segments may well be from an even more distant CA… I also include shared segments down to 5cM from close, known cousins. Most are also IBS, but a few of them, so far, agree with the TG CA, and are probably IBD.
The problem is we don’t have a good test for IBD vs IBS. Some have used results from phased data to develop rough percentages for IBD/IBS ratios vs cMs for shared segments (see http://www.isogg.org/wiki/Identical_by_descent), but I’ve seen no distribution “curve” yet. We don’t have such data for triangulated segments, so we really don’t know what effect triangulation has. Triangulation depends, in part, on using long shared segments. This, coupled with widely separated cousins who got exactly the same long segment, increases the odds that the shared segments are IBD. These two factors (length of segment and a match) combine to increase the probability of IBD. But as we decrease the shared segment size, we reduce that factor. We don’t know, yet, by how much this affects the curve.
Clearly, very small segments (under 5cM) are much easier to match, although most are IBS. Also, many of these very small segments will also triangulate. Triangulation is not a guarantee of IBD. We cannot use triangulation to prove triangulation. In other words, if segment length is a key factor in triangulation, we cannot say that triangulation itself proves smaller shared segments are IBD – it’s a circular argument. We need more corroborating data.
I am hesitant about establishing “rules” for segment sizes for triangulation. We are dealing with distribution curves – with tails. We have not yet drawn these curves, but at some point (as the segment size is reduced), the false positives will occur, even with triangulation. I am confident that triangulation shifts the IBD/IBS-vs-cM distribution curve “to the left”. Triangulation definitely culls out many (most?) IBS segments in the 7-10cM range. Thus the IBD/IBS ratio for a given cM must increase. To what extent is yet to be determined.
Triangulation is a tool. Use judgment when using it.
For me, shared segments below 5cM are uncharted territory for triangulation. I am confident of a Triangulation “guideline” for shared segments down to 7cM. Based on my experience with most segments in the 5-7cM range being IBS, I’m now fairly confident that triangulation also works down to 5cM. At the least, triangulation culls out most of the IBS shared segments. I think most of the few remaining 5-7cM shared segments which triangulate are IBD. For me, it’s at least worth the chance to include them in a TG and enlist the help of those Matches in finding the CA.
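The guideline above can be written down as a small labeling rule. This is a rough sketch, not a rule: the function name `label_segment` and the label strings are made up for illustration, and the "unresolved" label for segments of 10cM or more that do not triangulate is an assumption, since the post does not say how to treat them.

```python
def label_segment(cm: float, triangulates: bool) -> str:
    """Label a shared segment by size and by whether it triangulates with a TG."""
    if cm < 5.0:
        return "uncharted"  # below 5cM: uncharted territory for triangulation
    if triangulates:
        # 5-7cM segments go into the TG but are flagged (color coded) as small.
        return "probable IBD (small, flagged)" if cm < 7.0 else "probable IBD"
    if cm < 10.0:
        return "IBS"  # 5-10cM, fits no TG on either side
    return "unresolved"  # assumption: larger segments are left open


for cm, tri in [(3.5, True), (5.8, True), (8.2, False), (12.0, True)]:
    print(cm, label_segment(cm, tri))
```

The labels are working judgments rather than proof. A triangulating 5-7cM segment is kept in the TG because it is worth the chance, not because it is certain to be IBD.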
13 Segmentology: Small Segments and Triangulation by Jim Bartlett 20150930
I understand that 3cM is too small to be very significant. I am comparing my DNA to what I believe is a distant cousin. I get 8 matching segments, each between 3 and 3.7cM. Is the number of matches significant?
Chr   B37 Start Pos’n   B37 End Pos’n   Centimorgans (cM)   SNPs
2     8,563,326         9,912,089       3.7                 224
6     88,497,286        90,695,246      3.2                 406
9     86,679,251        89,022,298      3.3                 439
11    18,950,092        19,861,308      3.2                 229
12    76,593,029        79,027,420      3.4                 426
15    98,081,701        98,898,952      3.3                 225
18    4,204,048         5,102,747       3.1                 220
22    27,796,846        29,130,458      3.0                 277
Would appreciate your insight.
There are a few things at work here:
1. You are starting with a specific person – you can pick almost any person and find some 3cM segments with them. That doesn’t mean any of those segments came from the Common Ancestor you want.
2. This also doesn’t mean that any of the shared segments are true – the odds are that they are almost all false. To test this, try to build 8 Triangulated Groups – one TG that includes each of your shared segments.
3. Longer story… any two humans share over 99% of their DNA. The key to autosomal DNA testing for genealogy is to use segments that are long enough to ensure they are IBD (Identical By Descent – i.e. they came from a Common Ancestor). Even with a cutoff at 6 or 7cM, those “shared segments” are about 50% false, and the percentage that are true drops a lot with smaller segments. The tests for true/false segments are phasing (or comparing with both parents) or building up a somewhat larger TG around each such segment and verifying that any included 2C, 3C, 4C, 5C are all on the same ancestral line.
4. Since 3-4cM shared segments are generally considered “noise”, I would not put any extra credence in the fact that there are 8 such segments. It would be very unusual to receive 8 segments from a distant cousin anyway.
5. Just as it is very difficult to prove your shared segments, it is also hard to disprove them (except as in #2 above). So some of those segments may be true. And, all other things being equal, they may well be from a different Common Ancestor than the one you are hoping for.
6. Short version – I don’t think the number of segments is significant, and I “know” that this data set is not nearly enough to stand on its own. Sorry.
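One way to see the point is to run the eight segments from the table through a cutoff. A minimal sketch, assuming the lowest thresholds tried in the post (5cM and 500 SNPs); the variable names are illustrative only.

```python
# Segments copied from the table in the question: (chromosome, cM, SNPs).
segments = [
    (2, 3.7, 224), (6, 3.2, 406), (9, 3.3, 439), (11, 3.2, 229),
    (12, 3.4, 426), (15, 3.3, 225), (18, 3.1, 220), (22, 3.0, 277),
]

MIN_CM = 5.0    # lowest cM threshold tried in the post
MIN_SNPS = 500  # lowest SNP threshold tried in the post

# Keep only segments that clear both thresholds.
passing = [s for s in segments if s[1] >= MIN_CM and s[2] >= MIN_SNPS]
total_cm = sum(cm for _, cm, _ in segments)

print(len(passing))        # 0
print(round(total_cm, 1))  # 26.2
```

None of the eight segments clears either threshold. The 26.2cM total is a sum of segments that are each individually suspect, so the total carries no extra weight.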
Thank you very much for your comments. The hunt will continue.
Magda – The pile-up article has already been posted! Hope it helps you.
I am looking forward to the pile-up article too, as I have a crazy one on Chromosome 1!
Jim – As always I enjoyed the article. You and I are a 7.8cM match on GEDmatch (I’m M222141). Normally I wouldn’t ask for such a small segment, but I’d be willing to try to find the CA if you are. I have a solid tree and also a lot of DNA data, though this segment doesn’t match any of my known cousins.
Rich, thanks for the feedback. Our shared segment is IBS – it doesn’t triangulate with my TGs on either side. In fact it’s in a pile-up area for me with many other such segments. I’m working on a blog post about pile-ups now.
Thanks Jim. I look forward to the pile-up article.
June – none of the companies can tell which allele is from which parent – the alleles are listed in alphabetical order. It can be done with a parent’s data and software (or you can check each of the 700,000 SNPs if you have the time and inclination). Read the color codes at the top of the GEDmatch browser. Only siblings will show some areas of FIR (full identical region). Randomly you may get a few markers between Matches which are FIR, but not nearly enough for a shared segment. What we look for are HIRs (half identical regions) – they are long stretches with at least one matching allele for each SNP. In the short run, it’s possible to combine parts from each parent and get a matching (shared) segment, but it doesn’t come from a shared ancestor, so the HIR is IBS or IBC (not IBD). GEDmatch and the other companies (except AncestryDNA) do the analysis for you and report these shared segments over the threshold. Look for the blue bars and the tables of data for HIRs at GEDmatch. There is no info in any company’s browser to indicate if the HIR is IBD or from which parent it comes. That’s where Triangulation is helpful.
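The half-identical versus fully-identical distinction can be sketched as follows. This is a simplified illustration, not how GEDmatch or any company actually implements it: the function names are hypothetical, genotypes are assumed to be unphased two-letter strings such as "AG", and real matching also allows for occasional mismatches and applies cM and SNP thresholds.

```python
def match_state(g1: str, g2: str) -> int:
    """Return 2 for a full match, 1 for a half match, 0 for no match at one SNP."""
    a, b = sorted(g1), sorted(g2)  # allele order is arbitrary in unphased data
    if a == b:
        return 2  # both alleles match (FIR candidate)
    if set(a) & set(b):
        return 1  # at least one allele in common (HIR candidate)
    return 0


def longest_hir_run(person1: list[str], person2: list[str]) -> int:
    """Length, in SNPs, of the longest stretch with at least one matching allele."""
    best = run = 0
    for g1, g2 in zip(person1, person2):
        run = run + 1 if match_state(g1, g2) >= 1 else 0
        best = max(best, run)
    return best


# Made-up genotypes at five consecutive SNPs, for illustration only.
p1 = ["AG", "CC", "TT", "AG", "GG"]
p2 = ["AA", "CT", "CC", "GG", "GG"]
print([match_state(a, b) for a, b in zip(p1, p2)])  # [1, 1, 0, 1, 2]
print(longest_hir_run(p1, p2))  # 2
```

Because the data are unphased, a half match at each SNP can come from either parent’s allele. That is why a run of half matches can be stitched together from both parents without reflecting a shared ancestor.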
As I understand it, the GEDmatch triangulation tool does not pay attention to whether a segment matches on both or just one allele. Looking at the picture that can be requested when you do a one-to-one, I see some long segments that match on both alleles and others where it keeps switching back and forth between one and two along the segment. Is this telling me anything? For example, suppose I do a one-to-one with graphic on each of two kits to me and then a one-to-one with each other. Is there something in the graphic of all of these that tells me something about the reliability of the match?