This article will describe the various attributes of a Triangulated Group (TG). Some have noted that I use the term TG to describe both a Group of Matches as well as an ancestral segment. Well… yes, I do. Read on.
Once established, each TG has certain attributes which can be used to describe and/or define the TG:
A1. A TG is a group of shared segments from Matches. We often think about the Matches in a TG. They have a Common Ancestor. They can be contacted and encouraged to collaborate on finding the common ancestry. So in this sense a TG is a group of Matches. However, note that any of these Matches could, potentially, also share the same or a different ancestor with you on another segment (in a different TG).
A2. A TG occupies a specific physical space on a chromosome. It is, in effect, a segment in its own right – a segment from one of your ancestors. The TG is on one chromosome with a start location and an end location. These start and end locations are determined by the matching, overlapping, shared segments (from Matches) within the TG. Please review: Anatomy of a TG.
A3. As a segment, a TG has a specific string of SNPs on one chromosome. These would be the same SNPs in a segment on your chromosome, and on the segment each one of your Matches shares with you in the TG. All the SNPs would be the same. The SNPs have to be the same for IBD segments to match.
A4. A TG is the equivalent of phased data. The TG represents an ancestral segment on a chromosome. All of the SNP values (alleles) are on one chromosome, and are the same SNP values you got from your mother or father (depending on the chromosome). What you have in a TG (segment) is exactly what you would have with phased data.
- We don’t see the actual ACGT values in a TG that we would get with true phasing (with a child-parents trio), but they are the same values in the TG. The TG segment represents part of one of your chromosomes – it must have the same ACGT values that your parent passed to you on that chromosome.
- We can treat the TG as phased data, and any other shared IBD segment which Triangulates with the TG will have the same SNP values.
- This is true, even if you have formed a TG (with matching, overlaying, shared segments from Matches) and do not have the genealogy to determine which side it is on. You can be confident that the TG exactly matches the DNA on one of your two chromosomes. In this case the TG is not entirely equivalent to a true phased segment. But, if you had the phased information, you’d already know which side the TG was on. And very often you can determine the side of a TG by elimination – by determining which side it’s not on; or by the admixture of the segment.
A5. Technically, each TG has a cM value. However, it usually takes a lookup table to determine the cM value for a segment on a given chromosome, between two points. This is what the testing companies and GEDmatch do for each shared segment they report. It’s a lot of work for genetic genealogists – and, in general, our TGs will morph over time as new shared segments are added, and the TG cMs will need to be adjusted. However, we can fairly easily make rough estimates of the TG cMs, which are plenty good enough for genealogy:
- Subtract the TG start location from the end location to get the number of base pairs (bps), divide by 1,000,000 to get Mbp, which is roughly equal to the number of cMs.
- Or, eyeball the cMs of the larger shared segments in a TG and extrapolate to the full TG (if you’re lucky, you may have a shared segment which nearly fills the TG).
- Note that the cM is a fuzzy value anyway – it’s empirically derived (an average of many observations), and it’s an average of the female and male averages. So don’t go to too much trouble, and use round numbers. AND note that there is a wide range of possibilities when trying to use cMs to determine approximate cousinships. See Blaine Bettinger’s chart for the ranges of cMs vs cousinships.
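The two rough estimates above can be sketched in a few lines of Python. This is only an illustration: the function names `rough_cm` and `extrapolate_cm` are made up for this example, and the 1 Mbp ≈ 1 cM rule is the rough approximation described above, not a real genetic-map lookup.

```python
def rough_cm(start_bp: int, end_bp: int) -> float:
    """Rough TG size, treating 1 Mbp as about 1 cM."""
    if end_bp < start_bp:
        raise ValueError("end_bp must not be smaller than start_bp")
    return (end_bp - start_bp) / 1_000_000


def extrapolate_cm(seg_start: int, seg_end: int, seg_cm: float,
                   tg_start: int, tg_end: int) -> float:
    """Scale the reported cM of one large shared segment up to the full TG,
    assuming cM grows in proportion to length in base pairs."""
    seg_len = seg_end - seg_start
    if seg_len <= 0:
        raise ValueError("segment must have a positive length")
    return seg_cm * (tg_end - tg_start) / seg_len


# Hypothetical TG running from 27.0 Mbp to 45.5 Mbp
print(round(rough_cm(27_000_000, 45_500_000)))
```

Rounding the result keeps with the advice to use round numbers, since the cM itself is a fuzzy value.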
A6. TGs have fuzzy ends. There is no “signpost” in our DNA to identify crossover points, or where a shared segment starts or ends. The company algorithms estimate shared segments by looking for areas of DNA that are identical, and then continuing until the DNA is no longer identical. This results in some, usually small, amount of overrun (a longer segment than is actually there from a Common Ancestor). So my convention is to use the start location of the first shared segment in a TG (the one with the lowest start in bp). This is the start location of the TG. The end location is determined several ways:
- If there is no overlap with the next TG on that chromosome, then use the largest end location of all the shared segments in the TG – the shared segment which runs the farthest. This often is not the last shared segment in a sorted spreadsheet – you need to look at them all.
- If there is a small, fuzzy overlap of 1-2Mbp with the next TG, I use the start location of the next TG, and accept the fact that there is a fuzzy overlap. We don’t need to be real precise for genealogy. Each TG represents a large block of DNA from an Ancestor – the fact that the edges of the block may be fuzzy should not obscure the big picture: the main TG segment came from an Ancestor!
- If there is a large overlap with the next obvious TG (almost always from a large shared segment with a close cousin, which probably spans more than one TG), I start a new TG at the obvious point dictated by the next group of shared segments, and use the same point as the end location of the first TG. This involves judgment – there is no hard rule – and the data will usually indicate where to start a new TG. Just accept that close cousins may share large segments which span more than one TG.
- If the shared segments in the TG all end a “few” Mbp before the next TG starts, I will just round up, and use the start location of the next TG as the end location. Again, use judgment.
- If there appears to be a large enough gap between two known TGs, I create a “dummy” TG to fill the gap. And then I keep looking for some Matches with shared segments to fill that gap. At this date my dummy TGs are about 7% of my DNA.
- Using these conventions will result in TGs that are adjacent to each other over all your chromosomes. Even with some “dummy” TGs, this process will organize all of your IBD Match segments into TGs over all of your DNA. When done, this is a happy day! You can then focus on TGs that should link to specific ancestors. And all new Match segments will generally fit easily into existing TGs.
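The start and end conventions above could be sketched roughly as follows, for one chromosome. This assumes the shared segments have already been sorted into TGs by hand (the judgment calls stay with you). The `TG` class, the `build_tgs` function and the 3 Mbp cut-off for a “few” Mbp are all assumptions made for this illustration, not fixed rules.

```python
from dataclasses import dataclass


@dataclass
class TG:
    start: int           # start location in bp
    end: int             # end location in bp
    dummy: bool = False  # True for a gap-filling "dummy" TG


def build_tgs(groups, round_up_bp=3_000_000):
    """groups: one list of (start_bp, end_bp) shared segments per TG,
    all on the same chromosome. round_up_bp is an assumed threshold."""
    # TG start = lowest segment start; raw end = farthest-running segment end
    raw = sorted((min(s for s, _ in g), max(e for _, e in g)) for g in groups)
    tgs = []
    for i, (start, end) in enumerate(raw):
        if i + 1 == len(raw):
            tgs.append(TG(start, end))         # last TG: keep the largest end
            continue
        next_start = raw[i + 1][0]
        if next_start - end <= round_up_bp:
            # overlap (negative gap) or a small gap: end at the next TG's start
            tgs.append(TG(start, next_start))
        else:
            # large gap: keep the largest end and fill the gap with a dummy TG
            tgs.append(TG(start, end))
            tgs.append(TG(end, next_start, dummy=True))
    return tgs


groups = [
    [(10_000_000, 18_000_000), (11_000_000, 21_000_000), (12_000_000, 19_000_000)],
    [(20_000_000, 30_000_000), (22_000_000, 28_000_000)],
    [(40_000_000, 50_000_000)],
]
for tg in build_tgs(groups):
    print(tg)
```

With this made-up data the first TG is trimmed to end where the second starts, and a dummy TG fills the 30 to 40 Mbp gap, so the resulting TGs sit adjacent to each other.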
A7. As you work this process of forming TGs and assigning them to sides using genealogy, you are creating a chromosome map. As new Matches are posted, their shared segments may adjust the start and/or end locations of the TGs. When you get lucky, you’ll find a new shared segment that fills a “dummy” TG.
A8. Naming TGs. I label each TG (and all the shared segments in it) with a short code – like 07C25. The 07 means Chr 7; C means the TG starts within the third group of 10Mbp – in this case between 20-30Mbp; 25 means this TG is on my father’s (2), mother’s (5) side – using Ahnentafel numbers. If I were starting over I’d use 07.027PM to indicate Chr 7; start 27Mbp; on Paternal, and his Maternal ancestry. Note: before you do any assigning, the label might be 07.027, then add P or M when determined. I usually add this code in the subject line of emails and messages – it just helps me keep organized.
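If you keep your segments in a spreadsheet or script, both label styles are easy to generate. Here is a minimal sketch; the function names are hypothetical, it handles numbered chromosomes only, and it assumes the start location is rounded down to whole Mbp (the text above does not say whether to round or truncate).

```python
def new_label(chrom: int, start_bp: int, sides: str = "") -> str:
    """Newer style, e.g. 07.027PM: chromosome, start in Mbp, then P/M letters.
    Leave sides empty until the side has been determined."""
    return f"{chrom:02d}.{start_bp // 1_000_000:03d}{sides}"


def old_label(chrom: int, start_bp: int, ahnentafel: str = "") -> str:
    """Older style, e.g. 07C25: the letter marks the 10 Mbp group
    (A = 0-10 Mbp, B = 10-20 Mbp, C = 20-30 Mbp, ...)."""
    letter = chr(ord("A") + start_bp // 10_000_000)
    return f"{chrom:02d}{letter}{ahnentafel}"


print(new_label(7, 27_400_000, "PM"))
print(old_label(7, 27_400_000, "25"))
```

Zero-padding the chromosome and the Mbp value means the labels sort in chromosome order in a plain text sort.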
A9. Also note that each TG has many genes within it. Each of your 22,000 or so genes will have a specific location on your DNA. If you become curious about any particular gene, you can look up where it is located (chromosome and location). You will have two of each gene, one on your maternal chromosome and one on your paternal chromosome. You can then look at your chromosome map of TGs and see which maternal TG and which paternal TG it’s in. If you’ve determined a Common Ancestor for those TGs, you’ll know which ancestor passed that gene down to you. You can also add a gene to your spreadsheet, so that it sorts with all the segments and TGs. Examples: Short Sleepers gene BHLHE41 would be 12.026M and 12.026P (very close to LRRK2 (Parkinson’s) at 12.031). Also, my Neanderthal segment is 10.130 (I don’t know which side).
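Looking up which TGs hold a gene is a simple range check against your chromosome map. The sketch below uses a made-up map for Chr 12; the TG labels and boundaries are invented for illustration, and the gene position of about 26 Mbp is just read off the 12.026 label above.

```python
from typing import NamedTuple


class TG(NamedTuple):
    label: str     # e.g. "12.018M"
    chrom: int
    start_bp: int
    end_bp: int


def tgs_containing(tg_map, chrom, pos_bp):
    """Return the labels of all TGs on this chromosome that cover pos_bp."""
    return [tg.label for tg in tg_map
            if tg.chrom == chrom and tg.start_bp <= pos_bp < tg.end_bp]


# Hypothetical chromosome map entries (maternal and paternal TGs on Chr 12)
tg_map = [
    TG("12.018M", 12, 18_000_000, 33_000_000),
    TG("12.022P", 12, 22_000_000, 29_000_000),
    TG("12.033M", 12, 33_000_000, 47_000_000),
]

# A gene at roughly 26 Mbp on Chr 12
print(tgs_containing(tg_map, 12, 26_000_000))
```

With a fully mapped chromosome you would expect one maternal and one paternal TG back for any location.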
Have fun with your TGs!
15B Segment-ology: The Attributes of a TG by Jim Bartlett 210919