This article will describe the various attributes of a Triangulated Group (TG). Some have noted that I use the term TG to describe both a Group of Matches as well as an ancestral segment. Well… yes, I do. Read on.
Once established, each TG has certain attributes which can be used to describe and/or define the TG:
A1. A TG is a group of shared segments from Matches. We often think about the Matches in a TG. They have a Common Ancestor. They can be contacted and encouraged to collaborate on finding the common ancestry. So in this sense a TG is a group of Matches. However, note that any of these Matches could, potentially, also share the same or a different ancestor with you on another segment (in a different TG).
A2. A TG occupies a specific physical space on a chromosome. It is, in effect, a segment in its own right – a segment from one of your ancestors. The TG is on one chromosome with a start location and an end location. These start and end locations are determined by the matching, overlapping, shared segments (from Matches) within the TG. Please review: Anatomy of a TG.
A3. As a segment, a TG has a specific string of SNPs on one chromosome. These would be the same SNPs in a segment on your chromosome, and on the segment each one of your Matches shares with you in the TG. All the SNPs would be the same. The SNPs have to be the same for IBD segments to match.
A4. A TG is the equivalent of phased data. The TG represents an ancestral segment on a chromosome. All of the SNP values (alleles) are on one chromosome, and are the same SNP values you got from your mother or father (depending on the chromosome). What you have in a TG (segment) is exactly what you would have with phased data.
- We don’t see the actual ACGT values in a TG that we would get with true phasing (with a child-parents trio), but they are the same values in the TG. The TG segment represents part of one of your chromosomes – it must have the same ACGT values that your parent passed to you on that chromosome.
- We can treat the TG as phased data, and any other shared IBD segment which Triangulates with the TG will have the same SNP values.
- This is true, even if you have formed a TG (with matching, overlaying, shared segments from Matches) and do not have the genealogy to determine which side it is on. You can be confident that the TG exactly matches the DNA on one of your two chromosomes. In this case the TG is not entirely equivalent to a true phased segment. But, if you had the phased information, you’d already know which side the TG was on. And very often you can determine the side of a TG by imputation – by determining which side it’s not on; or by the admixture of the segment.
A5. Technically, each TG has a cM value. However, it usually takes a lookup table to determine the cM value for a segment on a given chromosome, between two points. This is what the testing companies and GEDmatch do for each shared segment they report. It’s a lot of work for genetic genealogists – and, in general, our TGs will morph over time as new shared segments are added, and the TG cMs will need to be adjusted. However, we can fairly easily make rough estimates of the TG cMs, which are plenty good enough for genealogy:
- Subtract the TG start location from the end location to get the number of base pairs (bps), divide by 1,000,000 to get Mbp, which is roughly equal to the number of cMs.
- Or, eyeball the cMs of the larger shared segments in a TG and extrapolate to the full TG (if you’re lucky, you may have a shared segment which nearly fills the TG).
- Note that the cM is a fuzzy value anyway – it’s empirically derived (an average of many observations), and it’s an average of the female and male averages. So don’t go to too much trouble, and use round numbers. AND note that there is a wide range of possibilities when trying to use cMs to determine approximate cousinships. See Blaine Bettinger’s chart for the ranges of cMs vs cousinships.
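The first rough estimate above is simple arithmetic, so here is a minimal Python sketch of it. The function name `rough_cm` is only for illustration, and the result rests on the same loose assumption as the text: 1 Mbp is treated as roughly 1 cM, which is good enough for genealogy but is not what a lookup table would return.

```python
def rough_cm(start_bp: int, end_bp: int) -> int:
    """Rough TG size in cM, assuming 1 Mbp is roughly 1 cM.

    Returns a round number on purpose, since the cM is a fuzzy value anyway.
    """
    if end_bp < start_bp:
        raise ValueError("end location must not be less than start location")
    mbp = (end_bp - start_bp) / 1_000_000
    return round(mbp)


# A TG running from 27.2 Mbp to 38.4 Mbp is 11.2 Mbp long, so about 11 cM.
print(rough_cm(27_200_000, 38_400_000))
```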
A6. TGs have fuzzy ends. There is no “signpost” in our DNA to identify crossover points, or where a shared segment starts or ends. The company algorithms estimate shared segments by looking for areas of DNA that are identical, and they then continue until the DNA is not identical. This results in some, usually small, amount of overrun (a longer segment than is actually there from a Common Ancestor). So my convention is to use the start location of the first shared segment in a TG (the one with the lowest start in bp). This is the start location of the TG. The end location is determined several ways:
- If there is no overlap with the next TG on that chromosome, then use the largest end location of all the shared segments in the TG – the shared segment which runs the farthest. This often is not the last shared segment in a sorted spreadsheet – you need to look at them all.
- If there is a small, fuzzy overlap of 1-2Mbp with the next TG, I use the start location of the next TG, and accept the fact that there is a fuzzy overlap. We don’t need to be real precise for genealogy. Each TG represents a large block of DNA from an Ancestor – the fact that the edges of the block may be fuzzy should not obscure the big picture: the main TG segment came from an Ancestor!
- If there is a large overlap with the next obvious TG (almost always from a large shared segment with a close cousin, which probably spans more than one TG), I start a new TG at the obvious point dictated by the next group of shared segments, and use the same point as the end location of the first TG. This involves judgment – there is no hard rule – and the data will usually indicate where to start a new TG. Just accept that close cousins may share large segments which span more than one TG.
- If the shared segments in the TG all end a “few” Mbp before the next TG starts, I will just round up, and use the start location of the next TG as the end location. Again, use judgment.
- If there appears to be a large enough gap between two known TGs, I create a “dummy” TG to fill the gap. And then I keep looking for some Matches with shared segments to fill that gap. At this date my dummy TGs are about 7% of my DNA.
- Using these conventions will result in TGs that are adjacent to each other over all your chromosomes. Even with some “dummy” TGs, this process will organize all of your IBD Match segments into TGs over all of your DNA. When done, this is a happy day! You can then focus on TGs that should link to specific ancestors. And all new Match segments will generally fit easily into existing TGs.
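The conventions above can be sketched in Python. This is a rough illustration only, not a substitute for judgment: it assumes the shared segments have already been sorted into TGs by hand, and the name `tg_bounds` and the 3 Mbp cutoff for a “few” Mbp are my own placeholders, not fixed rules.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start_bp, end_bp) of one shared segment


def tg_bounds(groups: List[List[Segment]],
              max_small_gap_bp: int = 3_000_000) -> List[Tuple[int, int, bool]]:
    """Return (start_bp, end_bp, is_dummy) for each TG on one chromosome.

    groups: shared segments already grouped into TGs (the grouping itself
        takes judgment and is not attempted here).
    max_small_gap_bp: hypothetical cutoff for a "few" Mbp.
    """
    # Raw bounds: lowest start and farthest-running end in each group.
    raw = sorted((min(s for s, _ in g), max(e for _, e in g)) for g in groups)
    result = []
    for i, (start, end) in enumerate(raw):
        if i + 1 == len(raw):
            # Last TG on the chromosome: keep the farthest end location.
            result.append((start, end, False))
            break
        next_start = raw[i + 1][0]
        gap = next_start - end  # zero or negative means the TGs overlap
        if gap <= max_small_gap_bp:
            # Overlap or small gap: end this TG where the next one starts.
            result.append((start, next_start, False))
        else:
            # Large gap: keep the real end and fill the gap with a dummy TG.
            result.append((start, end, False))
            result.append((end, next_start, True))
    return result


groups = [
    [(27_200_000, 36_000_000), (28_000_000, 38_900_000)],  # overruns the next TG
    [(38_400_000, 52_600_000)],
    [(60_000_000, 70_000_000)],
]
for start, end, is_dummy in tg_bounds(groups):
    print(start, end, "dummy" if is_dummy else "")
```

With these made-up segments, the first TG ends at 38.4 Mbp (the start of the next TG) and a dummy TG fills 52.6 to 60.0 Mbp, so the TGs come out adjacent to each other.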
A7. As you work this process of forming TGs and assigning them to sides using genealogy, you are creating a chromosome map. As new Matches are posted, their shared segments may adjust the start and/or end locations of the TGs. When you get lucky, you’ll find a new shared segment that fills a “dummy” TG.
A8. Naming TGs. I label each TG (and all the shared segments in it) with a short code – like 07C25. The 07 means Chr 7; C means the TG starts within the third group of 10Mbp – in this case between 20-30Mbp; 25 means this TG is on my father’s (2) mother’s (5) side – using Ahnentafel numbers. If I were starting over I’d use 07.027PM to indicate Chr 7; start 27Mbp; on Paternal, and his Maternal ancestry. Note: before you do any assigning, the label might be 07.027, then add P or M when determined. I usually add this code in the subject line of emails and messages – it just helps me keep organized.
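For anyone keeping TGs in a spreadsheet or a program, here is a small Python sketch of the newer 07.027PM style of label. The function name `tg_label` is hypothetical, it assumes numbered chromosomes only, and it assumes the start location is truncated (not rounded) to whole Mbp.

```python
def tg_label(chromosome: int, start_bp: int, sides: str = "") -> str:
    """Build a label like 07.027PM: 2-digit Chr, 3-digit start Mbp, side letters.

    sides: P/M letters added as the sides are determined ("" if unknown).
    Assumption: the start is truncated to whole Mbp, not rounded.
    """
    if any(letter not in "PM" for letter in sides):
        raise ValueError("sides may contain only the letters P and M")
    start_mbp = start_bp // 1_000_000
    return f"{chromosome:02d}.{start_mbp:03d}{sides}"


print(tg_label(7, 27_200_000))        # 07.027 (side not yet assigned)
print(tg_label(7, 27_200_000, "PM"))  # 07.027PM
```

Because the chromosome and Mbp are zero-padded, these labels sort in chromosome order in a spreadsheet.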
A9. Also note that each TG has many genes within it. Each of your 22,000 or so genes will have a specific location on your DNA. If you become curious about any particular gene, you can look up where it is located (chromosome and location). You will have two of each gene, one on your maternal chromosome and one on your paternal chromosome. You can then look at your chromosome map of TGs and see which maternal TG and which paternal TG it’s in. If you’ve determined a Common Ancestor for those TGs, you’ll know which ancestor passed that gene down to you. You can also add a gene to your spreadsheet, so that it sorts with all the segments and TGs. Examples: Short Sleepers gene BHLHE41 would be 12.026M and 12.026P (very close to LRRK2 (Parkinson’s) at 12.031). Also, my Neanderthal segment is 10.130 (I don’t know which side).
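Looking up a gene in a chromosome map can be sketched in a few lines of Python. The TG start and end locations below are made up for illustration (only the labels come from my examples), and `tgs_containing` is a hypothetical helper name. The gene position is the start of the chr12 range in the UCSC Genome Browser (hg19) link given in the comments below.

```python
from typing import List, NamedTuple


class TG(NamedTuple):
    label: str
    chromosome: int
    start_bp: int
    end_bp: int


def tgs_containing(tgs: List[TG], chromosome: int, position_bp: int) -> List[str]:
    """Return the labels of every TG that covers the given location."""
    return [t.label for t in tgs
            if t.chromosome == chromosome and t.start_bp <= position_bp < t.end_bp]


# Hypothetical chromosome map: these start and end locations are invented.
chromosome_map = [
    TG("12.026M", 12, 26_000_000, 31_000_000),
    TG("12.026P", 12, 26_100_000, 33_000_000),
    TG("12.031M", 12, 31_000_000, 45_000_000),
]

# BHLHE41 location taken from the hg19 range chr12:26272959-26278060.
print(tgs_containing(chromosome_map, 12, 26_272_959))
```

A gene normally lands in one maternal TG and one paternal TG, which is why two labels come back here.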
Have fun with your TGs!
15B Segment-ology: The Attributes of a TG by Jim Bartlett 210919
I would be interested in finding a list of the human genes and their location on each chromosome. Do you know of a website that might have this information ? Thank you.
Rich – Try this one: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr12%3A26272959%2D26278060&hgsid=573865629_7RtaaBAAPAVj0qdO2EUmQ7V8Gedq
I’ve put together a new version of Double Match Triangulator that now displays Double Match and Triangulation Groups. I used many of your ideas and techniques to do it. Please try it out and let me know what you think and where I might be able to improve it: http://www.beholdgenealogy.com/blog/?p=1807
Great article Jim.
… actually, in Step 3, you do go back to catch that earlier one at 31.9. But then you are marking the end of the previous paternal segment 07.027P to be at 31.9 rather than at 38.4. This is tricky stuff. Hopefully you can understand what I’m confused about.
Louis – I’ve been at this for 6 years – and, yes, I can understand the confusion. Developing a chromosome map is often an iterative process. However, a program that automatically makes all the pairwise comparisons in a spreadsheet, would be awesome.
I’m going to program your algorithm into my Double Match Triangulator and use your “07.027” labeling to identify the Triangulation groups. Unfortunately a program cannot use “judgement”, so I’ll try to do something appropriate.
There’s one part I don’t get though. Doesn’t everyone have TGs from each half of the chromosome that overlap? I don’t see that your 6 steps above will handle that.
e.g. 07.027P from 27.2 to 38.4, 07.038P from 38.4 to 52.6 from the paternal side,
and 07.031M from 31.9 to 41.3 and 07.041M from 41.3 to 58.5 on the maternal side.
Your algorithm would first identify 07.027P and then we’d be at 38.4. You can now easily catch the 07.038P, but don’t you first have to go back to 31.9 and catch the overlapping 07.031M maternal TG?
Louis, you can look at it that way, zigzagging back and forth between maternal and paternal chromosomes. But in practice, as you work through all the shared segments on one chromosome number, there will be natural break points, which tell you where one TG ends and another starts. But if you don’t get enough shared segments in one area, on one side or the other, you’ll have a gap. Also, in practice, we tend to determine the largest TGs first – large in length and large in numbers of Matches, and ones with known Matches in them, so we can assign them to the appropriate side (maternal or paternal). I’m excited that you are going to program this, as the program will slog through all the Matches quickly. Assigning sides requires genealogy, so a human may need to designate the known maternal and paternal Matches. It’s fairly easy if both parents have been tested; it gets harder when you have to rely on more distant relatives. Parents give you 100% “coverage”; other relatives, somewhat less – requiring some amount of imputation judgment. Still, just doing the grunt work of all the comparisons to form TGs would be great!
Louis, if you just start at the beginning of a chromosome and do each pairwise comparison, the TGs that can be formed should fall into place. Each shared segment will match another, or not. The “nots” are either standalone segments (with no other sufficiently overlapping segments to compare to), or IBC (if there are sufficiently overlapping segments on both sides and there is no match on either side – these segments are usually smaller than 10cM). Even if a program cannot do 100% of the shared segments, whatever it can do would be of great benefit. I would run that program on my data about monthly as a Quality Control check – making sure I haven’t introduced any errors in my thousands of rows of segments. You might want a user-defined threshold (of segment size) that can be adjusted from, say, 15cM down to 5cM.
Thanks, Jim. I’ll give it a try.
I found one of your earlier posts: https://segmentology.org/2015/05/11/how-to-triangulate/ which answers my question. It describes nicely how to identify the chromosome half TG A or TG B to which each triangulated segment belongs. I am pretty sure I can get DMT to do this and provide a reliable way to get TGs using FamilyTreeDNA data. I’ll let you try it once I get it working.
Great, Louis, I already know your DMT program is accurate for Triangulation – this would take it a step farther.
You pulled the concepts together well. Obviously, many of these concepts would apply to non-TG matches, but I guess your point is that matches should be in TGs.
I hadn’t really thought of TGs in terms of genes before. That was also interesting. Did you mean to add a Chromosome # to your example at A9?
It would be interesting to see your TG Chromosome Map.
Joel – I think I did have the Chr # for each example – it’s the two digits before the period. The three digits after the period are the Mbp for the start location.
I guess I didn’t read your naming convention closely.