This post is about the distribution of our DNA segments (as represented by TGs) among our Ancestors. It’s gotten long and convoluted, so I am going to post it in pieces. This is Part 1.
I currently have over 110,000 DNA Matches – they are mostly spread over 3/4 of my Ancestry from Colonial America – mostly Virginia. [1/4 of my Ancestry is from my maternal grandmother whose parents were immigrants in the 1860s – and I get relatively few DNA Matches on these Ancestors]. So I am thinking about the distribution of say 90,000 Matches over three of my grandparents from Colonial Virginia, and the chore of finding Common Ancestors (CAs) linked to DNA segments – represented by Triangulated Groups (TGs). I’m musing about the Big Picture – the distribution of TGs in our Ancestry. From a macro view can we learn something? Can we predict something?
These Matches represent a lot of different DNA segments passing down to me (and to my Matches) from my Ancestors. However, I do know something about these “different DNA segments” – I have 372 TGs – each one representing a segment of my DNA from one parent or the other. These 372 TGs are the equivalent of phased data, and they “cover” all of my DNA – they are arranged, adjacent to each other, from one end of each chromosome to the other. That’s an average of 186 TG segments on one side.
When I look back at my blog post that used simple math to estimate the number of DNA segments we typically get at each generation – on one side – it shows:
Ancestry used to provide Circles back to the 8C level… They currently limit ThruLines to the 6C level, but at one point clearly acknowledged that the shared DNA segments could come from at least the 8C level, and the Circles they presented showed that plenty of folks had Trees with CAs at that level. I have a lot of evidence that indicates atDNA “works” back to at least the 8C level.
At this point in my musing, I’d like to reflect on several “rules”, or guidelines, or pretty valid assumptions about autosomal DNA.
1. Although atDNA is random, the larger the sample size, the closer to the calculated averages we tend to come. This means that we may find some instances of outliers, but with enough data, it averages out. We see this in several instances – a sibling may share a large part of one chromosome with us, but, on average, the total share will be closer to 50% – some chromosomes may be passed to us from a grandparent intact (no grandparent crossover points), but, on average, the total grandparent crossovers will be closer to the reported averages.
2. Science indicates about 34 crossovers occur per generation on each side. In other words, the biology says our DNA is not pureed into mush and then passed down to the next generation in many little pieces. This means at each generation there are relatively few new subdivisions of the parent’s DNA that is passed to a child (average 34), and over several generations many previous segments are not subdivided by crossovers. [NB: in the closest generation – the grandparents’ DNA passed to us by our parents – the crossovers may be closer to 27 from the father and 41 from the mother, but this difference damps out as we go back in generations. For the big picture, I’m using an average of 34.]
3. Shared DNA follows a 1/4 “rule” with each generation (rather than a 1/2 “rule”). We see this in the average of 880cM with a 1C, and 220cM with a 2C, and 55cM with a 3C, etc. This means the shared DNA drops off quickly with more distant generations. It’s also the reason why we only share DNA with about 1/2 of our 4C and maybe 1/10 of our 5C. However, even given this steep drop off, we have so many 6C to 9C that we still get shared DNA with some of them. Our DNA Match lists are filled with folks who are probably distant cousins…
4. We will almost never see Matches who descend from more than two different children of a distant Ancestor, who share the same (overlapping) DNA segment with us. This one is hard to explain, but it starts with the very low probability that a 7C shares a segment of DNA with us that we each got from the same 6xG grandparent – each of us descending from a different child in that family. Now add to that another Match who shares the same DNA segment with both of us and they descend from a third child in that family. You and a sibling will share about 1/4 of a parent’s DNA. If you add another sibling into the mix, the three of you will only share about 1/8 of a parent’s DNA. Considering the 50/50 chance that a segment will get passed along in the next generation, it gets to be very long odds that you and two Matches will share DNA from a 6xG grandparent’s three children. The science says it’s virtually impossible to add a descendant of a fourth child into a Triangulated Group. I discussed such a scenario in Chapter 1 of “Advanced Genetic Genealogy” and concluded that some Matches with the same TG who descended from 3 different children of the CA probably had mistakes in their genealogy. NB: this “rule” does not preclude multiple Matches in a TG from a distant Ancestor – I have several valid examples of 10 to 20 (or more) Matches from a distant Ancestor. Some of them can be closer cousins to each other, and actually descend from the same child of the distant Ancestor. Also, some of them share different TGs. Bottom line, we won’t see Matches with the same TG descending from more than two children of a CA. I’ll use this “rule” later…
5. The DNA doesn’t play favorites. The DNA process of recombination and crossovers does not have any knowledge of a person’s status, religion, wealth, family size, health, surname, endogamy, etc. The process of recombination is carried out at the cell level, without regard to any external factors. So, from the DNA’s perspective, our Ancestors are equivalent. We should expect the same number of segments (TGs) from large or small families; from recent immigrants or Colonial Ancestors; from endogamous family branches or branches without endogamy; etc. However, we can expect a wide range in the number of Matches based on these kinds of factors. But they will not affect the distribution of DNA segments among our Ancestors. The TGs will be distributed randomly – roughly in line with the table above.
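The arithmetic behind rules 3 and 4 can be sketched in a few lines of Python. This is only an illustration of the averages quoted above, not a model of real inheritance: the 880cM starting point for a 1C comes from rule 3, the halving per added sibling comes from rule 4, and the function names are hypothetical ones made up for this sketch.

```python
def expected_shared_cm(cousin_degree, first_cousin_cm=880.0):
    """Average shared cM with an Nth cousin under the 1/4 "rule" (rule 3)."""
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    # Each step out (1C -> 2C -> 3C ...) divides the average by 4.
    return first_cousin_cm / 4 ** (cousin_degree - 1)


def fraction_shared_by_siblings(n_siblings):
    """Average fraction of one parent's DNA that n siblings all hold in common (rule 4)."""
    if n_siblings < 1:
        raise ValueError("n_siblings must be 1 or greater")
    # One child gets 1/2 of the parent's DNA; each added sibling halves the overlap.
    return 0.5 ** n_siblings


for degree in range(1, 9):
    print(f"{degree}C: {expected_shared_cm(degree):.2f} cM on average")

for n in (2, 3, 4):
    print(f"{n} siblings: {fraction_shared_by_siblings(n):.4f} of a parent's DNA in common")
```

The first loop shows how quickly the average falls below any matching threshold, and the second shows why a third (and especially a fourth) child line in the same TG becomes such long odds.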
So back to my musings….
I believe my TGs come, on average, from my Ancestors around the 5C level – 4xG grandparent couples. Clearly there will be some type of bell-shaped distribution curve, and some will be a little closer and some more distant. At that level, all of my 16 paternal 4xG grandparent couples would pass down about 193 TG segments, averaging about 12 per couple (Note the 193 in the chart above). Probably an average of 6 from the 4xG grandfather and 6 from the 4xG grandmother in each couple. These TGs have to come from somewhere – meaning 6 TGs, average, from each of the 5xG grandparent couples – either as intact segments and/or through recombination. Each of our results may vary some; but we should see these orders of magnitude (which are not large). The DNA segments contributed by all the 5xG grandparents on one side will add up to a full set of autosomal chromosomes on that side.
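The per-couple averages in the paragraph above can be checked with a small Python sketch. The 34 crossovers per generation and the 193 segments come from this post; the starting count of 23 chromosomes is an assumption, picked only because 23 + 5 x 34 reproduces the 193 figure at the 4xG level. The function names are hypothetical.

```python
CROSSOVERS_PER_GENERATION = 34  # average used in this post
STARTING_CHROMOSOMES = 23       # ASSUMPTION: chosen so the 4xG level gives 193


def segments_at_n_great(n_great):
    """Estimated segments on one side at the NxG grandparent level.

    ASSUMPTION: n_great + 1 rounds of crossovers separate that level
    from the intact chromosomes a parent starts with.
    """
    return STARTING_CHROMOSOMES + CROSSOVERS_PER_GENERATION * (n_great + 1)


def couples_on_one_side(n_great):
    """Number of NxG grandparent couples on the paternal (or maternal) side."""
    return 2 ** n_great


tgs = segments_at_n_great(4)                    # segments at the 4xG level
per_4xg_couple = tgs / couples_on_one_side(4)   # spread over 16 couples
per_5xg_couple = tgs / couples_on_one_side(5)   # same TGs traced back to 32 couples

print(f"Segments on one side at the 4xG level: {tgs}")
print(f"Average per 4xG grandparent couple: {per_4xg_couple:.1f}")
print(f"Average per 5xG grandparent couple: {per_5xg_couple:.1f}")
```

Under these assumptions the averages come out to roughly 12 per 4xG couple and 6 per 5xG couple, the same orders of magnitude described above.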
This is the end of Part 1. In Part 2 I’ll muse more about where TGs are formed and their journey from the Ancestors down to us and our Matches. Part 3 will look at conclusions (what can we learn from all of this), and propose a new spreadsheet to track TGs through other children of our Ancestors (what we can do!).
[15H] Segment-ology: Distribution of TGs – Part 1 by Jim Bartlett 20211001