Anatomy of a TG

This is another blog post that gives you some idea of what to expect with autosomal DNA and your segments. In this post we’ll look at the formation of a TG. We’ll walk through the steps:

  1. Start with overlapping segment data
  2. Simplify the data by rounding
  3. Sort by Chromosome and Start location
  4. Then Triangulate the segments (no genealogy required)
  5. Highlight one of the two resulting TGs
  6. Show this data graphically – like you’d see in a chromosome browser
  7. Overlay the total TG
  8. Then use our imagination and x-ray vision (or GEDmatch) to show what the ancestral segments of the Matches might look like
  9. Do some analysis…

Ready?

Figure 1. Some overlapping segment data

10B Figure 1

Letters represent Match names – data is taken from my spreadsheet.

Figure 2 – divide the Start/End locations by 1000000

10B Figure 2

It’s much easier to read the Start/End locations in Mbp; and it’s just as accurate for genealogy.
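As a minimal sketch of this step, the conversion can be written in a few lines of Python. The segment values below and the choice of one decimal place are assumptions for illustration; they are not taken from my spreadsheet.

```python
def to_mbp(position_bp, digits=1):
    """Convert a base-pair position to Mbp, rounded for readability."""
    return round(position_bp / 1_000_000, digits)

# Hypothetical segment: (match, chromosome, start_bp, end_bp)
segment = ("A", 16, 48_213_507, 61_870_044)
match, chrom, start_bp, end_bp = segment

# Same segment with Start/End expressed in Mbp
rounded = (match, chrom, to_mbp(start_bp), to_mbp(end_bp))
print(rounded)
```

Rounding to a whole Mbp (digits=0) would work just as well for most genealogy purposes.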

Figure 3 – the data is sorted by Chromosome and Start location

10B Figure 3

This makes it much easier to see overlapping segments.
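If you keep your segments in a script rather than a spreadsheet, the same sort might look like the sketch below. The three segments are made up for illustration, and the tuple layout (match, chromosome, start, end) is an assumption.

```python
# Hypothetical segments: (match, chromosome, start_mbp, end_mbp)
segments = [
    ("B", 16, 52.1, 60.3),
    ("A", 16, 48.2, 61.9),
    ("C", 3, 10.0, 25.4),
]

# Sort by Chromosome first, then by Start location
sorted_segments = sorted(segments, key=lambda s: (s[1], s[2]))

for seg in sorted_segments:
    print(seg)
```

After sorting, segments that overlap on the same chromosome sit next to each other in the list.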

Figure 4 – this shows the results of Triangulation into groups 16A and 16B.

10B Figure 4

No genealogy was involved in this process – it’s purely a matter of comparing segments at 23andMe or GEDmatch; or looking for ICW Matches in this list and each ICW list at FTDNA. Again, this is real data from my spreadsheet. Often there is more mixture between the two TGs, but I hope you get the idea.
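The grouping idea can be sketched in Python as below. This is a simplified illustration, not the procedure any testing company uses: it assumes you have already recorded which pairs of Matches match each other on this stretch (the hypothetical `matching_pairs` input), and it simply merges Matches whose segments overlap and who match each other.

```python
from itertools import combinations

def overlaps(a, b):
    """True if two (start, end) ranges on the same chromosome overlap."""
    return a[0] < b[1] and b[0] < a[1]

def triangulate(segments, matching_pairs):
    """Group Matches whose segments overlap AND who match each other.

    segments: dict of match name -> (start_mbp, end_mbp), one chromosome.
    matching_pairs: set of frozensets of names known to match each other
    in this area (hypothetical input from one-to-one comparisons or ICW).
    """
    group_of = {name: {name} for name in segments}
    for a, b in combinations(segments, 2):
        if overlaps(segments[a], segments[b]) and frozenset((a, b)) in matching_pairs:
            merged = group_of[a] | group_of[b]
            for name in merged:
                group_of[name] = merged
    # Collect the distinct groups, sorted for stable output
    unique = {frozenset(g) for g in group_of.values()}
    return sorted(sorted(g) for g in unique)

# Made-up data: four overlapping segments on one chromosome
segments = {"A": (48.2, 61.9), "B": (50.0, 58.0), "C": (49.0, 60.0), "D": (51.0, 63.0)}
matching_pairs = {frozenset(("A", "B")), frozenset(("C", "D"))}

print(triangulate(segments, matching_pairs))
```

In this made-up example all four segments overlap, but only A/B and C/D match each other, so two groups result, much like the two TGs on one stretch of a chromosome.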

Figure 5 – Here is only TG 16B data

10B Figure 5

It’s still arranged by Chromosome and Start location.

Figure 6 – Same data and the shared segments displayed graphically

10B Figure 6

This is how you’d see the data in a chromosome browser. Note the top 11 bars will all match each other. The bottom bars will usually all match each other too, and they’ll usually also match the top 8 bars, but maybe R and N will not match at the 7cM level at GEDmatch. Just lower the level to 500 SNPs and 5cM and you’ll find there is enough for a Match. Let’s see what the TG for this data looks like…

Figure 7 – Now the fun begins…

10B Figure 7

Usually the TG is pretty clear cut, but I’ve intentionally selected one with two kinds of ambiguity. In almost all cases the ends of the segments are fuzzy. You can read about Fuzzy Data in my blog post here.

Judgment is needed at this point. I’ve shown the “guaranteed” TG in red, with orange tips where the data looks fuzzy. I want to emphasize that this is NOT a problem for genealogy – the TG (wherever the true crossover points are that define the TG Start and End locations – somewhere in the orange areas) represents an ancestral segment from one of your ancestors. The fuzzy ends are not an issue. Your Matches will share a Common Ancestor with you – and that’s where the focus should be. The crossovers defining the real TG will be somewhere in the fuzzy orange tips.

You can also see that this data indicates a probable more distant crossover point – around, say, 51Mbp. In this case the top 8 Matches and Match S are probably closer cousins sharing a larger, closer segment with you. At this point you might want to review Crossovers by Generation here. Going back one or more generations we may see the large red TG being subdivided into two smaller ancestral segments, each with its own Common Ancestor, and each of those CAs is ancestral to the Common Ancestor for the red segment. In this case the last 8 Matches (T, J, Q, L, M, I, A and K) will have a different, more distant CA than Matches G and H. The main, red, TG may be from a 5G grandparent, and the smaller, green and purple segments may be from a 6G or 7G grandparent. Actually the purple segment, as an example, may pass intact through several generations, and you could share this same segment with 7C and 8C…

Again, most of your TGs will be tighter and from a single CA, but I wanted to take this opportunity to show what sometimes happens. You can avoid any conflict by watching for this situation and just declaring two TGs in this case – see the green and purple bars. Then it’s like the case where a close cousin spans more than one TG – the close cousin will help you define a larger segment from a closer ancestor, and the close cousin, along with different groups of Matches in different TGs will share more distant Common Ancestors with those TGs, but those more distant CAs will be ancestral to the MRCA the close cousin shares with you.

So now let’s use our imagination a little (or we could actually Triangulate this area from the perspective of some of our Matches). In this next Figure 8, I’ve guessed at what the ancestral segments might be from the Common Ancestor down to the different Matches – as shown in green.

Figure 8 – showing ancestral segments for all Matches

10B Figure 8

In some cases our Matches have somewhat larger ancestral segments than we have, or they might have segments that extend over one end of our ancestral segment, or the other. In all cases the blue represents the overlap between each Match’s ancestral segment and our own ancestral segment. And the data is not exact, so the ends often don’t line up vertically. The long green bar at the bottom of Figure 8 is a segment an ancestor passed down to living people – you got part of it, the red part.
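The overlap described here is just the intersection of two ranges. A minimal sketch, using made-up Start/End values in Mbp (none of them come from Figure 8):

```python
def shared_overlap(mine, theirs):
    """Return the (start, end) overlap of two ancestral segments, or None."""
    start = max(mine[0], theirs[0])
    end = min(mine[1], theirs[1])
    return (start, end) if start < end else None

my_segment = (45.0, 62.0)  # hypothetical: my ancestral segment

# A Match whose segment extends past one end of mine
print(shared_overlap(my_segment, (50.0, 70.0)))
# A Match whose smaller segment falls entirely within mine
print(shared_overlap(my_segment, (48.0, 55.0)))
# A segment that does not reach mine at all
print(shared_overlap(my_segment, (63.0, 70.0)))
```

In each case the shared segment can be no larger than the smaller of the two ancestral segments.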

At GEDmatch it’s often fun, and instructive, to compare two Matches to each other. Sometimes they turn out to be parent/child, or siblings. This is the exercise you’ll want to do if you are trying to map an ancestor.

So, again, if you have collected a lot of shared segments from FTDNA, 23andMe and GEDmatch, they have to go somewhere. It’s not hard to compare them to each other and see where they Triangulate. If they are IBD, they have to go on one chromosome or the other. When you do this you’ll find there are natural break points where the crossover points are located (often the precise location is a little fuzzy). Just look at the data above.

 

05D Segment-ology: Anatomy of a TG by Jim Bartlett 20160204

Crossovers by Generation

There has been some amount of discussion about segment size, triangulation, and the number of cousins who can share a Triangulated Group. The discussion often uses terms like extremely rare, small segments, distant ancestors, etc. without using specific examples. The arguments go from it’s OK to triangulate with close relatives, to it’s virtually impossible with distant relatives – and there is no discussion of any middle ground. The odds do diminish as you go back in ancestry, but there is no artificial dividing line: closer works, distant doesn’t work. There is always a gradation – shades of gray, if you will. Let’s see if we can put boundaries on it.

In my mind, one way to try to see the forest, and the trees, is to really take a look at an average genome (23 chromosomes, 3 billion base pairs), and see what kind of segments we might see at each generational level. Most of us know that we get pretty large segments from our grandparents, and the size drops down with each generation as we work our way back/up our ancestry.  So let’s develop a table and take it back and see what we have.

The average number of crossovers per generation is 34. Yes, the average for males (fathers) is 27, and the average for females (mothers) is 41 (per www.isogg.org/wiki/Recombination ). But this difference (with respect to the total number of crossovers in a genome) fades after just a few generations – so we’ll use the average, 34.

Crossover Points in One Generation

Let’s start with a parent and 23 pairs of chromosomes. In passing a genome to a child, this parent adds 34 crossovers, which results in 23+34 = 57 segments. Here is Figure 1 showing 34 crossovers and the 57 segments in one genome:

05D Figure 1

These are generally large segments from the grandparents. On average, these segments will be 3,400 cM divided by 57 segments or about 60cM per segment. But clearly some are larger and some are smaller. Sometimes a chromosome is passed intact – see Chr 21 above. You can try this at home, on a sheet of paper – just make 23 horizontal lines and put 34 vertical tic marks on them. You can put a few more or less tic marks, but the overall picture of relatively large segments from your grandparents will be the same.
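The paper exercise can also be tried as a small simulation. This sketch assumes each crossover lands on a chromosome picked uniformly at random, which ignores the fact that longer chromosomes tend to get more crossovers; the function name and the seed are just for illustration.

```python
import random

def simulate_generation(n_chromosomes=23, n_crossovers=34, seed=None):
    """Return the number of segments on each chromosome after one generation."""
    rng = random.Random(seed)
    crossovers = [0] * n_chromosomes
    for _ in range(n_crossovers):
        # Assumption: every chromosome is equally likely to get the tic mark
        crossovers[rng.randrange(n_chromosomes)] += 1
    # A chromosome with k crossovers is cut into k + 1 segments
    return [c + 1 for c in crossovers]

segments_per_chromosome = simulate_generation(seed=1)
total_segments = sum(segments_per_chromosome)
intact = segments_per_chromosome.count(1)  # chromosomes passed with no crossover

print(total_segments, round(3400 / total_segments), intact)
```

However the tic marks fall, the total is always 23 + 34 = 57 segments, averaging about 60cM; only the number of intact chromosomes and the individual segment sizes change from run to run.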

The important observation here is that you have these ancestral segments on your chromosomes – they are fixed between fixed crossover points created when your parent passed these chromosomes to you.  Of course you don’t know where they are at first, but as you determine Triangulated Groups (TGs) with various cousins, you’ll find that none of the shared segments span across one of these crossover points. And in fact, with enough shared segments you will start to see these crossover points firm up, with separate TGs (from the other grandparent) on either side of them. This chromosome mapping, with shared segments, identifies the crossover points for your ancestral segments. The shared segments with Matches usually only overlap part of your ancestral segment from a Common Ancestor – in this case a grandparent.

Crossover Points in Two Generations

Adding 34 tic marks per generation is a good exercise to carry out for several generations and get the feel for how this works. Let’s try another 34 vertical tic marks. We’ll add the tic marks to show the crossover points that were formed when grandparents passed the chromosomes (which they got from their parents) down to your parents. In effect this takes the 57 segments we had in Figure 1, and (with 34 more crossovers) creates 91 total segments as shown in the genome in Figure 2:

05D Figure 2

We still have fairly large segments. On average now, these ancestral segments are 3,400/91 = about 37cM per segment. Again – some will be larger, some smaller. Each of these segments in Figure 2 (between tic marks – both old and new) is from a great grandparent. These segments fill up each and every chromosome in this genome. You may note that some of the grandparent segments were not subdivided. This is not unusual. In fact it has to happen. We started with 57 ancestral segments and added 34 new tic marks (crossover points) – so 34 segments got subdivided and 23 segments did not.

Crossover Points for 13 Generations

In the next generation back, we would add 34 more new tic marks (crossovers) which would subdivide only 34 of the 91 ancestral segments creating a total of 125 ancestral segments from 2G grandparents, and leaving 57 segments untouched (no subdivision). Here is a table in Figure 3 carrying this math out for 13 generations:

05D Figure 3
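The core columns of this kind of table can be rebuilt with a short script. This is a sketch that uses only the numbers quoted in this post (23 chromosomes, 34 crossovers per generation, 3,400cM per genome); the column layout is my own choice here and may not match every column in Figure 3.

```python
GENOME_CM = 3400          # total cM in one genome, as used in this post
CHROMOSOMES = 23
CROSSOVERS_PER_GEN = 34   # average crossovers added per generation

def generation_row(gen):
    """Return (generation, ancestors, segments, average cM per segment)."""
    segments = CHROMOSOMES + CROSSOVERS_PER_GEN * gen
    ancestors = 2 ** gen  # ancestors on one side at this generation
    avg_cm = GENOME_CM / segments
    return gen, ancestors, segments, round(avg_cm, 1)

table = [generation_row(g) for g in range(14)]
for row in table:
    print(row)
```

Reading down the output, the segment count climbs by 34 each generation (57, 91, 125, 159 …) while the average size shrinks, and at 13 generations there are 465 segments that still average a little over 7cM.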

Discussion of Figure 3:

Note: This is a table with various values, depending on which generation you are focused on. So successively, pretend you are at a particular generation and read across to see the statistics. Cousins are abbreviated: 2nd cousin is 2C; 2nd cousin once removed is 2C1R.

– Gen 0: You have 23 chromosomes from a parent (we are only working on one genome, so the number of ancestors is 1). Your parent gave you 23 very large segments (which are chromosomes).

– Gen 1: You get DNA contributions from your 2 grandparents. This is in 57 segments spread over one genome. At this level of your ancestry you would see Matches with 1Cs. Review this in Figure 1.

– Gen 2: You get DNA contributions from your 4 Great grandparents on one side. Now you have 91 ancestral segments spread over 23 chromosomes, and each segment averages about 37cM. Some of these ancestral segments are larger, and some are smaller; and they all add up to 23 complete chromosomes (one full genome). This is the generation that you usually share with 2Cs – review Figure 2. In Figure 3 I also show the calculated shared segment values for the various cousins. With a 2C, you would normally share a total of 106cM (from one side). But the average size of the segments from the Great grandparents is only 37cM. This reflects the fact that you will probably share multiple segments with a 2C – perhaps on average three 37cM segments totaling 111cM… Remember these are averages and in actual practice there is a LOT of variation.

-Gen 3: This shows an average ancestral segment size of 27cM from your 2G grandparents – spread over 125 total segments. The total shared segment for a particular 3C is about 27cM – so you might expect a single segment from a 3C (again, this is just an average, but it might reflect what you often see). I’ve underlined ancestral segment (what you actually got from an ancestor), and shared segment which is the overlap between you and a Match. This overlap is rarely exactly the same ancestral segment in both you and your Match – one or both of you probably has somewhat more in the full segment you got from the Common Ancestor.

NB: this overlapping (shared) segment vs ancestral segment difference may be the root cause of some math calculations which have been touted as proving that exact matches among more than 3C are very rare.  Several cousins having the exact same ancestral segment may be fairly rare, but experience with Triangulated Groups shows that overlaps are not that rare.

-Gen 4: Ancestral segments (averaging 21 cM) from your 16 3G grandparents are spread over about 159 segments. So you would see, on average, an ancestral segment from each 3G grandparent in roughly 10 different segments spread over the chromosomes in that genome. Most of your Matches would be 4C (or 3C1R or 4C1R). The shared segments would average 6.6cM, but another way to look at this is that roughly half of them would be over 7cM. However, experience shows that a relatively small percentage of our Matches are 4C and closer relatives. So there are not many such Matches to cover all the segments in our genome.

-Gen 5: Our 32 4G grandparents still give us fairly large 17cM ancestral segments (on average) spread out over 193 segments. We would still see most of our 4G grandparents in multiple segments. Our 5C Matches only share, on average, 1.7cM. So only some of them, on the tails of the distribution curves, will share 7cM or more. The offset is that we have so many 5Cs, that we still get plenty of IBD matches with them. However, the key point here is that while we may have a 17cM ancestral segment from a 4G grandparent, a 5C is only likely to share part of that with us. It would take several 5Cs, each with a 7-10cM segment, partially overlapping our own ancestral segment, to “cover” our 17cM ancestral segment.  In practice we often get 5C Matches with above average segments, but usually not as large as 17cM.

-Gen 6: Our 64 5G grandparents pass down ancestral segments to us that average about 15cM. They pass these down to an average of 227 segments; and each 5G grandparent will pass down DNA to 3 or 4 different segments, on average. Perhaps some of our 5G grandparents won’t have DNA that reaches us at all, while others may pass down 5 or more segments – roughly, it usually averages out. At this level most of our Matches will be 6C, give or take a little. A 6C, on average, only shares 0.4cM of DNA with us. But there are long tails on these distribution curves, AND we have a LOT of 6Cs. The result is that we do have many 6C who do share IBD segments with us over 7cM. Yes, the probability of a specific 6C shared segment is one fourth the probability of a 5C, but we have so many more 6C than 5C, we actually get more Matches with 6C. This means more 6C Matches are out there with a shared segment over 7cM, than there are for 5C. Again, it will normally take several of them to “cover” an ancestral segment (a TG).

-Gen 8: Skipping a generation to the 256 7G grandparents. At this point there are an average of 295 segments, or about one segment per 7G grandparent. Clearly by this time some of the 7G grandparents do not contribute to your DNA, and some 7G grandparents contribute to several ancestral segments. Your ancestral segments are in the 11-12cM range, on average. And despite the fact that 8Cs only share a small amount of DNA on average, there will still be many 8C with shared segments above a 7cM threshold.
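The shared-segment averages quoted in the list above follow a simple pattern: start with 106cM for a 2C (from one side) and divide by 4 for each additional cousin level. A minimal sketch of that arithmetic; the function name is mine, and these are averages only, which say nothing about the long tails of the distribution.

```python
def shared_cm(cousin_degree, cm_2c=106.0):
    """Average shared cM (one side) for an nth cousin, starting from 2C."""
    return cm_2c / 4 ** (cousin_degree - 2)

for n in range(2, 9):
    print(f"{n}C: {shared_cm(n):.1f} cM")
```

The output drops quickly (106, 26.5, 6.6, 1.7, 0.4 …), which is why Matches with distant cousins depend on the tails of the distribution and on the sheer number of such cousins.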

Summary

All through this analysis, the number of ancestral segments has increased by a constant 34 with each generation; the average segment size starts off large and decreases with each generation, but even after 13 generations, the average ancestral segment is still over 7cM; the number of ancestors continues to double with each generation (and at some point duplicates will start to appear, but as I’ve outlined in Endogamy I and II, each duplicate really acts like a separate ancestor); and the average size of shared segments decreases by a factor of 4 with each generation, but we still see many Matches with shared segments over 7cM. To expand on this last point, I have over 10,400 “phased” Matches at AncestryDNA, with all the pile-ups and IBS already culled out. About 400 of these Matches are 4C or closer, leaving over 10,000 Matches in the 5C or more distant range. The distribution of these is spread out among 5C, 6C, 7C, 8C, etc. It is, so far, unclear how far back these go, but clearly there are many in the 5C-8C range. And AncestryDNA claims their “phasing” program has less than a 1% error rate. So 99% of these are IBD shared segments, probably most in the 6C-to-8C range. To my thinking, this means most of them must line up somewhere on our chromosomes. If we assume half, or 5,000, of these Matches are for each genome, on average, then these 5,000 Matches must be on 300 to 400 of my ancestral segments – or over 10 Matches in the 5C-8C range on every segment, on average. Some ancestral segments (TGs) may have more, some may have less, but the 5,000 IBD Matches have to go somewhere. I’ve picked on AncestryDNA here, because they pooh-pooh Triangulation (I think they don’t really understand it), and because they have equations that some have used to argue that we cannot have multiple 4C or above in TGs.
But the same analysis is true using 23andMe and FTDNA data – they each report many Matches, they each claim a small IBS rate (under 5%), and by their own estimates, most of our Matches are beyond 4C. All of these IBD Matches have to be on our chromosomes somewhere. And, in 14 months (by my estimation), we will have twice as many Matches as we have now – we’ll have over 20 Matches per ancestral segment (TG)!
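The Matches-per-segment estimate is simple division, sketched below with the numbers from this Summary. The doubling step just applies my estimate above that the number of Matches will double; it is not a prediction from any company.

```python
def matches_per_segment(total_matches, n_segments):
    """Average number of Matches landing on each ancestral segment."""
    return total_matches / n_segments

matches_per_genome = 5000  # half of roughly 10,000 distant Matches

for n_segments in (300, 400):
    now = matches_per_segment(matches_per_genome, n_segments)
    doubled = matches_per_segment(2 * matches_per_genome, n_segments)
    print(n_segments, round(now, 1), round(doubled, 1))
```

Even with 400 ancestral segments the average is 12.5 Matches per segment, and 25 once the number of Matches doubles.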

NOTE: the number of crossovers per generation will average out. So the number of segments created by each generation is fairly accurate – there is much less variation in these numbers than you might find in the average cM for an ancestral segment (which has a somewhat wider range) or a shared segment (which appears to have a much wider range).

“the main thing is to keep the main thing the main thing”

  1. Your genome (chromosomes) is divided into segments by crossover points.
  2. These are your ancestral segments, and each one is from a specific ancestor.
  3. Each Match will have his/her own crossover points and ancestral segments from specific ancestors.
  4. When you share an IBD segment with a Match this segment comes from a Common Ancestor (CA).
  5. A shared segment means your ancestral segment and your Match’s ancestral segment overlap.
  6. Your Match may have a small ancestral segment, which falls within your ancestral segment; a large ancestral segment, which includes your ancestral segment; or, usually, any size ancestral segment which overlaps a portion of your ancestral segment.
  7. The overlapping amount may be relatively small (say 7cM), or as large as your ancestral segment.
  8. The odds are very small that you and a Match would get exactly the same segment from a CA. And certainly the odds would be extremely small that you and several Matches would get exactly the same ancestral segment from a Common Ancestor.
  9. However, from the numbers of IBD shared segments we are getting from Matches, compared to the number of ancestral segments, it is highly probable that multiple Matches can and do have ancestral segments which overlap your ancestral segments.

 

Note: A full Triangulated Group (TG) is equivalent to one of your ancestral segments. Which ancestral segment the TG represents depends on the shared (overlapping) segments you have with your Matches.  Several Matches with overlapping segments in a TG will tend to “wall paper” your ancestral segment – with enough of the right Match/segments your TG will cover the whole ancestral segment. Some TGs may be from a closer ancestor (say a great grandparent), some may be somewhat more distant (say a 7G grandparent). From my experience, most TGs will be in the 10-40cM range. This does create a hodge-podge effect (with TGs from different generations), but the TGs tend to be adjacent to each other from one end of each chromosome to the other. Alternatively, you can try to map to a specific generation – perhaps starting with grandparents (and determine those crossovers), and then determine which of those segments are subdivided into smaller segments from the great grandparents, and which segments remain intact going back that one generation. And then continue in this fashion with each additional generation. The drawback to this process is that you need many close relatives to take DNA tests to determine all the crossover points at each generation.

 

A final word of caution: don’t get too lost in the details or the math. Generally, you will have many Matches and IBD segments. Because they are IBD segments, they have to go somewhere on Mom’s side or Dad’s side. 23andMe and FTDNA have developed algorithms to help ensure that most of your Match segments over 7cM are IBD, and from experience we know that almost all of the shared segments over 10cM are IBD, and well over half of the 7-10cM segments are IBD. So if you are reading this blog, you are probably into utilizing segments, along with your genealogy, to improve your family Tree. You should also upload to GEDmatch to find other Matches (from all 3 testing companies) with segments. When segments over 7cM Triangulate, it’s a very strong indication that those segments are IBD and the resulting TGs are from a Common Ancestor. You have an ancestral segment at the location of each TG, and your Matches share part of that ancestral segment with you. Each ancestral segment (TG) came from one of your parents and one of your grandparents, etc. Match/segments in that TG have to come from a distant ancestor who is ancestral to that grandparent. There is no cutoff to this process. We cannot say that only our large ancestral segments are valid. All of our ancestral segments came from a specific ancestor. Our ancestral segments have their own ancestral “Tree”. You may be more confident about a TG including a first or second cousin, but you probably don’t have enough tested cousins to cover every TG over all of your chromosomes. That doesn’t mean these other TGs are not valid, it just means you don’t have a close cousin to validate them. You have to use the closest cousin you can find to validate each TG. Your ancestral segments are real! They are part of you, from your ancestors. And Matches who share those segments, also share their ancestry – no matter how far back the Common Ancestor is.
Note from Figures 1 and 2 that segments from more distant ancestors are “nested” within larger segments from closer ancestors. So if you cannot determine the most distant Common Ancestor, look for the closer Common Ancestor who provided the larger ancestral segment.

 

05D Segment-ology: Crossovers by Generation by Jim Bartlett 20160201

Endogamy PART II

Endogamy Part II – One Segment from One Ancestor

Review

In Endogamy Part I (Shared DNA), we found that the total cM shared between you and a Match is multiplied by the number of times you had the Common Ancestor (CA) in your Tree. So if you and your Match were 5th cousins (5C), you would normally share 3.4cM. If your CA (between you and your Match) was in your Tree five times you would tend to share, on average, 5 x 3.4cM = 17cM. If your Match has that CA in her Tree three times, the total you would tend to share, on average, would be 3 x 5 x 3.4cM or 51cM. For close-cousin Matches (say 1C, 2C, 3C), this makes a big difference. For distant-cousin Matches (say 6C-8C or more), where you would probably not match at all most of the time, the endogamy may increase the total cM enough that you’ll actually get an above-threshold Match. Note: a 7C would normally share 0.1cM, so even with an Endogamy factor of E15, you’d only share 1.5cM, on average, and you’d need to be on the tail of the distribution curve to exceed a 7cM threshold for a Match.
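The multiplication in this review can be written as a tiny function. This is only a sketch of the averages described above; the function and parameter names are made up for illustration.

```python
def expected_shared_cm(base_cm, times_in_your_tree=1, times_in_match_tree=1):
    """Average total shared cM when the CA appears multiple times in each Tree."""
    return base_cm * times_in_your_tree * times_in_match_tree

# 5C baseline of 3.4cM, CA in your Tree five times
print(expected_shared_cm(3.4, times_in_your_tree=5))
# Same, with the CA also in the Match's Tree three times
print(expected_shared_cm(3.4, times_in_your_tree=5, times_in_match_tree=3))
# 7C baseline of 0.1cM with an Endogamy factor of E15
print(expected_shared_cm(0.1, times_in_your_tree=15))
```

These are the 17cM, 51cM and 1.5cM averages from the paragraph above; any individual Match can fall well above or below them.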

In this post – Endogamy PART II – we’ll look at what happens to an individual segment.

Ground Rules

Shared Segment means an IBD segment, from an Ancestor. I usually consider all shared segments over 7cM in a Triangulated Group to be IBD segments.

One Ancestor or one CA means one Ancestral line. See CA and MRCA for a discussion of the CA for a shared segment. Note: at different cousinship levels, there may be intermediate CAs, or MRCAs, which are all in one Ancestral line. The shared segment comes from the most distant CA of that Ancestral line. In other words, all the Matches with shared segments (think a Triangulated Group), will have a CA with you on one Ancestral line – you and they will all descend from one CA.

Endogamy means a CA is in our Tree multiple times. So Ancestor A can be represented by A1, A2, A3, etc. for each time that Ancestor is in our Tree. As you read on, you’ll note that it’s important to treat each one (A1, A2, A3, etc.) as a separate Ancestor (even though they are all the same individual).

Assumption: Each Match will have one CA with you. I know in many cases a Match may share multiple Ancestors with you. But for the purposes of this blog post, we will only look at the effects of one CA. We have to build up, one concept at a time. Learn the concept in this post.

Shared Segments from Duplicate Ancestors – analysis

Let’s look at a shared segment between you and a Match. It could be any shared segment. Just to add some reality to this discussion, let’s say it’s on Chr 10 from 8 to 20Mbp with 23cM. We’ll call this SEG-1.

In this example, the CA for you and a Match is in your Tree twice: A1 and A2. Both A1 and A2 are the same person, so both A1 and A2 have the same SEG-1. Let’s look at Figure 1 and see how far SEG-1 can descend toward you.

16E Figure 1

Analysis of Figure 1:

In Generation 6 (G6), YOU and a MATCH are 4C, and share SEG-1.  SEG-1 came from Common Ancestor A. Red indicates that person has SEG-1.

In G1, your 2 ancestors, A1 and A2, have SEG-1; and your Match’s ancestor, A, has SEG-1. These are all the same individual – so naturally she has SEG-1, no matter where she is in your Tree or your Match’s Tree.

SEG-1 is passed down from A to the Match through one ancestral line.

In G2, SEG-1 is passed from A1 to her son; and from A2 to her daughter.

In G3 A1’s son passes SEG-1 to the paternal chromosome in his son; and A2’s daughter passes SEG-1 to the maternal chromosome in her son. This G3 son now has two copies of SEG-1, one on his paternal Chr 10, and one on his maternal Chr 10. This is indicated by **.

In G3, the ** father recombines his two Chr 10s to make one Chr 10 to pass to his son. Only one of the two SEG-1s can be passed on. The G4 son will only get one SEG-1. This is indicated by *. We don’t know whether the maternal or paternal SEG-1 is passed on – it’s a 50/50 chance for either. If you need a refresher on recombination and how only one area of two chromosomes can be passed down, please review Segments: Top-Down.

In G4 and G5, SEG-1 will be passed down to you, and you will share this segment with your 4C Match.

Starting with G1, there are lots of possibilities, but for you and your Match to both share SEG-1, it has to start in a CA (in this case A, A1 and A2) and be passed down through each generation to you and your Match.

One Segment from One Ancestor

This is a fundamental concept of genetic genealogy. Each shared segment can come from only one Ancestor. This means from only one of several Ancestors (A1, A2, A3, etc.) in Trees with endogamy. This can be extended to Triangulated Groups: each TG can come from only one Ancestor. A corollary is that a different shared segment (or TG) can come from a different Ancestor. With an Ancestor in your Tree 5 times, it is possible for you to have a different segment from each one. It’s also possible for one Ancestor to pass down several different shared segments (in different TGs). So although we can say “One Segment is from One Ancestor”, the reverse is not true. We have to say “One Ancestor can pass down Multiple Segments” (or no segment).

Shared Segments from Multiple Ancestors – analysis

We can apply this same concept – One Segment from One Ancestor – to even greater endogamy. See Figure 2.

16E Figure 2

Analysis of Figure 2:

In this case we have E5: the Common Ancestor is in your Tree five times. When cousins (descending from the CA) marry, their child may get two SEG-1s – one on each chromosome. This is indicated by the double **. The next generation gets only one SEG-1 from that line. As noted with the A3 line, the son in G4 could have double ** – one from A1 or A2 (paternal chromosome) and one from A3 (maternal chromosome). The daughter in G5 got a paternal SEG-1 from A1, A2 or A3, and a maternal SEG-1 from A4 or A5. You got a SEG-1 from A1 or A2 or A3 or A4 or A5. At this point we cannot tell which of your Ancestors passed down the SEG-1, but we do know it could only have come from one of them.

In this discussion, I’ve used the “worst case” scenario – each of your A Ancestors passed SEG-1 down as far as she could. In fact, SEG-1 is subject to the 50/50 rule – half the time it will be passed down, half the time it won’t. However, the fact that we share SEG-1 with a Match, and have a Common Ancestor A with that Match, means at least one of your Ancestors A had to pass it all the way down to you and the Match.
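To get a feel for the 50/50 rule, here is a rough sketch of how the odds shrink down a single line of descent. It assumes an independent 50/50 chance at every parent-to-child step and ignores crossovers that could split the segment along the way, so treat it as an illustration rather than a prediction.

```python
def chance_passed_down(generations):
    """Chance a segment is passed at every step down one line of descent,
    assuming an independent 50/50 chance per generation."""
    return 0.5 ** generations

for g in range(1, 6):
    print(g, chance_passed_down(g))
```

After five steps the chance on any one line is about 3%, which is why the “worst case” drawing, with every line carrying SEG-1 all the way down, is not what usually happens.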

We are all well aware that you and a Match probably have multiple Common Ancestors. Again, this analysis does not sort out which Ancestor is the correct Common Ancestor for SEG-1, nor does it sort out which one of the multiple Common Ancestors it is. This analysis just establishes the point that:

One Segment comes from One Ancestor

A very unusual exception

In the case where your parents are cousins, it is possible (but not very probable), that you would carry two SEG-1s (from this example) – one paternal, one maternal. This would be the case in Figure 2 if you were the daughter ** in G5. At GEDmatch, any such segment areas would be highlighted with their “Are Your Parents Related?” tool. So it’s easy to check for such segments, and be aware of where they exist (Chromosome and Start/End locations). In all other cases you don’t need to worry about this issue. For any shared segments meeting this very unusual criterion (exactly the same segment on both chromosomes), you wouldn’t know which of your two ancestors it came from. If there were any difference in these two shared segments (they were not exactly the same), then chromosome mapping would usually tell you which one was which.

Summary:

Each shared segment comes from one Ancestor. In the case of endogamy with multiple identical Ancestors, each shared segment comes from only one of them.

It’s possible to have ten identical ancestors in your Tree, and to get a different shared segment (as in a different TG) from each of them.

 

16E Segment-ology: Endogamy PART II – One Segment from One Ancestor by Jim Bartlett 20160104

CA and MRCA

 

Shared IBD segments come from a Common Ancestor (CA). Matching & overlapping IBD segments form Triangulated Groups (TGs). Every Match in a TG with significantly overlapping shared segments will have the same CA! And closer Matches (cousins), will also have a closer CA. So how can we have a close CA and a distant CA when they are in the same TG?  When they are all in the same ancestral line!

Let’s start with a distant cousin (Match) and look at the Common Ancestor.

05C Figure 1

Some notes about Figure 1

– With atDNA the path from the CA can go through males (boxes) and/or females (circles) in any order – it does not matter.

– The CA is one of the two parents above – the DNA that passed down from the CA to you and your 7th cousin (7C) came from one person. In this example, I’ve assumed the mother just to illustrate that it is just one parent. In most cases we don’t know which parent the DNA is from.

– The CA has at least two children: one is an ancestor of your 7C Match (M); and one is the ancestor of you (U)

– In this case the CA is also an MRCA (Most Recent Common Ancestor) – you and your Match don’t relate any closer on this line. However, in genetic genealogy, we tend to call this the CA, rather than the MRCA.

– You and your Match (M), will also share all of the Ancestors of the CA.

– This Figure 1 assumes the CA shown is the correct CA – the one who passed down the shared DNA segment to you and your Match.  We don’t really know if this CA is correct, until we find corroborating evidence – read on.

So how do we confirm that this CA line (either the mother or the father) is the one who passed down the segment you and the Match (M) share (as opposed to some other ancestral line)? One way is by Triangulation. When several people share the same segment, and all have paper trails to the same CA, we assume this CA must be correct. Another method is by “walking the ancestry back”. That is through closer cousins who also share this segment (in a Triangulated Group). We are generally pretty comfortable (when a close cousin shares a lot of DNA with us) that the closer CA is correct. In other words, when a 2nd cousin (2C) shares 220cM with us, and has large individual shared segments with us, we assume the CA is the known Great grandparent. And with large segments this is almost always true. Then if a known 4C shares a good sized segment with us, we also assume the known 3G grandparent is the CA. If all of these occur in the same TG, we need to call the intermediate CAs, MRCAs (Most Recent Common Ancestors) to distinguish them from the CA of the TG. Let’s see how this looks in Figure 2.

05C Figure 2

Some notes on Figure 2:

– The Tree for your Match (7C) and you is the same as in Figure 1.

– A matching 2C on the same segment will have an MRCA with you on the G grandparent.

– Matches with 4C and 6C are also shown with MRCAs on the ancestral line from you to the CA.

– Everyone in Figure 2 descends from the CA.

This scenario, with intermediate MRCAs, adds a lot of confidence to the CA being the Ancestor who passed down the DNA that all of you (Match M, you, 2C, 4C and 6C) share.

Note that the intermediate MRCAs could have just as easily been on the Match’s line. And/or you and the Match may both have intermediate MRCAs. The key point is that the MRCAs are in the ancestral line to the CA.

This concept applies equally to TGs. Each TG really represents a segment from an Ancestor to you. A “tight” TG – one with significantly overlapping segments among all the Matches in the TG – will have a CA just like a shared segment does. And all the Matches in the TG will share that CA. A “wide” TG – with “cascading” segments such that one at the beginning of the TG doesn’t overlap one at the end of the TG – may well turn out to be two TGs, with two CAs… more on that in a different post.

So there is always an Ancestor who is the most distant MRCA of your TG, for a given threshold. That means that any Ancestor who is more distant would not show up as a Match (using the given threshold, say 7cM), because the segment from the more distant Ancestor down to you would be too small at that distance to match anyone. For example, you may have gotten only 6cM from a 7G grandparent. In that case you would never get any Matches who were cousins on that 7G grandparent, using a 7cM match threshold. Others may get large enough segments from that same 7G grandparent, and maybe get some Matches, but you would not.

It appears this most distant MRCA of a TG may be fairly deep in our Trees in many cases. As a result we are having a hard time finding them. Our best tactic then is testing close cousins, and finding intermediate cousins among all of our Matches. This means testing at all companies and uploading to GEDmatch to get the most Matches you can. We never know when the key intermediate Match will show up – they won’t always have significantly larger segments. And lowering the threshold at GEDmatch, in general, will only result in even more distant cousins and more distant CAs.

Summary

A shared segment is from a Common Ancestor (CA) with a Match (cousin).

Closer cousins would have MRCAs with you who are descendants of the CA. Your Matches may also have closer cousins with MRCAs who also descend from the CA.

These “intermediate” MRCAs increase the probability that the CA passed down the shared segments.

We still do not know if the CA is the mother or the father, but we can be very confident that this is the correct ancestral line, and not some different or alternative ancestral line.

 

 

05C Segment-ology: CA and MRCA by Jim Bartlett 20160101

Endogamy PART I

Endogamy PART I – Shared DNA

This blogpost looks at the amount of shared DNA from endogamy. It does not address the genealogy of endogamy, but instead establishes some terminology and reference material.

First let’s define endogamy: the custom of marrying within the limits of a local community, clan or tribe [Oxford Dictionaries online].

This means cousins marry each other; and those two cousins have at least one ancestor who is the same. In other words an ancestor is in our tree more than once. The same individual occupies two (or more) blocks (or positions) in our tree, and their respective descendants (cousins) marry each other.

Classic examples of endogamous populations include Ashkenazi Jews and Low German Mennonites. In genealogy, endogamy is also used to describe multiple cousin marriages in a limited population area, such as those found in various areas of Colonial America [c.f. ISOGG wiki].

Let’s take a more in depth look at how DNA is passed down, how much DNA is shared between cousins, and examine the impact of endogamy. How does endogamy affect the total amount of DNA shared between cousins and the size of the shared segments?

Ground Rules

 Use average cMs. DNA is very random and there is a wide range of possible values of segment cMs passed down from ancestors, as well as the amount of shared cMs between cousins. For this article, I will consistently use the calculated average values. In practice we see values above and below these average values, but with large data they should average out to the calculated values. By using the average cMs we should all come to the same results.

Use 7040cM as the total cMs in one person. Each company tracks the cMs a little differently. I picked this value because it’s roughly right*, it divides easily, and it complements my notional Segment Size Chart here. We want to stay focused on the big picture and keep things in good perspective, rather than get into a debate about which company has the best total. I’ll use 7040 as the “base”, and also show the percentage that is passed down and shared. You can use a different base if you want. It’s the relative values we are after here, so it really doesn’t make much difference which base you use. The takeaway should be a general understanding of the effects of endogamy.

Use A to designate an ancestor who is in a pedigree more than once. A1 and A2 would be the same individual (A) in two different positions in a pedigree.

Use one Ancestor (A). We usually note a couple as the Common Ancestor because we don’t know which one passed the shared DNA segment down to you and your Match. But only one Ancestor of this couple had that DNA, and I use only one Ancestor in this analysis.

 Base Chart [E1]

For this discussion we will use average values, and each descendant will get exactly half of their parent’s DNA. Also the shared amount decreases by a factor of 4 with each generation. This gives us the following Base Chart:

07D Fig 1

Explanation of Figure 1:

Values under You and Match are in cM. 4C means 4th Cousin; and 4C1R means 4th Cousin once removed. This will be similar in other figures.

Column 1 shows a Common Ancestor (A) at the top of the chart (with a total of 7040cM of DNA). The list of descendants is noted by Gen 1, Gen 2, etc. Note with atDNA, the descendants could be male or female.

Column 2 shows the total amount of DNA passed down from the Common Ancestor (A) to the descendants in each Gen. For the purposes of this article, I used one half of the ancestor’s DNA in each succeeding descendant. Usually this column represents you.

Column 3 shows the relationship between the descendants on your line vs. the descendants of a Match’s line in Column 4.

Column 4 shows the total amount of DNA passed down from the same Common Ancestor (A) to the descendants in each Gen. Again, I used one half of that ancestor’s DNA in each succeeding descendant. Usually this column represents your Match.

Column 5 shows the total amount of DNA that would be shared between you and your Match at each generation. Note that the amount decreases by a factor of 4 in each generation. [Sidenote: In the case of a half cousin, the amount of shared DNA is halved. Example: 4C = 13.75cM shared; 4C1R = 6.875cM shared; 5C = 3.438cM shared.] Note that in Gen 6 (5C level) the share is 3.44cM, which is well below a matching threshold of 7cM. Clearly the average 5C would not show up as a Match. However, we know we have many 5C Matches above 7cM, so those Matches which are reported are well into the upper “tail” of the 5C distribution curve – see cM notional distribution curves here.

Column 6 shows these shared cMs as a percentage of the base [7040cM]

Column 7 is a little trick – it shows years inversely spaced at 30 year intervals, starting with a genealogist born about 1950. This allows you to either 1) look at a year of interest to you and see the probable cousins you’d have with ancestors of that time period, or 2) look at the cousinship of a Match and see approximately when the Common Ancestor lived. Of course it’s a very rough approximation, AND you should feel free to use different years that roughly work with your pedigree. This one works pretty well for me…

Column 8 is another little trick – it shows the number of ancestors you would have at each Gen going back – another inversion list. For example: if you and your Match are 8C, you would each have 512 ancestors at your Common Ancestor level. In other words the CA is 1 of 512 ancestors. It’s a handy lookup feature of Figure 1.

Endogamy factor – I have noted this chart as Endogamy 1 [E1], meaning both you and your Match only have the CA in your ancestry once. More on this later.
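The arithmetic behind Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, Gen n is taken to be the (n-1)C level (Gen 6 is the 5C level, as noted above), and the shared-cM formula is inferred from the values quoted in this post (4C = 13.75cM, 4C1R = 6.875cM, 5C = 3.4375cM). It is not taken from the chart itself.

```python
BASE_CM = 7040  # total cM in one person, the "base" used in this post


def total_from_ancestor(gen, base=BASE_CM):
    """Average cM a Gen `gen` descendant carries from ancestor A (halved each Gen)."""
    return base / 2 ** gen


def shared_cm(gen_you, gen_match, base=BASE_CM):
    """Average shared cM between descendants at the two Gens.

    Assumed formula, inferred from the quoted values: each extra Gen on
    either side halves the share, so same-Gen cousins drop by a factor of 4.
    """
    return base / 2 ** (gen_you + gen_match - 1)


def ancestors_at_cousin_level(cousin_degree):
    """Number of ancestors at the CA level for an nth cousin (8C -> 512)."""
    return 2 ** (cousin_degree + 1)


# Print a rough equivalent of the Base Chart rows from 1C (Gen 2) to 8C (Gen 9).
for gen in range(2, 10):
    shared = shared_cm(gen, gen)
    print(
        f"Gen {gen}  total {total_from_ancestor(gen):8.2f}cM  "
        f"{gen - 1}C  shared {shared:8.3f}cM  "
        f"{100 * shared / BASE_CM:.4f}%  "
        f"{ancestors_at_cousin_level(gen - 1)} ancestors"
    )
```

With these assumptions, shared_cm(5, 5) gives 13.75 (the 4C value), shared_cm(5, 6) gives 6.875 (4C1R), and shared_cm(6, 6) gives 3.4375 (5C). Changing BASE_CM to another company's total rescales every row without changing the percentages.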

Modified Base Chart (Cousin Ancestors) [E2]

Now let’s modify the Base Chart and show you having two of the same Common Ancestor (A1 and A2) whose Great grandchildren married each other.

07D Fig 2

Explanation of Figure 2.

Columns 1-4 are similar to those columns in Figure 1, with three important differences: (1) they are both on your side; (2) the two 2C at Gen 3 marry each other; and (3) in Gen 4 the 440cM passed down from Gen 3 for each of A1 and A2 is shown, as well as that amount being combined into a total of 880cM for the single descendant (child) in Gen 4. In succeeding generations the DNA is halved at each generation.

Column 5 shows the net (combined) amount of DNA from A (A1 + A2) for the descendants of the Gen 3 marriage, starting in Gen 4. The net DNA is now twice as much as it was in Column 2 for Gen 4 in Figure 1.

Columns 6-7 are the same as Columns 3-4 in Figure 1.

Columns 8-9 have twice the values at each Gen compared to Figure 1. The shared DNA is now twice as much (by total and percentage).

Endogamy factor – With 2 identical Common Ancestors in your Tree, we have E2.

Important Note: When the DNA is passed from the Gen 3 parents (A1 and A2) to the Gen 4 child, the Gen 4 child gets the total DNA from A1 in various segments on one set of chromosomes (say the paternal side), and the DNA from A2 on the other set of chromosomes (the maternal side). There is no mix at this point. The various segments are subdivided, or not, and passed down normally. In the next generation, the Gen 4 child will recombine both chromosomes and pass the DNA to the Gen 5 child. There is a small probability that some segments from ancestors A1 and A2 may be exactly the same, but they would be on opposing chromosomes in Gen 4 and only one segment area could be passed on to the Gen 5 child. There is a very small probability that separate, but adjacent, segments from A1 and A2 (on opposing chromosomes) could wind up adjacent again in the Gen 5 child, and be “stitched together” to form a larger segment in Gen 5 from ancestor A than there was in Gen 4. Note that this very small probability can only happen in this one generation (the generation of a child with cousin parents passing DNA to his/her child; in this case Gen 4 to Gen 5). In succeeding generations, all the segments for ancestor A are on one side, and can only be subdivided.

Key Findings

Total DNA – As it turns out, no matter where in your ancestry the cousins marry each other, their descendants will have twice the DNA from the Common Ancestor. It doesn’t matter if first cousins or fifth cousins marry, their descendants will carry twice the total Common Ancestor’s DNA (on average). And it doesn’t matter if cousins married recently or 6 generations back, their descendants will carry twice the Common Ancestor’s DNA. This simplifies the analysis a lot!

Shared DNA – the amount of shared DNA will double (with this E2 scenario). An E1 5C = 3.438cM (see Fig 1); an E2 5C = 6.875cM (see Fig 2)

Net effect – With E2 the shared DNA is equivalent to an additional “once removed” in the cousinship. A true 5C Match (normally sharing 3.438cM with E1), with E2 would look like a 4C1R (6.875cM)

Segment Sizes – Although, on average, the total DNA will be doubled, the various segments will not be larger, in general. For sure, the segment sizes are not doubled!
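The Total DNA finding can be checked with a small sketch. It assumes average values, that the two marrying cousins are at the same Gen below A, and that the child of a cousin marriage receives A's DNA from both parents. The function name and parameters are hypothetical.

```python
BASE_CM = 7040  # total cM in one person, the "base" used in this post


def total_from_a(gen, marriage_gen=None, base=BASE_CM):
    """Average cM from ancestor A carried by a descendant at Gen `gen`.

    marriage_gen is the Gen of the two cousins (descendants of A1 and A2)
    who marry each other; None means no cousin marriage (E1).
    """
    amount = base
    for g in range(1, gen + 1):
        amount /= 2  # each child gets half of a parent's DNA
        if marriage_gen is not None and g == marriage_gen + 1:
            amount *= 2  # the child gets A's DNA from both cousin parents
    return amount


# 2nd cousins marrying at Gen 3 (as in Figure 2): 440cM + 440cM = 880cM at Gen 4.
print(total_from_a(4, marriage_gen=3))

# The Gen 6 total is the same wherever the cousin marriage happened.
for m in range(2, 6):
    print(m, total_from_a(6, marriage_gen=m), total_from_a(6))
```

In this sketch the Gen 6 total with a cousin marriage is 220cM for every marriage Gen tried, against 110cM without one. This covers average totals only; it says nothing about segment sizes.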

Modified Base Chart (3 Identical Common Ancestors) [E3]

Suppose you have three identical Common Ancestors (A1, A2 and A3) in your Tree. Usually this means two cousin marriages involving the same ancestor.

07D Fig 3

Explanation of Figure 3.

The columns are similar in function to those of Figure 2.

In Gen 3 two 2nd cousins, the highlighted descendants of A1 and A2, marry. Then in Gen 4, a child from this marriage marries a descendant of A3, also highlighted.

Columns 2, 4 and 5 show the “half-amount” of DNA from ancestors A1, A2 and A3 that continues to add up in each generation (see Column 6). Note this is always the sum of respective portions from A1, A2 and A3, AND in Column 6 the net amount is halved in each succeeding generation.

Columns 9 and 10 show three times the total shared cM and total percent shared.

Endogamy factor – With 3 identical CAs in your Tree we have E3.

Modified Base Chart (2 Identical CAs plus 2 Identical CAs) [E4]

Let’s try an example with cousins in your Tree and cousins in your Match’s Tree. The process should be familiar now.

07D Fig 4

Explanation of Figure 4.

See previous Figures for explanations of the Columns.

As before, in Gen 3 two 2nd cousins in your Tree marry, and all succeeding total DNA is doubled.

In Gen 4 two 3rd cousins in your Match’s Tree marry, and all succeeding total DNA is doubled.

To get the shared DNA at Gen 5 we take the A1 DNA (220cM) compared to A3 DNA (220cM), and from Figure 1 we know this is 13.75cM. We then compare A1 to A4 and get 13.75cM, and the same for A2 to A3 and A2 to A4. So we have a total of 4 times 13.75cM, or 55.0cM total shared. Here we have E2 on your side and E2 on your Match’s side.

Endogamy factor – E2 x E2 is E4.

Modified Base Chart (3 Identical CAs plus 2 Identical CAs) [E6]

So you might ask in the previous chart, do we add (E2 + E2 = E4) or multiply (E2 x E2 = E4)? Let’s resolve this in the following figure.

07D Fig 5

Explanation of Figure 5.

This is the reason why I continue to separately show the total contribution of DNA from each of the Ancestors (A1, A2, A3, A4, and A5 in this case). I don’t know how to compare 660cM and 440cM in Gen 5 to get the shared cM. But comparing these 5 ancestors in separate pairs means we can use shared values we already know from Figure 1. In this case, compare at 220cM for A1-A4, A1-A5, A2-A4, A2-A5, A3-A4 and A3-A5 – a total of 6 sharing comparisons. So we use E3 x E2 = E6.

Endogamy factor is E6; and we can multiply the 220cM-220cM share (13.75cM from Figure 1) by 6. Or 13.75cM x 6 = 82.5cM.
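The pairwise comparison described for Figure 5 can be written out as a short sketch. The labels A1 to A5 and the 13.75cM per-pair share come from the text above; the variable names are only illustrative.

```python
from itertools import product

PAIR_SHARE_CM = 13.75  # 220cM vs 220cM share at the 4C level, from Figure 1

your_copies = ["A1", "A2", "A3"]  # positions of ancestor A in your Tree
match_copies = ["A4", "A5"]       # positions of ancestor A in your Match's Tree

# Every copy on your side is compared with every copy on the Match's side.
pairs = list(product(your_copies, match_copies))
endogamy_factor = len(pairs)
total_shared = endogamy_factor * PAIR_SHARE_CM

print(pairs)
print(f"E{endogamy_factor}: {total_shared}cM")  # E6: 82.5cM
```

Because each of your three copies pairs with each of the Match's two copies, the count of comparisons is a product (3 x 2), not a sum (3 + 2).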

Common Ancestor is in each Tree only once [E1]

What happens if we have lots of endogamy in our ancestry, but the Common Ancestor with a Match is not repeated in either Tree? Well we would not have any effects of endogamy. The Endogamy factor would be E1, and we’d use Figure 1. The multiplying effect of endogamy on shared DNA only comes into play when the Common Ancestor between you and a Match is repeated in your Tree or in your Match’s Tree.

Modified Base Chart (Common Ancestor is below Endogamy) [E1]

What happens if you and your Match have a Common Ancestor with lots of endogamy? In other words the Common Ancestor is the descendant of endogamy. The analysis of shared DNA is always done by starting with the Common Ancestor’s total DNA [7040cM, or 100%], and working down from there.

07D Fig 6

Explanation of Figure 6.

You can put as many identical Ancestors as you want in this chart (like A1 and A2 above, or the example in Figure 5). But to determine the shared DNA from a Common Ancestor, you must start with that ancestor – noted as B in Figure 6. In this example, ancestor B is only in your Tree once and your Match’s Tree once, notwithstanding the fact that B has multiple A ancestors. B is a separate, individual ancestor and the shared DNA from this B ancestor must be calculated with B as the base.

Endogamy factor is E1 in this case. There is no change in the amount or percentage of shared DNA with any cousin on Common Ancestor B in this case.

Summary Findings:

Total DNA in descendants of multiple Common Ancestors is multiplied by the number of CAs. It doesn’t matter how distant the marrying cousins are or where they are in your Tree. The number of Common Ancestors in a Tree determines the Endogamy factor – a CA in a Tree three times is E3, for example.

Shared DNA with a Cousin is multiplied by the Endogamy factors of you and your Match.

Endogamy only affects the shared DNA from the Common Ancestor between you and a Match.

  • General endogamy, or “population endogamy”, does not affect the shared DNA calculation, except as it applies to the specific CA.
  • Specific endogamy on Ancestors other than the Common Ancestor does not affect the shared DNA calculation.
  • Endogamy ancestral to the Common Ancestor with a Match does not affect the shared DNA calculation.
  • If you know all 8 of your Great grandparents are different, and/or all 16 of your 2xGreat grandparents are different, and/or can be sure (say by geography, ethnicity, etc.) that none of your 32 3xGreat grandparents are repeated as your ancestors, then your Endogamy factor would be E1 (use Figure 1) with any Match who is a cousin from one of these ancestors. If you are positive that any other more distant ancestor was in your Tree only once, the Endogamy factor is E1. However, you also need to consider the Endogamy factor of your Match.

Endogamy must be considered for both you and your Match.

  • Use an Endogamy factor, E, for each time the Common Ancestor is in your Tree and/or your Match’s Tree.
  • If the Common Ancestor is in a Tree only once the Endogamy factor is E1; twice E2; three times E3, etc.
  • Multiply to combine Endogamy factors from you and your Match. Examples: E1 x E1 = E1 (no endogamy); E4 x E2 = E8, and the total amount of shared DNA in Figure 1 for that Gen is multiplied by 8. An E8 5C would share 8 x 3.438 = 27.5cM, which would look like a 3C1R.

Perceived effect of endogamy is the equivalent of one additional “once-removed” for each doubling of the combined Endogamy factor. So a true 4C (usually sharing 13.75cM), would share 27.5cM with E2 and look like a 3C1R, or 55cM with E4 and look like a 3C. Referring to Figure 1 at the 4C level, we have 32 ancestors, and so does our Match. So to reach E4, both you and the Match would need to have the 3xGreat grandparent (CA) in your Tree twice, for example.
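The multiplication rule and the “looks like” comparisons above can be sketched as follows. The shared-cM formula is inferred from the Figure 1 values quoted in this post, and picking the nearest E1 relationship on a log scale is my own simplification for illustration. All names are hypothetical.

```python
import math

BASE_CM = 7040  # total cM in one person, the "base" used in this post


def shared_cm(cousin_degree, removed=0, e_you=1, e_match=1, base=BASE_CM):
    """Average shared cM for an nth cousin, scaled by both Endogamy factors."""
    gen = cousin_degree + 1  # a 4C is at Gen 5 in the Base Chart
    e1_share = base / 2 ** (2 * gen - 1 + removed)
    return e_you * e_match * e1_share


# Average E1 values for 1C through 9C, with and without one "removed".
E1_TABLE = {
    f"{d}C" + (f"{r}R" if r else ""): shared_cm(d, r)
    for d in range(1, 10)
    for r in (0, 1)
}


def apparent_relationship(cm):
    """E1 relationship whose average share is closest to `cm` on a log scale."""
    return min(E1_TABLE, key=lambda label: abs(math.log2(E1_TABLE[label] / cm)))


# E4 x E2 = E8 for a true 5C: 8 x 3.4375 = 27.5cM, which looks like a 3C1R.
cm = shared_cm(5, e_you=4, e_match=2)
print(cm, apparent_relationship(cm))
```

For a true 4C, shared_cm(4, e_you=2) gives 27.5 (looks like a 3C1R) and shared_cm(4, e_you=2, e_match=2) gives 55.0 (looks like a 3C), matching the examples above. These are averages only; real Matches fall above and below them.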

If all or much of your ancestry is in one “pool” of endogamy, the opportunity for large Endogamy factors is great. If various branches of your ancestry come from very different geographic areas or ethnicities, etc., the Endogamy factors will be smaller. You might want to examine various parts of your ancestry to see where endogamy might play a role. Endogamy means more shared DNA, which will also mean more Matches.

The size of shared DNA segments is not, generally, changed by endogamy. Certainly, endogamy does not double the size of shared segments.

Summary Thoughts

This has been an interesting drill for me (I’m sorry for all the tables and numbers).

This article is based on the calculated averages – “your results may vary”. I am certain that many of our Matches are in the 6th to 8th cousin range, and our shared DNA is based on both endogamy and the long “tails” on the cM distribution curves.

I hope this blogpost will help facilitate further discussion of endogamy in genetic genealogy.

 

07D Segment-ology: Endogamy I – Shared DNA by Jim Bartlett 20151202

* At www.isogg.org/wiki/CentiMorgan the atDNA totals are 6769cM at FTDNA; 7174cM at GEDmatch and 7075cM at 23andMe; and ISOGG uses 6800cM at www.isogg.org/wiki/Autosomal_DNA_statistics. Other sources have different totals.

Getting Started with Autosomal DNA Part I

So you are thinking about getting an autosomal (atDNA) test, but are not sure where to start. This blog post will walk you through several steps to help get you started.

An atDNA test will result in a list of Matches based on shared DNA. Almost all of these Matches are your cousins – most will be about 5th to 8th cousins, with some who are closer and some who are more distant. The DNA test will give you this list, and a way to contact your Matches; it’s up to you to share information with your Matches and determine your Common Ancestor(s).

BEFORE YOU TEST – UNDERSTANDING

  1. Determine your objectives. Write your own or choose from these:

A. ___Find new cousins
B. ___Prove your ancestral lines
C. ___Break down brick walls
D. ___Find biological parent(s) of yourself or some ancestor [see also DNAAdoption*]
E. ___Find out your deep ancestry
F. ___Form working groups of your Matches by Ancestor                                               [Triangulated Groups]
G. ___Determine which ancestor provided each part of your DNA                               [Chromosome Mapping]
H. ___Other ______________________________________
2. What to expect from your atDNA results.
Work. Your results will include a list of Matches – people who match your DNA. In general these Matches will be cousins. Generally very few will be close cousins (1st or 2nd cousins) – the bulk of them will be 5th to 8th cousins, or more. Some will have a Tree or Pedigree or list of surnames posted, but many will not. In general you will need to contact your Matches to determine your Common Ancestor. Adoptees and people with close brick walls will need to compare a lot of information from their Matches to develop common threads, and likely relationships. You need to be involved – your Tree is not magically filled out for you.
Ethnicity is a broad estimate. Your results will also include some estimates of your ethnicity or geographic ancestry. Since you only got part of your ancestor’s DNA, these estimates are generally correct, but not very precise.
Maybe unexpected results. When you take a DNA test and are compared with everyone else who has done the same, there is always the potential for a surprise. You may have an ancestor who is not the biological child of the parents you thought they were.
Genetic Genealogy Standards. This document is highly recommended for more information about DNA testing: http://www.geneticgenealogystandards.com/
DNA is just a tool – a very powerful tool. You use it as part of your genealogy research, not in place of genealogy. A DNA test by itself cannot create your pedigree.

3. Understand the three types of DNA.
Y-DNA used to study an all-male line – the Y-DNA is passed from fathers to sons.
mtDNA used to study an all-female line – the mtDNA is passed from mothers to their children (sons and daughters).
atDNA used to study all of your ancestry – the atDNA is passed from male and female to their children. This test results in Matches from all of your ancestry. The Matches are your cousins – much more on this later. The atDNA does not include any Y-DNA or mtDNA, although Matches could just as easily be cousins from an all-male-line or all-female-line ancestor, as from any other line.

This post is all about atDNA

4. There are two fundamental levels for using atDNA:
Level 1. Genealogy only. At this level you just accept the list of Matches as your cousins, correspond and share with them to determine how you are related. You may often find that you are related several ways. This is plain and simple genealogy. Think of the DNA test as a filter that separates out only people who are related to you.
Level 2. Using the DNA. This level requires some amount of knowledge about how DNA works – how you got it, how your Match got it, what matching means and doesn’t mean, and some amount of jargon. Much more on all of this later. But if it gets too complex, or if you need a breather, just fall back on Level 1, and work with your Matches.

5. Select the company for your test. The base price is $99 at each of the three companies, and they have $10-$20 reductions several times a year. Each company offers a different mix of features. See http://www.isogg.org/wiki/Autosomal_DNA_testing_comparison_chart for a comprehensive and unbiased comparison matrix. Many folks have their favorite companies for different reasons – but this is my blog, so my thoughts include:

A. All three companies display a list of your Matches (people who share some DNA with you and are your cousins, in most cases), and a way to communicate with them. You can and should upload a GEDcom of your Ancestry to each site. They also offer an estimated relationship range for each Match – a range of relatedness (e.g. 3rd to 5th cousins). They show some ethnicity/geographic estimates (none are very precise because you only get part of each ancestor’s DNA). They also give you the ability to download your raw DNA data.

B. Family Tree DNA (FTDNA) – I think this is the best all around. Almost all Matches are listed with real names, emails, and all DNA segment data – upfront and easily downloaded to a spreadsheet which you can use or print out. If you have Colonial American ancestry, you’ll probably get more than a thousand Matches. They also store your DNA and offer a range of other DNA tests. If you have elderly relatives, and want to preserve their DNA for future tests, this is the best site. The main drawback is the limited ability to compare your Matches with each other – this is mostly overcome with their InCommonWith utility. A good site for all the objectives above.

C. AncestryDNA – If you have Colonial American ancestry, you’ll probably get several thousand Matches. Some Matches have good Trees, some have small or no Trees; and some have Private Trees. To communicate with Matches you must use the Ancestry messaging system. Ancestry’s key features include a Hint system, which highlights ancestors in your Tree and your Match’s Tree which are the same, based on genealogy. They also provide a Shared Matches feature based on shared DNA; but they don’t provide any DNA segment data, which is essential for objectives B, D, F and G above. A good site for finding cousins, a poor site for working with DNA. Use this site if you don’t want to learn about DNA.

D. 23andMe – has the largest database (over one million customers, but they only list your top 2,000 Matches). There are no emails posted, so you have to use their messaging system to communicate. They have a utility to compare kits to each other, which is a key feature. Their Tree system is the hardest to use. A good site for all of the objectives above.

E. GEDmatch – is a third party site (free, but donations are encouraged) – you can upload your raw DNA data file from any of the above companies to GEDmatch, and compare among Matches who tested at the three companies above. They list the top 1,500 Matches in a One-to-Many utility; and let you compare any two Matches One-to-One. A great suite of other utilities, including Triangulation and several ethnicity/geography programs. I encourage you to upload to and use GEDmatch. You’ll get more Matches and DNA data.

F. My strong recommendation is to test at all 3 companies – each has a different database of potential Matches, and each offers different features. To save some money, you can test at AncestryDNA during a sale, and then upload (copy) your DNA data to FTDNA for $39, upload to GEDmatch (free), and also test at 23andMe to maximize your chances of finding good, close cousins. Balance this plan against your budget and your desire to test other close relatives (also recommended).

BEFORE YOUR RESULTS ARE POSTED

6. Develop a robust Tree of your Ancestors. By robust, I mean include as many Ancestors as you can, with place and date information, out to 12 generations or so. This is your “bait” when “fishing” for cousins. The atDNA test tells you a Match shares DNA with you – that they are probably a cousin. You have to compare ancestors with the Match to determine your Common Ancestor. If you both don’t have the Common Ancestor in your Tree, it’s very hard to find it. Most of your DNA Matches will be in the 5th to 8th cousin range, some more distant. You need a Tree that includes as much of your Ancestry as possible back to at least 10th cousin range, wherever possible. Few have actually done the detailed research to “prove” all of their ancestors back that far. My recommendation is to “borrow” from the research and Trees of others to fill out your own Tree as much as possible. I’d go so far as to also include “iffy” Ancestors at the tips of your Tree – ones you may not have researched or proved – these are better than blank spaces in your Tree. The objective here is to identify potential Common Ancestors. Then you and your Match (now a potential cousin) can compare notes to see how much documentation you each have.
Create a GEDcom of your robust Tree and upload it to each site where you’ve tested (FTDNA, AncestryDNA, and/or 23andMe); to GEDmatch; and/or to WorldConnect, WikiTree, FamilySearch, etc.

7. Develop a list of Patriarchs [optional, but very helpful] – make an alphabetical list of your ancestral surnames. Then add the most recognizable Patriarch (or Matriarch, if there is no Patriarch) with years and places. Keep each surname/Patriarch to one line, if possible. Some examples:
   CHILES, Col Walter II 1630-1671; John 1679-1723VA; dau Valentine 1719 Caroline Co, VA
   FISHER, George b c1742 PA; RevWar; descendants to Pendleton; then Harrison/Lewis Co, (W)VA
   HAMM, Stephen b c1737 Amherst Co, VA (on Stovall Creek) early 1700s through RevWar

8. Develop a Standard Message – This is a message you’ll send to all your Matches. It’s good to have a Standard Message (which you can tweak over time). You can just copy and paste it into an email or messaging system. This saves a lot of time. After this initial effort to contact each Match, you’ll want to personalize follow up messages.

Your message should include your real name and email, perhaps a very brief introduction, a link to your Tree of Ancestors, and a request that your Match share their Tree with you.

An example (revise to suit your style):
Hi, I’m Jim Bartlett. I’ve been a genealogist since 1974. Most of my ancestry is from Colonial Virginia with one grandparent’s ancestry from Scotland and Germany in the mid-1800s. I’m willing to share my documentation. My goals include validating my ancestral lines and working through brick walls using DNA. My Public Tree: http://trees.ancestry.com/tree/20620230/family (I can send invite). Please share your Tree (best), pedigree, list of Patriarchs or surnames.

Ask if you have questions – I teach DNA for genealogists; see my atDNA How to Succeed list at: http://boards.rootsweb.com/topics.dnaresearch.autosomal/301/mb.ashx it has some good links at the end; one on Triangulation! Also see my blog: http://www.segmentology.org

Hope to hear from you, Jim Bartlett jim4bartletts@verizon.net

I have modified my introductory message many times – I’m now on version 23. And you can add in anything, whenever it is appropriate. An example would be something about a particular surname or location you see in a Match’s profile. The “boilerplate” is in your standard message, but you can modify it any time you want.

This blogpost will get you started, and let you order a test with some knowledge of what’s involved. The next blogpost in this series will be:

Getting Started with Autosomal DNA Part II

AFTER YOUR RESULTS ARE POSTED

 

01A Segment-ology: Getting Started with Autosomal DNA Part I – by Jim Bartlett 20151122

* https://groups.yahoo.com/neo/groups/DNAAdoption/info

Proof of Sticky Segments

Well… I should use “proof” in quotes, but the simulations below should show that we will always have some “sticky” segments which survive many generations. Technically, I suppose, a “sticky segment” is one that is passed from one ancestor to another intact. However, “sticky” is usually used in the sense of segments which pass through many generations, intact.

Here are the ground-rules I used for this analysis:

  1. Use a 200cM chromosome (about the size of chromosome 5, 6 or 7)
  2. Assume 2 crossovers per generation. By definition, there will be an average of one crossover in 100cM per generation. So, on average, there will be two crossovers per generation in 200cM. Sometimes there are one or three crossovers; and infrequently there are none or four (or more) crossovers. Since it more or less evens out, I’m using the average of two crossovers per generation to illustrate what happens over 10-20 generations. This avoids any bias on my part.
  3. Use a simplification. Each time there is a crossover, there is a switch from one ancestor’s DNA to another’s. However, the other ancestor’s DNA is subjected to the same one-crossover-per-100cM rule. So for the purposes of this discussion [about how segments are subdivided by crossovers, and not about which ancestor they come from], I will use a simplification: I’ll assume that the other ancestor’s DNA is exactly like the first ancestor’s DNA at that location, as far as crossovers are concerned. This means I’ll just continue to subdivide the segments in the initial 200cM segment, generation after generation. It’s a whole lot simpler and easier than starting with 2,048 chromosomes of different colors and keeping track of those. This simplification will give essentially the same result of segment subdivision, and is much easier to follow.
  4. Assign crossovers to the middle of the largest segments. The DNA is very random, but to A) avoid any bias on my part, and B) show the worst case scenario, I will assign the two crossovers in each generation to the middle of each of the two longest segments in the 200cM chromosome. Of course, in real life some crossovers will subdivide a small segment (leaving a different, larger segment intact for another generation); or subdivide a large segment into very unequal parts (leaving one smaller segment, plus a somewhat larger segment – it all evens out in the average).
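For readers who like to tinker, here is a minimal Python sketch of the four ground-rules above. It is only an illustration, not the method used to draw the figure: the function names (`halve_largest`, `simulate_halving`) are made up for this example, and it assumes the very first generation cuts the whole chromosome into three equal pieces, that ties between equal-sized segments go to the leftmost one, and that sizes are kept as fractions of a cM rather than rounded. Each segment carries a counter of how many generations it has been passed down intact since the crossover that formed it.

```python
def halve_largest(segments, crossovers=2):
    """Apply one generation: split the `crossovers` largest segments in half.

    `segments` is a list of (size_cm, generations_intact) tuples,
    ordered left to right along the chromosome.
    """
    if len(segments) < crossovers:
        # Assumed special case for the first generation: the single whole
        # chromosome is cut into roughly equal pieces.
        total = sum(size for size, _ in segments)
        pieces = crossovers + 1
        return [(total / pieces, 0)] * pieces
    # Indices of the largest segments (ties go to the leftmost segment).
    order = sorted(range(len(segments)), key=lambda i: -segments[i][0])
    to_split = set(order[:crossovers])
    result = []
    for i, (size, age) in enumerate(segments):
        if i in to_split:
            # A crossover in the middle creates two new, "young" segments.
            result.extend([(size / 2, 0), (size / 2, 0)])
        else:
            # Untouched segments are passed down intact one more generation.
            result.append((size, age + 1))
    return result


def simulate_halving(length_cm=200.0, crossovers=2, generations=24):
    """Return the list of segments after each generation."""
    segments = [(length_cm, 0)]
    history = []
    for _ in range(generations):
        segments = halve_largest(segments, crossovers)
        history.append(segments)
    return history


# Print, per generation: segment count, how many are at or above 7cM,
# and the longest run of generations any segment has stayed intact.
for gen, segs in enumerate(simulate_halving(generations=12), start=1):
    over = sum(size >= 7.0 for size, _ in segs)
    longest_run = max(age for _, age in segs)
    print(gen, len(segs), over, longest_run)
```

Because of the assumptions noted above, the sizes will not match the figure to the cM, but the pattern of a few segments being skipped generation after generation should be the same.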

So apply these rules to the 200cM chromosome in the figure below.

Figure 1. Tracking Crossovers for 24 Generations

07C Fig 1 Proof of Sticky Segment

Note the first two crossovers subdivide the segment into roughly three equal segments. As outlined in Rule 3 above, the center segment will really be from a different ancestor, but for the purpose of following the subdivision of segments in general, we will continue with all the segments – on average this gives the same result as far as subdivided segment sizes go. Note: the segments that are subdivided are highlighted in yellow.

In the next generation two of the largest segments are subdivided, but the third fairly large segment remains intact. This is because we only have (on average) two crossovers per generation, so one of the three large segments in generation 2 will not be subdivided.

Moving to generation 3 we see the largest (66cM) segment is now subdivided by one of the two new crossovers for this generation, and then a 33cM segment is subdivided by the other crossover for this generation.  And we still have two 33cM segments and one 34cM segment passed intact.

Continue this process of subdividing the two largest segments in half in each generation, until we get to generation 11, where all the segments are now 8 or 9cM except one that is still 16cM. This 16cM segment has remained intact for 6 generations! In fact there are two other 16cM segments that remained intact for 6 generations.

Will it always happen this way? No – the DNA is very random, and a different result will happen every time. But if one of those “sticky” segments had been subdivided in an earlier generation, some other segment would not have been subdivided, and that segment would have remained “sticky” for another generation.

The takeaway here is that there are, on average, only two crossovers per generation in a chromosome of about 200cM. Those two crossovers can only subdivide two of the many segments in the chromosome. And because of this, there are some segments that will pass down intact, generation after generation.
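To see how this plays out when the crossovers are not placed in the middle of the largest segments, here is a rough Python sketch that drops two crossovers per generation at random spots along the 200cM chromosome. It keeps the same simplification as Rule 3 (one chromosome that just keeps getting subdivided). The function name `simulate_random`, the uniform random placement, and the fixed seed are all assumptions made for this illustration; real crossovers are not evenly spread.

```python
import random


def simulate_random(length_cm=200.0, crossovers=2, generations=11, seed=1):
    """Place crossovers uniformly at random along one chromosome.

    Returns a list of (start_cm, end_cm, generations_intact) tuples.
    """
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    segments = [(0.0, length_cm, 0)]
    for _ in range(generations):
        # Every existing segment has been passed down one more generation.
        segments = [(s, e, age + 1) for s, e, age in segments]
        for _ in range(crossovers):
            cut = rng.uniform(0.0, length_cm)
            for i, (s, e, age) in enumerate(segments):
                if s < cut < e:
                    # The crossover resets the "intact" counter for both pieces.
                    segments[i:i + 1] = [(s, cut, 0), (cut, e, 0)]
                    break
    return segments


segments = simulate_random()
over_threshold = [seg for seg in segments if seg[1] - seg[0] >= 7.0]
oldest = max(segments, key=lambda seg: seg[2])
print(len(segments), "segments;", len(over_threshold), "are 7cM or larger")
print("longest-surviving segment: %.1fcM, intact for %d generations"
      % (oldest[1] - oldest[0], oldest[2]))
```

Changing the `seed` gives a different chromosome every time, which is the point: the details change, but some segment is always left untouched.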

In actual practice, there are hotspots on each chromosome where crossovers are more likely to occur. The effect of these hotspots is that some smaller segments around them will be subdivided more frequently, and some other segments will be missed more frequently – leaving us with even more “sticky” segments elsewhere.

After generation 11, our process starts to subdivide some segments into segments so small that they would not show up as a shared segment – they are below the standard thresholds (about 7cM) for a match. But notice in generations 12 through 22, that there are still above-threshold segments. Even in generation 23 there is still an 8cM segment which has survived intact for 14 generations! Remember: if this particular segment had been subdivided, then some other segment would not have been subdivided.

The point is we should expect “sticky” ancestral segments, particularly in the 7-10cM range. They are actually quite common. Even “sticky” segments in the 10-20cM range are usual, even after 7-10 generations.

Now, we have not studied the probability of matches at these great distances – that’s a different, somewhat harder, discussion. The point here is that there are many “sticky” segments in our DNA. They may come from generations that are generally way beyond our genealogies. Also we should not be surprised when we see a parent and child with essentially the same 7-10cM segment being shared with a Match. It happens pretty frequently with close relatives.

Here is another example based on a 100cM chromosome (think chromosomes 19-22). For this chromosome there is only an average of one crossover per generation – only one segment will be subdivided in each generation. I tried to place them more randomly this time [you can easily try your own pencil & paper sketch of this simulation]. I generally picked on the largest segment. After 11 generations over half of the segments are still over the (7cM) threshold. And several segments 10cM or over have survived, intact, for a number of generations.
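If you would rather let a computer do the pencil and paper work, the sketch below repeats the 100cM exercise many times and averages the share of segments that end up at or above the threshold. It is only a rough illustration under stated assumptions: the function name `fraction_over_threshold` is made up, and each crossover is placed uniformly at random, which is not the same as my hand placement that generally picked on the largest segment. So the share it reports need not match the figure.

```python
import random


def fraction_over_threshold(length_cm=100.0, generations=11,
                            threshold_cm=7.0, trials=2000, seed=1):
    """Average share of segments at or above the threshold.

    Assumes exactly one uniformly placed crossover per generation.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # One crossover per generation, anywhere along the chromosome.
        cuts = sorted(rng.uniform(0.0, length_cm) for _ in range(generations))
        edges = [0.0] + cuts + [length_cm]
        sizes = [b - a for a, b in zip(edges, edges[1:])]
        total += sum(size >= threshold_cm for size in sizes) / len(sizes)
    return total / trials


print("share of segments at or above 7cM: %.2f" % fraction_over_threshold())
```

Try different values for `generations` and `threshold_cm` to see how quickly (or slowly) the above-threshold segments disappear.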

Figure 2. Tracking Crossovers in a 100cM Chromosome

07C Fig 2 Tracking Crossovers in 100cM

 

Summary

  1. Ancestral “sticky” segments in the 7-10cM range are normal. We will have Matches with these segments from time to time – and some of them may be fairly distant. But that’s another story.
  2. Some segments over 10cM will survive from 9 or 10 or more generations back – it’s normal and expected. Again, matching is a different calculation.
  3. The point is: we have many above-threshold segments from distant ancestors, back 10 generations and more!
  4. Since we have many above-threshold ancestral segments from distant ancestors, on every chromosome, we should expect to have shared segments with distant cousins.

 

07C Segment-ology: Proof of Sticky Segments by Jim Bartlett 20151116