Getting Started with Autosomal DNA Part I

So you are thinking about getting an autosomal (atDNA) test, but are not sure where to start. This blog post will walk you through several steps to help get you started.

An atDNA test will result in a list of Matches based on shared DNA. Almost all of these Matches are your cousins – most will be about 5th to 8th cousins, with some who are closer and some who are more distant. The DNA test will give you this list, and a way to contact your Matches; it’s up to you to share information with your Matches and determine your Common Ancestor(s).

BEFORE YOU TEST – UNDERSTANDING

  1. Determine your objectives. Write your own or choose from these:

A. ___Find new cousins
B. ___Prove your ancestral lines
C. ___Break down brick walls
D. ___Find biological parent(s) of yourself or some ancestor [see also DNAAdoption*]
E. ___Find out your deep ancestry
F. ___Form working groups of your Matches by Ancestor [Triangulated Groups]
G. ___Determine which ancestor provided each part of your DNA [Chromosome Mapping]
H. ___Other ______________________________________
2. What to expect from your atDNA results.
Work. Your results will include a list of Matches – people who match your DNA. In general these Matches will be cousins. Generally very few will be close cousins (1st or 2nd cousins) – the bulk of them will be 5th to 8th cousins, or more. Some will have a Tree or Pedigree or list of surnames posted, but many will not. In general you will need to contact your Matches to determine your Common Ancestor. Adoptees and people with close brick walls will need to compare a lot of information from their Matches to develop common threads, and likely relationships. You need to be involved – your Tree is not magically filled out for you.
Ethnicity is a broad estimate. Your results will also include some estimates of your ethnicity or geographic ancestry. Since you only got part of each ancestor’s DNA, these estimates are generally correct, but not very precise.
Maybe unexpected results. When you take a DNA test and are compared with everyone else who has done the same, there is always the potential for a surprise. You may have an ancestor who is not the biological child of the parents you thought they were.
Genetic Genealogy Standards. This document is highly recommended for more information about DNA testing: http://www.geneticgenealogystandards.com/
DNA is just a tool – a very powerful tool. You use it as part of your genealogy research, not in place of genealogy. A DNA test by itself cannot create your pedigree.

3. Understand the three types of DNA.
Y-DNA used to study an all-male line – the Y-DNA is passed from fathers to sons.
mtDNA used to study an all-female line – the mtDNA is passed from mothers to their children (sons and daughters).
atDNA used to study all of your ancestry – the atDNA is passed from male and female to their children. This test results in Matches from all of your ancestry. The Matches are your cousins – much more on this later. The atDNA does not include any Y-DNA or mtDNA, although Matches could just as easily be cousins from an all-male-line or all-female-line ancestor, as from any other line.

This post is all about atDNA

4. There are two fundamental levels for using atDNA:
Level 1. Genealogy only. At this level you just accept the list of Matches as your cousins, correspond and share with them to determine how you are related. You may often find that you are related several ways. This is plain and simple genealogy. Think of the DNA test as a filter that separates out only people who are related to you.
Level 2. Using the DNA. This level requires some amount of knowledge about how DNA works – how you got it, how your Match got it, what matching means and doesn’t mean, and some amount of jargon. Much more on all of this later. But if it gets too complex, or if you need a breather, just fall back on Level 1, and work with your Matches.

5. Select the company for your test. The base price is $99 at each of the three companies (Family Tree DNA, AncestryDNA and 23andMe), and they have $10-$20 reductions several times a year. Each company offers a different mix of features. See http://www.isogg.org/wiki/Autosomal_DNA_testing_comparison_chart for a comprehensive and unbiased comparison matrix. Many folks have their favorite companies for different reasons – but this is my blog, so my thoughts include:

A. All three companies display a list of your Matches (people who share some DNA with you and are your cousins, in most cases), and a way to communicate with them. You can and should upload a GEDCOM of your Ancestry to each site. They also offer an estimated relationship range for each Match – a range of relatedness (e.g. 3rd to 5th cousins). They show some ethnicity/geographic estimates (none are very precise because you only get part of each ancestor’s DNA). They also give you the ability to download your raw DNA data.

B. Family Tree DNA (FTDNA) – I think this is the best all around. Almost all Matches are listed with real names, emails, and all DNA segment data – upfront and easily downloaded to a spreadsheet which you can use or print out. If you have Colonial American ancestry, you’ll probably get more than a thousand Matches. They also store your DNA and offer a range of other DNA tests. If you have elderly relatives, and want to preserve their DNA for future tests, this is the best site. The main drawback is the limited ability to compare your Matches with each other – this is mostly overcome with their InCommonWith utility. A good site for all the objectives above.

C. AncestryDNA – If you have Colonial American ancestry, you’ll probably get several thousand Matches. Some Matches have good Trees, some have small or no Trees; and some have Private Trees. To communicate with Matches you must use the Ancestry messaging system. Several of Ancestry’s key features include a Hint system which highlights ancestors in your Tree and your Match’s Tree which are the same, based on genealogy. They also provide a Shared Matches feature based on shared DNA; but they don’t provide any DNA segment data, which is essential for objectives B, D, F and G above. A good site for finding cousins, a poor site for working with DNA. Use this site if you don’t want to learn about DNA.

D. 23andMe – has the largest database (over one million customers, but they only list your top 2,000 Matches). There are no emails posted, so you have to use their messaging system to communicate. They have a utility to compare kits to each other, which is a key feature. Their Tree system is the hardest to use. A good site for all of the objectives above.

E. GEDmatch – is a third party site (free, but donations are encouraged) – you can upload your raw DNA data file from any of the above companies to GEDmatch, and compare among Matches who tested at the three companies above. They list the top 1,500 Matches in a One-to-Many utility; and let you compare any two Matches One-to-One. A great suite of other utilities, including Triangulation and several ethnicity/geography programs. I encourage you to upload to and use GEDmatch. You’ll get more Matches and DNA data.

F. My strong recommendation is to test at all 3 companies – each has a different database of potential Matches, and each offers different features. To save some money, you can test at AncestryDNA during a sale, and then upload (copy) your DNA data to FTDNA for $39, upload to GEDmatch (free), and also test at 23andMe to maximize your chances of finding good, close cousins. Balance this plan against your budget and your desire to test other close relatives (also recommended).

BEFORE YOUR RESULTS ARE POSTED

6. Develop a robust Tree of your Ancestors. By robust, I mean include as many Ancestors as you can, with place and date information, out to 12 generations or so. This is your “bait” when “fishing” for cousins. The atDNA test tells you a Match shares DNA with you – that they are probably a cousin. You have to compare ancestors with the Match to determine your Common Ancestor. If you both don’t have the Common Ancestor in your Tree, it’s very hard to find it. Most of your DNA Matches will be in the 5th to 8th cousin range, some more distant. You need a Tree that includes as much of your Ancestry as possible back to at least 10th cousin range, wherever possible. Few have actually done the detailed research to “prove” all of their ancestors back that far. My recommendation is to “borrow” from the research and Trees of others to fill out your own Tree as much as possible. I’d go so far as to also include “iffy” Ancestors at the tips of your Tree – ones you may not have researched or proved – these are better than blank spaces in your Tree. The objective here is to identify potential Common Ancestors. Then you and your Match (now a potential cousin) can compare notes to see how much documentation you each have.
Create a GEDCOM of your robust Tree and upload it to each site where you’ve tested (FTDNA, AncestryDNA, and/or 23andMe); to GEDmatch; and/or to WorldConnect, WikiTree, FamilySearch, etc.

7. Develop a list of Patriarchs [optional, but very helpful] – make an alphabetical list of your ancestral surnames. Then add the most recognizable Patriarch (or Matriarch, if there is no Patriarch) with years and places. Keep each surname/Patriarch to one line, if possible. Some examples:
   CHILES, Col Walter II 1630-1671; John 1679-1723VA; dau Valentine 1719 Caroline Co, VA
   FISHER, George b c1742 PA; RevWar; descendants to Pendleton; then Harrison/Lewis Co, (W)VA
   HAMM, Stephen b c1737 Amherst Co, VA (on Stovall Creek) early 1700s through RevWar

8. Develop a Standard Message – This is a message you’ll send to all your Matches. It’s good to have a Standard Message (which you can tweak over time). You can just copy and paste it into an email or messaging system. This saves a lot of time. After this initial effort to contact each Match, you’ll want to personalize follow up messages.

Your message should include your real name and email; perhaps a very brief introduction; a link to your Tree of Ancestors; and a request that your Match share their Tree with you.

An example (revise to suit your style):
Hi, I’m Jim Bartlett. I’ve been a genealogist since 1974. Most of my ancestry is from Colonial Virginia with one grandparent’s ancestry from Scotland and Germany in the mid-1800s. I’m willing to share my documentation. My goals include validating my ancestral lines and working through brick walls using DNA. My Public Tree: http://trees.ancestry.com/tree/20620230/family (I can send invite). Please share your Tree (best), pedigree, list of Patriarchs or surnames.

Ask if you have questions – I teach DNA for genealogists; see my atDNA How to Succeed list at: http://boards.rootsweb.com/topics.dnaresearch.autosomal/301/mb.ashx it has some good links at the end; one on Triangulation! Also see my blog: http://www.segmentology.org

Hope to hear from you, Jim Bartlett jim4bartletts@verizon.net

I have modified my introductory message many times – I’m now on version 23. And you can add in anything, whenever it is appropriate. An example would be something about a particular surname or location you see in a Match’s profile. The “boilerplate” is in your standard message, but you can modify it any time you want.

This blogpost will get you started, and let you order a test with some knowledge of what’s involved. The next blogpost in this series will be:

Getting Started with Autosomal DNA Part II

AFTER YOUR RESULTS ARE POSTED

 

01A Segment-ology: Getting Started with Autosomal DNA Part I – by Jim Bartlett 20151122

* https://groups.yahoo.com/neo/groups/DNAAdoption/info

Proof of Sticky Segments

Well… I should use “proof” in quotes, but the simulations below should show that we will always have some “sticky” segments which survive many generations. Technically, I suppose, a “sticky segment” is one that is passed from one ancestor to another intact. However, “sticky” is usually used in the sense of segments which pass through many generations, intact.

Here are the ground-rules I used for this analysis:

  1. Use a 200cM chromosome (about the size of chromosome 5, 6 or 7)
  2. Assume 2 crossovers per generation. By definition, there will be an average of one crossover in 100cM per generation. So, on average, there will be two crossovers per generation in 200cM. Sometimes there are one or three crossovers; and infrequently there are none or four (or more) crossovers. Since it more or less evens out, I’m using the average of two crossovers per generation to illustrate what happens over 10-20 generations. This avoids any bias on my part.
  3. Use a simplification. Each time there is a crossover, there is a switch from one ancestor’s DNA to another’s. However, the other ancestor’s DNA is subjected to the same one-crossover-per-100cM rule. So for the purposes of this discussion [about how segments are subdivided by crossovers, and not about which ancestor they come from], I will use a simplification: I’ll assume that the other ancestor’s DNA is exactly like the first ancestor’s DNA at that location, as far as crossovers are concerned. This means I’ll just continue to subdivide the segments in the initial 200cM segment, generation after generation. It’s a whole lot simpler and easier than starting with 2,048 chromosomes of different colors and keeping track of those. This simplification will give essentially the same result of segment subdivision, and is much easier to follow.
  4. Assign crossovers to the middle of the largest segments. The DNA is very random, but to A) avoid any bias on my part, and B) show the worst case scenario, I will assign the two crossovers in each generation to the middle of each of the two longest segments in the 200cM chromosome. Of course, in real life some crossovers will subdivide a small segment (leaving a different, larger segment intact for another generation); or subdivide a large segment into very unequal parts (leaving one smaller segment, plus a somewhat larger segment – it all evens out in the average).

So apply these rules to the 200cM chromosome in the figure below.

Figure 1. Tracking Crossovers for 24 Generations

07C Fig 1 Proof of Sticky Segment

Note the first two crossovers subdivide the segment into roughly three equal segments. As outlined in Rule 3 above, the center segment will really be from a different ancestor, but for the purpose of following the subdivision of segments in general, we will continue with all the segments – on average this gives the same result as far as subdivided segment sizes go. Note: the segments which are subdivided are highlighted in yellow.

In the next generation two of the largest segments are subdivided, but the third fairly large segment remains intact. This is because we only have (on average) two crossovers per generation, so one of the three large segments in generation 2 will not be subdivided.

Moving to generation 3 we see the largest (66cM) segment is now subdivided by one of the two new crossovers for this generation, and then a 33cM segment is subdivided by the other crossover for this generation.  And we still have two 33cM segments and one 34cM segment passed intact.

Continue this process of subdividing the two largest segments in half in each generation, until we get to generation 11, where all the segments are now 8 or 9cM except one that is still 16cM. This 16cM segment has remained intact for 6 generations! In fact there are two other 16cM segments that remained intact for 6 generations.

Will it always happen this way? No – the DNA is very random, and a different result will happen every time. But if one of those “sticky” segments had been subdivided in an earlier generation, some other segment would not have been subdivided, and that segment would have remained “sticky” for another generation.

The takeaway here is that there are only two crossovers per generation in a chromosome about 200cM. Those two crossovers can only subdivide two of the many segments in the chromosome. And because of this, there are some segments that will pass down intact, generation after generation.
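The pencil-and-paper process above can also be run as a short script. This is only a minimal sketch of the ground rules, not the spreadsheet behind Figure 1: the function name and parameters are made up for illustration, and it assumes the first generation cuts the whole chromosome into three equal parts and that every later crossover lands exactly in the middle of one of the two longest segments.

```python
from fractions import Fraction


def simulate_subdivision(length_cm=200, crossovers=2, generations=11):
    """Return a list of (size_cM, generation_created) pairs."""
    segments = [(Fraction(length_cm), 0)]
    for gen in range(1, generations + 1):
        # Longest segments first; ties keep their existing order (stable sort).
        segments.sort(key=lambda seg: seg[0], reverse=True)
        if len(segments) == 1 and crossovers > 1:
            # Assumption: the first generation cuts the whole chromosome
            # into equal parts (three parts for two crossovers).
            size = segments[0][0] / (crossovers + 1)
            segments = [(size, gen)] * (crossovers + 1)
            continue
        targets, untouched = segments[:crossovers], segments[crossovers:]
        halves = []
        for size, _ in targets:
            # Rule 4: each crossover goes in the middle of a longest segment.
            halves += [(size / 2, gen), (size / 2, gen)]
        segments = untouched + halves
    return segments


result = simulate_subdivision()
sizes = sorted((float(size) for size, _ in result), reverse=True)
print(len(sizes), "segments; largest is", round(sizes[0], 1), "cM")
print(sum(1 for s in sizes if s >= 7), "segments at or above 7 cM")
```

With the defaults (200cM, two crossovers, 11 generations) this ends with 23 segments: one of about 16.7cM and 22 of about 8.3cM, which lines up with the "8 or 9cM except one that is still 16cM" description of generation 11. Raising `generations` shows how slowly the last above-threshold segments disappear.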

In actual practice, there are hot spots on each chromosome where crossovers are more likely to occur. The effect of these hot spots is that some smaller segments around the hot spots will be subdivided more frequently, and some other segments will be missed more frequently – leaving us with even more “sticky” segments elsewhere.

After generation 11, our process starts to subdivide some segments into segments so small that they would not show up as a shared segment – they are below the standard thresholds (about 7cM) for a match. But notice in generations 12 through 22, that there are still above-threshold segments. Even in generation 23 there is still an 8cM segment which has survived intact for 14 generations! Remember: if this particular segment had been subdivided, then some other segment would have not been subdivided.

The point is we should expect “sticky” ancestral segments. Particularly in the 7-10cM range. They are actually quite common. Even “sticky” segments in the 10-20cM range are usual, even after 7-10 generations.

Now, we have not studied the probability of matches at these great distances – that’s a different, somewhat harder, discussion. The point here is that there are many “sticky” segments in our DNA. They may come from generations that are generally way beyond our genealogies. Also we should not be surprised when we see a parent and child with essentially the same 7-10cM segment being shared with a Match. It happens pretty frequently with close relatives.

Here is another example based on a 100cM chromosome (think chromosomes 19-22). For this chromosome there is only an average of one crossover per generation – only one segment will be subdivided in each generation. I tried to place them more randomly this time [you can easily try your own pencil & paper sketch of this simulation]. I generally picked on the largest segment. After 11 generations over half of the segments are still over the (7cM) threshold. And several segments 10cM or over have survived, intact, for a number of generations.

07C Fig 2 Tracking Crossovers in 100cM

Figure 2. Tracking Crossovers in a 100cM Chromosome.
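If you would rather let a computer place the crossovers than sketch them by hand, something like the following would do it. It is a rough sketch under stated assumptions: exactly one crossover per generation (the average used above, not a random count), positions chosen uniformly along the chromosome (so no hot spots), and the same simplification as Rule 3, where only the breakpoints are tracked and not which ancestor each piece came from. The function and parameter names are hypothetical.

```python
import random


def random_subdivision(length_cm=100.0, generations=11,
                       crossovers_per_gen=1, seed=None):
    """Place crossovers at random; return (size_cM, intact_since) pairs."""
    rng = random.Random(seed)
    # Map each breakpoint position (in cM) to the generation that created it.
    breakpoints = {0.0: 0, float(length_cm): 0}
    for gen in range(1, generations + 1):
        for _ in range(crossovers_per_gen):
            position = rng.uniform(0.0, length_cm)
            breakpoints.setdefault(position, gen)
    positions = sorted(breakpoints)
    segments = []
    for left, right in zip(positions, positions[1:]):
        # A segment has been intact since its most recent edge was cut.
        intact_since = max(breakpoints[left], breakpoints[right])
        segments.append((right - left, intact_since))
    return segments


segments = random_subdivision(seed=1)
above = [seg for seg in segments if seg[0] >= 7]
print(len(above), "of", len(segments), "segments are at or above 7 cM")
for size, since in sorted(above, reverse=True):
    print(f"{size:5.1f} cM, intact since generation {since}")
```

Changing `seed` gives a different chromosome each time, which is the point: the details move around, but there are only 11 cuts to spread over 100cM, so some segments are always left alone for several generations.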

 

Summary

  1. Ancestral “sticky” segments in the 7-10cM range are normal. We will have Matches with these segments from time to time – and some of them may be fairly distant. But that’s another story.
  2. Some segments over 10cM will survive from 9 or 10 or more generations back – it’s normal and expected. Again, matching is a different calculation.
  3. The point is: we have many above-threshold segments from distant ancestors, back 10 generations and more!
  4. Since we have many above-threshold ancestral segments from distant ancestors, on every chromosome, we should expect to have shared segments with distant cousins.

 

07C Segment-ology: Proof of Sticky Segments by Jim Bartlett 20151116

Segment Size vs Cousinship Chart Needed

We need a one-page chart that shows the empirical cM values found for various relationships. We know the theoretical, or calculated values, but the randomness of DNA results in a fairly wide range in some cases – particularly for distant cousins.

The chart below shows my guess as to what a chart might look like. The x-axis is cMs on a logarithmic scale. The y-axis is % of all the values for each cousinship (the number of results at each cM value divided by the total number of results for that cousinship – which would normalize the chart for different total number of results for each cousinship). The area under each curve would be 100% of all results. The roughly normal distribution curves are “centered” on the calculated cM values for each cousinship. Based on experience we know that first cousins (1C) tend to share segments with cM values relatively close to the calculated value of 880cMs, producing a tall thin curve (I think); whereas 5C (calculated average 3.4cM) or 6C (calculated average 0.8cM) must have long cM “tails” on this chart in order for us to “see” the shared segment with Matches which are above a 7cM threshold, producing a short wide curve (I think).
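For reference, the calculated averages that the curves would be centered on can be reproduced in a few lines. This sketch assumes the 880cM first cousin figure quoted above and the usual rule that the expected shared amount drops by a factor of four for each further degree of (full) cousinship; the function name is just for illustration.

```python
FIRST_COUSIN_CM = 880.0  # calculated 1C value quoted in the text


def expected_shared_cm(degree):
    """Calculated average shared cM for full cousins (1 = 1C, 2 = 2C, ...)."""
    if degree < 1:
        raise ValueError("degree must be 1 or greater")
    # Each extra degree adds one generation on both sides: 1/2 * 1/2 = 1/4.
    return FIRST_COUSIN_CM / 4 ** (degree - 1)


for degree in range(1, 7):
    print(f"{degree}C: {expected_shared_cm(degree):.2f} cM")
```

These are only the centers of the curves. By 5C and 6C the calculated average is already well below a 7cM threshold, which is why the long right-hand tails matter so much on the proposed chart.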

Note in this hypothetical chart, the small red dots at the end of some tails were taken from the data compiled by Blaine Bettinger (who did a great service to us all by compiling and reporting this data), which can be found at:

The Shared cM Project – An Update

We need this data displayed this way so we can easily enter with a shared cM value on the x-axis and see the range of cousinships possible. This would quickly show which cousinship is most probable, and how close, or far, other cousinships would be.

As I now think about it, at any cM value on the x-axis, wouldn’t the sum of the values of all the curves have to equal 100%? But to achieve that, we’d have to include all the possible curves, including siblings, half siblings, half double second cousins twice removed, etc., which is probably impractical at this point. I’d rather see the chart soon with the cousinships shown below, than wait a long time for the perfect chart.

Another thought is to blow up the part of the chart from, say, 5cM to 50cM. This would be fairly simple once the data is collected.

Still another observation is that if this chart were based on all collected data, data based on endogamous shared segments would generally be shifted a little more to the right; and data based on non-endogamous shared segments would generally be shifted a little more to the left.

BE CAREFUL – THE CHART BELOW IS A THEORETICAL GUESS (with only a few valid data points)

06B Figure cM vs pct for Cousins 1

06B Segment-ology: Segment Size vs Cousinship Chart Needed;  Jim Bartlett 20151106

Does Triangulation Work?

Sure it does! Triangulation is a tool to use with autosomal DNA. Let’s see how it might work:

  1. Does it work in grouping your shared segments?
  2. Does it work in culling out IBS segments?
  3. Does it work to define and map your ancestral segments?
  4. Does it work to ensure that all Matches in a Triangulated Group have an IBD segment?
  5. Does it work in identifying Matches who all share the same Common Ancestor?
  6. Does it work for any size segments? – see more at: Does Triangulation Always Work?

The Big Picture

Let’s start with the Big Picture. We take an atDNA test and the company reports a list of our Matches. We can also get Matches by uploading our raw DNA data to GEDmatch. Each of the companies compares our raw DNA data to that of all the others in their database, and uses their proprietary matching algorithm to generate a list of Matches. At 23andMe, FTDNA and GEDmatch, they also provide the shared segment information (Chromosome, Start Location, End Location, cMs, and SNPs) for each shared segment. For this discussion I’m only going to be talking about segments over 7cM, just to avoid any debate about smaller segments. Each of the companies has pluses and minuses that go along with their matching algorithm, but we are going to go with the list of Matches they provide to us.

So this is the data we want to work with using the Triangulation tool.

Ancestral vs Shared segments

Please re-read “What Is a Segment?” to recall there are ancestral segments – ones you get from an ancestor – located completely on one of your chromosomes; and there are shared segments – ones that the computer algorithm determines by comparing data on both your chromosomes with data on both chromosomes of another person.

Shared segments are either IBD or not-IBD (IBS)

Most of these shared segments are IBD – meaning they come from a Common Ancestor – common to you and your Match. Some of the shared segments are IBS – meaning they don’t come from a Common Ancestor; they are segments made up by the computer algorithm. We cannot tell which is which by just looking at the one shared segment. ISOGG has a very good wiki article about IBD and non-IBD (IBS) segments. The bottom lines are:

  1. Shared segments (also called matching segments or Half-Identical Regions (HIRs)) 15cM and greater are IBD virtually 100% of the time.
  2. Shared segments under 5cMs should generally not be used in genealogical analyses [and in this post we are not considering shared segments under 7cM].

So for this blog post we will focus on shared segments from 7cM to 15cM as reported by the companies. And note that each of these segments is either IBD or IBS.

Triangulation Criteria

For Triangulation we find three sets of shared segments which match each other. This usually means you and two Matches have shared segments which overlap at least 7cM, AND the two Matches share a segment which overlaps the same area at least 7cM. This means all three of you have the same, long string of SNPs in the same location. This is Triangulation.
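The overlap test described above can be written down as a small check. This is a sketch only, with made-up names, and it leans on one big assumption: the start and end of each shared segment are expressed as cM positions along the chromosome. The companies actually report Start and End Locations as base-pair positions plus a cM length, so real data would need converting first. It also says nothing about whether the segments are on your maternal or paternal side; it only checks the three-way overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedSegment:
    chromosome: int
    start_cm: float  # hypothetical: position in cM, not base pairs
    end_cm: float


def overlap_cm(a, b):
    """Length in cM of the region the two segments have in common."""
    if a.chromosome != b.chromosome:
        return 0.0
    return max(0.0, min(a.end_cm, b.end_cm) - max(a.start_cm, b.start_cm))


def triangulates(me_match1, me_match2, match1_match2, min_cm=7.0):
    """True if all three shared segments cover a common stretch of min_cm."""
    if overlap_cm(me_match1, me_match2) < min_cm:
        return False
    # The stretch where my segments with Match1 and Match2 overlap.
    common = SharedSegment(
        me_match1.chromosome,
        max(me_match1.start_cm, me_match2.start_cm),
        min(me_match1.end_cm, me_match2.end_cm),
    )
    # The two Matches must also share a segment over that same stretch.
    return overlap_cm(common, match1_match2) >= min_cm


me_m1 = SharedSegment(5, 10.0, 30.0)
me_m2 = SharedSegment(5, 15.0, 28.0)
print(triangulates(me_m1, me_m2, SharedSegment(5, 14.0, 26.0)))
print(triangulates(me_m1, me_m2, SharedSegment(5, 40.0, 55.0)))
```

In the first call all three segments share the stretch from 15 to 26 on chromosome 5, so the three Triangulate; in the second the two Matches share a segment somewhere else, so they do not.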

Usually Triangulated Groups (TGs) include more than just you and two Matches. They may include 5, 10, 20 or more Matches. Each TG includes all of the shared segments, and these triangulated segments determine the start and end locations of the TG, such that the TG includes them all.

My Experience with Triangulated Groups

I have over 5,000 different Matches in my spreadsheet, with perhaps 6,000 separate shared segments over 7cM. As a result of the Triangulation process, these shared segments have been placed into 4 categories:

  1. A Triangulated Group on my Dad’s side.
  2. A Triangulated Group on my Mom’s side.
  3. An IBS group (these segments overlap, but do not match, TGs on either side)
  4. Undetermined as yet

The TGs above cover 90% of my 45 chromosomes, and define 340 separate TG segments on my DNA. Most of the TGs are heel-and-toe (adjacent) to each other on each chromosome, with only a few gaps. All of my shared segments either “fit” into (overlap within) one of these TGs or they are IBS (or they are undetermined).

TGs Form a Chromosome Map

The key point here is that these TGs map my chromosomes into specific segments. Each of these specific segments comes from an Ancestor. Similarly your chromosomes are divided into specific segments, defined by crossover points from each generation. Re-read Bottom-up and Top-Down for a refresher on how crossover points and segments are formed. Within each such ancestral segment, defined by start and end locations, each of us will have a continuous string of SNPs – usually thousands of them. And each such ancestral segment comes down a specific path from a specific ancestor to us. On this point endogamy does not matter. In the bizarre extreme, all of your ancestors in one generation could be the same man and woman, but each of your ancestral segments came from only one place in your Tree, and down one path to you.

IBS Segments

Most of your shared segments will be IBD and will form TGs. But some shared segments will overlap the segments of a TG without matching any of them, on either side. These shared segments are clearly IBS. If they were IBD – from an ancestor – they would match overlapping segments in a TG.

Are All Segments in a TG IBD?

So one of the arguments for Triangulated Groups is that if three people all match each other on the same segment, the shared segments must be IBD. We have three pairs of matches, each pair with the same long string of SNPs. We have all three companies with proprietary algorithms that try to ensure their Matches are IBD. We have TGs that are mapped on our chromosomes, and know that some ancestor provided that segment. It sure looks like the shared segments in these TGs have the same SNPs that our ancestors passed down to us. This is even more compelling when there are several, or more, shared segments which Triangulate and form a TG. But are we sure every segment in a TG is IBD? Read on for some possible exceptions.

Some Areas to Look Out for:

  1. If you have only one Match (Match1) who matches a number of your close relatives in an apparent TG: You and Match1 might share an IBS segment (based on Match1’s segment being false); and then Match1 may well match all of your close relatives who have the same ancestral segment you have. For this reason observe the caution that TGs should be formed with widely separated cousins – the wider the better. Another test is to find other Matches (not closely related to Match1) who Triangulate with you and see if they match Match1. All Matches in a TG, who overlap enough, should match each other. If they do not, then an analysis should be done to weed out any potential IBS segments.
  2. You match several other Matches who are closely related to each other in an apparent TG: you might have a false segment and may well match all of the Matches who share the same good ancestral segment. As in the previous paragraph, it’s important to form a TG with widely separated cousins. The test here is to look for other overlapping Matches for this segment area. If this is an IBS TG, the other Matches will not also match the Match family. Also, you do have a true ancestral segment for each area of your chromosomes. If several related Matches all match you in one segment area, and your segment is false with them, you should be able to form two other TGs (one from each parent) based on your ancestral segments compared with other Matches.
  3. Another argument used to debunk Triangulation, is endogamy. The theory here is that due to endogamy – some of our ancestors being the same person – the same ancestral segments are floating around and TGs may be formed with different ancestors. In theory, this is possible – in practice it is improbable. In the first place, endogamy means the two ancestors who are the same person actually had a Common Ancestor. So in fact the TG shared segment really did come from a Common Ancestor, several more generations back. With each generation going back, the probability of a match is divided by 4, or 16 for the two generations involved in a first cousin endogamy. Clearly it is much more likely that our Matches in a TG are from a closer cousinship.

Also, based on my chromosome map, the ancestral segment I got for each TG is from a specific ancestor, down a specific line of descent to me. It has a specific string of SNPs that we generally think of as unique. Is it possible, in the 7-15cM range, for a Match to have exactly the same string of SNPs from a different ancestor? With random DNA almost anything is possible, but the premise of autosomal DNA is that this would be very rare. If it did occur, the shared segment would technically be IBS, because it was not identical because of descent from a Common Ancestor. But, we might have to leave the door open for this possibility.

Back to a Big Picture Thought

The number of people taking an atDNA test is about doubling every 12 months. If this continues, I’ll have 10,000 Matches, with shared segments, by this time next year; and 20,000 Matches by the end of 2017. My chromosome map has pretty much been determined (I am now focused on determining the correct Common Ancestor for each TG). A doubling of Matches means a doubling of each TG every year. The point is that if we assume 80-90% of our Match segments are IBD (I actually believe it’s closer to 95%), all of those IBD segments are being added to my existing TGs. Couple this with the fact that most of our Matches are beyond 5th cousins (I believe most of our Matches are actually 6-8th cousins, and some beyond). Even if a few of the Matches in our TGs turn out to be IBS, we are still getting a great influx of true cousins into our TGs.

So to summarize:

  1. Do TGs work to group your Matches? Sure! Instead of the long list of miscellaneous Matches reported by the companies, you can form Triangulated Groups. See Benefits of Triangulation.
  2. Do TGs work to cull out IBS segments? Sure! Many of your 7-15cM segments will not triangulate with any overlapping TG, indicating those “shared” segments are probably IBS. As noted above, not all of the IBS segments may be identified this way, but many (I think most) will. This is progress – it’s an improvement over the list you get from the companies.
  3. Do TGs work to define and map your ancestral segments? Absolutely! It’s hard work, but an easy mechanical process to define the TGs with start/end locations; and only a little genealogy with known relatives is needed to assign them to maternal and paternal sides.
  4. Do TGs work in ensuring all Matches in a TG have an IBD segment? Almost all of the time, and there are ways to find and test suspicious shared segments.
  5. Do TGs work in ensuring all the Matches in a TG share the same Common Ancestor? This is a tough one because it’s not possible to rule out some outliers. As noted above, if you carefully form the TGs, the Matches should come from the same Common Ancestor. We have lots of examples of Matches in TGs who do share the same CA. It’s very hard to prove that an IBD segment is really from a different ancestor; and I haven’t seen a single case of it so far.

Your ancestral segment in each TG does come from a specific ancestor of yours, and your cousins from that Ancestor with that segment will match you on it in that TG. As several of us have suggested, to determine the true Ancestor for a TG, you need to “walk the segment back.” This means finding cousins at various levels in each TG – a 2nd cousin, a 4th cousin, and a 6th cousin who all have the same segment and ancestral line. This is often hard, but the number of people taking an atDNA test is doubling annually, and more of these intermediate cousins will gradually show up in our Match lists and TGs.

Bottom line for me: Triangulation is a powerful tool.

 

11B Segment-ology: Does Triangulation Work? by Jim Bartlett 20151019

Pile-ups

What’s all the buzz about “pile-ups”?  In my mind there are three kinds of pile-ups: small, medium and large. They are different, so it’s important to understand each one. In this case Goldilocks should prefer the large pile-ups, but let me go through my views of all three kinds.

Alert: This post contains my opinions about small pile-ups and AncestryDNA (based on my own experience) so you should make your own judgments.

Background

I think the two keys to success with autosomal DNA lie in a robust Tree (as many ancestors out to 13 generations as possible) and as many Match-segments as possible (including as many close relatives as you can get). I spent about a year expanding my Tree as best as I could, and then posted that GEDCOM in several places. I’ve tested at all three companies and use GEDmatch. I put every single shared segment I can find over 7cM into my spreadsheet, and I periodically run a Quality Control check against a fresh download to pick up any missed Matches or segments. I currently have 5,000 different individuals with segment data in my spreadsheet, and have determined a Common Ancestor (CA) with 309 of them.

I have compared virtually every segment against other overlapping segments, and formed Triangulated Groups (TGs) that cover over 90% of my 45 chromosomes. It is now rare for me to get a new shared segment that changes my chromosome map in any way. This process has provided some insights on medium and large pile-ups.

Pile-ups

My definition of pile-up sizes:

  1. Small is smaller than 5cM
  2. Medium is 5-10cM
  3. Large is greater than 10cM
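These three size classes are easy to encode if you keep your segments in a spreadsheet or script. A minimal sketch follows; the function name is hypothetical, and treating exactly 5cM and exactly 10cM as "medium" is an assumption, since the definitions above do not say which class the boundary values belong to.

```python
def classify_pileup(segment_cm: float) -> str:
    """Classify a pile-up by the typical size (in cM) of its shared segments."""
    if segment_cm < 5:
        return "small"   # smaller than 5cM
    if segment_cm <= 10:
        return "medium"  # 5-10cM (boundary handling is an assumption)
    return "large"       # greater than 10cM


for cm in (3.2, 7.8, 14.5):
    print(cm, "cM ->", classify_pileup(cm))
```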

Small pile-ups – by my definition, these pile-ups are composed almost entirely of IBS shared segments. When AncestryDNA first rolled out their autosomal DNA test, their threshold was 5Mbp. This threshold included many shared segments well below 5cM, and resulted in many thousands of bogus Matches. To their credit, they provided a caution about these. When AncestryDNA revised their threshold to 5cM, many of these Matches went away. Part of their explanation was the elimination of “pile-ups”. I agree that these “small pile-ups” should be eliminated. And when they reset their threshold to 5cM, that should have eliminated this problem. However, their explanations continue to stress the elimination of “pile-ups”. I just hope they don’t also toss out Matches in larger pile-ups – throwing the baby out with the bath water.

Medium pile-ups – 5-10cM range. As I gathered as many segments over 5cM as I could and sorted them in my spreadsheet, I noticed a few areas that had many such segments, all in a very narrow chromosome area. Very clearly a pile-up! Virtually none of them matched each other, although they had almost the same segment start/end locations. And there were a lot of them – many more than in large TGs.

In discussions on various email lists, we compared notes, and found that most of these areas were unique to our own experience. In general they were not due to some common feature of most human genomes. A notable exception to this blanket statement is the HLA Region on Chromosome 6 – roughly from 29.8 to 33.1Mbp.

However, most of the other areas were not tied to known issues like the HLA Region. In my analysis, it was not possible for me to link these to one parental side or the other. The fact that these areas include so many IBC segments indicates to me that it’s the combination of both of my chromosomes (maternal and paternal) that allows the “matches”. It’s the unique combination of alleles in these small stretches of DNA that makes matching much easier. And this unique combination is only in my genome. On chromosome 18, I have 307 segments in the 7 to 11 cM range. They are all in a very tight area: from location 5,800,000 to 8,700,000bp. Very few of them triangulate.

Sometimes the pile-up area has been documented. On chromosome 15, I have 281 segments in the 7 to 10cM range. They are at: 24,000,000 to 28,000,000 bp. This area partly overlaps a known pile-up area (20,100,000 to 25,200,000). But the known pile-up area is only partly the cause in my case. See 14 small pile-up areas found by Li et al (2014), listed at the ISOGG Wiki: http://www.isogg.org/wiki/Identical_by_descent. These medium pile-up areas, and a few others in my experience, are characterized by a very tall pile-up of many segments about the same size in a narrow area just a little larger than the segments. The Li et al (2014) article refers to “regions where excess IBD is detected…” Virtually all of the segments I have noted above are IBS/IBC – they do NOT triangulate with the other segments. A few segments in these regions do triangulate with known close relatives, and each other. I’ve kept those segments in maternal and paternal TGs, as appropriate, covering that area. After all, both my mother and father gave me those areas, and they in turn got them from their parents, etc. It is very probable that these segments are IBD and come from a CA.

My experience is that these are areas with a lot of shared segments in the 7-10cM range that are in a tight area, usually just 10cM wide, and a very high proportion of these segments are IBS/IBC.  A few segments in these areas will be IBD, but they will tend to be larger than the 7-10cM segments.
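One way to spot a candidate medium pile-up in a spreadsheet of segments is simply to count how many medium-sized segments fall inside a narrow window. The sketch below is only an illustration: the `Segment` fields, the function name and the sample rows are all hypothetical, and the window is the chromosome 18 area described earlier.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    match_name: str  # hypothetical label for the Match
    chromosome: int
    start_bp: int
    end_bp: int
    cm: float


def segments_in_window(segments: List[Segment], chromosome: int,
                       window_start: int, window_end: int,
                       min_cm: float = 7.0, max_cm: float = 10.0) -> List[Segment]:
    """Return segments in the cM range that lie entirely inside the window."""
    return [
        s for s in segments
        if s.chromosome == chromosome
        and window_start <= s.start_bp
        and s.end_bp <= window_end
        and min_cm <= s.cm <= max_cm
    ]


# Made-up rows, for illustration only
sample = [
    Segment("A", 18, 5_900_000, 8_600_000, 7.4),
    Segment("B", 18, 6_000_000, 8_700_000, 8.1),
    Segment("C", 18, 5_800_000, 20_000_000, 25.0),  # larger segment, extends past the window
    Segment("D", 15, 24_000_000, 28_000_000, 9.0),  # different chromosome
]

pile = segments_in_window(sample, 18, 5_800_000, 8_700_000, max_cm=11.0)
print(len(pile), "segments of 7-11cM inside the window")
```

A very high count in a window only a little wider than the segments themselves is the signature described above. Whether those segments triangulate still has to be checked separately.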

My bottom line for these pile-ups: Unless you have a lot of free time, skip over these areas – particularly the shared segments under 10cM. Concentrate on triangulating any larger segments in these areas and then move on to other areas.

Large pile-ups – these are my favorites. Larger shared segments (over 10cM) that spread out and overlap each other over wider areas.  These segments tend to triangulate with each other, forming TGs on both sides.  I have some of these TGs which include over 50 shared segments.  Since the shared segments triangulate with each other, this is a good pile-up. These TGs are large because more people have these shared segments – probably because the Common Ancestors had large families in Colonial America, leaving us with many, many cousins. Another reason could be a more distant Common Ancestor, who would also leave us a large number of cousins.

In some cases we can use this observation to our advantage. I have a 2nd cousin, on his mother’s side, who is also an 8th cousin, on his father’s side. Our close Common Ancestor was an immigrant to the US in the mid-1800s, and I get relatively few Matches on the segments I share with him. However, on one segment, we have many Matches – it turns out our Common Ancestor is on his father’s side. The tip-off should have been the size of the TG (measured by the number of Matches).

Another observation about large pile-ups…. They will get larger. The number of folks taking an atDNA test is about doubling every 12 months. A consequence of this is that all of our TGs will also double in the next 12 months. So, if you have pile-ups now, they will about double by this time next year. Use these larger TGs to your advantage – work with the Matches to investigate place/time matches, if a Common Ancestor is not easily determined.

Summary

  1. In general, don’t work with shared segments below 5cM. Most are IBS/IBC – even if they appear to triangulate. We don’t have a good test below 5cM to indicate IBD.
  2. Watch for, and avoid, pile-ups in the 5-10cM range. These are characterized by many shared segments in the 5-10cM range in a very tight location, usually only 10 or 11cM wide. Move on to larger shared segments in other locations.
  3. Embrace the large pile-ups. They may come from Common Ancestors with large families and/or more distant Common Ancestors. In either case, work with the Matches in these TGs as a Team to determine the Common Ancestor.

18 Segment-ology: Pile-ups by Jim Bartlett 20151007

Anatomy of an IBS segment

This is a guest blog post by Dr. Ann Turner, who has been a great mentor for me.

Anatomy of an IBS segment

 Ann Turner

DNACousins@gmail.com

October 1, 2015

Jim Bartlett, my host for this blog post, shares a 7.8 cM segment at 23andMe with my nephew Larry. This was a serendipitous find, for Jim broke down a brick wall for me with records from an orphan’s court. In turn, I provided a solution to a minor mystery for Jim – where did John Henry go when he disappeared from Frederick County, Virginia?

That discovery was back in 2011, before we had developed much in the way of techniques to analyze segment data. There was one troubling aspect:  Jim did not match my sister (or her husband, either). This could be explained away if there was a false negative in my sister. Fast forward to 2015. Jim’s intensive work on triangulated segments has filled in the section containing Larry’s segment with more cousins. Larry did not match anyone on either one of Jim’s chromosomes.

Is it possible that this match was not Identical by Descent (IBD), but just Identical by State (IBS)?

A Terminology Detour

The terms “Identical by Descent” and “Identical by State” predate their application to segmentology, Jim’s felicitous term for analyzing autosomal DNA. The glossary in Human Evolutionary Genetics[1] contrasts the two phrases:

Identity by Descent: Property of alleles in an individual or in two people that are identical because they were inherited from a common ancestor; as opposed to identity by state

Identity by State: Property of alleles in an individual or in two people that are identical because of coincidental mutational processes, and not because they were inherited from a common ancestor (identity by descent)

In effect, “identical” is the more general word, and the phrase describes two mutually exclusive ways of achieving identity – BY state or BY descent.

Also, the definition is about alleles, alternative versions of a single marker. There are examples in genetic genealogy when we look at the type of DNA that follows one line, the straight paternal line or the straight maternal line.

For the Y chromosome, the ancestral haplotype may sometimes be deduced from multiple lines of descent. The question then becomes whether a variation on the theme marks a specific line: does the fact that two individuals both share a one-step difference from the ancestral haplotype on DYS19 mean that they have identified a branch tag to a more recent common ancestor (the mutation is identical by descent), or did the mutation occur independently in two different lines of descent (the mutation is identical by state)? The mutation rate is high enough that either explanation could hold true.

For mtDNA, there are certain hotspots where a mutation is not a reliable indicator for defining haplogroups or even genealogical relationships. A mutation 16519C has occurred independently hundreds of times in different haplogroup subclades, and insertions at 309.1C (and 309.2C) are frequent enough that even siblings are known to differ.

Adapting the two terms IBS and IBD for segmentology stretches the original context to include regions of the genome, not just single markers. Furthermore, the mutation rate for autosomal DNA is orders of magnitude lower than Y-STRs or mtDNA. Differences in two autosomal markers are not likely to be due to a recent mutation.

With this shift to testing multiple autosomal markers, some authors began to employ the phrase Identical by State as the broader concept. Then some, but not all, IBS regions would also be Identical by Descent. That leaves a vacuum – what should we call regions that are IBS but not IBD? Charles Brenner created his own term, which is not particularly evocative but illustrates the frustrating dilemma:

“Identical by state” (IBS) as used here is synonymous with “identical”, an umbrella meaning in that IBS  thus includes IBD as a subset. Adopting the umbrella definition for IBS means some other term may be needed to mean IBS but not IBD and for this purpose I use the word “strict.”[2]

Indeed, it appears that many technical articles avoid the term IBS entirely. A search of Google Scholar  yields 17,100 citations for Identical (or Identity)  by Descent but only 4,700 citations for Identical (or Identity) by State. Scanning a small sample of those articles reveals that they often describe a segment as IBD or “not IBD”, period.

My personal preference is to hew to the original concept, where identity is the broader, more general term. It avoids the awkward need for a special term to describe IBS but not IBD. Plus in the future, when we can do whole genome sequencing, reserving IBS for accidental identity due to a parallel mutation may become more relevant. In spite of the low mutation rate, the vast number of loci and (perhaps) the large number of tested people will result in a certain number of recurrent mutations. We are already seeing this with more comprehensive sequencing of the Y chromosome.

I have no objections to those who prefer IBS for the more general term, but for the purposes of this blog post, I mean Identical “just/merely/only” by State. For further clarity, we need to emphasize that we are speaking of HALF identity, where at least one of the two alleles in one party’s genotype matches at least one allele in the other party. Leon Kull coined the acronym HIR for Half-Identical Region. That obviously leaves a lot of wiggle room, as shown in the next section.
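The half-identity test for a single SNP can be written out directly. This is a minimal sketch, assuming genotypes are stored as two-letter strings such as "CT" with no no-calls; the function name is hypothetical.

```python
def half_identical(genotype_a: str, genotype_b: str) -> bool:
    """True if at least one allele in genotype_a also appears in genotype_b."""
    return bool(set(genotype_a) & set(genotype_b))


print(half_identical("CC", "GG"))  # opposite homozygotes: no allele in common
print(half_identical("CT", "TT"))  # the T allele is shared
print(half_identical("CT", "CC"))  # the C allele is shared
```

Note that a heterozygous genotype at a two-allele SNP passes this test against any other genotype, which is where the wiggle room comes from.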

Dissecting the Segment

Jim graciously shared his raw data with me so I could use Excel to view each and every one of the 850 SNPs in the segment. (See Supplemental data file.) The segment boundaries are opposite homozygotes (e.g. CC and GG) – they do not match at all. Figure 1 shows some of the column headers in the spreadsheet with a few sample rows of data.

Columns A, B, and C give the chromosome number, chromosome position, and SNP ID as found in the raw data download. They are redacted here for privacy reasons, but the column labels are preserved for those who would like to use the spreadsheet as a template for their own analyses.

Column D is for Jim’s genotype data. If Jim is homozygous for a marker (e.g. CC), then he obviously received a C from his father and a C from his mother. If Jim is heterozygous (e.g. CT), the two alleles are listed in an arbitrary order (often alphabetical). The C allele could have come from his mother and the T allele from his father, or vice versa. Columns E, F, and G are genotype data for Larry, his mother, and his father.

I also phased Larry’s data so I could tell which allele came from which parent, using David Pike’s utility Phase a Child when given data for child and both parents for the calculations. In a separate step (not shown here), I reformatted the results and loaded them into the spreadsheet so the rows aligned with Jim’s results. Column H has the maternal allele (from my sister) and Column I has the paternal allele. The results could not be phased in cases where all three parties were heterozygous, and the genotype is retained. A heterozygous result is a universal match – no matter what Jim’s genotype is, at least one of Larry’s alleles will match at least one of Jim’s alleles, because each SNP has only two possible versions.[3] The full spreadsheet can be seen at this link.
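The logic of phasing a child against both parents can be illustrated for a single SNP. The sketch below is not David Pike’s utility and makes no claim about how that tool is implemented; it is a simplified illustration, assuming two-letter genotype strings with no no-calls, and the function name is hypothetical.

```python
from typing import Optional, Tuple


def phase_child(child: str, mother: str, father: str) -> Optional[Tuple[str, str]]:
    """Return (maternal_allele, paternal_allele), or None if the SNP cannot be phased."""
    a, b = child[0], child[1]
    if a == b:
        # A homozygous child received the same allele from each parent
        return (a, a)
    # Heterozygous child: keep only the assignments consistent with both parents
    candidates = [
        (m, p) for m, p in ((a, b), (b, a))
        if m in mother and p in father
    ]
    # Exactly one consistent assignment means the SNP is phased
    return candidates[0] if len(candidates) == 1 else None


print(phase_child("CT", "CC", "TT"))  # C from mother, T from father
print(phase_child("CT", "CT", "CT"))  # all three heterozygous: cannot be phased
```

In this sketch an unphaseable SNP returns `None`, whereas the spreadsheet described above retains the genotype in those rows.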

Figure 1

Columns J and K use Excel formulas to show whether Jim matches the maternal allele and/or the paternal allele (coded with a “1”). Conditional formatting shows pink for a maternal match and blue for a paternal match. It’s readily apparent that a mismatch in the maternal side is filled in by a match in the paternal side, and vice versa. Figure 2 shows this pink and blue pattern horizontally (similar to a chromosome browser) for a somewhat longer stretch of 31 SNPs.

Figure 2

The remaining columns in the spreadsheet (L through T) contain calculations used to generate some summary statistics:

1) The apparent long run of 850 half-identical SNPs is broken up into 61 shorter runs on the maternal side and 31 shorter runs on the paternal side. It is entirely possible that these runs would be fragmented even further if Jim also had phased data.

2) Jim and Larry are both homozygous for the same allele for 368 of the SNPs. If Jim inherited the same allele from his father AND his mother, and ditto for Larry, it seems likely that the allele is rather common in the general population. That makes for easy pickings.

3) Jim is heterozygous for 310 SNPs and Larry for 311 SNPs, about 36%. There are 482 SNPs where at least one party is heterozygous (57%). These are universal matches.
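The run counts and percentages above come from spreadsheet formulas; the same calculations can be sketched in a few lines of code. Everything here is illustrative: the function names and the six sample SNPs are made up, and the sketch assumes every sample SNP was successfully phased (one letter per allele).

```python
from itertools import groupby
from typing import List


def matches_allele(genotype: str, allele: str) -> int:
    """1 if the phased allele appears in the unphased genotype, else 0."""
    return 1 if allele in genotype else 0


def count_runs(flags: List[int]) -> int:
    """Count the unbroken runs of 1s in a list of 0/1 flags."""
    return sum(1 for value, _ in groupby(flags) if value == 1)


# Made-up data: unphased genotypes for one person, phased alleles for the other
genotypes = ["CC", "CT", "GG", "AA", "AG", "TT"]
maternal = ["C", "C", "A", "A", "G", "C"]
paternal = ["T", "T", "G", "G", "A", "T"]

maternal_flags = [matches_allele(g, m) for g, m in zip(genotypes, maternal)]
paternal_flags = [matches_allele(g, p) for g, p in zip(genotypes, paternal)]
either_flags = [max(m, p) for m, p in zip(maternal_flags, paternal_flags)]

print("maternal runs:", count_runs(maternal_flags))
print("paternal runs:", count_runs(paternal_flags))
print("half-identical runs:", count_runs(either_flags))

# Percentages quoted in the text, out of 850 SNPs
print(round(100 * 310 / 850), round(100 * 482 / 850))
```

In the made-up data each side is broken into two runs, yet the combined half-identical view shows one unbroken run, which is the same effect seen in the real segment.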

Most segments of this length will actually be IBD.[4] This example is somewhat exceptional, deliberately chosen to dramatize the possible pitfalls and serve as a warning about smaller segments. One explanation may be that the 36% level of heterozygosity happened to be particularly high for this one region. The overall average for Jim and Larry was 28.2% and 30.7% respectively.

Red flags were waving for this segment: the lack of triangulation and the lack of a match for both of Larry’s parents. Is the converse true? Can triangulation or a match in a parent prove IBD? No, many counter-examples can be found, especially at shorter segment lengths.[5]

Phasing is our most pressing need, yet it is not always available.[6] Any alternative methodology for claiming that certain short HIRs are IBD must be able to demonstrate that the segment survives in test cases where the phase is known.

One more moral of the story: a genealogical connection can be made without DNA!

[1] M.A. Jobling et al, Human Evolutionary Genetics: Origins, Peoples & Disease, Garland Science, 2004.

[2] Brenner CH. Understanding Y haplotype matching probability. Forensic Sci Int Genet. 2014 Jan;8(1):233-43. http://dna-view.com/downloads/documents/Understanding%20Y%20haplotype%20matching%20probability.pdf

[3] It is possible to have three or four alleles (A/C/G/T) for a SNP, but these are rare and SNP chips tend to avoid them.

[4] According to 23andMe’s simulations “IBD segment lengths [i.e. HIRs] greater than 7 cM were observed 90% of the time in at least one parent. Preliminary data suggest that 7 cM segments shared between a distant cousin and child that were not observed in the parents were due to false negatives in the parents.” Henn BM et al, “Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples.” PLoS One. 2012;7(4):e34267.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3317976/

[5] See my blog post http://www.thegeneticgenealogist.com/2015/03/30/guest-post-what-a-difference-a-phase-makes/ for details on how an experimental phased data file eliminated a large number of small segments reported by Family Tree DNA.

[6] AncestryDNA phases data for its internal calculations, but the raw data download shows genotypes with the alleles in an arbitrary order.

Small Segments and Triangulation

How small can we go with triangulation?

We have anecdotal information to indicate “almost all” shared segments above 15cM are Identical By Descent (IBD). There is always a tail on the distribution curve of random events, so we cannot say 100%.

From my experience (mapping over 90% of my chromosomes) I am confident that triangulation can tighten the distribution curve so that “almost all” segments down to 7cM in a Triangulated Group (TG) are IBD. I say this because I find some 7-10cM shared segments which do not triangulate with the TG on either the maternal or paternal sides. Although several segments in each TG triangulate with each other, some shared segments, with the same “address”, do not. To me this is proof positive that these shared segments which do not triangulate must be Identical by State (IBS), meaning, in this case, not-IBD. And the number of such 7-10cM shared segments which don’t triangulate, and are thus IBS, seems to generally agree with the percent IBS in the ISOGG/Wiki: http://www.isogg.org/wiki/Identical_by_descent
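The location part of a triangulation check can be sketched as follows. This is a simplified illustration, not a complete method: the names are hypothetical, the positions are made up, and it only tests that the three pairwise shared segments (me with B, me with C, and B with C) cover a common stretch of the same chromosome. It applies no minimum overlap size.

```python
from typing import NamedTuple, Optional


class Shared(NamedTuple):
    chromosome: int
    start_bp: int
    end_bp: int


def overlap(x: Shared, y: Shared) -> Optional[Shared]:
    """Return the stretch common to both segments, or None if they do not overlap."""
    if x.chromosome != y.chromosome:
        return None
    start, end = max(x.start_bp, y.start_bp), min(x.end_bp, y.end_bp)
    return Shared(x.chromosome, start, end) if start < end else None


def triangulates(me_b: Shared, me_c: Shared, b_c: Shared) -> bool:
    """True if all three pairwise shared segments cover a common stretch."""
    common = overlap(me_b, me_c)
    return common is not None and overlap(common, b_c) is not None


me_b = Shared(5, 10_000_000, 30_000_000)
me_c = Shared(5, 15_000_000, 35_000_000)
print(triangulates(me_b, me_c, Shared(5, 12_000_000, 28_000_000)))  # B and C match each other here
print(triangulates(me_b, me_c, Shared(5, 40_000_000, 50_000_000)))  # B and C match only elsewhere
```

Two Matches with the same “address” on my chromosome who do not match each other there fail this check, which is the situation described above for the 7-10cM segments that do not triangulate.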

However, the fact that the segments in a TG do triangulate does not, in my mind, provide a 100% guarantee that they are all IBD. The same is true for a random shared segment in the 10-15cM range – most, but not all, are IBD. But in the aggregate, when we have say 20 shared segments in a TG, usually of various cMs, this pretty much defines that area of the chromosome as coming from an ancestor. If 1 or 2 of those triangulated shared segments turns out to be IBS, it’s not harmful in the grand scheme – we are looking for a Common Ancestor (CA) for the TG, and generally find only a few Matches in the TG who have a robust enough Ancestral Tree to help with this goal. We are looking for several such Matches to confirm the same CA. Having a close cousin in the TG, increases our confidence in the CA. As our Match list doubles over the next 12 months, so too should the number of Matches in each TG, adding to the preponderance of evidence for both the TG and CA. The key is that several distant cousins all agree on the same CA for the TG – this, too, adds to our confidence level.

My chromosome mapping has resulted in about 350 defined TGs which are adjacent to each other (“heel and toe”) – covering long stretches of each of my 45 chromosomes, with only a few bare spots over 10cM. All new Matches have shared segments which easily “fit” into, and triangulate with, existing TGs – except a small percentage in the 7-10cM range which don’t and are then labeled IBS. This has also added to my confidence that triangulation, down to 7cM shared segments, is a good process. The outline of my chromosome map is coming into sharper focus, with fairly well defined crossover points, and ambiguities are fading away.

With this “success”, I’ve been including shared segments in my analysis down to 500 SNPs and 5cM by adjusting the thresholds at GEDmatch. Almost all of the 5-7cM segments do NOT triangulate, and are thus IBS. A few do triangulate – my guess is about 5-10%. This seems reasonable to me as there are 5cM shared segments which are IBD. I’m adding these into my TGs, but color coding the small cM value to highlight it. To date I cannot recall any which have resulted in a confirming CA. Most of these 5-7cM IBD segments may well be from an even more distant CA… I also include shared segments down to 5cM from close, known cousins. Most are also IBS, but a few of them, so far, agree with the TG CA, and are probably IBD.

The problem is we don’t have a good test for IBD vs IBS. Some have used results from phased data to develop rough percentages for IBD/IBS ratios vs cMs for shared segments. See http://www.isogg.org/wiki/Identical_by_descent. I’ve seen no distribution “curve” yet. We don’t have such data for triangulated segments, so we really don’t know what effect triangulation has. Triangulation depends, in part, on using long shared segments. This, coupled with widely separated cousins who got exactly the same long segment, increases the odds that the shared segments are IBD. These two factors (length of segment and a match) combine to increase the probability of IBD. But as we decrease the shared segment size, we reduce that factor. We don’t know, yet, by how much this affects the curve.

Clearly, very small segments (under 5cM) are much easier to match, although most are IBS. Also, many of these very small segments will also triangulate. Triangulation is not a guarantee of IBD. We cannot use triangulation to prove triangulation. In other words, if segment length is a key factor in triangulation, we cannot say that triangulation itself proves smaller shared segments are IBD – it’s a circular argument. We need more corroborating data.

I am hesitant about establishing “rules” for segment sizes for triangulation. We are dealing with distribution curves – with tails. We have not yet drawn these curves, but at some point (as the segment size is reduced), the false positives will occur, even with triangulation. I am confident that triangulation shifts the IBD/IBS-vs-cM distribution curve “to the left”. Triangulation definitely culls out many (most?) IBS segments in the 7-10cM range. Thus the IBD/IBS ratio for a given cM must increase. To what extent is yet to be determined.

Triangulation is a tool. Use judgment when using it.

For me, shared segments below 5cM are uncharted territory for triangulation. I am confident of a Triangulation “guideline” for shared segments down to 7cM. Based on my experience with most segments in the 5-7cM range being IBS, I’m now fairly confident that triangulation also works down to 5cM. At the least, triangulation culls out most of the IBS shared segments. I think most of the few remaining 5-7cM shared segments which triangulate are IBD. For me, it’s at least worth the chance to include them in a TG and enlist the help of those Matches in finding the CA.

 

13 Segmentology: Small Segments and Triangulation by Jim Bartlett 20150930

VUCA DNA

Recently I was at a meeting with some fellow retired Naval Officers. The subject came around to a concept several of us had learned at the National Defense University in Washington, DC.

VUCA

There is now a Wikipedia Article about VUCA – Volatility, Uncertainty, Complexity, Ambiguity. It was coined to refer to wars, but was later applied to many other situations. As I thought about it, VUCA can be used to describe atDNA.

1. Volatility – The basis of human DNA, the “Build”, is always being updated and changed – it was recently changed from Build 36 to Build 37 (by most companies). The chips used to determine SNP values have undergone change. FTDNA reran all of their tests when they changed to the Illumina chip. 23andMe has changed chips (and SNPs) at least twice in the last few years. AncestryDNA significantly changed their matching algorithm recently. There will be more change as newer technology or processes are developed.

2. Uncertainty – Is a shared segment IBD or not? Is a Common Ancestor the genetic ancestor or not? How many cMs should we expect to share with a 4th cousin?

3. Complexity – A matching algorithm may take SNPs from either side. Shared segments may be from either parent’s chromosome. Can it get more complex? Well… sure it can. How distant could the Common Ancestor be? Does the shared segment span two Common Ancestors or not? Which ancestors are not genetic ancestors? Which genetic ancestors passed down segments above a matching threshold? How does endogamy affect our shared segments?

4. Ambiguity – The cM measurements are based on an average of observed values for crossovers by males and females. Base pairs span areas of chromosomes that have not been sequenced. There is no sign post to indicate where a segment from an ancestor starts or stops – so shared segments are often reported as longer or shorter than they really are. Company algorithms are different – what are the criteria for a Match? How do they handle no-calls?

I think the VUCA acronym describes atDNA pretty well…

06Z Segment-ology: VUCA DNA by Jim Bartlett 20150823

The Porcupine Chart

Genetic Ancestors – the Porcupine Chart

First let me define Genetic Ancestors. These are the ancestors who passed DNA down to you. Your DNA includes some DNA from each genetic ancestor. But, as we’ll see, not all of your ancestors contributed to your DNA. Some of your distant ancestors passed DNA down to their descendants, but that DNA never made it all the way to you. So let’s delve a little more into this concept, so you’ll know what to expect as you form Triangulated Groups, work with your Matches to find Common Ancestors, and fill out your chromosome map.

You get exactly 1/2 of your autosomal DNA (chromosomes 1-22) from each parent. Each parent has used the two chromosomes they got from their parents (your grandparents) to create one chromosome for you. Actually, this is the fundamental way DNA is passed down to you and to each of your ancestors, so let’s take a closer look at this process in Figure 1.

07A Fig 1

Here are some important points about Figure 1:

  1. The parent has two of each chromosome – one from each of his/her parents (the child’s grandparents)
  2. The parent takes part of each of the two chromosomes and makes a new chromosome (this process is called recombination)
  3. The parent then passes this new chromosome to a child
  4. Clearly, there is a wide range of alternative possibilities for the new chromosome – I’ve shown 3 alternatives: one where the new chromosome is roughly a 50/50 split between the grandparents; one with a larger split; and one where there is no split (no recombination), and the child gets all the DNA in this chromosome from one grandparent. All of these possibilities occur in nature. And in fact the odds are that one of your smaller chromosomes (18-22) is probably all from one grandparent.
  5. Note that the child got one chromosome from the parent who started with two chromosomes – the child got exactly 1/2 of the DNA from the parent.
  6. This figure shows the total amount of DNA in the new chromosome, not necessarily how it is split up into segments. For more on segments read: Segments: Bottom-up
  7. This figure is based on one chromosome for the child. The recombination process happens for all of the 22 chromosomes, for each of the two parents.

So, since you got exactly 1/2 of your atDNA from each parent, and they each got exactly 1/2 of their atDNA from their parents, wouldn’t you be getting exactly 1/4 of your DNA from each grandparent? Well… no! As shown in Figure 1, the child could get any mix of DNA from the grandparents – just as long as it added up to 100%. On individual chromosomes the mix can vary quite wildly, but in the aggregate over all 22 chromosomes, the average tends toward 50/50, but with a range of possibilities. However, in each case the two percentages will add up to 100%.

Also you can re-read: Measuring Segments to see that you can measure and total the two grandparents’ segments by base pairs (bp), centiMorgans (cM) or SNPs – you’ll get the same percentages  and totals with any method.

So let’s continue this story by looking back one more generation – to the contribution by the great grandparents. Let’s continue to look at one chromosome and assume the mix from the grandparents is 45/55. To keep the description brief and the graphics clear, we’ll look at the 55% from the grandmother. Just like the standard process in Figure 1, this 55% portion will be composed of contributions from the two great grandparents. The great grandparent mix over this portion could range from 50/50 to 0/100, and the two numbers will always total 100. These two great grandparents cannot contribute to the 45% area (two other great grandparents do that), so their total contribution will only be of the 55% portion. So let’s say their mix is 60/40. So over this chromosome, these two great grandparents contributed 33% and 22% (for a 55% total). Note that the 60/40 split is wider than the 45/55 split. This actually happens in nature – the splits tend to get wider [or wilder, or more random, or deviate more from the average] the farther back you go.
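The arithmetic in this example is just a split of a portion, and it can be checked with a short sketch. The function name is hypothetical; `Fraction` is used so the percentages stay exact.

```python
from fractions import Fraction
from typing import Tuple


def split_portion(portion_pct: int, split: Tuple[int, int]) -> Tuple[Fraction, Fraction]:
    """Divide a portion (percent of the chromosome) between an ancestral couple."""
    first, second = split
    if first + second != 100:
        raise ValueError("the two numbers in a split must total 100")
    return (portion_pct * Fraction(first, 100), portion_pct * Fraction(second, 100))


# The grandmother's 55% portion, split 60/40 between her parents
a, b = split_portion(55, (60, 40))
print(a, b, a + b)
```

Applying the same function again to either result carries the story back another generation, and a 0/100 split at any step is how an ancestor drops out of the chromosome entirely.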

It’s time for another visual depiction – see Figure 2:

07A Fig 2

Important points from Figure 2:

  1. You see the total chromosome (100%) the child got from one parent. [The vertical black “tic” is at 50%]
  2. Under that is the 45/55 split between grandparents.
  3. The third row shows the 55% contribution of the grandmother (the parent’s mother), split between her parents 60/40. Think of blue as the paternal side, and pink as the maternal side in each succeeding generation. I don’t show the other grandparents – it gets too messy. I just want to follow one ancestral path back, to show how the genetic (DNA) contribution of some ancestors gets smaller.
  4. Note that although this is shown for one chromosome, the same principle applies to the aggregate for all chromosomes.
  5. Note the use of Ahnentafel numbers to easily keep track of the ancestors.
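For readers who have not used Ahnentafel numbers, here is a minimal sketch of the standard numbering rule: you are #1, the father of person n is 2n, and the mother is 2n + 1. The function names are my own for this illustration. The example uses #172, the 5G grandparent discussed later in this post.

```python
def father(n):
    """Ahnentafel number of the father of person n."""
    return 2 * n


def mother(n):
    """Ahnentafel number of the mother of person n."""
    return 2 * n + 1


def generations_back(n):
    """Generations between person n and the starting person (#1)."""
    return n.bit_length() - 1


def path_to_start(n):
    """Ahnentafel numbers from ancestor n down to the starting person (#1)."""
    path = [n]
    while n > 1:
        n //= 2  # the child of ancestor n is n // 2
        path.append(n)
    return path


print(path_to_start(172))     # [172, 86, 43, 21, 10, 5, 2, 1]
print(generations_back(172))  # 7, i.e. a 5G grandparent
```

Even numbers (other than #1) are always males and odd numbers females, so the path above runs through the father (#2) and his mother (#5).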

So, let’s carry this story further back in Figure 3:

07A Fig 3

Important points in Figure 3:

  1. You see the diminishing amounts of DNA that are passed down by more distant ancestors.
  2. In the last two lines (4G and 5G grandparents) you see the percent of each couple still totals 100% (of that portion on this chromosome), but the farther back you go (down the chart) the split between the ancestral couple tends to get wider.
  3. In fact, for the 5G grandparents, one of them drops out altogether. That 5G grandparent who dropped out (#172) probably contributed to all of the other generations down to and including your parent, but when your parent recombined his or her parents’ DNA, the result just didn’t include any of the small contributions from this particular 5G grandparent.
  4. This is shown on a chromosome level; this 5G grandparent (#172) may not have contributed to other chromosomes either. Once an ancestor drops out of all chromosomes, their contribution to you becomes 0, and this ancestor is then not a genetic ancestor!
  5. Re-read: Segments: Top-Down to see how the DNA of some ancestors drops out of the mix.
  6. Also note that this 5G grandparent (#172) probably did contribute some DNA to many of his other 5G grandchildren, just not to you.

As we look farther up the ancestral Tree, we find more and more ancestors drop out of the mix – you don’t have any DNA from them.

Another important point in this genetic ancestor analysis is that at each generation, going back, at least one ancestor in each couple has to be there to pass the DNA down. Another way to put this is that one parent in a generation may drop out of the DNA mix, but the other one cannot. One of the two of them had to pass down the DNA that the child (your ancestor) got and passed along, eventually reaching you. And that distant ancestor (who passed down the DNA) had to get that DNA from at least one of their parents. Theoretically, your DNA goes all the way back to DNA Adam and Eve. In a more practical timeframe, as in your genealogy, there will be genetic ancestors at each generation who contributed to your total DNA. This is true whether you have identified them in your ancestral Tree, or they are behind a brick wall – whether they are known to you, or not. At each and every generation, you will have genetic ancestors whose DNA contribution to you will add up to 100%. But not every ancestor will contribute – only the genetic ancestors will… This leads us to what I call the Porcupine Chart in Figure 4.

Figure 4: The Porcupine Chart:

07A Fig 4

Used by Permission

This wonderful chart was developed by The Coop Lab – see http://gcbias.org/2013/11/11/how-does-your-number-of-genetic-ancestors-grow-back-over-time/ for their article and another chart of genealogical and genetic ancestors vs generations. It shows a standard ancestral fan chart, colored in with only genetic ancestors, moving out from you at the center. Figure 4 is an approximation based on simulations. It is not “the” chart for everyone. Your results will vary, just as your random DNA, and the contribution by your ancestors, will vary. This chart is intended to illustrate several key points:

  1. Most of your closer ancestors contribute to your DNA.
  2. At some point a few ancestors drop out of the mix.
  3. When an ancestor drops out, his/her spouse/mate stays in the mix.
  4. Every ancestor who has contributed to your DNA has a porcupine “quill”.
  5. The “quills” extend forever – there is always another ancestor who passed down the DNA (theoretically back to DNA Adam and Eve, but in a practical sense, back farther than you can go on your genealogy).
  6. Although ancestors who contribute DNA to you continue to drop out with each succeeding generation going back, the number of contributing ancestors at each generation going back cannot get smaller. NB: some of the individuals may repeat as multiple ancestors, but the number of positions for ancestors in the Tree who contribute to your DNA never gets smaller. In the extreme, the number of genetic individuals gets very small at bottlenecks and deep ancestry; but the number of “slots” in the Tree is very great.
  7. In a practical sense – in your genealogy timeframe – there will be a growing number of ancestors who contributed to your DNA in each generation [each will have a different Ahnentafel number]; and some of them will be repeat ancestors [one individual ancestor may have multiple Ahnentafel numbers].
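To make these points concrete, here is a toy simulation of one chromosome, followed back through the generations. It is not the Coop Lab’s simulation and it is not a realistic genetic model: it assumes exactly one crossover per chromosome per generation, treats the chromosome as a simple 0-to-1 line, and ignores repeat ancestors. All names are my own. It only illustrates that ancestors drop out, that one member of each couple always stays in, and that the contributions at every generation still total 100%.

```python
import random
from collections import defaultdict


def step_back(segments, rng):
    """Move one generation back.

    `segments` maps an Ahnentafel number to the list of (start, end) pieces
    of the chromosome that came through that ancestor. Each ancestor's
    pieces are divided between his or her parents (2n and 2n + 1).
    """
    older = defaultdict(list)
    for n, pieces in segments.items():
        cut = rng.random()  # toy assumption: exactly one crossover per meiosis
        # Randomly choose which parent supplied the part before the crossover.
        if rng.random() < 0.5:
            left, right = 2 * n, 2 * n + 1
        else:
            left, right = 2 * n + 1, 2 * n
        for start, end in pieces:
            if start < cut:
                older[left].append((start, min(end, cut)))
            if end > cut:
                older[right].append((max(start, cut), end))
    return dict(older)


def simulate(generations, seed=0):
    """Return one {ancestor: pieces} dict per generation, grandparents first."""
    rng = random.Random(seed)
    segments = {2: [(0.0, 1.0)]}  # the whole chromosome came through one parent
    history = []
    for _ in range(generations):
        segments = step_back(segments, rng)
        history.append(segments)
    return history


for gen, segments in enumerate(simulate(8), start=2):
    total = sum(end - start for pieces in segments.values() for start, end in pieces)
    slots = 2 ** (gen - 1)  # ancestor slots on this parent's side of the Tree
    print(f"{gen} generations back: {len(segments)} of {slots} slots contribute, total {total:.0%}")
```

Run it with different seeds and the details change every time, but the number of contributing slots never shrinks and the total stays at 100%.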

Above-threshold segments:

Generally, the closer genetic ancestors will contribute a lot to your DNA, and more distant genetic ancestors will pass down a smaller contribution to your DNA. But a genetic ancestor will always contribute something. We can divide our genetic ancestors into two groups: those who pass down “above-threshold” segments, and those who pass down only smaller segments. When all the DNA from a genetic ancestor falls below the threshold value, you won’t see any cousins from this ancestor on your Match list. This is a limitation of our programs today. This is what happens when a true 3rd cousin doesn’t show up as a Match. You and the 3rd cousin probably share some DNA from a common 2x great grandparent, but not enough to meet the matching criteria. However, if you compare yourself with this 3rd cousin at GEDmatch, and lower the threshold to 300 SNPs and 3cM, you will usually find matching IBD segments. Also, by testing and comparing siblings and other close relatives with this 3rd cousin, you may well find that they have above-threshold segments and match. The point is that a genetic ancestor may be an above-threshold ancestor for you but not for your relatives, and vice versa.
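Here is a small sketch of how a threshold decides whether a cousin appears as a Match. The segment values are invented, and the “default” threshold of 7cM and 700 SNPs is only an assumed example (each company sets its own criteria); the lowered values of 3cM and 300 SNPs are the ones mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    chromosome: int
    cm: float   # segment size in centiMorgans
    snps: int   # number of SNPs in the segment


def passes(segment, min_cm, min_snps):
    """True if a single segment meets both threshold values."""
    return segment.cm >= min_cm and segment.snps >= min_snps


def is_match(segments, min_cm, min_snps):
    """True if at least one shared segment is above the threshold."""
    return any(passes(s, min_cm, min_snps) for s in segments)


# Invented example: segments shared with a true 3rd cousin.
shared = [Segment(chromosome=4, cm=5.2, snps=610),
          Segment(chromosome=11, cm=3.4, snps=350)]

DEFAULT = (7.0, 700)  # hypothetical default threshold, for illustration only
LOWERED = (3.0, 300)  # the lowered values mentioned in the text

print(is_match(shared, *DEFAULT))  # False: the cousin is missing from the Match list
print(is_match(shared, *LOWERED))  # True: the shared segments show up
```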

OK – we’ve now seen that you have many ancestors, but only some of them are genetic ancestors. And only some of your genetic ancestors will pass down above-threshold segments to you.

The most important group of ancestors to genetic genealogists is a subset of your genetic ancestors – it’s those genetic ancestors who passed down to you, and a Match, at least one DNA segment which is over the threshold amount.

So there is another chart which is based on ancestors who contributed DNA segments over the threshold amounts. The chart will be of a similar form to Figure 4, but the missing ancestors will occur closer to you, and the quills will be truncated when the segments get too small (the quills don’t go back forever). The chart will look like a skinnier porcupine with a crewcut – stop to visualize this… These are the Common Ancestors we are looking for – this is the portion of our genealogy and our ancestry that we are working with. And, yes, some (many?) of these Common Ancestors will be beyond our known ancestral Trees. Someday… much of this chart will be drawn from our completed chromosome maps.

From my experience, it appears that the number of ancestors who contribute above-threshold DNA segments will usually include all of our 16 2G grandparents, maybe all of our 32 3G grandparents, and most of our 64 4G grandparents (5th cousin level). I think that most of our Matches are in the 6th to 8th cousin level (where some of our 7G grandparents have passed “sticky” segments down to us); and that it drops off after that, with some Matches out to 10-12th cousin level with a few of our more distant grandparents. That would be what the skinny porcupine with a crewcut would look like.
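As a quick check on the numbers in the paragraph above, this sketch ties the “G” shorthand to the count of ancestor slots and to the cousin level of their descendants. The function names are mine; the rule is simply that the slots double with each generation, and descendants of your nG grandparents are (at closest) your (n+1)th cousins.

```python
def grandparent_slots(g):
    """Number of ancestor slots at the gG grandparent level (g = 0 means grandparents)."""
    return 2 ** (g + 2)


def cousin_level(g):
    """Cousin level of Matches whose Common Ancestor is a gG grandparent."""
    return g + 1


for g in (2, 3, 4, 7):
    print(f"{g}G grandparents: {grandparent_slots(g)} slots, cousin level {cousin_level(g)}")
```

This gives 16, 32 and 64 slots for the 2G, 3G and 4G grandparents (the 5th cousin level), and 512 slots at the 7G grandparents (the 8th cousin level).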

A final note: the ancestors we “see” through matching algorithms may be only part of the genetic ancestors who contribute above-threshold DNA to us. We can only compare with Matches who have taken an atDNA test. Some of our ancestors may have very few descendants, or be from an area or country where few folks take DNA tests. So the fact that we don’t find Matches to some ancestors doesn’t necessarily mean they didn’t pass down sufficient DNA. I’ll have to explore this more, in a different blog post. Chromosome mapping will also help resolve this.

Summary Observations:

  1. Our genealogy ancestors fill up every slot in every generation of our ancestry Tree – doubling with each generation [each with a different Ahnentafel number] – forever.
  2. Individuals will repeat as ancestors [each may have multiple Ahnentafel numbers].
  3. Our genetic ancestors begin to drop out of our Tree at some point.
  4. Our genetic ancestors who have passed down above-threshold DNA segments begin to drop out of our Tree even sooner.
  5. At each and every generation, there are genetic ancestors whose DNA contributions total 100% – for all of our atDNA and for each chromosome.
  6. The number of genetic ancestors will increase with each generation. And like genealogy ancestors, individual genetic ancestors can also repeat.
  7. Each genetic ancestor will have an ancestral “quill” of genetic ancestors.
  8. Only genetic ancestors who pass down above-threshold DNA segments will be seen as Common Ancestors between Matches.
  9. The number of above-threshold genetic ancestors will increase for perhaps 7-9 generations, and then decrease for the remaining generations.
  10. The above-threshold genetic ancestor chart will look like a skinny porcupine with a crew cut. The crew cut “quills” will include “sticky” segments which survive for several generations.
  11. A porcupine “quill” is not necessarily just one segment. The “quills” are for ancestors, and an ancestor may pass down multiple segments.
  12. At each generation, and on each chromosome, the DNA from a parent will be some mix from his/her parents – ranging from an even 50/50 split to an all-or-nothing 0/100 split.

07A Segment-ology: Genetic Ancestors – the Porcupine Chart by Jim Bartlett 20150806

Why Upload to GEDmatch or FTDNA?

What is the advantage of uploading AncestryDNA results to GEDmatch and/or FTDNA? Let me count the ways… Here are my top 10 reasons.

[1]. To get additional Matches (from other companies, including Matches below thresholds).

[2]. To get Matches with emails. And most at FTDNA have real names; many at GEDmatch have real names.

[3]. To get cooperative Matches. A much higher percentage of folks who test at FTDNA will work with you on genealogy. Same with folks who have taken the trouble to upload to GEDmatch.

[4]. To see the shared DNA segment. This is probably the most important reason, IMO! For each shared segment with a Match, you see the chromosome number, start and end locations, cM value, and number of SNPs included. This is technical DNA info, but it is invaluable to those who utilize the DNA beyond just a list of Matches (who may or may not be related – read on…)

[5]. The shared segment data allows the tester, or a Match, to confirm the segment is a true segment from an ancestor (that the segment is Identical By Descent, IBD) – this is done by Triangulation with other shared segments.

[6]. The shared segment data allows the tester, or a Match, to evaluate the segment – a small segment indicates a distant relationship; a large segment, or multiple segments, indicates a closer relationship.

[7]. The shared segment data allows the tester, or a Match, to group segments from Common Ancestors – you will tend to have only one (or very few) different segments from a distant ancestor, and this is where you will find other cousins from that ancestor.

[8]. With Colonial American ancestry (and other endogamous populations), you may have multiple Common Ancestors with a Match. The shared segment data will allow you to determine which ancestor the DNA came from, because all who have the same shared segment data should descend from the same Common Ancestor.

[9]. Admixture (ethnicity, ancestral geography) reports are different at different companies. GEDmatch, in particular, has several utilities with a range of admixture evaluations that target different areas.

[10]. GEDmatch has other utilities, including seeing if your parents were related.
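To illustrate reasons [4] through [8], here is a rough sketch of what you can do once you have the shared segment data in hand. The record fields mirror the items listed in [4] (chromosome, start, end, cM, SNPs); the Match names and numbers are invented, and the class and function names are my own. Note that overlapping segments are only candidates for Triangulation: the Matches must also match each other on that segment before the group is confirmed.

```python
from dataclasses import dataclass


@dataclass
class SharedSegment:
    match: str       # name of the Match
    chromosome: int
    start: int       # start location (base pairs)
    end: int         # end location (base pairs)
    cm: float
    snps: int


def overlap_bp(a, b):
    """Length in base pairs of the overlap between two segments (0 if none)."""
    if a.chromosome != b.chromosome:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def overlapping_pairs(segments, min_overlap_bp=1):
    """Pairs of different Matches whose segments with you overlap."""
    pairs = []
    for i, a in enumerate(segments):
        for b in segments[i + 1:]:
            if a.match != b.match and overlap_bp(a, b) >= min_overlap_bp:
                pairs.append((a.match, b.match, a.chromosome))
    return pairs


# Invented example data.
segments = [
    SharedSegment("Match A", 7, 10_000_000, 25_000_000, 14.2, 3100),
    SharedSegment("Match B", 7, 18_000_000, 30_000_000, 11.8, 2600),
    SharedSegment("Match C", 12, 5_000_000, 9_000_000, 7.4, 1500),
]

print(overlapping_pairs(segments))  # [('Match A', 'Match B', 7)]
```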

Readers are invited to add other reasons to upload AncestryDNA results to FTDNA and/or GEDmatch in the comments section.

21 Segment-ology: Why Upload to GEDmatch or FTDNA by Jim Bartlett 20150611