Getting Started with Autosomal DNA Part I

So you are thinking about getting an autosomal (atDNA) test, but are not sure where to start. This blog post will walk you through several steps to help get you started.

An atDNA test will result in a list of Matches based on shared DNA. Almost all of these Matches are your cousins – most will be about 5th to 8th cousins, with some who are closer and some who are more distant. The DNA test will give you this list, and a way to contact your Matches; it’s up to you to share information with your Matches and determine your Common Ancestor(s).

BEFORE YOU TEST – UNDERSTANDING

  1. Determine your objectives. Write your own or choose from these:

A. ___Find new cousins
B. ___Prove your ancestral lines
C. ___Break down brick walls
D. ___Find biological parent(s) of yourself or some ancestor [see also DNAAdoption*]
E. ___Find out your deep ancestry
F. ___Form working groups of your Matches by Ancestor [Triangulated Groups]
G. ___Determine which ancestor provided each part of your DNA [Chromosome Mapping]
H. ___Other ______________________________________
2. What to expect from your atDNA results.
Work. Your results will include a list of Matches – people who match your DNA. In general these Matches will be cousins. Generally very few will be close cousins (1st or 2nd cousins) – the bulk of them will be 5th to 8th cousins, or more. Some will have a Tree or Pedigree or list of surnames posted, but many will not. In general you will need to contact your Matches to determine your Common Ancestor. Adoptees and people with close brick walls will need to compare a lot of information from their Matches to develop common threads, and likely relationships. You need to be involved – your Tree is not magically filled out for you.
Ethnicity is a broad estimate. Your results will also include some estimates of your ethnicity or geographic ancestry. Because you inherited only part of each ancestor’s DNA, these estimates are generally correct, but not very precise.
Maybe unexpected results. When you take a DNA test and are compared with everyone else who has done the same, there is always the potential for a surprise. You may have an ancestor who is not the biological child of the parents you thought they were.
Genetic Genealogy Standards. This document is highly recommended for more information about DNA testing: http://www.geneticgenealogystandards.com/
DNA is just a tool – a very powerful tool. You use it as part of your genealogy research, not in place of genealogy. A DNA test by itself cannot create your pedigree.

3. Understand the three types of DNA.
Y-DNA is used to study an all-male line – the Y-DNA is passed from fathers to sons.
mtDNA is used to study an all-female line – the mtDNA is passed from mothers to their children (sons and daughters).
atDNA is used to study all of your ancestry – the atDNA is passed from both parents to their children. This test results in Matches from all of your ancestry. The Matches are your cousins – much more on this later. The atDNA does not include any Y-DNA or mtDNA, although Matches could just as easily be cousins from an all-male-line or all-female-line ancestor, as from any other line.

This post is all about atDNA

4. There are two fundamental levels for using atDNA:
Level 1. Genealogy only. At this level you just accept the list of Matches as your cousins, correspond and share with them to determine how you are related. You may often find that you are related several ways. This is plain and simple genealogy. Think of the DNA test as a filter that separates out only people who are related to you.
Level 2. Using the DNA. This level requires some amount of knowledge about how DNA works – how you got it, how your Match got it, what matching means and doesn’t mean, and some amount of jargon. Much more on all of this later. But if it gets too complex, or if you need a breather, just fall back on Level 1, and work with your Matches.

5. Select the company for your test. The base price is $99 at each of the three companies (Family Tree DNA, AncestryDNA and 23andMe), and they have $10-$20 reductions several times a year. Each company offers a different mix of features. See http://www.isogg.org/wiki/Autosomal_DNA_testing_comparison_chart for a comprehensive and unbiased comparison matrix. Many folks have their favorite companies for different reasons – but this is my blog, so my thoughts include:

A. All three companies display a list of your Matches (people who share some DNA with you and are your cousins, in most cases), and a way to communicate with them. You can and should upload a GEDCOM file of your ancestry to each site. They also offer an estimated relationship range for each Match – a range of relatedness (e.g. 3rd to 5th cousins). They show some ethnicity/geographic estimates (none are very precise because you only get part of each ancestor’s DNA). They also give you the ability to download your raw DNA data.

B. Family Tree DNA (FTDNA) – I think this is the best all around. Almost all Matches are listed with real names, emails, and all DNA segment data – upfront and easily downloaded to a spreadsheet which you can use or print out. If you have Colonial American ancestry, you’ll probably get more than a thousand Matches. They also store your DNA and offer a range of other DNA tests. If you have elderly relatives, and want to preserve their DNA for future tests, this is the best site. The main drawback is the limited ability to compare your Matches with each other – this is mostly overcome with their InCommonWith utility. A good site for all the objectives above.

C. AncestryDNA – If you have Colonial American ancestry, you’ll probably get several thousand Matches. Some Matches have good Trees, some have small or no Trees; and some have Private Trees. To communicate with Matches you must use the Ancestry messaging system. Several of Ancestry’s key features include a Hint system which highlights ancestors in your Tree and your Match’s Tree which are the same, based on genealogy. They also provide a Shared Matches feature based on shared DNA; but they don’t provide any DNA segment data, which is essential for objectives B, D, F and G above. A good site for finding cousins, a poor site for working with DNA. Use this site if you don’t want to learn about DNA.

D. 23andMe – has the largest database (over one million customers, but they only list your top 2,000 Matches). There are no emails posted, so you have to use their messaging system to communicate. They have a utility to compare kits to each other, which is a key feature. Their Tree system is the hardest to use. A good site for all of the objectives above.

E. GEDmatch – is a third party site (free, but donations are encouraged) – you can upload your raw DNA data file from any of the above companies to GEDmatch, and compare among Matches who tested at the three companies above. They list the top 1,500 Matches in a One-to-Many utility; and let you compare any two Matches One-to-One. A great suite of other utilities, including Triangulation and several ethnicity/geography programs. I encourage you to upload to and use GEDmatch. You’ll get more Matches and DNA data.

F. My strong recommendation is to test at all 3 companies – each has a different database of potential Matches, and each offers different features. To save some money, you can test at AncestryDNA during a sale, and then upload (copy) your DNA data to FTDNA for $39, upload to GEDmatch (free), and also test at 23andMe to maximize your chances of finding good, close cousins. Balance this plan against your budget and your desire to test other close relatives (also recommended).

BEFORE YOUR RESULTS ARE POSTED

6. Develop a robust Tree of your Ancestors. By robust, I mean include as many Ancestors as you can, with place and date information, out to 12 generations or so. This is your “bait” when “fishing” for cousins. The atDNA test tells you a Match shares DNA with you – that they are probably a cousin. You have to compare ancestors with the Match to determine your Common Ancestor. If you don’t both have the Common Ancestor in your Trees, it’s very hard to find it. Most of your DNA Matches will be in the 5th to 8th cousin range, some more distant. You need a Tree that includes as much of your Ancestry as possible, back to at least the 10th cousin range. Few have actually done the detailed research to “prove” all of their ancestors back that far. My recommendation is to “borrow” from the research and Trees of others to fill out your own Tree as much as possible. I’d go so far as to also include “iffy” Ancestors at the tips of your Tree – ones you may not have researched or proved – these are better than blank spaces in your Tree. The objective here is to identify potential Common Ancestors. Then you and your Match (now a potential cousin) can compare notes to see how much documentation you each have.
Create a GEDCOM file of your robust Tree and upload it to each site where you’ve tested (FTDNA, AncestryDNA, and/or 23andMe); to GEDmatch; and/or to WorldConnect, WikiTree, FamilySearch, etc.
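
To get a feel for how big a Tree reaching the 10th cousin range can be, here is a rough Python sketch. It assumes no pedigree collapse (no ancestor appearing in the Tree twice), which is rarely true that far back, so treat the counts as upper limits. The function names are made up for this illustration.

```python
def ancestors_in_generation(n):
    """Ancestor slots n generations back (1 = parents, 2 = grandparents)."""
    return 2 ** n

def generations_to_common_ancestor(cousin_degree):
    """Nth cousins share Common Ancestors N + 1 generations back."""
    return cousin_degree + 1

for degree in (5, 8, 10):
    gens = generations_to_common_ancestor(degree)
    print(f"{degree}th cousins: Common Ancestors are {gens} generations back, "
          f"among up to {ancestors_in_generation(gens)} ancestors")
```

The point of the arithmetic is simply that the number of Ancestors to fill in doubles with every generation, which is why borrowing from the Trees of others is so helpful.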

7. Develop a list of Patriarchs [optional, but very helpful] – make an alphabetical list of your ancestral surnames. Then add the most recognizable Patriarch (or Matriarch, if there is no Patriarch) with years and places. Keep each surname/Patriarch to one line, if possible. Some examples:
   CHILES, Col Walter II 1630-1671; John 1679-1723VA; dau Valentine 1719 Caroline Co, VA
   FISHER, George b c1742 PA; RevWar; descendants to Pendleton; then Harrison/Lewis Co, (W)VA
   HAMM, Stephen b c1737 Amherst Co, VA (on Stovall Creek) early 1700s through RevWar

8. Develop a Standard Message – This is a message you’ll send to all your Matches. It’s good to have a Standard Message (which you can tweak over time). You can just copy and paste it into an email or messaging system. This saves a lot of time. After this initial effort to contact each Match, you’ll want to personalize follow up messages.

Your message should include your real name and email; perhaps a very brief introduction, a link to your Tree of Ancestors, and a request that your Match share their Tree with you.

An example (revise to suit your style):
Hi, I’m Jim Bartlett. I’ve been a genealogist since 1974. Most of my ancestry is from Colonial Virginia with one grandparent’s ancestry from Scotland and Germany in the mid-1800s. I’m willing to share my documentation. My goals include validating my ancestral lines and working through brick walls using DNA. My Public Tree: http://trees.ancestry.com/tree/20620230/family (I can send invite). Please share your Tree (best), pedigree, list of Patriarchs or surnames.

Ask if you have questions – I teach DNA for genealogists; see my atDNA How to Succeed list at: http://boards.rootsweb.com/topics.dnaresearch.autosomal/301/mb.ashx it has some good links at the end; one on Triangulation! Also see my blog: http://www.segmentology.org

Hope to hear from you, Jim Bartlett jim4bartletts@verizon.net

I have modified my introductory message many times – I’m now on version 23. And you can add in anything, whenever it is appropriate. An example would be something about a particular surname or location you see in a Match’s profile. The “boilerplate” is in your standard message, but you can modify it any time you want.

This blogpost will get you started, and let you order a test with some knowledge of what’s involved. The next blogpost in this series will be:

Getting Started with Autosomal DNA Part II

AFTER YOUR RESULTS ARE POSTED

 

01A Segment-ology: Getting Started with Autosomal DNA Part I – by Jim Bartlett 20151122

* https://groups.yahoo.com/neo/groups/DNAAdoption/info

Proof of Sticky Segments

Well… I should use “proof” in quotes, but the simulations below should show that we will always have some “sticky” segments which survive many generations. Technically, I suppose, a “sticky segment” is one that is passed from one ancestor to another intact. However, “sticky” is usually used in the sense of segments which pass through many generations, intact.

Here are the ground-rules I used for this analysis:

  1. Use a 200cM chromosome (about the size of chromosome 5, 6 or 7)
  2. Assume 2 crossovers per generation. By definition, there will be an average of one crossover in 100cM per generation. So, on average, there will be two crossovers per generation in 200cM. Sometimes there are one or three crossovers; and infrequently there are none or four (or more) crossovers. Since it more or less evens out, I’m using the average of two crossovers per generation to illustrate what happens over 10-20 generations. This avoids any bias on my part.
  3. Use a simplification. Each time there is a crossover, there is a switch from one ancestor’s DNA to another’s. However, the other ancestor’s DNA is subjected to the same one-crossover-per-100cM rule. So for the purposes of this discussion [about how segments are subdivided by crossovers, and not about which ancestor they come from], I will use a simplification: I’ll assume that the other ancestor’s DNA is exactly like the first ancestor’s DNA at that location, as far as crossovers are concerned. This means I’ll just continue to subdivide the segments in the initial 200cM segment, generation after generation. It’s a whole lot simpler and easier than starting with 2,048 chromosomes of different colors and keeping track of those. This simplification will give essentially the same result of segment subdivision, and is much easier to follow.
  4. Assign crossovers to the middle of the largest segments. The DNA is very random, but to A) avoid any bias on my part, and B) show the worst case scenario, I will assign the two crossovers in each generation to the middle of each of the two longest segments in the 200cM chromosome. Of course, in real life some crossovers will subdivide a small segment (leaving a different, larger segment intact for another generation); or subdivide a large segment into very unequal parts (leaving one smaller segment, plus a somewhat larger segment – it all evens out in the average).
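
Rule 2 can be put into rough numbers. The sketch below assumes the number of crossovers in one generation follows a Poisson distribution with a mean of (chromosome length in cM) / 100. That is a common simplifying approximation, not something stated in the ground-rules above, and real chromosomes do not follow it exactly.

```python
import math

def crossover_probability(k, length_cm=200.0):
    """Chance of exactly k crossovers in one generation (Poisson assumption)."""
    mean = length_cm / 100.0  # one crossover per 100cM on average
    return math.exp(-mean) * mean ** k / math.factorial(k)

for k in range(5):
    print(f"{k} crossovers: {crossover_probability(k):.0%}")
```

Under this assumption one crossover is about as likely as two on a 200cM chromosome, and none or four are the less frequent cases. The average still works out to two per generation.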

So apply these rules to the 200cM chromosome in the figure below.

Figure 1. Tracking Crossovers for 24 Generations

07C Fig 1 Proof of Sticky Segment

Note the first two crossovers subdivide the segment into roughly three equal segments. As outlined in Rule 3 above, the center segment will really be from a different ancestor, but for the purpose of following the subdivision of segments in general, we will continue with all the segments – on average this gives the same result as far as subdivided segment sizes go. Note: the segments which are subdivided are highlighted in yellow.

In the next generation two of the largest segments are subdivided, but the third fairly large segment remains intact. This is because we only have (on average) two crossovers per generation, so one of the three large segments in generation 2 will not be subdivided.

Moving to generation 3 we see the largest (66cM) segment is now subdivided by one of the two new crossovers for this generation, and then a 33cM segment is subdivided by the other crossover for this generation.  And we still have two 33cM segments and one 34cM segment passed intact.

Continue this process of subdividing the two largest segments in half in each generation, until we get to generation 11, where all the segments are now 8 or 9cM except one that is still 16cM. This 16cM segment has remained intact for 6 generations! In fact there are two other 16cM segments that remained intact for 6 generations.
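
The pencil-and-paper process above can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: it tracks segment lengths rather than positions, it cuts the single starting segment into thirds in generation 1 (as in the walkthrough), and when several segments tie for longest it picks the leftmost ones. The function names are made up for this sketch.

```python
from collections import Counter

def next_generation(lengths, crossovers=2):
    """Apply one generation of crossovers to a list of segment lengths (cM)."""
    if len(lengths) < crossovers:
        # Assumption: a lone starting segment is cut into equal parts
        # (thirds for two crossovers), as in generation 1 of the walkthrough.
        parts = crossovers + 1
        return [sum(lengths) / parts] * parts
    # Indices of the longest segments; ties go to the leftmost (stable sort).
    by_length = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    to_split = set(by_length[:crossovers])
    result = []
    for i, length in enumerate(lengths):
        if i in to_split:
            result.extend([length / 2, length / 2])  # crossover in the middle
        else:
            result.append(length)  # passed down intact
    return result

def simulate(total_cm=200.0, generations=11, crossovers=2):
    """Segment lengths after the given number of generations."""
    lengths = [total_cm]
    for _ in range(generations):
        lengths = next_generation(lengths, crossovers)
    return lengths

for gen in (3, 11, 23):
    sizes = Counter(round(cm, 1) for cm in simulate(generations=gen))
    print(gen, sorted(sizes.items(), reverse=True))
```

Run this way, generation 11 comes out as one segment of about 16.7cM with all the others at about 8.3cM, which is the same pattern described above.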

Will it always happen this way? No – the DNA is very random, and a different result will happen every time. But if one of those “sticky” segments had been subdivided in an earlier generation, some other segment would not have been subdivided, and that segment would have remained “sticky” for another generation.

The takeaway here is that there are, on average, only two crossovers per generation in a chromosome of about 200cM. Those two crossovers can only subdivide two of the many segments in the chromosome. And because of this, there are some segments that will pass down intact, generation after generation.

In actual practice, there are hot spots on each chromosome where crossovers are more likely to occur. The effect of these hot spots is that some smaller segments around the hot spots will be subdivided more frequently, and some other segments will be missed more frequently – leaving us with even more “sticky” segments elsewhere.

After generation 11, our process starts to subdivide some segments into segments so small that they would not show up as a shared segment – they are below the standard thresholds (about 7cM) for a match. But notice in generations 12 through 22, that there are still above-threshold segments. Even in generation 23 there is still an 8cM segment which has survived intact for 14 generations! Remember: if this particular segment had been subdivided, then some other segment would not have been subdivided.

The point is we should expect “sticky” ancestral segments, particularly in the 7-10cM range. They are actually quite common. Even “sticky” segments in the 10-20cM range are normal, even after 7-10 generations.
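
A back-of-the-envelope check on how “sticky” a segment can be: if crossovers are assumed to fall independently at an average of one per 100cM per generation (a Poisson-style simplification, and an assumption of this sketch rather than part of the analysis above), the chance that no crossover lands inside a segment over several generations is exp(-cM × generations / 100). This says nothing about whether the segment is passed on at all, or about matching. It only covers not being cut.

```python
import math

def chance_uncut(segment_cm, generations):
    """Chance that no crossover falls inside the segment (Poisson assumption)."""
    return math.exp(-segment_cm * generations / 100.0)

for cm, gens in ((8, 14), (16, 6), (10, 10)):
    print(f"{cm}cM over {gens} generations: {chance_uncut(cm, gens):.0%} uncut")
```

The smaller the segment, the less often a crossover finds it, which is why the 7-10cM range is where long-lived segments pile up.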

Now, we have not studied the probability of matches at these great distances – that’s a different, somewhat harder, discussion. The point here is that there are many “sticky” segments in our DNA. They may come from generations that are generally way beyond our genealogies. Also we should not be surprised when we see a parent and child with essentially the same 7-10cM segment being shared with a Match. It happens pretty frequently with close relatives.

Here is another example based on a 100cM chromosome (think chromosomes 19-22). For this chromosome there is only an average of one crossover per generation – only one segment will be subdivided in each generation. I tried to place them more randomly this time [you can easily try your own pencil & paper sketch of this simulation]. I generally picked on the largest segment. After 11 generations over half of the segments are still over the (7cM) threshold. And several segments 10cM or over have survived, intact, for a number of generations.

07C Fig 2 Tracking Crossovers in 100cM

Figure 2. Tracking Crossovers in a 100cM Chromosome.
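
If you would rather let a computer do the pencil & paper sketch, here is a minimal Python version. It assumes exactly one crossover per generation, placed at a uniformly random position along the 100cM chromosome. That is not quite how the crossovers were placed for the figure, but a random position naturally lands in larger segments more often. The names and the seed value are arbitrary choices for this illustration.

```python
import random

THRESHOLD_CM = 7.0  # the usual matching threshold mentioned above

def random_subdivision(total_cm=100.0, generations=11, seed=None):
    """Segment lengths after one random crossover per generation."""
    rng = random.Random(seed)
    cuts = [0.0, total_cm]  # chromosome ends
    for _ in range(generations):
        cuts.append(rng.uniform(0.0, total_cm))  # one new crossover
    cuts.sort()
    return [right - left for left, right in zip(cuts, cuts[1:])]

segments = random_subdivision(seed=1)
above = [cm for cm in segments if cm >= THRESHOLD_CM]
print(f"{len(above)} of {len(segments)} segments are {THRESHOLD_CM}cM or more")
print(sorted(round(cm, 1) for cm in segments))
```

Change the seed (or leave it out) to get a different chromosome each time, and compare how many segments stay above the threshold from run to run.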

 

Summary

  1. Ancestral “sticky” segments in the 7-10cM range are normal. We will have Matches with these segments from time to time – and some of them may be fairly distant. But that’s another story.
  2. Some segments over 10cM will survive from 9 or 10 or more generations back – it’s normal and expected. Again, matching is a different calculation.
  3. The point is: we have many above-threshold segments from distant ancestors, back 10 generations and more!
  4. Since we have many above-threshold ancestral segments from distant ancestors, on every chromosome, we should expect to have shared segments with distant cousins.

 

07C Segment-ology: Proof of Sticky Segments by Jim Bartlett 20151116

Segment Size vs Cousinship Chart Needed

We need a one-page chart that shows the empirical cM values found for various relationships. We know the theoretical, or calculated values, but the randomness of DNA results in a fairly wide range in some cases – particularly for distant cousins.

The chart below shows my guess as to what a chart might look like. The x-axis is cMs on a logarithmic scale. The y-axis is % of all the values for each cousinship (the number of results at each cM value divided by the total number of results for that cousinship – which would normalize the chart for different total number of results for each cousinship). The area under each curve would be 100% of all results. The roughly normal distribution curves are “centered” on the calculated cM values for each cousinship. Based on experience we know that first cousins (1C) tend to share segments with cM values relatively close to the calculated value of 880cMs, producing a tall thin curve (I think); whereas 5C (calculated average 3.4cM) or 6C (calculated average 0.8cM) must have long cM “tails” on this chart in order for us to “see” the shared segment with Matches which are above a 7cM threshold, producing a short wide curve (I think).
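
The calculated averages quoted above are consistent with a simple rule of thumb: start from 880cM for first cousins and divide by 4 for each further degree of cousinship. The Python sketch below applies that rule. Note that the rule is inferred from the numbers in this post, not a formula the post states, and the function name is made up for the illustration.

```python
FIRST_COUSIN_CM = 880.0  # calculated 1C value quoted above

def calculated_cm(cousin_degree):
    """Rule-of-thumb average shared cM for an Nth cousin (assumed rule)."""
    return FIRST_COUSIN_CM / 4 ** (cousin_degree - 1)

for degree in range(1, 7):
    print(f"{degree}C: {calculated_cm(degree):.1f}cM")
```

This gives about 3.4cM for 5C and a little under 0.9cM for 6C, close to the values quoted above. Both are well under a 7cM threshold, which is why only the long right-hand tails of those curves would ever show up as Matches.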

Note in this hypothetical chart, the small red dots at the end of some tails were taken from the data compiled by Blaine Bettinger (who did a great service to us all by compiling and reporting this data), which can be found at:

The Shared cM Project – An Update

We need this data displayed this way so we can easily enter with a shared cM value on the x-axis and see the range of cousinships possible. This would quickly show which cousinship is most probable, and how close, or far, other cousinships would be.

As I now think about it, at any cM value on the x-axis, wouldn’t the sum of the values of all the curves have to equal 100%? But to achieve that, we’d have to include all the possible curves, including siblings, half siblings, half double second cousins twice removed, etc., which is probably impractical at this point. I’d rather see the chart soon with the cousinships shown below, than wait a long time for the perfect chart.
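
The two ways of reading “100%” here can be made concrete with a small Python sketch. The counts below are made up purely for illustration (they are not from the Shared cM Project or any real data), and the function names are hypothetical. The first function normalizes within each cousinship, so each curve adds up to 100%. The second normalizes within each cM bin, so the cousinships at that cM value add up to 100%.

```python
# Made-up counts of reported results per cM bin, for illustration only.
counts = {
    "3C": {"7-20cM": 10, "20-50cM": 30, "50-150cM": 60},
    "4C": {"7-20cM": 40, "20-50cM": 45, "50-150cM": 15},
    "5C": {"7-20cM": 80, "20-50cM": 18, "50-150cM": 2},
}

def percent_within_cousinship(table):
    """Each cousinship's curve: its bins add up to 100%."""
    result = {}
    for rel, bins in table.items():
        total = sum(bins.values())
        result[rel] = {b: 100.0 * n / total for b, n in bins.items()}
    return result

def percent_within_bin(table):
    """At each cM bin: the listed cousinships add up to 100%."""
    bin_totals = {}
    for bins in table.values():
        for b, n in bins.items():
            bin_totals[b] = bin_totals.get(b, 0) + n
    return {rel: {b: 100.0 * n / bin_totals[b] for b, n in bins.items()}
            for rel, bins in table.items()}

print(percent_within_cousinship(counts)["4C"])
print({rel: round(p["7-20cM"], 1) for rel, p in percent_within_bin(counts).items()})
```

The second view only adds up to 100% across the cousinships that were included, and it shifts with how many results were collected for each cousinship, so it works as a rough guide rather than a probability.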

Another thought is to blow up the part of the chart from, say, 5cM to 50cM. This would be fairly simple once the data is collected.

Still another observation is that if this chart were based on all collected data, data based on endogamous shared segments would generally be shifted a little more to the right; and data based on non-endogamous shared segments would generally be shifted a little more to the left.

BE CAREFUL – THE CHART BELOW IS A THEORETICAL GUESS (with only a few valid data points)

06B Figure cM vs pct for Cousins 1

06B Segment-ology: Segment Size vs Cousinship Chart Needed;  Jim Bartlett 20151106