Analyzing our Matches at AncestryDNA through Shared Matches
With about 20 million atDNA tests taken, we now have many Matches and lots of data. This blog post describes one method (a “hack”) to manage some of that data. In this case, I’m trying to squeeze more information from my AncestryDNA Matches. Ancestry doesn’t report shared DNA segment data, but they do have several unique features we can put to use. In preparation, we need two things: Notes, which I’ve posted about here and here; and a spreadsheet of all the Matches.
- Notes. When known, I’ve been entering #A (with the Ahnentafel number and surnames of our MRCA Ancestors) and #T (with the Triangulated Group ID#) into the Notes box on the DNA page for each Match at AncestryDNA. I have over 1,000 Hints, so I have a #A entry in the Notes box for each Hint. For hundreds who have uploaded to GEDmatch (or used another company too) I have a #T entry in the Notes box. This data is very helpful, but not essential.
- Spreadsheet of Matches. I have downloaded my 83,000 AncestryDNA Matches to a spreadsheet. This spreadsheet includes the Match name, the Admin, total cMs, all my entries in the Notes boxes, a URL link to the Match, and a URL link to their Tree. This is a great tool for exploring and managing my AncestryDNA Matches. I use DNAGedCom Client for a fairly quick download (small subscription fee).
So I took my spreadsheet (which is sorted on total cM) and added about 20 columns. I then chose an interesting 4th cousin (4C) Match and clicked on Shared Matches (SM). I then put a “1” in the first blank column of my spreadsheet for this 4C Match and also for each of our SMs. Since I started down the list with a middle-of-the-pack 4C, some of our SMs are higher in the spreadsheet, and some are below the Match I started with. (I yellow-highlighted the 1 for the Match I started with, to remind me who the base was.) Note that the SMs are all classified as 4C or closer by AncestryDNA, although I’ve found a number of them to be really 5C or even 6C. But that’s OK – these “4C” Matches result in a manageable group of Matches – about 3,000 out of the total of 83,000. So I was working with roughly the top 3 percent of my Matches – all sharing at least 20cM with me. This process resulted in a column with 1’s in it, generally spread out over all of the top 3,000 Matches.
Next, pick another interesting 4C (your choice – I picked one I knew was a real 4C on a pair of 3xG grandparents I’ve done a lot of work on and with whom I have several established Triangulated Groups (TGs) from the other companies). I went to the next column, entered a 2 for that Match and yellow-highlighted it. I then ran Shared Matches on that Match, and put a 2 in the same column for each one of the SMs – again, some were closer cousins, higher in my spreadsheet, and some were lower, getting down to the end of the 4C SMs. The result was another column filled with a single number. I made a blank row at the top of my spreadsheet with the numbers 1 to 20 in that row, and then froze that row, so I could see it as I scrolled down.
I chose more 4C Matches which did not yet have a number, gave them a yellow-highlighted number, ran the Shared Match for each and repeated the process.
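For anyone who would rather script the bookkeeping than type numbers into cells, here is a minimal sketch of the column-filling steps described above. It assumes you already have, for each base Match, the list of Shared Matches copied from AncestryDNA; the function name `build_icicles`, the dictionary layout, and the names in the example are all made up for illustration and are not part of any download tool.

```python
def build_icicles(base_candidates, shared_matches):
    """Number each base Match and stamp that number on all of its Shared Matches.

    base_candidates: 4C Matches in the order you want to try them as a base.
    shared_matches:  hypothetical dict, base Match -> list of Shared Match names.
    """
    icicles = {}  # Match name -> set of icicle numbers (one spreadsheet row)
    bases = {}    # icicle number -> base Match (the yellow-highlighted cell)
    for candidate in base_candidates:
        if icicles.get(candidate):
            continue  # already carries a number, so it is not used as a new base
        number = len(bases) + 1
        bases[number] = candidate
        for name in [candidate, *shared_matches.get(candidate, [])]:
            icicles.setdefault(name, set()).add(number)
    return icicles, bases


# Made-up example data, not real Matches.
shared = {
    "Ann": ["Bob", "Cat"],
    "Bob": ["Ann", "Cat"],
    "Dan": ["Cat", "Eve"],
}
icicles, bases = build_icicles(["Ann", "Bob", "Dan"], shared)
for name in sorted(icicles):
    print(name, sorted(icicles[name]))
```

In this toy run Bob is skipped as a base because he already sits in Ann's icicle, and Cat ends up with two numbers, which is the kind of overlap that shows up near the top of the real spreadsheet.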
Figure 1. Portion of Spreadsheet showing headers, columns Notes and icicles.
Down the spreadsheet, these numbers began to look like icicles hanging down. Nearer the top portion of the spreadsheet, some of the Matches had several numbers, and very near the top some of the Matches had many numbers. Think about this! We are actually looking at an upside down Tree – the trunk at the top and the various large branches (multiple numbers) hanging down, gradually separating into individual branches representing individual ancestors at the 4C or 5C or 6C level. This is as it should be! If we keep going we should find 16 couples at the 4C (or 3xG grandparent) level. Some more will be there because we are actually dealing with some 5C and 6C in the AncestryDNA “4C” category.
Time out for reflection on this process. This methodology is not as finite or rigorous as forming a Triangulated Group, because a TG has only one basic solution. TGs are based on segments. These icicles are based on genealogy relationships. The SMs actually have In Common With relationships, and some of them don’t share the same Common Ancestors with you and the base Match. The SMs may relate in different ways, and thus not really all be on the same ancestral line. This often shows up in the spreadsheet when a 4C Match down the list winds up with numbers from more than one icicle. Often, from the Notes for that Match, it’s possible to determine which ancestral line (icicle) they belong to. In this case I color their other icicle number red (for wrong icicle).
Another way to tell which ancestral line icicle is correct is to take a 4C Match (down the spreadsheet with multiple icicle numbers), and set them as the base, and run the Shared Match list. Usually it will be very clear that their SMs are on one icicle and not on the other icicle(s), maybe with one or two exceptions due to endogamy. If it really looks like that Match is creating an ambiguous mess, highlight that Match row in red and focus on the more “well behaved” Matches.
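The check just described (make the multi-number Match the base and see where their own SMs fall) can be roughed out as a simple tally. This is only a sketch under assumptions: `tally_icicles` is a hypothetical helper, and the data structures are the same made-up dictionaries of names and icicle numbers, not anything AncestryDNA provides directly.

```python
from collections import Counter


def tally_icicles(match, shared_matches, icicles):
    """Count how many of a Match's own Shared Matches sit in each icicle."""
    counts = Counter()
    for sm in shared_matches.get(match, []):
        # An SM with several numbers adds one count to each of its icicles.
        counts.update(icicles.get(sm, set()))
    return counts


# Made-up example: Cat carries numbers 1 and 2 and we want to know which fits.
icicles = {"Cat": {1, 2}, "Ann": {1}, "Bob": {1}, "Fay": {1}, "Eve": {2}}
shared = {"Cat": ["Ann", "Bob", "Fay", "Eve"]}

counts = tally_icicles("Cat", shared, icicles)
best = counts.most_common(1)[0][0]
print(dict(counts), "-> most likely icicle:", best)
```

A lopsided tally points to the likely icicle; a near-even one is the “ambiguous mess” case, where it is better to set that Match aside than to force an answer.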
Remember, this is just a process and tool for you to use. You are in charge. Don’t become disoriented by a few Matches. Look at the big picture, and come back later – maybe much later, when the “dust has settled” and see if you can rationalize those few Matches.
Another thing to watch for is basically duplicate icicles. This happens when you pick a new 4C Match as the base, and the SMs wind up being almost exactly the same as a previous icicle. In this case merge the two columns. In any case use your judgment – if you want to keep the two, slightly different, icicles, OK; if you want to merge them, OK. At the end of the day it won’t make much difference.
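One way to spot candidates for merging is to measure how much two icicles overlap. The sketch below uses the Jaccard overlap (shared members divided by combined members); that measure and the 0.8 cutoff are my assumptions for illustration, not a rule from the process above, and the final call is still a judgment call.

```python
def near_duplicates(members, threshold=0.8):
    """Return (icicle_a, icicle_b, overlap) for pairs that are almost the same.

    members: hypothetical dict, icicle number -> set of Match names in it.
    threshold: assumed cutoff for "almost exactly the same"; adjust to taste.
    """
    pairs = []
    numbers = sorted(members)
    for i, a in enumerate(numbers):
        for b in numbers[i + 1:]:
            union = members[a] | members[b]
            overlap = len(members[a] & members[b]) / len(union) if union else 0.0
            if overlap >= threshold:
                pairs.append((a, b, round(overlap, 2)))
    return pairs


# Made-up example data.
members = {
    1: {"Ann", "Bob", "Cat", "Dan", "Eve"},
    2: {"Ann", "Bob", "Cat", "Dan"},
    3: {"Fay", "Gus"},
}
pairs = near_duplicates(members)
print(pairs)
```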
At some point, after you have created many different icicles, it’s time to check your Notes (with MRCA and/or TG info in them) to see if you can determine an ancestral thread running through the icicle. At the top of my spreadsheet, I inserted about 20 rows (one for each icicle column) and added the numbers in a diagonal fashion – like a matrix – with each number in its respective column and on a separate row. After each of these numbers I added a description that I felt best described the icicle – usually an Ahnentafel number with the surnames of the MRCA couple; or the TG ID# if that seemed to be the theme of the icicle – there being no consensus on the MRCA yet.
I am now in the process of rearranging the icicle columns based on information from the 2C and 3C Matches. Done correctly, this will wind up with all the paternal icicles on one side and the maternal icicles on the other side of the 20 columns I started with; and within those two sides will be the split between the grandparents (an objective of the Leeds process which has a similar methodology). Theoretically, by using 4Cs, we should be able to sort out the 16 MRCA couples with an icicle for each – or maybe two or more icicles for each of these 3xG grandparent couples. It depends on how far down the list we are willing to go.
Note that the process of forming these icicles does not depend on genealogy knowledge. I am using this process now for a friend with an orphaned grandfather. It would be the same process for an NPE or adoptee at the grandparent level. I’m winding up with multiple icicles, and with some info on the other 3 grandparents, I’ll determine which icicles are from the target grandfather. Then I’ll combine all the downloaded surnames from those Matches, combine them into one spreadsheet, sort on surnames and then analyze the surnames that look promising. I’ve already found a new surname for one of my own brick walls this way. I’ll review that process in a separate blog post, because it can be applied to any icicle which includes Matches with Trees.
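As a rough illustration of the surname step, the sketch below counts how many Matches in a set of icicles carry each surname in their Trees, instead of sorting a combined spreadsheet by eye. The counting approach, the helper name `surname_counts`, and the data layout are my assumptions; the surnames are made up.

```python
from collections import Counter


def surname_counts(icicle_matches, tree_surnames):
    """Count, for each surname, how many Matches in the icicle(s) list it.

    tree_surnames: hypothetical dict, Match name -> surnames found in their Tree.
    Each surname is counted once per Match, so one large Tree cannot dominate.
    """
    counts = Counter()
    for match in icicle_matches:
        counts.update(set(tree_surnames.get(match, [])))
    return counts


# Made-up example data.
tree_surnames = {
    "Ann": ["Smith", "Jones", "Smith"],
    "Bob": ["Smith", "Lee"],
    "Cat": [],  # no Tree
}
counts = surname_counts(["Ann", "Bob", "Cat"], tree_surnames)
for surname, n in counts.most_common():
    print(surname, n)
```

Surnames that recur across several Matches in the target icicles are the ones worth a closer look.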
On a side note about this icicle process… On almost every icicle I formed, I got distracted. For most of them I started with a 4C Match for whom I had a pretty firm MRCA and/or TG. As I worked down the Shared Match list, I’d see a Match with a 20-person Tree. I clicked on the Match and often quickly found a clue to follow, and it often led to the same MRCA. And now at AncestryDNA, we can see “Unlinked Trees” – BINGO! That sometimes led to an MRCA too – usually on the same ancestral line. It was like picking the low hanging fruit all over again. I had to force myself back to the boredom of filling in my icicles. It is work! But it appears to me that these icicles, like TGs, will each provide a cluster of Matches which is very helpful.
Extra Credit…. I have over 1,000 Hints – almost all of them with valid MRCAs. However, most of them are also more distant than 4C, some with fairly small shared segments under 10cM. But most of them have some Shared Matches. All of these SMs are 4C or closer, so I’m now going to see how many of these Hint Matches with MRCAs can be linked back to one of my icicles. Maybe these little cM Matches with MRCAs will give me important clues for the icicles. This works because a 6C Match can have 4C Shared Matches, even though the 4C Matches do not show the 6C in their own Shared Match lists. So it is worth my time to start with the 6C Matches with an MRCA Hint and see which 4C icicles they match.
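Here is a minimal sketch of that extra-credit idea: run the SMs for each Hint Match, find the icicle most of those SMs sit in, and collect the Hint’s MRCA label as a clue for that icicle. Everything named here (`mrca_clues`, the dictionaries, the “A34 Smith/Jones” style labels) is hypothetical and only stands in for what is recorded in the Notes.

```python
from collections import Counter, defaultdict


def mrca_clues(hint_mrca, shared_matches, icicles):
    """Group the MRCA labels of distant Hint Matches under the icicle they fit best.

    hint_mrca:      hypothetical dict, Hint Match -> MRCA label from the Notes.
    shared_matches: hypothetical dict, Hint Match -> their 4C-or-closer SMs.
    icicles:        hypothetical dict, Match name -> set of icicle numbers.
    """
    clues = defaultdict(Counter)
    for hint, mrca in hint_mrca.items():
        votes = Counter()
        for sm in shared_matches.get(hint, []):
            votes.update(icicles.get(sm, set()))
        if votes:  # Hints with no numbered SMs are simply left out
            best, _ = votes.most_common(1)[0]
            clues[best][mrca] += 1
    return clues


# Made-up example data.
icicles = {"Ann": {1}, "Bob": {1}, "Eve": {2}}
shared = {"Hal": ["Ann", "Bob"], "Ivy": ["Eve"], "Jo": []}
hint_mrca = {"Hal": "A34 Smith/Jones", "Ivy": "A40 Lee/Gray", "Jo": "A34 Smith/Jones"}

clues = mrca_clues(hint_mrca, shared, icicles)
for number in sorted(clues):
    print(number, dict(clues[number]))
```

If several Hints with the same MRCA pile up under one icicle, that is a reasonable candidate label for it; a Hint whose SMs split evenly between icicles is assigned arbitrarily in this sketch and deserves a manual look.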
Summary: Download or make a list of your top (closest 2-3%) Matches and group (or cluster) them through Shared Matches. I found it relatively easy to do this with a spreadsheet. It’s work, but provides interesting insights. The icicles formed are a tool to help us analyze our Matches. Often this lets us impute ancestral lines and/or TGs to other Matches.
Edit 20181030 – a portion of my spreadsheet has been added as Figure 1. The Names and Admins were condensed for privacy. The Notes field has many “shorthand” entries that I have entered at AncestryDNA, and they are included in the spreadsheet download.
[19A] Segment-ology: Think Icicles!; by Jim Bartlett 20181030