Analyzing our Matches at AncestryDNA through Shared Matches
With about 20 million atDNA tests taken, we now have many Matches and lots of data. This blogpost describes one method (a “hack”) to manage some of that data. In this case, I’m trying to squeeze more information from my AncestryDNA Matches. Ancestry doesn’t report shared DNA segment data, but they do have several unique features we can put to use. In preparation, we need two things: Notes, which I’ve posted on here and here; and a spreadsheet of all the Matches.
- Notes. When known, I’ve been entering #A (with the Ahnentafel number and surnames of our MRCA Ancestors) and #T (with the Triangulated Group ID#) into the Note box on the DNA page for each Match at AncestryDNA. I have over 1,000 Hints, so I have a #A entry in the Notes box for each Hint. For hundreds who have uploaded to GEDmatch (or used another company too) I have a #T entry in the Notes box. This data is very helpful, but not essential.
- Spreadsheet of Matches. I have downloaded my 83,000 AncestryDNA Matches to a spreadsheet. This spreadsheet includes the Match name, the Admin, total cMs, all my entries in the Notes boxes, a URL link to the Match, and a URL link to their Tree. This is a great tool for exploring and managing my AncestryDNA Matches. I use DNAGedCom Client for a fairly quick download (small subscription fee).
So I took my spreadsheet (which is sorted on total cM) and added about 20 columns. I then chose an interesting 4th cousin (4C) Match and clicked on Shared Matches (SM). I then put a “1” in the first blank column of my spreadsheet for this 4C Match and also for each of our SMs. Since I started down the list with a middle-of-the-pack 4C, some of our SMs are higher in the spreadsheet, and some were below the SM I started with. (I yellow-highlighted the 1 for the SM I started with – to remind me who I started with). Note that the SMs are all classified as 4C or closer by AncestryDNA, although I’ve found a number of them to be really 5C or even 6C. But that’s OK – these “4C” Matches result in a manageable group of Matches – about 3,000 out of the total of 83,000. So I was working with the top 3 percent of my Matches – all sharing at least 20cM with me. This process resulted in a column with 1’s in it, generally spread out over all of the top 3,000 Matches.
Next, pick another interesting 4C (your choice – I picked one I knew was a real 4C on a pair of 3xG grandparents I’ve done a lot of work on and with whom I have several established Triangulated Groups (TGs) from the other companies. I went to the next column and entered a 2 and yellow-highlighted that 2. I then ran Shared Matches on that Match, and put a 2 in the same column for each one of the SMs – again, some were closer cousins, higher in my spreadsheet, and some were lower, getting down to the end of the 4C SMs. Again, another column with all the same number. I made a blank row at the top of my spreadsheet with the numbers 1 to 20 in that row, and then froze that row, so I could see it as I scrolled down.
I chose more 4C Matches which did not yet have a number, gave them a yellow-highlighted number, ran the Shared Match for each and repeated the process.
Figure 1. Portion of Spreadsheet showing headers, columns Notes and icicles.
Down the spreadsheet, these numbers began to look like icicles hanging down. Nearer the top portion of the spreadsheet, some of the Matches had several numbers, and very near the top some of the Matches had many numbers. Think about this! We are actually looking at an upside down Tree – the trunk at the top and the various large branches (multiple numbers) hanging down, gradually separating into individual branches representing individual ancestors at the 4C or 5C or 6C level. This is as it should be! If we keep going we should find 16 couples at the 4C (or 3xG grandparent) level. Some more will be there because we are actually dealing with some 5C and 6C in the AncestryDNA “4C” category.
Time out for reflection on this process. This methodology is not as finite or rigorous as forming a Triangulated Group – which TG has only one basic solution. TGs are based on segments. These icicles are based on genealogy relationships. The SMs actually have In Common With relationships, and some of them don’t share the same Common Ancestors with you and the base Match. The SMs may relate in different ways, and thus not really all be on the same ancestral line. This often shows up in the spreadsheet when a 4C Match down the list winds up with numbers from more than one icicle. Often, in the Notes for that Match, it’s possible to determine which ancestral line (icicle) they belong. In this case I color their other icicle number red (for wrong icicle).
Another way to tell which ancestral line icicle is correct is to take a 4C Match (down the spreadsheet with multiple icicle numbers), and set them as the base, and run the Shared Match list. Usually it will be very clear that their SMs are on one icicle and not on the other icicle(s), maybe with one or two exceptions due to endogamy. If it really looks like that Match is creating an ambiguous mess, highlight that Match row in red and focus on the more “well behaved” Matches.
Remember, this is just a process and tool for you to use. You are in charge. Don’t become disoriented by a few Matches. Look at the big picture, and come back later – maybe much later, when the “dust has settled” and see if you can rationalize those few Matches.
Another thing to watch for is basically duplicate icicles. This happens when you pick a new 4C Match as the base, and the SMs wind up being almost exactly the same as a previous icicle. In this case merge the two columns. In any case use your judgment – if you want to keep the two, slightly different, icicles, OK; if you want to merge them, OK. At the end of the day it won’t make much difference.
At some point, after you have created many different icicles, it’s time to check your Notes (with MRCA and/or TG info in them) see if you can determine an ancestral thread running through the icicle. At the top of my spreadsheet, I inserted about 20 rows (one for each icicle column) and added the numbers in a diagonal fashion – like a matrix – with each number in its respective column and on a separate row. After each of these numbers I added a description that I felt best described the icicle – usually an Ahnentafel number with the surnames of the MRCA couple; or the TG ID# if that seemed to be the theme of the icicle – there being no consensus on the MRCA yet.
I am now in the process of rearranging the icicle columns based on information from the 2C and 3C Matches. Done correctly, this will wind up with all the paternal icicles on one side and the maternal icicles on the other side of the 20 columns I started with; and within those two sides will be the split between the grandparents (an objective of the Leeds process which has a similar methodology). Theoretically, by using 4Cs, we should be able to sort out the 16 MRCA couples with an icicle for each – or maybe two or more icicles for each of these 3xG grandparent couples. It depends on how far down the list we are willing to go.
Note that the process of forming these icicles does not depend on genealogy knowledge. I am using this process now for a friend with an orphaned grandfather. It would be the same process for an NPE or adoptee at the grandparent level. I’m winding up with multiple icicles, and with some info on the other 3 grandparents, I’ll determine which icicles are from the target grandfather. Then I’ll combine all the downloaded surnames from those Matches, combine them into one spreadsheet, sort on surnames and then analyze the surnames that look promising. I’ve already found a new surname for one of my own brick walls this way. I’ll review that process in a separate blog post, because it can be applied to any icicle which includes Matches with Trees.
On a side note about this icicle process… On almost every icicle I formed, I got distracted. For most of them I started with a 4C Match for whom I had a pretty firm MRCA and/or TG. As I worked down the Shared Match list, I’d see a Match with a 20-people Tree. I clicked on the Match and often quickly found a clue to follow, and it often led to the same MRCA. And now at AncestryDNA, we can see “Unlinked Trees” – BINGO! That sometimes led to an MRCA too – usually on the same ancestral line. It was like picking the low hanging fruit all over again. I had to force myself back to the boredom of filling in my icicles. It is work! But it appears to me that these icicles, like TGs, will each provide a cluster of Matches which is very helpful.
Extra Credit…. I have over 1,000 Hints – almost all of them with valid MRCAs. However, most of them are also more distant than 4C, some with fairly small shared segments under 10cM. But most of them have some Shared Matches. All of these SMs are all 4C or closer – I’m now going to see how many of these Hint Matches with MRCAs can be linked back to one of my icicles. Maybe these little cM Matches with MRCAs will give me important clues for the icicles. This is because a 6C Match can have 4C Shared Matches, even though the 4C Matches do not show the 6C in their Shared Match list. So it is worth my time to start with the 6C Matches with an MRCA Hint and see which 4C icicles they match.
Summary: Download or make a list of your top (closest 2-3%) Matches and group (or cluster) them through Shared Matches. I found it relatively easy to do this with a spreadsheet. It’s work, but provides interesting insights. The icicles formed are a tool to help us analyze our Matches. Often this lets us impute ancestral lines and/or TGs to other Matches.
Edit 20181030 – a portion of my speadsheet has been added as Figure 1. The Names and Admins were condensed for privacy. The Notes field has many “shorthand” entries that I have entered at AncestryDNA, and they are included in the spreadsheet download.
[19A] Segment-ology: Think Icicles!; by Jim Bartlett 20181030
Jim, What does “NFT” mean in your notes?
No Family Tree
Pingback: Manual Clustering From the Bottom Up | segment-ology
Pingback: Icicles – Part 2 and Match Clustering | segment-ology
Your icicle technique is very similar to Dana Leeds technique, except you’re doing it bottom up and she’s doing it top down. https://www.danaleeds.com/
Louis – As you know, I’ve worked with Triangulated Groups for quite a while. I’ve gotten hundreds of AncestryDNA Matches to upload to GEDmatch (or finding them at GEDmatch, determined their AncestryDNA link). For all of these, I can immediately determine the TG. One of my brick walls is on a TG I call [01S24]. I have long since worked with the Shared Matches of Matches with [01S24]. It’s relatively easy to do with #T[01S24] as the first entry in the Notes field, always visible through the MEDBetterDNA Chrome extension. So I’ve had that in my AncestryDNA Helper and DNAGedCom Client downloads of Matches. I created columns for the TGs, but they were few and far between among many thousands of Matches. When I added the Shared Matches, it gave me a much clearer picture, but then I noticed several TGs with the same Shared Matches. Well, of course, I said to myself – they are probably from the same 3xG or 4xG grandparent. I realized the icicles had to come first and then they could be subdivided into TGs.
The ancestral lines and the TG segments can only fit in our chromosome map one way – like a jigsaw puzzle – there is only one correct configuration.
I became aware of Leeds technique through the GGT&T facebook page. For me, it was focused on the grandparents, and I already know, for sure, the grandparent for over 80% of my DNA (and I can pretty well guess most of the rest) – so that method was not helpful to me. Also, I’m not a “color” person (other than pink and blue for the sides) – it’s too limiting. As soon as I figured out the power of Tringulated Groups in 2012, I’ve been working on macro themes that apply to my whole spreadsheet – I cannot work with enough colors for that. It is important to determine the side for each TG, and then the grandparent and to walk all of our DNA back up our Trees. It turns out the TGs are good building block for that. I’m now learning that icicles, provide an intermediate level tool – between the broad grandparent level and the somewhat more distant Common Ancestor for each TG – a cluster approach.
But in the end, these are just tools – Leeds was rolled out for grandparents, and it appears to work well for that objective. Icicles are a somewhat more ambitious tool, but this technique can be used one at a time or across our whole data set, depending on each person’s objectives. I’m anxious to see where it will go. Jim
LikeLiked by 1 person
Thanks, Nancy – even working on a single icicle can lead to new discoveries. Jim
Thanks so much Jim for another great idea on how to slice and dice the icw records at Ancestry. I struggle to find the time to try this approach, so I’m hoping that a IT savvy user will create a program to do most of the icicle formation work before I retire so I can focus on the analysis.
June, Check the comments for this post. Filling out the whole thing would take a lot of time, but try just one icicle – maybe just 30 minutes. I found myself searching through some of the Shared Matches and finding Common Ancestors that were on the same ancestral line as the 4C I started with – it’s much easier when you have a particular ancestral line in mind, or even a surname. I’d look at the list of surnames for a Shared Match and find the target surname – with the spelling mangled a little bit (enough to preclude a Hint). Icicles provide a focus and a limited group of Matches to work with. And great for brick walls – I’ve already found a new surname!
Thanks Jim, it sounded very labor intensive but based on your suggested approach (one icicle at a time), I can definitely find 30 minutes. I’ve already downloaded the matches and icw Ancestry files.
Hi, I created a website Genetic Affairs which currently performs a default Leeds-like analysis (Leeds-like since I don’t condense the column yet) using 2nd /3rd cousins. However, I am working on a version that works with 4th cousins as well. If you like, you could try it and see how it performs? I can be reached @ email@example.com
EJ – with what I’ve outlined, DON’T try to condense columns (icicles) keep them separate (unless 2 icicles are clearly for the same 3xG or 4xG grandparent). This icicle process if focused on more distant Ancestors, not on the grandparents. Of course when you identify the distant Ancestors for each icicle, the parents, grandparents, great grandparents, etc., will naturally fall into place. Jim
EJ: I immediately thought of your Leeds output when I started reading this. I’ve been using the Leeds chart as a clustering tool since Dana’s first blog post and have been pondering how to leverage my DNAGedcom outputs to create them directly.
Jean – note that this is not precise like Triangulation – the overall icicle should follow an ancestral line, but some of the Matches in the icicle might be random matches to the base Match some other way. If you suspect this, set the suspicious Match as the base and see if most of the Matches are the same as the icicle, or not.
Jean: Thanks for using Genetic Affairs. I am currently testing the custom Leeds. Will update Facebook if we go online with this feature.
I only have 65,150 Ancestry matches.
Caith – this process just uses the top 2-3% – you have plenty of Matches to form icicles. Jim
I would be interested in viewing one of your spreadsheets online – google sheets maybe?
You lost me about 1/4 of the way through your comments. How about a few screen shots to explain visually what you mean??
LikeLiked by 1 person
Kathy, I’ll try to add a sample. Jim