Clustering DNA Matches results in groups that tend to form on one Ancestor. Clustering is a great tool for grouping our Matches. And, if we can figure out the Ancestor for the Cluster, there is a very high probability that the rest of the DNA Matches in the Cluster will also have the same Ancestor. In this case the Cluster becomes a “pointer” or a “focus” for investigating the rest of the Matches in that Cluster. This is powerful. In several cases, I’ve been able to use this focus to find a Common Ancestor with a Match who had only one parent in their Tree! I knew the who, what, when and where for my search… Of course, it’s always easier with closer Match cousins, but I’ve been dogged, and successful, even when I needed to build the Match’s Tree out a number of generations back.
There are Auto-Clustering tools which I covered here. Several of these have been improved since I wrote that post in April 2019.
However, for relatively straightforward tasks/issues (like finding a bio-Ancestor, or tackling a specific Ancestor), we can also manually Cluster our Matches. The classic example is the Leeds Method: the closest Matches (90-400cM range) will usually form into 4 groups (Clusters) which align with our 4 grandparents (working out which Cluster is which grandparent is still a genealogy task). This usually works very well because the 90-400cM Matches tend to be 2C and 3C whose MRCAs should align with the grandparent level.
In the following case, I was looking for two Great grandparents. The subject’s maternal side was known (and was from a different continent). His father was known to be his bio-father, but his paternal bio-grandparents were unknown. At 23andMe I share 40cM – and our Y-DNA is the same unusual E-V13. So, I was sure his male line was a BARTLETT. A quick search of his AncestryDNA Matches showed many WV BARTLETT Matches (at least 17 ranging from 30 to 269cM) – and a clear “hot spot” in my fairly extensive BARTLETT Tree. But the “spouse” was not readily apparent, nor were the other surnames that had to populate his father’s ancestry.
I decided to use Manual Clustering. It was easy to “dot” the maternal-side Matches from another continent, leaving only the paternal-side Matches (at AncestryDNA). I decided to list them down to about 50cM – this would include most of the 3C and 4C. Note: 3C Matches would have 2xG grandparent *couples* as the MRCA, which would identify a Great grandparent – I am looking for four Great grandparents, one of whom is probably a BARTLETT. The 4C Matches would potentially take me back another generation – but that’s OK. The surnames I’m looking for must fill the ancestral boxes from 2 grandparents going back.
So, I typed the top paternal-side Matches in Column A of a spreadsheet, and put the cMs in Column B (for reference). All that remained was to pick a Match, put an A in Column C, open the Shared Match list, and add an A in Column C for each Match who was a Shared Match. Then put B in column D for a Match who did not have an A in Column C, open that Shared Match list and add Bs in column D for each Match who was a Shared Match. Continue. I actually blogged about this process (Think Icicles!) in 2018 here, and in 2019 here. But it didn’t work very well. For one thing there was too much overlap.
As I thought about this process again, it struck me that instead of icicles, I should have used a stalactite analogy. Stalactites hanging from a cave ceiling might have given me a clue to the problem. The cave ceiling was more like a very close relative whose DNA was spread over many different lines (different stalactites). I should not have started with the top of the list of Matches. The top of the list are the Matches with multiple segments which can represent multiple stalactites. Those large Matches have an affinity for several Clusters (but can only be placed in one of them). In a Cluster Matrix, they would have a lot of gray cells. Maybe the trick is to start closer to the bottom of the list and work up… It’s more like working with stalagmites, where there is only one for each source of dripping water.
It actually worked much better!
I started with a 60cM Match, and typed an A (in Column C) for that Match and all the Shared Matches. Then I selected the next Match down the list without an A, and typed a B (in column D) for that Match and all the Shared Matches. By the time I got to the bottom of the list, I had 7 groups (A through G) with only a couple of overlaps. They looked like Icicles or Stalagmites… The overlaps were clearly one-time events which could be ignored. I then worked my way back up the list – starting with the 61cM Match. Most of them clearly fell into only one of the 7 groups, while some of the larger Matches began having Shared Matches in two or three groups. This almost always means that these Matches are 2C or 3C who will span multiple groups (an important clue for those Matches).
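For readers who would rather script the lettering than type it into a spreadsheet, the two passes described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `cluster_bottom_up`, the `matches` and `shared` dictionaries, and the rule that larger Matches inherit letters only from Shared Matches at or below the starting cM are all hypothetical choices, not an export format or feature of any testing company.

```python
from string import ascii_uppercase


def group_label(index):
    """Return A, B, C... for the first 26 groups, then G27, G28... (hypothetical naming)."""
    return ascii_uppercase[index] if index < 26 else f"G{index + 1}"


def cluster_bottom_up(matches, shared, start_cm=60):
    """Letter Matches from start_cm down, then fit larger Matches into those groups.

    matches: dict of Match name -> shared cM (assumed input shape)
    shared:  dict of Match name -> set of Shared Match names (assumed input shape)
    Returns a dict of Match name -> sorted list of group letters.
    """
    labels = {name: [] for name in matches}

    # Pass 1: start at start_cm and work DOWN the list.
    lower = sorted((n for n in matches if matches[n] <= start_cm),
                   key=lambda n: -matches[n])
    group_count = 0
    for name in lower:
        if labels[name]:
            continue  # already lettered as someone's Shared Match
        letter = group_label(group_count)
        group_count += 1
        labels[name].append(letter)
        for other in shared.get(name, ()):
            # Only letter Shared Matches that are in this pass (at or below start_cm).
            if other in matches and matches[other] <= start_cm \
                    and letter not in labels[other]:
                labels[other].append(letter)

    # Pass 2: work back UP the list from just above start_cm.
    upper = sorted((n for n in matches if matches[n] > start_cm),
                   key=lambda n: matches[n])
    for name in upper:
        found = set()
        for other in shared.get(name, ()):
            if other in matches and matches[other] <= start_cm:
                found.update(labels[other])
        labels[name] = sorted(found)  # several letters suggest a closer cousin

    return labels
```

A Match that comes back with two or three letters is the scripted equivalent of a large Match whose Shared Matches fall in several groups. A Match with more than one letter from the first pass is an overlap that still needs a human look.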
The next step was to look at all the available Trees and type the closest, say, 8 surnames in the row for the respective Matches. I then used a Word document (scratch paper would also work) to outline Trees for duplicate surnames. I was looking for the Ancestors of a man born c1900 in Harrison Co, WV. The Trees of the 50cM-and-above Matches tended to be from the same area, and ranged through the 1800s as expected. I outlined families for 8 surnames. Most of them interconnected, and I was able to go back to the groups, and, knowing what I was looking for, I teased out a number of additional Trees that linked. In this process, I also found the intermarriages between the groups. As it turned out, this case had an extra degree of difficulty. All of this data and the Trees pointed to a man who never married and a woman who never married. Quite possibly the bio-father never knew…
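Spotting the duplicate surnames within a group is the kind of tally that can also be scripted. The sketch below is a minimal illustration, assuming the group letters and the surnames typed for each Match are available as two dictionaries. The function name `duplicate_surnames` and both input shapes are hypothetical, and the upper-casing is just one simple way to treat "Bartlett" and "BARTLETT" as the same name.

```python
from collections import Counter, defaultdict


def duplicate_surnames(groups, surnames, min_count=2):
    """List surnames that appear in the Trees of at least min_count Matches per group.

    groups:   dict of Match name -> list of group letters (assumed input shape)
    surnames: dict of Match name -> list of surnames from that Match's Tree
    Returns a dict of group letter -> sorted list of repeated surnames.
    """
    counts = defaultdict(Counter)
    for match, letters in groups.items():
        # Count each surname once per Match, ignoring case and stray spaces.
        unique = {s.strip().upper() for s in surnames.get(match, []) if s.strip()}
        for letter in letters:
            counts[letter].update(unique)

    return {
        letter: sorted(name for name, n in tally.items() if n >= min_count)
        for letter, tally in counts.items()
    }
```

The repeated surnames in each group are only candidates for a skeleton family outline. Deciding whether two Trees really describe the same family is still a genealogy task.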
Manual Clustering of the top Matches is a relatively simple task. In this case, it involved about 65 Matches, ranging from 50cM to 269cM. KEY: Working from 60cM down – grouping Shared Matches by letters in a spreadsheet – resulted in 7 groups. Then, working from 61cM up, it was pretty easy to add those Matches to the extant groups (a few to multiple groups). It didn’t take long to open the available Trees and note the closest surnames. Duplicate surnames in a group led to skeleton family outlines for most of the groups. This then provided a “pointer” to relook at, and extend, small Trees and Unlisted Trees, to build out the outlines some more. By that point it was clear which families had married the other families (the closest Matches were a big help here). And so the Ancestry was built. One Quality Control check is to search for other Matches who have these Ancestor surnames in their Trees – particularly finding the MRCAs with Matches below 50cM. Once you know, or even think you know, the Ancestors, the Matches should also have those ancestral lines.
[19L] Segment-ology: Manual Clustering From the Bottom Up by Jim Bartlett 20220215