Manual Clustering to Find Ancestors

Below I will outline a process to find a Target Ancestor (TA) – often a Bio-Ancestor, or a brick-wall Ancestor, or maybe to confirm an “iffy” Ancestor. This is a follow-on to Manual Clustering From the Bottom Up. But first, here is a little background.

DNA – We all get exactly 1/2 of our autosomal DNA (atDNA) from each parent, pretty close to 1/4 from each grandparent, 1/8 from each Great grandparent, etc. Yes, beyond parents, these fractions are not exact, but for genealogy they are pretty close. The point is that for several generations going back, we get a lot of DNA from each Ancestor – and roughly the same amount from each one in any generation.
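The halving rule above can be sketched in a few lines of Python. The function names are my own, and the values are expected averages only; beyond parents the real shares vary.

```python
from fractions import Fraction

def expected_share(generations_back: int) -> Fraction:
    """Average share of atDNA expected from one Ancestor n generations back."""
    if generations_back < 1:
        raise ValueError("generations_back must be 1 or more")
    return Fraction(1, 2 ** generations_back)

def ancestors_in_generation(generations_back: int) -> int:
    """Number of Ancestor slots in that generation (2 parents, 4 grandparents, ...)."""
    return 2 ** generations_back

for label, g in [("parent", 1), ("grandparent", 2), ("Great grandparent", 3)]:
    print(f"{label}: about {expected_share(g)} each, {ancestors_in_generation(g)} Ancestors")
```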

Matches – All things being equal, we would get roughly 1/2 of our Matches through each parent; 1/4 through each grandparent; etc. But all things are often not equal:

            1. We tend to get fewer Matches from Ancestors who are recent immigrants (say in the last 4-5 generations). It’s because most test takers are Americans.

            2. We will get fewer Matches from “skinny” families – an Ancestor who had 2 children will have far fewer descendants (and Matches) than an Ancestor who had 15 children.

            3. Endogamy results in many more Matches than usual (think Jewish, Mennonite, Polynesian, etc.)

Each of these factors will unbalance, or skew, the number of Matches we get for each Ancestor.

For the purposes of this process, I’m going to assume that our Matches are generally spread fairly evenly over our bio-Tree. To the extent this isn’t true in your case, this process may be more complex, or it may not even work.

When we are looking for a TA, the concept is that a good chunk of our DNA came from that Ancestor (depending on the number of generations back), and a good chunk of our Matches will be cousins – either to or through that Ancestor.  Although the Ancestor is not known to us, the TA did have 2 parents, 4 grandparents, etc., and those more distant Ancestors may be well known to our Matches. NB: an immigrant Ancestor may throw us a curve ball here.

The process overview is:

1. Group Matches;

2. Find the Common Ancestor (CA) in each group;

3. Build down from the CA to find links between groups (usually, but not always, a marriage);

4. Build down from those couples;

5. Repeat as necessary (usually down to parents or grandparents of the TA);

6. The end-game may involve date and location issues and further DNA target testing to isolate and identify the TA.

More step-by-step details:

Step 1: Group Matches – this is basically Manual Clustering at AncestryDNA. Start with the Match list from 400cM down. How far down depends on the generation of the bio/brick-wall Target Ancestor (TA). You want to go back 2 more generations. So, if the TA is one of 4 grandparents (1C level), you’ll want Matches who are 3C level, say a 50cM lower threshold. You want 16 groups. NB: if you can filter out some Matches – say you know one “side” and can identify those Matches – you can cut the groups to 8. And you may be able to quickly identify 4 of those groups to a known grandparent. This would leave you with the 4 groups that represent the 4 grandparents of the TA.
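The arithmetic behind “go back 2 more generations” can be sketched as follows. This is only an illustration: clustering_plan is a made-up helper name, and the generation count (parent = 1, grandparent = 2) is my own convention for the sketch.

```python
def clustering_plan(ta_generation: int, extra_generations: int = 2) -> dict:
    """Cousin level to list down to, and how many groups to expect.

    ta_generation: 1 for a parent, 2 for a grandparent, and so on.
    """
    depth = ta_generation + extra_generations
    return {
        "cousin_level": depth - 1,  # 1C share grandparents, 3C share 2xG grandparents
        "groups": 2 ** depth,       # one group per Ancestor at that depth
    }

plan = clustering_plan(2)                   # the TA is one of 4 grandparents
one_side_only = plan["groups"] // 2         # the other "side" filtered out
left_for_ta = one_side_only - 4             # 4 groups identified to a known grandparent
print(plan, one_side_only, left_for_ta)
```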

1A. List these Matches in a spreadsheet or write on a piece of paper

1B. Select a Match about 3/4 of the way down the list [avoid starting at the top!]

1C. Open that Match’s Shared Match (SM) list

1D. Put an A next to that Match and each Match on your List who is on the SM list. This forms a Manual Cluster A, which tends to have a Common Ancestor (CA).

1E. Start over at Step 1B, selecting a Match who is not A, and use B.

1F. Repeat as often as necessary, using new letters, until all Matches on your list have at least one letter. NB: The Matches at the top of your list may wind up with multiple letters.

1G. If lower cM Matches have multiple letters, review their SM list – usually one of the letters is a one-of-a-kind and that letter can be deleted. If there is a lot of overlap between two letters, they can be combined, using one letter. Use judgment.

Usually this Step can be done in a few hours.
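For anyone who keeps their list in a file, Steps 1A through 1F can be sketched in Python. This is a minimal sketch, not a finished tool: the Match names and Shared Match lists are made up, manual_cluster is a hypothetical name, and the order in which new seed Matches are picked (from the 3/4 point down, then back up) is my assumption. Step 1G still needs your judgment.

```python
from string import ascii_uppercase

def manual_cluster(matches, shared):
    """Assign cluster letters to Matches from their Shared Match (SM) lists.

    matches: list of (name, cM) tuples
    shared:  dict mapping a Match name to the set of names on its SM list
    Returns a dict mapping each Match name to its list of letters.
    """
    # 1A: list the Matches, largest cM first
    names = [name for name, _ in sorted(matches, key=lambda m: m[1], reverse=True)]
    letters = {name: [] for name in names}

    # 1B: begin about 3/4 of the way down, work to the bottom,
    # then come back up toward the top (this visiting order is an assumption)
    start = (len(names) * 3) // 4
    visit_order = names[start:] + names[:start][::-1]

    labels = iter(ascii_uppercase)  # sketch only: supports up to 26 groups
    for seed in visit_order:
        if letters[seed]:
            continue  # 1E: seed only from a Match with no letter yet
        label = next(labels)
        letters[seed].append(label)
        for other in shared.get(seed, set()):  # 1C and 1D
            if other in letters and other != seed:  # skip SMs not on the list
                letters[other].append(label)
    return letters

# Made-up sample data for illustration
matches = [("Ann", 300), ("Bob", 120), ("Cal", 90), ("Dee", 70),
           ("Eve", 60), ("Fay", 55), ("Gus", 52), ("Hal", 50)]
shared = {"Gus": {"Ann", "Cal", "Eve"}, "Hal": {"Ann", "Bob", "Dee", "Fay"}}
result = manual_cluster(matches, shared)
for name, _ in matches:
    print(name, "".join(result[name]))
```

In this sample the top Match (Ann) ends up with two letters, which is the pattern noted in 1F.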

Step 2: Find CA for each Group – this takes some poking around…

2A. Select a Group, and open any available Trees (including Unlinked Trees)

2B. Type/write next to the Match the closest 10-15 surnames

2C. Repeat for as many Matches in the Group as possible

2D. Look/search among the surnames for common surnames

2E. Open the Match Trees and select Ancestor information with the common surnames

2F. Analyze and record the probable Common Ancestor for the Group [if necessary, look at more SMs for the Group Matches for confirmation of the CA]

2G. Repeat for each Group

2H. Note the place/date-range for each CA [these may be a clue to links between Groups]
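The surname search in 2B through 2D can be sketched as a simple tally. This is a rough sketch under my own assumptions: the surnames are made up, common_surnames is a hypothetical name, and a surname counts as “common” when it shows up in the Trees of at least 2 Matches in the Group.

```python
from collections import Counter

def common_surnames(group_surnames, min_matches=2):
    """Return (surname, count) pairs seen in at least min_matches Trees.

    group_surnames: dict mapping a Match name to the surnames typed from their Tree
    """
    counts = Counter()
    for surnames in group_surnames.values():
        # Count each surname once per Match, ignoring case and stray spaces
        counts.update({s.strip().upper() for s in surnames})
    return [(s, n) for s, n in counts.most_common() if n >= min_matches]

# Made-up surnames for one Group
group_a = {
    "Match 1": ["Alder", "Birch", "Cedar"],
    "Match 2": ["alder", "Dogwood"],
    "Match 3": ["Birch", "ALDER ", "Elm"],
}
top = common_surnames(group_a)
print(top)
```

A tally like this only points at where to look. The Trees and records (2E and 2F) still have to confirm the Common Ancestor.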

This Step will take a little longer, depending on whether you want a quick result, or if you want to document the CA with records for the longer term.

If you recognize some Groups as being from known Ancestors in your Tree, they can be set aside. Ideally you want to end up with 4 Groups who would represent the 4 grandparents of the TA.

Step 3: Build down from the CA to find links between Groups – a genealogy exercise…

3A. Use genealogy tools to list the children, spouses, and grandchildren of the CA

3B. Pay particular attention to dates and places

3C. Sometimes a marriage between Groups will pop right up; but sometimes it takes a process of elimination (dates/places help here). It’s possible the bio-parents were not married; or other scenarios. You’ve narrowed the possibilities down a lot, but sometimes, there just isn’t a record of what really happened.

3D. Repeat for other groups

3E. Once you have linked some Groups (by marriage or by place/date or by ethnicity, etc), this helps link the remaining Groups.

If records exist, this Step may follow relatively easily; if not, follow-on DNA testing may be necessary.

Bottom line: This Step will provide some family lines that are ancestral to the TA. The top DNA Matches have led you to specific CAs. These may, or may not, mesh with information you already had.
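When a marriage between Groups does “pop right up”, the check amounts to finding a descendant of one CA whose spouse is a descendant of another CA. Here is a minimal sketch of that idea. The names are made up, find_links is a hypothetical helper, and exact name matching is a big simplifying assumption; real cases need the dates and places from 3B.

```python
from itertools import combinations

def find_links(descendants):
    """Find couples that join two Groups.

    descendants: dict mapping a Group letter to {person: spouse or None}
    Returns a list of (group1, person, group2, spouse) tuples.
    """
    links = []
    for g1, g2 in combinations(sorted(descendants), 2):
        for person, spouse in descendants[g1].items():
            if spouse is not None and spouse in descendants[g2]:
                links.append((g1, person, g2, spouse))
    return links

# Made-up descendants of the CAs for Groups A and B
descendants = {
    "A": {"John Alder": "Mary Birch", "Sam Alder": None},
    "B": {"Mary Birch": "John Alder", "Ruth Birch": "Tom Cedar"},
}
links = find_links(descendants)
print(links)
```

An empty result does not rule out a link. As noted in 3C, the bio-parents may not have married, and then there is no couple to find.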

Steps 4 & 5 – see Step 3

Step 6: The end-game – this may involve date and location issues and further DNA target testing to isolate and identify the TA. The best solution is that the TA is obvious. However, sometimes the TA is still buried, but you are somewhat closer.

Sidebar: I do this manually in Excel and Word. It is possible to use one of the auto-Cluster programs to group the Matches. However, I prefer “getting to know” the Matches and their Trees, and this process is fairly straightforward. It also lets me see any overlap between groups. I prefer to manually Cluster for a targeted case. I use the auto-Cluster programs when I’m grouping my entire Match list.

In one recent case I did, the marriage between two Groups popped right up – no secrets. The children included five sons, each a potential bio-father. All five were from PA and went into WWII; four came back to PA, and one settled in another state a few blocks from the bio-mother!! We’d never have sorted this out without the process above. And luck is sometimes the key factor.

In another case I’ve worked on for years, I used SMs and records to group many Matches on 8 Great grandparents, and 4 grandparents of the TA. Places and dates all work out, and all the top Matches are in agreement. WATO points to the same place. It now appears the father and mother were not married, and both of them apparently died without any other issue, or any records. It’s frustrating to have basically 100% of the Matches all pointing at the same TA, but without revealing the parents’ names. No luck on this one – just a lot of work. Maybe someday a Newspaper article from the 1880s will shed some light. The DNA can only do so much…

SUMMARY – The process above is my current best practice to squeeze out what I can from Matches and Shared Matches at AncestryDNA. This whole process can be done on notebook paper, in a relatively short time, but I still prefer Excel. Note that the process does not depend on knowing any genealogy of the TA, it relies totally on information from Matches and Shared Matches. Hopefully the TA, the last puzzle piece, “fits”.

[19M] Segment-ology: Manual Clustering to Find Ancestors by Jim Bartlett 20220226

Manual Clustering From the Bottom Up

Clustering DNA Matches results in groups that tend to form on one Ancestor. Clustering is a great tool for grouping our Matches. And, if we can figure out the Ancestor for the Cluster, there is a very high probability that the rest of the DNA Matches in the Cluster will also have the same Ancestor. In this case the Cluster becomes a “pointer” or a “focus” for investigating the rest of the Matches in that Cluster. This is powerful. In several cases, I’ve been able to use this focus to find a Common Ancestor with a Match who had only one parent in their Tree! I knew the who, what, when and where for my search… Of course, it’s always easier with closer Match cousins, but I’ve been dogged, and successful, even when I needed to build the Match’s Tree out a number of generations back.

There are Auto-Clustering tools which I covered here. Several of these have been improved since I wrote that post in April 2019. 

However, for relatively straightforward tasks/issues (like finding a bio-Ancestor, or tackling a specific Ancestor), we can also manually Cluster our Matches. The classic example is the Leeds Method: the closest Matches (90-400cM range), will usually form into 4 groups (Clusters) which align with our 4 grandparents (which Cluster is which grandparent is still a genealogy task). This usually works very well because the 90-400cM Matches tend to be 2C and 3C whose MRCAs should align with the grandparent level.

In the following case, I was looking for two Great grandparents. The subject’s maternal side was known (and was from a different continent). His father was known to be his bio-father, but his paternal bio-grandparents were unknown. At 23andMe he and I share 40cM – and our Y-DNA is the same unusual E-V13. So, I was sure his male line was a BARTLETT. A quick search of his AncestryDNA Matches showed many WV BARTLETT Matches (at least 17 ranging from 30 to 269cM) – and a clear “hot spot” in my fairly extensive BARTLETT Tree. But the “spouse” was not readily apparent, nor were the other surnames that had to populate his father’s ancestry.

I decided to use Manual Clustering. It was easy to “dot” the maternal-side Matches from another continent, leaving only the paternal-side Matches (at AncestryDNA). I decided to list them down to about 50cM – this would include most of the 3C and 4C. Note: 3C Matches would have 2xG grandparent *couples* as the MRCA, which would identify a Great grandparent – I am looking for four Great grandparents, one of whom is probably a BARTLETT. The 4C Matches would potentially take me back another generation – but that’s OK. The surnames I’m looking for must fill the ancestral boxes from 2 grandparents going back.

So, I typed the top paternal-side Matches in Column A of a spreadsheet, and put the cMs in Column B (for reference). All that remained was to pick a Match, put an A in Column C, open the Shared Match list, and add an A in Column C for each Match who was a Shared Match. Then put B in column D for a Match who did not have an A in Column C, open that Shared Match list and add Bs in column D for each Match who was a Shared Match. Continue. I actually blogged about this process (Think Icicles!) in 2018 here, and in 2019 here. But it didn’t work very well. For one thing there was too much overlap.
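The spreadsheet layout described above (Match in Column A, cM in Column B, then one column per letter) can be sketched as CSV output. The helper names and the sample Matches are made up for illustration.

```python
import csv
import io

def to_rows(matches, letters):
    """Build spreadsheet rows: Match, cM, then one column per cluster letter."""
    all_letters = sorted({l for ls in letters.values() for l in ls})
    rows = [["Match", "cM"] + all_letters]
    for name, cm in sorted(matches, key=lambda m: m[1], reverse=True):
        marks = [l if l in letters.get(name, []) else "" for l in all_letters]
        rows.append([name, cm] + marks)
    return rows

def to_csv(rows):
    """Render the rows as CSV text that Excel can open."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

# Made-up sample: two Matches, two letters
sample_rows = to_rows([("Match 1", 120), ("Match 2", 60)],
                      {"Match 1": ["A", "B"], "Match 2": ["A"]})
print(to_csv(sample_rows))
```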

As I thought about this process again, it struck me that instead of icicles, I should have used a stalactite analogy. Stalactites hanging from a cave ceiling might have given me a clue to the problem. The cave ceiling was more like a very close relative whose DNA was spread over many different lines (different stalactites). I should not have started with the top of the list of Matches. The top of the list are the Matches with multiple segments which can represent multiple stalactites. Those large Matches have an affinity for several Clusters (but can only be placed in one of them). In a Cluster Matrix, they would have a lot of gray cells. Maybe the trick is to start closer to the bottom of the list and work up…  It’s more like working with stalagmites, where there is only one for each source of dripping water.

It actually worked much better!

I started with a 60cM Match, and typed an A (in Column C) for that Match and all the Shared Matches. Then I selected the next Match down the list without an A, and typed a B (in column D) for that Match and all the Shared Matches. By the time I got to the bottom of the list, I had 7 groups (A through G) with only a couple of overlaps. They looked like Icicles or Stalagmites… The overlaps were clearly one-time events which could be ignored. I then worked my way back up the list – starting with the 61cM Match. Most of them clearly fell into only one of the 7 groups, while some of the larger Matches began having Shared Matches in two or three groups. This almost always means that these Matches are 2C or 3C who will span multiple groups (an important clue for those Matches).
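The second pass, working back up the list, can be sketched as a tally of which existing groups a larger Match’s Shared Matches fall into. This is a sketch under assumptions: the names are made up, groups_for_match is a hypothetical helper, and the min_hits=2 cutoff is my stand-in for ignoring one-time overlaps.

```python
from collections import Counter

def groups_for_match(shared_matches, group_of, min_hits=2):
    """Return the group letters a Match belongs with, based on its Shared Matches.

    shared_matches: set of names on the Match's Shared Match list
    group_of:       dict mapping already-grouped Match names to their letter
    min_hits:       Shared Matches needed in a group before it counts
    """
    tally = Counter(group_of[n] for n in shared_matches if n in group_of)
    return sorted(letter for letter, hits in tally.items() if hits >= min_hits)

# Made-up groups from the first pass
group_of = {"M1": "A", "M2": "A", "M3": "B", "M4": "B", "M5": "C"}

small_match = groups_for_match({"M1", "M2", "M5"}, group_of)
large_match = groups_for_match({"M1", "M2", "M3", "M4", "M5", "X"}, group_of)
print(small_match, large_match)
```

A Match that comes back with two or three letters is a candidate 2C or 3C spanning those groups.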

The next step was to look at all the available Trees and type the closest, say, 8 surnames in the row for the respective Matches. I then used a Word document (scratch paper would also work) to outline Trees for duplicate surnames. I was looking for the Ancestors of a man born c1900 in Harrison Co, WV. The Trees of the 50cM-and-above Matches tended to be from the same area, and ranged through the 1800s as expected. I outlined families for 8 surnames. Most of them interconnected, and I was able to go back to the groups, and, knowing what I was looking for, I teased out a number of additional Trees that linked. In this process, I also found the intermarriages between the groups. As it turns out this case had an extra degree of difficulty. All of this data and the Trees pointed to a man who never married and a woman who never married. Quite possibly the bio-father never knew…


Manual Clustering of the top Matches is a relatively simple task. In this case, it involved about 65 Matches, ranging from 50cM to 269cM. KEY: Working from 60cM down – grouping Shared Matches by letters in a spreadsheet – resulted in 7 groups. Then, working from 61cM up, it was pretty easy to add those Matches to the extant groups (a few to multiple groups). It didn’t take long to open the available Trees and note the closest surnames. Duplicate surnames in a group led to skeleton family outlines for most of the groups. This then provided a “pointer” to relook at, and extend, small Trees and Unlinked Trees, to build out the outlines some more. By that point it was clear which families had married the other families (the closest Matches were a big help here). And so the Ancestry was built. One Quality Control check is to search for other Matches who have these Ancestor surnames in their Trees – particularly finding the MRCAs with Matches below 50cM. Once you know, or even think you know, the Ancestors, the Matches should also have those ancestral lines.

[19L] Segment-ology: Manual Clustering From the Bottom Up by Jim Bartlett 20220215