WTCB Observations and Advice

BLUF: 1. Focus on lower cM Matches in a Cluster to determine the Common Ancestor; 2. Reduce the Cluster upper limit to cull out closer Matches once their ancestral line is imputed to more distant Matches. Known Matches impute to unknown Matches who carry the ancestral line to the next group of Clusters which split only to the parental Ancestors of the previous Matches. Unknown Clusters may well be Bio-Ancestors.

In case you missed one of my many blogposts on Clusters: Clusters form on Ancestors!

This comes from two “facts”: 1. Each of our DNA Matches shares at least one segment of DNA with us that came from a Common Ancestor (CA) – the basic tenant of genealogy DNA testing – the caveat being the segment needs to be Identical By Descent (IBD); i.e. a true segment; and 2. Matches who share the same CA will tend to show up on each other’s Shared Match lists. The inverse is that when a group of Matches show up on each other’s Shared Match lists (i.e. each of your SM lists with them include many of the same Matches), they will almost always share the same CA.

“Clusters form on Ancestors” is a powerful observation – when it happens… And beware the Cinderella slipper – don’t try to force fit a Match into a Cluster if they only share with one or two other Matches – easily seen in SM lists and on the fringes of some Cluster diagrams.

So let’s dive a little deeper. A lot depends on the mix of Matches that are being Clustered. In a perfect world we’d like to Cluster, say, only 3rd cousins (3C) – the resulting 8 Clusters (hopefully) would be 1 Cluster for each Great grandparent. The average for a 3C is 73cM, but a true 3C can range from 7cM to 234cM (per the Shared cM Project 4.0). The point is there is NO cM range that would only include 3C. And it only gets worse with 4C and beyond (and if you follow me – I go way beyond 4C). So: Live with it! We can take some measures to tighten up our Clusters as we Walk The Clusters Back (WTCB).

When we start with an 80 or 90cM lower threshold for a Cluster run, we usually get 4 Clusters, with one for each grandparent. These Clusters tend to follow the rule. But beyond that, with smaller cMs and more distant cousin-Matches, the randomness of atDNA comes into play. We can say the growing numbers of Clusters (as we lower the cM threshold) will tend* to a CA. But I use “tend” because it’s not a guarantee – it cannot be a rock solid rule – “the random DNA didn’t get the memo”.

So, can we have a Cluster using a Cluster run of 60-300cM have 2C, 2C1R, 3C, 3C1R, and 4C Matches in it? Absolutely! They should all be on the same line, but that brings up two important points.

1. Old saying: “Everybody has to be someplace”. The 60-300cM range covers all those cousinships (and more); and in Clustering, every Match “has to be someplace” – it will go into some Cluster! Some of the higher cM Matches (closer cousins) may well have gray-cell links to other Clusters. The way I think about it is that they are “confused” about which Cluster to be in – they are tugged in several directions – but the Cluster algorithm always picks one Cluster. My advice for these Clusters is to focus on the CAs of the smallest cM Matches in each Cluster – usually the most distant Matches – to determine the CA of the Cluster. Hopefully we’ll get a clear consensus (but remember Cinderella’s slipper).  The higher cM Matches in each Cluster often will have gray-cell links to other Clusters – this serves as a QC (Quality Control) check that the several Cluster CAs are related and appropriate. It also confirms these higher cM Matches are closer cousins, descending from all the CAs of gray-cell-linked Clusters.

2. It will also help to reduce the upper cM limit, to cull out some (but probably not all) of the closer cousins as the lower threshold is reduced in the WTCB process. These “closer cousins” have already “done their job” for WTCB. They have helped determine Matches who are one more generation back. In other words, your 2C Matches will help identify your 3C Matches (who have to be from one or the other of the 2C parents). At each Cluster run this information is imputed to the other Matches in their respective Clusters. This is not perfect – there will also be some 2C1R, 3C1R, half 3C, 4C1R in the mix. But it gives you a much better/tighter picture of the CA of each Cluster. These imputed/”tagged” 3C Matches will carry the ancestral thread to the next round of Clusters. Remember, going back in generations, there are only 2 possibilities in the next generation – the father or the mother. Reducing the upper cM threshold will cull out Matches that have already “passed on” their Ancestral line, and will force each new Cluster to group on itself.

The point is to make successive Cluster runs, lowering the thresholds each time to get more Matches, who tend to be a little more distantly related and will divide up into new Clusters which will be a little more distant. In each of these new Clusters there should be a mix of “old” Matches (from previous Clusters, some with known relationships, and some with imputed CAs), and “new” Matches (some with known relationships and CAs, and some unknowns which will be imputed based on analysis of all the Matches in the new Cluster).  

Note 1: Usually, I use CA to note the Common Ancestral Couple between myself and a Match. Clusters tend to form on specific Ancestors. Are they individual Ancestors or the parental couple? I’m not real sure. I will say that I rarely find a Cluster that I can identify solely to a female Ancestor. This makes sense because most of the time the husband/wife couple are together. So I will continue to use the male Ahnentafel number to describe my Cluster CAs.

Note 2: WTCB is a relatively easy process to start, but with each iteration it gets harder – both because of the approximate doubling of Matches at each step but also the inevitably difficult Cluster(s) to sort out (probably a brick wall). In any case you can start manually by just walking down your list of Matches in cM order and coding them (I’d use the Ahnentafel Number) and checking with their respective Shared Match lists. Stop whenever you want. (For me, the first two WTCB iterations were easy (a few hours); and then I worked on one a day for several more…. Your results will vary, depending partly on the amount of “Notes homework” you’ve already done.

Note 3: This is a great tool for bio-Ancestors, Brick Walls, NPEs, etc. Using this WTCB process will identify known Clusters. Some may leave you stumped (a few did for me). One reason you may be stumped is because you have no known/imputed Matches for a new Cluster – just Matches staring back at you with no clue how you are related. The WTCB Cluster comes to a halt. Now’s the time to examine all of the available Trees from the Cluster. If the Cluster Matches have their own CA, you have a BINGO! That’s probably your Ancestor, too. Check any gray-cell links to other Clusters to learn more about where in your Tree this could fit.

[19O] Segment-ology: WTCB Observations and Advice by Jim Bartlett 20231130

2 thoughts on “WTCB Observations and Advice

  1. Jim, I think I’ve asked this question before, but I don’t recall your answer. At what point do you start decreasing the upper threshold? What’s your guidance for how to determine when and how much to decrease the upper limit?

    Thanks!

    Like

    • There is no firm answer. The idea is to let the closest Matches pass their info to other Matches in a Cluster, and then they should drop out – and let the more distant Matches “carry the ball”. I started with 80-300cM, and lowered the lower threshold to 70, then 60. About that time I reduced the upper limit to 100. I looked at the upper cM Matches in the 70cM run, and saw that the Matches over 100cM were not adding anything and they were getting in the way of the smaller Matches pulling the Cluster to a more distant Ancestor. I think it’s practical to lower the upper and lower cM the same – in other words keeping a range of about 40 cM. This is not perfect – the DNA *is* random. If you find a different spread works better for you, please post that comment here (along with an overview of your Ancestry). In my case 3/4 of my Ancestry is from Colonial Virginia and is not endogamous – but several instances of pedigree collapse. My maternal grandmother (1/4 of my Ancestry) had immigrant Ancestors (which account for less than 2% of my Matches… Hope this helps, Jim

      Liked by 1 person

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.