Here is a report on my first Match Clustering effort.
- I used a download of my AncestryDNA Matches above 20cM (I only had a few real 3rd cousins (3C) and below, and I just left them in).
- I have made extensive use of the Notes for as many Matches as I can – all of my almost 1,000 Hints; and maybe 1/4, so far, of all my 4C and closer. NB: AncestryDNA uses 20cM as the threshold for 4C designations, but many Matches in this group are 5C and 6C and I’ve found some who are 7C and 8C, with larger than average shared segments over 20cM.
- For every Match I can, I put the Shorthand CA ID and/or Shorthand TG ID in the Note box for that Match. See the Explanation of Header row below for links that explain these IDs.
- For each Match, I also put a line in each Note which includes a summary of the CA and TG IDs found in all the Match’s Shared Matches (SM). So even a Match with a Private Tree, or No Tree, or scrawny Tree, or can’t-find-anything-in-it large Tree, will get a line summarizing their SMs. This summary often provides a very specific “pointer” to a CA and/or TG. And this added info is very helpful in analyzing Clusters.
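The Shared Match summary line described in the last bullet can be sketched in code. This is a minimal illustration only, assuming the CA and TG IDs have already been pulled out of the Notes into a lookup table. The function name, the match labels (`M1`, `M2`, ...), the placeholder IDs and the output format are all made up for the example; they are not the actual Note format.

```python
from collections import Counter


def summarize_shared_matches(shared_matches, ids_by_match):
    """Build one summary line from the IDs noted for a Match's Shared Matches.

    shared_matches: names of the Shared Matches (SMs) of one Match.
    ids_by_match:   hypothetical lookup of Match name -> list of CA/TG IDs
                    already typed into that Match's Note.
    """
    counts = Counter()
    for sm in shared_matches:
        # SMs with no Note yet simply contribute nothing.
        counts.update(ids_by_match.get(sm, []))
    if not counts:
        return "SM: (no IDs yet)"
    # Most frequent IDs first; these are the strongest "pointers".
    return "SM: " + ", ".join(f"{id_} x{n}" for id_, n in counts.most_common())


# Made-up sample data, for illustration only.
notes = {
    "M1": ["CA 856"],
    "M2": ["CA 856", "TG A"],
    "M3": [],
}
print(summarize_shared_matches(["M1", "M2", "M4"], notes))
```

Even a Match with no usable Tree gets a line this way, because the summary depends only on what is already recorded for its SMs.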
When I ran the Cluster Matrix, I developed this summary report:
Next is a spreadsheet with the 86 Clusters, re-sorted on the CA.
Explanation of Header row:
Cluster – the Cluster # in the Cluster Spreadsheet presented to me.
First & Last – the Match # range included in this Cluster (Matches go from 1 to 3571)
SMs – the number of Shared Matches in each Cluster – a wide range…
CA – the CA ID (an Ahnentafel # – see this blogpost). When various Matches had CAs from different generations, but all on the same line, I used the most distant CA – Walking the Ancestor Back. A few Clusters had multiple CA lines, but I used CAs that Walked Back or were repeated several times.
Gen – as a convenience, I noted the generations back to the CA
TGs – the TG ID (see this blogpost). In all cases (I think) the last two numbers in each TG ID (being the TG grandparent) are in agreement with the CA ID. A number of Clusters have multiple TGs.
NB: The CAs and TGs come from my typed Notes for some Matches (I just haven’t gotten to all 3,571 of them, yet). The Notes are based on valid data – from the Match or GEDmatch (i.e. not guesses by me), but I’m fully aware that some of it is not conclusive; and another, closer and/or different, CA may be found. The TGs should not change, but often a Match will have multiple TGs, and only one would apply to the specific Cluster or CA.
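The CA and Gen columns follow from standard Ahnentafel arithmetic: the base person is 1, the father of person n is 2n, and the mother is 2n+1. Here is a minimal sketch of that arithmetic and of Walking the Ancestor Back. The function names are made up for the example, and each CA ID is treated as a single Ahnentafel number for one individual (not a couple), which is an assumption.

```python
from __future__ import annotations


def generation(ahnentafel: int) -> int:
    """Generations back from the base person (1 -> 0, parents 2-3 -> 1, ...)."""
    if ahnentafel < 1:
        raise ValueError("Ahnentafel numbers start at 1")
    return ahnentafel.bit_length() - 1


def descent_line(ahnentafel: int) -> list[int]:
    """Ahnentafel numbers from this ancestor down to the base person (1)."""
    if ahnentafel < 1:
        raise ValueError("Ahnentafel numbers start at 1")
    line = [ahnentafel]
    while line[-1] > 1:
        line.append(line[-1] // 2)  # the child of ancestor n is n // 2
    return line


def walk_back(cas: list[int]) -> int | None:
    """Return the most distant CA if all CAs lie on one line, else None."""
    most_distant = max(cas)  # the largest number is in the deepest generation
    line = set(descent_line(most_distant))
    return most_distant if all(ca in line for ca in cas) else None


print(generation(856))            # 9
print(descent_line(856))          # [856, 428, 214, 107, 53, 26, 13, 6, 3, 1]
print(walk_back([107, 214, 856])) # 856
print(walk_back([214, 300]))      # None (two different lines)
```

The third and second entries from the end of the descent line (6 and 3 for CA 856) are the grandparent and parent on that line, which is what a TG ID's grandparent should agree with.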
Figure 1. Summary of 86 Clusters
A few notes on this data:
- I am sure that, eventually, the Clusters at the top of this table will be found to link to more distant Ancestors – I just haven’t found them yet.
- I am sure that, eventually, the two Clusters in Gens 10 and 11 will wind up with different, closer CAs – I just haven’t found them yet (there are relatively few Matches in each of these Clusters).
- For the bottom 9 Clusters, I do have TGs, so I can use Matches from other companies (already included in these TGs in my Master Spreadsheet), to find likely (or at least possible) CAs. It’s just that no CAs have been determined yet at AncestryDNA for the Matches in these Clusters.
- In Gen 9, CA 856 is my prolific and well documented HIGGINBOTHAM Ancestor; and I’ve Walked this Ancestor Back in at least two TGs. There are several lines from this Ancestor who intermarried.
- In Gen 8, Cluster 61, over 100 Matches – this was a brick wall at Gen 5, until I found several dozen Matches in Gen 6-8 with CUMMINS/CUMMINGS Ancestry, which I have subsequently researched into one Tree – also a prolific line. And a new branch of my Tree!
- I’m sure there will be unfolding stories about others of these Clusters – I’m excited to see the way this is trending.
[22AC] Segment-ology: Match Cluster Report 1 – by Jim Bartlett 20190214