Icicles – Part 2 and Match Clustering

Let me start by saying this Icicle methodology, here, has not been as useful or accurate as I thought it might be. I don’t want to steer the readers of this blog in the wrong direction.

I’ve used this Icicle method and expanded the number of my columns of icicles a lot. Some of them turn out to be very helpful, with many Matches from the same Ancestral line and/or from the same Triangulated Group (TG). However, many are not so helpful. After all, these Icicles are just In Common With (ICW) lists. ICW lists are found at AncestryDNA (called Shared Matches); at FamilyTreeDNA (called In Common With); at 23andMe (called Shared Relatives); at MyHeritage (called Shared DNA Matches); and at GEDmatch (called People who match both kits).

Just a reminder: to get an ICW list, the program starts with a list of all your Matches (at that company) and compares your list with a list of all the Matches of a “base” Match (which you select) – the ICW list is a list of all Matches which are on both lists. In other words, the comparison is based on Match names, or kit IDs, and nothing more than that. In general, you and your selected base Match will have a shared DNA segment and a Common Ancestor (CA). Your ICW Matches may, or may not, share the same DNA segment and/or the same ancestral line. This provides powerful information when there is such an alignment; but it’s just a list of data – not much help – when there isn’t an alignment.
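As a rough illustration of the point above, an ICW list is nothing more than a set intersection on names or kit IDs. The sketch below assumes hypothetical data: the names, the `matches_of` dictionary and the `icw_list` function are made up for the example and are not any company's actual code.

```python
def icw_list(my_matches, base_match, matches_of):
    """Return the Matches that appear on both my list and the base Match's list."""
    base_matches = matches_of.get(base_match, set())
    # Only names / kit IDs are compared; no segment data is consulted.
    return (set(my_matches) & base_matches) - {base_match}


# Hypothetical example data
my_matches = {"Alice", "Bob", "Carol", "Dave"}
matches_of = {"Alice": {"Me", "Bob", "Dave", "Erin"}}

icw = icw_list(my_matches, "Alice", matches_of)
print(sorted(icw))
```

Nothing in this comparison says whether Bob or Dave share the same segment, or the same ancestral line, as the base Match.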

The good news is that 23andMe and MyHeritage both tell you when there is shared DNA alignment with a Match. 23andMe puts a “Yes” in their Shared Relatives list; and MyHeritage adds a Triangulation “icon” in their Shared DNA Matches list when the ICW Match aligns (or Triangulates). GEDmatch lets you compare two kits, so you can check for a shared DNA segment; and their Tier1 Triangulation tool will list the top Matches which Triangulate. Since FamilyTreeDNA also provides segment data, we can check Matches in an ICW list to see if they are on the same overlapping DNA segment that you and the base Match share. A Match who is both ICW and on the same segment is in alignment over 95% of the time; virtually 100% of the time when there are multiple (say at least 4, not closely related) Matches who meet this ICW AND same segment criteria. See also the DoubleMatchTriangulator*.
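The “ICW AND same segment” check described above can be sketched in a few lines. This is only a minimal sketch under stated assumptions: the `Segment` type, the function names, the `min_overlap` parameter and the sample positions are all hypothetical, and overlap is measured here in base pairs. It tests whether segments overlap by position; it does not compare the Matches to each other the way a full Triangulation would.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: str
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def overlap_length(a, b):
    """Length of the overlap between two segments (0 if none)."""
    if a.chromosome != b.chromosome:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def icw_on_same_segment(base_segment, icw_segments, min_overlap=1):
    """Names of ICW Matches with a segment overlapping the base segment."""
    return sorted(
        name
        for name, segments in icw_segments.items()
        if any(overlap_length(base_segment, s) >= min_overlap for s in segments)
    )


# Hypothetical data: the segment I share with the base Match,
# and the segments I share with each ICW Match.
base = Segment("7", 10_000_000, 30_000_000)
icw_segments = {
    "Bob": [Segment("7", 12_000_000, 25_000_000)],
    "Dave": [Segment("7", 40_000_000, 55_000_000),
             Segment("3", 10_000_000, 30_000_000)],
}

aligned = icw_on_same_segment(base, icw_segments)
print(aligned)
```

In this made-up data only Bob overlaps the base segment; Dave is ICW but on different DNA, so his place in the ICW list says nothing about this particular segment.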

The above segment Triangulations are a much more accurate and reliable way to group (or form Clusters of) Matches. However, this process is not available through AncestryDNA (although segment Triangulation can be accomplished on AncestryDNA kits uploaded to GEDmatch). For AncestryDNA Matches, clustering the Matches into groups is a good way to analyze them, and a step up from working with individual ICW lists.

One method to cluster Matches at AncestryDNA is the Leeds Method* – usually used to form cluster groups at the grandparent level, although some are pushing this a generation or two farther. At those more distant levels, some amount of judgment is needed.

Another method is my Icicle method, here, but this has turned out to be a lot of work, with mixed results: sometimes a good, helpful thread is found; often one is not easily found, or one may not exist at all. There is no “rule” or argument that says an ICW list must have an ancestral thread. It’s logical that one may exist, given that you and the “base” Match have a Common Ancestor, and therefore some of the ICW Matches may have a higher probability of having the same CA. One tactic is to group the Icicles by ancestral lines or TGs, by moving such Icicle columns to be adjacent to each other and noting a common thread. However, it’s probably somewhat easier to use one of the tools below.

Several new methods for automatic Clustering have come out recently. GeneticAffairs* (small fee) now has an AutoClustering tool that puts all of your Matches (above a threshold) into a matrix, noting which are ICW each other, and then grouping them into matrix “boxes”. These “boxes” have a high probability of the same Common Ancestor, because there are multiple Matches in alignment with each other. Depending on the threshold, you might get 8 or 16 or 32 matrix “boxes” – representing 1, 2, or 3xG grandparents. NodeXL* also forms AncestryDNA Match clusters. And DNAGedCom Client* (small fee) has recently added a clustering tool.
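To give a feel for what grouping Matches by their ICW relationships means, here is a deliberately simple sketch. It is NOT the algorithm used by GeneticAffairs, NodeXL or DNAGedCom Client, which I have not seen; it swaps in a plain connected-components grouping, where any two Matches linked by a chain of ICW relationships land in the same group. The `cluster_matches` function and the sample pairs are hypothetical.

```python
def cluster_matches(icw_pairs):
    """Group Matches into clusters: Matches linked by ICW pairs share a cluster."""
    neighbors = {}
    for a, b in icw_pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    clusters, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Walk outward from this Match to everyone reachable through ICW links.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        clusters.append(sorted(group))
    return clusters


# Hypothetical ICW pairs among my Matches (above some threshold)
icw_pairs = [("Bob", "Dave"), ("Dave", "Erin"), ("Frank", "Gina")]

clusters = cluster_matches(icw_pairs)
print(clusters)
```

A grouping this naive would lump everything together as soon as one close cousin is ICW with Matches from several lines, which is one reason the real tools work with thresholds and a matrix rather than simple chains.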

Ideally these matrix “boxes”, or clusters, will group many of your Matches under the correct ancestor. They result in high probability outcomes; however, they are not perfect. Close cousins may be from several of these ancestor clusters and thus cause some confusion in clustering. But usually we know where the close cousins go. Also some Match cousins may share multiple CAs with you.

BOTTOM LINE: The best Clustering technique is Segment Triangulation, which, IMO, basically guarantees a Common Ancestor on a specific segment. I have a total of about 380 TGs that cover all of my DNA. Segment Triangulation is available for all the companies, except AncestryDNA. For AncestryDNA, there are several Clustering techniques, noted above, that can be used to group Matches.

* More about these tools, and others, can be found through: https://isogg.org/wiki/Autosomal_DNA_tools


[19B] Segment-ology: Icicles – Part 2 and Match Clustering by Jim Bartlett 20190107