Clustering Programs

A Segment-ology TIDBIT

A number of folks have asked me about the different Clustering Programs, so I thought I’d post some information to get you started.

Clustering analyzes your InCommonWith (ICW) Matches at a testing company and groups together the Matches who are most often ICW with each other. Each Match in a Cluster will be ICW with most (but usually not all) of the other Matches in that Cluster. Clusters of 4 or more Matches tend to form around a specific Ancestor, which implies that the same Ancestor is shared by every Match in the Cluster. NB: this is not a guarantee, but it appears to work almost all the time.
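To make the idea concrete, here is a minimal sketch of ICW clustering. It is a hypothetical simplification, not the algorithm used by any of the programs listed below. It assumes your ICW data is a dictionary mapping each Match name to the set of Match names they are ICW with, and it places a Match in an existing Cluster only if that Match is ICW with at least `min_share` (an assumed parameter, default one half) of the Cluster's members. This mirrors the "ICW with most, but not all" behavior described above.

```python
from typing import Dict, List, Set


def cluster_matches(icw: Dict[str, Set[str]], min_share: float = 0.5) -> List[List[str]]:
    """Greedy ICW clustering (hypothetical simplification for illustration)."""
    # Make the ICW relationship symmetric: if A lists B, treat B as ICW with A.
    graph: Dict[str, Set[str]] = {m: set(s) for m, s in icw.items()}
    for match, shared in icw.items():
        for other in shared:
            graph.setdefault(other, set()).add(match)

    clusters: List[List[str]] = []
    # Visit the best-connected Matches first so they seed the Clusters.
    for match in sorted(graph, key=lambda m: (-len(graph[m]), m)):
        best = None
        best_frac = 0.0
        for cluster in clusters:
            # Fraction of this Cluster's members that are ICW with the Match.
            frac = sum(1 for c in cluster if c in graph[match]) / len(cluster)
            if frac > best_frac:
                best, best_frac = cluster, frac
        if best is not None and best_frac >= min_share:
            best.append(match)
        else:
            clusters.append([match])  # Start a new Cluster.
    return clusters


icw = {
    "Ann": {"Bob", "Cal"},
    "Bob": {"Ann", "Cal"},
    "Cal": {"Ann", "Bob"},
    "Dee": {"Eve"},
    "Eve": {"Dee"},
}
print(cluster_matches(icw))  # [['Ann', 'Bob', 'Cal'], ['Dee', 'Eve']]
```

Real tools use more sophisticated methods (and much larger data sets), so treat this only as a picture of what "grouping Matches who are ICW each other" means.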

Clustering Programs:

Leeds Method by Dana Leeds (free)

https://www.danaleeds.com/ (see the Video and updated methods)

This began as a color coding method of grouping close Matches at AncestryDNA into four columns, one for each grandparent. It has been expanded.
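The original color-coding idea can be sketched in a few lines. This is a hedged illustration, not Dana Leeds' own tool. It assumes Matches are given as (name, shared cM) pairs, that ICW data is a dictionary of name to set of names, and that "close Matches" means roughly 90 to 400 cM (an assumed window; adjust to taste). Working down the list from the largest Match, each uncolored Match starts a new color, and every close Match ICW with it receives that color. Ideally this yields about four colors, one per grandparent, and a Match may end up with more than one color.

```python
from typing import Dict, List, Set, Tuple


def leeds_columns(matches: List[Tuple[str, float]], icw: Dict[str, Set[str]],
                  low_cm: float = 90.0, high_cm: float = 400.0) -> Dict[str, List[int]]:
    """Assign Leeds-style color numbers to close Matches (illustrative sketch)."""
    # Keep only Matches inside the assumed cM window, largest first.
    close = [name for name, cm in sorted(matches, key=lambda x: -x[1])
             if low_cm <= cm <= high_cm]
    colors: Dict[str, List[int]] = {name: [] for name in close}
    next_color = 0
    for name in close:
        if colors[name]:
            continue  # Already colored via an earlier Match's ICW list.
        next_color += 1
        colors[name].append(next_color)
        # Give the new color to every close Match that is ICW with this one.
        for other in icw.get(name, set()):
            if other in colors and next_color not in colors[other]:
                colors[other].append(next_color)
    return colors


matches = [("A", 300.0), ("B", 200.0), ("C", 150.0), ("D", 120.0), ("E", 50.0)]
icw = {"A": {"B"}, "B": {"A"}, "C": {"D"}, "D": {"C"}}
print(leeds_columns(matches, icw))  # {'A': [1], 'B': [1], 'C': [2], 'D': [2]}
```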

Genetic Affairs by Evert-Jon “EJ” Blom (several spreadsheets free, then a small fee)

http://www.geneticaffairs.com/ Register first, then log in

– automates the retrieval of new genetic Matches from 23andMe, FTDNA, and AncestryDNA and sends them to you in a periodic email; the AutoCluster tool clusters your close (large cM) Matches

DNAGedcom Client by Rob Worthen ($5/mo fee; $50/yr)

Register here to start: https://www.dnagedcom.com/

– log onto your DNA testing company and download your Match and ICW files

– use Collins' Leeds Method 3D to run a cluster report

Shared Clustering by Jonathan Brecher (free)

https://github.com/jonathanbrecher/sharedclustering/wiki/Quick-start

– installs the program on your computer

– currently, you need to download your Match and ICW files with DNAGedcom Client

MyHeritage – offers a free report by Genetic Affairs!

GEDmatch – will soon offer a Genetic Affairs type report under Tier 1 ($10/mo fee)!

My recommendations include:

– Start with a high threshold (80cM to 200cM) to get the hang of it. This will include only your closest cousins.

– If offered, use an upper threshold of 1000cM or so to cull out parents, siblings, children, and aunts/uncles; they only appear in one Cluster anyway and, in most cases, don't add much value.

– Reducing the threshold will increase the number of Clusters, and those Clusters will tend to form on more distant Ancestors.
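The threshold recommendations above amount to a simple filter on shared cM before clustering. Here is a minimal sketch, assuming Matches are available as (name, shared cM) pairs, for example from a downloaded Match file; the function name and the sample values are hypothetical. Both thresholds are treated as inclusive.

```python
from typing import List, Tuple


def select_matches(matches: List[Tuple[str, float]], lower_cm: float,
                   upper_cm: float = 1000.0) -> List[str]:
    """Return names of Matches whose shared cM is within [lower_cm, upper_cm]."""
    return [name for name, cm in matches if lower_cm <= cm <= upper_cm]


# Hypothetical Match list: (name, shared cM).
matches = [("Parent", 3400.0), ("Aunt", 1700.0), ("Cousin1", 850.0),
           ("Cousin2", 120.0), ("Cousin3", 45.0)]

print(select_matches(matches, 80))  # ['Cousin1', 'Cousin2']
print(select_matches(matches, 40))  # ['Cousin1', 'Cousin2', 'Cousin3']
```

The upper threshold drops the parent and aunt, and lowering the lower threshold from 80cM to 40cM pulls in the more distant cousin, which is why lower thresholds produce more Clusters built on more distant Ancestors.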

NB: Some additional Clustering Programs and ideas may show up in the comments below. I've used all of the programs above. I have also continued to do DIY Clustering, as outlined in a different Segment-ology blog post.

[22AF] Segment-ology: Clustering Programs TIDBIT by Jim Bartlett 20190404

7 thoughts on “Clustering Programs”

  1. Once I had my FTDNA Match and ICW files downloaded by DNAGedcom, I used Shared Clustering to cluster all of my 5,731 Family Finder Matches into 352 Clusters in less than 60 seconds. I found a very, very high concordance with my 370 Triangulated Groups.


  2. Jim, you didn't mention RootsFinder, a dynamite program that combines full-function genealogy and excellent DNA features with not one but several clustering features. In fact, it uses the triangulation reports from GEDmatch for all kinds of segment analysis and allows analysis of the DNA results from all the major DNA companies via DNAGedcom Client or other means. It recently partnered with (or was acquired by?) FindMyPast and will replace the dated Mocavo trees that FMP used. It also draws on the user's tree to evaluate both DNA and genealogy. Take a look!

    Doris


    • Kay – they are two different things. Clustering groups your Matches by Ancestors. ThruLines shows you the Common Ancestor of individual Matches, sort of like Hints and Circles, only much better. With a Hint, you and your Match must both have a full genealogy path to a Common Ancestor. ThruLines also looks at Private Trees (which you cannot see) and connects the dots. It goes even farther and “borrows” information from other Trees to make connections to a Common Ancestor; AncestryDNA can quickly review millions of Trees to find a link, which would take you forever to do. Using computer algorithms is not perfect, but I'm finding the error rate to be well below 5%, only a very few. And I'm finding a lot of 3C, 4C, and 5C that I would never have found otherwise.


  3. I attempted to install the Shared Clustering tool, and my Avast Anti-Virus software flagged setup.exe as a potential virus; I've notified the developer.

