Clustering Programs

A Segment-ology TIDBIT

A number of folks have asked me about the different Clustering Programs, so I thought I’d post some information to get you started.

Clustering analyzes your InCommonWith (ICW) Matches at a DNA testing company and groups together the Matches who are most often ICW with each other. Each Match in a Cluster will be ICW with most (but usually not all) of the other Matches in the Cluster. Clusters of 4 or more Matches tend to form around a specific Ancestor, which imputes that same Ancestor to every Match in the Cluster. NB: this is not a guarantee, but it appears to work almost all the time.
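To make the grouping idea concrete, here is a minimal Python sketch. It assumes you already have each Match's ICW list as a set of names, and it uses a simple stand-in technique (Jaccard overlap of ICW lists plus single-linkage grouping with a hypothetical min_similarity cutoff). It is not the algorithm used by any of the programs listed below, only an illustration of "Matches who are ICW each other end up together."

```python
from itertools import combinations

def jaccard(a, b):
    """Fraction of two combined ICW lists that the lists have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_icw(icw, min_similarity=0.5):
    """Group Matches whose ICW lists overlap by at least min_similarity.

    icw: dict mapping each Match name to the set of Matches it is ICW with.
    Single-linkage grouping via union-find: two Matches share a Cluster if a
    chain of sufficiently similar pairs connects them.
    """
    parent = {m: m for m in icw}

    def find(m):
        # Follow parent links to the group's representative (with path halving)
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in combinations(icw, 2):
        # Count each Match as a member of its own shared-Match group
        if jaccard(icw[a] | {a}, icw[b] | {b}) >= min_similarity:
            parent[find(a)] = find(b)

    groups = {}
    for m in icw:
        groups.setdefault(find(m), set()).add(m)
    # Largest Clusters first
    return sorted(groups.values(), key=len, reverse=True)

# Hypothetical ICW lists: A, B, C share each other; D and E share each other
icw = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
       "D": {"E"}, "E": {"D"}, "F": set()}
print(cluster_icw(icw))
```

Raising min_similarity makes Clusters tighter (every Match must share more of the same Matches); lowering it lets Clusters merge.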

Clustering Programs:

Leeds Method by Dana Leeds (free)

https://www.danaleeds.com/ see the Video and updated methods

This began as a color coding method of grouping close Matches at AncestryDNA into four columns, one for each grandparent. It has been expanded.
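A rough Python sketch of the column-coloring idea as I read it: walk down the Match list from the largest cM, give the first uncolored Match a new color (column), and put that Match's shared Matches in the same column. The cM window you pre-filter with and the data shapes here are assumptions for illustration; see Dana's site for the actual method.

```python
def leeds_columns(matches, icw):
    """Rough sketch of Leeds-style color coding.

    matches: (name, shared_cM) pairs, already limited to your chosen cM window.
    icw: dict mapping each name to the set of Matches it shares with you.
    Returns {column_number: [names]}; a Match may land in more than one column.
    """
    ordered = sorted(matches, key=lambda m: m[1], reverse=True)
    columns = {}
    colored = set()
    for name, _cm in ordered:
        if name in colored:
            continue  # already has a color; move down the list
        shared = icw.get(name, set())
        # New color: this Match plus its shared Matches from the same list
        members = [name] + [other for other, _ in ordered
                            if other in shared and other != name]
        columns[len(columns) + 1] = members
        colored.update(members)
    return columns

# Hypothetical Matches and shared-Match lists
matches = [("A", 300), ("B", 250), ("C", 200), ("D", 150)]
icw = {"A": {"C"}, "B": {"D"}, "C": {"A"}, "D": {"B"}}
print(leeds_columns(matches, icw))  # {1: ['A', 'C'], 2: ['B', 'D']}
```

With close Matches, you would hope to see about four columns, one per grandparent.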

Genetic Affairs by Evert-Jon “EJ” Blom (several spreadsheets free, then a small fee)

http://www.geneticaffairs.com/ Register first, then log in

– automates the retrieval of new genetic Matches from 23andMe, FTDNA and AncestryDNA and reports them in a periodic email; the AutoCluster tool clusters your close/large Matches

DNAGedcom Client by Rob Worthen ($5/mo fee; $50/yr)

Register here to start: https://www.dnagedcom.com/

– log onto your DNA company, and download Match and ICW files

– use Collins’ Leeds Method 3D to run a cluster report

Shared Clustering by Jonathan Brecher (free)

https://github.com/jonathanbrecher/sharedclustering/wiki/Quick-start

– installs program on your computer

– currently requires the Match and ICW files downloaded with DNAGedcom Client
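If you want to look at the downloaded Match and ICW files yourself, a minimal Python sketch for loading them is below. The column names (match_name, shared_cm, icw_name) and file names are hypothetical: the real DNAGedcom Client exports use different headers depending on the testing company, so adjust them to your files.

```python
import csv

# Hypothetical column names: real DNAGedcom Client exports use different
# headers depending on the testing company, so adjust these to match your files.
NAME_COL, CM_COL, ICW_COL = "match_name", "shared_cm", "icw_name"

def read_rows(path):
    """Read a CSV export into a list of dicts keyed by header name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def load_matches(rows):
    """Map each Match name to total shared cM."""
    return {r[NAME_COL]: float(r[CM_COL]) for r in rows}

def load_icw(rows, keep):
    """Build a symmetric ICW lookup limited to the Matches in `keep`."""
    icw = {name: set() for name in keep}
    for r in rows:
        a, b = r[NAME_COL], r[ICW_COL]
        if a in icw and b in icw and a != b:
            icw[a].add(b)
            icw[b].add(a)
    return icw

# In practice: load_matches(read_rows("matches.csv")) and
# load_icw(read_rows("icw.csv"), matches), with your own (hypothetical) file names.
matches = load_matches([{NAME_COL: "A", CM_COL: "120.5"}, {NAME_COL: "B", CM_COL: "95"}])
icw = load_icw([{NAME_COL: "A", ICW_COL: "B"}, {NAME_COL: "A", ICW_COL: "Z"}], matches)
print(matches)  # {'A': 120.5, 'B': 95.0}
print(icw)      # {'A': {'B'}, 'B': {'A'}}
```

ICW pairs that point outside your Match list (like "Z" above) are dropped, so the lookup only covers Matches you are actually clustering.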

MyHeritage – offers a free report by Genetic Affairs!

GEDmatch – will soon offer a Genetic Affairs-type report under Tier 1 ($10/mo fee)!

My recommendations include:

– Start with a high lower threshold (80cM to 200cM) to get the hang of it. This will include only your closest cousins.

– If offered, use an upper threshold of about 1000cM to cull out parents, siblings, children and aunts/uncles – they only appear in one Cluster anyway and don’t add much value in most cases.

– Reducing the threshold will increase the number of Clusters, and those Clusters will tend to form on more distant Ancestors.
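The threshold recommendations above amount to a simple cM window filter. A minimal Python sketch, assuming a hypothetical list of (name, shared cM) pairs; the Match names and cM values are made up for illustration:

```python
def in_window(matches, lower_cm, upper_cm=None):
    """Keep Matches whose shared cM falls inside [lower_cm, upper_cm].

    matches: list of (name, shared_cM) pairs. upper_cm=None means no upper cap.
    """
    return [
        (name, cm)
        for name, cm in matches
        if cm >= lower_cm and (upper_cm is None or cm <= upper_cm)
    ]

# Hypothetical Match list, for illustration only
matches = [("M1", 3450), ("M2", 2600), ("M3", 850), ("M4", 230),
           ("M5", 95), ("M6", 45), ("M7", 25)]

print(in_window(matches, 80, 1000))       # [('M3', 850), ('M4', 230), ('M5', 95)]
print(len(in_window(matches, 20, 1000)))  # 5
```

Lowering lower_cm from 80 to 20 lets more distant cousins into the window, which is why more (and more distant) Clusters appear.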

NB: Some additional Clustering Programs and ideas may show up in the comments below. I’ve used all of the programs above. I have also continued to do DIY Clustering, outlined in a different Segment-ology blog post.

[22AF] Segment-ology: Clustering Programs TIDBIT by Jim Bartlett 20190404

15 thoughts on “Clustering Programs”

  1. Pingback: Walking the Clusters Back (WTCB) 2022 | segment-ology

  2. Pingback: Manual Clustering From the Bottom Up | segment-ology

  3. Pingback: Breaking Down Brick Walls | segment-ology

  4. Pingback: Walking The Clusters Back | segment-ology

  5. Pingback: Shared Clustering – A Great Tool! | segment-ology

  6. Pingback: Grouping Matches – Try It! | segment-ology

  7. Jim, I help Asian children (now adults) born to U.S. servicemen and local women in Korea, Vietnam and the Philippines. About half the time it comes down to sending them a DNA test because of a lack of documentation and faded memories. Sometimes we’re lucky and they have a 1st cousin match. Other times they only have 3rd cousin matches and beyond, which makes pinpointing the father fairly difficult, if not impossible, mostly due to my lack of familiarity with clustering programs. It sounds like I’m doing what you’re doing but in reverse, which is analogous to forensic genealogy in solving crimes. Do you think GEDmatch would be most useful to me, perhaps by flipping the threshold parameters, or something else? Thanks, Rex


    • Rex,
      I’ve been a big fan, and proponent, of Triangulation – it locks segments into TGs that come to you down one Ancestral line, and it’s finite. I have about 370 TGs, which form a map of sorts of all my DNA and organize ALL my Match/segments into one of those TGs. However, TGs come at a price in time: it takes time to keep up with the influx of new Match/segments. The payoff is smaller groups with one Common Ancestor. I’m now a fan of Clustering – there is a very high correlation between Clusters and TGs, generally no more than two TGs per Cluster or 2 Clusters per TG. Now I start with a clustering Matrix and use VLOOKUP to transfer the Cluster numbers to my spreadsheet; the Triangulation is then much easier. Remember, each segment of your DNA comes from a specific Ancestor. Clustering, Painting, Triangulating, etc. are all tools to group your Matches and their segments, and this grouping often provides great insights into Common Ancestors. There is only one correct CA-line for each group, so as you build out a chromosome map (with TGs, Clusters, Painting) your work gets easier. In the end, however, we still have to do the genealogy…
      I’m now trying to develop a process for “Walking the Clusters Back”: set the thresholds to get a few Clusters, which are generally easier to link to grandparents, then adjust the threshold a little to get more Clusters and “tag” them with the Matches who were in the grandparent Clusters. That narrows down the possibilities for the more distant Clusters to the parents of the closer Clusters. I need to work on my language/description to make it clearer, but the distant Clusters should “nest” in the closer Clusters; put the other way, the closer Clusters will subdivide into more distant Clusters (the problem is that close cousin Matches only go with one Cluster). Jim


  8. I attempted to install Shared Clustering Tool and my Avast Anti-Virus software flagged the setup.exe as a potential virus; I’ve notified the developer.


    • Kay – they are two different things. Clustering groups your Matches by Ancestors. ThruLines shows you the Common Ancestor of individual Matches – sort of like Hints and Circles, only much better. With a Hint, you and your Match must both have a full genealogy path to a Common Ancestor. ThruLines also looks at Private Trees (which you cannot see) and connects the dots. ThruLines goes even farther and “borrows” information from other Trees to make connections to a Common Ancestor – AncestryDNA can quickly review millions of Trees to find a link, and it would take you forever to do that. Using computer algorithms is not perfect, but I’m finding the error rate well below 5%, only a very few. And I’m finding 3C, 4C and 5C Matches that I would never have found – a lot of them.


  9. Jim, you didn’t mention RootsFinder, a dynamite program that combines full-function genealogy and excellent DNA features with not one but several clustering features. In fact, it uses the triangulation reports from GEDmatch for all kinds of segment analysis and allows analysis of the DNA results from all the major DNA companies via DNAGedcom Client or other means. It recently partnered with (or was acquired by?) FindMyPast and will replace the dated Mocavo trees that FMP used. It also uses the user’s tree to evaluate both DNA and genealogy. Take a look!

    Doris


  10. Once I had my FTDNA Match and ICW files downloaded by DNAGedcom, I used Shared Clustering to cluster all of my 5,731 Family Finder Matches into 352 Clusters in less than 60 seconds. I found a very, very high concordance with my 370 Triangulated Groups.

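Jim’s reply to Rex describes starting from a clustering matrix, using VLOOKUP to transfer the Cluster numbers to his segment spreadsheet, and comparing Clusters with TGs. A minimal Python sketch of the same join and tally, assuming hypothetical segment rows with match_name and tg fields and a cluster_of dict built from the clustering matrix (all names and values are made up):

```python
from collections import defaultdict

def tag_segments(segments, cluster_of):
    """VLOOKUP-style join: copy each Match's Cluster number onto its segment rows.

    Rows whose Match is not in the clustering matrix get None (Excel would show #N/A).
    """
    return [{**seg, "cluster": cluster_of.get(seg["match_name"])} for seg in segments]

def tgs_per_cluster(tagged):
    """Collect the Triangulated Groups (TGs) seen in each Cluster."""
    found = defaultdict(set)
    for seg in tagged:
        if seg["cluster"] is not None and seg.get("tg") is not None:
            found[seg["cluster"]].add(seg["tg"])
    return dict(found)

# Hypothetical segment rows and Cluster numbers
segments = [
    {"match_name": "A", "chr": 1, "tg": "01a"},
    {"match_name": "B", "chr": 1, "tg": "01a"},
    {"match_name": "C", "chr": 5, "tg": "05b"},
    {"match_name": "X", "chr": 7, "tg": "07a"},  # not in any Cluster
]
cluster_of = {"A": 1, "B": 1, "C": 2}

tagged = tag_segments(segments, cluster_of)
print(tgs_per_cluster(tagged))  # {1: {'01a'}, 2: {'05b'}}
```

A dict lookup plays the role of VLOOKUP’s exact-match mode. Counting the TGs per Cluster is one simple way to check, on your own data, how closely Clusters line up with TGs.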

