Pro Tools Part 16

Sacrilegious Genetic Genealogy

For this post I want to explore a deviation from the normal genealogy and DNA research “requirements”.

Do we need to do comprehensive research on each cousin Match? Do I really need to find the complete link between each Match and our Common Ancestor? The sacrilege: do I care about all my distant cousins – to the extent that I must develop their complete link to me? Do I really care how much DNA they share with me? Must I link the DNA to the Common Ancestor? Or, is it enough to determine that they are on a specific branch of my Tree? I think so!

My standard mantra: our bio-Ancestors and DNA segments are set! We compare each Match to our Tree and DNA to find a Common Ancestor. I’m very close to finding out how 10% of my 100,000 Matches (at Ancestry) are related to my bio-Ancestors.

My experience with Pro Tools indicates many more can be easily found. I acknowledge that some shared DNA segments under 15cM will be false – but that doesn’t mean those Matches aren’t related to me.  Most of our true cousins beyond 3C will not share any DNA with us, so is the cM amount beyond 3C meaningful?  I acknowledge that some Matches will be related beyond a genealogy timeframe.

However, given these negative factors, I believe a lot more of my Matches are related to me within 9 generations back [8C level] – perhaps somewhat more than 20% of my total Matches. It’s taken me 14 years to “collect” and document approximately 10% of my Matches as cousins.  It’s daunting to think what time and effort I’d need to double that.

My sacrilege is to give up on full genealogy research for each Match. Using Pro Tools I’m finding lots of 6-10cM (small segment) Matches (to me) that are children, nieces/nephews, or 1C to strong higher-cM Matches that I have placed in my Tree. Clearly, these Matches are part of a family group well within a genealogy time frame.

I’m inclined to just quickly:

1. Add these small-segment Matches to my Common Ancestor spreadsheet

2. Add a Match Note (at Ancestry) to indicate the Common Ancestor and/or Ahnentafel [e.g. #A0062]

3. Give them my standard star and MRCA Dot; but not the Dot indicating a linked Match

4. Use a new Dot to indicate “Likely” in a family group under the MRCA; but not complete research [I could always filter on that Dot later, and do the research, some day…]

5. Add a shorthand note like:  SMOM: 3,442cM/son of “Match Name” [SMOM: Shared Matches of Match – the cM between them]
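For readers who keep their Common Ancestor spreadsheet in a script-friendly form, the five steps above could be sketched roughly as follows. This is only an illustration: the `MatchEntry` class, its field names and the "Likely" status value are hypothetical, and only the note layout (#A0062, SMOM, the cM between the two Matches) comes from the steps above.

```python
from dataclasses import dataclass


@dataclass
class MatchEntry:
    """One spreadsheet row for a small-segment Match (hypothetical layout)."""
    name: str            # the small-segment Match
    cm_to_me: float      # cM the Match shares with me
    ahnentafel: int      # Ahnentafel number of the Common Ancestor, e.g. 62
    anchor_name: str     # the stronger Match already placed in the Tree
    smom_cm: int         # cM shared between this Match and the anchor Match
    relation: str        # e.g. "son", "niece", "1C"
    status: str = "Likely"  # stands in for the new "Likely" Dot

    def note(self) -> str:
        """Build the Match Note text: Ahnentafel tag plus SMOM shorthand."""
        return (
            f'#A{self.ahnentafel:04d} '
            f'SMOM: {self.smom_cm:,}cM/{self.relation} of "{self.anchor_name}"'
        )


entry = MatchEntry("Small Match", 8.0, 62, "Match Name", 3442, "son")
print(entry.note())
```

Filtering later on `status == "Likely"` would play the same role as filtering on the new Dot at Ancestry.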

I’m looking for a more efficient way to group Matches into known family lines.

There are several points here:

1. Identify additional Matches within a genealogy timeframe (is it over 50% of all Matches?)

2. Group Matches under my Ancestor Couples – often under a specific child or grandchild (why would I need to dig deeper – unless the Match had a robust Tree with many records…)

3. Build a firm interrelated framework for later research on each extended “twig” of my Tree. Get some confidence of my Ancestors and their children and grandchildren.

4. Identify Brick Walls through clear absence of interconnected Matches. My spreadsheet has an Ahnentafel header for each of my Ancestors back to the 8C level – some of them have no known Matches, or what is clearly a small mess of non-interconnecting Matches. These are a judgment call, but with many more Matches involved, these few “problems” become more and more obvious.

5. Connect Floating branches – I now have several strong “clumps” of interconnected Matches, under a single MRCA couple, that I cannot link to my Tree. This is a strong hint in light of #4 above. I plan to explore this more in a separate blogpost.  
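Point 4 above lends itself to a simple count. The sketch below assumes the spreadsheet can be reduced to a list with one Ahnentafel number per documented Match; the function names are made up for illustration. It lists the Ancestors at a given cousin level that have fewer Matches than a chosen minimum. It does not check whether those Matches interconnect, so the result is only a starting point for the judgment call described above.

```python
from collections import Counter


def ahnentafel_range(cousin_level: int) -> range:
    """Ahnentafel numbers of the Ancestors at a cousin level (5 -> A64..A127)."""
    start = 2 ** (cousin_level + 1)
    return range(start, 2 * start)


def thin_lines(match_ahnentafels, cousin_level, min_matches=1):
    """Return the Ancestors at this level with fewer than min_matches Matches."""
    counts = Counter(match_ahnentafels)
    return [a for a in ahnentafel_range(cousin_level) if counts[a] < min_matches]


# Hypothetical data: four documented Matches at the 5C level.
documented = [64, 64, 65, 66]
candidates = thin_lines(documented, cousin_level=5)
print(len(candidates), "Ancestors with no Matches, starting at A%d" % candidates[0])
```

Raising `min_matches` would also surface Ancestors with only a handful of Matches, which is where a "small mess" tends to hide.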

For DNAGedCom, Genetic Affairs, DNA Painter: Any way to automate the Clusters/Groups to include only those Matches who interrelate, say, over 90cM (and make that threshold adjustable)?
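To make the request concrete, here is a minimal sketch of what such a filter might do. It is not how DNAGedCom, Genetic Affairs or DNA Painter work. It assumes the shared-match data is available as (Match A, Match B, cM) triples, keeps only the pairs that share more than an adjustable threshold, and groups the remaining Matches by simple connectivity. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def cluster_matches(pairs, min_cm=90.0):
    """Group Matches linked by pairs sharing more than min_cm (connected components)."""
    graph = defaultdict(set)
    for a, b, cm in pairs:
        if cm > min_cm:          # drop weak links before grouping
            graph[a].add(b)
            graph[b].add(a)

    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:             # walk outward from this Match
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        clusters.append(sorted(group))
    return sorted(clusters)


# Hypothetical shared-match pairs: (Match, Match, cM between them)
pairs = [("A", "B", 120), ("B", "C", 95), ("C", "D", 40), ("E", "F", 300)]
print(cluster_matches(pairs))             # D drops out at the 90cM threshold
print(cluster_matches(pairs, min_cm=30))  # a lower threshold pulls D back in
```

Simple connectivity is a loose grouping rule; a real tool would likely want something stricter, but the adjustable threshold is the part being asked for.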

Bottom line: I think many more, if not most, of our Matches will turn out to be real cousins within a genealogy timeframe (out through 8C level). This includes Matches with no Trees, Private Trees, UnLinked Trees and scrawny Trees – all of these are now put into the mix through Pro Tools. For me, compiling data from my 100,000 Ancestry Matches will be a way to bound (if not counter) the continued warnings that many of our Matches are false and/or distant. Some are, some are not – what can we learn?

As usual, I value your feedback – on the sacrilege of adding Matches to Tree branches based on strong interrelationships, but without fully documenting the genealogy; as well as the bigger picture of possibly linking Floating branches to “bare spots” in our Trees.

[22CX] Segment-ology: Pro Tools Part 16 – Sacrilegious Genetic Genealogy by Jim Bartlett 20241205

12 thoughts on “Pro Tools Part 16”

  1. Pingback: Friday’s Family History Finds | Empty Branches on the Family Tree

  2. Five decades after my first DNA analysis and two since that combined with conventional genealogy, I thought I was working about 5-8 generations back. I certainly have a hanging tree problem back then with lots of data strongly suggesting the conventional story from my matches must be lacking a birth record somewhere.
    Meanwhile, something closer maybe. I thought I knew who my maternal GGFs were. One is confirmed, but the other now has questions due to the lack of DNA matches to people who have tested. His wife is DNA confirmed, but he was an only child, with more ancestors back further in the line without siblings’ descendants. At least the ethnicity is Scottish, with a touch of Scots/Irish matches, so if there is a stray male involved, he comes from roughly the same area.
    In both cases it has been chromosomal segments that have showed the most usefulness. Shared Matches (Ancestry “SMoMs”) are of limited help for me that far back*, and none at all when Ancestry won’t let us at the data automatically. When they did, I made some progress on other lines, but now more relevant matches are in on this one. I keep recommending to Ancestry that autoclustering would retain and encourage subscribers, but they continue not to offer it.
    (*Jim, I have something like 1/10 of the matches you have at any given level. It’s seldom enough for one of them to have a reliably referenced tree back that far or otherwise provide useful data.)
    The other door that has opened for me is historical research on other lines.
    Scraps of information turn up in books on my areas of interest.
    I use all the tools I can get my hands on, but the more I go back, the more I use DNA segments.

    • Chris – Wow – 50 years of DNA analysis – that’s impressive. After 22 years of DNA experience, I’ve added some to my genealogy, but – as you know – there’s a Brick Wall for every line. Using Pro Tools I’m now working on each Ancestor couple, doing a generation at a time. I just finished my 5th generation (4C level) and all Ancestors appear to be firm. At the next generation (5C level – A64 to A127), I’m missing 21 Ancestors; for the rest I have 1555 Matches with an average of 15cM each. I need to check each one, and bring it up to my current “standards”; and I’m guessing that Pro Tools will probably add over 1000 more. However, for right now, I’m back to A38 for whom I’ve found a floating branch – I want to enlist some of those 15 Matches to see if they can find cousins among the siblings I think the floating branch has…
      FYI, I, too, have a lot of No Trees, Private Trees, Unlinked Trees and Scrawny Trees – but Pro Tools provided many close ties that often connect to one Match with a Tree… so they are now “in play”, Jim

      • Hi Jim,
        DNA analysis was very, very primitive back then. For undergrads basically unchanged from the 1950s. But at the same time, Fred Sanger had just achieved the breakthrough to sequencing. And 30 years later for genealogy at a price I could afford. 30 years is fairly typical for many technologies to develop from the initial science. Just glad I was still around to benefit.
        And thanks for telling us about the extra stuff Ancestry has added to ProTools. I will give it another go. (Although my heart belongs to segments.)

  3. Hi Jim,

    Interesting blog post, as always. On GEDmatch, I’ve implemented a new clustering approach (AutoCluster endo) to provide more filters to only include matches/shared matches that share a certain amount of DNA (and even more settings regarding the segment data).

    I could do the same for Ancestry, but as brianschuck already indicated, getting the data nowadays is quite some work. Here is a blog post I provide using locally saved HTML files

    I’ve added an option to use the shared cM between shared matches for AutoKinship, here is a blog post about it. I am now in the process of implementing this feature within AutoLineage, so without needing to run it offline.

    Anyway, I think this approach works fine for people who just started and want to explore the data, or to solve adoptee cases. But for experienced folks, that have gathered thousands of (shared)matches this approach might not be that feasible (because of all the work involved).

    I do provide a feature to import DNA Gedcom data, so that might be something to consider.

    • Patricia, Thanks for your feedback and links to your blog posts – very detailed step by step. As you point out, it’s a lot of work, but compare that to years of frustration with a Brick Wall…. Jim

  4. Jim, great comments as always. Many thoughts..

    I think both of us are where we are at now due to the circumstances that existed when we got into this hobby. You started earlier than I did (I started about 8.5 yrs ago). When I started, the situation was something like this:
    – lots of folks were uploading to gedmatch – and their data (like e-mail addresses) was easily visible.
    – Privacy around data was not being fully considered. The Golden State Killer case hadn’t happened.
    – Lots of folks were getting Ancestry tests – new matches all the time.
    – Great tools were coming out – my favorite being Shared Clustering – which could gather the data from Ancestry, including colored dots in a fraction of the time that we can access that data now. Shared Clustering also allowed one to upload notes and colored dots from one Ancestry kit to another – which was a massive time saver for me – as I work on my father’s and aunt’s kits interchangeably.
    -With shared clustering back then – I would estimate that around 15% of the kits would cluster. For a database of 50k kits (which is half what you’re doing), that’s 7500 matches to consider – with 35% of them being +20cM.
    -With Ancestry – we didn’t have MOM’s to consider – much less data.
    -Spending around 500 hrs/yr, I was able to make a lot of headway on the two major kits I research, plus learned a bit from several other kits I have access to. I had already gathered the low hanging fruit, obtained a step ladder and gone through again, gathered a taller ladder and combed again and finally an extension ladder.. I hadn’t found everything – but all the easy work was done, because I really only could look at those 7500 matches. Additionally, a large quantity of those would cluster around ancestors who were well documented – and they didn’t really add to anything I didn’t already know – other than to confirm the validity of the paper trail.

    With all that being the system we had, I developed as my primary system for keeping track of folks being the Ancestry Note fields. I could rapidly upload and download changes. Bring them into spreadsheets – sort and organize.. it was GREAT.

    Fast forward to today:
    -Shared clustering still works, but it’s not really supported.
    -We have ProTools – which has increased the number of matches of interest by at least a factor of 2 – maybe more. (This is wonderful!)
    -Gathering data is 20X slower, many reasons for this, one of which is the MOM upgrade provides so much more information and that Ancestry does need to keep their servers from being overwhelmed – slowing the gathering time.
    -Accessing data once within Ancestry is slower- maybe by half. More things to click to get the information you really want – longer times to pull up the data.
    -Privacy is a big concern – fewer people are testing.
    -Can’t upload data to note fields in Ancestry – and as mentioned takes a long time to download – so whereas it used to make a ton of sense to keep my data in those notefields – now, it takes so long to change, update, redownload that it doesn’t make any sense at all for that to be the primary place I keep information.

    To summarize, we have 2-3X the useful information available to us, but accessing it online is slower, and downloading it is massively slower. I’m pretty good at looking at lots of data and making sense of it, but.. it’s beyond my capability of keeping track of it in my head. We may have much more information.. but where to go look?? There’s new connections everywhere – but are they useful?

    So – I’m pondering what and how I should work on things too. If you started examining a new kit and spent 1 minute on each match of interest – it would take 300+ hours to look at one kit.. This is a dang hobby – not my money paying job. It can’t be done by a human being. The question then comes in what tools (AI, or other) will be developed to really use the power in the data provided by Pro Tools?

    I’ve wondered about using Gephi to do small network node analysis of the twigs you mentioned, the output is cool looking, but I haven’t figured out how to make it useful, or not take longer to get in the right format than just doing it on a piece of paper by hand.

    The ideal solution in my mind is for Ancestry to provide the data in a spreadsheet/database form (sufficiently anonymized) so that 3rd party developers can come up with ways to go through all this data and help us determine where to spend our time researching. I do not expect that to happen.

    There is a lot of power in Pro Tools. Using pre-Pro Tools techniques and a couple thousand hours I had broken through a tough brick wall.. but in the back of my mind there were still doubts. With Pro Tools – in the last month alone (and maybe only 10 hrs of work) – I’ve documented at least 50 more connections to the MRCA – making me feel VERY confident I got it right. Would it have taken 1000’s of hours to solve that brick wall with Pro Tools? or merely hundreds? Could AI do it in the snap of the fingers?

    So – no answers here, I just know I need to change my primary record keeping process – and I can sense there has to be better ways of identifying where to work. For the moment, I’m just passing time as I annotate more and more matches with Pro Tools till I get a better sense of what I want to do in the long term.

    • Brian – we have a lot of parallels (and I’ve got a basement full of documents, including RootsWeb messages from the 1990s and copies of docs from courthouse and archive visits). Yes, beginning to have data overload… At one point c2017 or so, I had the download of every Match and segment data from FTDNA, 23andMe, MyHeritage and GEDmatch – and through some all-nighters was able to Triangulate all those segments (setting aside many small ones that were false). For TG’s I’m pretty much done – except the few MRCA I could tease out. For the past few years, it’s been a push to compile the correct Ancestors. As it turns out, only one so far that was wrong, but some Brick Walls remain. Through large Clusters, I’ve gotten through two of them (A37>A74: CUMMINGS; and A49>A98 BROWN). I’ve also formed several Floating Branches….
      I think the future now is going to be taking a step back and looking at the larger picture – fitting the Floating groups into Brick Wall holes…. Jim

      • No doubt.. My process when I last just really dug in to a cluster – identified several funnels (which could easily have been entered as floating trees/twigs/branches..). Astounding how much easier that is to do with Pro Tools. You know which ancestral couple they are likely to tie to, but they are their own brick walls at the moment.

        Most of my tree was right, but I found one big Misattributed Parentage – and while I’m 90% sure who the father was in 1823, (and by extension his parents and paternal grandparents – several hundred connections to the paternal grandparents), there is also significant signal from the maternal grandparents, that does not in any way match the existing records…

        I also wonder about at least 2 other 4th great grandparents – where their ancestors just don’t show up with any signal.. and there ought to be some signal.

        Anyway, how to deal with all the funnels as I call them (but I think I copied that term from one of your posts)- or floating branches/trees as you call them is definitely a pressing question.

  5. It’s really all about time management, isn’t it? Sometimes you need to validate the line and dot all your i’s if that particular line still has mysteries to unravel. Other times it’s not cost effective, as long as you’ve marked them in your records so you know all the confirmed matches you have if something comes up that requires investigation down the track. Your time is probably better spent trying to push back on the troublesome TGs?

    • I agree. Sometimes I think of genetic genealogy triage: some things are very critical, while others are just added fluff and need little care – with the majority somewhere in between. With this post, I am trying to easily add some fluff in the hope that that would provide a sharper focus on the critical areas – particularly on linking floating branches to Brick Wall Ancestors. But I need to be careful, even with the easy fluff. I just found the mother of a known cousin (usually an easy add to the spreadsheet) – but the known 3C was related through her *father*. So the mother doesn’t belong under that 3C Ancestor… When quickly adding an “obvious” new Match, it’s important to at least check the top of their Shared Matches to confirm there is a consensus. In this case the daughter was clearly in a strong Cluster, and the mother was not in that Cluster. Jim
