HAPPY 10TH ANNIVERSARY

Featured

10 years ago, I blogged: “What is a segment?”, and noted the difference between an ancestral segment (your DNA segment) – passed down from an Ancestor to you; and a shared segment (created by a computer algorithm) which usually indicates a Common Ancestor for both you and your Match.

This is still the fundamental concept that is key to genetic genealogy.

We’ve looked at a lot of twists and turns based on this concept…

– How segments are measured

– Why the data is a little fuzzy, but that doesn’t negate its power

– How our DNA is passed down in identifiable segments from our Ancestors

– How each generation of our Ancestors contributes two full genomes (46 Chr) to us

– Why some of our segments must be sticky (persistent) for multiple generations

– How we “see” our own segments through shared segments

– How we can map (or paint) our segments on our chromosomes

– How shared segment “size” predicts relationships

– How we can group Matches by segment Triangulation or shared Match Clusters

– How we can use groups to solve brick walls, NPEs, Bio-Ancestors, unknowns

– Which ancestors always, or sometimes, or never have shared Matches

– Why all of our shared segments (6cM and up) may be important to us

– How to Walk Ancestors, Clusters, Segments back in our genealogy

– How spreadsheets can help us collect, arrange, analyze, QC, and use data

– How to use new tools: autoClustering, DNA Painter, browsers, ProTools, etc.

You have all been part of this journey of learning – as in fact, we are all learning from each other. I very much value your feedback and suggestions.

As some of you know, I also host a DNA Special Interest Group (SIG) through the Washington DC Family Search Center. It was in person/local until Covid. We are now international via Zoom – 2nd Wednesday of each month, 7-9pm ET. This is now an Advanced DNA SIG, and members are encouraged to participate and/or present (learn from each other). If you’d like to join, please email me at jim4bartletts@verizon.net.

Happy Anniversary – your suggestions/observations/comments are “gifts” to us all.

[99F] Segment-ology: Happy 10th Anniversary by Jim Bartlett 20250507

SPECIAL ANNIVERSARY COMING UP

My first real Segmentology blog post was on 7 May 2015 – so an anniversary is coming up soon. I’m looking to consolidate and re-package the approximately 200 posts in Segmentology. If you would like any new or revised topics included, please feel free to use the comments or email me at jim4bartletts@verizon.net. NOTE: The Table of Contents (Outline in the header bar) has been updated, and all the posts are hyperlinked.

[99E] Segment-ology: Special Anniversary Coming Up by Jim Bartlett 20250422

ProTools Part 26

Documenting a GUESS

Setup… A Match, with No Family Tree, is a 1C to a Known Match per ProTools. The Known Match is in my Tree with a specific line of descent from our MRCA; and a 1C estimate is very reliable. I want to put the new Match in my Tree and place them in my Common Ancestor spreadsheet – to “take care of” that Match by placing them almost certainly where they belong in my Tree.

As I’ve blogged before, there are only two options to place a 1C to a Known Match: 1. a grandchild of the Known Match’s paternal grandparents; or 2. a grandchild of the Known Match’s maternal grandparents. In other words, the new Match is a child of a sibling of the Known Match’s father or mother. A quick review of my Shared Match list with this new Match clearly reveals that the Match is on the same side (paternal or maternal) that I am on with the Known Match. In other words, I know the path from the Known Match back to our MRCA is through their father or mother. I can now see, through ProTools, that the new Match is related to me that way, too.

So I know the path from the MRCA down to the new Match – it’s the same path that I have with the Known Match down to, and including, the grandparent of the Known Match. What I don’t know is the name of the son or daughter of that grandparent = the parent of the new Match.

Up until recently, I’ve just named that son or daughter “block” as GUESS or Unknown in my Tree and in the “cell” of my spreadsheet. I’m now up to a dozen or so of these and can see many more on the horizon. My index of people in my Tree is filling up with GUESS and Unknown people…

I see four options for a name:

1. Continue with GUESS or Unknown [I usually reserve GUESS for iffy guesses]. I don’t like this – it’s not helpful to me or others reviewing my Tree – someday it may be very confusing.

2. Child of [name the grandparent]; ex: “Child of Bob JONES”

3. Parent of [the new Match]; ex: “Parent of Horatio Mitchell”

4. Sibling of [name the Known Match’s parent]; ex: “Sibling of Martha SMITH”

The Tree “box” and spreadsheet “cell” would have these entries and appear very close to other, known, boxes and cells. They would also be more specific in the Tree index, instead of a generic “GUESS” or “Unknown”.

I think I like (4) Sibling of Known Match’s parent the best because it specifically precludes the Known Match’s parent. In fact, I just did one new Match who was 1C to two different Matches so the description was: [sibling of John and Mary SURNAME] to rule them both out [after checking with ProTools].

I am interested in feedback on this topic – i.e. how to efficiently document Matches which clearly fit in a specific Tree branch. I am experimenting with 1C1R and even some 2C which clearly cannot fit anywhere else. Keyword here is “efficiently” – there is a LOT to do, and I don’t want to have to write a paragraph about each one. This is primarily for my own research. If I leave them as alive, no one else will see them; and if I mark them as deceased, the only people who will care will be close relatives to the new Match, and they may provide some feedback to me. I hope so…

[22DH] Segment-ology: ProTools 26 – Documenting a GUESS by Jim Bartlett 20250302

MITx Class on DNA is Free

MITx offers a wide range of free, on-line, self-paced, semester-long courses to anyone in the world. Coming up next week is Introduction to Biology – The Secret of Life. I’ve taken this course (actually twice). It’s taught by Professor Eric Lander – the founding director of the Broad Institute and a principal leader of the Human Genome Project – and a fantastic instructor (his course is fun). This course is targeted at non-biology students. This is not about genealogy, it’s about DNA. Anecdote: I was about halfway through the course, and one night my wife called out: “Jim, what are you doing – it’s 3 AM.” My reply: “I’m in a lab, folding proteins to capture a virus”. If you are into DNA and Segment-ology, this is a great opportunity to get a firm grounding. As a side note, I think MITx is a great undertaking and am a regular donor to that program. Free, world-wide MIT classes…

Here is a link: https://www.edx.org/learn/biology/massachusetts-institute-of-technology-introduction-to-biology-the-secret-of-life

Click on the short YouTube video… Enjoy.

[99D] Segment-ology: MITx Class on DNA is Free by Jim Bartlett 20250128

ProTools Part 25

The Path Is Key

This may be an extension of my “genealogy sacrilege” outlook or rant.

But before I begin, to each their own – you get to choose your objectives.

My two main objectives are to get my genealogy right; and to get the Chromosome Map of segments from my Ancestors at each generation right. My objectives do not include finding all of the descendants of all of my Ancestors. However, I do think that documenting how my DNA Matches interrelate to me and each other is very helpful in achieving my two objectives – and this swells my Tree somewhat. I’m finding: Match paper trail paths (and ThruLines clues) that are impossible, given the DNA evidence; and DNA evidence that has revealed genealogy paths I never would have otherwise found (not just limited to breaking through brick walls).

So, a lot of work to do to document what will be over 10,000 Matches…  Time is precious…

When documenting DNA Matches and their line of descent from our MRCA to them, the “Path Is Key”. Dotting all of the “i”s and crossing all the “t”s is NOT! The DNA segments do not “know” their hosts’ names (or dates, or places), just that the segments are passed along. We genealogists document what we can about each of these Match ancestor DNA hosts. It helps us keep track – in time and place. But how much effort do we need to put into documenting our Matches’ lines? My opinion is: not much! We need to be sure of the path. We don’t need to know the full names, or pet names, or titles. It’s nice to know the birth/death years, but how much digging should we do to find the complete birth date or place? What do we do when several different descendants insist on different given names … I could go on and on, but I’ve decided it’s not my job to adjudicate their family “wars” – my objective is to be clear on the path.

Therefore, I’m now using terms like Pvt, Unknown, GUESS, sibling of XYZ, etc. to describe Match Ancestors – particularly those close to the Match. I don’t really care about their parents’ or grandparents’ names or genealogy info – just the path that must exist for a DNA segment. [NB: proving a specific genealogy-DNA link is a separate issue; a potential path is not a proven path.]

I am still documenting the child and grandchild of the MRCA (given name and birth year at least). But, IMO, the further down the path from the MRCA to the Match, the less precise this info needs to be. The Key Is the Path. I don’t want to introduce incorrect info, so I’m introducing “other” terms in the name field when it is unclear, in debate, or might take days to research and resolve. I note the “path” that has to be and move on. This allows me to get as many DNA Matches as possible into the spreadsheet. Then the interrelationships can be better evaluated.

SUMMARY:  Don’t worry about “fully” documenting the MRCA-to-Match path; just that the path does exist, and no incorrect info is introduced (unless your Tree is private). And, of course, it’s up to your own judgment as to if/how much of this recommendation to follow. My plan is to get as many Matches as possible into MRCA family groups in a spreadsheet, and then study the interrelationships with ProTools. Get Matches in my Tree and my Common Ancestor spreadsheet, but “do no harm”.

[22DG] Segment-ology: ProTools 25 – The Path Is Key by Jim Bartlett 20250222

ProTools Part 24

Small Segment Stats

Ancestry DNA Matches who share 6-7cM and have a known MRCA with me: 1,160.

Total Ancestry DNA Matches at any cM level: 7,450.

About 15% of my DNA Matches with a known MRCA share only 6-7cM.

This is NOT a statement linking DNA and Ancestors.

This IS a statement about the many true cousins we will not see in our Match lists because the current threshold at AncestryDNA is 8cM.

I’m glad I Dotted and saved some of my 6-7cM Matches when Ancestry made the threshold change – it was a fraction of the total. I wish I’d have saved them all…

To end on a higher note – I still have 2,600 other 6-7cM Matches to work with – many of them are being determined as close cousins to known MRCA Matches by using ProTools.

[22DF] Segment-ology: ProTools Part 24 – Small Segment Stats by Jim Bartlett 20250221

ProTools Part 23

Integrating With Genealogy

ProTools is a powerful tool, but it has its limits. 1C and closer relationships are very accurate, in my experience. Beyond that, the range of possibilities grows quickly as the cMs fall below the 1C range. But think about what that means… A 1C relationship takes us back to our grandparent level. Think of a 20-year-old genealogist with a 50-year-old parent and 80-year-old grandparents. Those grandparents would be in the 1950 census. And the census is a pretty good tool back to 1850 – another few generations. You might argue that the census is not rock solid in every case. There may be adoptions, NPEs, etc. That is true, but those individuals will not show up as DNA Matches – for the most part.

Yes, there are still a few situations that may slip through. But on the plus side, the census and ProTools will sort out a high percentage of false relationships, and/or incorrect genealogy “research”.

Used together, the census and ProTools can pretty accurately cover the past 175 years.

[22DE] Segment-ology: ProTools 23 – Integrating With Genealogy by Jim Bartlett 20250131

ProTools Part 22

A Rant about Relationships

I praise Ancestry for ProTools – just about everything about it is great. I have often reported how accurate the close Relationship Estimates are. I rely almost 100% on 1C and closer relationships; and have found many 2C relationships to be correct. I worked for several days on a 3C relationship – knowing the Trees of the two Matches pretty well – to no avail. This is becoming a regular occurrence.

I’ve noted over the past year, Ancestry has tightened up their Relationship Estimates – all are now within 4C. We can tag a Match at 4C or closer, or Distant. A far cry from the Circles where Ancestry showed us how we were related out to 8C; or even the current ThruLines out to 6C.  Will they change again, tomorrow, to only showing Matches related within 4C or closer? I am long since past that threshold…

So I decided to take a deeper dive, under their hood, to see what they predicted for small cM Matches. I randomly selected a 6cM Match that I had saved. She was predicted to be Half 3C1R or 4C – evidently their deepest estimate. So I clicked on that estimate to get their more in depth analysis. Here are two screenshots of their analysis [sarcasm: based on results from their 27 million testers?]:

It seems to me they have adopted the “Cinderella Principle” – push hard to fit the data into a desired result. Are they really claiming that 99% of all Matches at the 6cM level are a 4C or closer? The Ancestry folks are much smarter than that…  They know better, and, for some reason, AncestryDNA is distorting the truth! SHAME! Our tens of thousands of small cM Matches do not fit into a size 4C Cinderella slipper!!

Bottom lines: still rely on 1C or closer relationships for analysis with ProTools; IMO, beyond 2C, treat the estimates as garbage; let me/us know if you have some insight that I’m missing (other than something related to greed).

[22DD] Segment-ology: ProTools 22 – A Rant About Relationships by Jim Bartlett 20250119

Pro Tools Part 21

Adding a GUESS

Setup

gk (Match1) is known 5C1R – with grandmother: Anetta b 1926 m SURNAME1 > father: private Male > gk; AND gk has 10 known 2C to Anetta’s father (in the line going back to our MRCA).

Justin (Match2) shares 898cM (estimated 1C) to gk; and has a very small Tree of Private Ancestors.

Analysis

To be a 1C to gk, Justin would need to share grandparents with gk – either gk’s paternal grandparents or gk’s maternal grandparents. From the setup (above), we know the maternal grandparents are SURNAME1 and Anetta b 1926; we don’t know (but can often find) gk’s paternal grandparents. In this case there wasn’t enough info in Justin’s Tree to help.

However, there is another way to determine which set of grandparents Justin descends from. If he descends from Anetta’s side, Justin would also be 2C to the 10 known 2C that gk has (NB: all 2C match each other). If Justin descends from the other grandparents of gk, it is highly likely that Justin will NOT share any of the 10 known 2C to gk. A quick look at Justin’s Shared Match list shows he matches ALL of the same 2C that gk has. Justin is clearly a 1C to gk on gk’s maternal side – which is the side back to the MRCA with me!
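For readers who keep their Shared Match lists in an export or a script, the test above can be sketched in Python. This is only a sketch: the function name, the placeholder labels, and the three-way verdict are assumptions made for illustration, and none of it is a ProTools feature.

```python
def first_cousin_side(candidate_shared_matches, known_second_cousins):
    """Compare a candidate 1C's Shared Matches with the Known Match's 2C.

    Returns (fraction of the known 2C that the candidate also matches, verdict).
    The verdict wording is hypothetical and only mirrors the reasoning in the text.
    """
    known = set(known_second_cousins)
    if not known:
        raise ValueError("need at least one known 2C to compare against")
    hits = known & set(candidate_shared_matches)
    fraction = len(hits) / len(known)
    if fraction == 1.0:
        verdict = "same side as the known 2C"
    elif fraction == 0.0:
        verdict = "probably the other grandparent couple"
    else:
        verdict = "mixed: review by hand"
    return fraction, verdict


# Placeholder labels standing in for gk's 10 known 2C (not real Match names).
known_2c = [f"2C-{i:02d}" for i in range(1, 11)]
# The candidate's Shared Match list: all 10 of those 2C, plus a couple of others.
justin_shared = known_2c + ["gk", "some-other-match"]

fraction, verdict = first_cousin_side(justin_shared, known_2c)
print(fraction, verdict)
```

A partial overlap is deliberately not forced into either answer; that is where endogamy, pedigree collapse, or a wrong cousinship estimate may be in play and judgment is needed.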

Therefore, I am very confident in adding Justin to my Tree with UNKNOWN parent and KNOWN grandparents: SURNAME1 and Anetta b 1926. The rest of the path gk has back to our MRCA is already in my Tree.

This places another Match into my Common Ancestor spreadsheet and into my Tree. It takes this Match off the list of unknown (aka Mystery) Matches. In Shared Match lists, Justin will now show up as a known (Dotted) Match – reinforcing Clusters. I don’t know if Justin’s addition to my Tree will help AncestryDNA with future ThruLines evaluations, but I hope so. I *know* it will help me.

A similar analysis can be made for a Pro Tools estimate of 1C1R or a 2C, but it gets less reliable with each additional degree of separation. There is also a higher degree of difficulty in the analysis, because the certainty of the cousinship estimate is not as assured and the number of possible alternatives that need to be addressed increases. It’s often not impossible, but it is harder. A strong factor is whether a *candidate* Match shares a lot of the same Shared Matches. In other words, if the candidate Match clusters with a lot of the same Shared Matches (which can be observed in the Shared Match list), to me that is a strong indication that candidate Match has the same MRCA. This needs to be tempered with endogamy or pedigree collapse – judgment is needed in those cases.

[22DC] Segment-ology: Pro Tools Part 21 – Adding a GUESS by Jim Bartlett 20250109

Pro Tools Part 20a

A Plan and some TIPS (corrected)

At the end of 2024, I wanted to review my Plan for using Pro Tools (and filing in a Common Ancestor Spreadsheet) and highlight some TIPS.

For the long haul – addressing all of your genealogy using Pro Tools – make a Plan! Perhaps a New Year’s Resolution…

I now think the best plan is to start with the closest Ancestors and work back a generation at a time.

That is, start with your grandparents – two grandparent couples [Ahnentafels 4 and 6]. The Matches at this level would nominally be 1C to you – maybe some “removed” – like a 1C1R or 1C2R – particularly as we get older:>( There are only two groups at this generation – one on the paternal side and one on the maternal side. So, two CA-Couple headers in the CA Spreadsheet. For each row under a header row, enter the Match information (name, cM, # segments, cousinship) and then the child of the CA and their birth year, and then the path to the Match.
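The couple headers for each generation follow directly from standard Ahnentafel numbering (a father is 2n, a mother is 2n+1), so they can be listed with a few lines of Python. This is a minimal sketch; the function names are my own for illustration, and the P/M side letter is just a convenient shorthand.

```python
def couple_headers(generations_back):
    """Ahnentafel number of the husband in each Ancestor couple.

    generations_back: 2 = grandparents, 3 = great grandparents, and so on.
    """
    if generations_back < 1:
        raise ValueError("generations_back must be 1 or more")
    start = 2 ** generations_back
    # Husbands carry the even numbers; each wife is the next odd number.
    return list(range(start, 2 * start, 2))


def side(ahnentafel):
    """Return 'P' if the Ancestor is on the paternal side, 'M' if maternal."""
    if ahnentafel < 2:
        raise ValueError("Ahnentafel 1 is you; sides start at 2")
    top = 1 << (ahnentafel.bit_length() - 1)  # first number in that generation
    return "P" if ahnentafel < top + top // 2 else "M"


for gen in (2, 3, 4):
    headers = couple_headers(gen)
    # Nominal cousinship under these headers: grandparents give 1C, etc.
    print(f"{gen - 1}C level:", [(a, side(a)) for a in headers])
```

Each generation back doubles the number of header rows: 2 couples, then 4, then 8.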

TIP1: for each, and every, Match I list, I use Pro Tools to show *their* closest Matches – these are often close Shared Matches to them that can be figured out even if the SM has no Tree.

TIP2: for each Match I list, I add them (and their path to the CA) into my Tree (and apply the DNA-connection and/or DNA-Match Tags). I don’t know if the Tags help AncestryDNA build Trees or determine ThruLines; but it does help me when I run across them days/weeks/months/years later. Not necessarily a *certification*, but at least a reminder that I’ve reviewed the path before.

TIP3: Fill in some Notes for the Match – I always start with my CA code – example: #A0064P [the A means I’m satisfied the Ancestor is correct; the # is a holdover from the days we searched for unique strings; and the 0064P is Ahnentafel 64 on my Paternal side]. [In a DNAGedcom Client Spreadsheet Report, I can sort on the Notes column, and they will group in order.]
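A small Python sketch shows why a code like #A0064P groups nicely when the Notes column is sorted. The assumptions here are mine: a fixed four-digit, zero-padded Ahnentafel number, an M suffix for the Maternal side, and the helper names themselves. Only the A status letter and the #A0064P example come from the TIP above.

```python
import re

# '#', one status letter, a 4-digit Ahnentafel number, then P or M (assumed layout).
CA_CODE = re.compile(r"#([A-Z])(\d{4})([PM])")


def make_ca_code(ahnentafel, side, status="A"):
    """Build a CA code such as '#A0064P' for the start of a Match Note."""
    if side not in ("P", "M"):
        raise ValueError("side must be 'P' or 'M'")
    if not 1 <= ahnentafel <= 9999:
        raise ValueError("this sketch only pads to four digits")
    return f"#{status}{ahnentafel:04d}{side}"


def parse_ca_code(note):
    """Return (status, ahnentafel, side) from a Note, or None if no code is found."""
    found = CA_CODE.search(note)
    if found is None:
        return None
    status, number, side = found.groups()
    return status, int(number), side


notes = [
    make_ca_code(128, "P") + " 4C1R via ThruLines",
    make_ca_code(16, "P") + " 3C",
    make_ca_code(64, "P") + " 5C",
]
# Zero padding makes a plain text sort agree with the numeric Ahnentafel order.
for note in sorted(notes):
    print(note, parse_ca_code(note))
```

Without the leading zeros, a text sort would put 128 ahead of 16 and 64, which is why the padded form is worth the extra characters.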

TIP4: I Star & MRCA Dot & Tagged-in-my-Spreadsheet Dot each Match – this unique Star-Dot-Dot “trio” clearly highlights Shared Matches who are already in my CA Spreadsheet. In a Shared Match list they help identify a Cluster.

TIP5: Each of the Matches under an MRCA Couple at this generation should match each other. They are 1C, 1C1R, 1C2R to you and each other, and all should Match. A Quality Control Check. [NB: I am tempted to add in any Aunt or Uncle Matches to my Spreadsheet; but they may be close to the Match, but not on the path to my Ancestor – when that happens they won’t have close cM ties to the other Matches.]

TIP5a: I have a separate column in my CA Spreadsheet to indicate I’ve done all of the above. I’ve got about 8,000 Match rows in my spreadsheet and I’m reviewing each one to make sure I’ve covered all of the above and then check it off. As it turns out, some have changed their Trees, some have dropped out of Ancestry, Ancestry continues to update ThruLines, etc., etc. This checkoff indicates a fresh update.

Time now to tackle the four Great grandparent couples [Ahnentafels 8, 10, 12, & 14].

Repeat the steps above for each of your Ancestor couples.

Note that TIP5 still applies – under each couple the Matches are 2C (or 2C1R, 2C2R, or maybe 1C1R) with you. These nominal 2C should all be close cousins to each other (sharing large amounts of cMs).

At any point in this process, take a break and chase down a rabbit hole or two. But then come back to this methodical process.

TIP6: Using this process makes us treat all of our Ancestors equally. I tend toward favorite Ancestor lines, and this process forces me to grind through all of the Ancestors and Matches. It’s a good thing.

A slight change occurs at the next generation [eight 2xG grandparent couples; 3C level; A 16 – A 30]. At this level, TIP5 breaks down a little.

TIP7: Reminder – 2C-100%; 3C-90%; 4C-50%; 5C-15%; 6C-5% – (roughly)… This is the “curve” indicating how often true cousins will be a DNA Match to each other. ALL true 2C will be a DNA Match to each other. Of 10 true 3C, each one will usually have a DNA match with only about 9 of the others; but each of the 10 will have about 9 of the others matching – so these 10 would still form a pretty strong Cluster… Among a group of 4C, each one will only match about half of the total; and they may not all form one, strong/compact Cluster. And it gets worse, at the 5C and 6C levels… – some interconnecting cousin Matches, but not strong Clusters. However, now with Pro Tools we can find groups of strongly interconnected (closely related) Matches – strong ties to each other, but perhaps their strong subgroup is 5C to 8C with you.
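To get a feel for how quickly Clusters thin out along this curve, here is a rough Python sketch that counts how many pairs in a group of true cousins would be expected to match each other. The rates are the rough ones from TIP7; treating every pair as independent, and the function name, are simplifying assumptions of mine.

```python
from math import comb

# Rough chance that two true cousins of a given degree are a DNA Match (TIP7).
MATCH_RATE = {2: 1.00, 3: 0.90, 4: 0.50, 5: 0.15, 6: 0.05}


def expected_matching_pairs(group_size, cousin_degree):
    """Return (possible pairs, expected matching pairs) within one cousin group."""
    if group_size < 2:
        raise ValueError("a group needs at least two cousins")
    pairs = comb(group_size, 2)  # every cousin paired with every other cousin
    return pairs, pairs * MATCH_RATE[cousin_degree]


for degree in sorted(MATCH_RATE):
    pairs, expected = expected_matching_pairs(10, degree)
    print(f"{degree}C: about {expected:.1f} of {pairs} possible pairs match")
```

With 10 cousins there are 45 possible pairs. Nearly all of them connect at 2C and 3C, about half at 4C, and only a handful at 5C and 6C, which is why the more distant groups show some interconnections but not strong Clusters.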

At the 4C level, I see interconnected groups around the children of each grandparent couple; and sometimes a few interconnections between children. At the 5C level, as expected, I’m seeing groups (Clusters) form on the grandchildren.

Additional TIPS

TIP8: multiple marriages; non-marriages: IF you and a Match only share DNA through one Ancestor, then your relationship is “Half”. Pro Tools often includes cMs for Half relationships, but these only apply when you share only one Common Ancestor.

TIP9: Some Matches may be related to you multiple ways – give them a separate row (and Ahnentafel #) for each relationship. NB: If you are 3C on A16 and on A18 the odds are equal – with one segment, it could be either; with multiple segments, it could be both… However, if you are 3C on A16 and 4C on A38, with one segment, the odds are 4 to 1 that the DNA came from A16; and if you are 3C on A16 and 5C on A76, the odds are 16 to 1 that the DNA came from A16. This is because *shared DNA* is divided by 4 with each generation, on average. If you have shared DNA with a Match, it’s much more likely to be from the closest relationship.  
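The odds in TIP9 fall out of the divide-by-4 rule, and a few lines of Python make the arithmetic explicit. This is a sketch under that one rule of thumb (average shared DNA drops by a factor of 4 with each additional cousin degree); the function names are hypothetical.

```python
from fractions import Fraction


def relative_weight(cousin_degree):
    """Relative amount of shared DNA expected at a cousin degree (1/4 per step)."""
    return Fraction(1, 4 ** cousin_degree)


def odds_for_closer(closer_degree, farther_degree):
    """Odds (as N to 1) that a single segment came via the closer relationship."""
    return relative_weight(closer_degree) / relative_weight(farther_degree)


def chance_from_each(degrees):
    """Share of the likelihood assigned to each relationship, in the order given."""
    weights = [relative_weight(d) for d in degrees]
    total = sum(weights)
    return [w / total for w in weights]


print(odds_for_closer(3, 3))    # 3C on A16 and 3C on A18
print(odds_for_closer(3, 4))    # 3C on A16 and 4C on A38
print(odds_for_closer(3, 5))    # 3C on A16 and 5C on A76
print(chance_from_each([3, 4]))
```

The three calls reproduce the even, 4 to 1, and 16 to 1 odds given above; the last line restates 4 to 1 as a 4/5 versus 1/5 split.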

Please post in the comments if you have good TIPs that would help us all.

Happy New Year!

[I fixed the error in Tip 7, and reposted]

[22DB] Segment-ology: Pro Tools Part 20a – A Plan and some TIPS by Jim Bartlett 20250101