Half-Identical Region (HIR)

Posted on May 21, 2025 by Jim Bartlett

Your DNA segments (that make up the 23 Chromosomes passed down to you from a parent) are not the same as shared DNA segments with a Match (as described by a chromosome browser) aka a Half Identical Region (HIR). All of your DNA is real, down to any size you want to analyze. This is not necessarily so for a shared DNA segment (or HIR)!

From the ISOGG Wiki: A half-identical region (HIR) is a region of two paired chromosomes where at least one of the two alleles from one person’s pair of chromosomes matches at least one of the two alleles from a different person’s pair of chromosomes throughout the entire region. A half-identical region may be either identical by descent (IBD) or identical by state (IBS).

In my words, for genetic genealogy, a computer compares your DNA test to a potential Match’s DNA test. The computer compares the two raw DNA data files – about 600,000 SNPs with two values (alleles) for each SNP. The two values are one from the DNA passed down from the father and one from the mother. The computer is looking for a long string of matching SNPs, which are then reported as a shared DNA segment. This meets the HIR definition above – at least one value is the same at each SNP in the shared segment. The theory is that, although much of our DNA will be the same, there is some variation, and a long enough string of matching SNPs will indicate this segment of DNA is from a Common Ancestor. This also implies that the long string is on one side – on one chromosome from our mother OR our father. A lot of reported genetic data indicates that such an HIR is true when it’s at least 15cM.
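The comparison described above can be sketched in a few lines of Python. This is only an illustration, not any company's actual algorithm: the names `half_identical` and `find_hir_runs` are hypothetical, the threshold is counted in SNPs rather than cM, and real tools also tolerate occasional mismatches and no-calls.

```python
from typing import List, Tuple

Genotype = Tuple[str, str]  # the two unordered values (alleles) at one SNP


def half_identical(a: Genotype, b: Genotype) -> bool:
    """True if at least one value of `a` equals at least one value of `b`."""
    return bool(set(a) & set(b))


def find_hir_runs(person1: List[Genotype], person2: List[Genotype],
                  min_snps: int) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) for each run of consecutive
    half-identical SNPs that is at least `min_snps` long."""
    runs = []
    start = None
    n = min(len(person1), len(person2))
    for i in range(n):
        if half_identical(person1[i], person2[i]):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # close a run that reaches the end of the data
    if start is not None and n - start >= min_snps:
        runs.append((start, n - 1))
    return runs


# Toy data: six SNPs, made up for illustration only
you = [("A", "G"), ("C", "C"), ("T", "C"), ("G", "G"), ("A", "A"), ("T", "T")]
match = [("A", "A"), ("C", "T"), ("C", "C"), ("A", "A"), ("A", "G"), ("T", "C")]

print(find_hir_runs(you, match, min_snps=2))  # [(0, 2), (4, 5)]
```

The fourth SNP (G/G against A/A) has no value in common, so it breaks the string into two shorter runs.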

But why aren’t all shared DNA segments true? Because the computer algorithm blindly looks at *both* values at each SNP for you and the potential Match. The computer may create a string of your SNPs that agree with your potential Match’s SNPs, but some are from your father and some from your mother. Clearly this “zig-zag” result, using SNPs from both your parents’ DNA, is not a representation of your DNA on one chromosome. It’s not a DNA segment passed down from one of your parents to you. It’s a false segment! Or this might have happened with your potential Match’s data, or with both of you. Bottom line: wherever the “zig-zag” occurred, the shared DNA segment is false.
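Here is a small sketch of the "zig-zag" using made-up data. It assumes we somehow know which value came from which parent (raw test data does not tell us this), and the function names are hypothetical. The unphased comparison finds a value in common at every SNP, yet no single chromosome of yours matches a single chromosome of the Match.

```python
def unphased_match(p1, p2):
    """Half-identical at every SNP, ignoring which parent each value came from."""
    return all(set(a) & set(b) for a, b in zip(p1, p2))


def phased_match(p1, p2):
    """True only if one whole parental string of p1 equals one of p2."""
    haps1 = ["".join(h) for h in zip(*p1)]  # [paternal string, maternal string]
    haps2 = ["".join(h) for h in zip(*p2)]
    return any(h1 == h2 for h1 in haps1 for h2 in haps2)


# Each tuple is (value from father, value from mother) at one SNP.
you = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C")]    # paternal ACGT, maternal GTAC
match = [("A", "A"), ("T", "T"), ("G", "G"), ("C", "C")]  # both strings ATGC

print(unphased_match(you, match))  # True  (the computer reports a segment)
print(phased_match(you, match))    # False (it zig-zags: father, mother, father, mother)
```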

The good news is that this “zig-zag” result doesn’t occur with long enough segments – over 15cM. And it occurs very infrequently with 14cM shared DNA segments. And there is a rough distribution curve – probably different for each of us – along which the share of false segments rises as segments get shorter, until about half of our 7cM segments are false. And most shared DNA segments are false below 7cM – which is why they are generally not used. Some of the companies use other, proprietary, algorithms to discard (not report) some of these false Matches. Also, as I’ve blogged before, Triangulated Groups are very good at culling out the false segments.

This also ties into the ISOGG terms: Identical By Descent (IBD) and Identical By State (IBS), noted above. IBD would apply to true shared DNA segments – you and your DNA Match got the shared DNA segment from a Common Ancestor. IBS means the computer found a “match”, but IBS is usually used in genetic genealogy to indicate the false segments. I usually just stick to “true” and “false” shared DNA segments (or HIRs).

Another quirk in this discussion is using the term HIR to refer to a shared DNA segment. This is proper and OK. But, an HIR only refers to a shared DNA segment between you and one particular Match. We virtually never find exactly the same HIR with two Matches (although it’s possible with Matches who are closely related to each other). When we look at segment Triangulation, the Triangulated Group is composed of different HIRs. So HIR should not be used to refer to a TG. A TG represents a segment of your DNA (from a specific Ancestor) – there are many different HIRs in a TG. And each Match in a TG would have a different (but overlapping) segment from the Common Ancestor, with different HIRs. Because the whole process is so random, we just don’t get the same segments from our Common Ancestors that our Matches get.
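To picture the difference between an HIR and a TG, here is a rough sketch with invented numbers. The Match names and start/end positions (in Mbp on one chromosome) are hypothetical, and the sketch assumes the Matches have already been confirmed to match each other, which real Triangulation requires.

```python
# One HIR per Match: (start, end) on the same chromosome, in Mbp. Invented values.
hirs = {
    "Match A": (10.0, 32.5),
    "Match B": (14.2, 40.1),
    "Match C": (12.8, 28.0),
}


def tg_span(hirs):
    """Outer extent covered by the overlapping HIRs in the group."""
    starts = [start for start, _ in hirs.values()]
    ends = [end for _, end in hirs.values()]
    return min(starts), max(ends)


def common_overlap(hirs):
    """Region shared by every HIR in the group, or None if there is none."""
    start = max(start for start, _ in hirs.values())
    end = min(end for _, end in hirs.values())
    return (start, end) if start < end else None


print(tg_span(hirs))         # (10.0, 40.1)
print(common_overlap(hirs))  # (14.2, 28.0)
print(len(set(hirs.values())) == len(hirs))  # True: no two HIRs are the same
```

Three different HIRs, one group: the TG is described by the whole set, not by any single HIR.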

Bottom Line: A shared DNA segment is also an HIR – formed by a computer by comparing raw DNA test data (about 600,000 SNPs) with two values (alleles) for each SNP. Shared DNA segments over 15cM are all true (IBD); below 15cM some are false (IBS). A shared DNA segment (aka HIR) is usually unique to a specific Match.

[22DH] Segment-ology: Half Identical Region by Jim Bartlett 20250521

HAPPY 10TH ANNIVERSARY

Posted on May 7, 2025 by Jim Bartlett

10 years ago, I blogged: “What is a segment?”, and noted the difference between an ancestral segment (your DNA segment) – passed down from an Ancestor to you; and a shared segment (created by a computer algorithm) which usually indicates a Common Ancestor for both you and your Match.

This is still the fundamental concept that is key to genetic genealogy.

We’ve looked at a lot of twists and turns based on this concept…

– How segments are measured

– Why the data is a little fuzzy, but that doesn’t negate its power

– How our DNA is passed down in identifiable segments from our Ancestors

– How each generation of our Ancestors contributes two full genomes (46 Chr) to us

– Why some of our segments must be sticky (persistent) for multiple generations

– How we “see” our own segments through shared segments

– How we can map (or paint) our segments on our chromosomes

– How shared segment “size” predicts relationships

– How we can group Matches by segment Triangulation or shared Match Clusters

– How we can use groups to solve brick walls, NPEs, Bio-Ancestors, unknowns

– Which ancestors always, or sometimes, or never have shared Matches

– Why all of our shared segments (6cM and up) may be important to us

– How to Walk Ancestors, Clusters, Segments back in our genealogy

– How spreadsheets can help us collect, arrange, analyze, QC, and use data

– How to use new tools: autoClustering, DNA Painter, browsers, ProTools, etc.

You have all been part of this journey of learning – as in fact, we are all learning from each other. I very much value your feedback and suggestions.

As some of you know, I also host a DNA Special Interest Group (SIG), through the Washington DC Family Search Center. It was in person/local until Covid. We are now international via Zoom – 2nd Wednesday of each month 7-9pm ET. This is now an Advanced DNA SIG, and members are encouraged to participate and/or present (learn from each other). If you’d like to join, please email me at jim4bartletts@verizon.net

Happy Anniversary – your suggestions/observations/comments are “gifts” to us all.

[99F] Segment-ology: Happy 10th Anniversary by Jim Bartlett 20250507

SPECIAL ANNIVERSARY COMING UP

Posted on April 22, 2025 by Jim Bartlett

My first real Segmentology blog post was on 7 May 2015 – so an anniversary is coming up soon. I’m looking to consolidate and re-package the approximately 200 posts in Segmentology. If you would like any new or revised topics included, please feel free to use the comments or email me at jim4bartletts@verizon.net. NOTE: The Table of Contents (Outline in the header bar) has been updated, and all the posts are hyperlinked.

[99E] Segment-ology: Special Anniversary Coming Up by Jim Bartlett 20250422

ProTools Part 26

Posted on March 2, 2025 by Jim Bartlett

Documenting a GUESS

Setup… A Match, with No Family Tree, is a 1C to a Known Match per ProTools. The Known Match is in my Tree with a specific line of descent from our MRCA; and a 1C estimate is very reliable. I want to put the new Match in my Tree and place them in my Common Ancestor spreadsheet – to “take care of” that Match by placing them almost certainly where they belong in my Tree.

As I’ve blogged before, there are only two options to place a 1C to a Known Match: 1. a grandchild of the Known Match’s paternal grandparents; or 2. a grandchild of the Known Match’s maternal grandparents. In other words, the new Match is a child of a sibling of the Known Match’s father or mother. A quick review of my Shared Match list with this new Match clearly reveals the Match is on the same side (paternal or maternal) that I am on with the Known Match. In other words, I know the path from the Known Match back to our MRCA is through their father or mother. I can now see, through ProTools, the new Match is related to me that way, too.

So I know the path from the MRCA down to the new Match – it’s the same path that I have with the Known Match down to, and including, the grandparent of the Known Match. What I don’t know is the name of the son or daughter of that grandparent = the parent of the new Match.

Up until recently, I’ve just named that son or daughter “block” as GUESS or Unknown in my Tree and in the “cell” of my spreadsheet. I’m now up to a dozen or so of these and can see many more on the horizon. My index of people in my Tree is filling up with GUESS and Unknown people…

I see four options for a name:

1. Continue with GUESS or Unknown [I usually reserve GUESS for iffy guesses]. I don’t like this – it’s not helpful to me or others reviewing my Tree – someday it may be very confusing.

2. Child of [name the grandparent]; ex: “Child of Bob JONES”

3. Parent of [the new Match]; ex: “Parent of Horatio Mitchell”

4. Sibling of [name the Known Match’s parent]; ex: “Sibling of Martha SMITH”

The Tree “box” and spreadsheet “cell” would have these entries and appear very close to other, known, boxes and cells. They would also be more specific in the Tree index, instead of a generic “GUESS” or “Unknown”.

I think I like (4) Sibling of Known Match’s parent the best because it specifically precludes the Known Match’s parent. In fact, I just did one new Match who was 1C to two different Matches so the description was: [sibling of John and Mary SURNAME] to rule them both out [after checking with ProTools].
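For anyone who keeps these labels in a spreadsheet or script, options 2 through 4 can be generated consistently with a small helper. This is just a sketch; the function name `placeholder_name` and its numbering are hypothetical and simply mirror the list above.

```python
def placeholder_name(option, *names):
    """Build a placeholder label for an unnamed person in the path.

    option 2 -> "Child of ...", 3 -> "Parent of ...", 4 -> "Sibling of ..."
    """
    templates = {2: "Child of {}", 3: "Parent of {}", 4: "Sibling of {}"}
    if option not in templates:
        raise ValueError("option must be 2, 3 or 4")
    if not names:
        raise ValueError("at least one name is required")
    return templates[option].format(" and ".join(names))


print(placeholder_name(2, "Bob JONES"))             # Child of Bob JONES
print(placeholder_name(4, "Martha SMITH"))          # Sibling of Martha SMITH
print(placeholder_name(4, "John", "Mary SURNAME"))  # Sibling of John and Mary SURNAME
```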

I am interested in feedback on this topic – i.e. how to efficiently document Matches which clearly fit in a specific Tree branch. I am experimenting with 1C1R and even some 2C which clearly cannot fit anywhere else. Keyword here is “efficiently” – there is a LOT to do, and I don’t want to have to write a paragraph about each one. This is primarily for my own research. If I leave them as alive, no one else will see them; and if I mark them as deceased, the only people who will care will be close relatives to the new Match, and they may provide some feedback to me. I hope so…

[22DH] Segment-ology: ProTools 26 – Documenting a GUESS by Jim Bartlett 20250302

MITx Class on DNA is Free

Posted on February 28, 2025 by Jim Bartlett

MITx offers a wide range of free, on-line, self-paced semester-long courses to anyone in the world. Coming up next week is Introduction to Biology – The Secret of Life. I’ve taken this course (actually twice). It’s taught by Professor Eric Lander – the founding director of the Broad Institute and a principal leader of the Human Genome Project – and a fantastic instructor (his course is fun). This course is targeted at non-biology students. This is not about genealogy, it’s about DNA. Anecdote: I was about halfway through the course, and one night my wife called out: “Jim, what are you doing – it’s 3 AM.” My reply: “I’m in a lab, folding proteins to capture a virus”. If you are into DNA and Segment-ology, this is a great opportunity to get a firm grounding. As a side note, I think MITx is a great undertaking and am a regular donor to that program. Free, world-wide MIT classes…

Here is a link: https://www.edx.org/learn/biology/massachusetts-institute-of-technology-introduction-to-biology-the-secret-of-life

Click on the short YouTube video… Enjoy.

[99D] Segment-ology: MITx Class on DNA is Free by Jim Bartlett 20250228

ProTools Part 25

Posted on February 22, 2025 by Jim Bartlett

The Path Is Key

This may be an extension of my “genealogy sacrilege” outlook or rant.

But before I begin, to each their own – you get to choose your objectives.

My two main objectives are to get my genealogy right; and to get the Chromosome Map of segments from my Ancestors at each generation right. My objectives do not include finding all of the descendants of all of my Ancestors. However, I do think that documenting how my DNA Matches interrelate to me and each other is very helpful in achieving my two objectives – and this swells my Tree somewhat. I’m finding: Match paper trail paths (and ThruLines clues) that are impossible, given the DNA evidence; and DNA evidence that has revealed genealogy paths I never would have otherwise found (not just limited to breaking through brick walls).

So, a lot of work to do to document what will be over 10,000 Matches… Time is precious…

When documenting DNA Matches and their line of descent from our MRCA to them, the “Path Is Key”. Dotting all of the “i”s and crossing all the “t”s is NOT! The DNA segments do not “know” their hosts’ names (or dates, or places), just that the segments are passed along. We genealogists document what we can about each of these Match ancestor DNA hosts. It helps us keep track – in time and place. But how much effort do we need to put into documenting our Matches’ lines? My opinion is: not much! We need to be sure of the path. We don’t need to know the full names, or pet names, or titles. It’s nice to know the birth/death years, but how much digging should we do to find the complete birth date or place? What do we do when several different descendants insist on different given names … I could go on and on, but I’ve decided it’s not my job to adjudicate their family “wars” – my objective is to be clear about the path.

Therefore, I’m now using terms like Pvt, Unknown, GUESS, sibling of XYZ, etc. to describe Match Ancestors – particularly those close to the Match. I don’t really care about their parents’ or grandparents’ names or genealogy info – just the path that must exist for a DNA segment. [NB: proving a specific genealogy-DNA link is a separate issue; a potential path is not a proven path.]

I am still documenting the child and grandchild of the MRCA (given name and birth year at least). But, IMO, the further down the path from the MRCA to the Match, the less precise this info needs to be. The Key Is the Path. I don’t want to introduce incorrect info, so I’m introducing “other” terms in the name field when it is unclear, in debate, or might take days to research and resolve. I note the “path” that has to be and move on. This allows me to get as many DNA Matches as possible into the spreadsheet. Then the interrelationships can be better evaluated.

SUMMARY: Don’t worry about “fully” documenting the MRCA-to-Match path; just that the path does exist, and no incorrect info is introduced (unless your Tree is private). And, of course, it’s up to your own judgment as to if/how much of this recommendation to follow. My plan is to get as many Matches as possible into MRCA family groups in a spreadsheet, and then study the interrelationships with ProTools. Get Matches in my Tree and my Common Ancestor spreadsheet, but “do no harm”.

[22DG] Segment-ology: ProTools 25 – The Path Is Key by Jim Bartlett 20250222

ProTools Part 24

Posted on February 21, 2025 by Jim Bartlett

Small Segment Stats

Ancestry DNA Matches who share 6-7cM and have a known MRCA with me: 1,160.

Total Ancestry DNA Matches at any cM level: 7,450.

About 15% of my DNA Matches with a known MRCA share only 6-7cM.

This is NOT a statement linking DNA and Ancestors.

This IS a statement about the many true cousins we will not see in our Match lists because the current threshold at AncestryDNA is 8cM.
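The “about 15%” figure above is simply the first count divided by the second. A quick check, assuming (as the wording implies) that the 7,450 total is the count of Matches with a known MRCA:

```python
small_known = 1160   # 6-7cM Matches with a known MRCA
total_known = 7450   # all Matches counted above, at any cM level

share = small_known / total_known
print(f"{share:.1%}")  # 15.6%
```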

I’m glad I Dotted and saved some of my 6-7cM Matches when Ancestry made the threshold change – it was a fraction of the total. I wish I had saved them all…

To end on a higher note – I still have 2,600 other 6-7cM Matches to work with – many of them are being determined as close cousins to known MRCA Matches by using ProTools.

[22DF] Segment-ology: ProTools Part 24 – Small Segment Stats by Jim Bartlett 20250221

ProTools Part 23

Posted on January 31, 2025 by Jim Bartlett

Integrating With Genealogy

ProTools is a powerful tool. But it has its limits. 1C and closer relationships are very accurate, in my experience. Beyond that, the range of possibilities grows quickly as the cMs fall below the 1C range. But think about what that means… A 1C relationship takes us back to our grandparent level. Think of a 20-year-old genealogist with a 50-year-old parent, and 80-year-old grandparents. Those grandparents would be in the 1950 census. And the census is a pretty good tool back to 1850 – another few generations. You might argue that the census is not rock solid in every case. There may be adoptions, NPEs, etc. That is true, but those individuals will not show up as DNA Matches – for the most part.

Yes, there are still a few situations that may slip through. But on the plus side, the census and ProTools will sort out a high percentage of false relationships, and/or incorrect genealogy “research”.

Used together, the census and ProTools can pretty accurately cover the past 175 years.
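The arithmetic behind that claim, as a quick sketch. It assumes the post year (2025) as “now” and uses the example ages given above.

```python
current_year = 2025
grandparent_age = 80

grandparent_birth_year = current_year - grandparent_age
print(grandparent_birth_year)           # 1945
print(grandparent_birth_year <= 1950)   # True: alive for the 1950 census

# The census is described above as a pretty good tool back to 1850
years_covered = current_year - 1850
print(years_covered)                    # 175
```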

[22DE] Segment-ology: ProTools 23 – Integrating With Genealogy by Jim Bartlett 20250131

ProTools Part 22

Posted on January 19, 2025 by Jim Bartlett

A Rant about Relationships

I praise Ancestry for ProTools – just about everything about it is great. I have often reported how accurate the close Relationship Estimates are. I rely almost 100% on 1C and closer relationships; and have found many 2C relationships to be correct. I worked for several days on a 3C relationship – knowing the Trees of the two Matches pretty well – to no avail. This is becoming a regular occurrence.

I’ve noted over the past year, Ancestry has tightened up their Relationship Estimates – all are now within 4C. We can tag a Match at 4C or closer, or Distant. A far cry from the Circles where Ancestry showed us how we were related out to 8C; or even the current ThruLines out to 6C. Will they change again, tomorrow, to only showing Matches related within 4C or closer? I am long since past that threshold…

So I decided to take a deeper dive, under their hood, to see what they predicted for small cM Matches. I randomly selected a 6cM Match that I had saved. She was predicted to be Half 3C1R or 4C – evidently their deepest estimate. So I clicked on that estimate to get their more in-depth analysis. Here are two screenshots of their analysis [sarcasm: based on results from their 27 million testers?]:

It seems to me they have adopted the “Cinderella Principle” – push hard to fit the data into a desired result. Are they really claiming that 99% of all Matches at the 6cM level are a 4C or closer? The Ancestry folks are much smarter than that… They know better, and, for some reason, AncestryDNA is distorting the truth! SHAME! Our tens of thousands of small cM Matches do not fit into a size 4C Cinderella slipper!!

Bottom lines: still rely on 1C or closer relationships for analysis with ProTools; IMO, beyond 2C, treat the estimates as garbage; let me/us know if you have some insight that I’m missing (other than something related to greed).

[22DD] Segment-ology: ProTools 22 – A Rant About Relationships by Jim Bartlett 20250119

Pro Tools Part 21

Posted on January 9, 2025 by Jim Bartlett

Adding a GUESS

Setup

gk (Match1) is known 5C1R – with grandmother: Anetta b 1926 m SURNAME1 > father: private Male > gk; AND gk has 10 known 2C to Anetta’s father (in the line going back to our MRCA).

Justin (Match2) shares 898cM (estimated 1C) to gk; and has a very small Tree of Private Ancestors.

Analysis

To be a 1C to gk, Justin would need to share grandparents with gk – either gk’s paternal grandparents or gk’s maternal grandparents. From the setup (above), we know the maternal grandparents are SURNAME1 and Anetta b 1926; we don’t know (but can often find) gk’s paternal grandparents. In this case there wasn’t enough info in Justin’s Tree to help.

However, there is another way to determine which set of grandparents Justin descends from. If he descends from Anetta’s side, Justin would also be 2C to the 10 known 2C that gk has (NB: all 2C match each other). If Justin descends from the other grandparents of gk, it is highly likely that Justin will NOT share any of the 10 known 2C to gk. A quick look at Justin’s Shared Match list shows he matches ALL of the same 2C that gk has. Justin is clearly a 1C to gk on gk’s maternal side – which is the side back to the MRCA with me!
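The side test above is really a set comparison, and can be sketched as follows. The Match IDs are invented placeholders, the function name `side_of_1c` is hypothetical, and the three outcomes are my own simplification; a partial overlap would still need judgment.

```python
def side_of_1c(candidate_shared, known_2c_on_mrca_side):
    """Compare a candidate 1C's Shared Matches with the Known Match's
    2C cousins on the side that leads back to the MRCA."""
    overlap = candidate_shared & known_2c_on_mrca_side
    if overlap == known_2c_on_mrca_side:
        return "MRCA side"
    if not overlap:
        return "other side"
    return "unclear"


# Ten known 2C of gk on Anetta's side (placeholder IDs)
gk_known_2c = {f"2C_{i}" for i in range(1, 11)}

# Justin's Shared Match list: all ten, plus a couple of unrelated Matches
justin_shared = gk_known_2c | {"other_1", "other_2"}

print(side_of_1c(justin_shared, gk_known_2c))           # MRCA side
print(side_of_1c({"other_1", "other_2"}, gk_known_2c))  # other side
```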

Therefore, I am very confident in adding Justin to my Tree with UNKNOWN parent and KNOWN grandparents: SURNAME1 and Anetta b 1926. The rest of the path gk has back to our MRCA is already in my Tree.

This places another Match into my Common Ancestor spreadsheet and into my Tree. It takes this Match off the list of unknown (aka Mystery) Matches. In Shared Match lists, Justin will now show up as a known (Dotted) Match – reinforcing Clusters. I don’t know if Justin’s addition to my Tree will help AncestryDNA with future ThruLines evaluations, but I hope so. I *know* it will help me.

A similar analysis can be made for a Pro Tools estimate of 1C1R or a 2C, but it gets less reliable with each additional degree of separation. There is also a higher degree of difficulty in the analysis, because the certainty of the cousinship estimate is not as assured and the number of possible alternatives that need to be addressed increases. It’s often not impossible, but it is harder. A strong factor is whether a *candidate* Match shares a lot of the same Shared Matches. In other words, if the candidate Match clusters with a lot of the same Shared Matches (which can be observed in the Shared Match list), to me that is a strong indication that candidate Match has the same MRCA. This needs to be tempered with endogamy or pedigree collapse – judgment is needed in those cases.

[22DC] Segment-ology: Pro Tools Part 21 – Adding a GUESS by Jim Bartlett 20250109