Which Sibling Is the Bio-Ancestor?

Featured

A Segment-ology TIDBIT

Up Front – it’s the one with the highest average cM among Match cousins.

Setup: You’ve pretty much determined a particular couple are bio-Ancestors to yourself (or someone else) – often by a consensus of Match Trees in a group (usually a Cluster) – see here. However, this bio-couple had a number of children. Which one of them was the bio-Ancestor? It gets harder and harder the more generations back you are researching.

Process: I’ve had good outcomes by determining as many DNA Match cousins as possible for the bio-Ancestor couple. Line up the DNA Matches and the shared DNA cMs under each of the children, and then determine the average cM for each child. In general, one of the averages will be somewhat more than the others – even when you don’t know the link. That’s because you are a closer cousin with Matches who descend from the same child as you do. For instance, you may be a 5C through most of the children – sharing an average of 25cM with those Matches; and you would be 4C with the Matches who descend from the one child who is your Ancestor – sharing an average of 35cM with them. Of course, our results may vary somewhat from the Shared cM Project, but it’s the concept we are focused on here.

When I do this analysis, I drop down into the smaller segments, in order to get a fair comparison among all the cousins I can find. The more Matches we use, the more it averages out to the Shared cM Project and the correct bio-Ancestor child.
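The averaging step can be sketched in a few lines of Python. This is only an illustration: the child names and cM values below are made up, and the function names are my own, not part of any tool.

```python
from collections import defaultdict
from statistics import mean


def average_cm_by_child(matches):
    """Group (child, shared_cM) pairs by child and average the cM values."""
    by_child = defaultdict(list)
    for child, cm in matches:
        by_child[child].append(cm)
    return {child: mean(cms) for child, cms in by_child.items()}


def likely_bio_ancestor(matches):
    """Return the child with the highest average cM, plus all the averages."""
    averages = average_cm_by_child(matches)
    return max(averages, key=averages.get), averages


# Hypothetical Matches: (child of the bio-couple they descend from, shared cM)
matches = [
    ("Child A", 22), ("Child A", 28), ("Child A", 25),
    ("Child B", 35), ("Child B", 41), ("Child B", 29),
    ("Child C", 18), ("Child C", 30),
]

best, averages = likely_bio_ancestor(matches)
print(best, averages)
```

With only a handful of Matches per child the averages are noisy, which is why it helps to include as many Matches as you can find.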

[22CI] Segment-ology: Which Sibling Is the Bio-Ancestor? TIDBIT by Jim Bartlett 20240403

Celebrating the First 25 years of Genetic Genealogy

Free eBook: Genetic Genealogy: The First 25 Years – 82 pages – the reflections of 34 Contributors – compiled and edited by Diahan Southard. This is a fascinating read from cover to cover. And it’s free to download here: https://diy.yourdnaguide.com/so-far

I am honored and humbled to be included in this project. And a grateful hat-tip to Diahan who conceived this project; herded the cats to gather the various perspectives; curated and edited the inputs and got it ready before RootsTech 2024. And made it free to everyone!

Thanks, Diahan Southard.

[99C] Segment-ology: Celebrating the First 25 years of Genetic Genealogy by Jim Bartlett 20240229

ThruLines Is Quick – Really Quick!!

A Segment-ology TIDBIT

My previous post noted that ThruLines quickly adapted when I changed my Tree.

Setup: I have looked at every one of my ThruLines Matches. If you are not sure, just open your DNA Matches list and select the Filters: Unviewed AND Common Ancestors. If you’ve looked at them all (and hopefully added appropriate information in the Notes box for each one), after a minute or two you’ll get a message: No matches match the selected filter. You’re now ready to take advantage of this status.  

I have a pesky female Ancestor. I’m not really positive where she fits in a larger part of my Tree (or to any of several floating branches).  So I called up her profile; clicked on Edit (top right); clicked on Edit relationships; and clicked on the parent “X”s (to separate, not delete, them). I now went to the Father box and clicked on Add father; and typed in a name I wanted to test as a parent. I then closed the Edit relationships page and went back to my DNA Matches List and filtered on Unviewed AND Common Ancestors…. and ThruLines immediately populated appropriate new Matches who would be cousins through that parent. In the one to two minutes it takes ThruLines to search my 93,000 Matches, it found and listed Matches with ThruLines. Since I had already opened all previously known ThruLines, this new listing was only Matches who were related through the change I had just made. I quickly took notes and reset the original pesky Ancestor. Ready for the next trial. In and out very quickly.

There is more to this story for a later blogpost. The point for this blogpost is twofold:

1. AncestryDNA must already have most of these relationships worked out, just waiting for me to ask the right question (do you have cousins for “this” relationship?)

2. There is no waiting days for a “refresh” – ThruLines reports as fast as it can scan my Match list (down to 6cM). Just WOW!

Both of these are pretty amazing, IMO.

[22CH] Segment-ology: ThruLines Is Quick – Really Quick!! TIDBIT by Jim Bartlett 20240228

ThruLines is Quick!

A Segment-ology TIDBIT

I was entering a ThruLines line of descent into my Common Ancestor Spreadsheet, when I noted an error in the Match’s Tree. The Tree and ThruLines were at 6C. When I inserted the missing generation in my Tree, the relationship changed to 6C1R. As soon as I clicked back to the Match, the ThruLines was gone!  AncestryDNA now *knows* the correct relationship, and since it was beyond 7 generations for one of us, they won’t show it.

Heads up. Copy or screen-shot before you lose the ThruLines link. I guess in a pinch, I could go back to my tree, take out the generation I added, and “reincarnate” the ThruLines link. Sometimes you have to think like a computer…

[22CG] Segment-ology: ThruLines is Quick! TIDBIT by Jim Bartlett 20240225

AncestryDNA Side vs ThruLines Side

As I look at ThruLines Matches under 15cM, roughly half of them have a Side (Maternal or Paternal) which is different from the Side of the Common Ancestor proposed. What’s up?

AncestryDNA has determined a “side” (Maternal or Paternal) for most of my Matches. Pretty slick! And very helpful!! For above-20cM Matches they appear to be fairly accurate. This is despite the fact that all of my Paternal and half my Maternal Ancestors were mostly from Colonial Virginia. I was expecting a lot of Matches to be “Both”, but relatively few are. The bulk of my Matches are in the Maternal and Paternal categories. And below 15cM, the Maternal or Paternal “sides” are not aligning with the “side” for many of the ThruLines Common Ancestors. Side note: it appears that Ancestry is now only reporting one ThruLines Common Ancestor per Match – they used to report two or three if they found them….
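If you keep your Matches in a spreadsheet, a tally like the one below shows how often the two “sides” agree above and below a cM threshold. This is a rough sketch under my own assumptions: the tuples are invented sample data, the 15cM threshold is just the one discussed here, and a “Both” designation is counted as agreeing with either side.

```python
from collections import Counter


def side_agreement(matches, threshold_cm=15):
    """Tally agreement between the AncestryDNA side and the Common Ancestor side.

    matches: iterable of (shared_cM, ancestry_side, common_ancestor_side).
    """
    tally = Counter()
    for cm, ancestry_side, ca_side in matches:
        band = "below" if cm < threshold_cm else "at_or_above"
        # Assumption: "Both" is compatible with either side.
        agrees = ancestry_side in (ca_side, "Both")
        tally[(band, "agree" if agrees else "conflict")] += 1
    return tally


# Hypothetical sample rows
sample = [
    (12, "Maternal", "Paternal"),
    (9, "Paternal", "Paternal"),
    (14, "Maternal", "Maternal"),
    (8, "Paternal", "Maternal"),
    (45, "Maternal", "Maternal"),
    (22, "Both", "Paternal"),
]

tally = side_agreement(sample)
for key, count in sorted(tally.items()):
    print(key, count)
```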

What are the possibilities?

1. The AncestryDNA “sides” may be incorrect. I’d like to think (hope?) that the science behind them is valid and that they are largely correct. Most of mine above 20cM appear to be.

2. The ThruLines may be incorrect. This is a genealogy area (not DNA). With my 50 years of genealogy research, I already know many of the descendants of my Ancestors, and I run a check (not-GPS-comprehensive) on each ThruLines reported. I used to spot about 5% with errors (some of which were easily fixed), but now there are more and more as AncestryDNA appears to have become fairly aggressive at finding Common Ancestors. It appears they have loosened up the algorithms to allow “close” name variants and “close” dates, resulting in more false results. But even among the ThruLines where I have reviewed and accepted the Common Ancestor from a genealogy point of view, roughly half don’t agree with the “side”.

We cannot have it both ways… or can we?

When AncestryDNA determines a Maternal “side”, does that guarantee that 100% of the Match’s atDNA can only be on my Maternal side? I really think that is absurd! Particularly when you consider most of my Ancestry is from Colonial Virginia. Surely my Colonial Virginia Matches could descend from Ancestors who would be on both sides of my Ancestry. In fact, I have several of my own Ancestors who, due to distant pedigree collapse, are on both sides of my Tree.

I think it is entirely possible that the bulk of a Match’s atDNA could align with my Paternal or Maternal DNA, but that some of the Match’s segments could be from the other side. I’m scratching my head over whether or not this could occur half of the time.

3. Both Ways! My conclusion is that we can have it both ways! I have a colored Dot for cases with both “sides”, but I’ve decided not to let that, by itself, stand in the way of accepting a ThruLines Common Ancestor as valid.

I’m curious about your overall experience and observations about conflicting “sides”. You are encouraged to add your insights in the comments.

[35AA] Segment-ology: AncestryDNA Side vs ThruLines Side by Jim Bartlett 20240213

ThruLines Levity

A Segment-ology TIDBIT

Ancestry’s ThruLines is like “dumpster diving”… sometimes you have to dig through the trash to find the pearls. Sometimes there is a smorgasbord of various genealogy junk, but sometimes there is a treasure trove of good information. Pick and choose wisely…

[22CF] Segment-ology: ThruLines Levity TIDBIT by Jim Bartlett 20240211

Let the Chips Fall Where They May

A Segment-ology TIDBIT

Thinking about Small Segments and Distant Matches…

Many have used the Speed and Balding IBD Statistics in Figure 2 of their Paper …  This chart has often been used to scare us away from small segments [by small I mean 7-to-15cM Shared DNA Segments – I do not encourage anyone to use smaller/”tiny” segments].

The vast majority of our Matches at AncestryDNA fall into this 7-to-15cM category, and I get many ThruLines Matches which have valid paper genealogies. They may not all link to the DNA, but I see no reason to discount them based on the small size of the Shared DNA alone. ThruLines is limited to Matches who are related as 6C or closer – not what I would call a “distant” Match. Only the small Shared Segments and the constant reference to the Speed and Balding chart, warning that small segments are usually distant, stand in the way.

This got me to thinking (watch out!)… The AncestryDNA Timber algorithm is well known to “down weight” the cM of many of our Shared DNA segments. Click on the “DNA” line in any Match Profile to see the “Unweighted shared DNA” amount – often somewhat larger than the amount shown on the DNA Profile. This is Timber at work, downweighting the DNA that would be shown at, say, GEDmatch. One of the effects of this downweighting is that many of the AncestryDNA customers who would show up as a Match at GEDmatch are never shown as a Match to us at AncestryDNA! It seems to me that AncestryDNA has already compensated for the statistics reported by Speed and Balding. It is thus unfair to compare our Match lists with the Speed and Balding statistics.

I’m not saying that some of our Matches are not distant – some of them are. What I am saying is to let the chips fall where they may. If we can find a Common Ancestor – at *any* Shared cM amount – why not accept it (if it also passes a genealogy review)? The Shared cM Project clearly shows small Shared DNA Segments in the range for cousinships at 3C and more distant. Why should we be frightened away when our Match falls into the small segment category?

My Common Ancestor Spreadsheet (see my blog post about it here) now has over 8,000 rows of Matches who share Common Ancestors with me. I sort them to get “nested” family groups, and draw comfort as I see the closer families and note they are Shared Matches with each other. New ThruLines have been pouring in recently (and the quality is dropping off a little). As expected in my Common Ancestor spreadsheet, a majority are in the small segment range. I am not worried about the cM size as long as the genealogy is valid!

Bottom line: Let the chips (small Shared cMs) fall where they may; and focus on the genealogy.

Happy New Year!

[22CE] Segment-ology: Let the Chips Fall Where They May TIDBIT by Jim Bartlett 20240101

Gold Stars

A Segment-ology TIDBIT

There are several key elements of good genetic genealogy – I’m going to call them Gold Stars.

1. DNA Match – as designated by the testing companies and GEDmatch. Most of these are our genetic cousins. I have a lot of them (over 120,000); and they are a good subset to work with. Worth a Star.

2. IBD Segment – We generally assume that virtually all Matches above 15cM have true genetic links; and my analysis is that about 66% of those 8 to 15cM are also true. Granted, some of the under-20cM Matches will be beyond a genealogy time frame (about 9 generations for me), but IBD still gets a Star.

3. Common Ancestor – This is a primary goal of genetic genealogy – finding a Common Ancestor with each Match. Notes: some Matches will have multiple CAs within a genealogy timeframe; just finding a CA does NOT necessarily mean that the Shared DNA segment came from that CA; a Match may share multiple DNA segments, and possibly multiple CAs. So finding a CA is worth a Star.

4. ThruLines (and Theory of Family Relativity) – I’ve found these to be over 90% correct. If you agree with them – add a Gold Star.

5. Same side – Ancestry and FamilyTreeDNA now indicate the “side” that each of our Matches is probably on. So far, I think this process is pretty accurate. The Common Ancestor should agree with the “side” for a Gold Star. If there is not agreement with the side, there may be an additional Common Ancestor with the Match (on the same “side”); or the “side” may be incorrect.

6. Paper Trail – each Common Ancestor should be supported by good genealogy paper trail of solid records. Not always possible; but add a Gold Star if you can document your and your Match’s paper trails.

7. Segment Triangulation – indicates your DNA segment is an IBD (true) shared segment; and probably the Matches’ segments are too. A Gold Star.

8. Shared Matches – [aka In Common With; Relatives in Common]. If most of the Shared Matches are in agreement, add a Gold Star.

9. Clustering – tends to group DNA Matches on an Ancestor. If the consensus of Matches in a Cluster is an Ancestor (or even 2 or 3 in an Ancestral line), add a Gold Star.

10. Reasonable Tree – does the Match with a Common Ancestor have a reasonable Tree? If a Match has a Tree with just one descendant (the Match’s Ancestor), that is a warning signal [NO Gold Star]. If a Match has a Tree with way too many children, given names repeated, different children with same birthdate – this is probably a research Tree with a collection of every possible child – sometimes born at many different locations – warning-warning! This is very flimsy evidence [NO Gold Star]. However, if the Match’s Ancestral line shows a reasonable number of children, spaced 1 to 3 years apart, that is a good sign. Alignment with census records is a plus. Use judgment to claim a Gold Star.

Ideally, we’d have 10 Stars for each Match – but, that ain’t gonna happen very often… And I probably won’t be adding a Star # in my Notes. But I do review most of these when I accept a Match with a Common Ancestor. I just thought I’d share my compilation of thoughts when I find a CA.

This may be an imperfect list, but I hope it is helpful. Improvements/suggestions are welcomed in the comments. This Gold Star concept is not a set of hard rules – it’s intended to be helpful ideas. Your judgment should be the final say for your genealogy.
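For anyone who does want to keep track, the ten elements above can be treated as a simple checklist. The sketch below is only one way to do it; the labels are shortened from the list above, and the function names and the sample Match are my own inventions. Deciding whether each Star is earned remains a judgment call that no code can make for you.

```python
GOLD_STARS = (
    "DNA Match",
    "IBD Segment",
    "Common Ancestor",
    "ThruLines / Theory of Family Relativity",
    "Same side",
    "Paper Trail",
    "Segment Triangulation",
    "Shared Matches",
    "Clustering",
    "Reasonable Tree",
)


def count_stars(earned):
    """Count the distinct Gold Stars earned, rejecting labels not on the list."""
    unknown = set(earned) - set(GOLD_STARS)
    if unknown:
        raise ValueError(f"Not a Gold Star: {sorted(unknown)}")
    return len(set(earned))


def missing_stars(earned):
    """List the Gold Stars not yet earned, in the order of the checklist."""
    return [star for star in GOLD_STARS if star not in set(earned)]


# Hypothetical Match with three Stars so far
earned = ["DNA Match", "Common Ancestor", "Paper Trail"]
print(count_stars(earned), "of", len(GOLD_STARS))
print(missing_stars(earned))
```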

Note for genealogists – our genetic cousins are a small fraction of all our true cousins. I often add individuals to my Tree who are not DNA Matches.

[22CD] Segment-ology: Gold Stars TIDBIT by Jim Bartlett 20231229

Quandary

A Segment-ology TIDBIT

What if the genealogy is correct but the shared DNA is on the other side? Discard because the relationship is not from the Ancestor who passed down the DNA segment? Save because we are in fact real cousins, despite the DNA? Most of our real cousins beyond 3C won’t share enough DNA to be designated as a Match.

Same quandary with a Match sharing one DNA segment, but related two ways. Both ways cannot be through the same segment.

Now that Ancestry shows “sides” (Maternal/Paternal), I’m finding that some of the ThruLines are not on the same “side”.

Sometimes this happenstance leads to finding a genealogy error and/or finding another genealogy relationship which is compatible with the shared DNA segment – sometimes not.

With almost 50 years of genealogy research under my belt, I’m very reluctant to “discard” any true relationship. I worked for 35 years finding cousins before atDNA testing came along – I’m not going to trash tens of thousands of cousins just because they don’t share DNA with me. They certainly share Ancestry with me – and records and stories and friendships.

On the other hand, my current quest is a deep Chromosome Map – linking my DNA segments to my Ancestors. Sort of a “who is responsible” for each of my quirks. A relationship that is not based on a DNA segment, is a distraction at best… a wrong rabbit hole… a misdirection… an error!

I think the solution is to keep all the findings, but clearly mark the genetic genealogy ones.  What is your take? Please leave comments.

[22CC] Segment-ology: Quandary TIDBIT by Jim Bartlett 20231224

Go for the Triple Play!

A Segment-ology TIDBIT

When reviewing Ancestry ThruLines (or any potential Common Ancestor), go for the Triple Play!

Make sure the Common Ancestor AND the Side (Maternal/Paternal) AND the consensus of Shared Matches are all in agreement. If the CA is correct, they should be. Or at the least, there shouldn’t be a large conflict. I am finding a number of ThruLines under 15cM which do not agree with the Side. It is entirely possible to have a genealogy relationship (per ThruLines) which is not the same as the genetic relationship (I believe most of the “Side” designations are valid). This would mean there is also another Common Ancestor that agrees with the Side – entirely possible for my Colonial Virginia ancestry. Or the Side could be wrong…

In any case, when you don’t have a Triple Play, it calls for some extra thought and/or research.
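The Triple Play boils down to two comparisons against the proposed Common Ancestor. Here is a minimal sketch, assuming you have already recorded (in your Notes or a spreadsheet) the side of the proposed CA, the Side shown by Ancestry, and the ancestral line that most Shared Matches point to. All argument names and sample values are hypothetical.

```python
def triple_play(proposed_line, proposed_side, ancestry_side, consensus_line):
    """Check a proposed Common Ancestor against the Side and Shared Matches.

    proposed_line:  your label for the proposed CA's ancestral line
    proposed_side:  "Maternal" or "Paternal" for that line in your Tree
    ancestry_side:  the Side shown for the Match
    consensus_line: the line most Shared Matches point to (None if no consensus)
    """
    checks = {
        "side_agrees": ancestry_side == proposed_side,
        "shared_matches_agree": consensus_line == proposed_line,
    }
    checks["triple_play"] = all(checks.values())
    return checks


# Hypothetical Match: genealogy looks fine, but the Side disagrees
result = triple_play(
    proposed_line="SMITH line",
    proposed_side="Paternal",
    ancestry_side="Maternal",
    consensus_line="SMITH line",
)
print(result)
```

A failed check is not a verdict. It is a flag that the Match deserves the extra thought or research.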

Just saying…

[22CB] Segment-ology: Go for the Triple Play! TIDBIT by Jim Bartlett 20231220

What Is the Next Segment?

A Segment-ology TIDBIT

A question recently came up: Are the Ancestors on two sides of a crossover point always a mother and father (in either order)? Or: If I know the Common Ancestor (i.e. the father or the mother of the TG couple) of a TG segment, must the next TG segment be the other parent of the TG couple? The answer is YES, with an important caveat: only when we are talking about mother and father of our Ancestor who created the crossover.

Important scientific fact: A crossover is formed when a human recombines two Chromosomes to create a new Chromosome that is then passed to a child. One of the two Chromosomes is from the Mother, and the other is from the Father. So one parent is on one side of each crossover, and the other parent is on the other side of the crossover.   Here is Figure 6 from my 2015 blogpost: Segments – Bottom Up:

Note: each of the Chr 05 lines above is your Maternal Chr 05 – it’s just broken down for each generation. In the Grandparent look, the two crossovers were created by the parent using grandparent segments (assuming an average of 2 crossovers per generation for Chr 05). Note the Ahnentafel numbers to represent generic ancestors – even numbers are males, odd numbers are females. The first crossover created by the parent shows 7 & 6, or female & male, on the two sides of the crossover. When the first grandparent segment ends at the crossover, the next segment is the opposite parent. The second crossover created by the parent has 6 & 7 (male & female) on the two sides of the crossover.

The next line – the Great grandparent look has 2 more crossovers – created by the grandparents, when each of them recombined their respective 2 Great grandparent chromosomes. One of the crossovers is between 14 & 15 and the other between 13 & 12 (there was no crossover when the Ancestor 14 segment was passed to daughter 7). So again, each new crossover has a male and a female (in some order) on the two sides of each crossover.

Check out the two crossovers (on average) added at each of the next two generations – they all have the mother on one side and the father on the other side of the crossover.  Note carefully the word “added” (or created or formed).

Now here is the catch… In the Great grandparent look above, the last crossover has 12 & 14 on each side – two males. This seems to contradict the basic concept. And if we were applying the basic concept to TGs at the Great grandparent level it would be wrong. What’s up? Well, what looks like a crossover between Ancestors 12 & 14 is in fact a crossover – but it was formed by Ancestor 3 when she recombined Chr 05s from her parents 6 and 7 – these are the two parents of the ancestor who first formed (or added or created) the crossover.
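Ahnentafel arithmetic makes this easy to check. The father of Ancestor n is 2n and the mother is 2n+1, so halving a number (and dropping any remainder) gives the child. The sketch below walks the two Ancestors on either side of a crossover down toward you until they land on a mother/father pair; the child of that pair is the Ancestor who formed the crossover. The function name is my own.

```python
def crossover_creator(a: int, b: int) -> int:
    """Return the Ahnentafel number of the Ancestor who formed a crossover.

    a and b are the Ahnentafel numbers of the Ancestors whose segments sit
    on the two sides of the crossover.
    """
    if a < 2 or b < 2:
        raise ValueError("Ahnentafel numbers for Ancestors start at 2")
    # Step the more distant Ancestor down one generation (n // 2 is the child)
    # until both are children-in-law of each other, i.e. parents of one person.
    while a != b and a // 2 != b // 2:
        if a > b:
            a //= 2
        else:
            b //= 2
    if a == b:
        raise ValueError("one Ancestor descends from the other; no crossover between them")
    return a // 2


print(crossover_creator(6, 7))    # grandparents on the two sides
print(crossover_creator(14, 15))  # Great grandparents, a couple
print(crossover_creator(12, 14))  # two males: traced back to parents 6 and 7
```

For 12 & 14, the walk goes 14 to 7 and 12 to 6, and 6 & 7 are the parents of Ancestor 3, which agrees with the explanation above.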

When we form Triangulated Groups (TGs), we use groups of overlapping segments. But there is nothing in the TG criteria about the generation of the TG. We do understand that the TGs start and end at crossover points – when we shift from one Ancestor’s DNA to another Ancestor’s DNA. But until we can Walk the Segments Back (generation by generation), we don’t know when the crossovers were formed. There is one generation for each crossover, but until we have Chromosome Mapping we don’t know which generation it is.

Note: A TG Summary Spreadsheet will give good clues to the formation of crossover points – see Observation 5 (see linked blogpost).  In generation after generation the older crossovers can be seen, with only about 2 new crossovers in each generation. So the farther back we go with Chromosome Mapping, the newly formed crossovers will be there (with mother and father on the two sides). But the other crossovers may not appear to be mother/father, unless the origin of the crossover can be determined.

Bottom Line: With TG segments, sometimes the next TG on a chromosome will be the other parent, but more often it will not.

Edit 20240403: It was suggested that I add a Chromosome Map, showing segments from my 16 2xG grandparents. Here is one I did in 2013:

[22CA] Segment-ology: What is the Next Segment? TIDBIT by Jim Bartlett 20231209

Consensus

A Segment-ology TIDBIT

I was adjudicating a ThruLines from a Common Ancestor (CA) down to a Match. The grandchild of the CA didn’t fit. I find about 5% of my ThruLines are wrong so I just dotted the Match yellow (TL Wrong) to add it to that group. But as I was about to close out the Match, I clicked on Shared Matches (which I usually do anyway). The Match was at 13cM so I didn’t expect much. Surprise – over 20 Shared Matches, and almost every one was confirmed or “likely” to be on the line indicated by ThruLines! A clear consensus. I went back to the Match’s line and found another path that worked – back another generation from the ThruLines CA hint!!

The details don’t matter. The moral of this story is that a ThruLines CA AND a consensus of Shared Matches AND the AncestryDNA “side” should all be in agreement. This applies to CAs at other companies, too – the clues should be in agreement.

Takeaways:

1. When you find a CA, be sure to also review the Shared Matches and the side.

2. When you are searching for a CA with a Match, review the Shared Matches first to see if there is a consensus clue.

PS: this assumes you have diligently done your homework and put all known or likely CAs in the appropriate Notes (same for every company).
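If your Notes include a code for the known or likely ancestral line of each Match, finding the consensus among Shared Matches is a simple tally. This is a sketch only: the line labels are invented, and the 60% cutoff for calling something a “consensus” is my own assumption, not a rule.

```python
from collections import Counter


def shared_match_consensus(lines, min_share=0.6):
    """Return the most common ancestral line among Shared Matches, or None.

    lines: one entry per Shared Match, taken from your Notes; use None or ""
           for Shared Matches with no known or likely line.
    min_share: fraction of the *known* Shared Matches that must agree
               (0.6 is an arbitrary illustration).
    """
    known = [line for line in lines if line]
    if not known:
        return None
    line, count = Counter(known).most_common(1)[0]
    return line if count / len(known) >= min_share else None


# Hypothetical Shared Match Notes for one 13cM Match
notes = ["JONES", "JONES", None, "JONES", "BAKER", "JONES", "", "JONES"]
print(shared_match_consensus(notes))
```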

[22BZ] Segment-ology: Consensus TIDBIT by Jim Bartlett 20231206

WTCB Observations and Advice

BLUF: 1. Focus on lower cM Matches in a Cluster to determine the Common Ancestor; 2. Reduce the Cluster upper limit to cull out closer Matches once their ancestral line is imputed to more distant Matches. Known Matches impute to unknown Matches who carry the ancestral line to the next group of Clusters which split only to the parental Ancestors of the previous Matches. Unknown Clusters may well be Bio-Ancestors.

In case you missed one of my many blogposts on Clusters: Clusters form on Ancestors!

This comes from two “facts”: 1. Each of our DNA Matches shares at least one segment of DNA with us that came from a Common Ancestor (CA) – the basic tenet of genealogy DNA testing – the caveat being the segment needs to be Identical By Descent (IBD); i.e. a true segment; and 2. Matches who share the same CA will tend to show up on each other’s Shared Match lists. The converse is that when a group of Matches show up on each other’s Shared Match lists (i.e. each of your SM lists with them include many of the same Matches), they will almost always share the same CA.

“Clusters form on Ancestors” is a powerful observation – when it happens… And beware the Cinderella slipper – don’t try to force fit a Match into a Cluster if they only share with one or two other Matches – easily seen in SM lists and on the fringes of some Cluster diagrams.

So let’s dive a little deeper. A lot depends on the mix of Matches that are being Clustered. In a perfect world we’d like to Cluster, say, only 3rd cousins (3C) – the resulting 8 Clusters (hopefully) would be 1 Cluster for each Great grandparent. The average for a 3C is 73cM, but a true 3C can range from 7cM to 234cM (per the Shared cM Project 4.0). The point is there is NO cM range that would only include 3C. And it only gets worse with 4C and beyond (and if you follow me – I go way beyond 4C). So: Live with it! We can take some measures to tighten up our Clusters as we Walk The Clusters Back (WTCB).

When we start with an 80 or 90cM lower threshold for a Cluster run, we usually get 4 Clusters, with one for each grandparent. These Clusters tend to follow the rule. But beyond that, with smaller cMs and more distant cousin-Matches, the randomness of atDNA comes into play. We can say the growing numbers of Clusters (as we lower the cM threshold) will tend to a CA. But I use “tend” because it’s not a guarantee – it cannot be a rock solid rule – “the random DNA didn’t get the memo”.

So, can we have a Cluster using a Cluster run of 60-300cM have 2C, 2C1R, 3C, 3C1R, and 4C Matches in it? Absolutely! They should all be on the same line, but that brings up two important points.

1. Old saying: “Everybody has to be someplace”. The 60-300cM range covers all those cousinships (and more); and in Clustering, every Match “has to be someplace” – it will go into some Cluster! Some of the higher cM Matches (closer cousins) may well have gray-cell links to other Clusters. The way I think about it is that they are “confused” about which Cluster to be in – they are tugged in several directions – but the Cluster algorithm always picks one Cluster. My advice for these Clusters is to focus on the CAs of the smallest cM Matches in each Cluster – usually the most distant Matches – to determine the CA of the Cluster. Hopefully we’ll get a clear consensus (but remember Cinderella’s slipper).  The higher cM Matches in each Cluster often will have gray-cell links to other Clusters – this serves as a QC (Quality Control) check that the several Cluster CAs are related and appropriate. It also confirms these higher cM Matches are closer cousins, descending from all the CAs of gray-cell-linked Clusters.

2. It will also help to reduce the upper cM limit, to cull out some (but probably not all) of the closer cousins as the lower threshold is reduced in the WTCB process. These “closer cousins” have already “done their job” for WTCB. They have helped determine Matches who are one more generation back. In other words, your 2C Matches will help identify your 3C Matches (who have to be from one or the other of the 2C parents). At each Cluster run this information is imputed to the other Matches in their respective Clusters. This is not perfect – there will also be some 2C1R, 3C1R, half 3C, 4C1R in the mix. But it gives you a much better/tighter picture of the CA of each Cluster. These imputed/”tagged” 3C Matches will carry the ancestral thread to the next round of Clusters. Remember, going back in generations, there are only 2 possibilities in the next generation – the father or the mother. Reducing the upper cM threshold will cull out Matches that have already “passed on” their Ancestral line, and will force each new Cluster to group on itself.

The point is to make successive Cluster runs, lowering the thresholds each time to get more Matches, who tend to be a little more distantly related and will divide up into new Clusters which will be a little more distant. In each of these new Clusters there should be a mix of “old” Matches (from previous Clusters, some with known relationships, and some with imputed CAs), and “new” Matches (some with known relationships and CAs, and some unknowns which will be imputed based on analysis of all the Matches in the new Cluster).  
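The bookkeeping for successive runs can be sketched as follows. This does not do the Clustering itself (that is the job of whatever Clustering tool you use); it only shows which Matches fall inside each cM range and which of them are “old” (seen in an earlier run) versus “new”. The Match names, cM values, and thresholds are all made up for illustration.

```python
def walk_runs(matches, runs):
    """For each (lower_cM, upper_cM) run, split the selected Matches into old and new.

    matches: dict of Match name -> shared cM
    runs:    list of (lower_cM, upper_cM) thresholds, in the order they are run
    """
    seen = set()
    report = []
    for lower, upper in runs:
        selected = {name for name, cm in matches.items() if lower <= cm <= upper}
        report.append({
            "range": (lower, upper),
            "old": sorted(selected & seen),   # carry the ancestral thread forward
            "new": sorted(selected - seen),   # to be imputed from the old Matches
        })
        seen |= selected
    return report


# Hypothetical Matches and thresholds (both limits lowered on each run)
matches = {"Ann": 250, "Bob": 95, "Cy": 70, "Dee": 55, "Eve": 42}
runs = [(90, 400), (60, 300), (40, 200)]

report = walk_runs(matches, runs)
for row in report:
    print(row)
```

In the last run the closest Match (Ann) has been culled by the lower upper limit, while Bob and Cy remain to carry their imputed lines to the new Matches.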

Note 1: Usually, I use CA to note the Common Ancestral Couple between myself and a Match. Clusters tend to form on specific Ancestors. Are they individual Ancestors or the parental couple? I’m not real sure. I will say that I rarely find a Cluster that I can identify solely to a female Ancestor. This makes sense because most of the time the husband/wife couple are together. So I will continue to use the male Ahnentafel number to describe my Cluster CAs.

Note 2: WTCB is a relatively easy process to start, but with each iteration it gets harder – both because of the approximate doubling of Matches at each step but also the inevitably difficult Cluster(s) to sort out (probably a brick wall). In any case you can start manually by just walking down your list of Matches in cM order and coding them (I’d use the Ahnentafel Number) and checking with their respective Shared Match lists. Stop whenever you want. (For me, the first two WTCB iterations were easy (a few hours); and then I worked on one a day for several more….) Your results will vary, depending partly on the amount of “Notes homework” you’ve already done.

Note 3: This is a great tool for bio-Ancestors, Brick Walls, NPEs, etc. Using this WTCB process will identify known Clusters. Some may leave you stumped (a few did for me). One reason you may be stumped is because you have no known/imputed Matches for a new Cluster – just Matches staring back at you with no clue how you are related. The WTCB Cluster comes to a halt. Now’s the time to examine all of the available Trees from the Cluster. If the Cluster Matches have their own CA, you have a BINGO! That’s probably your Ancestor, too. Check any gray-cell links to other Clusters to learn more about where in your Tree this could fit.

[19O] Segment-ology: WTCB Observations and Advice by Jim Bartlett 20231130

Triangulation and Clustering Among Companies

A Segment-ology TIDBIT

Bottom line up front: Triangulation and Clustering should have pretty much the same result at each company. This may allow imputing some Common Ancestors from Ancestry Clusters to the other companies and imputing some Triangulation segments from other companies to Ancestry Matches.

It may seem obvious, but it bears repeating. Your ancestry is fixed, static, unchanging. Your biological ancestors are determined at conception and cannot change. Your parents, grandparents, great grandparents, etc. remain the same no matter where you test.  Likewise, your DNA segments from Ancestors are determined at conception. These segments do not change throughout your life and are the same no matter where you test.

Therefore, the grouping methods we use should be roughly the same no matter where you test.

Your DNA segments don’t change. Segment Triangulation is based on your DNA segments. Each Triangulated Group (TG) is based on one of your DNA segments. A TG formed at any company would be the same specific segment at each of the other companies. In fact my DNA segment spreadsheet has 372 TGs formed from segments from all of the testing companies. The segments identified by each company “fit” into my TGs. There might be very slight differences among the companies, but the overall segments still fit only one way.
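As a rough illustration of how a segment reported by any company can be lined up against existing TGs, the sketch below flags candidate TGs by chromosome and position overlap. The TG labels, positions, and minimum overlap are invented. Overlap alone is not Triangulation: a candidate still has to be confirmed by the Matches sharing with each other, and a segment can overlap a TG on each of the maternal and paternal sides.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: str
    start: int  # base-pair position
    end: int    # base-pair position


def overlap_bp(a: Segment, b: Segment) -> int:
    """Number of base pairs two segments have in common (0 if none)."""
    if a.chromosome != b.chromosome:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def candidate_tgs(segment: Segment, tgs: dict, min_overlap_bp: int) -> list:
    """Return the labels of TGs that overlap the segment by at least min_overlap_bp."""
    return [label for label, tg in tgs.items()
            if overlap_bp(segment, tg) >= min_overlap_bp]


# Hypothetical TGs and one Match segment
tgs = {
    "TG-1": Segment("5", 10_000_000, 25_000_000),
    "TG-2": Segment("5", 25_000_000, 40_000_000),
    "TG-3": Segment("7", 10_000_000, 25_000_000),
}
match_segment = Segment("5", 12_000_000, 22_000_000)

candidates = candidate_tgs(match_segment, tgs, min_overlap_bp=5_000_000)
print(candidates)
```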

Your Ancestors don’t change. Take, for instance, the Leeds method, which groups your 90-300cM Matches into four groups – one group for each of your four grandparents. No matter where you test, the four groups would be the same – one for each of your four grandparents.

The Leeds method is a special subset of Clustering. Clustering groups Matches on your Ancestors. Clusters should form on the same Ancestors, no matter which company is being used (Clustering depends on Shared Matches aka In Common With or Relatives in Common). It’s almost like a parallel universe at each company – for a given range of Match cMs, about the same Ancestor Clusters should result – based on your Ancestors. Clearly a Match who has tested at two or more companies should show up in the same Cluster at each company. Maybe not 100 percent of the time, due to the vagaries of Shared Matching, but most of the time.

I need to try Walking The Clusters Back at each of the companies (at various cM thresholds) and see how parallel they are. I expect very strong concurrence with the larger Clusters and the larger cM thresholds. Perhaps at some point, with lower cMs, the concurrence will drift away. To be continued…

Takeaway. We can Triangulate segments at 23andMe, FTDNA, and MyHeritage – giving each Match a TG for each shared segment. We can Cluster Matches at 23andMe, FTDNA and MyHeritage, and note any concurrence of TG segments in these Clusters (usually one or a very few). We can determine some Common Ancestors at these companies. We can determine many more Common Ancestors at Ancestry (particularly out to 6C with ThruLines). We can Cluster at Ancestry and note any concurrence of Ancestors in these Clusters (there usually is one). Some Ancestry Matches have also tested/uploaded elsewhere, so we can determine their TGs. We can then compare Ancestry Clusters with Clusters at the other companies for congruence – allowing us to impute Common Ancestors to the Matches at other companies, and TG segments to Ancestry Matches. Maybe not in all cases, but probably in some cases.
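For readers who keep their Cluster assignments in a spreadsheet, the comparison step can be sketched in a few lines of Python. This is only an illustration: the names, Cluster labels and the function paired_clusters are made up, and it assumes you have already identified which Matches are the same person at both companies.

```python
from collections import Counter, defaultdict

# Hypothetical hand-kept notes: person -> Cluster label at each company.
ancestry_cluster = {"Pat": "A1", "Lee": "A1", "Kim": "A2", "Sam": "A2"}
other_cluster = {"Pat": "M7", "Lee": "M7", "Kim": "M3"}  # Sam has not tested here

def paired_clusters(cluster_a, cluster_b):
    """For each Cluster at company A, find the company-B Cluster that
    holds most of the people who appear at both companies."""
    votes = defaultdict(Counter)
    for person in cluster_a.keys() & cluster_b.keys():
        votes[cluster_a[person]][cluster_b[person]] += 1
    return {a: counts.most_common(1)[0][0] for a, counts in votes.items()}

pairs = paired_clusters(ancestry_cluster, other_cluster)
print(pairs)
```

Once two Clusters are paired this way, a Common Ancestor noted for the Ancestry Cluster becomes a candidate (a clue, not a proof) for the Matches in the paired Cluster, and a TG noted at the other company becomes a candidate for the Ancestry Matches.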

[22BY] Segment-ology: Triangulation and Clustering Among Companies TIDBIT by Jim Bartlett 20231028

Can Three Fourth Cousins Share the Same Segment?

Bottom Line Up Front [BLUF]: Yes, but caution.

Here is the original statement that prompted this blog post:

The chance that three fourth cousins will all share the same matching segment is practically zero.

A bold statement – repeated several times – that has implications for Triangulated Groups. It appears this was part of the education material provided by AncestryDNA for their DNA Circles feature [Hat Tip to Debbie Kennett – the material is no longer online].

This means you and two 4C Matches sharing the same matching segment [all three of you descending from 3 different children of the Common Ancestor].

Mitigating factors:

1. Shared DNA segments in a Triangulated Group (TG) are rarely “the *same* matching segment”. We are almost always talking about overlapping segments of different sizes. So that gives some wiggle room. Maybe the odds are just small (not practically zero) with a group of different sized segments in a TG.

2. As Debbie pointed out to me, these were simulations by Ancestry, using “perfect” data. In genetic genealogy our data is usually somewhat messier than simulated data, so there is even more wiggle room. Maybe the odds are on the low end…

3. Another factor is that the data has grown substantially since the simulations were done for the Circles feature. The information has been removed.

The bottom line for me becomes: If you find 4C Matches in one TG from more than two other children of the Common Ancestor, take a closer look at it. It is possible, but there may be other factors at play.

Segment-ology CONCEPT – For Matches forming a TG (overlapping segments in a range), the odds decrease with each generation going back and with each additional child of the Common Ancestor. Take a critical look within TGs beyond 3C Matches spread over more than 2 other children. The odds are very small with Matches from 3 other children (total of 4 children).  This is not a “rule”.

Important Note: This does not mean that we cannot have DNA Matches from 4 or more children. We can! Instead of a double negative let me say: We can have 4C Matches from more than 3 other children of the Common Ancestor – we can have 7C Matches from 5 other children of the CA. It just means that there is more than one segment (TG) involved.  Over the different children, we should expect to see several TGs. We can have over a hundred Matches in a TG going back to 7XG grandparents, for example. We just need to carefully screen for the number of children per TG.
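If your TG notes are in a spreadsheet, the screening can be sketched as a simple count. This is a minimal sketch in Python, not a rule: the records, the TG labels and the function name tgs_to_review are hypothetical, and the thresholds (4C and beyond, more than 2 other children) simply restate the CONCEPT above.

```python
from collections import defaultdict

# Hypothetical records: (Match, TG id, cousin level, child of the CA they descend from)
matches = [
    ("Match A", "TG-01", 4, "child 2"),
    ("Match B", "TG-01", 4, "child 3"),
    ("Match C", "TG-01", 4, "child 4"),
    ("Match D", "TG-02", 4, "child 2"),
    ("Match E", "TG-02", 4, "child 5"),
]

def tgs_to_review(records, max_other_children=2, min_cousin_level=4):
    """Return the TGs whose Matches (at or beyond min_cousin_level) are
    spread over more than max_other_children other children of the CA."""
    children_by_tg = defaultdict(set)
    for _name, tg, level, child in records:
        if level >= min_cousin_level:
            children_by_tg[tg].add(child)
    return sorted(tg for tg, kids in children_by_tg.items()
                  if len(kids) > max_other_children)

flagged = tgs_to_review(matches)
print(flagged)
```

A flagged TG is not wrong; it is just one to take a closer look at.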

Takeaway: It’s hard to have a hard “rule” on this subject. However, it makes sense to pay attention to our data. The further back we go (in generations), the more constrained our options become.

I’m inviting discussion on this Segment-ology CONCEPT, and on your experience with TGs and numbers of Common Ancestor children.

[08E] Segment-ology: Can Three Fourth Cousins Share the Same Segment? By Jim Bartlett 20230812

A Triangulated Group is an atDNA Haplogroup

A Segment-ology CONCEPT and Thought Stimulator

Per Wikipedia: A haplotype is a group of alleles in an organism that are inherited together from a single parent, and a haplogroup is a group of similar haplotypes. Your atDNA segment from an Ancestor is likewise a group (or string) of alleles (SNPs) that is inherited from a single parent – an atDNA haplotype. The Match segments in a Triangulated Group (TG) have this same string of SNPs – they have matching shared DNA segments – and this group would then be a Haplogroup (Hg).

A Triangulated Group of segments would be a Haplogroup.

Wikipedia also notes that in human genetics, the haplogroups most commonly studied are Y-Chromosome (Y-DNA) haplogroups and mitochondrial DNA (mtDNA) haplogroups, each of which can be used to define genetic populations.

In exactly the same vein, a Triangulated Group (TG) defines a genetic population. It’s the population of descendants who carry the same segment of DNA passed down by an Ancestor. DNA test takers in this population have shared DNA segments with the same string of SNPs – they match each other!

An mtDNA Hg is often many thousands of years old (because the mtDNA rarely changes). A Y-DNA Hg is usually somewhat closer, and with a Big-Y test, is often found within a genealogical timeframe. My estimate is that an atDNA Hg (a TG) is usually 5-9 generations old – generally within a genealogical timeframe. We could argue that a TG Hg is a better tool than Y or mt. For me, it is a very good tool. In any case, each DNA Hg tool has strengths in genetic genealogy.

Note that the process of Triangulation culls out most, if not all, false shared segments. A few false Match segments (under 15cM) may slip in; but your own DNA, as the base in a TG, is true. If such an under-15cM Match is critical to you, you need to check for Triangulation with that Match segment as the base.

MUSING….

Dr. Tim Janzen – one of the earliest pioneers in atDNA (and my early mentor), has often advocated for a database of unique atDNA segments from our Ancestors. I used to think of this as a giant TG database and wonder how we would describe each TG. Now I think it would be an atDNA Haplogroup database, but still wonder how we would describe each Hg. Each segment would be unique to a specific Ancestor and would be on a specific chromosome (with start and end points). Note the chromosome could be maternal or paternal, depending on each Match’s ancestry. This segment would manifest itself in a TG, with shared segments from other descendant Matches. Each Match would likely have his or her own unique TG. These TGs taken together would represent an atDNA Hg from that Ancestor.

NB: if we can phase our data, we could actually record the SNP alleles (ACGTs) in each TG (or atDNA Hg)! Alternatively, by comparing raw DNA data among the Matches in a TG, we could probably determine the individual SNPs. Remember your TG segment is the equivalent of phased DNA.

This post is about an atDNA Haplogroup. It’s a concept to think about. Your thoughts are welcome here.

[14B] Segment-ology: A Triangulated Group is an atDNA Haplogroup by Jim Bartlett 20230802

Triangulation on a Side Is a Snap

When working with Matches on one side (Maternal or Paternal), segment Triangulation is a snap. Overlapping segments are all you need! The overlap should be at least 7cM, and more is better.

The basic rules to form Triangulated Groups were designed to ensure your overlapping shared DNA segments were on the same side – in other words on just one of your chromosomes. This means, from your viewpoint, the overlapping segments were both (or all) on your maternal *or* paternal chromosome. It didn’t matter which side it was on for your Match. You can have lots of shared segments on one chromosome, but some may be on your maternal chromosome and the others on your paternal chromosome. It is virtually impossible for Match A’s shared segment on your maternal chromosome to also match Match B’s shared segment on your paternal chromosome. So the requirement is/was to compare Match A and Match B to ensure they match each other – and are thus on the same chromosome with you. *IF* you already know Match A and Match B are on, say, your maternal side, then their shared DNA segments with you would be on your maternal chromosome, and there is no additional need to compare them to each other – they Triangulate.

I am sure, in the grand scheme of genetic genealogy, that an occasional glitch could occur. I’d estimate this as way less than 1% probability.

FTDNA has maternal and paternal buckets which appear to be pretty accurate. If the companies designated a “side” and allowed us to filter Matches based on that side, it would sure speed up segment Triangulation. Just look at a spreadsheet for natural crossover breaks in each chromosome.

In the meantime, if you can designate your Matches as Maternal or Paternal in some way (compare to a parent’s test, ethnicity, shared matches, etc.), you can use that info to filter your Matches and ease the segment Triangulation process. There’s still a lot of work to do, but this should ease the process some.
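Here is a rough sketch, in Python, of what "looking for natural crossover breaks" could look like once the Matches are filtered to one side. It assumes each segment carries start and end positions in cM (start_cm, end_cm), which is a simplification: company downloads usually give base-pair positions plus a total cM, so those fields, the sample data and the function group_overlapping are all hypothetical.

```python
# Hypothetical maternal-side segments, with positions already expressed in cM.
maternal = [
    {"match": "Ann", "chr": 5, "start_cm": 10.0, "end_cm": 40.0},
    {"match": "Bob", "chr": 5, "start_cm": 25.0, "end_cm": 50.0},
    {"match": "Cy",  "chr": 5, "start_cm": 47.0, "end_cm": 70.0},
    {"match": "Dee", "chr": 5, "start_cm": 55.0, "end_cm": 80.0},
    {"match": "Eve", "chr": 7, "start_cm": 12.0, "end_cm": 30.0},
]

def group_overlapping(segments, min_overlap_cm=7.0):
    """Sort by chromosome and start, then begin a new group whenever a
    segment overlaps the running group by less than min_overlap_cm."""
    groups = []
    current, current_chr, current_end = [], None, 0.0
    for seg in sorted(segments, key=lambda s: (s["chr"], s["start_cm"])):
        same_chr = bool(current) and seg["chr"] == current_chr
        overlap = min(seg["end_cm"], current_end) - seg["start_cm"] if same_chr else 0.0
        if same_chr and overlap >= min_overlap_cm:
            current.append(seg["match"])
            current_end = max(current_end, seg["end_cm"])
        else:
            if current:
                groups.append(current)
            current, current_chr, current_end = [seg["match"]], seg["chr"], seg["end_cm"]
    if current:
        groups.append(current)
    return groups

groups = group_overlapping(maternal)
print(groups)
```

In this made-up data, Cy overlaps Bob by only 3cM, so a break falls between them. The sketch compares each segment to the running extent of its group, which is cruder than eyeballing a spreadsheet, so treat its groups as candidates to review.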

[10E] Segment-ology: Triangulation on a Side Is a Snap by Jim Bartlett 20230730

Triangulated Group Segments Are Like mtDNA

A Segment-ology TIDBIT

mtDNA is passed from a female Ancestor down the all-female line to each of us. A Triangulated Group (TG) DNA segment is passed down from an Ancestor to us. The concept of DNA being passed down a specific ancestral line – from an Ancestor to us – is the same. Such is also the case for Y-DNA – it is passed from an Ancestor down the all-male line to a man. In the case of mtDNA, the ancestral path is all females; in the case of Y-DNA, the ancestral path is all males; but in the case of atDNA, the ancestral path can zig-zag between male and female Ancestors. Any of our Ancestors could pass an atDNA segment down to us.

The point is the TG segment is found only on one specific ancestral line (like the mt or Y line). However, it is still a genealogy task to figure out which line. As we “walk the segment back” from our own DNA back up our ancestry, there are only two options at each generation. If we know a TG segment is on our maternal side, the next generation back must be one of the maternal grandparents – and so on.

Just as we use mtDNA or Y-DNA, looking for someone who shares that same DNA with us, to find our Common Ancestor; so, too, we understand that our atDNA Matches in a TG (thus sharing that same atDNA with us) will have a Common Ancestor with us.

This is just another way to think about our DNA segments – they are just as focused as the mt or Y on *one* ancestral line.

[22BX] Segment-ology: Triangulated Group Segments Are Like mtDNA TIDBIT by Jim Bartlett 20230728

Getting ThruLines to Work for Me

A Segment-ology TIDBIT

Here is the set up for my BROWN story, without dragging the reader through the whole back story. This line includes most of the descendants in the BROWN Y-DNA Project Group-40.

I’m searching for the children of Wilson BROWN (he probably had 10 children, only two daughters are known). This is my Tree at AncestryDNA. I expected ThruLines to find some Matches… Nada. I had Matches from Keziah (and her husband Elliott BAKER) on down. I had none from Wilson – not too surprising because no one has any Trees for Wilson (except for daughter Keziah). I expected some 6C Matches from James, because I know they are out there – but… nothing.

So I used my “Search on a Surname” process [here] – I searched for the BROWN surname, and checked each Match’s Tree for likely families. One family that quickly became the standout was the family of Thomas BROWN 1773 married Nancy LITTON. I was getting a lot of “hits” on that family. So, I looked them up at Ancestry – there are 2,668 Trees for that line! Almost everyone who shows his parents, has John/James BROWN b 1731 MD; married 1755 Plymouth, MA; d VA & Sarah LITTLE b 1737 VA; d 1779 VA.

Two key points about Thomas BROWN 1773:

1. I have found over 70 Matches who descend from Thomas BROWN b 1773 (shared DNA segments from 10 to 30cM). These are spread over virtually all of his children.

2. Two descendants of Thomas BROWN 1773 – through different children – have taken a Y-DNA test and are in BROWN Group-40. So, Thomas BROWN 1773 is BROWN Group-40. No one else in Group-40 has claimed descent from his father, John BROWN 1731.

I have concluded that Thomas BROWN 1773 must be a son of Wilson BROWN and so I added him (and his children) to my Tree.  I stand alone in doing so…

I waited over a month for Ancestry’s ThruLines to show me the 70 Matches I had found – nada. Disappointing… Ancestry clearly had Thomas BROWN 1773 locked onto John BROWN 1731. I’ve written at least 10 blog posts about the power and usefulness of ThruLines – search for links to them in the Segmentology Outline [here]. One post is about ThruLines X-Ray vision looking into Private Trees…

So, I decided: maybe Ancestry is correct! Maybe if I accepted their version, ThruLines would report some of my DNA Matches as cousins. So, I changed my Ancestor Keziah BROWN from the daughter of Wilson BROWN to the wife of Thomas BROWN 1773 (so the two of them looked like the parents of my ancestor, John Brown BAKER – almost like Thomas had an affair with Keziah.)

The next day Ancestry listed 31 new ThruLines Matches (spread from 6 to 30cM) – all descending as half-cousins from Thomas BROWN 1773 – WOW.  All of these were new to me. 2xWOW! Near the top of the list was a Match with a Tree with only 2 parents, and 3 grandparents – ThruLines built the Tree back to Thomas 1773. I have built a lot of Quick&Dirty Trees in my BROWN searches, but I would not have tried that one. 3xWOW! And another Match had a Private (but searchable) Tree. I’d never have found that one. 4xWOW!

The fact that I adopted the on-line version of the BROWN Tree does not detract from my goal: find more DNA Match cousins from Thomas BROWN 1773. And ThruLines delivered! The Matches share DNA with me (no matter how the Tree is drawn).

I still need to put all of these in a spreadsheet; make sure they are reasonable; figure out the averages and see how they compare to the Shared cM Project. And I’ll wait a few more days – fully expecting another tranche in the next day or two.

BOTTOM LINE:

This method will sure save a LOT of scrolling through all the thousands of 8-9cM Matches for BROWN Matches (it took me over a month of steady focus to just get through the 10cM BROWN Matches). And it will find cousins with Private Trees and cousins with very small Trees that don’t have BROWN in them!

[22BW] Segment-ology: Getting ThruLines to Work for Me TIDBIT by Jim BARTLETT 20230707

Identifying False Shared DNA Segments

A Segment-ology TIDBIT

I contend that segment Triangulation will identify most of the false shared DNA segments reported from your DNA test. This includes a Match with one segment which is false; as well as a Match with multiple segments, some of which are false. I have Triangulated DNA segments at FTDNA, 23andMe, MyHeritage, and GEDmatch, and found many false segments (segments which did not Triangulate with other overlapping segments). In almost all cases these false segments are under 15cM. I cannot guarantee that all the false segments can be identified this way, but I am confident that most can. Triangulation is a time-consuming process – starting with a download of all your segments from one company at a time; sorting them by Chr and Start, and then working down the list to see which ones Triangulate. I did a blogpost using MyHeritage as an example for segment Triangulation: https://segmentology.org/2020/12/29/triangulating-your-genome/

Warnings: this takes weeks; there are some Triangulations that are difficult – just skip over these; a few might slip through the cracks; there may be some bare spots in your DNA.

Special Note: Although a Match’s segment(s) may be false, that does not mean the Match is not a cousin. This looks like a double negative, so let’s phrase it this way: a Match may be a cousin and not share any DNA segment with you, or they may share a false segment with you. In fact, about half of your true fourth cousins (4C) will not share a DNA segment with you.

Recently, at the Genetic Genealogy Tips & Techniques Facebook group, there was a post looking for ways to identify Matches at MyHeritage which are random junk. Segment Triangulation would identify a lot of false segments. However, at MyHeritage, it might be efficient to just download all segments, focus on those below 15cM (where most false segments would be) and work down the list to see which Match segments don’t have a TG Icon. Still a lot of work… If anyone tries this, please post about your experience – we can learn from each other.
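The short-list step could be sketched like this in Python. It assumes you have added your own "tg" column to the segment list as you work through it (blank means no Triangulation found). The field names, sample rows and the function possible_false_segments are hypothetical and are not the layout of any company's download.

```python
# Hypothetical rows from a hand-annotated segment spreadsheet.
segments = [
    {"match": "Match A", "chr": 3, "cm": 22.4, "tg": "03C"},
    {"match": "Match B", "chr": 3, "cm": 9.1, "tg": ""},
    {"match": "Match C", "chr": 8, "cm": 12.7, "tg": "08F"},
    {"match": "Match D", "chr": 11, "cm": 7.6, "tg": ""},
    {"match": "Match E", "chr": 11, "cm": 31.0, "tg": ""},
]

def possible_false_segments(rows, threshold_cm=15.0):
    """Segments under the threshold that have no TG noted are the
    candidates to review as possibly false."""
    return [r for r in rows if r["cm"] < threshold_cm and not r["tg"]]

suspects = [r["match"] for r in possible_false_segments(segments)]
print(suspects)
```

Match E is left alone here even though it has no TG yet, because its segment is over 15cM. Everything on the resulting list is a candidate for review, not a confirmed false segment.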

[22BV] Segment-ology: Identifying False Shared DNA Segments TIDBIT by Jim Bartlett 20230618

A Means To An End

A Segment-ology TIDBIT

I have long been a proponent of segment Triangulation (and Triangulated Groups (TGs)) and also Shared Match Clusters. Both of these are powerful tools. Both TGs and Clusters group your DNA Matches who share the same Common Ancestor (CA) with you.

But this is just a means to an end. By themselves these groups (TGs and Clusters) do not magically name an Ancestor, they point to a specific, but unnamed, Ancestor. They are just groups of Matches. We must also use genealogy!

By analyzing the Trees of Matches in a TG or Cluster, we can often find a consensus Ancestor. This Ancestor may be a known ancestor, and the Matches’ Trees may provide additional information for our research. Alternatively, this Ancestor may be a new Ancestor for us – a bio-Ancestor, a Brick Wall Ancestor, or even a “floating” Ancestor (unknown connection to our Tree). Or perhaps a fluke, a coincidence, a curve ball from our DNA Matches. Although a fluke is possible, as you research this new “ancestor” more, it either becomes more and more probable as your Ancestor, or less and less likely. In my experience, the evidence usually starts to mount. In only one instance for me did it pretty quickly fall flat (and in that case, I found a “secondary” consensus Ancestor in the group which worked out). As usual, treat this “consensus Ancestor” as a good clue.

Another way to frame this is: TGs and Clusters are good tools – more genealogy work is needed to make them useful – to find out more about your Tree.

The point of this TIDBIT, is that forming TGs and Clusters are good processes, but they are only a means to an end. IMO, they are definitely a step in the right direction, but the research journey is not over with that step. We need to take the next, genealogy, steps of analyzing the groups to find the CA and then integrating that information into our own genealogy.

BOTTOM LINE: TGs and Clusters are a good step – analyzing these groups is an essential next step.

[22BU] Segment-ology: A Means To An End TIDBIT by Jim Bartlett 20230611

Segmentology Outline Updated

I have just updated the Outline of Segmentology. This is located in the black bar in the header of every page – just click it. The posts over the years have jumped around; the Outline tries to put the titles in some order – sort of like a Table of Contents. Each one is hyperlinked to the blogpost.  This Outline provides an easy way to scan through the topics and blogposts and navigate to the ones that interest you. Don’t forget to read the comments and questions for each blogpost – they often contain additional information.

[00] Segment-ology: Segmentology Outline Updated by Jim Bartlett 20230604

Using the Shared cM Project in Reverse

A Segment-ology TIDBIT

Subtitle: Filter Your AncestryDNA Match List

Bottom Line Up Front: Filter your AncestryDNA Match list by cMs when your objective is a distant Ancestor. Also use Side and Surname filters to further reduce your Match list for review.

Most of us use the Shared cM Project to look up the Shared cMs of a DNA Match to see the possibilities of our relationship. We’ve learned that 3,500cM means a parent/child relationship; 2,600cM is a sibling; 1,750cM is a small group of close relatives; 880cM is often a 1st cousin (1C), or other close relatives. And we understand that as the shared cMs get smaller, the range of possible relationships expands.

But we can use the Shared cM Project in the opposite way – what is the probable shared cM for a given relationship? We see narrower ranges for close relationships, and fairly wide ranges for more distant relationships. However, even though the range may be relatively wide for distant relationships, the average cM does shift down as the relationship gets more distant. For example, a 3C relationship would not have more than 234cM; a 7C would not share more than 57cM (and the average for a 7C is 14cM).

Suppose we are focused on a particular Ancestor in our Tree. Our Target could be a brick-wall Ancestor; or a potential NPE (Not the Parent Expected) Ancestor; or a suspicious Ancestor; a known Ancestor for whom we’d like to find a new cousin who has some in-depth knowledge of that family; etc. We can use the Shared cM Project to narrow down our search! This works for close relationships as well as distant ones – my focus here is on the distant ones, but the process is what is important.

A good site to use the Shared cM Project is at DNAPainter: https://dnapainter.com/tools/sharedcmv4 – this site provides a number of interactive tools along with the basic Shared cM chart. For example, a 5C relationship (back to a 4xG grandparent – we have 64 of them) would average 25cM – and 50% of them would be in the under-20cM range. In this case we might want to start with the 50% in the over-20cM range. But for a 6C relationship (for 128 of our Ancestors) the average is 18cM and 70% of the Matches will be in the under 20cM range. Each of us gets to pick our own objectives and projects to pursue. If yours is like many of mine and beyond the 5C range, working within a range of cMs may be helpful.
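The "reverse" lookup can be sketched as a small table in Python. The only figures in it are the ones quoted in the paragraph above (5C and 6C); anything else should be read off the Shared cM Project itself. The names QUOTED, ancestors_at_level and where_most_matches_are are made up for this illustration.

```python
# Averages and under-20cM shares as quoted in the paragraph above.
QUOTED = {
    "5C": {"avg_cm": 25, "share_under_20cm": 0.50},
    "6C": {"avg_cm": 18, "share_under_20cm": 0.70},
}

def ancestors_at_level(n):
    """Number of Ancestors in the generation shared with an nth cousin:
    4 grandparents for 1C, doubling with each generation back."""
    return 2 ** (n + 1)

def where_most_matches_are(relationship):
    """Say where the bulk of the Matches for this relationship should fall."""
    share = QUOTED[relationship]["share_under_20cm"]
    return "under 20cM" if share > 0.5 else "split evenly around 20cM"

print(ancestors_at_level(5), where_most_matches_are("5C"))
print(ancestors_at_level(6), where_most_matches_are("6C"))
```

The point of the sketch is only the direction of the lookup: start from the relationship you are targeting and let it suggest the cM range to work in.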

Here’s how. At AncestryDNA, open up your list of Matches, and look at the row of “Filtered by:” tabs – we can use many of these in combination.

For this example I’m going to use the Shared DNA tab to select a range of cMs, AND the Groups tab to select my Maternal side.

In the project I’m working on, I’ve already looked at the Matches down to 16cM. By selecting a range for the next search, I also speed up the time for AncestryDNA to produce my filtered list of DNA Matches. In this case I’ll be looking at 11cM to 15cM. Remember to Click on Apply!

I’ll also filter my list by using the Groups tab to restrict the list to only Maternal-side DNA Matches:

In this case, I’ve chosen the Maternal side (remember to click on Apply). This will filter out a lot of Matches that don’t really apply to my Target Ancestor [I can later select the “Unassigned” Group to check those, too.]  Note that you can ALSO select several categories under Custom groups – such as New Matches (especially helpful if you want to revisit this project at a later date to check on new Matches), and/or any of your “dotted” Match categories (not much help in my current project because I’ve already visited – and dotted – all of those Matches I could determine, and I’m looking for new ones now in the under-20cM range.) In some projects, this “dotted” filter may be valuable.

And there is one more filter I often use. For my current project, I am looking for my DNA Matches who have a BROWN Ancestor. So, I click on the “Search” tab, which brings down a row of options. I type BROWN into the “Surname in matches’ trees” search box:

You can decide to check the “Include similar surnames” box, or not. For this project, I got plenty of results with just BROWN, and, if I wanted to, I could go back and try BROWNE or BRAUN, etc. I’ve had mixed results with the “Birth location in matches’ trees” filter box – sometimes the result is either the surname or the location, and I’d be wanting both. I did have very good results on a project with HIGGINBOTHAM surname and Amherst County, Virginia location. You might need to try some combinations to see which works the best for your project. Remember to click on the blue, highlighted Search box to include this filter.

This process of filtering is a powerful way to shorten your list of DNA Matches, tailoring them to your project goals. The addition of a cM range has helped me focus on more distant Ancestors and to speed up the AncestryDNA listing algorithm.
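The logic of combining the filters can be sketched in Python for anyone who keeps their own Match list in a spreadsheet. To be clear, this is not an AncestryDNA interface; it only mirrors the three filters used above (cM range, side, surname) on hypothetical records, and the Match class and filter_matches function are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Match:
    name: str
    cm: float
    side: str                      # "Maternal", "Paternal" or "Unassigned"
    surnames: set = field(default_factory=set)

def filter_matches(matches, min_cm, max_cm, side, surname):
    """Keep Matches inside the cM range, on the chosen side, whose Tree
    surnames include the target surname (case-insensitive)."""
    wanted = surname.upper()
    return [m for m in matches
            if min_cm <= m.cm <= max_cm
            and m.side == side
            and wanted in {s.upper() for s in m.surnames}]

# Hypothetical Match list
my_matches = [
    Match("M1", 14, "Maternal", {"BROWN", "BAKER"}),
    Match("M2", 12, "Paternal", {"BROWN"}),
    Match("M3", 18, "Maternal", {"BROWN"}),
    Match("M4", 11, "Maternal", {"HIGGINBOTHAM"}),
    Match("M5", 15, "Maternal", {"Brown"}),
]

shortlist = [m.name for m in filter_matches(my_matches, 11, 15, "Maternal", "BROWN")]
print(shortlist)
```

Each filter removes Matches for a different reason (wrong side, outside the range, no BROWN in the Tree), which is why using them in combination shortens the list so much.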

BOTTOM LINE: When you are searching for DNA Matches, think about the best way to combine the filters (including a cM range) and search parameters at AncestryDNA.

[22BT] Segment-ology: Using the Shared cM Project in Reverse TIDBIT by Jim Bartlett 20230601

Does the DNA Come from the MRCA Father or Mother?

A Segment-ology TIDBIT

Bottom Line: We cannot tell from a single Match; but there are at least three ways to figure it out.

SETUP: One cousin, or even several cousins, who share a Most Recent Common Ancestor (MRCA) with us. The “Common Ancestor” is really a Common Ancestor “Couple” – usually a husband and wife. You descend from one child of this couple, and the matching Cousins descend from one or more of their other children. The question comes up, which parent passed down the DNA segment to us? From this data, we cannot tell which parent passed down the DNA. All we know is that one or the other parent passed it down. I know of three ways to figure this out – maybe you know of additional ways…

1. Grouping *to* the parents’ parents. One DNA Match-cousin back to the MRCA won’t do it. We need a group of Matches – at least one back to the MRCA, and other Matches – often with smaller shared segments – that go back another generation. The group can be formed via Clustering (grouping Shared Matches); or segment Triangulation (or DNA Painter). The goal is to find a Match in the group who is a cousin at least one generation farther back than the MRCA. This will almost always tell you which side of the MRCA the DNA came from. Note: there is a very small chance this will not work, but finding more than one more distantly related cousin on this line adds insurance. It is best to do this with segment Triangulation which represents a single DNA segment going back, but Clustering works, too. As you find more and more Matches with the MRCA, eventually (with smaller cM segments) they will break into two groups, one for each parent in the MRCA. Each of these groups will be based on a different DNA segment (a different subset of a Triangulated segment, involving smaller shared segments).

Note this is really a subset of Chromosome Mapping, and/or Walking the Ancestors Back.

2. Different partners. If one of the parents had children with a different partner (married or not), and the other Matches descend from this other partner, then you know the shared DNA segment had to come from the Ancestor who had multiple partners – the same DNA could not have come from different partners.

3. Differing ethnicities. If the two parents in an MRCA have very different ethnicities overall (or you can tell the ethnicity is different for this specific shared DNA segment, usually a TG segment), then a review of the Matches’ ethnicities might indicate which one passed down the DNA.

As with many things in genetic genealogy, the DNA may throw you a curve ball. I’m old enough to remember the Mickey Mouse Club on TV – and the “anything can happen day”. As you continue on your genetic genealogy journey, the evidence will mount. It should all point to the same results – which match your unique Ancestors and DNA segments.

Feel free to use this blogpost as a way to answer this recurring question on-line.

Amended slightly to call out the different partners as one of three ways.

If you have a favorite method of figuring this out – please post in the comments.

[22BS] Segment-ology: Does the DNA Come from the MRCA Father or Mother? TIDBIT by Jim Bartlett 20230515

How to define the Most Recent Common Ancestor (MRCA)

There are, generally, two discussions about this.

1: One is that the MRCA is really a couple – and both you and your DNA Matches descend from this MRCA (couple). You descend from one child of the couple, and your Matches descend from other children of the couple. We may not know which individual in the couple passed down the DNA we share, but we do agree it’s from one or the other.

2: The other discussion is that the child of the couple that you descend from had to have the DNA segment and is then *the* MRCA (singular). But, at least in my mind, this MRCA (singular) is not the MRCA of the Matches, and things get a little complicated trying to have a discussion among the Match-cousins. The various MRCA (singular)s are not “common” to the various Matches.

In Segment-ology, I use MRCA to mean MRCA (couple). This couple is common to Matches who descend from them.

[22BR] Segment-ology: How to Define the Most Recent Common Ancestor by Jim Bartlett 20230513

How To Detect False Matches Theory

Small-segment Matches (<15cM) with conflicting Shared Matches (no consensus) are probably false.

This Observation/Theory comes from two directions:

1. Science and observation tell us that (A) almost all Shared DNA Matches (Matches) >15cM are true; and (B) to varying degrees, some Matches below 15cM are false. There is a distribution curve which has a small percentage of 14cM Matches as false, down to about half of 7cM Matches being false, and larger percentages of even smaller segments being false. It’s hard to sharpen this marshmallow of data, so please grasp the overall concept that as the cMs decrease below 15cM, the percentage of false Matches increases to areas where most Matches are false. And don’t be confused – you can have a true Common Ancestor with a false Match – it’s just that there is no DNA link (or “proof”). In fact, most of our true 4th cousins (4C) and greater will not be DNA Matches at all. Another point is that there is no such thing as “partly true or false” – the shared DNA segment is all true (from a Common Ancestor) or all false (not from a Common Ancestor). Please don’t go down the “rabbit hole” that *part* of a small segment may be true – we are already down to small segments, even tinier segments aren’t a step in the right direction, unless you have a very, very special case. Not the thrust of this blogpost!

2. My observations of Matches under 20cM shared segments as I search/analyze this huge group (80,000) for Matches with a BROWN Ancestor – also a huge subset (thousands) of my under 20cM Matches.

Background 1: I’ve done the “homework”. Using the Walking-The-Clusters-Back process (starting here), I was able to identify hundreds of Clusters to specific parts of my Ancestry. Almost all to a parent; 98% to a grandparent; roughly 90% to a Great grandparent; etc. out to some that were tagged to 7xG grandparents (8C level). Overall, I have in my AncestryDNA Notes for almost all over-20cM Matches, *some* indication of their line in my Ancestry. Most of these have also been tagged to specific Triangulated Groups (TGs). Part of my analysis of the under-20cM Matches is a check of their over-20cM Shared Matches. Most have some Shared Matches. Sometimes, an under-20cM Match will have many Shared Matches in consensus – most in the same Cluster or TG. Sometimes they will not.

Background 2: My BROWN Project. For this project, I filter my Match List by Maternal (the side for my BROWN line); a cM range (e.g., 12 to 13cM); and the BROWN surname. I then look at each Match, and put something in the Notes box…
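As a rough sketch of the same three-part filter applied outside the AncestryDNA screen, the Python below assumes a Match list has been hand-copied into simple records. The field names (`side`, `cm`, `surnames`) and the sample rows are made up for illustration; they are not an AncestryDNA export format.

```python
def filter_matches(matches, side, cm_low, cm_high, surname):
    """Keep Matches on one side, inside a cM range, with the surname in their Tree."""
    wanted = surname.upper()
    return [
        m for m in matches
        if m["side"] == side
        and cm_low <= m["cm"] <= cm_high
        and wanted in {s.upper() for s in m["surnames"]}
    ]

# Hypothetical records, not real Matches.
matches = [
    {"name": "Match A", "side": "maternal", "cm": 12.4, "surnames": ["Brown", "Baker"]},
    {"name": "Match B", "side": "paternal", "cm": 12.9, "surnames": ["Brown"]},
    {"name": "Match C", "side": "maternal", "cm": 18.0, "surnames": ["Brown"]},
]
brown_12 = filter_matches(matches, "maternal", 12, 13, "BROWN")
print([m["name"] for m in brown_12])  # ['Match A']
```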

Background 3: Keeping Track.

[“Good” Matches] I’ll add (i.e., impute) the Cluster/TG to the Notes for each consensus Match. Note: some of these will be a consensus on the BROWN line; some will be a consensus on some other line. I also add a Note about the oldest BROWN I’ve found for each of these Matches, if it’s likely to tie to my BROWN line – a judgment call.

[When I’m finally done ferreting out all of these lines in a consensus BROWN Cluster and/or a good probability of a link to my Tree, I’ll look at the number of Matches in each family group, and really dig into the research – this is sort of like collecting clues in Quick-and-Dirty Trees to see what’s worth pursuing.]

[“Iffy” Matches] Some Matches don’t neatly fall into the Good or Bad category – I look at them and usually build their Tree back far enough to determine if a tie-in to my Tree is possible – judgment call.

[“Bad” Matches] I add a Note to Matches with no Shared Matches: ”SM:0”. And I add a Note to Matches with various, conflicting Shared Matches (often on both sides): “SM – var”.  
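The Good/Bad sorting above can be sketched in a few lines of Python. This is only an illustration: the 60% consensus cutoff, the function name and the tag `11C2` are assumptions, and the "Iffy" middle ground stays a judgment call that no cutoff captures.

```python
from collections import Counter

NO_SM = "SM:0"        # Note for a Match with no Shared Matches
VARIED = "SM – var"   # Note for various, conflicting Shared Matches

def triage(shared_match_tags, consensus_share=0.6):
    """Sort one under-20cM Match by the Cluster/TG tags of its over-20cM Shared Matches.

    consensus_share is a hypothetical cutoff, not a value from the blog post.
    """
    if not shared_match_tags:
        return ("bad", NO_SM)
    top_tag, top_count = Counter(shared_match_tags).most_common(1)[0]
    if top_count / len(shared_match_tags) >= consensus_share:
        return ("good", top_tag)  # impute this Cluster/TG in the Notes
    return ("bad", VARIED)

print(triage(["06F36", "06F36", "06F36", "11C2"]))  # ('good', '06F36')
```

A Match whose top tag falls just short of the cutoff is exactly the kind worth a quick Tree build before discarding.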

Observation 1: Research/Tree building for Matches with “SM:0” and “SM – var” Notes almost always went nowhere. They had a BROWN Ancestor (a search filter), but their BROWN Ancestor was from England or Scotland or New England, etc. Clearly a very low probability of linking to my BROWN line in Colonial VA. For these Matches I added “X BROWN” in the Notes to remind me they’ve been looked at and discarded.

Observation 2: Finding a Match with consensus Shared Matches that had known/suspected BROWN Clusters/TGs was a BINGO! They *almost always* had BROWN Ancestry that linked to mine – whether it was in their Tree or in the Q&D Tree I built out for them. Some of the iffy Matches which had a few Shared Matches with favorable BROWN Clusters/TGs, but not a consensus, turned out to link to my BROWNs. Some did not.

In my BROWN project, I’m done with the 11cM to 19cM DNA Matches and am part way through the 10cM Matches – a lot still to do. I wanted to record my observations that DNA Matches who had a BROWN Ancestor pretty easily fell into probable/possible vs “no way” categories. And my growing belief is that Matches with “SM – var” are probably FALSE Matches – not going to be a genetic cousin on any line.

BOTTOM LINE: Matches under 15cM with various conflicting Shared Matches are probably FALSE. Certainly, to be culled out to focus on Matches who have a clear consensus of Shared Matches on one line. This is *not* a guarantee, nor does it mean all other Matches are true. But given the sheer number of BROWN Matches to go through, I’m going to begin using this theory as a “TRIAGE” method. It’s the only way I can get through this BROWN Project.

[22BQ] Segment-ology: How To Detect False Matches Theory by Jim Bartlett 20230506

Who Ya Gonna Call? (hint: NOT Ghostbusters!)

Featured

A Segment-ology TIDBIT

The Visual Phasing process looks at full chromosomes of three siblings to determine grandparent crossover points. The Leeds Method uses Matches over 90cM to group by grandparents. For Great grandparents, we rely on 2nd cousin (2C) Matches, which average 229cM. Even out to 4xG grandparents, we rely on 5Cs at an average of 25cM. What if your genealogy question or interest is more distant?

I recently broke through a 48 year brick wall. My known ancestor was Wilson BROWN c1751-1793 who died without a Will or any other document listing his wife or children. I’ve always known his name, because the marriage license of Keziah BROWN to Elliott BAKER in 1801 listed her father as Wilson BROWN, decd – but little else, except a probable brother Isham BROWN. Finally, the 1776 Will of James BROWN came to light – it listed 16 children including Wilson and Isham. With literally no Trees with Wilson BROWN, we have to find Match cousins from James BROWN – who would be my 6Cs.

Who Ya Gonna Call?

SMALL SEGMENTS!

The average for a 6C relationship is 18cM – and over 70% of the segments are under 20cM. We have to find and use and group these under-20cM Matches in order to build a case for a 6C relationship. Boy, did I pick the wrong surname to test this out – BROWN. So I’m searching my maternal Matches with a BROWN surname below 20cM – there are many.  I’m now down to 11cM, and the “hits” in VA, NC, SC, TN, KY area are showing up. Some appear to be single “hits” in otherwise large BROWN families (not helpful);  but some are starting to group on particular lines (promising). I think by the time I get down to 8cM Matches, I will have a number of strong candidate BROWN families, with a number of potential cousins on each line. I’m letting these small segment Matches tell me lines I’m related to.

Now, I recognize that some of these small segments may be false. At the 7cM level, we expect about half to be false. But the flip side is that half will be true (Identical By Descent). When I see what appears to be a single line of descent from a BROWN ancestor in the 1700s, I can well accept that it may be a false segment. On the other hand, if a number of Matches all descend from several children of the same BROWN patriarch, I’m more inclined to think that consensus indicates true, matching, segments. Even if we insist that half of these shared segments are false, we still have a lot of them which are true and all pointing to the same family.
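To put a number on "we still have a lot of them which are true", here is a minimal sketch that treats each small segment as an independent coin flip with a 50% chance of being true. Independence is an assumption made only to keep the arithmetic simple; the group size of 10 is a made-up example.

```python
from math import comb

def prob_at_least(k, n, p_true=0.5):
    """Chance that at least k of n segments are true, each independently true with p_true."""
    return sum(comb(n, i) * p_true**i * (1 - p_true) ** (n - i) for i in range(k, n + 1))

# Hypothetical group: 10 small-segment Matches descending from the same patriarch.
print(round(prob_at_least(3, 10), 3))  # 0.945
```

Under these assumptions, a group of 10 such Matches has roughly a 95% chance of containing at least 3 true segments.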

In my case I’m sure my BROWN line is BROWN Y-DNA Group 40 – so a link to known Group 40 lines is another reinforcing piece of evidence. Also, from my Walk The Clusters Back process, I’ve identified almost all of my greater-than-20cM Matches to a Cluster and many of those to a Triangulated Group (TG) segment. Many of the under-20cM Matches have over-20cM Shared Matches (SMs). Sometimes there is a clear SM consensus (to a TG), and sometimes the SMs don’t have a clear consensus. When there is a clear SM consensus on a suspected “BROWN” TG, more often than not, I can build a Match’s BROWN ancestor back to the patriarch of a consensus group. This further reinforces these family groups.

BOTTOM LINE – If you are looking for cousins at the 6C or 7C or 8C level, you have to rely on Small Segments! And, IMO, when you factor in that they form a solid consensus group in one family, a high percentage of them will be true segments.

[22BP] Segment-ology: Who Ya Gonna Call? TIDBIT by Jim Bartlett 20230428

The Two Meanings of TG

Featured

A Segment-ology TIDBIT

A Triangulated Group (TG) has two meanings – segment and group.

Segment – A TG represents a segment of DNA. A TG is defined as a Start Position and an End Position on one Chromosome (on one side – Paternal or Maternal). It describes one part of your DNA – accurate between two cross-over/recombination points*. Thus, a TG is a long string of SNPs on one chromosome – it is phased data on one side or the other. If we were to compare the raw data from each of the Matches in a TG, we’d see that they all had the same value at each point.

Group – A TG is also a group of your Matches. Each Match shares a phased DNA segment with you** that came from one of your Ancestors***.  All of the Matches should share the same Common Ancestor (CA) with you.***

Hedging a little…

* There is no “sign post” in our DNA to indicate the cross-over points, and the matching algorithms cannot exactly determine the start/end points of a segment – they may start before, and/or end after, the real ancestral segment. But they are pretty close. I say the segment ends are “fuzzy” – but the bulk of the TG segment is definitely from one Ancestor.

** It is possible that some Match(es) may have a false segment that happens, by chance, to match the phased string of SNPs – virtually always a segment under 15cM. If this Match/segment is critical to you, the Match can Triangulate at that location to determine if the segment is true or false.

*** The segments that pretty much cover the TG segment should be from the CA. Smaller segments that appear to be from one end or the other of the full TG segment, may well be from Ancestors beyond the CA. Note this is often the best way to determine which parent (in a couple) the DNA segment came from – the more distant Ancestor would be ancestral to one or the other of the couple.

Usually the TG meaning can be gleaned from the context – is the author talking about segments or Matches. If it’s not clear – call the author (often me) out!

[22BO] Segment-ology: The Two Meanings of TG TIDBIT by Jim Bartlett 20230407

Percent of Shared Cousins Indicates Relationship

Featured

Subtitle: Teamwork in Practice

In my last blogpost about my ancestor Wilson BROWN, I hinted at a large group of Shared Matches to a Thomas BROWN b 1773 (m Nancy LITTON). Over 2,000 people have this Thomas in their Trees at Ancestry. After some collaboration, I was given access to the DNA kit for the person named MATCH in the diagram below [credit to Allen Brown]. I wanted to look at the ThruLines Matches for MATCH. Well… it turns out MATCH has 756 ThruLines Matches from Thomas BROWN b 1773, spread over 7 different children – just WOW! I looked at 276 of them, spread over 5 different children (not including MATCH’s direct ancestor). I clicked on each Match name to see if I was also a match to that person. Drumroll…. 28 of them were also DNA Matches to ME. So, using this information, how is Thomas 1773 related to ME?

Diagram of descent from James 1705 to ME and descent from Thomas 1773 to MATCH:

Remember the rough guidelines that true 2nd cousins (2C) will match 100% of the time; true 3C will match 90% of the time; true 4C 50%; true 5C 10%; and true 6C 2%? My little exercise resulted in ME matching 28 of 278 given cousins identified by ThruLines for MATCH. This is right in line with our expectation for a 5C!  In the diagram above, Thomas would most likely be the son of Wilson, rather than a nephew (which would result in a 6C relationship between ME and MATCH). Note: we had already determined that the James BROWN 1705 line and the Thomas BROWN 1773 lines were both in Group 40 of the BROWN Y-DNA Project.
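The comparison in this paragraph can be written out as a short Python sketch. The match rates are the rough guidelines quoted above; picking the level with the nearest rate is a simplifying assumption, not an established method.

```python
# Rough guideline: share of true cousins at each level who show up as DNA Matches.
EXPECTED_MATCH_RATE = {"2C": 1.00, "3C": 0.90, "4C": 0.50, "5C": 0.10, "6C": 0.02}

def closest_cousin_level(shared, examined):
    """Return the cousin level whose guideline rate is nearest to shared/examined."""
    rate = shared / examined
    level = min(EXPECTED_MATCH_RATE, key=lambda c: abs(EXPECTED_MATCH_RATE[c] - rate))
    return level, rate

level, rate = closest_cousin_level(28, 278)
print(level, f"{rate:.1%}")  # 5C 10.1%
```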

Maybe this is a fluke. I’d like to find another Match descendant of Thomas BROWN and see if I have the same ballpark results. Also, I’m still reviewing all of my AncestryDNA Matches with a BROWN Ancestor to see if there is another firm group (or Cluster) of BROWNs, so I can see if they also might descend from Wilson BROWN. Spoiler alert: I do have a very large (100 Match) Cluster that I have linked to my Triangulated Group [06F36] – so I’ve used [06F36] to tag my AncestryDNA Matches in that Cluster. As it turns out, virtually all of the Matches I have under Thomas 1773 are tagged [06F36] – another indication of the power of Clusters. I can now really dig into the other [06F36] Matches (tagged at AncestryDNA and in the [06F36] Triangulated Group with 284 Matches from 23andMe, FTDNA, and MyHeritage) to find their BROWN ancestry.

James BROWN c1705-1776 [see diagram above] left a Will naming 16 children. Other than the given names, there are very few records to tie the surviving children back to James [we are dealing mostly with burned out counties in Colonial Virginia]. Very few on-line Trees are tied back to James. However, we have found families with the same given names as the children.  Isham BROWN is an example – same given name, but no records to link him to James, just a first name. But there are 2 descendants of Isham who are in Group 40 of the BROWN Y-DNA Project who claim Isham as their Most Distant Known Ancestor. Eake BROWN is a fairly unusual given name, and we are finding some records and descendants for him – looking for a living DNA Match… In his Will James named son George and George’s sons George and Archibald – two men in BROWN Group 40 claim a George and an Archibald (independently) as their Most Distant BROWN Ancestors… Theoretically we should be sharing about 2% of our cousins at the 6C level.  Yes, it’s a stretch, but it’s doable.  With virtually no good records, it might be the best avenue we have for linking these lines.

If enough folks try this process, we might get enough data to build probability curves and averages for the percent of shared cousins at different cousinship levels – a parallel to the Shared cM Project.

BOTTOM LINES

1. If you and a DNA Match can share your lists of Matches from a potential Common Ancestor, percent of Match overlap may indicate the cousinship level.

2. This takes work and time – I used it as a last resort, when my Ancestor left no records of children.

3. This is best done at AncestryDNA, with ThruLines, and therefore limited to 6C relationships, or closer.

CODICIL: In my excitement here I have presumed [06F36] is a BROWN Cluster or Segment. Not necessarily! I have concluded that [06F36] goes back to the Wilson BROWN couple – that [06F36] segment could have come from or through either Wilson OR his wife. It’s a 50/50 probability either way. I must do a lot of other analysis to figure that out.

[23_98Mb] Segment-ology: Percent of Shared Cousins Indicates Relationship by Jim Bartlett 20230315

Testing a Guess with Teamwork!

Featured

This is a developing story about a Brick Wall I’ve had for 48 years. My mother was a BAKER, and I know her ancestry back to the “Gunsmith” BAKERs in Pennsylvania in the late 1600s. My mother’s brother did Y-DNA to prove this line. One ancestor in the line was Elliott BAKER c1775-1836 who married Keziah BROWN in 1801 Prince Edward Co, VA. Keziah named her father as Wilson BROWN, Dec’d on the marriage license. In the 1850 census she stated she was born in adjacent Buckingham Co, VA. Sure enough in the 1782 to 1792 Buckingham Personal Property Tax Lists, there was a Wilson BROWN. In the 1793 PPTL, Wilson BROWN Estate was listed – Wilson had died. Skipping over a lot of research and steps, I know: no one else has Wilson BROWN in their Tree (except my line); there are several different BROWN Trees in Buckingham Co, VA; adjacent to Wilson in most PPTL listings was an Isham BROWN (some DNA Matches, indicating he was probably a sibling, but no genealogy help); Wilson BROWN left no Will (or any documents naming wife or children). No known male-line descendants for Y-DNA. Dead end – Brick Wall.

In January 2023 a small group of us decided to dig in.

1. We found a hitherto unknown 1776 Will of a James BROWN in nearby Cumberland Co, VA [credit Kevin Baker]. Note: There is a point that Cumberland, Buckingham and Prince Edward counties all touch – this is now the focus location. James’s will named 16 children, including Isham and Wilson. BINGO! I could find no one with this James in their Tree, despite several of his children with given names that recur in BROWN Trees in VA, NC, TN, KY, etc. But I did find a lot of BROWN Trees that had other, undocumented Ancestors about this generation – hmmm.

2. Another important find was linking Isham BROWN to the BROWN Surname Y-DNA Project – Group 40! Some BROWN Y-DNA experts [credit Bill Davidson] helped us rule BROWN lines in and out of consideration. This included several BROWN lines in Buckingham and nearby counties.

3. Two of the 16 men in Group 40 list Isham BROWN, born 1749, as their Most Distant BROWN Ancestor. They were sure of Isham, but could not determine his father – hmmm – the recently uncovered James? I can almost guarantee that if the 1776 Will of James BROWN had been easily found, many would have latched onto his son Isham [please excuse my cynicism].

4. If Isham was a brother of Wilson, then Wilson, and his male-line descendants (none of them known at this point), would also be Group 40.

5. Within the BROWN Group 40 were several lines back to the 1700s, but brick walled – and most of them were in this general area of Virginia.  My experience with Y-DNA Projects (I’m an Admin for 3 of them) is that American Y-DNA testers who form a family group, almost always have a Patriarch in America. In other words, it’s my expectation that there is a Patriarch of Group 40, probably in Virginia [I suspect James].

6. At least two of the 16 men in Group 40 list Thomas BROWN, born 1773, as their Most Distant BROWN Ancestor. Ancestry lists over 2,000 Trees for this Thomas BROWN (and his wife Nancy LITTON). Most have additional generations back, but with very sketchy documentation – pre-Revolutionary War records are hard to find in these counties. Communications with several Tree owners (not necessarily DNA Matches) revealed that they were unsure of Thomas’s father…

7. As it turns out, I have already found over 30 DNA Matches to this Thomas BROWN – ranging from 8 to 26cM – and I’m not even halfway through the list of possibilities. These DNA Matches span 7 of the 10 known children of Thomas – a good indication that Thomas is a relative of mine.

8. Looking back at the list of 16 children of James BROWN, and putting all the clues together, I estimate Wilson was born c1752 (3 years after Isham); and he probably had 9 to 10 children before he died (probably unexpectedly, without a Will) in 1792. Their birth years would be roughly c1773 to c1791.

Putting all of this together, my educated *guess* is that Thomas 1773 was a son of Wilson 1752; or a nephew. As a son, DNA Matches from him (and Wilson), would be 5th cousins (5C) to me. As a nephew, we’d need to go back another generation (to James) and the DNA Matches would be 6C to me.

How to figure this out? Use Teamwork to Test a Guess!

IF the relationship is “Thomas is son of Wilson,” then my DNA Matches to descendants of Thomas would also descend from Wilson and be about 5C to me. Going the other way, those DNA Matches should also have nominal 5C Matches to other descendants of Wilson, like my ancestor Keziah who married Elliott BAKER and had 8 known, surviving children with descendants who have DNA tested.

So I’ve asked such a DNA Match to please go to their AncestryDNA Match list and search for the surname BAKER, and see if some going back to Elliott BAKER (or any BAKER in Prince Edward Co, VA – there were several generations of this line there) show up as Matches.

An alternative is for me to list my 30 DNA Matches under Thomas. We expect to Match about 10% of our true 5C. So I’d expect any DNA Match from Thomas to also match about 3 of the same Matches. I have. A different DNA Match through Thomas should also match 3 of my Matches, but probably a different 3.
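The "about 3" figure is simple arithmetic, sketched below in Python. The second function adds the chance that a true 5C sees none of the 30 Matches at all; it assumes each Match is an independent 10% draw, which is a simplification.

```python
def expected_overlap(n_matches, match_rate):
    """Expected number of my Matches that a true cousin at this level would also match."""
    return n_matches * match_rate

def prob_no_overlap(n_matches, match_rate):
    """Chance of zero overlap, assuming each Match is an independent draw."""
    return (1 - match_rate) ** n_matches

print(round(expected_overlap(30, 0.10), 2))  # 3.0
print(round(prob_no_overlap(30, 0.10), 3))   # 0.042
```

So under these assumptions a single cousin finding no overlap is unusual but not impossible, which is one reason to ask several Matches to run the same check.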

No need to go through the process of “Target Testing” when we already have a lot of known testers…

Testing a Guess With Teamwork!

This is a concept I’m working on – teamwork. I know it’s hard to get Matches to respond, so I’m hoping that a very clear, short objective, coupled with a relatively easy test process, would encourage a cousin to get involved. Particularly if the result would indicate a new, more probable Ancestor for them.

BOTTOM LINE – Form an educated *guess* and think of it as a given relationship. Then get widely spread DNA Matches from that added branch to look for DNA Matches in your branch. Using daughters’ married surnames makes this test even more precise. If you can get several to do this, and find their Matches in your branch – this would be very powerful evidence of a genealogy link. It seems to me that this is a particularly good process for common surnames, like BROWN. If you could also find Matches with DNA segments, you’d probably have a few Triangulated Groups – but that’s another story;>j

Wilson BROWN is my Ahnentafel 98 Ancestor. I plan to update this Brick Wall story as it develops.  Think about trying to get 2,000 folks to change their Trees…

[23-98Ma] Segment-ology: Testing a Guess With Teamwork! by Jim Bartlett 20230310

Distant Common Ancestor Couples

Featured

A Segment-ology TIDBIT

There has been some recent discussion about how far back autosomal DNA is useful. Some indicate the “practical” limit to be around 2xGreat grandparents (the 3rd cousin (3C) level). I put the word practical in quotes because I don’t believe the 3C level was intended to be a rock-solid/absolute limit. I think it was intended as a recommendation for many genealogists – perhaps for most genealogists just getting started.

I’ve been a genealogist for almost 50 years. I have long since researched most of the available paper records. In the late 1970s, I worked for the Smithsonian, and spent my lunch hours scrolling microfilms at the nearby National Archives; or a weekly drive over to the DAR library to roam through their stacks. I look at DNA as a new tool to add more evidence to my existing Tree and extend it even farther.

I use segment Triangulation to group Matches and to build a Chromosome Map, which also informs about the contribution to my DNA from each generation of my Ancestors. But I also value what’s called genealogy Triangulation. This is when at least three of us (me and two widely separated cousins) agree on the same Common Ancestor (CA). For almost all of my known Ancestors, I have genealogy Triangulation well beyond two other Matches.

To document, and learn from, these CAs, I developed a CA Spreadsheet. See my Common Ancestor Spreadsheet blogpost for a description, a sample and a table of the columns. My CA Spreadsheet includes thousands of DNA Matches and their CA with me. For each Ancestor Couple, this spreadsheet documents way more than two Matches for genealogy Triangulation. It usually has many DNA Match cousins, all in a large genealogy “Triangulation” for each Ancestor Couple.

This spreadsheet also includes the Shared cM amount for each DNA Match. So, it is now easy to sum up the number of DNA Matches and the average cM amount for each generation:

The takeaways here include:

1. I have many DNA Matches in 5C, 6C, 7C relationships.  These stats are from my three grandparents’ Ancestors, almost all of them from Colonial Virginia – my maternal grandmother was a recent immigrant, and I get very few Matches on her line. [My parents were not related per GEDcom.]

2. At this distance, the cM relationships trend downward, as expected.

3. The averages are below 20cM.

4. For genealogy beyond the 4C level, I agree with the general concept to ignore the segment size. This is an analysis of genealogy agreement (Triangulation), that happens to be among DNA Matches. I am not claiming that the DNA segments, individually, “prove” each relationship. However, on average, some of the segments will be Identical By Descent, and when included in such large genealogy Triangulations, they increase the confidence that the genealogy is right.

5. Where known (or imputed), I also track the Clusters and Triangulated Groups (TGs) in this spreadsheet. There is usually only one or two Clusters indicated for each Ancestral Couple; and there are usually multiple TGs for each Ancestral Couple.

6. Disclaimer: Is my CA Spreadsheet 100% accurate? NO! Is it over 90% accurate? YES, IMO! Certainly the “story” in the table above is valid.
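The per-generation tally described above (number of Matches and average cM for each cousinship level) could be sketched as follows. The row layout and the sample values are placeholders for illustration, not figures from the CA Spreadsheet.

```python
from collections import defaultdict

def summarize_by_cousinship(rows):
    """rows: iterable of (cousinship, shared_cM). Returns {cousinship: (count, average_cM)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for level, cm in rows:
        totals[level][0] += 1
        totals[level][1] += cm
    return {level: (n, total / n) for level, (n, total) in totals.items()}

# Placeholder rows only; a real run would read them from the spreadsheet.
rows = [("5C", 20.0), ("5C", 10.0), ("6C", 12.0)]
print(summarize_by_cousinship(rows))  # {'5C': (2, 15.0), '6C': (1, 12.0)}
```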

My MAIN OBSERVATION is that atDNA “works” beyond 4C.

How far back you want to take your genealogy is a very personal decision. You get to set your objectives. This post is to let you know the ability of atDNA to help you with Ancestors back at least to the 7C level.

[22BN] Segment-ology: Distant Common Ancestor Couples TIDBIT by Jim Bartlett 20230307

Small Segments Needed for Distant Ancestors

Featured

Segment-ology TIDBIT

To find bio-Parents we usually use Matches in the 90-300cM range. Grouping Shared Matches in this range usually gives us four Clusters – one for each grandparent. For a bio-Grandparent, we need to lower the threshold some to include third cousins (3C), and select the Clusters that would include the bio-Ancestor we want to find (in other words select-out the known Clusters). See my 2022 blog post: Finding Bio-Ancestors here.  That blog post includes a handy Crib Sheet to orient this kind of project. The Crib Sheet indicates the estimated number of Triangulated Groups (TGs) involved, but Shared Match Clusters also work. Since this is largely a genealogy project, using Shared Match groups (Clusters) at AncestryDNA is usually the best place to work on these projects.

For more distant bio-Ancestors, smaller cM Matches are needed. Try various cM thresholds, down to a 20cM threshold, and select the Clusters that point to your Target bio-Ancestor.

For even more distant bio-Ancestors, I subscribe to the concept of ignoring the cMs, and just focus on the genealogy. AncestryDNA only shows Shared Matches down to 20cM, but Clustering can be done at the other companies, down to small cMs. Grouping by segment Triangulation can also be done, and then selecting the TGs that point to the bio-Ancestor.

If you get a hint of a surname, or a specific geography, you can search your Matches at AncestryDNA, to find under-20cM Matches that may have Shared Matches – indicating the Cluster they would be in.

For 4x to 7xGreat grandparents, many of the Match cMs will be under 20cM. Remember this is a genealogy project – the cMs don’t matter.

In all projects looking for (or even just confirming) a bio-Ancestor – let your Matches identify your Ancestor, by determining their Common Ancestor – see the link in the first paragraph.

The main point of this blog post is that Matches with small segments are needed to work on distant Ancestors.

[22BM] Segment-ology: Small Segments Needed for Distant Ancestors TIDBIT by Jim Bartlett 20230306

Triangulated Group Analysis

Featured

Segment-ology TIDBIT

Let’s analyze a generic Triangulated Group (TG).  There are several facets to this analysis…

Facets related to me:

1. My DNA segment – A Triangulated Group (TG) “segment” is a specific segment of my DNA. It is defined by a Chromosome number, start and end positions, and the total Mbp. The number of SNPs included and the cMs can be obtained through look-up tables on the internet (I have not done that).

2. TG Ancestor – A TG segment of my DNA first came from a specific Ancestor of mine. I’ll call this the TG Ancestor. This TG segment was passed down through descendants of the TG Ancestor, to one of my parents and then to me. All of my Ancestors who descended from the TG Ancestor also had that TG segment in their DNA. NB: the TG Ancestor started with a full Chromosome (a big segment) which he/she passed down – the TG segment was part of that larger segment/Chromosome. This original larger segment was then whittled down through the generations, but each generation had, at least, the full TG segment. NB: A segment may be passed down, one or more generations, intact (i.e. no whittling down), but the TG segment is always intact from the TG Ancestor down to you.

3. TG segment origin – The TG Ancestor received the TG segment (usually a larger segment) through a recombination process. His/her parent recombined DNA from their two parents to create a new chromosome to pass to the TG Ancestor. At this point, in our TG Ancestor, our TG segment is made up of parts from the TG Ancestor’s parent’s two chromosomes – one from each of the TG Ancestor’s grandparents. Thus, this whole TG segment did not exist, on one chromosome, in one person, before this time. The TG Ancestor is the first person to have this particular TG segment.

4. Logic – Matches who share this full TG segment, should also share this Common TG Ancestor – because this TG segment is unique to this TG Ancestor. [It can be argued through logic, that there is a possibility of this exact same segment being created in another person – granted. But the odds are extremely low, and even more distant when you consider this happening in the small subset of your DNA Matches in a TG]

Facets related to Matches and *shared* segments:

5. Cousin segments – In general, our cousins will get somewhat different segments than we do from our Common Ancestors. Apply #2 above to a Match. Our Ancestor passed down a chromosome to their children – some of it identical, some different. The DNA segments passed down through their children and their descendants to our Matches will be randomly different. What we see through a DNA test, is the overlap created by shared DNA segments – the part of our DNA from a Common Ancestor that overlaps. We might get Chr 06: 53-86Mbp and the Match may get Chr 06: 64-97Mbp – the “shared segment” is the overlap: Chr 06: 64-86Mbp.  Our segments from Ancestors are rarely the same as our Matches’ segments from the same Ancestors, but the TG segment is always intact from the TG Ancestor down to your Match. Always be mindful of the difference between your own DNA segments, and a “shared segment” with a Match.

6. Cousins on the TG Ancestor – these Matches may share roughly the same amount of DNA as the full TG segment, but some may well share smaller segments. It all depends on the recombinations that occurred in the generations between the TG Ancestor and the Match. Matches in a TG are already analyzed to share at least part of the TG segment with you and with other Matches.

7. Closer Cousins – these Matches also may, or may not share the full segment. Actually close cousins may share somewhat larger segments with us – beyond the scope of the TG segment. This indicates these closer cousins share more than one TG with us and a closer Common Ancestor. However, this closer Common Ancestor would have to be a descendant of the TG Ancestor. Maybe the closer Common Ancestor would be a grandparent or Great grandparent.  

8. Distant Cousins – Other Matches in the TG group may share smaller segments and be related through a more distant Ancestor of the TG Ancestor. Refer to #3 above. We could have a Match cousin related through a parent or grandparent, etc. of the TG Ancestor. In this case the Match would have only received the smaller DNA segment that was part of the full TG segment in the TG Ancestor. It is probable that Matches sharing small segments (in my case down to my 7cM threshold), could be cousins way beyond my genealogy horizon. This is particularly true with pile-up regions within a TG. The whole TG segment may come from a TG Ancestor well within my genealogy horizon, but the pile-up Matches are much more distant (or potentially false segments – a different story).
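The overlap arithmetic in facet 5 (Chr 06: 53-86Mbp against Chr 06: 64-97Mbp) is a plain interval intersection. A minimal Python sketch, assuming both segments are already known to be on the same parental side and using a made-up tuple layout of (chromosome, start Mbp, end Mbp):

```python
def shared_segment(mine, theirs):
    """Return the overlap of two (chromosome, start_mbp, end_mbp) segments, or None."""
    chrom_a, start_a, end_a = mine
    chrom_b, start_b, end_b = theirs
    if chrom_a != chrom_b:
        return None
    start, end = max(start_a, start_b), min(end_a, end_b)
    return (chrom_a, start, end) if start < end else None

print(shared_segment(("06", 53, 86), ("06", 64, 97)))  # ('06', 64, 86)
```

Keep in mind the caveat from the TG definition: reported start and end points are "fuzzy", so real comparisons would allow some slack at the ends.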

Summary – the Matches in a TG group can be cousins from many different generations, but all on the same ancestral line. The best estimated guess of relatedness to a TG segment would be a look-up of the cMs and then refer to the Shared cM Project. Generally on Chr X, the relationships may be further back.

Remember a TG segment represents your DNA – only your DNA – your DNA Matches will have a different TG.

[22BL] Segment-ology: Triangulated Group Analysis TIDBIT by Jim Bartlett 20230110

Review The Comments

Featured

A Segment-ology TIDBIT

I’d like to encourage all “Segmentologists” to periodically review the comments to my blog posts. I try to respond to every one of them and often go into more detail and/or provide suggestions for specific issues. If I may say so, there are often some more gems in the comments – including feedback from followers of this blog. I recently got a comment and elaborated on a post from over 7 years ago…

[22BK] Segment-ology: Review the Comments TIDBIT by Jim Bartlett 20221213

Sibling Crossovers

The question came up about siblings sharing the same crossover points. The answer is yes – some of them will be the same. Let’s look at this generation by generation. [There is often good discussion in the Comments to these blog posts – we are all learning on this journey. As a result of a recent comment, I decided to do a blog post about this topic]

The set-up:

1. One genome – let’s use our Mother’s side – 23 Chromosomes

2. Assume the average of 34 crossovers per generation.

3. A crossover is the point where DNA changes from one grandparent to the other grandparent, when the mother recombines her two chromosomes into a new one to pass on to a child.

4. Crossover points are random.

Mother’s DNA already has crossovers created by many of her Ancestors. She will recombine the DNA from her two parents at 34 places over the 23 chromosomes, and pass these new chromosomes to a child. Note: this means usually 0, 1, 2 or 3 crossovers per chromosome (on average 1 per 100cM). Since these crossovers are randomly formed for each egg, it would be rare for any of her children to have the same crossover from her.

The 34 new crossovers created 34+23=57 segments. These 57 segments “cover” all 23 chromosomes, from beginning to end of each one. These 57 segments are from Mother’s parents – our grandparents. All the crossover points from recombination events in prior generations are fixed (static) in the two grandparents’ DNA.
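The segment arithmetic can be checked with a tiny Python sketch. It assumes only what the paragraph states: each chromosome starts as one segment, and every crossover adds one more. The function name is just an illustration.

```python
def segment_count(crossovers: int, chromosomes: int = 23) -> int:
    """Each chromosome starts as one segment; each crossover splits one segment in two."""
    return crossovers + chromosomes


print(segment_count(34))  # 57
```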

Example: Mother’s paternal DNA on Chr03 – from 47Mbp to 123Mbp has a crossover point at 68Mbp. Each of Mother’s children who got her paternal DNA that included the point at 68Mbp would include that crossover point. Mother could pass a paternal Chr03 segment 47-83Mbp to one child and paternal Chr03 59-119Mbp to another child – both of these children would have the same crossover at 68Mbp.

Note: The 68Mbp crossover could have occurred at the great-grandparent generation, OR at some previous generation.
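The Chr03 example can be sketched in a few lines of Python. This is only an illustration of the idea: a child carries the older (fixed) crossover whenever the paternal segment the child received spans that point. The names `carries_point` and `child_segments` are made up for this sketch.

```python
FIXED_CROSSOVER_MBP = 68  # older crossover inside Mother's paternal Chr03 DNA


def carries_point(segment: tuple[int, int], point_mbp: int) -> bool:
    """True if the segment (start, end in Mbp) spans the given point."""
    start, end = segment
    return start < point_mbp < end


# The two paternal Chr03 segments from the example above.
child_segments = {"child 1": (47, 83), "child 2": (59, 119)}

shared = [name for name, seg in child_segments.items()
          if carries_point(seg, FIXED_CROSSOVER_MBP)]
print(shared)  # ['child 1', 'child 2']
```

A child who only received paternal DNA from, say, 70 to 119Mbp would not carry the 68Mbp crossover.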

This is a good example of why Chromosome Mapping *by generation* is important. In general Segment Triangulation results in Triangulated Groups (TGs) from different generations of ancestors. The TGs are not all from 4xG grandparents, or any other specific generation. However, if you have the Common Ancestor (CA) for your TGs, you can easily build a Chromosome Map for different generations. In my case I have 372 TGs – I know the CA side and grandparent for almost all of them – they roughly “fit” into about 114 groups (representing my 4 grandparents on both sides) on my 45 chromosomes.

Bottom line: Siblings won’t (generally) get the same crossover points from their parents, but likely will share some crossover points from grandparents and more distant Ancestors.

[05F] Segment-ology: Sibling Crossovers by Jim Bartlett 3 Dec 2022

WTCB SITREP NOV 2022

I wanted to share one of the huge benefits of WTCB. I’ve pretty much completed the WTCB down to 20cM Matches and have added in a number of under-20cM Matches for which I had segment data (from GEDmatch, primarily, and some who had tested at the other companies). These under-20 Matches can be Clustered by looking at their over-20 Shared Matches for a consensus.

There are positives and negatives to WTCB. Overall, a large percentage of the over-20 Matches fit into very solid Clusters. But, just like a distribution curve, some of the Matches do not have many Shared Matches (a few have 0), and some just don’t seem to form a good, solid consensus. If you know me, I focus on what I *can* do – so I want to give you an example of a successful Cluster. And I want to note that this is not the best example, but it is a good one.

Here is a picture of Clusters 54 to 79 in a Super Cluster. The 281 Matches in the Super Cluster range from 20cM to 56cM (the upper threshold was 60cM for this run).

In my review of most of the Clusters and SuperClusters, I’ve found that the individual Clusters look prettier and more solid, but they do not represent a split in ancestral lines within my genealogical time frame (roughly 9 generations back; 8C level). So I have combined most of them into Cluster 54 – a total of 281 Matches.

In this Cluster I now have 3 Matches with an MRCA of A0020 (MITCHELL/UNDERWOOD couple); 12 Matches with an MRCA of A0084 (UNDERWOOD/CANNADAY) and 27 Matches with an MRCA of A0170 (CANNADAY/HILL). I also have 4 Matches who have MRCAs on different lines. The Cluster is very solid, so I suspect these 4 Matches are probably *also* related to me somewhere on my MITCHELL to UNDERWOOD to CANNADAY line. But clearly the 42 Matches on one line show a consensus!
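The "one line" consensus can be checked with Ahnentafel arithmetic. The sketch below is a hedged illustration, not part of any tool: it assumes a couple ID such as A0170 is the husband's (even) Ahnentafel number with the wife at +1, and it uses the Match counts from the paragraph above. The function names are hypothetical.

```python
def descent_line(ahnen: int) -> list[int]:
    """Ahnentafel numbers from an Ancestor down to a parent (2 or 3)."""
    line = [ahnen]
    while line[-1] > 3:
        line.append(line[-1] // 2)  # a child's number is half the parent's, remainder dropped
    return line


def couple_id(ahnen: int) -> int:
    """Label a couple by the husband's (even) number, e.g. 171 -> 170."""
    return ahnen - (ahnen % 2)


# Couple ID -> number of Matches with that MRCA (from the text).
mrca_counts = {20: 3, 84: 12, 170: 27}

deepest = max(mrca_counts)
couples_on_line = {couple_id(n) for n in descent_line(deepest)}
on_one_line = all(c in couples_on_line for c in mrca_counts)

print(descent_line(170))                        # [170, 85, 42, 21, 10, 5, 2]
print(on_one_line, sum(mrca_counts.values()))   # True 42
```

Here 170 descends through 85 (half of couple 84/85) and 21 (half of couple 20/21), so all three MRCA couples sit on the same line of descent.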

Also within Cluster 54, I have 9 AncestryDNA Matches with segment data – they are all in Triangulated Group [17D25] – another pretty clear consensus. In DNA Painter, I could paint all 281 Matches on Chr 17, from 24 to 45Mbp. Note: In my TG spreadsheet I have over 150 Matches in TG [17D25] – 9 of them from Ancestry Matches and the rest from the other companies.  

I have Ancestors in my Tree beyond A0170 (CANNADAY/HILL) which are fairly well known and also in many other Trees, and I’ve found Matches with those more distant MRCAs in other Clusters, but not in this Cluster 54. I’m coming to the conclusion that the 21Mbp in [17D25] probably came to me from either William CANNADAY 1730-1801 (A0170) OR his wife, Nancy HILL 1733-1801 (A0171).

But the best is yet to come. This Cluster 54 is a classic *pointer*.  I am now pretty sure that the rest of the Matches in this Cluster will have an MRCA with me on the same line. In fact, I’ve only recently found several of the MRCA Matches by building Trees back and/or looking at Unlinked Trees. Here is an example:  

In Cluster 54, I had a 36cM Match with an Unlinked Public Tree with 6 people in it. I opened it up to find only one real lead – Audrey (so I searched Ancestry for her):

BINGO! Note Audrey’s mother is a CANNADAY!! The rest was easy – I quickly found the Match’s link to A0170 (CANNADAY/HILL).

Note: I’ve had others that were just as easy; some that took more searching and digging; and some where I threw in the towel and moved on.

The bottom line is that the WTCB tool can be very valuable in many cases. And when it works, I’ve got a Cluster which is a great MRCA-focused tool; I’m compiling consensus data for the Cluster (firming the TG and Chromosome Map), adding to the Ancestry Match Notes and helping ThruLines find more MRCAs in Private Trees.

[19Nc] Segment-ology: WTCB SITREP Nov 2022 by Jim Bartlett 20221112

WTCB Issue – Alt MRCA

I have a number of cases where the Match has an MRCA, but Clusters with a different group of Matches who clearly have a different MRCA and/or TG.  Example: A Match who has an alternate MRCA which doesn’t align with a TG. I discount the MRCA because the shared DNA segment with the Match could not have come from that MRCA. Some have a paternal TG and a maternal MRCA; some are clearly from different grandparents on the same side. I now have found two examples of such a Match who Clusters with other Matches who share the MRCA but not the TG. It is not unusual for a Match to have more than one MRCA from Colonial Virginia, but usually one is closer and the closer MRCA has a much higher probability of being the one who passed down the shared DNA segment. But “higher probability” does not mean always.  

My latest example is a Match with TG [01Y36] 14.0cM (on my mother’s side), but over half of the 23 Shared Matches have TG [17D25] (on my father’s side). [17D25] is a pretty well-established TG for me with an MRCA of A0170P (my CANNADAY/HILL ancestral couple at the 6C level). So I checked my Notes and found the Match has a ThruLine to CANNADAY/HILL. That explains why the Match Clusters with other Matches with MRCAs of A0170P.
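A simple way to flag this kind of conflict is to compare the parental side of the TG with the parental side of the MRCA's Ahnentafel number. The Python sketch below is an illustration under stated assumptions: the TG side is typed in by hand (no TG ID decoding is attempted), and the function names are hypothetical.

```python
def parent_side(ahnen: int) -> str:
    """Reduce an Ahnentafel number to the parent it descends through (2 or 3)."""
    if ahnen < 2:
        raise ValueError("Ahnentafel numbers for Ancestors start at 2")
    while ahnen > 3:
        ahnen //= 2
    return "paternal" if ahnen == 2 else "maternal"


def mrca_conflicts_with_tg(mrca_ahnen: int, tg_side: str) -> bool:
    """True when the MRCA is on one parent's side and the TG is on the other."""
    return parent_side(mrca_ahnen) != tg_side


# The example above: MRCA A0170 with a Match whose segment is in a maternal TG.
print(mrca_conflicts_with_tg(170, "maternal"))  # True
print(mrca_conflicts_with_tg(170, "paternal"))  # False
```

A True result does not say the MRCA is wrong, only that it could not be the source of that particular shared segment.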

Bottom line: Although my main objective is deep Chromosome Mapping, the ultimate goal is to get the genealogy right. In this case I want to also figure out the [01Y36] MRCA, so I must remove this Match from the A0170P-[17D25] Cluster. I also have to remind myself to follow the data – the data is talking to me, I need to listen…

[19Nb] Segment-ology: WTCB Issue Alt MRCA by Jim Bartlett 20220926

Segment Data for Ancestry Matches 2

A Segment-ology TIDBIT

My first post with this title (here) listed 4 ways to get Segment data for AncestryDNA Matches; and then added another way using GEDmatch.

Here is yet another way.

In Walking The Clusters Back (see WTCB 2022 and WTCB SITREP) I’ve now completed the analysis of all Matches over 20cM – almost all fit into one of several hundred Clusters. I’m now integrating the below 20cM Matches who have segment data (a TG ID), usually from GEDmatch – over 800 of them. Most of them have Shared Matches which usually provide a consensus on the Cluster. In checking my Master segment spreadsheet (with all of my Match Shared Segments), I noticed a number of AncestryDNA kits which didn’t yet have a link to an Ancestry profile. It turns out that usually all the Matches with the same TG ID will be in one Cluster. It is a relatively easy task to find that Cluster (particularly with WTCB) with some of the Matches – and then review the few other TG Matches with the other Matches in that Cluster. I usually send a message to the Ancestry Match to confirm they indeed uploaded to GEDmatch (and promise to help them with the DNA if they did).
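The "same TG ID, same Cluster" idea can be sketched as a simple vote. This is a minimal Python illustration with made-up rows; the field names (`tg`, `cluster`) and the function name are assumptions for the sketch, not columns from any actual spreadsheet.

```python
from collections import Counter


def consensus_cluster(matches, tg_id):
    """Return the Cluster most often seen among Matches with this TG ID, or None."""
    votes = Counter(
        m["cluster"] for m in matches
        if m["tg"] == tg_id and m["cluster"] is not None
    )
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical rows: three kits in one TG, one of them not yet linked to a Cluster.
matches = [
    {"name": "Kit A", "tg": "17D25", "cluster": 54},
    {"name": "Kit B", "tg": "17D25", "cluster": 54},
    {"name": "Kit C", "tg": "17D25", "cluster": None},
    {"name": "Kit D", "tg": "01Y36", "cluster": 12},
]

print(consensus_cluster(matches, "17D25"))  # 54
```

The unlinked kit (Kit C here) would then be looked for among the Ancestry Matches in that Cluster.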

Result: more AncestryDNA Matches linked to specific DNA segments (TGs).

[22BJ] Segment-ology: Segment Data for Ancestry Matches 2 TIDBIT by Jim Bartlett 20220223

Walking the Clusters Back (WTCB) 2022

An Advanced Segment-ology Topic

Introduction.

This will be a longer and more detailed post than usual. The process I’ll outline takes a lot of precise and detailed work. And preparation work. You have to decide if it’s worth it for your objectives.

I’ve tried several blog posts about Walking The Clusters Back. In my opinion, they all failed. I was trying to find a sweet spot that would give us groups of Matches at each generation. That generally works at the grandparent level (the Leeds Method works in most cases to provide 4 columns for 4 grandparents), but the Clusters quickly get jumbled up as the cM Threshold decreases. I should have known better. Each Cluster still tends to focus on an Ancestor, but the different Clusters have Ancestors of different generations. The Clusters sort of mirror the Shared cM Project – as the cM value decreases, the Shared DNA Segments come from a wider and wider range of generations. The overlap of possible relationships grows. The Shared cM Segment pattern gets more and more jumbled – just like the Clusters.

So if we can’t use brute force on the data, let’s go with the flow, and develop a process that tracks Clusters – by tracking the Matches in them. As the cM Threshold is decreased, the number of Matches being Clustered increases. This results, generally, in more Clusters with more Matches in them.

Overview

Overall WTCB Process:

1. Run a Cluster report with a high cM Threshold (say 80 or 90cM) to get at least four Clusters, each with a consensus Most Recent Common Ancestor (MRCA) that you can identify.

2. From information in the Match Notes, determine the consensus Root Ancestors (RA) in each Cluster. The RAs start with your parent, and include your Ancestors out to, but not including, the consensus MRCA for each Cluster. These RAs should “fit” all the known Matches in the Cluster.

3. Impute (copy) these RAs to all the other Matches in the Cluster.

4. Repeat for all Clusters

5. Run a new Cluster report with a lower Threshold.

6. Combine the Matches from the previous and the new Cluster reports into one spreadsheet.

7. Sort on Match names.

8. Merge the duplicate Matches into one Match (much more on this step later)

9. Return to step 2, and continue…

10. Gradually reduce the upper cM Threshold, to cull out the closest Matches – this fine tunes the MRCA of the Cluster.

As part of this overview, I must provide a warning:  there is a lot of homework required before you can start this process – see the Homework section below. The Cluster runs include the Notes for each Match. These Notes should include  “known” Match MRCAs and cousinships [including multiple MRCAs], and any TG IDs, that  you have determined. This is the information you need to populate your Master WTCB Spreadsheet. This is your source for RAs.

In the paragraphs to follow, I’ll offer a spreadsheet template, and specific steps to accomplish the steps in each Cluster run. It’s a repetitive process that I have tweaked to make it as standard and efficient as I can. The number of Matches about doubles with each Cluster run – so the work gets harder. I’ve also incorporated my short cuts and tips into the steps…

I started with an 80cM Threshold and found 8 Clusters – I was confident I knew the consensus MRCA for each Cluster. More importantly, I knew the Root Ancestors for each Cluster (the RAs being the parent and grandparent, and sometimes more, back to, but not including, the MRCA). As I lowered the cM Threshold (usually by 10cM at first) and ran a new Cluster run, I found the number of Matches about doubled and the number of Clusters increased. The increases were not in a predictable way, but the Clusters grew in size (more Matches) and slowly, but surely, pushed the RAs out to more distant MRCAs.

I’m now confident this process works. By that I mean for each new WTCB Cluster, we get some RAs which point to the MRCA of the Cluster; and this MRCA is very close to the MRCA we’ll find with each of our Matches in the Cluster. A strong, helpful, clue…

Homework

Some *essential* homework is required before you try this:

ANCESTRYDNA TREE & MATCH NOTES:

1. Test at Ancestry and build your Tree out as much as you can (to 7xG grandparents where possible) [you only need Ancestors (use standard names); birth/death dates/places]. AncestryDNA needs this information for ThruLines to work. See some of my posts on ThruLines starting here.

2. Link yourself to your Tree – this lets AncestryDNA do its magic with ThruLines and other hints.

3. Find as many MRCAs as you can – some are close low hanging fruit; many will be via ThruLines (which will find MRCAs in Private, but searchable, Trees); some you’ll find in Unlinked Trees (which ThruLines does not review).

4. Add what you find to the Notes of your Matches – see my blogpost here.

5. Review: It Is Iterative here – the goal is to get info into the Notes of your Matches.

6. It is very important that you have information in the Notes of as many Matches over 20cM as possible.

SET UP DNAGEDcom CLIENT:

1. Subscribe to DNAGEDcom Client (DGC) (you can subscribe for one month to try it out). See links in this blogpost.

2. Click on the DGC Icon and Log in.

3. Set up your folder (you’ll access this folder regularly in the WTCB process)

IMPORTANT: Do not go beyond this point until you have completed the ANCESTRY TREE & MATCH NOTES Homework – we need the data in the Match Notes before we gather it in the next step!

4. Gather Matches and ICW from 20cM to 400cM (ignore Trees and Ethnicity for now – they are not needed for this WTCB process, and they can always be gathered later). This gathering process may take a day or more (depending on the number of Matches you have). I think the % Complete indicator is based on gathering all of your Matches, so it may be misleading, and the gathering process will finish somewhat sooner.

5. This process will store several files on your computer:

        a. m_yourname CSV file of your Matches with lots of information about each one, including your Notes, URLs to the Match and their Tree, Shared cMs, etc. This file is a gold mine by itself – I highly recommend you save a Working Copy of this file in Excel – it’s very useful.

        b. icw_yourname CSV file – this is a large file used by the Clustering program

        c. DNAGedcom Data Base File – where all the data is stored

6. The Clustering reports are run separately. Each run takes about a minute (not a typo), and produces 3 reports:

        a. clm3d_yourname_[date,time,threshold string]_clusters CSV file – a list

        b. clm3d_yourname_[date,time,threshold string] Excel file – includes a TAB you’ll use. [I make a copy of this file – appending the word “Working” – to use in WTCB.]

        c. clm3d_yourname_[date,time,threshold string] HTML doc – the colorful display.

SET UP YOUR MASTER WTCB SPREADSHEET

The last part of our homework assignment is to set up a Master WTCB spreadsheet template.

There are 3 features about this spreadsheet template:

1. It is a tool, to incrementally follow and interpret the data.

2. It is the culmination of many variations I have tried. It is fairly easy to set up, and it offers a lot of flexibility.

3. A standard spreadsheet will help me explain the various steps later in this post. Of course, you are free to use any format you want. In fact, I encourage feedback on improvements to this Master spreadsheet, or the whole WTCB process.

Here is a sample of my Master WTCB Spreadsheet with some data:

Notes:

1. This is from the initial CL 80cM Cluster run, and there are columns to the right for the 70cM Cluster results from the next run.

2. I have Notes for all of these close Matches – they were in the AncestryDNA Match Notes and then captured by the DGC Cluster program.

3. The data in the known columns was from the Notes

4. The data in the ROOT ANCESTORS columns was derived from the known data, and then imputed (copied) to the other Matches in the Cluster.

Teaser:  these 5 Clusters have Root Ancestors from three of the four grandparents (4, 6 and 7) and CL 3 and CL 4 appear to be splitting to Great-grandparents 8 and 9. The Walk has started!

Here is a list of the 10 columns from the DGC Cluster run spreadsheet and where 9 of them go in the Master spreadsheet:

Note: the CL [B] and Super CL [C] columns copy to different columns in the Master spreadsheet, depending on the Cluster cM Threshold.

Here is a list of the 49 columns from the Master spreadsheet – with a brief description of each:

This covers the Homework section. Get ready to Walk The Clusters Back…

WTCB Master Spreadsheet overview:

Let’s divide the process into several stages for each Cluster run – details later:

1. Run a Cluster report at DGC; copy the data to the Master Spreadsheet; do some additional housekeeping chores to get the Master Spreadsheet Ready.

2. Merge duplicate Match rows (not with the initial run, but needed with subsequent runs, after the previous Matches are added to the spreadsheet)

3. Sort the Master Spreadsheet to show the Cluster Groups.

4. Type information into columns L, M and N from the Match Notes (when available).

5. Analyze each cluster, and fill in Root Ancestors (RAs) for all (in columns F-K as needed)

6. Save Master WTCB Spreadsheet for this run.

There are some other “details” I’ll explain as I expand on each of these stages below.

WTCB Master Spreadsheet details:

Here are the details for each stage:

1. Run a Cluster report at DGC; copy the data to the Master Spreadsheet; do some additional housekeeping chores to get the Master Spreadsheet Ready.

        a. At DGC, click on the Autosomal TAB and select the Collins Leeds Method (CLM).

        b. Select the Thresholds (start with 80cM and subtract 10cM for each of the next few runs); leave the upper limit at 400cM, and reduce that in later runs. I uncheck Paint Midline & Include Ancestors. Then click on the Run Grouping bar. It takes about a minute to produce the three files.

        c. Open the file:  clm3d_yourname_[date,time,threshold string] Excel file. I make a copy of this file – appending the word “Working” and save it in Excel format. Open the second TAB labeled Data.

        d. Open the Master Spreadsheet and save it with the cM Threshold number (e.g. 80cM) appended to the file name.

        e. For the next 4 steps – make sure you copy to the same blank row at the bottom of your Master Spreadsheet, so the columns line up properly with the Matches.

        f. Copy columns B and C to the appropriate Master spreadsheet columns (this would be Q and R for the first 80cM run – it shifts with each subsequent run)

        g. Copy columns D, E and F to Master columns A, B and C

        h. Copy columns G, H and I to Master columns AU, AV and AW

        i. Copy column J to Master column P

        j. In the appropriate order column [S for the first run], type a 1 for the first Match, then drag this down to the last Match to create a series. [I sometimes want to recreate the original Cluster order]

        k. Use a new row to create a Header. In column O type: CLUSTER 80cM (or whatever the cM Threshold is for that run). Type 0 in the appropriate CL column and 0 in the appropriate order column. Yellow highlight this row.

        l. Use another new row to create another Header. In column P type: CLUSTER RUN 80cM (or whatever the cM Threshold is for that run). Type 1 in the appropriate CL column and 0 in the order column. Highlight this row in light grey. Copy this row so there is one for each Cluster. Drag the 1 in the CL column down to fill in the series – this provides a numbered header for each Cluster.
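For anyone who prefers to script the copying in steps f through i, here is a minimal Python sketch of the column mapping for the first (80cM) run. It is an illustration only: it treats each spreadsheet row as a plain list, takes the Master width of 49 columns (A to AW) from this post, and uses hypothetical names (`col_index`, `FIRST_RUN_MAP`, `to_master_row`). The Q and R targets shift with each later run, as noted in step f.

```python
def col_index(letters: str) -> int:
    """Convert a spreadsheet column letter (A, B, ..., Z, AA, ...) to a 0-based index."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1


# DGC Cluster-run column -> Master column, for the first (80cM) run only.
FIRST_RUN_MAP = {
    "B": "Q", "C": "R",                 # CL and Super CL
    "D": "A", "E": "B", "F": "C",
    "G": "AU", "H": "AV", "I": "AW",
    "J": "P",
}


def to_master_row(dgc_row: list, column_map=FIRST_RUN_MAP, width: int = 49) -> list:
    """Build one Master row from one DGC row; unmapped Master cells stay blank."""
    master = [""] * width
    for src, dst in column_map.items():
        master[col_index(dst)] = dgc_row[col_index(src)]
    return master


# Placeholder DGC row with ten cells (columns A to J).
row = to_master_row(list("abcdefghij"))
print(row[col_index("A")], row[col_index("Q")], row[col_index("AW")])  # d b i
```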

2. Merge duplicate Match rows. Note: this step is not used for the first (80cM) Cluster run – there is only one set of Matches, so merging is not required. In subsequent runs, the prior Matches are in the Master Spreadsheet (all those above 80cM, in the second run), and all the Matches from the new Cluster run will be added to the Master Spreadsheet (all those above 70cM). This means the prior Matches (above 80cM) will be duplicated, but in the new run they will only have Cluster data from the new (70cM) run in the CL and CL Super columns. This step will merge the duplicated Matches into one row with prior and new CL and Super CL data (and the order numbers); and delete the other row.  Here we go…

        a. Tip – sort the Spreadsheet by C [cM]. This puts all of the new, smaller cM, Matches at the top.

        b. Highlight the rest of the Match rows and sort by Name and cM and CL [for the current run]. This puts all the duplicate Matches together, with the ones from the new run (with a value in the CL column) on top.

        c. Here is a view of the spreadsheet at this point – showing the duplicate Matches on the left, and the CL 70, Super, and Order data that needs to be copied down one row. Notice also that the Matches below 80cM are still in this sort. This is OK, but be careful dragging the data down. By using the Tip above, these can be sorted out, which makes this merging step a little easier.

        d. Copy (or drag) the 3 cells (CL, CL Super and order number data) down one row and paste it into the same columns (to add it to the duplicate Match who already has some data from previous runs). Then delete the Match which you just copied from. [Tip: an alternative to deleting the rows one-by-one is to type an x in the order cell of the top Match, and later sort the spreadsheet by that column and delete all the rows with an x.]

        e. Continue to the bottom of the Matches in this Cluster run – a boring task.
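The merge in this stage can also be sketched in Python for readers who keep the data outside Excel. This is a hedged illustration, not the spreadsheet procedure itself: rows are plain dictionaries, the field names (`cl80`, `order80`, `cl70`) are made up, and duplicates are recognized by identical name and cM, so two different Matches with the same name and cM would be wrongly combined.

```python
def merge_duplicates(rows):
    """Combine rows for the same Match, keeping every non-blank cell."""
    merged = {}
    for row in rows:
        key = (row["name"], row["cm"])
        target = merged.setdefault(key, {})
        for field, value in row.items():
            if value not in ("", None):
                target[field] = value
    return list(merged.values())


# Hypothetical rows after adding a 70cM run to an 80cM Master.
rows = [
    {"name": "Match A", "cm": 95, "cl80": 3, "order80": 7, "cl70": ""},   # prior run
    {"name": "Match A", "cm": 95, "cl80": "", "order80": "", "cl70": 5},  # new run
    {"name": "Match B", "cm": 74, "cl80": "", "order80": "", "cl70": 5},  # new, under 80cM
]

for match in merge_duplicates(rows):
    print(match)
```

Match A ends up as one row with both its 80cM and 70cM Cluster numbers; Match B is untouched.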

3. Sort the Master Spreadsheet to show the Cluster Groups.

        a. Highlight all the rows of the spreadsheet (under the main header) and sort on CL and order – columns Q and S in the first run – it shifts to the right with subsequent Cluster runs.

        b. You should now have a nice looking WTCB Master Spreadsheet with a group of Matches under each grey CLUSTER RUN Header – see the sample above.  You’re ready to start working with the data.

4. Type information into columns L, M and N from the Match Notes (when available).

        a. Work down the spreadsheet, looking at the information in the Notes. For Matches with known MRCAs and/or TGs, type in the MRCA Ahnen in column L; cousinship in Column M and TG ID in column N

        b. This is populating the Master Spreadsheet with data from the Notes – this is why the Notes Homework (before running the DGC gathering program) is so important.

5. Analyze each cluster, and fill in Root Ancestors (RAs) for all (in columns F-K as needed)

        a. This is where the Ahnen system shines (using numbers instead of typing out the Ancestor names). A child’s Ahnentafel is always half the father’s Ahnentafel (or half the mother’s, dropping the remainder), so we can easily work from the MRCA Ahnen back down to a parent.

        b.  Fill in the Root Ancestor Ahnentafel numbers where an MRCA is known. Note: The “Root Ancestor(s)” are the closest ones to you – NOT including the MRCA Couple. The basic RA is a parent – using Ahnentafels, this would be a 2 or 3 (father or mother). The next RA (at Generation 3) must be a grandparent – a 4, 5, 6, or 7 – 4 and 5 are the parents of 2; 6 and 7 are the parents of 3. The line of descent (and most probably the shared DNA segment) comes from the MRCA to you along this path.

        c. Use judgment to determine the consensus RAs that would apply to all the Matches with known MRCAs. Note: if there is a Match who is clearly inconsistent with the rest, ignore or move that Match row (to a different Cluster or a “time out” area at the bottom of the spreadsheet).

        d. Copy these consensus RAs to all the Matches in the Cluster. The concept here is that each Cluster is formed around an Ancestor, and that all the Matches would have these same RAs. The stronger the Cluster consensus is, the stronger the case for the same RAs. There may be some Match anomalies, but by Walking The Clusters Back, I’ve found that the same RAs are almost always consistent.

        e. A very few Matches in a Cluster may be at odds with the consensus. This may be due to an incorrect MRCA (it happens to me and to ThruLines). It may also be due to the Matches having multiple MRCAs with me, and/or multiple segments. Check the DGC HTML file to see if there are grey cells that link the Match to another Cluster(s). When a Match appears to me to be “better” in a linked Cluster, I move their row to that other Cluster in the spreadsheet and change the CL number to match (I leave the order number in case I want to relook at the original Cluster list.)

6. Save Master WTCB Spreadsheet for this run. For each Cluster run, I save the Master Spreadsheet with a new descriptor added to the file name – like 70cM.

This is a repetitive process – go back to #1, and run a new Cluster report – keep going….
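Stage 5 (working from MRCA Ahnentafels down to consensus Root Ancestors) can be sketched in Python. This is a minimal illustration under simple assumptions: the consensus is taken as the part of the path that every known MRCA agrees on, which is stricter than the judgment call described above, and the function names are hypothetical.

```python
def root_ancestors(mrca: int) -> list[int]:
    """RAs from the parent outward, NOT including the MRCA itself."""
    path = []
    n = mrca // 2            # the MRCA's child on the line of descent
    while n >= 2:
        path.append(n)
        n //= 2
    return path[::-1]        # parent first, e.g. 170 -> [2, 5, 10, 21, 42, 85]


def consensus_ras(mrcas) -> list[int]:
    """RAs shared by every known MRCA in a Cluster (common start of all paths)."""
    common = []
    for step in zip(*(root_ancestors(m) for m in mrcas)):
        if len(set(step)) != 1:
            break
        common.append(step[0])
    return common


print(root_ancestors(170))            # [2, 5, 10, 21, 42, 85]
print(consensus_ras([20, 84, 170]))   # [2, 5, 10]
print(consensus_ras([84, 170]))       # [2, 5, 10, 21, 42]
```

The consensus list is what gets imputed (copied) to the other Matches in the Cluster. Note how one closer MRCA (20) shortens the consensus, which is one reason for dropping the closest Matches in later runs.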

Objectives of WTCB

Identify root ancestors for Clusters, and, by inference for all the Matches in them. This provides a pointer when investigating any Clustered Match. It gives direction (names, dates, places) when building a Match’s Tree back; to finding an MRCA with any Match; to researching Brick Walls.

Notes/Observations:

1. Start with a high cM Threshold, say 80cM or 90cM. I have found that reducing the cM threshold by 10cM about doubles the number of Matches in the next run – to a point. The shift from a 50cM threshold to a 40cM threshold added much more than double – so I back tracked and started using a 5cM reduction to get a 45cM run. Similarly when I got to the 30cM range, I then reduced by 4cM, then 3cM, then 2cM (for a final run with a 20cM threshold).

2. A very few Matches turned out to be anomalies – they did not “fit” in the Cluster they were assigned by DGC, based on the MRCA we had. If they had a grey cell link to another Cluster with a good fit, I moved them to that Cluster. If they didn’t appear to fit any grey cell Cluster, I moved them to a “time out” section at the bottom of the spreadsheet, with an X in the CL column. These very few Matches probably had an issue with the MRCA, that I needed to investigate. They were in “time out” so they didn’t “taint” the Cluster analysis – I could look at them later. The Cluster is talking to you – try to understand what the message is.

3. The Clusters *tend* toward a single MRCA, as the upper cM Threshold is decreased.

4. Do not be afraid to move a Match from one Cluster to another. Review alternate “grey” cells in the HTML Cluster diagram. If a Match has, say, 5 squares in a Cluster, and several grey links to another Cluster (which other Cluster is a much better “fit”), I would not hesitate to move that Match. Usually this will resolve itself in subsequent Cluster runs.

5. Excel Macro – for the task of copying 3 cells from a Match from a new Cluster run and pasting it into that Match from the previous Cluster run, and then deleting the first Match. Here are the steps:

        a. Go to File > Options > Customize the Ribbon > add “Developer” to the Main Tabs

        b. In the spreadsheet, ensure “Use Relative References” is ON [highlighted]

        c. Position the cursor on the CL cell of the top Match

        d. Click Record Macro [fill in the popup – the only critical thing is a letter for the Macro]

        e. Highlight the three numbers [in CL, Super CL, and order columns]

        f. Control-C to copy that data

        g. Click on the next cell to the right

        h. Type: x [this will let you easily delete all these rows later]

        i. Click on the CL cell in the next row (this should be the same Match from the previous run)

        j. Control-V to paste the data into three cells

        k. Click on the CL cell in the next row [to preposition the cursor for the next Match]

        l. Click Stop Macro.

        m. Save Spreadsheet with Macro Enabled

        n. Good luck – it took me several tries to get it right. Practice on a spreadsheet copy.

6. Special Note: Some close Matches have multiple MRCAs with me. They may well be related through multiple Clusters. I make a duplicate copy of that Match and add it to other Clusters per the grey cells. Once moved, I adjust the CL and Super CL columns per their new Clusters. Use judgment, but I think after about two cycles with the multiple copies of close Matches (closer than the Cluster Root Ancestors indicate), they can be eliminated from future Clusters. They have done their job of solidifying the root ancestors in other Matches.

7. I also think the maximum/upper cM Threshold needs to be reduced as the Clusters evolve. We don’t need the higher cM/closer Matches – they have already passed on their Root Ancestors to the Clusters in the Master Spreadsheet. They should be dropped from the Spreadsheet. I put an X in the CL column to remind me they are no longer needed.

8. Some Matches wind up in singleton Clusters – this is silly, a Match doesn’t form a Cluster with itself. And most of the time these Matches show a grey link to another Cluster. I move (Ctrl-X; Ctrl-V) the Match row to the other Cluster and change the CL cell to match that Cluster (so they will sort with that Cluster in the future). I sometimes also move Matches out of very small Clusters when that seems appropriate. Most of the time subsequent Cluster runs resolve these issues.

9. If a Cluster goes through several iterations without any indication of more distant RAs, there may be an MPE or brick wall involved – a strong potential clue from the data.

Manual WTCB Process

If all of this is overwhelming, you can try a few iterations using manual Clustering. Start with the Leeds Method that results in 4 Clusters, one for each grandparent. So in these 4 Clusters you already have two Root Ancestors for each [2-4, 2-5, 3-6 and 3-7, using Ahnentafels]. Find your Matches who are in the 80 to 90cM range and manually Cluster them. Start by seeing which ones are Shared Matches with the ones in the 4 Clusters – that automatically gives each one the same two Root Ancestors as the Cluster they share (actually the Matches they share). Now, from the information you know about these new Matches, do any have an MRCA at the 2xG grandparent level? This would give you the next Root Ancestor – for that Match, and that Match’s Shared Matches. Keep dropping the cM Threshold, checking Shared Matches for Cluster affinity, and using the Matches with MRCAs to tease out the next Root Ancestor for each Cluster. This is workable with a small number of Matches, but when you have 500 or 1,000 Matches to work with, you will yearn for automated Clustering…
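The Shared Match step of the manual process can be sketched as "put the new Match in the Cluster where they share the most Matches". The Python below is a minimal illustration with made-up names; the Cluster labels follow the 2-4, 2-5, 3-6 and 3-7 Root Ancestor pairs above, and `assign_cluster` is a hypothetical helper, not a feature of any tool.

```python
def assign_cluster(shared_with: set, clusters: dict):
    """Return the Cluster sharing the most Matches with a new Match, or None."""
    best, best_overlap = None, 0
    for label, members in clusters.items():
        overlap = len(members & shared_with)
        if overlap > best_overlap:
            best, best_overlap = label, overlap
    return best


# Four grandparent Clusters from the Leeds Method (hypothetical Match names).
clusters = {
    "2-4": {"Ann", "Bob"},
    "2-5": {"Cal"},
    "3-6": {"Dee", "Eve"},
    "3-7": {"Fay"},
}

# A new 85cM Match whose Shared Match list includes Dee, Eve and Bob.
print(assign_cluster({"Dee", "Eve", "Bob"}, clusters))  # 3-6
```

A Match with no Shared Matches in any Cluster returns None and would wait for a later, lower-Threshold pass.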

Tracking RAs

Some results so far:

At 60cM run: 11 Clusters, with generation 5 (G5) RAs:

        Paternal RAs: 8, 8, 9, 9, 10, and 11; Maternal RAs: 12, 12, 12, 13 and 14/15

            -The last Cluster, 14/15, is my maternal grandmother, whose immigrant parents’ families had two brothers marry two sisters, resulting in few Matches. Those are hard to separate until I can get more distant Matches.

            -I’m happy with this spread – it includes Clusters for all 8 of my Great grandparents. The WTCB is working…

            -The 70cM run had 47 Matches in 8 Clusters; the 60cM run had 75 Matches in 11 Clusters. That is about 1.6 times the number of Matches (and analytical review work), with 3 additional Clusters. My experience is that a rough doubling of Matches with each 10cM decrease in Threshold continues…

At 50cM run: 128 Matches in 24 Clusters (net, after moving several singleton Matches to Clusters they shared with other Matches).

        Paternal RAs: 8, 8, 17; 9, 9, 9, 9, 9, 18; 10, 10, 11, 22; Maternal RAs: 12, 24, 24, 24, 24; 26, 26, 26, 27, 27 and one 14/15.

            -These are broken apart quite nicely, I think. The uneven nature of the splits (not cleanly by generation, as the 4 grandparents often are) illustrates the folly of trying to find a sweet spot in the Thresholds that results in one specific generation (like we get with grandparents). I should have expected this – beyond the grandparent level the Shared cM Project shows growing overlap of cM values for a growing range of cousinships. So, this WTCB process just lives with that, and tracks the Matches as the Clusters grow in size and split apart – Walking The Clusters Back!
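
To see how unevenly the 50cM Clusters have split, the RAs listed above can be grouped under the great-grandparent line (Ahnentafel 8 to 15) each one belongs to. A small sketch, using only the numbers from this post; the helper name is made up, and the combined 14/15 Cluster is entered as 14 purely so it can be counted.

```python
from collections import Counter

# RAs of the 24 Clusters in the 50cM run, as listed above.
# Assumption for counting only: the combined 14/15 Cluster is entered as 14.
paternal = [8, 8, 17, 9, 9, 9, 9, 9, 18, 10, 10, 11, 22]
maternal = [12, 24, 24, 24, 24, 26, 26, 26, 27, 27, 14]

def great_grandparent(ahnen):
    """Halve an Ahnentafel number until it falls in the 8-15 range."""
    while ahnen > 15:
        ahnen //= 2  # the child of Ancestor n is always n // 2
    return ahnen

by_line = Counter(great_grandparent(a) for a in paternal + maternal)
for line in sorted(by_line):
    print(f"line {line}: {by_line[line]} Clusters")
```

The tally runs from 6 Clusters on line 9 down to a single Cluster for 14/15, with nothing separate yet for line 15.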

Abbreviations

Ahnen – abbreviation for Ahnentafel number – a system of numbers to represent our Ancestors [e.g. 2 for father; 13 for mother’s father’s mother] – see also this blogpost.
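
For anyone who wants to decode an Ahnentafel number, here is a minimal sketch (the function name is mine). It relies on the standard numbering rule that the father of Ancestor n is 2n and the mother is 2n+1, so the binary digits after the leading 1 spell out the path.

```python
def ahnentafel_path(n):
    """Return the relationship path for Ahnentafel number n (n >= 2)."""
    if n < 2:
        raise ValueError("Ahnentafel numbers for Ancestors start at 2")
    # Drop the '0b' prefix and the leading 1; each remaining bit is one
    # generation, closest first: 0 = father, 1 = mother.
    bits = bin(n)[3:]
    steps = ["father" if b == "0" else "mother" for b in bits]
    return "'s ".join(steps)

print(ahnentafel_path(2))   # father
print(ahnentafel_path(13))  # mother's father's mother
```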

CL – Cluster, or Cluster Run [usually combined with a number representing the lower cM Threshold]

Czn – Cousinship – how we are related to a Match. Second Cousin is abbreviated 2C; 5th cousin once removed: 5C1R.

DGC – DNAGEDcom Client – an automated Clustering program – runs from your computer.

MRCA – Most Recent Common Ancestor – this is usually a couple that you and a Match have in common. Usually represented by the Ahnentafel of the husband, but we really don’t know which parent (husband or wife in the MRCA couple) the shared DNA came from.

RA – Root Ancestor – the Ancestors you have leading up to the MRCA. This should always include your parent and grandparent (each is an RA). During the WTCB process, the number of RAs will generally increase (adding generations), increasing the ancestral “focus” for each Cluster.

TG ID – Triangulated Group Identification Code – see this blogpost.

WTCB – Walking The Clusters Back – the process discussed in the post which helps determine the MRCA of most Clusters – sort of a Leeds method on steroids.

Final Thoughts

This WTCB process uses the power of Clustering to link large groups of Matches to specific areas of your Ancestry. As the process develops, the Clusters become more and more precise on the path back to an MRCA. There are only two options for each Cluster going back another generation – going back on the paternal side or the maternal side. Larger Clusters with more distant Matches tease this information out of the data. The Homework is essential – recording what you know in the Notes. The work is sometimes tedious, but the end result is very powerful.

I’m confident this process will tell us some Root Ancestors for all of our Matches down to 20cM. Just think what we could do with those clues…

Feedback on this process and suggestions for improvement are welcome.

[19N] Segment-ology: Walking The Clusters Back by Jim Bartlett 20220822

Segment Data for Ancestry Matches

A Segment-ology TIDBIT

Genetic Genealogy has two main parts: genetic – the Shared DNA Segments; and genealogy – the Most Recent Common Ancestor (MRCA) with a Match. In a perfect world we link a Match and his/her Shared DNA Segment to the MRCA who passed it down to both of us.

Shared DNA Segments can be found for Matches at 23andMe, FamilyTreeDNA, MyHeritage and (by uploading our raw DNA data file) at GEDmatch. Unfortunately, none of those companies have nearly as many good Trees as Ancestry has. So finding MRCAs there is hard.

Finding MRCAs is best done at AncestryDNA – many more people have tested there, and more of our Matches have good, in-depth Trees there. Unfortunately, AncestryDNA does not provide the precise Shared DNA Segment data that the other companies do.

The best outcome is Matches with both MRCAs and Shared DNA Segments. I’ve run out of patience looking for MRCAs at FTDNA, MyHeritage, 23andMe and GEDmatch. Instead I am now looking for DNA segment data for my thousands of Matches at Ancestry with MRCAs.

This post will cover ways to get Segment data for AncestryDNA Matches – there are several:

1. Click on the Match name to bring up their profile – some have already uploaded to GEDmatch and list their Kit number in their profile.

2. Message the DNA Matches and request, suggest, cajole them to upload their raw DNA data to GEDmatch. I wrote a blogpost, here, about doing this. I’ve messaged many Matches requesting that they upload to GEDmatch. A few have…  The best results occur when I include my email address and promise to report back my findings and to help them with autosomal DNA.

3. Ask the DNA Match if they have tested at one of the other companies, and what is their user name there. Some have…  I’ve tested at all the companies and can usually find them.

4. Try to find the Ancestry user name at GEDmatch or vice versa. It sometimes works.

However, in looking at my GEDmatch One-to-Many list, I see many more Ancestry kits that I have not yet linked to Ancestry names. Many folks use very different names.

NB: Large segments (say over 30cM) will usually be about the same cM at AncestryDNA and at the other testing companies/GEDmatch. However, many segments below 30cM have been “Timbered”, and Ancestry then reports a smaller segment than the other companies report. You can always click on the “segment” line on their Match page and see what the “unweighted” cM value is – this is usually fairly comparable to what you see at GEDmatch. It’s a good idea to check this when there is an apparent discrepancy.

A better way – a Segment-ology TIDBIT

1. At GEDmatch Tier 1, run the One-to-Many list. When I set the limit to 1,000 Matches, the smallest Match shares 22.6cM – a good place to start.  NB: By default, this list is sorted with the Matches with the most shared DNA at the top.

2. Sort the list on the Source column (it has the source of the DNA test data).

3. Scroll down the list to the beginning of the Ancestry kits. NB: these Ancestry Matches are still listed with the largest total cM at the top.

4. Work down this Ancestry list one by one, trying to find the Match at Ancestry. The closest ones at the top of the GEDmatch One-to-Many list are usually the easiest to find near the top of your AncestryDNA list of DNA Matches. Usually the largest Matches (most cM) will have the same total Shared DNA cM at GEDmatch and AncestryDNA – so even if the names are different, it’s often easy to find the right one at AncestryDNA.

5. As you go down the list, the AncestryDNA cM total tends to be smaller than the GEDmatch total, due to the Timber down-weighting. NB: you can always click on a Match’s AncestryDNA cM total to see what the unweighted total would be – it is usually pretty close to the GEDmatch total.

6. By working down both lists (the GEDmatch list and the Ancestry list), I’ve found they are roughly in the same order. And, through a combination of cM amount, user names and email addresses, I’ve been able to find most of the top GEDmatch Matches at Ancestry. If there is some doubt, I’ll look at the Shared Matches at Ancestry to see if any grouping would provide a clue. UPDATE: GEDmatch info puts the Match in a TG – look in that TG for other Ancestry Matches, then search Ancestry for one of those Matches and scroll down their Shared Matches for a likely link (this is generally a somewhat shorter list).
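
The hunt in steps 4 to 6 can be partly mechanized if both lists have been copied into simple records. The sketch below is only an aid for shortlisting, not a replacement for judgment. Everything in it is an assumption for illustration: the dict keys, the 10% cM tolerance and the sample names are all made up, and it compares the GEDmatch total against the unweighted (pre-Timber) AncestryDNA cM.

```python
from difflib import SequenceMatcher

def best_ancestry_candidates(kit, ancestry_matches, cm_tolerance=0.10, top=3):
    """Shortlist Ancestry Matches that could be the person behind a kit.

    kit: dict with hypothetical keys 'name' and 'total_cm' (GEDmatch).
    ancestry_matches: list of dicts with hypothetical keys 'name' and
        'unweighted_cm' (the pre-Timber value shown at AncestryDNA).
    cm_tolerance: allowed relative cM difference (an arbitrary 10% here).
    """
    candidates = []
    for m in ancestry_matches:
        diff = abs(m["unweighted_cm"] - kit["total_cm"]) / kit["total_cm"]
        if diff > cm_tolerance:
            continue  # cM totals too far apart to be the same person
        name_score = SequenceMatcher(
            None, kit["name"].lower(), m["name"].lower()
        ).ratio()
        candidates.append((name_score, -diff, m["name"]))
    # Best name similarity first; a closer cM total breaks ties.
    candidates.sort(reverse=True)
    return [name for _, _, name in candidates[:top]]

# Made-up example data.
kit = {"name": "J. Smith", "total_cm": 48.2}
ancestry = [
    {"name": "John Smith", "unweighted_cm": 47.0},
    {"name": "rivergirl52", "unweighted_cm": 49.0},
    {"name": "Mary Jones", "unweighted_cm": 95.0},
]
print(best_ancestry_candidates(kit, ancestry))
```

When the user names are very different, the name score is of little use and the shortlist is really just "everyone with about the same cM"; that is where the Shared Matches and TG clues in step 6 take over.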

So far I’ve been able to link over 90% of my top GEDmatch kits with my Ancestry Matches. It’s easy to determine the TG at GEDmatch, and I put the TG ID in the Match Notes. Even if I cannot determine an MRCA with the Match at Ancestry, the Notes are invaluable in the Shared Match lists – they clearly form Clusters in most cases.

In just a few hours, I’ve been able to link over 100 Ancestry Matches to TGs. It will get harder as the segments get smaller and more scrolling is necessary at Ancestry to find a “fit”. But this process is worth the work, IMO, as it adds TGs to Matches at Ancestry. It adds evidence about the true ancestral line for each TG.

[22BH] Segment-ology: Segment Data for Ancestry Matches TIDBIT by Jim Bartlett 20220706