About Jim Bartlett

I've been a genealogist since 1974, and I started my first Y-DNA surname project in 2002. Autosomal DNA is a powerful tool, and I encourage all genealogists to take a DNA test.

Manual Clustering to Find Ancestors

Below I will outline a process to find a Target Ancestor (TA) – often a Bio-Ancestor, or a brick-wall Ancestor, or maybe to confirm an “iffy” Ancestor. This is a follow-on to Manual Clustering From the Bottom Up. But first, here is a little background.

DNA – We all get exactly 1/2 of our autosomal DNA (atDNA) from each parent. We get pretty close to 1/4 from each grandparent; 1/8 from each Great grandparent; etc. Yes, beyond parents, these fractions are not exact, but for genealogy they are pretty close. The point is that for several generations going back, we get a lot of DNA from each Ancestor – and roughly the same amount from each one in any generation.
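As a quick check of those fractions, here is a minimal Python sketch. The function name `expected_share` is my own, for illustration only, and beyond parents the values are averages rather than exact amounts.

```python
from fractions import Fraction

def expected_share(generations_back: int) -> Fraction:
    """Average fraction of atDNA from one Ancestor n generations back (1 = parent)."""
    return Fraction(1, 2 ** generations_back)

# Parent, grandparent, Great grandparent
for n, label in enumerate(["parent", "grandparent", "Great grandparent"], start=1):
    print(label, expected_share(n))
```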

Matches – All things being equal, we would get roughly 1/2 of our Matches through each parent; 1/4 through each grandparent; etc. But all things are often not equal:

            1. We tend to get fewer Matches from Ancestors who are recent immigrants (say in the last 4-5 generations). It’s because most test takers are Americans.

            2. We will get fewer Matches from “skinny” families – an Ancestor who had 2 children will have far fewer descendants (and Matches) than an Ancestor who had 15 children.

            3. Endogamy results in many more Matches than usual (think Jewish, Mennonite, Polynesian, etc.)

Each of these factors will unbalance, or skew, the number of Matches we get for each Ancestor.

For the purposes of this process, I’m going to assume that our Matches are generally spread fairly evenly over our bio-Tree. To the extent this isn’t true in your case, this process may be more complex, or it may not even work.

When we are looking for a TA, the concept is that a good chunk of our DNA came from that Ancestor (depending on the number of generations back), and a good chunk of our Matches will be cousins – either to or through that Ancestor.  Although the Ancestor is not known to us, the TA did have 2 parents, 4 grandparents, etc., and those more distant Ancestors may be well known to our Matches. NB: an immigrant Ancestor may throw us a curve ball here.

The process overview is:

1. Group Matches;

2. Find the Common Ancestor (CA) in each group;

3. Build down from the CA to find links between groups (usually, but not always, a marriage);

4. Build down from those couples;

5. Repeat as necessary (usually down to parents or grandparents of the TA);

6. The end-game may involve date and location issues and further DNA target testing to isolate and identify the TA.

More step-by-step details:

Step 1: Group Matches – this is basically Manual Clustering at AncestryDNA. Start with the Match list from 400cM down. How far down depends on the generation of the bio/brick-wall Target Ancestor (TA). You want to go back 2 more generations. So, if the TA is one of 4 grandparents (1C level), you’ll want Matches who are 3C level, say a 50cM lower threshold. You want 16 groups. NB: if you can filter out some Matches – say you know one “side” and can identify those Matches – you can cut the groups to 8. And you may be able to quickly identify 4 of those groups to a known grandparent. This would leave you with the 4 groups that represent the 4 grandparents of the TA.
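The group arithmetic in Step 1 can be sketched in a few lines of Python. This is only an illustration; `groups_needed` and the variable names are hypothetical, and it assumes one group per Ancestor at the grouping generation.

```python
def groups_needed(ta_generation: int, extra_generations: int = 2) -> int:
    """Number of Ancestors (one group each) at the generation we group on.

    ta_generation: 1 = parent, 2 = grandparent, 3 = Great grandparent ...
    """
    return 2 ** (ta_generation + extra_generations)

total = groups_needed(2)    # TA is a grandparent: 16 groups
one_side = total // 2       # the known "side" filtered out: 8 groups
ta_groups = one_side // 2   # the known grandparent's 4 groups set aside: 4 groups left
print(total, one_side, ta_groups)
```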

1A. List these Matches in a spreadsheet or write on a piece of paper

1B. Select a Match about 3/4 of the way down the list [avoid starting at the top!]

1C. Open that Match’s Shared Match (SM) list

1D. Put an A next to that Match and each Match on your List who is on the SM list. This forms a Manual Cluster A, which tends to have a Common Ancestor (CA).

1E. Start over at Step 1B, selecting a Match who is not A, and use B.

1F. Repeat as often as necessary, using new letters, until all Matches on your list have at least one letter. NB: The Matches at the top of your list may wind up with multiple letters.

1G. If lower cM Matches have multiple letters, review their SM list – usually one of the letters is a one-of-a-kind and that letter can be deleted. If there is a lot of overlap between two letters, they can be combined, using one letter. Use judgment.

Usually this Step can be done in a few hours.
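For anyone who prefers to see Steps 1B through 1F as logic, here is a minimal Python sketch. It is not a tool I use; the function name, the data layout, and the choice to wrap around to the top of the list after reaching the bottom are all assumptions made for illustration. Step 1G (pruning and merging letters) is left to judgment, as above.

```python
def manual_cluster(match_list, shared_matches, start_fraction=0.75):
    """Letter Matches by their Shared Match (SM) lists.

    match_list: Match names sorted from highest to lowest cM.
    shared_matches: dict mapping a Match name to the set of names on its SM list.
    Returns a dict mapping each Match to the list of letters it received.
    """
    letters = {name: [] for name in match_list}
    start = int(len(match_list) * start_fraction)
    # Begin about 3/4 of the way down the list, then wrap around to the top.
    order = match_list[start:] + match_list[:start]
    next_letter = ord("A")
    for seed in order:
        if letters[seed]:
            continue  # already lettered; the next seed must have no letter yet
        label = chr(next_letter)
        next_letter += 1
        letters[seed].append(label)
        for other in shared_matches.get(seed, set()):
            if other in letters and label not in letters[other]:
                letters[other].append(label)
    return letters

# Made-up example: M1 is a top Match who ends up with two letters.
example = manual_cluster(
    ["M1", "M2", "M3", "M4"],
    {"M4": {"M1", "M2"}, "M3": {"M1"}},
)
print(example)
```

The sketch uses single letters only, so it would need a different labeling scheme for more than 26 groups.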

Step 2: Find CA for each Group – this takes some poking around…

2A. Select a Group, and open any available Trees (including Unlinked Trees)

2B. Type/write next to the Match the closest 10-15 surnames

2C. Repeat for as many Matches in the Group as possible

2D. Look/search among the surnames for common surnames

2E. Open the Match Trees and select Ancestor information with the common surnames

2F. Analyze and record the probable Common Ancestor for the Group [if necessary, look at more SMs for the Group Matches for confirmation of the CA]

2G. Repeat for each Group

2H. Note the place/date-range for each CA [these may be a clue to links between Groups]

This Step will take a little longer, depending on whether you want a quick result, or if you want to document the CA with records for the longer term.
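Steps 2B through 2D amount to a tally of surnames across the Trees in a Group. A minimal Python sketch follows, with a hypothetical function name and made-up surnames; it assumes the surnames have already been typed next to each Match.

```python
from collections import Counter

def common_surnames(group_surnames, min_matches=2):
    """Find surnames that appear in more than one Match's Tree.

    group_surnames: dict of Match name -> list of surnames from that Match's Tree.
    Returns (surname, count) pairs, most common first.
    """
    counts = Counter()
    for surnames in group_surnames.values():
        # Normalize spelling case and count each surname once per Match.
        counts.update({s.strip().upper() for s in surnames})
    return [(s, n) for s, n in counts.most_common() if n >= min_matches]

group_a = {
    "Match 1": ["Smith", "Jones"],
    "Match 2": ["SMITH", "Brown"],
    "Match 3": ["smith", "Jones "],
}
print(common_surnames(group_a))
```

A repeated surname is only a lead; Steps 2E and 2F still require opening the Trees to confirm it is the same family.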

If you recognize some Groups as being from known Ancestors in your Tree, they can be set aside. Ideally you want to end up with 4 Groups who would represent the 4 grandparents of the TA.

Step 3. Build down from the CA to find links between Groups – a genealogy exercise…

3A. Use genealogy tools to list the children, spouses, and grandchildren of the CA

3B. Pay particular attention to dates and places

3C. Sometimes a marriage between Groups will pop right up; but sometimes it takes a process of elimination (dates/places help here). It’s possible the bio-parents were not married; or other scenarios. You’ve narrowed the possibilities down a lot, but sometimes, there just isn’t a record of what really happened.

3D. Repeat for other groups

3E. Once you have linked some Groups (by marriage or by place/date or by ethnicity, etc), this helps link the remaining Groups.

If records exist, this Step may follow relatively easily; if not, follow-on DNA testing may be necessary.

Bottom line: This Step will provide some family lines that are ancestral to the TA. The top DNA Matches have led you to specific CAs. These may, or may not, mesh with information you already had.

Steps 4 & 5 – see Step 3

Step 6. The end-game – This may involve date and location issues and further DNA target testing to isolate and identify the TA. The best solution is that the TA is obvious. However, sometimes the TA is still buried, but you are somewhat closer.

Sidebar: I do this manually in Excel and Word. It is possible to use one of the auto-Cluster programs to group the Matches. However, I prefer “getting to know” the Matches and their Trees, and this process is fairly straightforward. It also lets me see any overlap between groups. I prefer to manually Cluster for a targeted case. I use the auto-Cluster programs when I’m grouping my entire Match list.

In one recent case I did, the marriage between two Groups popped right up – no secrets. The children included 5 sons, each a potential bio-father. All five were from PA and went into WWII; 4 came back to PA, and one settled in another state a few blocks from the bio-mother!! We’d never have sorted this out without the process above. And luck is sometimes the key factor.

In another case I’ve worked on for years, I used SMs and records to group many Matches on 8 Great grandparents, and 4 grandparents of the TA. Places and dates all work out, and all the top Matches are in agreement. WATO points to the same place. It now appears the father and mother were not married, and both of them apparently died without any other issue, or any records. It’s frustrating to have basically 100% of the Matches all pointing at the same TA, but without revealing the parents’ names. No luck on this one – just a lot of work. Maybe someday a Newspaper article from the 1880s will shed some light. The DNA can only do so much…

SUMMARY – The process above is my current best practice to squeeze out what I can from Matches and Shared Matches at AncestryDNA. This whole process can be done on notebook paper, in a relatively short time, but I still prefer Excel. Note that the process does not depend on knowing any genealogy of the TA; it relies totally on information from Matches and Shared Matches. Hopefully the TA, the last puzzle piece, “fits”.

[19M] Segment-ology: Manual Clustering to Find Ancestors by Jim Bartlett 20220226

Manual Clustering From the Bottom Up

Clustering DNA Matches results in groups that tend to form on one Ancestor. Clustering is a great tool for grouping our Matches. And, if we can figure out the Ancestor for the Cluster, there is a very high probability that the rest of the DNA Matches in the Cluster will also have the same Ancestor. In this case the Cluster becomes a “pointer” or a “focus” for investigating the rest of the Matches in that Cluster. This is powerful. In several cases, I’ve been able to use this focus to find a Common Ancestor with a Match who had only one parent in their Tree! I knew the who, what, when and where for my search… Of course, it’s always easier with closer Match cousins, but I’ve been dogged, and successful, even when I needed to build the Match’s Tree out a number of generations back.

There are Auto-Clustering tools which I covered here. Several of these have been improved since I wrote that post in April 2019. 

However, for relatively straightforward tasks/issues (like finding a bio-Ancestor, or tackling a specific Ancestor), we can also manually Cluster our Matches. The classic example is the Leeds Method: the closest Matches (90-400cM range), will usually form into 4 groups (Clusters) which align with our 4 grandparents (which Cluster is which grandparent is still a genealogy task). This usually works very well because the 90-400cM Matches tend to be 2C and 3C whose MRCAs should align with the grandparent level.

In the following case, I was looking for two Great grandparents. The subject’s maternal side was known (and was from a different continent). His father was known to be his bio-father, but his paternal bio-grandparents were unknown. At 23andMe he and I share 40cM – and our Y-DNA is the same unusual E-V13. So, I was sure his male line was a BARTLETT. A quick search of his AncestryDNA Matches showed many WV BARTLETT Matches (at least 17 ranging from 30 to 269cM) – and a clear “hot spot” in my fairly extensive BARTLETT Tree. But the “spouse” was not readily apparent, nor were the other surnames that had to populate his father’s ancestry.

I decided to use Manual Clustering. It was easy to “dot” the maternal-side Matches from another continent, leaving only the paternal-side Matches (at AncestryDNA). I decided to list them down to about 50cM – this would include most of the 3C and 4C. Note: 3C Matches would have 2xG grandparent *couples* as the MRCA, which would identify a Great grandparent – I am looking for four Great grandparents, one of whom is probably a BARTLETT. The 4C Matches would potentially take me back another generation – but that’s OK. The surnames I’m looking for must fill the ancestral boxes from 2 grandparents going back.

So, I typed the top paternal-side Matches in Column A of a spreadsheet, and put the cMs in Column B (for reference). All that remained was to pick a Match, put an A in Column C, open the Shared Match list, and add an A in Column C for each Match who was a Shared Match. Then put B in column D for a Match who did not have an A in Column C, open that Shared Match list and add Bs in column D for each Match who was a Shared Match. Continue. I actually blogged about this process (Think Icicles!) in 2018 here, and in 2019 here. But it didn’t work very well. For one thing there was too much overlap.

As I thought about this process again, it struck me that instead of icicles, I should have used a stalactite analogy. Stalactites hanging from a cave ceiling might have given me a clue to the problem. The cave ceiling was more like a very close relative whose DNA was spread over many different lines (different stalactites). I should not have started with the top of the list of Matches. The top of the list are the Matches with multiple segments which can represent multiple stalactites. Those large Matches have an affinity for several Clusters (but can only be placed in one of them). In a Cluster Matrix, they would have a lot of gray cells. Maybe the trick is to start closer to the bottom of the list and work up…  It’s more like working with stalagmites, where there is only one for each source of dripping water.

It actually worked much better!

I started with a 60cM Match, and typed an A (in Column C) for that Match and all the Shared Matches. Then I selected the next Match down the list without an A, and typed a B (in column D) for that Match and all the Shared Matches. By the time I got to the bottom of the list, I had 7 groups (A through G) with only a couple of overlaps. They looked like Icicles or Stalagmites… The overlaps were clearly one-time events which could be ignored. I then worked my way back up the list – starting with the 61cM Match. Most of them clearly fell into only one of the 7 groups, while some of the larger Matches began having Shared Matches in two or three groups. This almost always means that these Matches are 2C or 3C who will span multiple groups (an important clue for those Matches).
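The two passes described above can be sketched in Python. This is an illustration under my own assumptions, not a program I ran: the function name and data layout are hypothetical, the first pass letters only Matches at or below the starting point, and the second pass gives each larger Match the letters of its Shared Matches from that lower list.

```python
def bottom_up_cluster(matches, shared_matches, pivot_cm=60):
    """matches: list of (name, cM); shared_matches: dict of name -> set of names."""
    ordered = sorted(matches, key=lambda m: m[1], reverse=True)  # highest cM first
    lower = [name for name, cm in ordered if cm <= pivot_cm]
    upper = [name for name, cm in reversed(ordered) if cm > pivot_cm]  # 61cM first, then up
    lower_set = set(lower)
    groups = {name: set() for name, _ in matches}
    next_letter = ord("A")
    # Pass 1: from the pivot down, seed a new letter for each Match that has none.
    for seed in lower:
        if groups[seed]:
            continue
        label = chr(next_letter)
        next_letter += 1
        groups[seed].add(label)
        for other in shared_matches.get(seed, set()):
            if other in lower_set:
                groups[other].add(label)
    # Pass 2: from just above the pivot up, inherit letters from lower Shared Matches.
    for name in upper:
        for other in shared_matches.get(name, set()):
            if other in lower_set:
                groups[name] |= groups[other]
    return groups

# Made-up data: "big" is a close Match who spans two groups.
demo = bottom_up_cluster(
    [("big", 200), ("u1", 61), ("l1", 60), ("l2", 55), ("l3", 52), ("l4", 50)],
    {"l1": {"l2", "u1", "big"}, "l3": {"l4", "big"},
     "u1": {"l1", "l2"}, "big": {"l1", "l3", "l4"}},
)
print(demo)
```

In this made-up data the 61cM Match lands in one group, while the 200cM Match picks up two letters, which is the pattern described above for 2C and 3C Matches.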

The next step was to look at all the available Trees and type the closest, say, 8 surnames in the row for the respective Matches. I then used a Word document (scratch paper would also work) to outline Trees for duplicate surnames. I was looking for the Ancestors of a man born c1900 in Harrison Co, WV. The Trees of the 50cM-and-above Matches tended to be from the same area, and ranged through the 1800s as expected. I outlined families for 8 surnames. Most of them interconnected, and I was able to go back to the groups, and, knowing what I was looking for, I teased out a number of additional Trees that linked. In this process, I also found the intermarriages between the groups. As it turns out this case had an extra degree of difficulty. All of this data and the Trees pointed to a man who never married and a woman who never married. Quite possibly the bio-father never knew…

SUMMARY

Manual Clustering of the top Matches is a relatively simple task. In this case, it involved about 65 Matches, ranging from 50cM to 269cM. KEY: Working from 60cM down – grouping Shared Matches by letters in a spreadsheet – resulted in 7 groups. Then, working from 61cM up, it was pretty easy to add those Matches to the extant groups (a few to multiple groups). It didn’t take long to open the available Trees and note the closest surnames. Duplicate surnames in a group led to skeleton family outlines for most of the groups. This then provided a “pointer” to relook at, and extend, small Trees and Unlinked Trees, to build out the outlines some more. By that point it was clear which families had married the other families (the closest Matches were a big help here). And so the Ancestry was built. One Quality Control check is to search for other Matches who have these Ancestor surnames in their Trees – particularly finding the MRCAs with Matches below 50cM. Once you know, or even think you know, the Ancestors, the Matches should also have those ancestral lines.
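The Quality Control check can be pictured as a simple filter over a Match list. The Python below is a hedged sketch with a hypothetical function name and made-up records; it assumes each Match's Tree surnames have been collected by hand.

```python
def qc_candidates(matches, ancestor_surnames, max_cm=50):
    """Find smaller Matches whose Trees carry the surnames we now expect.

    matches: list of dicts with 'name', 'cm', and 'surnames' keys.
    Returns (name, cM, matching surnames) for Matches below max_cm.
    """
    wanted = {s.upper() for s in ancestor_surnames}
    hits = []
    for m in matches:
        found = wanted & {s.upper() for s in m["surnames"]}
        if m["cm"] < max_cm and found:
            hits.append((m["name"], m["cm"], sorted(found)))
    return hits

sample = [
    {"name": "Match X", "cm": 32, "surnames": ["Smith", "Brown"]},
    {"name": "Match Y", "cm": 28, "surnames": ["Green"]},
    {"name": "Match Z", "cm": 75, "surnames": ["Smith"]},
]
print(qc_candidates(sample, ["SMITH", "JONES"]))
```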

[19L] Segment-ology: Manual Clustering From the Bottom Up by Jim Bartlett 20220215

Ahnentafel 38P – John ALLEN Family

What a hoot! This is a story of discovery including the confluence of many facets of genetic genealogy. It’s a small piece of my genealogy, but it illustrates how many different tools work together. Here is a list of teasers….

1. A genealogy note from the 1980s

2. A Triangulated Group [12D24]

3. A DNA Match in TG [12D24] looking for her bio-father

4. Another DNA Match in TG [12D24], also searching for our Common Ancestor

5. An ALLEN Ancestor with a “sketchy” Tree.

6. My Ancestry focused on VA/WV, and nothing west of there

Without further ado – here’s the story.

The focus is on TG [12D24]. I have 88 different Matches in this TG, including a half 1C; a 2C; a 2C1R and a 3C1R. These four cousins are all on the same line from my father, to his father, to his mother, to her parents – OR: P,P,M,CA – OR: 2, 4, 9, 18 (using Ahnentafel numbers). A18 represents my Ancestral couple: John NEWLON b 1798 Loudoun Co, VA; d 1872 Harrison Co, WV and Marie ALLEN b 1805 Monongalia Co, VA; d 1882 Harrison Co, WV. Despite a lot of research among the other 84 Matches in TG [12D24], I’ve not been able to find a more distant Common Ancestor.  FYI: John NEWLON’s parents were Thomas NEWLON 1767-1813 and Susan CUMMINGS c1772-c1805. I have found many Common Ancestors on these NEWLON and CUMMINGS lines. Maria ALLEN’s parents were Joseph ALLEN c1775-1848 and Elizabeth [maiden name not proved].  I have not found very many Matches with Common Ancestors on this ALLEN line – clearly ALLEN is a high potential for [12D24].

One of the Matches in [12D24] has been looking for her bio-father for several years. She recently found a good candidate and built a Tree for him. I found a Common Ancestor on her new paternal line – we are 4C1R on my SHIELDS/FINLEY line. Well… Per the paper trail, we are 4C1R on that line, but SHIELDS/FINLEY are on my maternal side, so I could not have gotten the [12D24] (paternal) segment from them. Dead end!

This points out the value of combining the genealogy and DNA – had it not been for the TG [12D24] segment on my paternal side, I might have been content to just accept the SHIELDS/FINLEY Common Ancestor couple, and not continued the search in earnest. Worse – I could have gone down that rabbit hole, and searched the other 83 grouped Matches for those same lines…

Enter another Match in TG [12D24]… She and I had corresponded years before and tried to discover our Common Ancestor. She contacted me again, and also knowing our adoptee Match had now updated her Tree with new information, noted that she had an ALLEN in her Tree. I quickly checked – her Ancestors were: Garrett G MYERS 1850-1906 married to Lucinda ALLEN b 1856 IA (no parents).

Well, an ALLEN in IA was pretty far from my WV ALLENs… However, it was a potential surname link, so I researched some more. Lucinda’s parents appeared to be Joshua ALLEN b 1816 VA and Eleanor LANE b 1821 OH. OK, Joshua was born in Virginia – maybe this was the link. However, as I researched Joshua, all the Trees had him as son of Joshua, son of Joshua, son of (Joshua) Barnes ALLEN (and Eve SWIGER). I am well aware of Barnes ALLEN and Eve SWIGER as the parents of several children who intermarried with descendants of my BARTLETT line in WV – but they were not my Ancestors. And in calling up my old research notes on the Barnes ALLEN family, I did not show a son Joshua b 1816. This whole connection looked like: “well, all the first names look the same, and were in Virginia, and I don’t see anything else that might fit, so let’s just say that’s it.” My guess is that many of you have seen this kind of “genealogy research” before. And you’ve also seen virtually everyone else copy that same information.

You can call it a hunch, or confirmation bias, or whatever you want, but the fact that I had a TG [12D24] stuck at the 3C level, and probably going back on the ALLEN line, kept tugging at me. [NB: I wasn’t completely blind to the possibility that the link might be on ALLEN’s wife’s side.] I believe in DNA “magnetism” – the DNA tends to draw the puzzle pieces together, but I needed some more information.

So, I dredged up my old ALLEN research file – research that I had done in the 1980s. During the 1930s Depression, a Harrison Co, WV newspaper asked readers to submit Family Group Sheets. Many thousands responded. I remember going to an office in Clarksburg, WV filled with floor to ceiling bookcases with 5-inch binders filled with the Family Group Sheets (today over 30 microfilm rolls). I transcribed as many as I could over a 2-day period. Among my notes from that trip were the John ALLEN family. It was very sketchy – descendants reported some of his children and noted that several had moved away – including a note about son, Joshua ALLEN who “went west to Iowa, never to be heard from again”.  Could this really be the Joshua ALLEN b 1816 VA, father of Lucinda ALLEN b 1856 IA m Garrett MYERS – great grandparents of my DNA Match who found her bio-father? If so, we would be 4C – a very reasonable relationship. And on my father’s side!

I quickly went to my list of DNA Matches at Ancestry and searched for ALLEN in Iowa. BINGO! I could trace at least 4 of them back to Joshua ALLEN b c1816 VA who had at least 8 children born in IA. None of my Matches knew the parents of Joshua ALLEN.

This looks very promising to me. Next steps:

1. Find some DNA Matches who descend from John ALLEN’s other children “who went west”.

2. Message Matches at other companies who are in TG [12D24], and ask if they had any ALLEN ancestors. I did this with two TGs from my HIGGINBOTHAM line and got several dozen positive responses. Many Matches at the other companies don’t have Trees, but I’ve found they are more likely to respond to such a pointed question.

3. Find some of my Matches who have Barnes ALLEN and Eve SWIGER ancestors, and ask them to search their own Matches for ALLEN + Iowa (I’m betting they won’t find Joshua ALLEN b 1816…)

SUMMARY

You won’t have exactly this same confluence of facts. The point is to think about the possibilities with the information you do have and test it out. Make a reasonable assumption (a hypothesis) and try it out. Think about the way to reinforce it, and to refute it.

[23-38P] Segment-ology: Ahnentafel 38P – John ALLEN Family by Jim Bartlett 20220126

The Life of a DNA Segment

As readers of this Segment-ology blog understand very well, DNA segments are passed down from our Ancestors, through a line of descent, to one of our parents, and from that parent to us. Well… you might say: I thought our parents passed down whole chromosomes to us – 23 of them to be exact. That’s correct – and each chromosome is made up of many segments. Even the segments are made up of segments. While a parent passes down 23 chromosomes to us, those 23 very large segments, are actually made up of about 57 segments from our grandparents; and 91 segments from our great grandparents – see Figure 3 in my post “Crossovers by Generation” here. From a theoretical perspective, these segments can start and end anywhere and there are infinite possibilities. But for each of us, all of our segments are fixed before we were born – they are very specific in our body.
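The two counts quoted above each grow by the same amount over the previous generation (57 - 23 = 34, and 91 - 57 = 34). A minimal Python sketch of that pattern follows; the constant of 34 new segments per generation is my inference from those two numbers only, and extending it to further generations is an extrapolation.

```python
CHROMOSOMES = 23
ADDED_PER_GENERATION = 34  # inferred from 57 - 23 and 91 - 57; an assumption

def segments_from_generation(generations_back: int) -> int:
    """Approximate number of segments one parent passes down, counted by
    the generation they trace to (1 = the parent, 2 = grandparents ...)."""
    return CHROMOSOMES + ADDED_PER_GENERATION * (generations_back - 1)

for g in (1, 2, 3):
    print(g, segments_from_generation(g))
```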

For this post, I want to look at the life of a DNA segment which is represented by a Triangulated Group (TG).  You will recall a Triangulated Group is a segment of our DNA identified by a Chromosome, Start Location, and End Location – it’s a specific part of a chromosome passed to us by one parent. This is real DNA in our body – a long string of millions of base pairs, usually represented by over a thousand unique SNPs (which are the markers which are actually measured in a DNA test). We can “see” this segment of our own DNA because of overlapping shared DNA segments with various Matches.  Because multiple Matches share the same long string of SNPs with us, we understand that this DNA segment had to come from a Common Ancestor (CA) to us, and to each of our Matches in the TG. The science tells us that that’s the only way we would have gotten such long segments in common.

So, exactly where did this TG segment come from?

Which of our Ancestors first had that segment – the Most Distant Common Ancestor (MDCA)?

Which generation back?

How was it formed?

What did it originally look like?

How was it transformed into the TG segment we now see?

Are some of our Matches in a TG actually related further back than the MDCA?

Are some of our Matches in a TG actually closer cousins than the MDCA?

Read on – all these questions will be answered in The Life of a DNA Segment.

Some ground rules – we are not talking about small segments – let’s use at least a 15cM IBD segment. This is not hypothetical DNA in a mathematical simulation of many possibilities and variations with some distribution curve. We are going to look at a specific segment that was identified by a Triangulated Group – it came from one of our Ancestors. We start with a real DNA segment in our body.

Let’s just start with an example and work from there. Let’s say my TG-segment is:

Paternal Chr 04 from 20Mbp to 45Mbp.  My TG ID would be 04C2 – meaning it’s on Chr 04, starting about 20-30Mbp, and on my father’s side [2]. [You can review ID codes here].
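To show how such an ID could be put together, here is a small Python sketch. It assumes one letter per 10Mbp block of start location (A for 0-10Mbp, B for 10-20Mbp, C for 20-30Mbp), which agrees with the 04C2 example but is otherwise my assumption; the function name is hypothetical, and the side code is simply passed in as text.

```python
import string

def tg_id(chromosome: int, start_mbp: float, side_code: str) -> str:
    """Build an ID like 04C2 from chromosome, start location, and side code."""
    block = int(start_mbp // 10)            # 0 for 0-10Mbp, 1 for 10-20Mbp, 2 for 20-30Mbp
    letter = string.ascii_uppercase[block]  # assumed: one letter per 10Mbp block
    return f"{chromosome:02d}{letter}{side_code}"

print(tg_id(4, 20, "2"))
```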

Let’s say it first appeared in a 3xGreat grandparent – specifically (to be generic, I’ll use Ahnentafel numbers): my father’s [A2], father’s [A4], mother’s [A9], father’s [A18], mother [A37].
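Ahnentafel numbers follow a simple rule: a person's father is twice their number, and the mother is twice their number plus one. A minimal Python sketch of the line of descent above (the function name and the P/M path notation are chosen here for illustration):

```python
def ahnentafel_path(path: str) -> list[int]:
    """Convert steps of 'P' (father) and 'M' (mother), starting from me (A1),
    into the Ahnentafel number at each step."""
    numbers = []
    n = 1
    for step in path.upper():
        if step not in ("P", "M"):
            raise ValueError(f"unexpected step: {step!r}")
        n = 2 * n + (1 if step == "M" else 0)
        numbers.append(n)
    return numbers

# father's, father's, mother's, father's, mother
print(ahnentafel_path("PPMPM"))
```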

Figure 1 – My Paternal Chr 04 and TG 04C2 on that Paternal Chr 04.

In order for a segment to have “first appeared” in A37’s DNA, it must have been formed from (usually) two separate segments. Let’s say such a recombined segment was passed down by A37’s mother, A75. A75 had two Chr 04s (a paternal one from A150 and a maternal one from A151).

Figure 2 – A37’s Two Chr 04s.

Figure 3 – A75 crossovers and recombination of her Chr 04

In this diagram, A75 is going to pass to her daughter, A37, the yellow segments, with crossover points at 35Mbp and 115Mbp (two crossovers, creating 3 segments, are typical on Chr 04).

Figure 4 – Chr 04 passed from A75 to A37

In this diagram, the top Chr 04 shows the grandparent segments passed down to A37 from her mother A75. There is no clue to the future TG. In the second Chr 04, I’ve overlaid TG  04C2, so we can track what the future has in store. It now has a crossover in it that makes the area of 04C2 unique – DNA from two different people. As we shall see, this area of A75’s DNA was passed down to me (after all, we started with a TG that I had). Also, some part of this area was passed down to others who became Matches in TG 04C2 because their overlapping shared segments meant these Matches matched me and each other.
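For readers who think in code, the recombination in A75 can be sketched in Python. This is an illustration only: the function names are hypothetical, the Chr 04 length of about 191Mbp is an approximation I have assumed, and which grandparent sits on which side of the 35Mbp crossover (A151 below it, A150 above it) follows the assignment used later in this post.

```python
def recombine(first_source, second_source, crossovers, length):
    """Build one passed-down chromosome as (start, end, source) segments,
    switching between the two sources at each crossover point."""
    sources = [first_source, second_source]
    edges = [0] + sorted(crossovers) + [length]
    return [(edges[i], edges[i + 1], sources[i % 2]) for i in range(len(edges) - 1)]

def clip(segments, start, end):
    """Keep only the parts of the segments that fall inside start..end."""
    return [(max(s, start), min(e, end), src)
            for s, e, src in segments if s < end and e > start]

CHR04_LENGTH = 191  # Mbp, approximate; an assumption for illustration

passed_to_a37 = recombine("A151", "A150", [35, 115], CHR04_LENGTH)
tg_04c2 = clip(passed_to_a37, 20, 45)
print(passed_to_a37)
print(tg_04c2)
```

Clipping to 20-45Mbp shows the point of the overlay: the future TG area is made of DNA from two different people.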

Let’s continue with the Life of DNA segment 04C2…  A37 is going to recombine the above maternal Chr 04 with the Chr 04 she got from her father (probably with two crossovers again), and pass the recombined Chr 04 to her son A18.

Figure 5 – Chr 04 with A37 recombination and passing to A18

The top two are A37’s two Chr 04’s with crossover points at 45Mbp and 95Mbp, with yellow (and blue) highlighting showing the three segments that are recombined to make the bottom row: a single Chr 04 being passed to A18. Note that the 45Mbp crossover point exactly coincides with the end point of TG 04C2. When A37 passes a Chr 04 to her other children, some may get the part that includes the embedded 04C2 segment, and some may not. NB: This is a critical generation. At some point on Chr 04, we must have a crossover point at 45Mbp, because my TG 04C2 had that crossover point – this is the generation where that crossover occurred.

Next A18 will pass a Chr 04 to his daughter A9.

Figure 6 – Chr 04 passed from A18 to A9

Again, the top two rows are A18’s two Chr 04’s, this time with crossovers at 60Mbp and 160Mbp, with highlighting showing the three segments that are recombined to make the bottom row: a single Chr 04 being passed to A9. Note that neither of the new crossover points had any bearing on my TG 04C2 – in fact that whole end of Chr 04 (1 to 60Mbp) passed intact to A9, as part of her paternal Chr 04. Next we’ll see A9 passing a Chr 04 to A4.

Figure 7 – Chr 04 passed from A9 to A4.

In this generation A9 has crossover points at 20Mbp and 120Mbp. Other crossover points may be at other points for other children of A9. However, the 20Mbp crossover is needed here (or in some generation) for me to wind up with TG 04C2 [Chr 04: 20 to 45Mbp] – a unique TG segment made up of segments from A150 and A151.

Figure 8 – Chr 04 passed from A4 to A2 and from A2 to me

I just added a few more crossovers here – they don’t really matter as far as TG 04C2 is concerned. The crossovers had to miss TG 04C2 in order for me to get that particular segment. Remember this is not hypothetical, we started with a real example of a TG. The crossovers could have been different at each generation, but somewhere along the line, before TG 04C2 got to me, it had to be boxed in at 20Mbp and 45Mbp in order for me to have that TG. Because I have multiple Matches in TG 04C2, each of them had to have similar stories – they rarely get exactly the same TG I got, they got different segments from A37, but there was overlap in the area 20-45Mbp. If the other Matches did segment triangulation around this same area on Chr 04, they might have gotten TGs like: 15-35 and 18 to 42 and 27 to 53, etc. See The Anatomy of a TG here.
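The overlap between my TG and the example TGs other Matches might find is simple interval arithmetic. A minimal Python sketch, using the positions from the paragraph above (the function name is my own):

```python
def overlap(seg_a, seg_b):
    """Return the (start, end) shared by two segments in Mbp, or None if they don't overlap."""
    start = max(seg_a[0], seg_b[0])
    end = min(seg_a[1], seg_b[1])
    return (start, end) if start < end else None

my_tg = (20, 45)
other_tgs = [(15, 35), (18, 42), (27, 53)]
shared = [overlap(my_tg, other) for other in other_tgs]
print(shared)
```

Each of those TGs shares a different part of 20-45Mbp with me, which is why the Matches overlap without having the identical segment.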

There are many, many ways the life of TG 04C2 could have played out, but they all would have resulted in a unique segment at Chr 04: 20-45Mbp, because a bunch of my Matches had overlapping/triangulating segments with me at that location. We started with the fact of TG 04C2 and all the included Matches.

With this TG originating in A37, I can have 4th cousins (4C) on my Ancestral couple A36 & A37 (the DNA coming, in this case, from A37). I could also have closer cousins, maybe some of them sharing more than just TG 04C2 with me. Can I have more distant cousins among my TG 04C2 Matches? Sure, I may well have a 5C Match on A150 OR A151, depending on which path the DNA came down. NB: a 5C could only share a maximum of 20-35Mbp from A151 OR a maximum of 35-45Mbp from A150 (because that’s all I got). I could have a 6C or a 7C on Ancestors of A150 or A151 – depending on which path the DNA came down. NB: none of my Matches has to share the full 20-45Mbp span with me, just enough to have overlapping matches. I cannot share a full TG (20-45) with a 5C – this segment did not exist back that far.

Let’s see if we answered some of the questions:

So, exactly where did this TG segment come from? From Ancestor A37 [this has to be determined by genealogy research among the Matches in the TG – Walking the Ancestors Back – A37 was assumed for the purposes of this example]

Which of our Ancestors first had that segment? All of the Ancestors in the line A2, A4, A9, A18, and A37 had that segment, but neither A74 nor A75 carried that entire, unique segment – A37 was the first.

Which generation back? 5th

How was it formed? By recombination of A150 & A151 DNA by A75, when she passed a Chr 04 to her daughter A37 [however, a unique segment could be created many different ways].

What did it originally look like? Originally, in A37, TG 04C2 was just a part of Chr 04. Review Figure 4 – A37 had very large segments from A150 and A151.

How was it transformed into the TG segment we now see? Subsequent crossover points nibbled away some of the DNA, until only TG 04C2 remained.

Are some of our Matches in a TG actually related further back? Yes, on smaller parts of TG 04C2 some Matches could be 5C or 6C or 7C, depending on the ancestry of the A150 and A151 segments, and depending on whether the shared segments are large enough to form a Triangulation.

Are some of our Matches in a TG actually closer cousins? It is definitely possible. It’s these closer cousins that help us to Walk the Ancestor Back.

With reference to the real possibility that some of the Matches in TG 04C2 may be more distant cousins (beyond A37)… What if A37 was a Brick Wall? Those more distant cousins would be on A74 or A75 or A148 to A151. AncestryDNA shows ThruLines Common Ancestors back to 6C (the level of A148), so it is entirely reasonable to expect the DNA to “work for us” back that far. Look at the Trees of those distant cousins for a Common Ancestor, and build out their children and grandchildren. The DNA has done its part; the rest is a genealogy task to find the link.
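The generation and cousin level of any Ahnentafel number can be worked out directly. A minimal Python sketch, with function names of my own choosing; it assumes the Match is in the same generation as I am (no "removed" steps).

```python
def generation(ahnentafel: int) -> int:
    """Generations back from me: A1 = 0, A2-A3 = 1, A4-A7 = 2, A8-A15 = 3 ..."""
    return ahnentafel.bit_length() - 1

def cousin_level(ahnentafel: int) -> int:
    """Cousin level (1 = 1C, 2 = 2C ...) for a same-generation descendant of this Ancestor."""
    g = generation(ahnentafel)
    if g < 2:
        raise ValueError("cousins start at the grandparent level")
    return g - 1

for a in (18, 37, 148):
    print(f"A{a}: generation {generation(a)}, {cousin_level(a)}C")
```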

One observation is that TG 04C2 (20-45Mbp) existed in all 5 generations from A37 down to me. That is, the unique string of SNPs exists in the area 20 to 45Mbp in each of those Ancestors – a pretty sticky segment to think about. All of my Ancestors back to A37 had to have those SNPs, and each of the Matches in the TG had to also have some part of them (enough to create a Triangulation).

So there you have it – the birth and life of a TG – you know the general process of what to expect.

The above was a top-down description of the Life of DNA Segment 04C2. We could also build the story up – starting with my TG 04C2, and describing what that part of Chr 04 looked like at each generation going back, until the whole 04: 20-45Mbp segment didn’t exist anymore. In this example TG 04C2 would exist in A37, and it would not exist in A37’s parents: A74 and A75.
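The Ahnentafel arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration, assuming the standard numbering (the father of N is 2N, the mother is 2N+1); the function names are made up for this sketch and do not come from any genealogy tool.

```python
def parents(ahnen: int) -> tuple[int, int]:
    """Return the (father, mother) Ahnentafel numbers of an Ancestor."""
    return 2 * ahnen, 2 * ahnen + 1


def line_of_descent(ahnen: int) -> list[int]:
    """List the Ancestors from `ahnen` down to the test taker (1)."""
    line = [ahnen]
    while ahnen > 1:
        ahnen //= 2  # the child of Ancestor N is N // 2
        line.append(ahnen)
    return line


print(line_of_descent(37))  # [37, 18, 9, 4, 2, 1]
print(parents(37))          # (74, 75)
print(parents(75))          # (150, 151)
```

The same two functions work for any TG: start from the most distant Ancestor found so far and the list gives every Ancestor who had to carry the segment.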

I think most of our TGs will be found to have Common Ancestors at the 4C to 7C level; and some of the Matches in the TGs will have CAs back to the 8C or 9C level (maybe some even more distant). I think this is promising for genetic genealogists – many of our Matches will have Common Ancestors within a genealogy timeframe. We just need to find them.

Can weird or unusual things happen? Perhaps. I really don’t know. I understand that some ancestor had to have the TG segment and pass it down to me, and parts of it to my Matches in the TG. I understand the TG segment is pretty unique. And because it’s also present in multiple, separated, Match cousins, I’m confident that it’s from only one Ancestor. Is it possible that all of the multiple Matches are also cousins on another Common Ancestor? Technically, yes – practically, no! Maybe among my 372 TGs I’ll find a few that throw me curve balls. But, in the main, I’m confident that virtually all of my TGs will sort out. With my fixed set of Ancestors, and fixed DNA segments – there is only one correct way to interconnect them – there is only one way the DNA from my Ancestors came down to me.

SUMMARY for the Life of a DNA Segment – I know the above description is hard to follow (it was hard to write!) But the summary is that each of our TG/Segments started in some Ancestor, and was passed down to us. It was passed down through one, specific line of descent to a parent to us. When the TG/Segment was first formed, it was part of a full Chromosome in that first Ancestor. Somewhere along its journey, the Start and End points of the TG were determined by new crossover points that are added when a parent passes a new, recombined chromosome to a child. Portions of each TG/Segment were also passed down to our Matches. Our TGs are formed by our shared DNA segments with those Matches. The final TG/Segment is a fixed part of our DNA.

Before I leave this topic, I want to refer to my recent post: TG SUMMARY spreadsheet, here. This spreadsheet lists all TGs. It also shows known CAs for each TG – and imputes missing Ancestors. With enough genealogy work, this spreadsheet clearly shows me Walking The Ancestor Back in each TG. By comparing the Ancestor Ahnentafel numbers in adjacent TGs we can see where crossovers occurred. I believe, in this TG SUMMARY spreadsheet, we’ll find the crossovers for the Start and End of each TG – indicating how and where each TG was formed. This spreadsheet could be a KEY interlocking process that leads directly to a full Chromosome Map linked to Ancestors. I’ll write more in a separate post.

[05F] Segment-ology: The Life of a DNA Segment by Jim Bartlett 20220104 Happy New Year

TG SUMMARY Spreadsheet

This is my ultimate spreadsheet, so far. I think its use will be profound.

The spreadsheet and data are pretty simple. Here are the header titles:

Side – Paternal or Maternal [I use Ahnentafel number 2 or 3]

Chr – Chromosome number (1 to 23)

Start – the Start location of each Triangulated Group (TG) use Mbp with one decimal

End – the end location of each TG (Mbp) NB: End of one TG = Start of next TG

Mbp – calculate End minus Start; this is “roughly” the cM (real cM is too hard to calculate)

TG-ID – An ID for each TG – Example 01S2 – on Chr 01, Start ~130Mbp, on 2 side

                For more see: Shorthand ID for TGs

G2 – second generation back, column for Ahnentafel number for my Ancestor

G3 – Ahnentafel for one of my grandparents; similar for other generations below

G4

G5

G6

G7

G8

G9

G10

CZN – Cousinship of Most Distant Common Ancestor (MDCA) in this TG [within reason]

MDCA – the surnames of the MDCA Couple Ex: SNIDER/BRITZ

Remarks – a place for any discussion I want about a TG – I could write a book in this cell.

For the spreadsheet header, just type in the above list (to the left of the hyphen) across the top row.
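For readers who would rather generate the file than type it, here is a minimal Python sketch that writes the header row and one data row as CSV. The helper name `tg_row` and the sample row are assumptions for illustration; the sample is modeled on TG 04C2 from the Life of a DNA Segment post (Chr 04, 20 to 45 Mbp, on the 2 side).

```python
import csv
import io

HEADER = ["Side", "Chr", "Start", "End", "Mbp", "TG-ID",
          "G2", "G3", "G4", "G5", "G6", "G7", "G8", "G9", "G10",
          "CZN", "MDCA", "Remarks"]


def tg_row(side, chrom, start, end, tg_id, **extra):
    """Build one spreadsheet row; Mbp is End minus Start, one decimal."""
    row = dict.fromkeys(HEADER, "")
    row.update({"Side": side, "Chr": chrom, "Start": start, "End": end,
                "Mbp": round(end - start, 1), "TG-ID": tg_id})
    row.update(extra)  # any G2..G10, CZN, MDCA or Remarks values
    return row


# Hypothetical row: only the generations named in the example are filled in.
rows = [tg_row(2, 4, 20.0, 45.0, "04C2", G2=2, G3=4, G4=9, G5=18, G6=37)]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=HEADER)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Swapping `io.StringIO()` for an open file would produce a CSV that any spreadsheet program can load.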

This simple spreadsheet has two parts:

1. Side|Chr|Start|End|Mbp|TG-ID – this describes each TG. I have 372 TGs so my TG Summary Spreadsheet has 372 rows of data. Having already done segment Triangulation of all my Matches at FTDNA, 23andMe, MyHeritage and GEDmatch, this part of the TG summary was a snap. And these data points remain fairly static. I say “fairly” as from time to time, as new Shared DNA Segments are found and new MRCAs determined, I do make minor TG end-point shifts and/or combine or split existing TGs.

2. G2-G10|CZN|MDCA – this part summarizes the genealogy – the Most Recent Common Ancestors (MRCAs) I have with Matches in the TG; and my best, conservative, judgment of what is the MDCA for the whole TG, so far. For each of the reasonable MRCAs in this line I enter the Ahnentafel number of the MRCA – bolded and yellow-highlighted – under the appropriate Generation. See the Figures below. This represents the data that I’m comfortable with. The goal is to Walk the Ancestor Back in each TG, and/or to find multiple, separate Matches who agree with the Ancestral line. In some cases, where I have sufficient evidence, I underline the Ahnentafel numbers to indicate I’m confident with the result. In the other cases the Ahnentafel numbers are clues, and more data is needed. In some TGs I have no MRCA, yet. In just a few TGs (less than 10), there are no Match-segments (beyond very close relatives) – these are small gaps in the overall Chromosome Map.

Both of the two parts above come from my atDNA Master Spreadsheet which has all my Shared DNA Segments with Matches. That spreadsheet is now over 20,000 rows – this TG Summary spreadsheet is 372 rows which I can print out in two pages back to back. A very handy scrap of paper.

This is the spreadsheet version of DNA Painter. In fact I once painted the 372 TGs. And I did the same with Kitty Cooper’s Chromosome Painter program.

Several  observations about this TG Summary Spreadsheet:

1. It is a summary! It extracts the important essence from 20,000 rows of data into 372 rows.

2. Trends – By bolding and highlighting my Ahnentafel numbers I can readily see trends and/or conflicts.

3. Pointers – For TGs with strong evidence of an MRCA, this information is very valuable in looking at other Match Trees in the TG. *Knowing* the TG MRCA is a powerful pointer, which has helped me find MRCAs with many more Matches (including those with only one Ancestor in their “Tree”). This summary has become a powerful TOOL in this respect.

4. Fill-in – In many cases, based on high confidence MRCA Ahnentafels, it is easy to “fill in” the other Ahnentafel numbers leading up to the TG MRCA – these “fill ins” identify the Ancestors who “had to be there” for the DNA to pass down to me.

5. Crossover points – With “filled in” Ahnentafels, it’s easy to see where the crossovers occurred between TGs in a Chromosome. If one TG row has Ahnentafels 2-5-10-21-42 and the next row has 2-5-10-20-40, it’s easy to see the crossover occurred between 21 and 20. In this case, Ancestor 10 had these two adjacent TGs in their DNA – one from his mother (21) and one from his father (20), which he (10) recombined and passed as a single, larger TG segment to his daughter 5. A first cousin (1C) might share that larger segment with me. And, of course, parent 2, would have also passed that *double* segment to me intact as part of that Chromosome. In other words, as we fill out the MRCA Ahnentafels, we can track the crossover points, generation by generation.

6. Crossover points per generation – This also gives us the ability to easily count the crossovers per generation. Will we see the 27 male vs 41 female distribution? On the paternal side we can see how many times the TGs shifted from 4 to 5 (or 5 to 4) [no matter what distant MRCA each of those TGs eventually went to after that]. How many 8 to 9 and 10 to 11 crossovers will there be – will it stay near 27 or trend toward the 34 average?

7. Quality Control – The Crossover points per generation summary may be a good QC check – if our result from the TG Summary Spreadsheet is reasonable. If my TG summary shows a lot of change between TGs, I might show 50 or 60 crossovers in a generation – that seems unreasonable to me, and I would be looking over my TGs. It would be relatively unusual for a Chromosome to shift from one grandparent to another and then back again several times on one Chromosome – it’s more likely that I had several MRCAs in a TG and selected the wrong one.

8. Predictive – I have some TGs with no or a very close (2C) MRCA. In some cases, it’s where much of the Chromosome is from one grandparent. If one TG in the middle is from the other grandparent, that adds two crossovers to the total. Although I’m pulled toward the parsimonious solution, I have to ALWAYS keep from jumping to a conclusion. On the other hand, if my total crossovers for the generation is high, I need to look hard for errors I might have made.

9. Brick Walls – In some TGs I have several Matches at, say, the 4C level, and a lot of other Matches, some with good Trees, but no MRCA beyond 4C level. All other things being equal, I should have some 5C and 6C Matches in that TG (I have ThruLines Matches at Ancestry.com for all of my known 6C Ancestors, so I know the genealogy is there if the TG was really from one of them). So I conclude that those TGs are probably headed past the 4C level and on through a Brick Wall. I’ve used this conclusion to successfully use Match Trees to find a Common Ancestor in two TGs with Brick Walls; and in one case where I had an incorrect Ancestor. Where there are a lot of Matches in a TG with closer Cousins, I am suspicious of an Ancestor I haven’t identified. Alternatively, I also have some TGs with no close Matches and lots of distant apparent Matches – sort of like a pile up area. This situation leads me to believe the MRCA may be at the fringes of my genealogy or beyond. Maybe adjacent TGs will be able to help…

10. Chromosome Maps/Generation. As I’ve mentioned in several blog posts before – each parent gives us 23 chromosomes and all 3 billion base pairs in a genome. On each side, our two grandparents provided segments that account for the same 3 billion base pairs – through roughly 57 segments that cover all of our Chromosomes. And so it goes for each generation – our Ancestors in each generation provide sufficient segments to fill up all our Chromosomes. With this TG Summary spreadsheet, it’s easy to see the segments from any Ancestor in a generation. For instance my father’s mother’s Ahnentafel is 5 – I can look down column G3 and see which TGs have a 5 and easily note the appropriate Start and End points to get the Mbp for each grandparent segment (or I could resort the spreadsheet on column G3 and just sum the 5s). Repeat for Ahnentafel 4 – the sum of Mbp for 4 plus 5, had better add up to the whole for that side. [arithmetic check!] I could Paint or map those segments which would cover all of my Chromosomes. Do the same thing with Ahnentafels 12 through 15 on my maternal side to map (or Paint) my great grandparent segments.  It all depends on how much of the TG summary I can fill out.

11. Sticky Segments – It’s easy to see “Sticky” segments – a segment of DNA that must have traveled down many generations intact to get to me. See TG 09A25 in Figure 3 below. Yes, this segment probably started out somewhat larger, but the segment I have in my body (TG 09A25) had to persist from Ancestor 354, through 8 generations, to me.

12. Progress – This TG Summary spreadsheet offers a good way to track your progress (if your objectives include linking Ancestors to Segments, or “proving” your Tree with DNA).

13. Focus – In any case, this TG Summary spreadsheet helps me focus on the *Most Distant Common Ancestor* Chromosome Map objective.
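Observations 5, 6 and 10 above are really small calculations on the G columns, so here is a hedged Python sketch of them. The three TG rows are hypothetical (the first two reuse the 2-5-10-21-42 and 2-5-10-20-40 lines from observation 5), and the function names are made up for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical TG rows for one side of one chromosome, sorted by Start.
# "line" holds the Ahnentafels for G2, G3, G4, ... as far as they are known.
tgs = [
    {"chr": 1, "start": 0.0, "end": 30.0, "line": [2, 5, 10, 21, 42]},
    {"chr": 1, "start": 30.0, "end": 55.0, "line": [2, 5, 10, 20, 40]},
    {"chr": 1, "start": 55.0, "end": 90.0, "line": [2, 4, 8]},
]


def first_difference(prev_line, next_line):
    """Return the G column where two adjacent lines first differ, else None.

    A first difference in column G(k) means the Ancestor in column G(k-1)
    made the crossover when passing the chromosome down.
    """
    for index, (a, b) in enumerate(zip(prev_line, next_line)):
        if a != b:
            return index + 2  # index 0 is column G2
    return None  # identical as far as both rows are filled in


def crossovers_per_generation(rows):
    """Count crossovers by the G column in which adjacent TGs first differ."""
    counts = Counter()
    for prev, nxt in zip(rows, rows[1:]):
        if prev["chr"] != nxt["chr"]:
            continue  # moving to a new chromosome is not a crossover
        column = first_difference(prev["line"], nxt["line"])
        if column is not None:
            counts[column] += 1
    return counts


def mbp_per_ancestor(rows, generation):
    """Sum TG lengths by the Ahnentafel found in column G<generation>."""
    totals = defaultdict(float)
    index = generation - 2
    for row in rows:
        if index < len(row["line"]):
            totals[row["line"][index]] += row["end"] - row["start"]
    return dict(totals)


print(sorted(crossovers_per_generation(tgs).items()))  # [(3, 1), (5, 1)]
print(mbp_per_ancestor(tgs, 3))                        # {5: 55.0, 4: 35.0}
```

In this toy data the 21-to-20 change shows up in column G5 (a crossover made by Ancestor 10) and the 5-to-4 change in column G3 (a crossover made by parent 2). The G3 totals for 4 and 5 add up to the full 90 Mbp, which is the arithmetic check described in observation 10.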

Here is part of my TG Summary Spreadsheet with only known MRCA (bolded & highlighted) Ahnentafel numbers:

Figure 1

Next is part of my TG Summary Spreadsheet including “must be” Ancestors. And a few of the known, even numbered (husband) Ahnentafels were changed to odd (wife) Ahnentafels – as appropriate.

Figure 2.

Now, below, I’ve added some double underlines at crossover points:

Figure 3.

NOTES on this Figure 3:

1. Note the 183 to 182 crossover between TG 06O25 and 06Q25. Ancestor 91 had the two segments as maternal and paternal segments and passed them on as a full maternal segment to Ancestor 45, which then must have been a “sticky segment” down to me.

2. Note TGs 08A24 through 08F24 appear to flip-flop from 8 to 9 to 8 to 9. That almost certainly would not happen. And upon examination I see that TG 08D24 should have been changed from 8 to 9 [NB: by convention I use an even Ahnentafel to represent an MRCA couple, which in this case would have been 8 & 9 – so the 08D24 8 has an equal chance to be a 9]. Given that both adjacent TGs are a 9, the best guess is to make it a 9. Then there is 1 crossover instead of 3.

3. While we are looking at Chromosome 8, it looks like the 9 will crossover to a 10 before 08L25 starts; but the crossover will actually be in G3, and TG ID 08K24 could be either a 4 or a 5 in G3. This would mean one crossover in Chr 8 in Gen 3 – highly probable for Chr 8.

4. Look at TG 09A25 – two things: although it’s 5.5Mbp long, don’t be fooled – most of the Shared DNA Segments in 09A25 are 15cM segments; and only one Match has a distant MRCA at Ahnentafel 354 – this is fairly iffy, and I would not be surprised if someday I found a closer paternal cousin on this segment – even one on a 2-4 branch (but for now this is the only clue I have for this TG, and I’ll keep it until something better comes along).

5. Look at TGs 07L24 and 07N25 – there is a crossover here at G3, and the two TGs go back in two very different directions (to two very different geographic areas). I generally find that this type of crossover is fairly crisp, and easy to identify. Other TGs with crossovers somewhat more distant seem to have a fuzzy overlap – often the crossover point cannot be readily pinned down. In such cases I just pick a compromise crossover point. As I’ve noted before, our focus should be on the bulk of the TG segment, from an Ancestor, and not be too concerned about a little fuzzy overlap of TGs.
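The flip-flop check in Note 2 can be expressed as a count of changes down one generation column. This is a small Python sketch with made-up G4 entries for six adjacent TGs; the actual values in Figure 3 may differ.

```python
def count_crossovers(column):
    """Count changes between adjacent entries in one generation column."""
    return sum(1 for a, b in zip(column, column[1:]) if a != b)


# Hypothetical G4 entries for six adjacent TGs on one chromosome.
as_entered = [8, 9, 9, 8, 9, 9]
after_review = [8, 9, 9, 9, 9, 9]  # the lone 8 reread as the wife, 9

print(count_crossovers(as_entered))    # 3
print(count_crossovers(after_review))  # 1
```

An even Ahnentafel entered for a couple is a good candidate for this kind of review whenever flipping it to the odd number removes two crossovers at once.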

SUMMARY THOUGHTS

REMEMBER: Your Ancestry is a very unique arrangement of Ancestors – which does not change. Your DNA is a unique tapestry of segments and crossover points – which does not change. Your DNA is linked to your Ancestors in one way – which does not change.

All of our tools help us to determine our Ancestors and segments and how they are interconnected. The Leeds method and Virtual Phasing can help with Generation 3 (G3 above) – the results should be the same as your grandparents and your grandparent segments. DNA Painter can help paint the same segments you see in the TG SUMMARY above – but more colorfully, and perhaps closer to your style of figuring things out. Clustering (in all its various manifestations); Shared Matches (aka In Common With; Relatives in Common, Shared DNA Matches); segment Triangulation (by GEDmatch, MyHeritage, 23andMe, or DoubleMatch Triangulation at FTDNA; or through a Spreadsheet, as I do); and Trees and genealogy Tools primarily at Ancestry – are all good tools that are designed to help find Common Ancestors and DNA segments. Use the tools that work best for you. This TG SUMMARY may also help you.

SUMMARY – NB: This spreadsheet summarizes WORK. It summarizes the segment Triangulation of many DNA Matches – not an easy task, but one when accomplished, remains relatively static. It also summarizes a lot of genealogy WORK determining Common Ancestors with our Matches. As we document our TGs and dig into the genealogy, we begin to build a mountain of evidence linking our Ancestors to our DNA, and vice versa.

[35BE] Segment-ology: TG Summary Spreadsheet by Jim Bartlett 20211222

Ancestor Spreadsheet

The most important spreadsheet for the genetic genealogist, IMO, has nothing to do with DNA or segments – it’s an Ancestor Spreadsheet. A simple spreadsheet of Ancestors is a very valuable tool.

This spreadsheet starts off very simple and can expand any way you want. Here are the column headers:

Ahnen – Ahnentafel number (this is a key to sorting and will help you in other ways, too)

L Name – Surname

F Name – Given name(s)

B Date – Birth date (pick a standard format and stick with it)

B yr – Birth year

B Place – Birthplace

B St – Birth state

B Co – Birth country

D Date – Death date

D yr – Death year

D St – Death state

D Co – Death country

D cause – Cause of death

M Date – Marriage date

M yr – Marriage year

M Place – Place of marriage

M St – Marriage state

M Co – Marriage country

I Date – Immigration date

Remarks – Add anything you want

D age – Age at death [D yr minus B yr] [also serves as a Quality check]

M age – Age at marriage [M yr minus B yr] [also serves as a Quality check]

Ch – number of children

Rel – religion

Prof – Profession

Mil – Military service (highest level)

War – RevWar, 1812, CivWar, etc.

Y-DNA [haplogroup of all male line]

mtDNA [haplogroup of all female line]

FG – Find-a-Grave [I paste in the hyperlink for easy access]

LDS – The FamilySearch ID# for the Ancestor [this may change from time to time]

For the spreadsheet header, just type in the above list (to the left of the hyphen) across the top row.
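The two calculated columns (D age and M age) double as quality checks, and that idea can be sketched in Python. The age limits below are arbitrary assumptions chosen for the illustration, and the sample row is hypothetical.

```python
def age_checks(ancestor, max_age=110, min_marriage_age=13):
    """Return (D age, M age, warnings) from the year columns of one row."""
    b, d, m = ancestor.get("B yr"), ancestor.get("D yr"), ancestor.get("M yr")
    d_age = d - b if b is not None and d is not None else None
    m_age = m - b if b is not None and m is not None else None
    warnings = []
    if d_age is not None and not 0 <= d_age <= max_age:
        warnings.append(f"D age {d_age} looks wrong")
    if m_age is not None and m_age < min_marriage_age:
        warnings.append(f"M age {m_age} looks wrong")
    if d_age is not None and m_age is not None and m_age > d_age:
        warnings.append("married after death")
    return d_age, m_age, warnings


# Hypothetical Ancestor row; the marriage year is deliberately suspicious.
row = {"Ahnen": 36, "B yr": 1790, "D yr": 1861, "M yr": 1795}
print(age_checks(row))  # (71, 5, ['M age 5 looks wrong'])
```

In a spreadsheet the same thing is just two subtraction formulas; the point is that an odd result flags a date worth rechecking.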

I started with a Tree at Ancestry.com that was mostly just my Ancestors – Click on Tree search (top right); then select “List of all people”; then highlight that list and paste it into a spreadsheet. There is still some amount of manipulation, but I had a good base to start. Or you could just type the spreadsheet from scratch. Or perhaps manipulate an export from a GEDcom or other Tree software.

Here are some benefits and things to do with this Ancestor Spreadsheet:

1. First – this is your personal spreadsheet – modify it any way you want – you can delete any columns you don’t want and/or add any new columns for data you want. You can hide any column(s), so that only the information you tend to use all the time is shown (you can unhide at any time).

2. This is a very handy inventory of your Ancestors – a printout will only be a few pages.

3. This is an easy repository of key data elements of your Ancestors

4. This is my go-to lookup table for Ahnentafel numbers

5. A searchable database

6. A variety of spreadsheet sorts:

                A. Ahnentafel sort – groups husband and wife together and by generation

                B. Surname + B yr – groups surname lines in chron order => surname lineage

                C. B St – groups Ancestors by states; sort on “B county”, if you add that column.

7. I include Potential Ancestors, but highlight them as Potential.

8. I include “Alternate” Ancestors (also highlighted), when there is an unresolved alternative.

9. Blank spaces can highlight data you don’t have – they are a good tickler for what you need to research. Periodically, I’ll select one to tackle.

10. Add an additional row for each generation – this is a “separator” between generations; Example of the text in a Generation row: 32 GEN 6; 3xG grandparents; 4th cousin Matches [where the 32 is in the Ahnen column].

11. I select some columns that will fit on one page, and hide the rest, and print out about 4 pages (the top Ancestors) which I always have handy – particularly while traveling. I refer to this spreadsheet literally every day – so, IMO, it’s worth the work it takes to prepare it.

12. Please post any additional uses you find for this Ancestor Spreadsheet.
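The generation separator row in item 10 follows directly from the Ahnentafel number, so it can be generated rather than typed. A minimal Python sketch, assuming generation 1 is the test taker and generation 2 is the parents; the function names are invented for this example.

```python
def generation(ahnen: int) -> int:
    """Generation number: 1 is the test taker, 2 the parents, and so on."""
    return ahnen.bit_length()


def ordinal(n: int) -> str:
    """Return 1st, 2nd, 3rd, 4th, ... for a positive integer."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def separator_text(gen: int) -> str:
    """Text for the separator row that opens generation `gen` (gen >= 3)."""
    first_ahnen = 2 ** (gen - 1)  # lowest Ahnentafel in that generation
    greats = gen - 3
    if greats == 0:
        relation = "grandparents"
    elif greats == 1:
        relation = "great grandparents"
    else:
        relation = f"{greats}xG grandparents"
    return f"{first_ahnen} GEN {gen}; {relation}; {ordinal(gen - 2)} cousin Matches"


print(generation(37))     # 6
print(separator_text(6))  # 32 GEN 6; 3xG grandparents; 4th cousin Matches
```

Because the separator carries the first Ahnentafel of its generation in the Ahnen column, an Ahnentafel sort drops each separator row just above the Ancestors it introduces.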

Here is an example of part of my Ancestor Spreadsheet:

Figure 1:

NOTES

1. This is just a sample; some columns have been hidden or reduced in width to get a lot in the picture.

2. This Ancestor Spreadsheet is not intended to replace my on-line Tree(s) which are full of documentation. This has key data, and an Ahnentafel place holder for every Ancestor. Ancestors in my Tree more than once are in this spreadsheet more than once – it helps to understand Pedigree collapse.

[35BA] Segment-ology: Ancestor Spreadsheet by Jim Bartlett 20211222 [Edited 20211223]

Segmentology Common Ancestor Spreadsheet

I’m going to try a format here that will make it easier for me to explain some of my spreadsheet tools, and give you an easy way to copy the header (you can adjust the column widths to suit yourself). Please let me know if this works for you, and I’ll try some more of them.

Copy the above column titled “Header Row” and paste it into your spreadsheet using the Transpose option. It should create the Header Row for the Common Ancestor Spreadsheet.  [Edit: it appears this doesn’t work from the image above – so just type them in a row across a spreadsheet.]

There are several types of rows for you to input:

1. Include one row for each of your Ancestor Couples – I highlight these rows      

2. There is one row for each Match with each known Common Ancestor (MRCA);              

3. I add a row for my MRCA Child & birth year with a NOTE to refer to appropriate Ahnentafel for more  

4. I add a row for Ancestor multiple marriages, and put marriage year in born column      

                This separates full cousins and half cousins.

5. If something looks fishy, or needs more investigation, I highlight it in orange/mud color.             

6. If an Ancestor/Ahnentafel number and a TG are in conflict, I highlight it in red. The genealogy may be correct but the shared DNA segment did not come from the MRCA         

Other NOTES:   

1. The main sort for this spreadsheet is Ahnen + born + born + born columns

                NB: Highlight all columns before sorting.

2. Another sort is on Match Name to analyze multiple MRCAs – only one TG per MRCA    

3. If you want to compare spreadsheets for different Test Takers, be sure to fill in the TT column first. Combine spreadsheets, sort, analyze, then sort on TT and separate the spreadsheets.       

4. Sidebar: I have an Ancestor Spreadsheet – one row for each Ancestor info, including the Ahnentafel number!  

5. I have typed all the data into my Common Ancestor spreadsheet – a lot of work             

                Idea: If you have a download of AncestryDNA Matches, start with that data for ThruLines Matches

6. If you want to be able to sort this by side (your paternal and maternal sides); add a column for P or M (or 2 or 3)               

7. Do not hesitate to add any other columns (or rows) that may be useful to you. I made up this spreadsheet, feel free to change it as you like.
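The main sort in Note 1 can be sketched in Python as a multi-column sort key. The field names and sample rows here are hypothetical; "born" stands for the three birth-year columns (child, grandchild and great grandchild of the MRCA), and blank years are assumed to sort first.

```python
# Hypothetical rows: Ahnentafel of the MRCA plus three birth-year columns.
rows = [
    {"Match": "Match C", "Ahnen": 36, "born": [1815, 1840, 1868]},
    {"Match": "Match A", "Ahnen": 18, "born": [1850, None, None]},
    {"Match": "Match B", "Ahnen": 36, "born": [1812, 1835, None]},
]


def sort_key(row):
    """Sort on Ahnen, then each born column; blank years sort first."""
    years = tuple(-1 if year is None else year for year in row["born"])
    return (row["Ahnen"], years)


ordered = sorted(rows, key=sort_key)
for row in ordered:
    print(row["Ahnen"], row["born"], row["Match"])
```

Sorting whole rows this way is the code equivalent of highlighting all columns before sorting, since every field travels with its row.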

 ADVANTAGES OF THIS COMMON ANCESTORS SPREADSHEET      

1. It captures all of your Matches with Common Ancestors [some may be gone tomorrow…]        

2. It arranges the Matches’ descendants from the MRCA like a Family Group Sheet           

                Easy to compare with your own research

                Helpful in spotting many errors

                Easy to see Matches who are relatively close cousins to each other – good conversation starter

                Easy to highlight real and/or potential errors

                Easy to spot a Match at two companies with different names

3. Shows TG threads in a family [maybe Clusters too, haven’t tried them yet]

                Makes it easy to spot TG threads through a family (closer Ancestors will have more TG threads)

Here is an example from my CA Spreadsheet:

[35BC] Segmentology Common Ancestor Spreadsheet by Jim Bartlett 20211219  

Using MyHeritage Labels for Triangulation

MyHeritage just released an improvement to their “labels for DNA Matches”. See their blogpost at: https://blog.myheritage.com/2021/12/labels-for-dna-matches-now-improved/

These are intended to help you organize your DNA Matches into groups. And, AND, AND … you can “Export entire DNA Match list” (click on the 3 vertical dots to the right of Filters and Sort by), and this spreadsheet will include a list of any labels associated with each Match.

This is a huge time saver for Triangulation. To the extent that we can identify our Matches as Paternal and Maternal, the Triangulation process becomes very simplified. Paternal side Matches will only Triangulate with other Paternal side Matches. NB: watch out for any Matches that may relate to you on both sides. For the vast bulk of our Matches, however, all we have to do is sort by side + Chr + Start and form groups.

If you’ve already done a lot of Triangulation, this will provide a good Quality Control check.

There are a few pesky details: you have to assign the dot labels*…; you have to merge the Match list with the segment list…; you have to analyze the start/stop locations and make a judgment call as to where the Triangulated Group starts and ends.  But aside from these chores, the main headache of checking for Triangulation is gone. Having the effect of “phased data” means the shared segments on one side have to Triangulate only with other segments on that same side.  *Clustering and Shared Matches will often indicate that we can assign “side” labels on a group basis. Triangulated Icons should always indicate the same “side”.  
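Here is a rough Python sketch of the "sort by side + Chr + Start and form groups" step, assuming the Match list and segment list have already been merged so that each segment carries a side label. The field names are hypothetical, and simply chaining overlaps is a simplification: it proposes candidate groups, and the judgment call on where each Triangulated Group starts and ends still has to be made by hand.

```python
def group_by_overlap(segments):
    """Sort by side + Chr + Start, then chain overlapping segments."""
    ordered = sorted(segments, key=lambda s: (s["side"], s["chr"], s["start"]))
    groups = []
    for seg in ordered:
        last = groups[-1] if groups else None
        same_strand = (last is not None and last["side"] == seg["side"]
                       and last["chr"] == seg["chr"])
        if same_strand and seg["start"] < last["end"]:
            # Overlaps the current group on the same side: add it.
            last["matches"].append(seg["match"])
            last["end"] = max(last["end"], seg["end"])
        else:
            groups.append({"side": seg["side"], "chr": seg["chr"],
                           "start": seg["start"], "end": seg["end"],
                           "matches": [seg["match"]]})
    return groups


# Hypothetical shared segments (locations in Mbp) with side labels.
segments = [
    {"match": "Match A", "side": "P", "chr": 4, "start": 20.0, "end": 45.0},
    {"match": "Match B", "side": "P", "chr": 4, "start": 22.0, "end": 40.0},
    {"match": "Match C", "side": "M", "chr": 4, "start": 21.0, "end": 44.0},
    {"match": "Match D", "side": "P", "chr": 4, "start": 60.0, "end": 80.0},
]
for group in group_by_overlap(segments):
    print(group["side"], group["chr"], group["start"], group["end"],
          group["matches"])
```

Match C overlaps Matches A and B by location but lands in its own group because it is on the other side, which is exactly the time saver the side labels provide.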

[10D] Segment-ology: Using My Heritage Labels by Jim Bartlett 20211215

Distribution of Cousins

This blogpost is overtaken by a better analysis by Kurt Allan, based on other analysis by Louis Kessler and information from Doug Speed that his chart was intended for a different purpose and might not apply to genetic genealogy. The result is a spreadsheet similar to the one below, but with a more normal distribution curve with 7C-9C occupying the mean. This is very good news for genetic genealogists – most of our Matches are well within a genealogy horizon. I hope to be able to post or link to Kurt’s final graphs soon.

A recent discussion on the Genetic Genealogy Tips & Techniques Facebook page asked about what percent of our DNA matches we should expect at various genetic distances. I’ve often wondered about this too. As I thought about it, we should be able to apply the “Speed and Balding” analysis to this question. The S&B graph shows the probability of a matching DNA segment at different generations (think cousins), for given ranges of shared DNA. See the graph at the ISOGG wiki here.

I scaled each “bucket” in this chart as best I could and put the bucket percentages in an excel spreadsheet – see below.  In the Speed and Balding chart, cM ranges are along the x-axis; percentages on the y-axis; and the “generations” are shown as stacked bars (or “buckets”) for each cM range. The numbers in the body of this chart are the percentage for the cM range and Generation.

I had in my files a complete download of my AncestryDNA Matches by DNAGedcom Client from several years ago, before the Ancestry purge of 6-7cM Matches. I had 131,824 Matches and it was easy to sort by cM and determine the total number of Matches for each column (cM range) in the S&B chart.  Finally I applied the S&B percentages to my breakdown of Matches to get the following  chart.

I know it’s a “squinter”, but I wanted to show the whole spreadsheet. Here are explanations of the lettered rows:

A.            The cousinship which corresponds to the S&B generations back

B.            Speed & Balding generations

C-N.       The first column is cM range groups that correspond to the S&B chart.

C-N.       The second column is the number of my Matches in each category –131,824 Total.

C-N.       The next columns: multiply S&B percent by number of Matches in the cM range

P             Total number of Matches for each cousinship

Q             Percent of cousins vs the 131,824 Total
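The arithmetic in rows C through Q can be sketched in Python. Every number below is a placeholder invented to show the calculation; none of it is read from the Speed and Balding chart or from my Match list, and the cM ranges and cousinship labels are likewise only examples.

```python
# Placeholder percentages per cM range, NOT values from the S&B chart.
percent_by_range = {
    "6-7 cM": {"4C": 10, "5C": 30, "6C+": 60},
    "7-10 cM": {"4C": 20, "5C": 40, "6C+": 40},
}
# Placeholder number of Matches in each cM range.
matches_by_range = {"6-7 cM": 1000, "7-10 cM": 500}


def expected_cousins(percents, counts):
    """Rows C-N and P: percent times Matches, totaled per cousinship."""
    totals = {}
    for cm_range, count in counts.items():
        for cousinship, percent in percents[cm_range].items():
            totals[cousinship] = totals.get(cousinship, 0.0) + count * percent / 100
    return totals


totals = expected_cousins(percent_by_range, matches_by_range)
grand_total = sum(matches_by_range.values())
# Row Q: each cousinship as a percent of all Matches.
share = {c: round(100 * n / grand_total, 1) for c, n in totals.items()}

print(totals)  # {'4C': 200.0, '5C': 500.0, '6C+': 800.0}
print(share)   # {'4C': 13.3, '5C': 33.3, '6C+': 53.3}
```

To apply it to your own data, replace the two dictionaries with your scaled chart percentages and your own Match counts per cM range.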

Key points

1. I could be off by a percent or two in my scaling of a printout of the Speed and Balding chart – but the totals are pretty close, and what I am really looking for is trends and order of magnitude.

2. Line Q, percent of total, was a lot flatter than I had expected – less than 4 percent for any cousinship. I had expected something closer to a normal distribution curve – even a long one, but with a “hump” somewhere. This indicates the two competing factors: an increasing number of cousins with each generation going back, versus a decreasing probability of a shared DNA match (above 6cM) with each generation going back.

3. There are a lot of Match-cousins to work with. Although only about half of all our Matches would be related to us out to the 19th cousin level; nevertheless, there are thousands of cousins in every cousin “bucket”.

4. In my own case I need to use judgment and temper some of these results. Both of my parents were only children, so I have no 1st Cousins. And my great grandparents did not have large families, so I also don’t have many 2C. However, I do have about 300 3C Matches and 600 4C Matches identified, so far, and there are plenty more out there (at least per Speed and Balding). And I am finding many 5C-8C Matches (but my known Tree begins to thin out after that.)

My Takeaways

1. Autosomal DNA “works” throughout a genealogy horizon for most of us.

2. The limiting factor is NOT the atDNA, it’s the genealogy – the lack of good Trees among our Matches; and the shrinking body of documentation the farther back we go.

3. When Matches Triangulate or group in Clusters, it’s often worth the effort to extend their Trees and find the Common Ancestor.

This blog post is one in a series to try and outline what you can generally expect – to put some generalized boundaries on genetic genealogy.

Anyone is welcome to use my estimate of the S&B data in the first spreadsheet, and apply it to the distribution of your own Matches. Please let me know if you see a glaring error in this process or the results.

[06D] Segment-ology: Distribution of Cousins by Jim Bartlett 20211209

Genetic Genealogy Spreadsheets

Spreadsheets are an important tool in Genetic Genealogy. Here are some of mine…

ANCESTORS – Names, dates, locations, and Ahnentafel # are the key foundational data in this spreadsheet. Over time I’ve added columns for: Immigration year; age at death, age at marriage, number of children, Religion, Profession, Military, War, Y-haplogroup, mt-haplogroup, Find-A-Grave hyperlink, remarks. Add any other column of interest to you – you just need to fill it in… I add in Potential (or Alternate) Ancestors (highlighted) to keep track of those possibilities. I have a “dup” column to indicate which Ancestors are duplicates. This is a very handy reference for me. Two main sorts – 1) by Ahnentafel #; and 2) by surname & birth date. [Initially I took a GEDcom of an Ancestors only Tree and put it into a spreadsheet – then massaged the columns]

COMMON ANCESTOR MATCHES – Names of all Matches who have a Common Ancestor(s) with me. Key data: Name, Admin, cM, #Segs, Company, CA Ahnentafel #, Cousinship, columns for name and birth year of child, grandchild and great grandchild of CA (for Match’s line of descent); hyperlink to Tree. I also have columns for TG or Cluster, GEDmatch #; Remarks. I also have columns to indicate (to me) if I’ve filled in the Notes box of the Match, entered the line of descent to the Match in my Tree… Main sort is by Ahnentafel # and dates of Child, Grchild – this sort looks like a Family Group Sheet – and is very helpful in tracking TGs and Clusters. It also lets me focus on family groups which often group together in TGs or Clusters.

TG MASTER – Match name, company, Admin, email, Segment info (Chr, Start, End, cM, SNPs), TG ID, Side (M or P), CA info (Ahnentafel #, Surnames of Couple, Cousinship, Tree hyperlink); GEDmatch #; date. If you plan this type of spreadsheet for other people, add a column for initials of test taker (you can then, briefly, combine spreadsheets, do analyses, then separate them again – one spreadsheet per person). Advanced: I have columns for each of 10 generations, and enter my Ahnentafel #s from my parent (2 or 3) out to the CA – this helps me analyze multiple CAs in a TG. Headers: 46 Chromosome bars (rows to separate data); TG bars (that summarize the Chr, Start and End of each TG and the CA) – sometimes there are multiple bars – I highlight the most likely. Main sort: by Side, Chr, Start (this will arrange all Shared Segments into their respective TGs – which should have one Ancestral line)
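The “Advanced” generation columns and the main sort can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, column names and sample segments are hypothetical, and the Side rule assumes the usual Ahnentafel convention that 2 is the father (P) and 3 is the mother (M).

```python
def ahnentafel_path(ca: int) -> list[int]:
    """Ahnentafel #s from the parent (2 or 3) out to the Common Ancestor."""
    if ca < 2:
        raise ValueError("a Common Ancestor must have Ahnentafel # 2 or higher")
    path = []
    n = ca
    while n >= 2:
        path.append(n)
        n //= 2  # step one generation closer to the test taker
    return path[::-1]


def side(ca: int) -> str:
    """'P' if the line runs through the father (2), 'M' if through the mother (3)."""
    return "P" if ahnentafel_path(ca)[0] == 2 else "M"


# Hypothetical shared segments; positions are illustrative only.
segments = [
    {"match": "Match C", "side": "P", "chr": 7, "start": 80_000_000},
    {"match": "Match A", "side": "M", "chr": 12, "start": 5_000_000},
    {"match": "Match B", "side": "P", "chr": 7, "start": 20_000_000},
]

# Main sort: Side, then Chr, then Start.
tg_master = sorted(segments, key=lambda s: (s["side"], s["chr"], s["start"]))

print(ahnentafel_path(22))               # [2, 5, 11, 22]
print(side(22), side(27))                # P M
print([s["match"] for s in tg_master])   # ['Match A', 'Match B', 'Match C']
```

Each value in the path would fill one generation column, so two candidate CAs in the same TG can be compared column by column to see where their lines diverge.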

TG SUMMARY – Chr, Start, End, TGID, Side, columns for 8 generations of Ahnentafel #; Cousinship, CA Surnames; NO Matches. This is a summary subset of the TG MASTER – except I fill in only the Ahnentafel # for known CAs out to the most likely distant CA. In italics, I add in Ahnentafel #s from Cluster analysis compared to known TGs – this often extends the evidence in the TG (this is an experimental spreadsheet at this point). Sort: Side, Chr, Start. For me, this is a handy 2-page crib sheet.

WALK THE CLUSTERS BACK – a fairly technical tool. Start with a download of Excel data for Clusters based on a 50cM threshold (from DNAGEDcom Client). This will include Match name, cM and any Notes you’ve entered for each Match – this Note info is very valuable to have in this WTCB spreadsheet. Add columns for Ahnen (or CA Surnames) and a TG or Cluster (CL) code and Remarks. Finally add a column for serial # of each row (1 to however many rows you have at that threshold – so after a sort, you can reconstruct the original Cluster groups – just type a 1 at the top and drag it down in a series) – call this column CL50. And add another column – called CL# – and add in the Cluster # for each Match (I wish DGC would include this in the download spreadsheet). Then the work is to determine the Ahnentafel/CA and/or TG/CL ID for as many Clusters as possible (hopefully your Notes will show you all the info you’ve collected about each Match). Add a summary row for each Cluster, using 0.4 in the serial number column – now when you sort on CL# and CL50, all the Clusters will be grouped with a header. If there appears to be a consensus for the Ahnentafel/Surname and/or the TG/CL columns, enter that in this header row. Next, rerun the Cluster report with a 45cM threshold – add two new columns: the serial # in a new CL45 column and the Cluster # in a new CL# column. As before, add in a header row for each Cluster with 0.4 in the CL45 column and the Cluster # in the CL# column. Now add this spreadsheet to the CL50 spreadsheet, sort on Match name, and combine duplicate Matches onto one row (Matches in both CL50 and CL45 runs will have two Cluster #s and two serial #s – don’t worry). Re-sort on CL# and CL45 to get all the Matches in Cluster order again, with the added info from CL50. Again, analyze each Cluster with a goal of finding the CA and/or TG/CL for most Clusters (for the Cluster header rows).
If it is advantageous to see where a new Cluster is going, make a duplicate copy of Match rows which have strong affinity for other Clusters (and code it with the other Cluster number) – use these to help identify the CA for new Clusters. This is very much a judgment call, which will be confirmed or refuted in follow-on Cluster runs. Drop the cM threshold by 5cM and repeat. The number of Matches begins to increase dramatically – it’s a lot of work. But the benefit is that you are imputing Ancestral lines to many Matches who are Private or have little/no Tree. If you add the imputed info to the Notes of these Matches, they will show up in Shared Match lists and “flavor” them with an Ancestral line. Again – this process is experimental and requires us to use judgment. Future investigation – the Clusters from different companies should be roughly the same – it would be great to be able to link Ancestry Clusters (with many MRCAs) with Clusters from 23andMe, FTDNA, MyHeritage and GEDmatch (with TGs)…
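The bookkeeping in the WTCB steps above (serial numbers, 0.4 header rows, combining two Cluster runs by Match name, then re-sorting) can be sketched in Python. This is a rough illustration, not the DNAGEDcom Client's format: the column names, including the separate CL#50 and CL#45 Cluster-number columns, the helper functions and the sample Matches are all hypothetical.

```python
HEADER_SERIAL = 0.4  # sorts ahead of serial 1, so each Cluster gets a header on top


def number_rows(rows, serial_col):
    """Add a 1..n serial # in download order so the original order can be restored."""
    return [{**row, serial_col: i} for i, row in enumerate(rows, start=1)]


def merge_runs(older, newer):
    """Combine duplicate Matches (same name) from two Cluster runs onto one row."""
    merged = {}
    for row in older + newer:
        merged.setdefault(row["name"], {}).update(row)
    return list(merged.values())


def with_headers(rows, cluster_col, serial_col):
    """Add one header row per Cluster, then sort on Cluster # and serial #."""
    clusters = sorted({row[cluster_col] for row in rows if cluster_col in row})
    headers = [
        {"name": f"Cluster {c}", cluster_col: c, serial_col: HEADER_SERIAL}
        for c in clusters
    ]
    missing = float("inf")  # Matches absent from this run sort to the bottom
    return sorted(
        headers + rows,
        key=lambda row: (row.get(cluster_col, missing), row.get(serial_col, missing)),
    )


# Hypothetical downloads from a 50cM run and a 45cM run.
cl50 = number_rows(
    [
        {"name": "Match A", "cM": 92, "CL#50": 1, "note": "Ahnen 4"},
        {"name": "Match B", "cM": 61, "CL#50": 2, "note": ""},
    ],
    "CL50",
)
cl45 = number_rows(
    [
        {"name": "Match B", "cM": 61, "CL#45": 1, "note": ""},
        {"name": "Match C", "cM": 47, "CL#45": 1, "note": ""},
        {"name": "Match A", "cM": 92, "CL#45": 2, "note": "Ahnen 4"},
    ],
    "CL45",
)

combined = with_headers(merge_runs(cl50, cl45), "CL#45", "CL45")
for row in combined:
    print(row["name"], row.get("CL#45"), row.get("CL45"), row.get("CL#50"), row.get("CL50"))
```

Each merged Match row keeps both Cluster #s and both serial #s, so it is easy to see which 50cM Cluster a Match came from when judging where a new 45cM Cluster is heading. The judgment calls themselves (the consensus CA for a header row, or moving a Match to another Cluster) stay manual.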

Apologies – this WTCB “short” summary got a lot longer than I originally intended. I’ll have to post a more complete version later. The takeaway is: gradually reduce the cM threshold of Cluster runs and trace the Ancestry from the initial grandparents on out the ancestral lines, using new, smaller Matches which are often more distant cousins with more distant MRCAs. Think of a time-lapse movie of a plant growing, with new limbs branching out over time. If (when) we see a Match with an MRCA which is clearly out of whack with the “history” of the Cluster, it’s time to see if that Match would better be relocated to a different Cluster (per the gray cells) or a different MRCA needs to be found. It is OK to move a Match to a more probable Cluster if it has a lot of Shared Matches with that Cluster, too. We can use our judgment…

[35B] Segment-ology: Genetic Genealogy Spreadsheets; by Jim Bartlett 20211207