Genetic Genealogy Spreadsheets

Posted on December 7, 2021 by Jim Bartlett

Spreadsheets are an important tool in Genetic Genealogy. Here are some of mine…

ANCESTORS – Names, dates, locations, and Ahnentafel # are the key foundational data in this spreadsheet. Over time I’ve added columns for: Immigration year; age at death, age at marriage, number of children, Religion, Profession, Military, War, Y-haplogroup, mt-haplogroup, Find-A-Grave hyperlink, remarks. Add any other column of interest to you – you just need to fill it in… I add in Potential (or Alternate) Ancestors (highlighted) to keep track of those possibilities. I have a “dup” column to indicate which Ancestors are duplicates. This is a very handy reference for me. Two main sorts – 1) by Ahnentafel #; and 2) by surname & birth date. [Initially I took a GEDCOM of an Ancestors-only Tree and put it into a spreadsheet – then massaged the columns.]

COMMON ANCESTOR MATCHES – Names of all Matches who have a Common Ancestor(s) with me. Key data: Name, Admin, cM, #Segs, Company, CA Ahnentafel #, Cousinship, columns for name and birth year of child, grandchild and great grandchild of CA (for Match’s line of descent); hyperlink to Tree. I also have columns for TG or Cluster, GEDmatch #; Remarks. I also have columns to indicate (to me) if I’ve filled in the Notes box of the Match, entered the line of descent to the Match in my Tree… Main sort is by Ahnentafel # and dates of Child, Grchild – this sort looks like a Family Group Sheet – and is very helpful in tracking TGs and Clusters. It also lets me focus on family groups which often group together in TGs or Clusters.
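The main sort described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`ahnentafel`, `child_birth`, and so on) and every row value are invented for the example, not taken from the actual spreadsheet.

```python
# Hypothetical rows from a Common Ancestor Matches sheet (all values invented).
rows = [
    {"match": "Match C", "ahnentafel": 34, "child": "Mary", "child_birth": 1790, "cm": 18.0},
    {"match": "Match A", "ahnentafel": 16, "child": "John", "child_birth": 1762, "cm": 41.5},
    {"match": "Match B", "ahnentafel": 16, "child": "Ann", "child_birth": 1758, "cm": 27.0},
    {"match": "Match D", "ahnentafel": 16, "child": "John", "child_birth": 1762, "cm": 12.3},
]


def family_group_sort(rows):
    """Sort by CA Ahnentafel #, then by the child's birth year, then by Match name."""
    return sorted(rows, key=lambda r: (r["ahnentafel"], r["child_birth"], r["match"]))


# Printed in this order, each CA's rows read like a Family Group Sheet:
# one block per Ancestor couple, children listed oldest first.
for r in family_group_sort(rows):
    print(r["ahnentafel"], r["child_birth"], r["child"], r["match"], r["cm"])
```

Adding the grandchild's birth year as a further sort key would follow the same pattern.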

TG MASTER – Match name, company, Admin, email, Segment info (Chr, Start, End, cM, SNPs), TG ID, Side (M or P), CA info (Ahnentafel #, Surnames of Couple, Cousinship, Tree hyperlink); GEDmatch #; date. If you plan this type of spreadsheet for other people, add a column for initials of test taker (you can then, briefly, combine spreadsheets, do analyses, then separate them again – one spreadsheet per person). Advanced: I have columns for each of 10 generations, and enter my Ahnentafel #s from my parent (2 or 3) out to the CA – this helps me analyze multiple CAs in a TG. Headers: 46 Chromosome bars (rows to separate data); TG bars (that summarize the Chr, Start and End of each TG and the CA) – sometimes there are multiple bars – I highlight the most likely. Main sort: by Side, Chr, Start (this will arrange all Shared Segments into their respective TGs – which should have one Ancestral line)
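The per-generation Ahnentafel columns mentioned above can be filled in mechanically, because in standard Ahnentafel numbering the father of person n is 2n and the mother is 2n+1. The sketch below walks from the CA back to the parent (2 or 3). The function names are hypothetical, chosen for this example.

```python
def ahnentafel_path(ca):
    """Return the Ahnentafel numbers from the parent (2 or 3) out to the CA."""
    if ca < 2:
        raise ValueError("CA Ahnentafel number must be 2 or greater")
    path = []
    n = ca
    while n >= 2:
        path.append(n)
        n //= 2  # the child of ancestor n is n // 2
    return path[::-1]


def side(ca):
    """'P' if the line runs through the father (2), 'M' if through the mother (3)."""
    return "P" if ahnentafel_path(ca)[0] == 2 else "M"


print(ahnentafel_path(40), side(40))  # one value per generation column
print(ahnentafel_path(54), side(54))
```

With one value per generation column, two CAs proposed for the same TG can be compared column by column to see where their lines diverge.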

TG SUMMARY – Chr, Start, End, TGID, Side, columns for 8 generations of Ahnentafel #; Cousinship, CA Surnames; NO Matches. This is a summary subset of the TG MASTER – except I fill in only the Ahnentafel # for known CAs out to the most likely distant CA. In italics, add in Ahnentafel #s from Cluster analysis compared to known TGs – this often extends the evidence in the TG (this is an experimental spreadsheet at this point). Sort: Side, Chr, Start. For me, this is a handy 2-page crib sheet.

WALK THE CLUSTERS BACK – a fairly technical tool. Start with a download of Excel data for Clusters based on a 50cM threshold (from DNAGEDcom Client). This will include Match name, cM and any Notes you’ve entered for each Match – this Note info is very valuable to have in this WTCB spreadsheet. Add columns for Ahnen (or CA Surnames) and a TG or Cluster (CL) code and Remarks. Finally add a column for serial # of each row (1 to how many rows you have at that threshold – so after a sort, you can reconstruct the original Cluster groups – just type a 1 at the top and drag it down in a series) – call this column CL50. And add another column – called CL# – and add in the Cluster # for each Match (I wish DGC would include this in the download spreadsheet). Then the work is to determine the Ahnentafel/CA and/or TG/CL ID for as many Clusters as possible (hopefully your Notes will show you all the info you’ve collected about each Match). Add a summary row for each Cluster (using 0.4 in the serial number column) – now when you sort on CL# and CL50, all the Clusters will be grouped with a header. If there appears to be a consensus for the Ahnentafel/Surname and/or the TG/CL columns, enter that in this header row. Next, rerun the Cluster report with a 45cM threshold – add two new columns for serial # in a new CL45 column and the Cluster # in the new CL# column. As before, add in a header row for each Cluster with 0.4 in the CL45 column and the Cluster # in the CL# column. Now add this spreadsheet to the CL50 spreadsheet, sort on Match name, and combine duplicate Matches onto one row (Matches in both CL50 and CL45 runs will have two Cluster #s and two serial #s – don’t worry). Re-sort on CL# and CL45 to get all the Matches in Cluster order again, with the added info from CL50. Again, analyze each Cluster with a goal of finding the CA and/or TG/CL for most Clusters (for the Cluster header rows).
If advantageous to see where a new Cluster is going, make a duplicate copy of Match rows which have strong affinity for other Clusters (and code it with the other Cluster number) – use to help identify CA for new Clusters. This is very much a judgment call, which will be confirmed or refuted in follow-on Cluster runs. Drop the cM threshold by 5cM and repeat. The number of Matches begins to increase dramatically – it’s a lot of work. But the benefit is that you are imputing Ancestral lines to many Matches who are Private or have little/no Tree. If you add the imputed info to the Notes of these Matches, they will show up in Shared Match lists and “flavor” them with an Ancestral line. Again – this process is experimental and requires us to use judgment. Future investigation – the Clusters from different companies should be roughly the same – it would be great to be able to link Ancestry Clusters (with many MRCAs) with Clusters from 23andMe, FTDNA, MyHeritage and GEDmatch (with TGs)…
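For readers who prefer a script to drag-and-fill, the "combine the 50cM and 45cM runs" step might look something like the sketch below. It is a rough illustration only: it assumes each run has already been reduced to a list of (Match name, Cluster #) pairs in download order, the names `merge_runs`, `cl50_serial`, etc. are invented here, and it leaves out the 0.4 header rows and all of the judgment calls.

```python
def merge_runs(run50, run45):
    """Combine two Cluster runs into one row per Match, in 45cM Cluster order.

    Each run is a list of (match_name, cluster_number) in download order;
    the serial # is simply the 1-based position in that list.
    """
    merged = {}
    for serial, (name, cluster) in enumerate(run50, start=1):
        merged[name] = {"match": name, "cl50_serial": serial, "cl50_cluster": cluster,
                        "cl45_serial": None, "cl45_cluster": None}
    for serial, (name, cluster) in enumerate(run45, start=1):
        row = merged.setdefault(name, {"match": name, "cl50_serial": None,
                                       "cl50_cluster": None,
                                       "cl45_serial": None, "cl45_cluster": None})
        row["cl45_serial"] = serial
        row["cl45_cluster"] = cluster

    def cluster_order(row):
        # Matches absent from the 45cM run (unexpected, but possible) sort last.
        missing = row["cl45_cluster"] is None
        return (missing, row["cl45_cluster"] or 0, row["cl45_serial"] or 0)

    return sorted(merged.values(), key=cluster_order)


# Invented example data: Dee only appears once the threshold drops to 45cM.
run50 = [("Ann", 1), ("Bob", 1), ("Cy", 2)]
run45 = [("Bob", 1), ("Dee", 1), ("Ann", 2), ("Cy", 3)]
for row in merge_runs(run50, run45):
    print(row)
```

A Match such as Dee, with no CL50 values, is exactly the kind of new, smaller Match whose Ancestral line gets imputed from the Cluster it lands in.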

Apologies – this WTCB “short” summary got a lot longer than I originally intended. I’ll have to post a more complete version later. The takeaway is: gradually reduce the cM threshold of Cluster runs and trace the Ancestry from the initial grandparents on out the ancestral lines, using new, smaller Matches which are often more distant cousins with more distant MRCAs. Think of a time-lapse movie of a plant growing, with new limbs branching out over time. If (when) we see a Match with an MRCA which is clearly out of whack with the “history” of the Cluster, it’s time to see if that Match would better be relocated to a different Cluster (per the gray cells) or a different MRCA needs to be found. It is OK to move a Match to a more probable Cluster if it has a lot of Shared Matches with that Cluster, too. We can use our judgment…

[35B] Segment-ology: Genetic Genealogy Spreadsheets; by Jim Bartlett 20211207

Distribution of TGs Part 4 (Spreadsheet)

Posted on October 13, 2021 by Jim Bartlett

Common Ancestor Spreadsheet

By popular demand, here is a portion of my Common Ancestor Spreadsheet:

I’ve added letters at the top of each column for ease in describing them:

A – Initials for the testing company – if a Match tests at multiple companies, I list them for each one.

B – This indicates if the Match also has a ThruLines (TL) or UnListed Tree (ULT) or Theory of Family Relativity (ToFR), etc. Perhaps a quirk of mine in data collection, but this lets me sort the spreadsheet to select all the TLs, for instance, and compare with the AncestryDNA TL list for completeness.

C. The name of the Match – may also include the Admin or Point of Contact

D. Total cMs

E. Number of Segments

F. Ahnentafel number for the MRCA

G. Cousinship

H. A code for Headers (C is a Green separation for each generation; H are my Ancestor Couples; Hch are my Ancestors who are a child of the Ancestor couple – also highlighted in yellow; Hm is a row for a 2nd marriage of an Ancestor – usually with a note that Matches who descend from subsequent children would be half-cousins (use the marriage year in the K column))

I. Ancestor Couple

J. Children of Ancestors – these are the children the Matches descend from – daughters are noted with the surname they married (important to identify grandchildren, etc.) Each Ahnentafel should have a row for my Ancestor (and birth year).

K. Birth year of children

L – O – like J and K

P. This is just one column to add the rest of the descendants down to the Match, if you want to do that (I’ve deleted my entries here as it gets too close to living people)

Q. Triangulated Group IDs – I add a “c” when the TG is implied from a Cluster or consensus among Shared Matches.

R. Add any notes you want – some of mine include notes when a Y-DNA or mtDNA (testing) path is in the line of descent; or if the line of descent is “iffy”, IMO; or TL was wrong, and this is the fixed/correct version; etc. (whatever you want)

S. Tree URL – this is often very handy to review and/or to quickly get back to the Match.

I cannot emphasize enough that spreadsheets are very personal, and you should exercise your own judgment to adapt this one to your own preferences, and to add any other columns you want (realizing that each additional column is more work to maintain).

Here are some additional columns from my spreadsheet which I occasionally use:

Hg – Haplogroup (Y or mt) if it applies to the Match and is important to me

Gm – GEDmatch ID# for the Match

Related – I often note when two (or more) Matches are closely related to each other (this cautions me to “dilute” the value of, say, two children or siblings, as they do not add to any consensus).

MyT – I note in this column (y) if I’ve added this line of descent to my main Tree at Ancestry – it helps me get more ThruLines…

Other – I note in this column any other MRCA Ahnentafel this Match is related to me on – the Match is in my spreadsheet for each such additional relationship (it remains a task to figure out which MRCA goes with the Shared DNA Segment).

Email – email of the Match

The main purposes (for me) of this Common Ancestor Spreadsheet include:

1. Document (in one handy place) all of my Matches with MRCA

2. Use the Family Group Sheet format to check the accuracy of Match descendants against my own Family Group Sheets and/or genealogy research.

3. Determine closely related Matches (this sometimes results in a good communication thread that shares info, draws in other research, and maybe more interest in DNA testing.)

4. Tracking the TGs vs MRCA; particularly with respect to the “Rules”.

Feel free to post a comment about any additional use you see for this spreadsheet; and/or any improvements or drawbacks…

[15K] Segment-ology: Distribution of TGs – Part 4 by Jim Bartlett 20211013

Distribution of TGs – Part 3

Posted on October 8, 2021 by Jim Bartlett

This Part 3 will look at conclusions (what can we learn from all of this), and propose new spreadsheets to track all of your TGs or track TGs/Clusters through other children of our Ancestors (what we can do!).

Before I review the “Rules”, I want to set the stage with the Big Picture: Our autosomal DNA consists of 22 Chromosomes from each of our parents. And each of these Chromosomes has a mosaic of segments from our Ancestors. One analogy is an archeological dig with layers of artifacts – the deeper we go, the more distant (further back in time) the artifacts. Our DNA has a similar pattern. Each of our Chromosomes is composed of segments from our grandparents, passed to us from a parent. Going back another generation, each of our Chromosomes is composed of segments from our Great grandparents; and so on. Even if we went back 100 generations, we’d find that our Chromosomes were made up (completely) with DNA from that generation. This is not to say that every Ancestor in a generation contributed to our DNA; it is to say that some of the Ancestors in a generation contributed all of our DNA. And all of the crossover points are there if we can dig deep enough. But our DNA does not include “signposts” or markers that identify crossover points. Segment Triangulation is the only method I know to determine these crossover points and define specific segments from Ancestors (beyond Visual Phasing of grandparent crossovers). One drawback of Triangulated Groups (TGs) is that they are formed from available Shared Segments with Matches, and they are not formed at any particular generational level. However, we do know that each segment of our DNA came from an Ancestor – each segment represented by a TG came from an Ancestor. When we find that a number of (widely separated) Matches in a TG agree on the MRCA, this is powerful evidence that the TG-segment came from that Ancestor. We can then map these TGs and MRCAs. In this respect, we learn that segment Triangulation (TGs) is essential to confirming our biological ancestry. We learn that the DNA is not scattered willy-nilly over our Chromosomes. There are genetic guidelines – which I call “Rules” in this blogpost series.
Our Chromosome Map should be in general agreement with these rules. And, perhaps in Part 4, I’ll try to outline how we can use these rules to predict much more.

So what do we learn/do with all this musing?

1. Learn: We can understand how TG segments originate in Ancestors – maybe 5 to 8 generations back – and are passed down to us and to our Matches. At each generation – coming down/getting closer to us – our Ancestors have more TGs – until our parents each pass down about 150-200 TGs to us.

2. Do: Use spreadsheets to track and analyze this growing amount of data… Examine closely areas that deviate from the rules.

LEARN

Here is a summary of our “Rules”.

-Rule #1: We can expect roughly the calculated numbers of TGs from each generation – an order of magnitude [See Table in Part 1]

-Rule #2: We can expect about 34 crossovers to occur per generation on each side.

-Rule #3: Shared DNA (with Matches) reduces to roughly 1/4 with each generation.

-Rule #4: We should not see Matches beyond 3C, with the same TG, descending from more than two children of a CA.

-Rule #5: The amount of DNA, and number of segments, in each generation, are not affected by external factors. [The number of TGs may be affected by the Matches you have.]

-Rule #6: The sum of the DNA contributions of all Ancestors at each generation will be 100%. This is a hard rule that is true at every generation.

-Rule #7: Each of our Ancestors will have all of the TG-segments both their parents had.

-Rule #8: A TG that subdivides going back, separates into two smaller segments – one from each parent.

–Observation: Based on my 372 TGs, I estimate, using an 8cM threshold, most of you should get 150 to 200 TGs per side. They will range from small to large and span almost all of your DNA (there may be a few gaps – it all depends on the “coverage” provided by the shared DNA segments with your Matches.)

I think we should track and analyze our TGs. My preferred method is with spreadsheets:

1. Master atDNA Spreadsheet – this is an “everything but the kitchen sink” spreadsheet for me. I include every IBD segment (over 7cM) of every Match – with segment, genealogy, and other information. I still plan, someday, blogposts about spreadsheets. This is a cursory overview. A spreadsheet is highly personal – I recommend you start with downloads from your testing company and add columns to suit yourself – mine is continually evolving.

Here are some of the columns in my spreadsheet:

a. Match info: full name, last name, company “name”, email, POC, company, Notes

b. Segment info: Chr, Start, End, cM, SNPs [from the companies] and TG ID (from me)

c. Genealogy info: Tree URL hyperlink; cousinship (e.g. 3C1R); MRCA couple surnames, side

d. Other info: GEDmatch ID, dates of communication, etc., etc.

e. Most of this info is from company downloads, the rest is typed in as I get it.

My spreadsheet has over 20,000 rows… including the following “Header” rows

f. 22 paternal and 23 maternal Chromosomes – Header/dividers

g. TG summary Header/dividers – uses earliest start location of TG; MRCA (per my judgment)

h. Alternate TG Header – to record alternate MRCAs for some TGs

Spreadsheet sorts – each of these sorts is a valuable tool for me

i. Alphabetical by name

j. Chr & Start – used to Triangulate segments (as they are added)

k. Side & Chr & Start – clearly groups all Matches in a TG – should have same MRCA

l. TG summary Headers, only, by Side & Chr & Start – to analyze MRCAs by generation [these headers have Ahnentafel numbers (in columns for each generation) to the MRCA].

m. This TG summary sort also allows an analysis with respect to Rules #1, #2 and #3 by generation.
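As a rough illustration of the Side & Chr & Start sort and the "should have same MRCA" check, here is a small Python sketch. Everything in it is an assumption made for the example: the field names, the TG IDs, and the sample rows are invented, and MRCAs are represented by Ahnentafel numbers so that two MRCAs on the same ancestral line (one an ancestor of the other) are not treated as a conflict.

```python
from collections import defaultdict
from itertools import combinations

# Invented sample rows; "mrca" is an Ahnentafel number, or None if unknown.
segments = [
    {"match": "M1", "side": "P", "chr": 1, "start": 12.0, "tg": "P01a", "mrca": 16},
    {"match": "M2", "side": "P", "chr": 1, "start": 14.2, "tg": "P01a", "mrca": 32},
    {"match": "M3", "side": "M", "chr": 1, "start": 5.0, "tg": "M01a", "mrca": 24},
    {"match": "M4", "side": "P", "chr": 2, "start": 40.0, "tg": "P02a", "mrca": 18},
    {"match": "M5", "side": "P", "chr": 2, "start": 41.5, "tg": "P02a", "mrca": 20},
    {"match": "M6", "side": "P", "chr": 2, "start": 43.0, "tg": "P02a", "mrca": None},
]


def sort_side_chr_start(rows):
    """Arrange Shared Segments so each TG's rows sit together."""
    return sorted(rows, key=lambda r: (r["side"], r["chr"], r["start"]))


def on_same_line(a, b):
    """True if one Ahnentafel number is an ancestor of (or equal to) the other."""
    lo, hi = sorted((a, b))
    while hi > lo:
        hi //= 2  # step down one generation toward the test taker
    return hi == lo


def conflicting_tgs(rows):
    """TG IDs whose known MRCAs do not all fall on one ancestral line."""
    by_tg = defaultdict(set)
    for r in rows:
        if r["mrca"] is not None:
            by_tg[r["tg"]].add(r["mrca"])
    return sorted(
        tg for tg, mrcas in by_tg.items()
        if any(not on_same_line(a, b) for a, b in combinations(sorted(mrcas), 2))
    )


print([r["match"] for r in sort_side_chr_start(segments)])
print(conflicting_tgs(segments))
```

A TG flagged this way is only a prompt for review; deciding which MRCA (if either) is right remains a judgment call.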

2. Common Ancestors Spreadsheet – this is a list of all Matches with Common Ancestors with me (now about 6,000 rows)

a. Match name, shared cM, cousinship, Tree URL; name/birth of MRCA descendants; Ahnentafel

b. Any known TG ID or Cluster

c. A header row for each of my Ancestors (usually a couple)

This started with columns for Children of Ancestors; their birth year; and same for grandchildren – this info was typed in for each Match’s line of descent from the MRCA. I have since expanded it to include columns for the remaining line of descent (and birth) down to the Match. For females I add married name; for example: Nancy m FLEMING to indicate succeeding descendants have a different surname (saves a little time and space).

Most of this info was from AncestryDNA ThruLines, but I have a number of rows for Matches from the other companies (and the TG info for them, which is like gold).

When this spreadsheet is sorted by MRCA Ahnentafel & Child birth, it looks like a series of Family Group Sheets for each MRCA family.

d. This makes it very easy to check against my Family Group Sheets researched and developed over the past 45 years. I highlight conflicts in a mud color for further research and analysis.

e. This spreadsheet also has a column for Potential Ancestors [POT ANC] (usually from Ancestry or MyHeritage)

f. For each Family Group, it’s also easy to check the range and average of Shared cMs for that family

g. Maybe most importantly for me, it allows me to see if there is a TG and/or Cluster thread in each family

h. I can also check for violations of Rule #4. This has already highlighted a few such violations – almost all of which led to an alternate MRCA with a much better “fit”.

i. All instances of a Match with multiple TGs and/or multiple MRCAs, must be adjudicated. A TG can only link with one ancestral line. This spreadsheet is a good tool for that analysis.
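Two of the checks in this list, the cM range and average per family and the Rule #4 test, are easy to automate. The sketch below is one possible way to do it, not the actual spreadsheet logic: the field names and all sample rows are invented, cousinship is assumed to be written like "5C" or "5C1R", and "beyond 3C" is read as a cousin degree greater than 3.

```python
import re
from collections import defaultdict

# Invented sample rows from a Common Ancestors sheet.
rows = [
    {"match": "A", "ahnentafel": 64, "child": "William", "cousinship": "5C", "tg": "P05b", "cm": 20.0},
    {"match": "B", "ahnentafel": 64, "child": "Sarah", "cousinship": "5C1R", "tg": "P05b", "cm": 30.0},
    {"match": "C", "ahnentafel": 64, "child": "Thomas", "cousinship": "5C", "tg": "P05b", "cm": 40.0},
    {"match": "D", "ahnentafel": 32, "child": "Mary", "cousinship": "4C", "tg": "P07a", "cm": 25.0},
    {"match": "E", "ahnentafel": 32, "child": "James", "cousinship": "4C1R", "tg": "P07a", "cm": 35.0},
    {"match": "F", "ahnentafel": 8, "child": "Ann", "cousinship": "2C", "tg": "P07a", "cm": 210.0},
]


def cousin_degree(label):
    """Extract the cousin degree from a label such as '5C1R' (gives 5)."""
    found = re.search(r"(\d+)C", label.upper())
    if not found:
        raise ValueError(f"unrecognized cousinship: {label!r}")
    return int(found.group(1))


def cm_range_and_mean(rows, ahnentafel):
    """Lowest, highest, and average shared cM for one Ancestor's family group."""
    cms = [r["cm"] for r in rows if r["ahnentafel"] == ahnentafel]
    return min(cms), max(cms), sum(cms) / len(cms)


def rule4_violations(rows):
    """(CA, TG) pairs where Matches beyond 3C descend from more than two children."""
    children = defaultdict(set)
    for r in rows:
        if r["tg"] and cousin_degree(r["cousinship"]) > 3:
            children[(r["ahnentafel"], r["tg"])].add(r["child"])
    return {key: sorted(kids) for key, kids in children.items() if len(kids) > 2}


print(cm_range_and_mean(rows, 64))
print(rule4_violations(rows))
```

As the post notes, a flagged family usually means the shared DNA came from a different MRCA, so the output is a list of places to look rather than a verdict.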

With this Common Ancestors spreadsheet I’m finding two things: a) each Ancestor does tend to have a consensus of Clusters and/or TGs; and there is the occasional outlier (this illustrates that just because you have shared DNA and an MRCA with a Match, it doesn’t necessarily mean the shared DNA came from *that* MRCA – there could be other MRCAs); b) there tends to be only a few groups (Clusters and/or TGs) in each Family Group – roughly in line with the table at the beginning of this blog post; and an occasional outlier of more than 2 children with the same TG (which indicates to me there is probably an issue – probably, also, where the shared DNA didn’t come from *that* MRCA).

3. TG Summary Quick Sheet – taken from my Master atDNA Spreadsheet, this just has a few columns, and I can fit all 372 TGs into 2 pages (maternal and paternal).

a. Columns: Chr; Start; End; TG ID; side; 8 columns for 8 generations of Ahnentafel numbers; MRCA Ahnentafel; MRCA cousinship; MRCA surnames

This Quick Sheet (1-page; front/back) is a handy reference – linking TGs to MRCAs and finding TG IDs for any segment. I include the Chr Headers (solid black) which highlight the TGs in each Chromosome.

I cannot overemphasize that spreadsheets are a tool, and you should adapt any spreadsheet to your own objectives and methodology. Don’t be afraid to add or delete columns or header rows.

Final thoughts for this Part 3 – think through the guidelines (“Rules”); and set up spreadsheets to help track and analyze your data. However, building and maintaining a spreadsheet to manage your Matches and Segments or Common Ancestors takes time – it’s not for everyone. Or… use whatever system you want – genetic genealogy is your hobby, and you are free to enjoy it however you want.

I might have a Part 4 with some more thoughts …

[15J] Segment-ology: Distribution of TGs – Part 3 by Jim Bartlett 20211008

Distribution of TGs – Part 2

Posted on October 5, 2021 by Jim Bartlett

Part 1 of this topic covered some of the background of how TGs are distributed over our Ancestors. This post is about where Triangulated Groups (TGs) are formed and their journey from the Common Ancestors (CAs) down to us and our Matches.

Recap of Part 1:

-Rule #1: We can expect roughly the calculated numbers of TGs from each generation – an order of magnitude

-Rule #2: We can expect about 34 crossovers to occur per generation on each side.

-Rule #3: Shared DNA (with Matches) reduces to roughly 1/4 with each generation.

-Rule #4: We should not see Matches (over 3C), with the same TG, descending from more than two children of a CA

-Rule #5: The amount of DNA, and number of TGs, are not affected by external factors.

-I have 372 TGs, roughly 186 per side, which is close to the 193 TGs predicted to come from my 16 4xG Grandparent couples (6 generations back – 5C level) on one side – about 12 TGs per couple. [See the table in Part 1]

The musing continues…

The TGs we eventually get were in our Ancestors’ DNA somewhere.

Think about the Ancestor who first formed (through recombination) the DNA segment that finally came to you as a TG segment. That Ancestor passed a whole set of Chromosomes to the child who was also your Ancestor – and that TG segment was included somewhere in all those Chromosomes. In these distant Ancestors the TG segment was probably part of a somewhat larger segment. With each succeeding generation that larger segment was either subdivided by a recombination crossover or passed along intact. The smaller the larger segment got, the less likely it was to be subdivided. Near the end of its journey, the TG segment was passed from a parent down to you.

Which Ancestor forms a TG?

Which Ancestor first formed the TG? The most distant cousins, who share the full TG segment, determine the Ancestor. If four 5C all share almost all of the TG segment, then that segment probably came from a 4xG grandparent. NB: 6C and 7C can also be in this TG (usually sharing only a part of the TG segment); and 3C and 4C can also be in this TG (but they aren’t the most distant); and 1C and 2C may actually share this TG and an adjacent TG. It’s often hard to get enough good cousins in a TG to nail it down with certainty.

All TGs from an Ancestor, come through one* child to you.

If an Ancestor couple is “responsible” for 12 TGs, all 12 of those TGs must come down through the one* child (usually) who is our Ancestor. The DNA is also coming from our Ancestor couple down through other children of the Ancestor couple and then down to our DNA Matches.

[*It is possible that we descend from two (or more) children of an Ancestor. If so, each one is treated independently. If an Ancestor at this level was passing down an average of 12 TGs (see the table in Part 1), then that Ancestor would need to pass down roughly 12 TGs to each child who was our Ancestor. Siblings share some DNA with each other. So, some of the TGs could be shared, some would be different. A single child is used to illustrate the concepts in this blogpost.]

All TGs from an Ancestor (MRCA), come through other children to our Matches.

Take an Ancestor couple with two children – all TGs pass down through the one child who is our Ancestor to us; the same TGs also pass down through the other child to our DNA Match-cousins. If an Ancestor has three children, again all TGs pass down to us through the one child who is our Ancestor, and those same TGs could be split between the other two children. If an Ancestor has multiple children, the TGs could be distributed over those other children with two concepts at work: 1) not every child has to get and pass on a TG; 2) The same TG cannot be passed down through more than 2 different children (Rule #4). [An Ancestor couple with only one child is trivial – all TGs pass down through the one child to us, and we have no Match-cousins from this Ancestor].

Amount of DNA vs Number of Matches

It appears to me that roughly the same amount of DNA (TGs) should come from our Ancestors in a large family as from a small family (review Rule #5). Each Ancestor has the same chance of passing DNA down to us as another of our Ancestors (and all of that DNA must come to us through just one of their children – usually). The amount of DNA (and number of eventual TGs) would not depend on the number of children that Ancestor had. But the number of Match-cousins we get would be influenced by the number of other children, and subsequent descendants, they had (and other Rule #5 factors). You might get the sense that I’m trying to emphasize this point – maybe because it’s hard for me to realize that I get the same amount of DNA (in TGs) from an Ancestor couple with few children as I do from one of my Ancestor couples with very large families.

Over the past 45 years I’ve determined over 20,000 descendants of my BARTLETT Patriarch – they were farmers with large families. I do find many Matches with MRCAs, but they all seem to be on repeating TGs (or Clusters). With this paragraph, I now understand why. Each of our 64 4xG grandparents (on both sides) will pass down to us an average of 1/64th of our DNA. Experience and random DNA shows this is not exact, but it is a good order of magnitude – each 4xG grandparent will have a little more or a little less than 1/64. However, the sum of all 64 4xG grandparent contributions to our DNA will total exactly 100%. This leads to Rule #6:

6. The sum of the DNA contributions of all Ancestors at each generation will be 100%. You can take this one to the bank. At each generation, all of our Chromosomes will be filled with DNA from the Ancestors in that generation. I wanted to say our Chromosomes will be filled with TGs from the Ancestors in one generation, but that would not be technically correct. Some of that DNA may be small segments that would not form a TG. But in a closer generation those small areas are recombined into larger TGs. Clearly, from a parent’s generation we get full chromosomes – no small segments…

TGs “accumulate” in closer Ancestors

Let’s assume my 4xG grandparent couple passed down 12 different TGs – to me and to some of my 5Cs. Because the 5Cs have those TGs, we know the TGs existed in a 4xG grandparent. All 12 of these TGs were passed down through the child that was my Ancestor, as discussed above. That child married a child of another 4xG grandparent couple who also passed down about 12 different TGs to their child who was my Ancestor. This means that this 3xG grandparent couple must have 24 different TGs to pass down through one of their children (my Ancestor). Remember this 3xG grandparent couple will pass down a lot of DNA to their child (46 Chromosomes) – that DNA will include at least those 24 TG-segments. At each generation, the die is cast, so to speak! These 24 different TGs must be passed down through their one child who is my Ancestor (in my line of descent), and on down to me. After all, I have each of these 24 TGs. These 24 TGs are part of my “inventory” of 372 TGs that my parents passed to me in my Chromosomes. This reinforces the point that each TG is a segment of DNA that is passed from an Ancestor down to me; and each closer generation has to have all of the DNA (TGs) that their Ancestors passed down to me. This is also the reason a close cousin may well share a single DNA segment that spans more than one TG. As well-known examples: each parent passes down to us a full set of chromosomes that are full of TG segments; each grandparent passes down few, but generally large, segments which are also full of TGs. This leads to Rule #7:

7. Each Ancestor will have all of the TG-segments their parents had.
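The "accumulation" arithmetic behind Rule #7 can be sketched as a simple doubling. This assumes, purely for illustration, that every couple at the starting generation contributes the same average of 12 TGs; real families will be lumpier. The function name is made up for this example.

```python
def tgs_per_couple(tgs_at_start, generations_down):
    """TGs carried by the Ancestor couple at each step closer to us.

    Each step down, a child carrying one couple's TGs marries a spouse
    carrying another couple's TGs, so the count roughly doubles.
    """
    return [tgs_at_start * 2 ** step for step in range(generations_down + 1)]


labels = ["4xG couple", "3xG couple", "2xG couple", "G couple", "grandparent couple"]
for label, count in zip(labels, tgs_per_couple(12, 4)):
    print(f"{label}: about {count} TGs")
```

The last figure, 192, lands close to the roughly 193 TGs per side quoted earlier in this post; the small gap comes from rounding 193/16 to 12.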

Recombination.

To backtrack a little, think about a DNA segment that a 4xG grandparent Ancestor passed down that was a recombination of segments from his/her two parents. Analysis depends on which generation we are looking at. A 6C (one generation back, on one of the two 5xG grandparents) would only see one segment or the other in an eventual TG. The same for a 5C, who would only be related through a segment on one chromosome (side) or the other (unless the 4xG grandparent created the same crossover in the DNA he/she passed to another child – a very low probability). A 4C (on the 3xG grandparent child of the 4xG grandparent couple), would see the recombined segment as a “regular” segment (on one chromosome), which could be passed down to two children and wind up as a TG. This is another reason why some TGs look like they split going back. This leads to Rule #8.

8. A TG that subdivides going back, separates into two smaller segments – one from each parent.

To summarize this musing: clear back to my 8xG grandparents (and probably more distant), DNA segments are formed which are passed down to a child who is my Ancestor (as well as to other children who are the Ancestors of my Matches). It’s hard to pinpoint the exact Ancestor who first formed the DNA segment represented by my unique TGs. But from that point on down to me, that unique TG must be in that Ancestor, in the child, and in every other descendant down to me. Now, this child marries another of my Ancestors in that generation who is carrying the DNA with other TGs. So, this couple (in that next generation down) has roughly twice the number of DNA segments (that will form into TGs) to pass to the next generation. Repeat generation after generation. Finally, my two paternal grandparents will pass roughly 193 segments to my father. To be sure, they only pass 57 segments to my father, but those 57 relatively large segments will include all 193 segments that wind up in my TGs.
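The two figures in that last step, 57 and 193, are both reproduced by one simple formula: 23 chromosomes plus 34 new crossovers for each generation beyond the parent. That formula is an assumption made for this sketch (the table from Part 1 is not reproduced here), chosen because it matches the numbers quoted in this series; the function names are hypothetical.

```python
CHROMOSOMES = 23  # assumption: 23 reproduces the 57 and 193 quoted in the post
CROSSOVERS_PER_GENERATION = 34  # Rule #2


def segments_from_generation(g):
    """Estimated segments on one side from generation g (1 = parent, 2 = grandparents, ...)."""
    if g < 1:
        raise ValueError("generation must be 1 or greater")
    return CHROMOSOMES + CROSSOVERS_PER_GENERATION * (g - 1)


def couples_on_one_side(g):
    """Number of Ancestor couples on one side at generation g (g >= 2)."""
    if g < 2:
        raise ValueError("couples start at the grandparent generation (g = 2)")
    return 2 ** (g - 2)


for g in range(2, 7):
    segs = segments_from_generation(g)
    couples = couples_on_one_side(g)
    print(f"generation {g}: {segs} segments, {couples} couples, "
          f"about {segs / couples:.1f} per couple")
```

For generation 6 (4xG grandparents, the 5C level) this gives 193 segments over 16 couples, or about 12 per couple, in line with the recap at the top of this post.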

We got the TG segment from an Ancestor; each Match got an overlapping segment.

NB: When I say TG-segments pass down from Ancestors to us and our Matches – let me be clear. The TG represents a real, phased, segment of our own DNA. The shared DNA segments that make up the TG are an overlapping part of our DNA that our Match also has. Each Match may have a different overlapping segment with us. The DNA the Match got from our Common Ancestor may be larger or smaller than the TG-segment we got.

Part 3 will look at conclusions (what can we learn from all of this), and propose two new spreadsheets to track TGs (what we can do!).

[15I] Segment-ology: Distribution of TGs – Part 2 by Jim Bartlett 20211005

Distribution of TGs – Part 1

Posted on October 1, 2021 by Jim Bartlett

This post is about the distribution of our DNA segments (as represented by TGs) among our Ancestors. It’s gotten long and convoluted, so I am going to post it in pieces. This is Part 1.

I currently have over 110,000 DNA Matches – they are mostly spread over 3/4 of my Ancestry from Colonial America – mostly Virginia. [1/4 of my Ancestry is from my maternal grandmother whose parents were immigrants in the 1860s – and I get relatively few DNA Matches on these Ancestors]. So I am thinking about the distribution of say 90,000 Matches over three of my grandparents from Colonial Virginia, and the chore of finding Common Ancestors (CAs) linked to DNA segments – represented by Triangulated Groups (TGs). I’m musing about the Big Picture – the distribution of TGs in our Ancestry. From a macro view can we learn something? Can we predict something?

These Matches represent a lot of different DNA segments passing down to me (and to my Matches) from my Ancestors. However, I do know something about these “different DNA segments” – I have 372 TGs – each one representing a segment of my DNA from one parent or the other. These 372 TGs are equivalent to phased data that “cover” all of my DNA – they are arranged, adjacent to each other, from one end of each chromosome to the other. That’s an average of 186 TG segments on one side.

When I look back at my blog post that used simple math to estimate the number of DNA segments we typically get at each generation – on one side – it shows:

Ancestry used to provide Circles back to the 8C level… They currently limit ThruLines to the 6C level, but at one point clearly acknowledged that the shared DNA segments could come from at least the 8C level, and the Circles they presented showed that plenty of folks had Trees with CAs at that level. I have a lot of evidence that indicates atDNA “works” back to at least the 8C level.

At this point in my musing, I’d like to reflect on several “rules”, or guidelines, or pretty valid assumptions about autosomal DNA.

1. Although atDNA is random, the larger the sample size, the closer to the calculated averages we tend to come. This means that we may find some instances of outliers, but with enough data, it averages out. We see this in several instances – a sibling may share a large part of one chromosome with us, but, on average, the total share will be closer to 50% – some chromosomes may be passed to us from a grandparent intact (no grandparent crossover points), but, on average, the total grandparent crossovers will be closer to the reported averages.

2. Science indicates about 34 crossovers occur per generation on each side. In other words, the biology says our DNA is not pureed into mush and then passed down to the next generation in many little pieces. This means at each generation there are relatively few new subdivisions of the parent’s DNA that is passed to a child (average 34), and over several generations many previous segments are not subdivided by crossovers. [NB: in the closest generation – the grandparents’ DNA passed to us by our parents – the crossovers may be closer to 27 from the father and 41 from the mother, but this difference damps out as we go back in generations. For the big picture, I’m using an average of 34.]

3. Shared DNA follows a 1/4 “rule” with each generation (rather than a 1/2 “rule”). We see this in the average of 880cM with a 1C, and 220cM with a 2C, and 55cM with a 3C, etc. This means the shared DNA drops off quickly with more distant generations. It’s also the reason why we only share DNA with about 1/2 of our 4C and maybe 1/10 of our 5C. However, even given this steep drop off, we have so many 6C to 9C, that we still get shared DNA with some of them. Our DNA Match lists are filled with folks who are probably distant cousins…

4. We will almost never see Matches who descend from more than two different children of a distant Ancestor, who share the same (overlapping) DNA segment with us. This one is hard to explain, but it starts with the very low probability that a 7C shares a segment of DNA with us that we each got from the same 6xG grandparent – each of us descending from a different child in that family. Now add to that another Match who shares the same DNA segment with both of us and they descend from a third child in that family. You and a sibling will share about 1/4 of a parent’s DNA. If you add another sibling into the mix, the three of you will only share about 1/8 of a parent’s DNA. Considering the 50/50 chance that a segment will get passed along in the next generation, it gets to be very long odds that you and two Matches will share DNA from a 6xG grandparent’s three children. The science says it’s virtually impossible to add a descendant of a fourth child into a Triangulated Group. I discussed such a scenario in Chapter 1 of “Advanced Genetic Genealogy” and concluded that some Matches with the same TG who descended from 3 different children of the CA probably had mistakes in their genealogy. NB: this “rule” does not preclude multiple Matches in a TG from a distant Ancestor – I have several valid examples of 10 to 20 (or more) Matches from a distant Ancestor. Some of them can be closer cousins to each other, and actually descend from the same child of the distant Ancestor. Also, some of them share different TGs. Bottom line, we won’t see Matches with the same TG descending from more than two children of a CA. I’ll use this “rule” later…

5. The DNA doesn’t play favorites. The DNA process of recombination and crossovers does not have any knowledge of a person’s status, religion, wealth, family size, health, surname, endogamy, etc. etc. The process of recombination is carried out at the cell level, without regard to any external factors. So, from the DNA’s perspective, our Ancestors are equivalent. We should expect the same number of segments (TGs) from large or small families; from recent immigrants or Colonial Ancestors; from endogamous family branches or branches without endogamy; etc. However, we can expect a wide range in the number of Matches based on these kinds of factors. But they will not affect the distribution of DNA segments among our Ancestors. The TGs will be distributed randomly – roughly in line with the table above.
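
The arithmetic behind rules 3 and 4 can be sketched in a few lines of Python. This is only an illustration of the averages quoted above (880cM with a 1C, dividing by 4 for each further cousin level; 1/4 of a parent's DNA shared by two siblings, 1/8 by three). The function names are made up for this example, and the figures are averages over all cousins of a given degree, including those who share no DNA at all, not predictions for any particular Match.

```python
def expected_shared_cm(cousin_degree: int, first_cousin_cm: float = 880.0) -> float:
    """Average shared cM with an n-th cousin under the 1/4-per-generation rule.

    first_cousin_cm defaults to the 880cM average quoted for a 1C.
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    return first_cousin_cm / 4 ** (cousin_degree - 1)


def fraction_shared_by_siblings(num_siblings: int) -> float:
    """Fraction of one parent's DNA that ALL of the given siblings have in common.

    Each additional sibling halves the common portion:
    2 siblings -> 1/4, 3 siblings -> 1/8.
    """
    if num_siblings < 1:
        raise ValueError("num_siblings must be 1 or greater")
    return 0.5 ** num_siblings


if __name__ == "__main__":
    # Rule 3: the steep drop-off from 1C out to 8C.
    for degree in range(1, 9):
        print(f"{degree}C: {expected_shared_cm(degree):.2f} cM on average")

    # Rule 4: why a third (or fourth) child line in one TG is so unlikely.
    for siblings in (2, 3, 4):
        print(f"{siblings} siblings share {fraction_shared_by_siblings(siblings):.4f} of a parent's DNA")
```

The second function shows why rule 4 bites so quickly: every extra child line that has to carry the same segment cuts the common portion in half again, before any later generations are even considered.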

So back to my musings….

I believe my TGs come, on average, from my Ancestors around the 5C level – 4xG grandparent couples. Clearly there will be some type of bell-shaped distribution curve, and some will be a little closer and some more distant. At that level, all of my 16 paternal 4xG grandparent couples would pass down about 193 TG segments, averaging about 12 per couple (Note the 193 in the chart above). Probably an average of 6 on the paternal side and 6 on the maternal side. These TGs have to come from somewhere – meaning 6 TGs, average, from each of the 5xG grandparent couples – either as intact segments and/or through recombination. Each of our results may vary some; but we should see these orders of magnitude (which are not large). The total DNA contribution of DNA segments from all the 5xG grandparents, on one side, will add up to a full set of autosomal chromosomes, on one side.
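
A rough Python sketch of the numbers in the paragraph above. The chart referred to is not reproduced here, so this assumes a simple linear model: 23 starting chromosomes on one side plus an average of 34 new crossovers for each generation beyond the parent. That starting value is an assumption, chosen only because it reproduces the 193 figure at the 4xG grandparent level; the names below are made up for the illustration.

```python
CHROMOSOMES_PER_SIDE = 23        # assumption: starting count that reproduces the 193 figure
CROSSOVERS_PER_GENERATION = 34   # average used in the post


def segments_on_one_side(generations_beyond_parent: int) -> int:
    """Estimated number of segments on one side (paternal or maternal).

    generations_beyond_parent: 1 = grandparents, 2 = G grandparents, ...
    5 = 4xG grandparents (the 5C level).
    """
    return CHROMOSOMES_PER_SIDE + CROSSOVERS_PER_GENERATION * generations_beyond_parent


def couples_on_one_side(generations_beyond_parent: int) -> int:
    """Number of Ancestor couples on one side at that generation."""
    if generations_beyond_parent < 1:
        raise ValueError("generations_beyond_parent must be 1 or greater")
    # 2**k Ancestors on one side at k generations beyond the parent, paired into couples.
    return 2 ** (generations_beyond_parent - 1)


def segments_per_couple(generations_beyond_parent: int) -> float:
    """Average number of segments contributed by each couple on one side."""
    return segments_on_one_side(generations_beyond_parent) / couples_on_one_side(
        generations_beyond_parent
    )


if __name__ == "__main__":
    for k in range(1, 8):
        print(
            f"{k} generations beyond the parent: "
            f"{segments_on_one_side(k)} segments, "
            f"{couples_on_one_side(k)} couples, "
            f"{segments_per_couple(k):.1f} per couple"
        )
```

With 5 generations beyond the parent this gives 193 segments over 16 couples, or about 12 per couple, in line with the figures quoted above. Real results will vary around these averages.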

This is the end of Part 1. In Part 2 I’ll muse more about where TGs are formed and their journey from the Ancestors down to us and our Matches. Part 3 will look at conclusions (what can we learn from all of this), and propose a new spreadsheet to track TGs through other children of our Ancestors (what we can do!).

[15H] Segment-ology: Distribution of TGs – Part 1 by Jim Bartlett 20211001

Bad Segments – Good Segments

Posted on September 30, 2021 by Jim Bartlett

A Segment-ology TIDBIT

There are two ways of looking at small segments. But first please remember that ALL of your own DNA is true, even the very smallest part of your DNA came from a parent as a true segment. What we are talking about when we discuss “small segments” are small shared DNA segments with a Match – segments which are determined by a computer algorithm comparing your (true) DNA with a Match’s (true) DNA. Below about 15cM some of those comparisons report a false shared DNA segment. The smaller the segment, the more likely that it is false. The distribution curve starts at about 0% false reporting at about 15cM, rises to about 50% false reporting at 6-7cM, and climbs fairly dramatically below that.

In this post “segment” means a computer generated shared DNA segment.

1. Bad Segments: Small segments have a high probability of being false, and there is no easy way to tell if it’s a valid shared segment or not. And, perhaps, even if it’s a true segment, it’s probably from a very distant Ancestor – probably beyond your genealogy. These small segments are called names and referred to as POISON – DO NOT USE! However, in this derogatory sense we are talking about NOT using these small segments as evidence; NOT the basis of a hypothesis; NOT part of a “proof”. However, these segments may not be worthless…

2. Good Segments. Shared segments are used by each company to identify DNA Matches, and report them to us. As noted above the small segments may be true or false. But what if they lead us to a person who is really related to us = a cousin? If the “Match” has a Tree we can check it out. We can look at the information presented. Finding a Common Ancestor is only part of the possibilities. Maybe this Match-cousin has more information about our Common Ancestor than we do. Maybe they’ve found records we don’t have, written an interesting story, uploaded pictures we didn’t have. Maybe we can establish a dialog (message, email, phone, in person…) I have made lasting friendships with some of my Matches – some of whom we still don’t know how we are related. The possibilities and opportunities are endless.

At AncestryDNA, ThruLines finds cousins with a Common Ancestor, down to 8cM (they used to go down to 6cM). I checked every one of them, and often found new information. With each DNA Match, keep your genealogy cap on. A small segment may in fact be false, but that doesn’t mean there isn’t a true relationship. Remember, about half of your true 4th cousins won’t share any DNA with you. My advice: don’t ignore a true cousin just because you share a small segment. Genetic Genealogists, myself included, have long stated that a Match with a Common Ancestor and a shared DNA segment does not necessarily mean the shared DNA segment came from the Common Ancestor. By the same logic, a relative with a Common Ancestor to you, may or may not have a true shared DNA segment from that Common Ancestor.

If you are trying to prove a bio-ancestor, or a brick wall Ancestor, or some other relationship using DNA, don’t use small segments. If they cannot be proved to be true segments, they must be ignored as part of a proof. But, on the other hand, don’t ignore a paper-trail relationship just because you share a small segment. Learn what you can from a genealogy perspective and ignore the DNA.

Just my perspective as a long-time genealogist…

[22BD] Segment-ology: Bad Segments – Good Segments TIDBIT by Jim Bartlett 20210930

What Do You Get from a DNA Test?

Posted on September 29, 2021 by Jim Bartlett

A Segment-ology TIDBIT

You get a LOT with a DNA test at 23andMe, FamilyTreeDNA, AncestryDNA and/or MyHeritage. Each of these companies is different, and has its own tools and programs, but some of the features are fairly common among all of them. It’s actually quite amazing how much you get, no matter where you test.

1. Genetic testing of over 500,000 Markers (SNPs – pronounced: snips). Your DNA has two of each marker so actually you are getting over a million data points.

2. This is your data – you may download and save/use this file (about 10MB), but by itself it’s not much use to you (exceptions noted later).

3. A password protected/secured web site for your profile and information about your test results. This is your hub, or base, for your interactions with your results. Each company is different and has different tools and configurations for your personal page. See #16 below.

4. Ethnicity (aka admixture, deep ancestry, origins, geographic makeup, etc.) report. This is interesting in its own right (and many take a DNA test solely for this report). It’s important to remember that these are estimates! But there are many other features of your DNA test that are much more valuable – read on.

5. A list of people who share DNA with you – called Matches. These Matches are usually related to you – both of you sharing some DNA that you got from the same Common Ancestor.

6. A profile for each Match – some info about themselves that they may post.

7. Access to any genealogy Tree that your Matches may post.

8. The ability for you to post/upload your own Tree and build on it – highly recommended.

9. Tools – these vary by company and include DNA tools and genealogy tools. See #16 below for some of them.

10. A list of Shared Matches (aka InCommonWith or Relatives in Common) that both you and each Match share – these lists are very important, particularly in grouping Matches.

11. Notes – each company offers a way to enter and save notes on each Match Profile – you type these in, and they are available to you when you return to that Match.

12. Communicate – each company has a way for you to contact your Matches. Yes, probably only 10% of the Matches will reply, but it’s often worth a try.

13. Updates – each company adds new Matches frequently as new people test; they also update/improve their ethnicity program every few years; they add new tools; etc.

14. Bargain prices – all of this for an up-front, one-time, $50-100 price. (Try to get any genetic test for under $100). Yes, each company will try to get you to buy more/other tests and reports, and some offer subscriptions to use their tools. You decide – in any case, you get an ethnicity report and list of DNA Matches which are updated… forever!

15. NB: If you want, you can upload your DNA data file to other companies – www.gedmatch.com lets you compare tests between companies; other companies offer analysis and services for health or wellness, based on your DNA, etc. Research carefully before you upload your DNA data file anywhere.

16. The International Society of Genetic Genealogy (ISOGG) has a page that compares the features of the major DNA testing sites: https://isogg.org/wiki/Autosomal_DNA_testing_comparison_chart This table covers a lot, including links to various blog posts about each company.

Readers are invited to post comments on other insights that all/most of the major companies offer. Please do NOT tout your opinions about each company and/or their extra features – that is not the objective of this post. The target audience for this post is someone deciding to take a DNA test, or not. Feel free to pass this blog-post to others.

[22BC] Segment-ology: What Do You Get from a DNA Test TIDBIT by Jim Bartlett 20210929

Do You Have a Suspicious Branch in Your Tree?

Posted on September 28, 2021 by Jim Bartlett

A Segment-ology TIDBIT

Although this Segment-ology blog is focused mainly on understanding and using DNA segments, I’ve also tried to look at the genealogy part of the genetic genealogy equation. We need both genealogy and DNA tools. AncestryDNA has some good genealogy tools that help us with our DNA Matches.

One of the powerful tools is ThruLines. Ancestry uses this tool to analyze your Tree, each Match’s Tree, and every other Tree in its inventory to try to build links to a Common Ancestor (CA) for you and each Match. This includes finding CAs with private, but searchable Trees, and with small Trees that may have only a Match’s parent or grandparent. In all cases Ancestry will try to fill in any gaps between the CA and you and/or your Match. The result is a diagram showing how you and the Match are related through a CA, along with reference material to indicate how they determined any generations they used to fill in a gap. This is a powerful tool, which can be used in a variety of ways.

I’ve written about:

How ThruLines Works, here.
Helping ThruLines help you, here.
ThruLines Xray vision (into private Trees), here.
Adding ThruLines info into Match Notes, here.
Using ThruLines and Shared Matches to form Clusters, here.
Using ThruLines to Extend the MRCA of a group, here.

All of these posts use Matches who share DNA segments with you, and ThruLines adds the dimension of genealogy – using the power of Ancestry’s huge database of Trees. ThruLines usually uses multiple Trees which are in agreement. This is a good, easy, place to start – a good hint – but like all such “hints” you should validate the result. Yes, some of the Trees are flawed, but most are not. Based on my 45 years of genealogy research, I’ve found ThruLines to be correct about 95% of the time.

This post is about another way to use the power of ThruLines – checking on a suspicious branch of your Tree. Suppose you have a Tree and have been documenting Matches who have Common Ancestors with you. And you notice that one branch of your Tree isn’t getting as many Matches as you expected. It may be because the branch is one that recently immigrated to the US; or because the Ancestors in the branch had relatively few children. Both reasons would tend to reduce the number of Matches from that branch. But if you’ve ruled those reasons out, what’s left? Well, the elephants in the room are a non-biological parental relationship in your Tree (an NPE or MPE) or faulty genealogy research.

One way to check a suspicious branch is to use ThruLines, as follows:

1. Determine the Ancestor who is the base of the suspicious branch – use your judgment.

2. Click on the child who is your Ancestor

3. Open that (child) Ancestor’s profile page

4. Open the Edit tab (top right)

5. Select Edit relationships

6. Click on the X next to the suspicious parent(s) – one or both

This will remove the suspicious branch from your Tree. It will also preserve all the work you’ve done on that branch, and at any time you can easily go back to the child (#2 above) and add the parents back in using the Select someone in your tree option (just type in the names you had before and select them).

After you’ve removed the suspicious branch, just wait a few days. ThruLines will try to find Matches who are cousins from this line and will identify Potential Ancestors to fill in any gaps. This works out to the 6th cousin (6C) level – your 5xGreat grandparents. If ThruLines identifies Potential Ancestors who were the Ancestors you originally had – well nothing lost (but be sure to use the Select someone in your tree option to get back the branch as you originally had it). If ThruLines identifies alternative Ancestors – well then, you’ve got some work to do to understand more about those Ancestors and decide which Ancestors to use. Remember the ThruLines version is just a “hint” – it’s still up to you…

What I would do is accept the ThruLines Potential Ancestors (later, they can always be deleted or removed from the Tree with Steps 1-6 above) and see if I got more ThruLines Matches than I had before. If so, these ThruLines Matches would have Trees that should be reviewed for additional evidence. My go-to evidence is the census records if they are available for these new Ancestors – are the times and locations appropriate? At this point, this is mainly a genealogy exercise, although a review of relationships and Shared cMs should also be done.

This is yet another way to use the power of ThruLines. It’s not guaranteed to work, but it does give you a quick and easy look into the huge Ancestry inventory of Trees for potential alternatives.

[22BB] Segment-ology: Do You Have a Suspicious Branch in Your Tree? TIDBIT by Jim Bartlett 20210928

Two Tricks

Posted on July 9, 2021 by Jim Bartlett

A Segment-ology TIDBIT

Bottom Line Up Front (BLUF):

1. Email GEDmatch Matches with Ancestry kits to identify their AncestryDNA Profile.

2. Use Ancestry tools to extend the Ancestry of DNA Matches at other companies.

As mentioned before, genetic genealogy is a combination of genealogy and DNA. Another way to look at it is: genealogy with a DNA tool. The foundation is our genealogy – we want to build it out and we want to get it right. That’s where the DNA comes in. Each segment of our DNA comes down to us from a specific line of our Ancestors to our mother or our father, and then to us. People who share the same DNA segment (called segment Triangulation), are related to us somewhere on the Ancestral line we got it from.

In the practice of genetic genealogy, we look for Common Ancestors (CAs) with our DNA Matches; and we look for DNA Matches who share the same DNA segment. Finding the same DNA segment can be done through DNA Painting, and/or by segment Triangulation (forming Triangulated Groups (TGs)), and/or, roughly, by Clustering. Because it is a precise process, segment Triangulation is generally considered to be the gold standard.

Our search is for DNA Matches with Common Ancestors AND specific segments.

The problem is that AncestryDNA does not reveal the needed DNA segment information. They have the largest database of DNA Matches, and, by far, the best Trees. Yes, many Matches have Private, or no, or skimpy Trees, but still: AncestryDNA has a much higher percentage of Matches with decent Trees than any other company. On the other hand, the other companies (23andMe, FamilyTreeDNA, MyHeritage and GEDmatch) all provide the detailed shared DNA segment information, and tools to determine if these shared segments Triangulate. But the genealogy side of these companies pales in comparison to AncestryDNA – in numbers, quality and useful tools. For example, I have identified over 4,400 CA-Matches at Ancestry, and only 575 CA-Matches at the other 3 companies (a few are Matches with multiple CAs). However, at each of the other 3 companies, I’ve grouped all my DNA Matches (with IBD (true) DNA segments) into my 372 Triangulated Groups. In other words, I have Common Ancestors for all 84 of my known 128 5xG grandparents (roughly 3/4 of my Tree at that generation); and 372 TGs that cover all of my DNA – if I could only link them together…

The effort now is to: 1) find segment data for Matches at AncestryDNA; and 2) find CAs for Matches in TGs at the other companies. The following two “tricks” have helped me a lot.

Trick 1. Email Matches at GEDmatch who have a DNA test from Ancestry. My standard email:

Hello – we share a DNA segment at GEDmatch (I am kit M200…), and I’d like to determine our Common Ancestor. I will do the research and report back to you on what I find. All I need from you is a link to your Profile at AncestryDNA – this could be your AncestryDNA user name and/or the URL of your Tree. My Ancestry user name is jimbartlett1; and my Tree URL is https://www.ancestry.com/family-tree/tree/20620230/

Hope to hear from you… Jim Bartlett email: jim4bartletts…

Please be sure to follow up and report back to each Match who cooperates.

Trick 2. Use Ancestry tools to extend the Ancestry of DNA Matches at other companies (where we already have segment data). Some of our Matches at these companies provide some information about their Ancestry – even a little bit may be enough. However, I cannot sugar coat this – it’s work! But Matches who share larger segments should be closer cousins – the Common Ancestors should not be very far back. For Matches in key TGs, I actually work on a quick and dirty list of their Ancestry – an Ahnentafel list – 2 parents, 4 grandparents, 8, 16… usually by this time I’ve picked up a probable thread, if not the actual CA. Sometimes the Match has true dead-ends and a CA cannot be found – but often a CA can be determined. More on this process in a recent blogpost here.
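
For anyone building the quick and dirty Ahnentafel list in a script or spreadsheet, here is a small Python sketch of the standard numbering: person 1 is the Match, the father of person n is 2n and the mother is 2n+1. The function names are made up for this illustration; only the numbering scheme itself is standard.

```python
from typing import List, Tuple


def ahnentafel_generation(generation: int) -> range:
    """Ahnentafel numbers for one generation.

    generation 0 = the person (1), 1 = parents (2-3),
    2 = grandparents (4-7), 3 = G grandparents (8-15), ...
    """
    if generation < 0:
        raise ValueError("generation must be 0 or greater")
    return range(2 ** generation, 2 ** (generation + 1))


def parents_of(number: int) -> Tuple[int, int]:
    """Return (father, mother) Ahnentafel numbers for a person."""
    return 2 * number, 2 * number + 1


def line_to_ancestor(number: int) -> List[int]:
    """Ahnentafel numbers from the person (1) out to the given Ancestor."""
    if number < 1:
        raise ValueError("Ahnentafel numbers start at 1")
    path = []
    while number >= 1:
        path.append(number)
        number //= 2  # the child of Ancestor n is n // 2
    return path[::-1]


if __name__ == "__main__":
    # The 2, 4, 8, 16... slots to fill in for a Match.
    for gen in range(1, 5):
        numbers = ahnentafel_generation(gen)
        print(f"Generation {gen}: {len(numbers)} Ancestors, #{numbers[0]} to #{numbers[-1]}")

    # The line of descent for one Ancestor in the list.
    print(line_to_ancestor(22))
```

The empty slots in each generation show at a glance where the Match's Tree still has gaps, and the line back from any Ancestor number gives the path to check against a possible CA.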

Neither of these two Tricks will guarantee success, but they may be helpful if you are Painting or Clustering or Triangulating or just researching your genealogy. I use them regularly. NB: To link a CA to a TG segment (mapping), we need corroborating evidence – certainly genealogy Triangulation with close cousins, and Walking the Ancestor Back for more distant ones.

[22BA] Two Tricks TIDBIT by Jim Bartlett 20210709

Search on a Surname – Using It Part II

Posted on June 26, 2021 by Jim Bartlett

A Segment-ology TIDBIT

In Search on a Surname, I proposed a little experiment for my Matches to see if they, too, had an Ancestor I thought they had.

Here is another way to use the same search-on-a-surname/location process. A Match and I found a Common Ancestor, but on looking at his Tree, I thought one of the links in his ancestry was incorrect. So I proposed that he search his Matches for the link’s spouse’s surname/location in one case and search for my version of his link’s spouse’s surname/location in the other case. He reported “many, many Matches” in one case and none in the other. Case closed.

This often works because the Matches to an MRCA couple, should usually also match on the spouse. And Matches with a specific line of descent from an MRCA couple, should also descend from the spouses along that line of descent. From all of their Matches at Ancestry, some of them should be from those spouses who are also their Ancestors. If they list an Ancestor who is not a bio-Ancestor, they should have few if any Matches on that Ancestor. Actually, the same goes for your own Ancestry…

This is a process that may, or may not, always work. There is no guarantee. It’s something to try that might add evidence one way or the other.

[22AZ] Search on a Surname – Using It Part II TIDBIT by Jim Bartlett 20210625