Percent DNA vs Percent Matches

In a perfect world, our DNA would be 1/2 from each parent, 1/4 from each grandparent, 1/8 from each Great grandparent, etc. Also in a perfect world, 1/2 of our DNA Matches would come through a parent, 1/4 through a grandparent, 1/8 through a Great grandparent, etc. But, as we all know, the world is not perfect.

The only constant in the above is that 1/2 of our atDNA (44 Chromosomes) comes from each parent.

Percent DNA

So, let’s look at the DNA side of this issue first, addressing only the 44 atDNA Chromosomes – 50% from each parent. Because of the random recombination of the grandparents’ DNA in each parent, the two grandparents on each side don’t have to contribute in equal proportions – and in fact they almost never will, exactly. Often, we will get one whole Chromosome from a grandparent (and none of that Chromosome from the other grandparent).

However, over all of our 22 Chromosomes from each side, the amount per grandparent tends to roughly even out. But – a big BUT – the two paternal grandparents must add up to the 50% of our DNA that our father gave us, and the two maternal grandparents must add up to the 50% that our mother gave us. For example, our four grandparents (reading across a Tree from paternal to maternal Ancestors) may have percentages like: 24%, 26%, 27%, 23%. Note that the first two percentages (on the paternal side) total 50% and the second two percentages total 50% (on the maternal side). This rule holds exactly for each succeeding generation: the two percentages for a male/female couple will have to total the percentage for their child. So the percentages at the Great grandparent level may look like: 13%, 11%, 12%, 14%, 13%, 14%, 10%, 13%. The total is 100%; the total for each couple equals the amount for their child. This continues at each generation, and eventually some Ancestors drop out; but the totals for each couple must add up to the child’s % (even if one of the Ancestors in a couple is 0%).
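The couple rule above can be checked mechanically. Here is a minimal Python sketch (the function name is my own, not from any tool) that takes one generation’s percentages, listed from paternal to maternal across the Tree, and confirms that each couple totals their child, using the example percentages from the paragraph above.

```python
def couples_sum_to_children(children, ancestors, tol=1e-9):
    """Check that each couple's percentages add up to their child's percentage.

    `ancestors` is read across the Tree from paternal to maternal, so entries
    2*i and 2*i + 1 are the father and mother of `children[i]`.
    """
    if len(ancestors) != 2 * len(children):
        raise ValueError("each child needs exactly two parents listed")
    return all(
        abs(ancestors[2 * i] + ancestors[2 * i + 1] - child) <= tol
        for i, child in enumerate(children)
    )


parents = [50, 50]
grandparents = [24, 26, 27, 23]
great_grandparents = [13, 11, 12, 14, 13, 14, 10, 13]

print(couples_sum_to_children(parents, grandparents))             # True
print(couples_sum_to_children(grandparents, great_grandparents))  # True
print(sum(great_grandparents))                                     # 100
```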

If you do segment Triangulation and/or chromosome mapping, you can actually calculate these percentages – in cM or SNPs or Mbp. The sum of the parts must equal the whole at each generation.
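If you have mapped segments to Ancestors, the roll-up can be scripted. A minimal sketch, assuming each mapped segment is recorded as a pair (Ahnentafel number, cM): in the Ahnentafel convention you are 1, the father of person n is 2n and the mother is 2n + 1, so a couple always rolls up to their child with integer division by 2. The segment values are hypothetical, and the percentages are relative to the mapped cM, not to your whole genome.

```python
from collections import defaultdict


def total_cm(segments):
    """Sum the cM of mapped segments per Ahnentafel number."""
    totals = defaultdict(float)
    for ahnentafel, cm in segments:
        totals[ahnentafel] += cm
    return dict(totals)


def roll_up(totals):
    """Combine each couple (2n father, 2n + 1 mother) into their child n."""
    child_totals = defaultdict(float)
    for ahnentafel, cm in totals.items():
        child_totals[ahnentafel // 2] += cm
    return dict(child_totals)


def as_percent(totals):
    """Express each Ancestor's cM as a percent of the mapped total."""
    whole = sum(totals.values())
    return {a: 100 * cm / whole for a, cm in totals.items()}


# Hypothetical mapped segments at the Great grandparent level (Ahnentafel 8-15).
segments = [(8, 120.0), (8, 60.0), (9, 150.0), (10, 200.0), (11, 95.0),
            (12, 180.0), (13, 75.0), (14, 110.0), (15, 130.0)]

great_grandparents = total_cm(segments)
grandparents = roll_up(great_grandparents)   # Ahnentafel 4-7
parents = roll_up(grandparents)              # Ahnentafel 2 and 3

# The sum of the parts equals the whole at each generation.
print(sum(great_grandparents.values()), sum(grandparents.values()),
      sum(parents.values()))                 # 1120.0 1120.0 1120.0
print(as_percent(grandparents))
```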

Percent Matches

Now we’ll look at the percent (or number) of Matches for each Ancestor in a generation. All other things being equal, we would expect the percentage of our Matches to mirror the percentage of DNA. But many things are not equal… First on my list is the size of the ancestral branch. Some of my Ancestors had 10-15 children; some had 1-2 children. Clearly, the more living descendants an Ancestor has, the more Matches we’ll see from that Ancestor: large families result in more Matches. A close second on my list is: who tests? For a variety of reasons (interest, geography, income, etc.), some lines tend to have more descendants testing than others.

It’s interesting to note (at least for me) that as we move back with each generation, each Ancestor will have more descendants, but the size of a Shared DNA Segment will decrease. On the one hand, we have more cousins; on the other hand, they tend to share less DNA. In fact, at the 4th cousin level, about half of them will not show up as a DNA Match, and the percentage of cousins who do match falls off quickly with each additional generation. This is partially offset by the increasing number of cousins. Roughly 10% of my Matches at Ancestry are 4C; the rest are 5C-8C or more (I cannot believe the valid cousins just stop at the 5C or 6C level – they are all on some kind of distribution curve). If our genealogy is deep enough, and if we are willing to build out our Matches’ Trees, I think we can find and document many more of our cousins among our DNA Matches.
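A toy model makes the trade-off concrete. This is a back-of-the-envelope sketch, not a published formula: expected Matches from an Ancestor’s line are roughly the living cousins on that line, times the fraction who have tested, times the fraction who share a detectable segment. The 0.5 detection rate reflects the “about half” of 4th cousins noted above; the cousin counts and the testing fraction are hypothetical.

```python
def expected_matches(living_cousins, fraction_tested, fraction_detected):
    """Toy estimate: cousins who have tested AND share a detectable segment."""
    return living_cousins * fraction_tested * fraction_detected


# Hypothetical: two Ancestors at the same generation (4th cousin level),
# one with a large family and one with a small family.
FRACTION_TESTED = 0.05      # assumption: share of living cousins who have tested
FRACTION_DETECTED = 0.5     # about half of 4th cousins show up as a Match

large_branch = expected_matches(800, FRACTION_TESTED, FRACTION_DETECTED)
small_branch = expected_matches(80, FRACTION_TESTED, FRACTION_DETECTED)
print(round(large_branch), round(small_branch))   # 20 2
```

With everything else held equal, ten times the descendants gives ten times the expected Matches, which is why branch size tops the list above.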

Do genealogy records and documentation play a role? I’ll go out on a limb and say no! Our DNA doesn’t depend on any records – each Ancestor (and their descendants) is just as valid with or without records. At each generation, every box on our Ancestral Tree is essentially equivalent. We needed every one of them for us to be here. The records make it easier to determine how we are related to our Matches, but the number of records wouldn’t change the number of Matches.

Summary

DNA is roughly evenly divided among our closer Ancestors at each generation, but the DNA may drift into “unevenness” in later generations. Our results will vary. Eventually some Ancestors fall out as contributors to our DNA.

We should get DNA Matches from almost all of our Ancestors (in a genealogical timeframe). The number of Matches will depend a lot on the number of descendants of each Ancestor.

An Outcome

An outcome when looking beyond brick walls and/or for bio-Ancestors:

We should find a number of Matches that tie back to every box (Ancestor) in our Tree, generally commensurate with the number of descendants from that box. If we are testing a surname for a brick wall or bio-Ancestor, we should get Match-Trees that build a family, with the largest concentration of Matches being close to the Ancestor we are looking for. We should get roughly the same experience with this “prospective” Ancestor that we would get with a proven Ancestor at the same generational level and with the same number of descendants. In other words: the Goldilocks result. Clearly this is not a hard rule or formula – it’s an overall feeling of what is reasonable. It’s also an expectation that we have Matches for our brick wall and bio-Ancestors (and their Ancestors), and that those Matches should combine to build a Tree for us.

[22AW] Segment-ology: Percent DNA vs Percent Matches – A TIDBIT by Jim Bartlett 20210504

Bio-Ancestors – Testing a Line

I am working on the bio-mother of a Target’s grandfather. The grandfather was an orphan or foster child – we have a few family stories but cannot find any of the names in any records (despite over 20 years of searching). In his 1937 Social Security Application the grandfather lists his mother as a RIDENOUR, but she has not been found.

So, a search of the Target’s Matches on the surname RIDENOUR results in over 40 hits, which appear to come from one family in the right geography – Western Maryland. But the online RIDENOUR families are tangled up [I’ll do a different blogpost on them later]. However, there was one Daniel RIDENOUR and Elizabeth BROWN family that had the highest concentration of Matches with the highest shared DNA cMs. But I didn’t want to slog through all of the potential BROWN Matches. Then I noted that Elizabeth BROWN’s parents were William BROWN and Anna Elizabeth BUHRMAN. Now BUHRMAN was a unique surname, so I tried that. For reference, here is a Quick & Dirty Tree:

And here is a Quick & Dirty Outline of the Matches I found:

Note several things.

All the Matches’ Trees fell under John BUHRMAN 1743-1826 & Catharina MARTIN (Outline #1) – this is a very positive sign that these surnames are Ancestral.

Of the 14 Matches, four were from the BROWN/RIDENOUR line (Outline #1C2), and the other ten were spread over different children and grandchildren of John BUHRMAN & Catharina MARTIN – this is also strong information. Ten Matches descend from BUHRMAN/MARTIN independently, without going through the RIDENOUR line. This indicates BUHRMAN and MARTIN are Ancestral surnames of the Target – in their own right.

Matches back to John BUHRMAN & Catharina MARTIN are at the 5th cousin level. It was nice to find them in such a simple search. And more will be found using other tricks [another blogpost].

The TAKEAWAY: Search on a likely Surname, build a Q&D Tree from the Matches’ Trees. Then test the validity of this Surname as an Ancestor, by also searching on surnames of the spouses. There should be additional, independent Matches on these other surname lines.
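The tally behind this takeaway is simple bookkeeping. A minimal sketch, assuming each Match is recorded by its position in a Quick & Dirty Outline written as alternating digits and letters (so “1C2b” sits under child “1C” of couple “1”); the Match names and codes below are hypothetical. It counts Matches per child of the Common Ancestor couple and lists the Matches that do not descend through the line you started from.

```python
from collections import Counter


def branch_of(outline_code):
    """Child of the Common Ancestor couple, e.g. '1C2b' -> '1C'."""
    return outline_code[:2]


def independent_matches(match_outline, known_line):
    """Matches whose line of descent does not pass through the known line."""
    return [m for m, code in match_outline.items() if not code.startswith(known_line)]


# Hypothetical Quick & Dirty Outline: Match -> outline position under couple #1.
match_outline = {
    "Match A": "1C2a",
    "Match B": "1C2b",
    "Match C": "1A1",
    "Match D": "1B",
    "Match E": "1D3",
}

print(Counter(branch_of(code) for code in match_outline.values()))
print(independent_matches(match_outline, known_line="1C2"))   # ['Match C', 'Match D', 'Match E']
```

Matches spread over several children of the couple, and not only down the starting line, are the “independent” evidence described above.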

Of course there is more work to do… There are other close family to the Target who also have this same Ancestry – a very quick search from their AncestryDNA accounts should show some same and some new Matches under Outline #1. Also, this was a Q&D effort, now I’ll go back and add in census and marriage and Find-A-Grave and etc., etc. I’ve also done this with other surnames and came up with little to nothing – indicating it’s time to try a different surname. Ruling out possibilities is also on the path forward.

[30A] Segment-ology: Bio-Ancestors – Testing a Line by Jim Bartlett 20210420

Breaking Down Brick Walls

There are several methods to break down brick walls, including the bio-parents of an Ancestor. I think of two groups of methods:

1. Blind Luck

2. Use a process

I don’t mean to be flippant when I say “Blind Luck”, but luck plays a large part in some cases. Your list of Matches may include one who shares 2600cM of DNA – that would be a full sibling. Or even a 1750cM share – which would indicate a grandparent, grandchild, half sibling, Aunt/Uncle or Niece/Nephew. Wow. That could get you into your bio family pretty quickly. So always look at your closest Matches – maybe you’ll be lucky and find some really close relatives to work with.

But most of the time we are not that lucky. And sometimes the brick wall is somewhat further back.

I wrote a blogpost: Let the Matches Tell Us the Cluster Common Ancestor. You can review it here.

In this blogpost I want to review a more generic process. Depending on how many of the DNA testing companies we use, we can get from tens of thousands to over a hundred thousand DNA Matches. That’s a lot of data to sift through, so let’s see if we can narrow it down. Here is a handy chart I often use:

The first three columns are almost trivial – they are easy to produce and can be extended upward, if need be. Say you have 10,000 DNA Matches to work with – the 4th column shows you, roughly, what percent of those Matches will come to you through each one of your Ancestors. You already know that each parent gave you half of your DNA – except for special situations, we would expect roughly half of all your DNA Matches to come from each side. And by extrapolation about 1/4 to come from each grandparent, etc.  That’s still a lot of data to work with.
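My chart is not reproduced here, but the idealized arithmetic behind its early columns is easy to generate. A minimal sketch with a hypothetical function name: for each generation it gives the number of Ancestors, the “perfect world” percent per Ancestor, and the corresponding share of a given number of Matches. Real counts will vary with family size and who tests, as discussed earlier.

```python
def generation_table(generations, total_matches):
    """Rows of (generation, number of Ancestors, % per Ancestor, expected Matches)."""
    rows = []
    for gen in range(1, generations + 1):   # 1 = parents, 2 = grandparents, ...
        ancestors = 2 ** gen
        pct_each = 100 / ancestors
        rows.append((gen, ancestors, pct_each, total_matches * pct_each / 100))
    return rows


for gen, ancestors, pct_each, matches_each in generation_table(6, 10_000):
    print(f"{gen:>3} {ancestors:>5} {pct_each:>8.3f}% {matches_each:>8.1f}")
```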

So the next part of this generic process is to look at the cousins you’d expect from each generation – such as 2nd cousins from a Great Grandparent. And from the 5th column we see that 2nd cousins share an average of 229cM – with a range of 41 to 592cM. This means if we set a threshold of 200 or 300cM, and used only Matches over that threshold, they’d mostly be 2nd cousins. At AncestryDNA I have only 4 Matches over 200cM. That’s a start – look carefully at those 4. At a threshold of 100cM, I have only 28 Matches at AncestryDNA.
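Applying a threshold is just a filter and a sort. A minimal sketch, assuming your Match list has been exported as (name, shared cM) pairs; the Matches are hypothetical, the helper names are my own, and the 41 to 592cM range is the 2nd cousin range quoted above.

```python
SECOND_COUSIN_RANGE_CM = (41, 592)   # 2nd cousin range quoted above


def matches_over(matches, threshold_cm):
    """Return (name, cM) Matches at or above the threshold, largest first."""
    kept = [m for m in matches if m[1] >= threshold_cm]
    return sorted(kept, key=lambda m: m[1], reverse=True)


def in_range(cm, cm_range=SECOND_COUSIN_RANGE_CM):
    """True if a shared cM value falls inside the given range."""
    low, high = cm_range
    return low <= cm <= high


# Hypothetical Match list: (name, shared cM).
matches = [("Match A", 310.0), ("Match B", 96.0), ("Match C", 215.0),
           ("Match D", 22.0), ("Match E", 650.0)]

for name, cm in matches_over(matches, threshold_cm=200):
    print(name, cm, "within 2C range" if in_range(cm) else "outside 2C range")
```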

My point here is that we need to start with the Brick Wall generation, and go one generation back – to the parents of the brick wall. This is based on the premise that if a lot of folks had identified my Brick Wall Ancestor (BWA), I’d probably have found him or her by now. So there is something about my BWA that has blocked me for many years. I need to find the parents and work down. Very similar to finding a bio-parent.

So I look at my chart, above, and select one generation above my BWA, and note a range of cMs that should include appropriate cousins.

The next step is to group those DNA Matches – by segment Triangulation at most companies and by clustering at AncestryDNA. Depending on the threshold you choose, this grouping can be done by hand or through programs at Genetic Affairs, DNAGedcom Client or Shared Clusters (see my review of these programs, here).

It really helps to have a known close cousin in a Cluster or TG – to act as a pointer toward your BWA.  Otherwise, we need to analyze each one of the groups. As a Segmentologist, I’m all for analyzing each of my TGs, but I also pay particular attention to those that point in the direction of my BWAs.

The idea is to find enough Trees in a group to identify a Common Ancestor (CA) for that group. This CA is then very likely to be an Ancestor of the BWA. The final step is to find the descendants of the CA, keeping in mind the probable birth year and place of the BWA.
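Finding a CA for a group can start with a simple count of which Ancestors appear in the most Match Trees. A minimal sketch, assuming you have typed each Match’s Tree into a list of Ancestor couples (the names are hypothetical); it ranks couples by how many Trees include them and is only a starting point for the genealogy, not proof of a CA.

```python
from collections import Counter


def candidate_common_ancestors(match_trees, min_trees=2):
    """Rank Ancestor couples by how many Match Trees in a group include them."""
    counts = Counter()
    for ancestors in match_trees.values():
        counts.update(set(ancestors))      # count each couple once per Tree
    return [(couple, n) for couple, n in counts.most_common() if n >= min_trees]


# Hypothetical Match Trees for one Cluster or TG.
match_trees = {
    "Match 1": ["Couple P", "Couple Q"],
    "Match 2": ["Couple P", "Couple R"],
    "Match 3": ["Couple P", "Couple Q"],
    "Match 4": ["Couple S"],
}
print(candidate_common_ancestors(match_trees))   # [('Couple P', 3), ('Couple Q', 2)]
```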

This kind of process is often not easy or straightforward. But with enough Matches (test at all four companies and upload to GEDmatch), there should be enough data to significantly narrow down the search.

[19K] Segment-ology: Breaking Down Brick Walls by Jim Bartlett 20210203

Triangulating Your Genome

A How-To Example with MyHeritage

This blogpost will walk you through the steps to Triangulate your shared DNA segments at MyHeritage – creating a Triangulated Genome. You can use this same process at 23andMe and roughly the same process at GEDmatch and FTDNA. Disclaimer: This Process takes time (I estimate about 120 hours); but once you are done, you have a powerful tool.

The objective of Triangulating Your Genome is to determine 300-400 Triangulated Groups (TGs), adjacent to each other, covering your 44 to 46 Chromosomes from beginning to end. These TGs group almost all your shared DNA segments, cull out most of the smaller false “shared” segments, and identify recombination/crossover points where your DNA shifts from one Ancestor to another. Each TG will represent one segment of your DNA (equivalent to a phased segment), which came from a specific Ancestor, down a line of descent to one of your parents and then to you. This is segment-ology in a nutshell.

The Process

I will describe this process briefly and then in detail, using MyHeritage as an example.

Very brief version – group segments per the Triangulated Segment icon at MyHeritage (see details below). The Main Steps are basically the same for 23andMe, GEDmatch or FTDNA.

Overview – The Main Steps

1. Download the Segments from MyHeritage

2. Set up the Spreadsheet (headers, set a threshold to cull out small segments, add columns, etc.)

3. Group the Segments (the main, long, process – estimated time: 120 hours)

      a. Note any known Shared Match relatives as you go – they help later.

4. Summarize the groups (add a header row and ID# for each group, etc.)

      a. Assign partial TG-ID: Chr# plus A-Z (depending on Start location)

      b. NB: At this point you should have 300-400 groups. Each group represents one segment of your DNA from a specific Ancestor. This is very powerful information and is useful in its own right. Assignment, in the next step, makes each group even more powerful.

5. Assign groups into TGs – genealogy, logic and/or judgment required

      a. Assign each group to a maternal or paternal side.

      b. Use Genealogy; logic; “spanning” segments; TG “linkage“; ethnicity, etc.

      c. NB: Not all TGs may be determined.

6. Enjoy

In the following sections I will describe each of these 5 steps in more detail.

Section 1 – Download the Segments

Log into MyHeritage and click on the DNA tab.

A menu comes down – click on DNA Matches to get:

Select the kit you want to use – for this example I will use my father’s kit, which I had not Triangulated at MyHeritage before.

Here is my father’s page – note 14,225 DNA Matches. Click on the 3-dot ellipsis on the right.

From the menu select: Export shared DNA segment info for all DNA Matches. Note the little Triangulation icon.

You should get a pop-up with a note that the spreadsheet will be mailed to you in a few minutes:

Section 2 – Set Up the Spreadsheet

Here is, generally, what your spreadsheet will look like:

Notes:

1. Save this CSV file as a spreadsheet. I normally include a date in the title (e.g. JVB MH Segs 20201225). I tend to save various versions during this whole process – as backups.

2. The headers are: DNA Match ID [you’d only need this in case of 2 Matches with the same name]; Name [base person with all the Matches – usually you, but in this case my father, James Vincent Bartlett]; Match name [name of each Match – I reduced the width of this column to hide the names for privacy]; Chromosome; Start Location; End Location; Start RSID; End RSID [these RSIDs identify a SNP – useless info to me]; CentiMorgans; and SNPs.

3. Make a backup copy of this spreadsheet – just in case…

4. Delete the Start RSID and End RSID columns.

5. Move the DNA Match ID column to the far right side [out of your way].

6. Insert two blank columns on the left side; and 10 columns to the right of the SNP column.

7. Divide Start and End Locations by 1,000,000 – changing them from bp to Mbp. This will not change the accuracy of anything (all the digits will remain), but it will make it much easier to analyze and deal with these numbers. [My process for this: create two blank columns next to the Start and End columns – let’s say Start is column F; End is column G; the new columns are H and I; and the first row under the header is 2. Then in cell H2, enter: +F2/1000000. Then copy this cell (drag it) to column I. Check to ensure the numbers in H and I are the same numbers in F and G divided by 1,000,000. Now copy cells H2 and I2 to the bottom of the spreadsheet. Next, highlight all these numbers: H2 to the last number in column I. Copy that data (Ctrl-C). Move the cursor to F2 and right-click on it. From the menu, click on the Paste Special icon with 123 in it (that pastes the values, not the formulas). All of the data in columns F and G should now be in Mbp, and columns H and I can be deleted.]

8. Sort the spreadsheet by CentiMorgans.

9. Select a cM threshold, and remove [or physically separate to the bottom of the spreadsheet – out of your way] all segments under your threshold. In my case I used 10cM. This removed about 2/3 of the segments, leaving a group with larger cMs that I would be working with. A lower threshold of 6cM or 8cM will greatly increase the time for this process – not recommended. A higher threshold of 12cM or 15cM or even 20cM will shorten the time, but will leave more gaps. Your choice! By keeping these smaller segments separated at the bottom of the spreadsheet, they are still available in case you want to use some of them later. If you delete them, that’s OK – you can always retrieve them from a backup copy.

10. Type in Header titles*: Hd (for headers); Co (for Company); ID#; Par (for parent initial); Sib (for sibling initial); Ch (for children initial); Czn (for close cousin initial); TG (for Triangulated Group); G2 (for generation 2: parents – the “side”); CA/Remarks (for Common Ancestor or any other remarks you want to save). See the example below. *Each of these will be explained more fully, later.

11. Add in “header” rows for each Chromosome – see example below. This is just a visual separator that helps separate the segments by chromosome.

12. Now sort the “over-10cM” part of the spreadsheet by Chr and Start. Remember to highlight the full rows whenever sorting to keep the data in each row with the correct Match.

Resulting spreadsheet:

Notes:

1. I’ve inserted MH (for MyHeritage) in the Co column. This is not important if you will only use data from one company. Eventually I will add data from GEDmatch, 23andMe, and FTDNA, and I want to be able to sort by company from time to time.

2. There are 2 columns between ID# and Par – these are “extra” and sometimes come in handy when grouping – specifically when one group overlaps another group.

3. The Chromosome header row provides a visual demarcation between chromosomes.

This Chr/Start sort puts all overlapping segments adjacent to each other in the spreadsheet. NB: some of these will be on your paternal side; some will be on your maternal side; and a very few (usually under 15cM) will not Triangulate with either side, indicating they are probably false. Under the next section you will form groups, and each group will be on one side or the other. We cannot know, just by grouping, which side they are on; but we do know all the segments in a group will be on only one side.
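For readers comfortable with a little scripting, steps 4, 7, 9 and 12 above can also be done in Python rather than by hand. A minimal sketch, assuming the CSV headers match the ones listed in step 2 above (your export may differ slightly) and that only numbered autosomes are present; the sample rows, IDs and RSIDs are hypothetical. Dividing by 1,000,000 keeps all the digits, just as with the spreadsheet formula.

```python
import csv
import io

THRESHOLD_CM = 10.0   # the threshold used above; adjust to taste


def prepare_segments(csv_text, threshold_cm=THRESHOLD_CM):
    """Drop RSIDs, convert to Mbp, set aside small segments, sort by Chr/Start."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row.pop("Start RSID", None)                     # step 4: RSIDs not needed
        row.pop("End RSID", None)
        for col in ("Start Location", "End Location"):
            row[col] = int(row[col]) / 1_000_000        # step 7: bp -> Mbp
        row["CentiMorgans"] = float(row["CentiMorgans"])
        rows.append(row)

    kept = [r for r in rows if r["CentiMorgans"] >= threshold_cm]       # step 9
    set_aside = [r for r in rows if r["CentiMorgans"] < threshold_cm]
    kept.sort(key=lambda r: (int(r["Chromosome"]), r["Start Location"]))  # step 12
    return kept, set_aside


# Hypothetical export rows.
sample = """DNA Match ID,Name,Match name,Chromosome,Start Location,End Location,Start RSID,End RSID,CentiMorgans,SNPs
D-1,Base Person,Match A,2,1500000,9200000,rs1,rs2,12.4,4100
D-2,Base Person,Match B,1,752566,5600000,rs3,rs4,11.0,3000
D-3,Base Person,Match C,1,700000,3100000,rs5,rs6,6.2,1500
"""

kept, set_aside = prepare_segments(sample)
for r in kept:
    print(r["Chromosome"], r["Start Location"], r["End Location"], r["Match name"])
```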

Section 3 – Group the Segments

This part is hard work. It’s not difficult – the grouping process is fairly straightforward. But I cannot sugar coat it – it is tedious. I used a stop watch to time each chromosome, and I got better (more Mbps grouped per hour) as I progressed through the Chromosomes. In total it took me 120 hours – a table of hours is at the end of this section.

You can start anywhere. I started at the top of Chr 01 and just worked down (I knew I was going to do the whole thing). In hindsight, you might want to start with a medium or small Chromosome, to develop your “rhythm” – to get your “sea-legs” as we say in the Navy. The two ends of a chromosome are often the hardest – so you may want to just start in the middle of one. Later in this blog post I’ll include some other observations. But for now, just start.

Pick a Match at, or near, the top and put a one in the ID# column. This is the base Match for a while.

I picked “Don…”, the third Match down, because the segment started where the others did, but was a little longer. This segment was more likely to Triangulate with others further down the list, whereas the top Match, “vicki”, ending at 5.6Mbp, may not overlap enough with many of the others. It’s really a matter of efficiency. In the end, we are going to “touch” all of the above-threshold segments in the spreadsheet. As you’ll find, it’s easier to get set up with one base Match and “touch” as many as you can, before you have to regroup and start over with a new base Match. NB: Once a Match-segment is “touched”, and added to a group, it does not need to be selected again later. However, use any order that suits you.

So now “Don” is our base Match. Next we search/select Don at MyHeritage (we’ll use his full name, and check the cMs of the shared segment – just in case there is more than one Match with the same name; I found about a dozen such cases).

This is where two monitors really help. Put MyHeritage (from the internet) on one screen and your segment spreadsheet on the other screen. From your spreadsheet, copy the Match name cell for “Don…”, which has the full name exactly as MyHeritage has it. Then paste this into the search box at MyHeritage.

Notes:

1. At MyHeritage, my father’s name is circled top left – it’s his genome I am Triangulating.

2. Don’s name info is copied into the search box – “Don” is the base Match this time

3. Next: See Downward Arrow and Click on the name; or click on the magnifying glass in the Search box to get a list of Matches per the search criteria. Often the name you want will be the only Match listed. However, sometimes multiple options will be listed and you have to select the correct one (usually the amount of shared DNA will help). Match names like Thomas Jones may return hundreds of Matches – in that case, I punt – and go back to the spreadsheet and select a different Match as the base.

4. Once you’ve selected the correct Match, click on the colored link: Review DNA Match. This will call up a long page focused on that Match. Scroll down past Theories, Smart Matches, Shared ancestral surnames, Shared ancestral places, etc. until you get to the section called: Shared DNA Matches. See below:

Notes:

1. This is the top of the list of 168 Shared DNA Matches between my father and Don. The top two Matches are a close relation to Don and a close relation (me) to my father. Please note the “triangulated segment(s)” icon to the right of these two Matches. In this case Don shares two segments with my father, so I need to click on the icon to ensure each Shared DNA Match is sharing the DNA segment at the beginning of Chromosome 1 in the spreadsheet. Scroll down the Shared DNA Matches at MyHeritage, noting which ones share a triangulated segment on Chr 1, and note them with a digit in the ID# column:

Notes:

1. Don and 5 other Matches are shown here as sharing a Triangulated Segment with my father. Indeed, Don matches my father, and each of the Matches also match my father and Don. This is Triangulation in each case; and as a group, they form a Triangulated Group (TG).

2. At the bottom of the first page of Shared DNA Matches (each page has 10 such Matches), keep tapping on “Show more DNA Matches” until you get to the bottom of the list. The Shared Matches are roughly listed by total amount of shared cM, so by going to the end of the list you will pick up the smallest Shared Matches. I think this method is the most efficient in “touching” all the Match-segments.

I then picked “Virgi” as the next base Match; copied that name from the spreadsheet cell; pasted it into the search box at MyHeritage; selected the correct “Virgi” from the list of Matches that came up; clicked the “Virgi” Review DNA Match link; scrolled down the resulting pages; and reviewed the Shared DNA Matches between my father and Virgi who had a Triangulated Segment icon. I noted all of them with a “2” in the ID# cell.

Notes:

1. It’s easy to see that the group with ID# 1 and the group with ID#2 overlap the same area on Chr 1 – one of these groups is on my father’s paternal Chr and the other is on his maternal Chr. At this point we cannot tell which is which – more on that in Section 4.

2. I then tried Match “John” and drew a blank – “John” did not have any Shared DNA Matches who had a Triangulated Segment icon for Chr 1.

3. Next I tried “blai” as the base, and “blai” had some icons with Matches who already had a 2, and some new Matches. So we now have:

I now paused to re-sort this part of the spreadsheet – I sorted by ID# and Start. This moved “John” to the top (out of the way – as almost certainly a false segment); and clearly showed the ID#1 and ID#2 groups:

Notes:

1. The color coding in this figure is just to highlight the two groups. I do not do this in actual practice – I let the sorted ID#s speak for the groups.

2. Select the next, ungrouped, Match-segment, “Nanc”, and repeat the process.

3. For ID#, use the numbers 1 to 9 first, then the letters a to z – they will all sort in order.
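The ID# ordering works because of plain character order: the digits 1 to 9 come before the letters a to z. A minimal Python sketch checking that, plus the ID#-then-Start re-sort used above; the rows and the helper name are hypothetical, and in this sketch a row with no ID# yet sorts to the top.

```python
import string

# ID# characters in the order described: 1 to 9, then a to z.
GROUP_IDS = list("123456789") + list(string.ascii_lowercase)


def resort_by_group(rows):
    """Sort rows by ID# then Start; rows with no ID# yet ("") come first."""
    return sorted(rows, key=lambda r: (r["ID#"], r["Start"]))


# Check: with these characters, plain text sorting keeps the intended order.
print(sorted(GROUP_IDS) == GROUP_IDS, len(GROUP_IDS))   # True 35

# Hypothetical rows from one Chromosome (Start in Mbp).
rows = [
    {"Match": "M1", "ID#": "2", "Start": 0.9},
    {"Match": "M2", "ID#": "1", "Start": 0.7},
    {"Match": "M3", "ID#": "", "Start": 0.8},
    {"Match": "M4", "ID#": "1", "Start": 0.6},
]
print([r["Match"] for r in resort_by_group(rows)])   # ['M3', 'M4', 'M2', 'M1']
```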

Grouping Mantra: Select a base – group segments – repeat…

A quick summary of these steps:

1. Select the next base Match in your spreadsheet

2. Copy the Match name

3. Paste the Match name into the search box at MyHeritage – and execute the search

4. Select the Match from the ones listed and click on “Review DNA Match”

5. Scroll down the resulting page to Shared DNA Matches

6. Look for Matches with the Triangulated segments icon

7. Verify it’s the spreadsheet segment of interest (same cMs as in spreadsheet)

8. Type in the ID# character (1 to z) in the spreadsheet

9. Continue down the spreadsheet per steps 6-7-8, until the last Shared DNA Match

10. Repeat from #1.
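The mantra can be written as a loop. In this sketch the MyHeritage lookup itself stays manual: `tg_partners` is a hypothetical record of which Shared DNA Matches showed the Triangulated segment icon for each base, and the Match labels are hypothetical. A base whose partners already carry an ID# joins that group (as with “blai” above), a base with no partners is left ungrouped (as with “John”), and a base whose partners carry two different ID#s is skipped as muddled, to be revisited.

```python
import string

# 1 to 9, then a to z; this sketch supports up to 35 groups per Chromosome.
GROUP_IDS = "123456789" + string.ascii_lowercase


def group_segments(order, tg_partners):
    """Assign ID#s following the mantra: select a base, group segments, repeat.

    order       -- Match-segments in spreadsheet order (the base candidates)
    tg_partners -- for each base, the Shared DNA Matches showing a Triangulated
                   segment icon on this segment (read manually at MyHeritage)
    """
    ids = {}
    next_id = 0
    for base in order:
        if base in ids:
            continue                      # already "touched" and grouped
        partners = tg_partners.get(base, set())
        if not partners:
            continue                      # no TG partners: possibly false
        existing = {ids[p] for p in partners if p in ids}
        if len(existing) > 1:
            continue                      # muddled: skip for now
        if existing:
            group = existing.pop()        # join the group already started
        else:
            group = GROUP_IDS[next_id]    # start a new group
            next_id += 1
        for match in {base} | partners:
            ids.setdefault(match, group)
    return ids


# Hypothetical Triangulation results for one area of a Chromosome.
order = ["M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8"]
tg_partners = {
    "M1": {"M3", "M5"},
    "M2": {"M4"},
    "M6": {"M4", "M7"},
    "M8": set(),
}
print(group_segments(order, tg_partners))
```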

Notes:

1. Generally pick the next Match in the spreadsheet as the next base. However, if it is a common name, or if it’s a very small segment, or if that Match turns out to share several segments, feel free to select another Match, near the top of the list, as the next base.

            – this makes the process go faster

            – a skipped Match will often show up as Triangulated later in the process

2. At the end, you can always go back and pick up any “stragglers”. These “stragglers” will usually be smaller segments, which would usually have a very distant Common Ancestor – I’ve actually just abandoned some of these, and moved them to the bottom of the spreadsheet with the under-10cM segments.

3. Be careful with Matches with multiple shared segments – click on the TG icon to ensure the TG segment agrees with the area you are working on. TIP: Note the total cM of the Match (with a TG icon) at MyHeritage – if that is the same cM as the Match in your spreadsheet, all is OK – you’re dealing with only one segment and it has to be the one you want. Most Match-segments will be like this.

4. Some groups may be very small, and some may be very large. Many 10-15cM segments in a group may indicate a pile-up which is probably very distant – but at least you’ve grouped these segments to highlight that probability.

5. Don’t worry if the ends of some groups overlap the beginnings of other groups – the company algorithms cannot identify the crossover points precisely – these ends are fuzzy.

6. In my experience, MyHeritage Triangulation has never been incorrect. However, in some cases, segments that should Triangulate are not identified (a false negative), although those segments are almost always identified as Triangulating with other segments in the same group. I think imputation of SNPs is the culprit here. The main point is to put each Match-segment in a group (or cull it out as false).

7. In a few cases the picture got muddled. My best solution is to just skip over this area (these segments) and proceed further down the spreadsheet. Usually the bigger picture becomes clear, and you can return to the problem segments later.

8. Remember the objective of this exercise is to form 300-400 Triangulated Groups, spanning all of your Chromosomes. You do NOT have to adjudicate every shared segment to accomplish the overall goal. The 300-400 TGs are the prize – the important tools that will help you immensely.

Here is a summary of how long it took me to Triangulate each Chromosome:

Notes:

1. I rounded my total of 115.9 hours up to 120 as an estimate for most folks. Hopefully you will have the advantage of using this blogpost to be more efficient than I was.

2. Chromosome 1 was a bear! But I was also working out processes and methods…

3. As noted before, you might want to start with Chromosomes 13 to 22. It’s a good feeling when you reach the end of each Chromosome;>j

Other observations about forming groups:

1. The Chromosome tips (Start and End of each Chromosome) usually have short TGs, and they are sometimes hard to figure out. For a few of these Chromosomes, I actually shifted to the middle of the Chromosome in the spreadsheet, and worked both ways. It doesn’t matter much as every shared segment has to be in one overlapping group or the other (or be false).

2. When searching at MyHeritage for Matches who have the TG icon:

      a. If there are only a few Matches (say under 10) in your spreadsheet which could overlap, then just look down the Shared DNA Matches at MyHeritage for the Matches with the TG icon. These few Matches will be easy to find in the spreadsheet – type the ID#.

      b. If there are many Matches in the spreadsheet which could overlap, I usually highlight those spreadsheet rows and sort them on Name. This way I can start with the Shared DNA Match name at MyHeritage (with a TG icon) and easily and accurately look down the alphabetized names in the spreadsheet to find the Match name – then type the ID#. [This advice may be more clear after you’ve tried finding some MyHeritage Matches in your spreadsheet.]

      c. At MyHeritage, skip over the Matches which show a total cM less than 10cM (or your threshold) – you’ve moved those Matches out of your grouping spreadsheet.

      d. Some Matches with a TG icon at MyHeritage may show, say, 18cM, but they aren’t in your spreadsheet. Upon inspection in the browser, the 18cM actually is two segments and the one you’re working on is below the threshold. Aaarrrrg! False alarm. Skip over that Match and continue looking for TG icons.

      e. These are my tips – or discover your own “most-efficient” way to find TG Matches at MyHeritage and then find those Matches in your spreadsheet to type the ID#.

3. From time to time, re-sort the Chromosome you are working on. Within a Chromosome, you can sort on ID# and Start location (they sort from 1 to z). This keeps the groups together and quickly shows you the remaining segments to work on.

4. Long segments. Sometimes a long segment will stretch beyond the bulk of the other segments in a group. This usually indicates a closer cousin, with a shared segment that spans two (or more) groups. This is OK. Later, a segment in an adjacent group may also show a TG with the long segment. I tend to keep the groups separate, because they will probably come from different Common Ancestors down to you, through one closer Ancestor. The long segment is an important “tell tale”, indicating both groups are on the same side, and closely related.

5. Most “tell tales” will come from known close relatives. These are parents, children, siblings, avuncular, close cousins and Theories of Family Relativity (ToFR) that you agree with. Parents, avuncular, cousins and ToFR are particularly useful in identifying the side of a TG. Children and siblings tend to share long segments on one side or the other – usually through several TGs and sometimes over an entire Chromosome. Use the Par, Sib, Ch, and Czn columns in the spreadsheet and type in initials (or other identifier) for these known relatives as you find them as Shared DNA Matches with TG icons at MyHeritage (check the browser to ensure they share on the right segment). I note a Match ToFR in a spreadsheet column for Common Ancestor/remarks. These will help you later.

6. Even with a lower threshold of 10cM, some of the 10-15cM segments don’t always match others as they should. I note that MyHeritage uses imputation to add “most probable” SNPs, in order to get larger, more complete segments. But I recall that they don’t use these imputed SNPs when they declare TGs – so even though there is an overlap, sometimes MyHeritage won’t indicate a TG. As a result, not all overlapping segments in a group will show a MyHeritage TG. I got used to it. After all, every shared segment has to be from one parent or the other (or be false). So if a segment shows as a TG with several others, it’s probably true, even if it doesn’t match every segment it “should” match. If you get frustrated with a Match-segment in your spreadsheet, just code it ID# 0. You can ignore it now, or forever. It saves time to “skip over” problem segments.

7. MyHeritage appears to be throttling the Shared Match list after an hour, or so. The “Show more DNA Matches” sticks on the timer and doesn’t show the next list, and/or the next page just hangs up looking for TGs – in either case, you cannot find any more TGs. Options:

      a. Click on your browser refresh icon (top left).

      b. Use the back arrow, and try a different Match.

      c. Turn MyHeritage off; and then back on.

      d. Try again 8-12-24 hours later… I was always able to start fresh the next day.

      e. Maybe just be content with working on this project an hour a day.

8. Order is not a factor – you can literally work any area of your spreadsheet you want, in any order you want. But the objective is to cover the whole genome eventually.

9. Extra Credit – determine the number of segments: In the beginning, before sorting out the small segments from your spreadsheet, add a column called “Seg”. Then sort the entire spreadsheet by Match name (or DNA Match ID). Put a 1 (the number one) in the “Seg” column for the whole spreadsheet – most of your Matches, by far, will share only 1 DNA segment with you. Then scroll down the whole spreadsheet and change the 1 to the number of segments each Match shares with you. Yes – this is tedious; but this information (the number of segments a Match shares) is very helpful in selecting a base Match, or even in recognizing that a Match shares multiple segments. However, this is NOT a required step for the overall process.

10. Don’t get discouraged – this whole process gets familiar and easier as you gain experience. Take frequent breaks. The end result is worth it.
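If your segment list is also kept outside the spreadsheet, the “Seg” count from tip 9 is just the number of rows per Match, and a few lines of Python can fill it in. This is a minimal sketch, not part of the process above: it assumes each row is a hypothetical (match_id, chrom, start, end, cM) tuple, and that it is run on the full list before the small segments are dropped, as tip 9 advises. The sample rows are made up.

```python
from collections import Counter

# Hypothetical row layout: (match_id, chrom, start_mbp, end_mbp, cM)
Row = tuple[str, int, float, float, float]


def add_seg_column(rows: list[Row]) -> list[tuple]:
    """Append the number of segments each Match shares (the "Seg" column)."""
    # Count on the FULL list, before any cM threshold is applied;
    # otherwise a Match's small segments would be missing from the count.
    per_match = Counter(row[0] for row in rows)
    return [row + (per_match[row[0]],) for row in rows]


rows = [
    ("M1", 5, 10.2, 22.7, 18.4),
    ("M1", 12, 40.0, 47.5, 7.9),
    ("M2", 5, 11.0, 20.3, 15.1),
]
for row in add_seg_column(rows):
    print(row)  # the last value is the Seg count
```

Counting on the Match ID rather than the Match name avoids merging two different Matches who happen to display the same name.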

Section 4 – Summarize the Groups

When you are done grouping all the segments, take a well-earned break! The tedious part is over. This section is easy and a little fun… We are going to insert a header row for each group. We still don’t know which side (maternal or paternal) each group is on, but we do have the Match-segments in groups. I estimate you will end up with about 300-400 groups.

See the figure below:

In this case, I inserted a new row just above ID#9 in Chromosome 8. I like to use a highlight color. For these “group” rows, type TG in column A and TG in the Match name column; type the Chromosome number in the Chr column and the group’s ID# in the ID# column. In the Start Location column, type a number just a little smaller than the Start of the group’s first segment – I typed in 64.0 over 65.0 for ID#9 and 77.0 over 77.2 for ID#a. The idea is that I can sort the spreadsheet by Chr, ID# and Start and always come back to this list with all the Match-segments sorted in their respective groups.

Repeat for all 300-400 groups. NB: There will always be two groups that start at the beginning of each Chromosome – one for each side. For each of these group headers, use .001 for the Start location – this will sort before any Match-segments and after the Chromosome header (which Starts at 0.0).
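The header-row trick works because the header’s Start is always the smallest value in its group. A minimal Python sketch of the same idea follows, for anyone who keeps the segment list in code. The Row fields and the rule for choosing a header Start (round down to the whole Mbp below the first segment, never going below .001) are assumptions chosen to reproduce the 64.0 and 77.0 examples above; this is not a feature of any spreadsheet or company tool.

```python
import math
from dataclasses import dataclass


@dataclass
class Row:
    chrom: int
    group_id: str   # the ID# column, e.g. "9", "a", "b"
    start: float    # Start location in Mbp
    name: str       # Match name, or "TG" for a group header row


def header_start(first_start: float) -> float:
    """A Start just below the group's first segment (65.0 -> 64.0, 77.2 -> 77.0).

    Never below 0.001, so a group header sorts after the Chromosome header at 0.0.
    """
    below = float(math.floor(first_start))
    if below >= first_start:      # already a whole number: step down one more
        below = first_start - 1.0
    return max(0.001, below)


def add_group_headers(rows: list[Row]) -> list[Row]:
    """Insert one "TG" header row per (Chr, ID#) group, then sort by Chr, ID#, Start."""
    first: dict[tuple[int, str], float] = {}
    for r in rows:
        key = (r.chrom, r.group_id)
        first[key] = min(r.start, first.get(key, r.start))
    headers = [Row(c, g, header_start(s), "TG") for (c, g), s in first.items()]
    return sorted(rows + headers, key=lambda r: (r.chrom, r.group_id, r.start))


rows = [Row(8, "9", 65.0, "John"), Row(8, "9", 66.1, "Match B"), Row(8, "a", 77.2, "Match C")]
for r in add_group_headers(rows):
    print(r.chrom, r.group_id, r.start, r.name)
```

The ID# values sort as text, which matches the single-character “1 to z” IDs described in the tips; multi-character IDs would need a different sort key.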

Now we can also assign the first three characters of each group’s TG name:

      – The first two characters are the Chromosome number: 01 to 23

      – The third character is a letter indicating where the group starts:

            A for 0 to 9.9Mbp; B for 10.0 to 19.9Mbp; C for 20.0 to 29.9Mbp; etc.
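The naming rule is mechanical, so it can be written as a small Python function. This is a sketch that assumes the group Start is given in Mbp; it simply restates the rule above (two-digit Chromosome, then one letter per 10 Mbp bin). Since the longest chromosome (Chr 1) is about 249 Mbp, the letters never run past Y.

```python
def tg_prefix(chromosome: int, start_mbp: float) -> str:
    """First three characters of a TG name: two-digit Chr + 10 Mbp bin letter."""
    if not 1 <= chromosome <= 23:
        raise ValueError("chromosome must be 1-23")
    bin_index = int(start_mbp // 10)   # 0-9.9 -> 0 (A), 10.0-19.9 -> 1 (B), ...
    if not 0 <= bin_index < 26:
        raise ValueError("start must be between 0 and 259.9 Mbp")
    return f"{chromosome:02d}{chr(ord('A') + bin_index)}"


print(tg_prefix(8, 65.0))   # 08G
print(tg_prefix(8, 82.0))   # 08I
```

The two printed values agree with the names used in Section 5: group ID# 9 starts at 65.0 (TG 08G2) and group ID# b starts at 82 (TG 08I3).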

Here is an example:

In the next section we can start assigning the TGs to sides…

Section 5 – Assign Groups into TGs

This is the final section. After all of the tedium of grouping segments, we now come to assigning these groups to a maternal or paternal side. As mentioned before, each group represents part of your DNA – on one Chromosome or the other. Each group is DNA from your maternal side or your paternal side. The segment data, by itself, doesn’t provide a clue. We cannot assign a side without additional information. We need to use genealogy.

The main method is to use known relatives. If you’ve tested at least one parent, that is gold at this point. You can pretty quickly determine all the shared segments which include that parent, and the remainder would be with the other parent. I used that method to Triangulate my own genome – I have my father’s atDNA, which was a huge advantage. But now, I’m working on Triangulating my father’s genome – his parents, and their generation, are long since gone. So, in this case I have to rely on known relatives. And we can use logic in some cases – more on that later.

There are some workarounds… For instance, for someone who is half English or half Ashkenazi Jewish or half Scandinavian or half Asian or half African… we might be able to tell from the Match names (or Ancestors) in each group which side they were on. It’s a stretch, but perhaps one parent has New England ancestry, and the other parent has Colonial Virginia ancestry… We’d need to really dig into the Trees of many Matches in each group to tease this information out.

I like using Ahnentafel numbers, so I assign the number 2 for the paternal side, and the number 3 for the maternal side. Alternatively, use P for paternal and M for maternal.

In the following example, I have a close paternal cousin of my father – John. Wherever I found John, I could add 2 (for paternal side) to a column I called G2 (for the second generation, which are the parents).

I know that the ID# 9 group he is in must also be paternal (they all share the same Common Ancestor). I can add a 2 to the TG ID and assign that TG ID to each segment in that group:

All of the shared segments in group ID# 9 are now assigned to TG 08G2 and G2 = 2 (paternal side). This 2 on the paternal side is important because, later, I can separate all my segments into paternal and maternal sides, and very easily see how they are adjacent (or not) to each other. They should be adjacent to each other, and this kind of a sort will highlight any gaps.
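The step with John generalizes: any group that contains a known relative takes that relative’s side. Below is a minimal Python sketch of that bookkeeping. The group contents and every name except John are hypothetical, and flagging a group as None when relatives on both sides turn up in it is my own assumption about how to handle conflicts (such a group needs a second look, not an automatic answer).

```python
from typing import Optional


def sides_from_relatives(groups: dict[str, list[str]],
                         known_sides: dict[str, int]) -> dict[str, Optional[int]]:
    """Give each group the G2 side (2 = paternal, 3 = maternal) of the known relatives in it.

    Groups with no known relative are left out; groups whose relatives
    disagree get None so they can be reviewed by hand.
    """
    result: dict[str, Optional[int]] = {}
    for group_id, members in groups.items():
        sides = {known_sides[m] for m in members if m in known_sides}
        if len(sides) == 1:
            result[group_id] = sides.pop()
        elif len(sides) > 1:
            result[group_id] = None   # conflict: relatives on both sides
    return result


groups = {"9": ["John", "Match B", "Match C"], "a": ["Match D"], "c": ["John", "Pat"]}
print(sides_from_relatives(groups, {"John": 2, "Pat": 3}))
# {'9': 2, 'c': None}
```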

Next, we will use some logic. TG 08G2 ends in the 76-78 range (discounting close cousin John’s long segment ending at 90.6 – close cousins’ long segments often extend into the next TG). Group ID# a starts at 77.2, which looks like a good “fit” – a little better than the 82 to 96.9 range of Group ID# b. So, I’m going to conclude that ID# a is a continuation on side 2.

Continuing with this logic, group ID# b clearly overlaps ID# a (side 2) by a lot. So it must be on the other side – side 3. So we now have TG 08I3.
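The “fit” and “overlap” reasoning in the last two paragraphs can be sketched in Python. This is only an illustration under stated assumptions: a single best-fit rule (the group whose Start is closest to where the previous TG ends continues that side), and a 1.0 Mbp overlap tolerance that I picked arbitrarily. The end of group a (100.0) is a made-up value, because only its Start is given above. In practice you would also weigh the bulk of each group and ignore close cousins’ long segments, as described above.

```python
Interval = tuple[float, float]   # (start_mbp, end_mbp)


def overlap_mbp(a: Interval, b: Interval) -> float:
    """Length of the overlap between two intervals (0 if they don't overlap)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def assign_next(prev_end: float, prev_side: int,
                candidates: dict[str, Interval],
                tolerance: float = 1.0) -> dict[str, int]:
    """Continue prev_side with the best-fitting group; push heavy overlappers to the other side."""
    other = 3 if prev_side == 2 else 2
    # Best "fit": the group whose Start is closest to where the previous TG ends.
    best = min(candidates, key=lambda gid: abs(candidates[gid][0] - prev_end))
    sides = {best: prev_side}
    for gid, interval in candidates.items():
        # A group that overlaps the continuation by a lot cannot be on the same side.
        if gid != best and overlap_mbp(interval, candidates[best]) > tolerance:
            sides[gid] = other
    return sides


# TG 08G2 ends near 77.0; group a's end (100.0) is hypothetical.
print(assign_next(77.0, 2, {"a": (77.2, 100.0), "b": (82.0, 96.9)}))
# {'a': 2, 'b': 3}
```

Groups that neither fit nor overlap are simply left unassigned, which mirrors the manual process: they wait for a known relative, a ToFR, or another clue.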

We actually have several different ways to make these assignments:

1. Common Ancestors:

      – The best way to make assignments is with known close relationships which tell us which side a group is on. Parents, aunts and uncles, and close cousins are strong indicators.

      – MyHeritage also offers Theories of Family Relativity (ToFR). My experience with them is mixed – some are clearly correct (they agree with my 46 years of research); some are completely bogus. So, heed the correct ToFRs, and make assignments to a side accordingly. Ignore the bogus ToFRs.

      – Also use Common Ancestors with Matches which you work out on your own and are confident about.

2. TG “Fit”: Examine the rough Ends of TGs and see where they fit with the Starts of subsequent groups. All of these groups and TGs have to fit together somehow. None can significantly overlap any others on the same side.

      – However, allow close cousins with long shared segments to overlap.

3. Siblings and Children: Not illustrated here is the use of siblings and children. They tend to share very long segments on the same side – spanning several groups or TGs. So, also follow those patterns, which “tend” to stay on the same side from TG to TG.

4. Ethnicity and Geography: As previously discussed, you can also use ethnicity in some cases. You can even make a case for geography – if one side has some unique geographic roots that set it apart from the other side.

5. Other companies & Matches: I should also make the case for Matches from other companies. I’ve done most of my father’s genome, and now I’m reviewing his Matches and known Common Ancestors from FTDNA and GEDmatch and figuring out most of my father’s remaining TGs. The genome we work with (usually our own, but in this case my father’s) is fixed – the recombination crossovers are determined before birth, and don’t change. Although the companies may test a different set of SNPs, the 300 to 400 Triangulated Groups should stay the same. The start and end of each TG may be a little fuzzy, but the bulk of the TG still comes from a specific Ancestor, down a line of descent to one of your parents to you. Don’t worry about the fuzzy ends – think about the thousands of SNPs in a TG coming from an Ancestor.

6. Painting and Clustering: If you use DNA Painter or Clustering and have determined a side for some of your Matches, use that information.
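Rule 2 (“None can significantly overlap any others on the same side”) and the earlier point about sorting on the G2 column to highlight gaps can be checked together. A minimal Python sketch: it sorts TGs by side, Chromosome and Start, then flags large gaps or overlaps between neighbouring TGs on the same side. The 2 Mbp thresholds are arbitrary assumptions, the TG names and coordinates are hypothetical, and the Start/End values are meant to come from the bulk of each group rather than a close cousin’s long segment. It compares only neighbours, so it is a quick screen, not a full audit.

```python
from typing import NamedTuple


class TG(NamedTuple):
    name: str
    side: int      # G2 column: 2 = paternal, 3 = maternal
    chrom: int
    start: float   # Mbp, from the bulk of the segments
    end: float     # Mbp


def check_side_fit(tgs: list[TG], max_gap: float = 2.0,
                   max_overlap: float = 2.0) -> list[str]:
    """Flag big gaps or big overlaps between neighbouring TGs on the same side."""
    problems = []
    ordered = sorted(tgs, key=lambda t: (t.side, t.chrom, t.start))
    for prev, cur in zip(ordered, ordered[1:]):
        if (prev.side, prev.chrom) != (cur.side, cur.chrom):
            continue  # only compare TGs on the same side of the same Chromosome
        delta = cur.start - prev.end   # > 0 is a gap, < 0 is an overlap
        if delta > max_gap:
            problems.append(f"gap of {delta:.1f} Mbp: {prev.name} -> {cur.name}")
        elif -delta > max_overlap:
            problems.append(f"overlap of {-delta:.1f} Mbp: {prev.name} -> {cur.name}")
    return problems


tgs = [
    TG("08G2", 2, 8, 64.0, 77.0),
    TG("08H2", 2, 8, 77.2, 100.0),
    TG("08I3", 3, 8, 82.0, 96.9),
    TG("08J2", 2, 8, 95.0, 120.0),
]
print(check_side_fit(tgs))
# ['overlap of 5.0 Mbp: 08H2 -> 08J2']
```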

Other observations:

1. Double up on all the Chromosome headers – there needs to be two for each Chromosome – one paternal (put a 2 in the G2 column) and one maternal (put a 3 in the G2 column).

2. Something I thought about, but haven’t tried yet: Set the threshold at 20cM and Triangulate those segments fairly quickly (you would skip over a lot of Matches with TG icons who share less than 20cM). Then re-sort the full spreadsheet and set the threshold at 15cM – then re-sort the 15cM+ spreadsheet by Chr and Start. This has all the segments in order, but the paternal and maternal sides are commingled with the new 15-19.9cM segments. These new segments should be easy to Triangulate with the segments already identified with TG IDs. I predict some of the larger TGs will be subdivided in this process. Then drop the threshold to 10cM and repeat. That would take 3 passes through the whole genome, but it may go faster than starting with a 10cM threshold. Maybe I’ll try my brother’s MyHeritage kit that way…
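The untried multi-pass idea in observation 2 amounts to splitting the segment list into bands by cM and working each band in Chr/Start order. A minimal Python sketch of that split, assuming a hypothetical (match_name, chrom, start_mbp, cM) layout and the 20/15/10cM thresholds mentioned above; the sample segments are made up.

```python
# Hypothetical segment layout: (match_name, chrom, start_mbp, cM)
Segment = tuple[str, int, float, float]


def threshold_passes(segments: list[Segment],
                     thresholds: tuple[float, ...] = (20.0, 15.0, 10.0)) -> list[list[Segment]]:
    """Split segments into passes: 20cM+, then 15-19.9cM, then 10-14.9cM.

    Each pass holds only the segments that are new at that threshold,
    sorted by Chr and Start so they land next to the TGs already built.
    Segments below the lowest threshold are left out.
    """
    passes = []
    upper = float("inf")
    for t in sorted(thresholds, reverse=True):
        new = [s for s in segments if t <= s[3] < upper]
        passes.append(sorted(new, key=lambda s: (s[1], s[2])))
        upper = t
    return passes


segs = [("A", 1, 5.0, 25.0), ("B", 1, 2.0, 18.0), ("C", 2, 1.0, 12.0),
        ("D", 1, 3.0, 8.0), ("E", 1, 1.0, 16.0)]
for n, p in enumerate(threshold_passes(segs), start=1):
    print(n, [s[0] for s in p])
```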

Section 6 – Enjoy

This is a MAJOR MILESTONE in Genetic Genealogy. Now that you’ve put in the work to create 300-400 TGs that cover your genome, enjoy the benefits of this great tool!

1. All new shared segments (from any company) should quickly and easily fit into an existing TG. You have the framework for organizing all your Match-segment data.

2. Each TG comes from a Common Ancestor – even ones behind a brick wall… Review the blogpost: Let the Matches Tell Us the Cluster Common Ancestor here.  TGs, just like Clusters, are groups focused on a Common Ancestor, and the Matches in a TG may be able to tell you who the Common Ancestor is.

3. When you know which line a TG is from, tell your Matches! Focus on your Matches’ Trees to find that line. With that focus, I have found a collateral line in a Match’s Tree and built it back to our Common Ancestor – many times.

4. Walk the Ancestor Back within each TG. See here.

5. Pile-on (not pile-up). In several TGs I’ve gone back to the Matches asking if they had a particular Ancestor in their Tree. Even when they had no Tree, they sometimes confirmed that Ancestor I was focused on. I’ve collected dozens of Matches who all share the same Common Ancestor with me. I’ve done this with a growing number of different TGs. This is very strong evidence that the DNA segment (represented by the TG) is from that Common Ancestor.

6. Play with these TGs. You can easily use DNA Painter to paint the TGs! All the Matches in a TG could also be painted if you want. Think about Painting virtually every one of your Matches…

7. You can easily use the G2 column to sort your TGs into a paternal side and a maternal side.

8. As mentioned before, add the data from other companies. Consolidate all your Matches and segments into one comprehensive spreadsheet.
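Points 1 and 7 can be sketched together in Python: given a new shared segment, list the existing TGs it overlaps, with their G2 side. Position alone usually leaves two candidates, one paternal and one maternal, so Triangulation or Shared Matches with a Match already in one of those TGs still has to decide which. The TG names, coordinates and the 1.0 Mbp minimum overlap are hypothetical assumptions for illustration only.

```python
from typing import NamedTuple


class TG(NamedTuple):
    name: str
    side: int      # G2 column: 2 = paternal, 3 = maternal
    chrom: int
    start: float   # Mbp
    end: float     # Mbp


def candidate_tgs(chrom: int, start: float, end: float, tgs: list[TG],
                  min_overlap: float = 1.0) -> list[tuple[str, int, float]]:
    """TGs that a new segment overlaps, largest overlap first."""
    hits = []
    for tg in tgs:
        if tg.chrom != chrom:
            continue
        overlap = min(end, tg.end) - max(start, tg.start)
        if overlap >= min_overlap:
            hits.append((tg.name, tg.side, round(overlap, 1)))
    return sorted(hits, key=lambda h: h[2], reverse=True)


tgs = [
    TG("08H2", 2, 8, 77.2, 100.0),
    TG("08I3", 3, 8, 82.0, 96.9),
    TG("09A2", 2, 9, 0.001, 12.0),
]
print(candidate_tgs(8, 84.0, 99.0, tgs))
# [('08H2', 2, 15.0), ('08I3', 3, 12.9)]
```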

The End of Triangulating Your Genome

I’ve enjoyed putting this blogpost together, and sharing some of my concepts on Triangulation. If you try this, please post some comments about your experience and/or any suggestions for improvement.

[10A] Segment-ology: Triangulating Your Genome; by Jim Bartlett 20201229

Find AncestryDNA Matches with Common Ancestors

A Segment-ology TIDBIT

Disclaimer 1 – This is a search methodology, and may not zero in on your Ancestor 100% of the time.

Disclaimer 2 – This process works down to the lowest cM and any generation. There is NO guarantee that the DNA segments are linked to any Common Ancestor found, or that the Common Ancestors found are accurate. This is just a clue.

This process uses AncestryDNA’s search function to find DNA Matches with Trees that have your specified surname/location in them. From my perspective this is a pretty powerful search technique. The results are a list of your DNA Matches (so you share some DNA with each one); and each Match has a Tree large enough to include the surname and birth place combination you specify. I’ve found that a high percentage of these results lead to my Ancestors. We must still verify both our path and the Match’s path back to the CA. And, if we want to use the DNA segment, we must still verify that that segment is from that CA.

Process:

  1. At AncestryDNA, click on DNA (in the upper tool bar) and DNA Matches (from the drop down menu) – the result is the entire list of your DNA Matches.
  2. Click on Search (on the right side; just above your Match list) – this brings up 3 search boxes.
  3. Type a Surname in the “Surname in Matches’ trees” box; AND type a location in the “Birth location in Matches’ trees” box. Ex: CHEATHAM and Henrico County, Virginia, USA [This is one of my 7xGreat grandparents – I should get mostly 8C Matches with this search]. Use the “Include similar surnames” check box per your judgment. It’s best to use the Ancestry standards for the location, which will usually come up as a suggestion as you type.
  4. Click on the green Search button – the resulting list will be your DNA Matches who meet the criteria.
  5. Click on any Match to get their page (compared to you). You can investigate your target surname from the Shared Surnames list OR by clicking on their Linked (or Unlinked) Tree and search for that Surname.

Like all search processes, the key is finding the right combination of search terms. Clearly searching on JONES in Virginia, USA would not be helpful. My best suggestion is to search on a County, State combination.

I started with a list of my 7xGreat grandparents with their birth places. Most of these surname/birthplace combinations give me a very useful list, which often includes Matches with closer CAs (same surname and county). I can easily skip over all the ones I’ve already found and “Starred”, “Dotted”, and “Noted” – but many are new to me (ThruLines does not include any Matches beyond 6C). Also, many of the Matches with these CAs will share small DNA segments – but they may be true genealogy cousins anyway.

If you get into a “groove” with this process, try using the married surnames of your Ancestor’s daughters (with the appropriate birth location). You’ll find even more Matches who had not taken their Tree back far enough…

This process ties into Triangulated Groups and/or (manual) Clustering, in that it finds more CAs to add to your Notes. You can then click on Shared Matches to see if this information would influence a Cluster or TG. By “influence” I mean that it could reinforce existing information seen in the Shared Matches, it could add evidence to extend an existing CA or Ancestral line, or it could contradict existing information resulting in a review of that TG or Cluster.

Also, if you are trying to “Dot” some of your 6-7cM Matches, this process will focus on some key Matches. When your Match list (for a surname/location) comes up, just scroll down and work up from the bottom until you’re into the 8cM Matches…

More Common Ancestors are good! They help validate the genealogy and add clues for Triangulated Groups and/or Clusters.

[AV] Segment-ology: Find AncestryDNA Matches with Common Ancestors TIDBIT by Jim Bartlett 20200808

Use Clusters!

Clusters form on a Common Ancestor (CA). We don’t have proof of this, but a) it makes sense (why else would our Matches match each other in a Cluster?); and b) it sure seems to work (I’ve found many new CAs with Matches, just by focusing on the CA in a Cluster).

So, with this concept in mind, let’s use our Clusters!

  1. Known CA – If you *know*, or even suspect, the CA of a Cluster, search other Matches in that Cluster for that CA or location or a collateral line. If a Cluster Match has a good Tree, there’s a good chance you’ll find the CA in their Tree. There’s a good chance multiple Matches in a Cluster will all have the same CA. Armed with a known CA, I’ve often been able to build out a Match’s Tree to that CA.
  2. Unknown CA – If you don’t have a clue to the Cluster CA, find the most likely CA among the Matches – whether you have that surname or not. Let the Matches tell you the Cluster CA – per this blogpost. This is also effective for Brick Walls and unknown parentage.
  3. Suspect CA – If some researchers on the internet propose an Ancestor for one of your lines without proof, or if you are suspicious of their “proof”, test out that Ancestor. Look for that surname among the Matches in appropriate Clusters. “Appropriate” means these Clusters are probably on that line. Try the Unknown CA process and see if this same surname comes up. Clearly, if many people have bought into this Suspect CA, this process won’t work (however, using this process with the Suspect CA’s mother’s surname may then be helpful). Example: During 40 years of research on my NEWLON line, many had heard the claim by one researcher that a spouse was “Martha JANNEY”, but without proof, few used that information. So I decided to test it. Virtually none of my Cluster Matches had the JANNEY surname; but many had the CUMMIN/GS surname. In fact, searching all of my DNA Matches (over 125,000 of them) turned up 17 Matches (down to 6cM) with the JANNEY surname in Loudoun Co, VA – none in any of my “appropriate” Clusters.

Bottom line: Use the concept that Clusters form on a CA. Use it to find CAs with more Matches; Use it to break through Brick Walls or explore Clusters without a CA. Use it to *test* likely or suspicious surnames in selected Clusters – if the CA is correct, it should show up in multiple Matches in a Cluster.

[19J] Segment-ology: Use Clusters! by Jim Bartlett 20200705

Easy Manual Clustering at AncestryDNA

Auto-Clustering at AncestryDNA is in a pause mode now. But we can still look at and analyze our own Matches any way we want. We can even form our own Clusters. Here is a modest process that may produce Clusters that are very helpful to us. AncestryDNA does not provide segment information that would allow grouping by Triangulated Groups, so Clustering is the best way to group Matches. And there are several advantages to using Clusters.

Manual Clustering Process at AncestryDNA

  1. Start with your ThruLines. These are Matches who share a Common Ancestor (usually a couple) with us. The ThruLines process looks for obvious CAs; it looks in Private (but searchable) Trees for CAs; it sometimes “fills in the blanks” with information from other (even non-DNA Match) Trees to create a link between you and a DNA Match back to a CA. This “fill in the blanks” process may be in your Tree or your Match’s Tree or both. The ThruLines process works out to 5xG grandparents on both sides – if either side is more than 7 generations back, it will not be reported. In any case, you should review the information provided by Ancestry and decide if the ThruLines CA is correct, or not.
  2. Enter the CA information in your Match’s Note box [see “Add note”] – I use a combination of Ahnentafel Number/side; relationship; and surnames. Example: A0140P/6C: WELCH/SPENCE – the Match and I are 6C, sharing ancestors Sylvester WELCH Jr and Anne SPENCE; Sylvester WELCH is my Ahnentafel Number 140 on my Paternal side. I like using Ahnentafel numbers as they are easy to compare and determine relationships. Just divide by 2 (dropping any remainder) to get 140>70>35>17>8>4>2>1 (me), so A0008P/2C: BARTLETT/NEWLON is on this same ancestral line. Do this for all your valid or suspected* ThruLines CAs. NB: Some Matches will share more than one CA with you – enter them both. [* I include suspected ThruLines CAs – if they are incorrect, they almost never Cluster and can thus be culled out.] Anyway – use whatever system works for you, just enter something in the Note box. You’ll be looking at these Notes of Shared Matches to form Clusters, and you want to know who shares the same CA.
  3. After going through one or all of the ThruLines CAs, call up one of these Matches and review their Shared Matches. I count the number of SMs and the number on the same ancestral line, and record this in the Note box. Example: SM: 17/25xA0140P. This means that out of 25 total Shared Matches, 17 of them had a Note indicating A0140P CA. NB: a Shared Match with a Note indicating A0070P would be included. A SM with A0034P would also be included because 34P is really a short cut for 34P/35P, and 35P is in the same ancestral line. Likewise, 8P is in the same ancestral line and would be counted as also having A0140P ancestry. Repeat for all ThruLines Matches. It doesn’t take that long for such a powerful tool as Clustering.
  4. Use judgement to decide who is in a Cluster. In some cases, it’s crystal clear – virtually every Shared Match has an “SM: note” with the same CA. Other cases are not so clear, so you need to decide if there is sufficient evidence to include a Match in a Cluster. In some cases, a Match with a ThruLines CA will actually have several Shared Matches with a “different CA” – the Clustering process dictates such a Match be Clustered with the Matches with a “different CA”. And I would certainly review that Match again to see if there isn’t some clue that indicates the “different CA” is in their tree, too.
  5. Cluster ID – you can use any system you want to name your Clusters. One way is CL001 to CL200. Another way is to use the CA – Example: CL0140P1. This is the Ahnentafel Number preceded by CL. NB: I added a 1 at the end because some of your Ancestors may be linked to more than one Cluster. [I have Ancestor A0556M, a 7xG grandparent couple, who are in three large Clusters.] Add this Cluster ID to the Match notes. Example A0170P-CL047/6C: WELCH/SPENCE. Or use whatever system you want.
  6. Once you have determined Clusters based on ThruLines Matches and CAs, you can go back and look at a Match in a Cluster and look at his/her Shared Matches who aren’t in a Cluster. Do some have several Shared Matches themselves who are in a Cluster? If so, add these Shared Matches to the Cluster. NB: You can also look at Matches under 20cM – many of them have Shared Matches. If several Shared Matches are in one particular Cluster, add the under 20cM Match to the Cluster.
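The “same ancestral line” test used for the SM counts is pure Ahnentafel arithmetic, so it can be sketched in Python. Assumptions, all mine rather than Ancestry’s: Notes start with a code like A0140P; a note number stands for the couple (34 means 34/35, as described above); and an ancestor of the target (for example 280, a parent of 140) also counts as on the same line. The regular expression and function names are hypothetical, and the sample Notes other than the two quoted above are made up.

```python
import re

NOTE_RE = re.compile(r"A(\d{4})[PM]")   # e.g. "A0140P/6C: WELCH/SPENCE"


def couple(n: int) -> set[int]:
    """A note number stands for the couple: 34 means 34/35."""
    father = n - n % 2
    return {father, father + 1}


def is_ancestor_or_self(anc: int, desc: int) -> bool:
    """Halving (dropping remainders) walks down the line: 140 > 70 > 35 > 17 ..."""
    while anc > desc:
        anc //= 2
    return anc == desc


def same_line(a: int, b: int) -> bool:
    """True if either couple sits on the direct line of the other."""
    return any(is_ancestor_or_self(x, y) or is_ancestor_or_self(y, x)
               for x in couple(a) for y in couple(b))


def side_letter(n: int) -> str:
    """P if the line runs through 2 (father), M if through 3 (mother)."""
    while n > 3:
        n //= 2
    return "P" if n == 2 else "M"


def sm_note(target: int, shared_match_notes: list[str]) -> str:
    """Build an "SM: x/yxA0140P" note from the Notes of a Match's Shared Matches."""
    on_line = sum(
        any(same_line(target, int(num)) for num in NOTE_RE.findall(note))
        for note in shared_match_notes
    )
    return f"SM: {on_line}/{len(shared_match_notes)}xA{target:04d}{side_letter(target)}"


notes = ["A0140P/6C: WELCH/SPENCE", "A0034P/4C", "A0036P/4C", "", "A0008P/2C: BARTLETT/NEWLON"]
print(sm_note(140, notes))   # SM: 3/5xA0140P
```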

Clusters are one of the best tools I’ve found for grouping AncestryDNA Matches and finding more CAs.

[19I] Segment-ology: Easy Manual Clustering at AncestryDNA by Jim Bartlett 20200701

Let the Matches Tell Us the Cluster Common Ancestor

Using a 20cM threshold at AncestryDNA, I got 156 Clusters. That’s roughly one Cluster for each of my 128 5xG grandparents – or two Clusters per 5xG grandparent couple – often with valuable Common Ancestor (CA) hints from ThruLines. I don’t know 50 of my 128 5xG grandparents (they are brick walled) – so I would expect (50×156/128=) 61 of my 156 Clusters to be blank. What’s a body to do?

Well… in the first place the above calculation is based on finding a CA at the 5xG grandparent level. ThruLines provides clues for all the Ancestors I know – but, clearly, they cannot help with Clusters (or TGs) beyond a brick wall. For almost all of the Clusters, I know the parent; and for roughly 80% I know the grandparent; and for many I know the CA out to the brick wall. So I’ve got a start. But, for many of my Clusters, there is very little otherwise to go on – just a lot of Matches in a Cluster. What’s a body to do?

As I’ve said before, let’s think about lemonade…  In my last post (Using a Group Ancestor), I noted that grouping (segment Triangulation and Shared Match Clustering) results in a group of Matches with the same Common Ancestor (CA). This is the concept, even if we don’t have any clue as to who the CA is. But let’s make “the certainty that there is a CA” work for us… Let’s have the Matches tell us who the CA is for a Cluster. Seems like lemonade to me.

Here is a process for AncestryDNA: [I hope you’ve saved your last Cluster report]

  1. Select a large Cluster for which you have no known CAs (or only a few which are in conflict with each other).
  2. Make a spreadsheet with three columns: Match Name and Surnames and Notes.
  3. Select a Match in the Cluster who has a Tree with more than 99 people.
  4. Type the Match name in the spreadsheet.
  5. Go to that Match in AncestryDNA (either from the URL in the Cluster; or by searching AncestryDNA).
  6. Type the surnames for that Match (both Shared Surnames & Match’s Tree Only) in Surname column.
  7. Copy the Match name down the spreadsheet for each surname.
  8. Repeat for each Match in the Cluster with a Tree over 99 people.
  9. Sort the spreadsheet on the Surname column.
  10. Scroll down the list and highlight likely Surname groups [it would be great to find a clear winner – repeated multiple times. If not pick the top few surnames].
  11. Go back to the Matches with most likely surname(s) and put in the Notes column the Patriarch or any other identifying information (birth, location, ethnicity, etc). The expectation (hope) is that you’ll find a Common Ancestor or two in this process.
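Steps 6 through 10 boil down to counting how many Matches in the Cluster list each surname. The Python sketch below replaces the sort-and-scan with a direct tally. It is an alternative to, not a copy of, the spreadsheet method, and assumes the surnames have already been collected per Match (step 6). Surnames are upper-cased so ADAMS, Adams and adams count together. The Match names and surnames in the example are made up.

```python
from collections import Counter


def rank_surnames(match_surnames: dict[str, list[str]],
                  min_matches: int = 2) -> list[tuple[str, int]]:
    """Count how many different Matches list each surname; most frequent first."""
    counts = Counter()
    for surnames in match_surnames.values():
        # A set, so a surname repeated in one Tree counts once for that Match.
        counts.update({s.strip().upper() for s in surnames if s.strip()})
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, n) for name, n in ranked if n >= min_matches]


cluster = {
    "Match 1": ["Brown", "Green", "Smith"],
    "Match 2": ["BROWN", "White"],
    "Match 3": ["brown", "WHITE", "Green"],
    "Match 4": ["Jones"],
}
print(rank_surnames(cluster))
# [('BROWN', 3), ('GREEN', 2), ('WHITE', 2)]
```

Counting distinct Matches rather than rows keeps one large Tree from outvoting the rest of the Cluster, which fits the goal of treating the Matches’ data evenly.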

I can almost hear the collective groan at step #6. Yes, it’s an onerous task. I sat down with a favorite beverage and typed non-stop the 660 surnames for Matches in one Cluster; 750 in another Cluster. But, think about this another way: would you spend a half-day of work to find a new Ancestor? That would be a nice glass of lemonade.

In my first Cluster try, I found three Surnames (ADAMS, CAUDILL and CRAFT) repeated several times. A quick and dirty Tree quickly determined that John ADAMS married Nancy CAUDILL in 1769 in Loudoun Co, VA; and their daughter, Elizabeth, married Archelous CRAFT – and 5 of my Matches in the Cluster descended from these two couples!! I already had some clues that this Cluster was on my father’s father’s side. This includes my NEWLON line, which had a brick wall born c1774 Loudoun Co, VA, whom I determined to be Susan CUMMINGS – blogpost here. Her father is strongly suspected to be John CUMMINGS born c1746, but nothing is known about John’s first wife, the mother of Susan CUMMINGS and my Ancestor – a new brick wall. If John’s first wife was an ADAMS, all of this would fall into place as a hypothesis.

By the nature of Shared Match Clustering, this Cluster must have a CA. With five widely separated Matches agreeing on the same CA (and no other surnames turning out any hints at all), I think this is a strong clue. But, more research is needed.

The other Cluster had several repeated surnames, but none that I have been able to link together, yet. I may drop down and look at the surnames of Matches with Trees in the 50-99 people range… maybe another hour of typing… If I find a clue it will all be worthwhile.

Bottom Line: A Cluster (or a TG) has a CA. The Matches in a Cluster should all share this CA. Let the Matches Tell Us the Cluster Common Ancestor.  The process above is one way to do this. A particular advantage to me is that this process is comprehensive, and with no bias – the data from the Matches is treated evenly.

Post Script: By its nature, genealogy is an ego-centric hobby. We tend to focus on ourselves as the center of the universe. Or, if we are professionals, we treat the Client as the center of the universe. Everything revolves around our Ancestors and what we can find out about them. But each of us is a small part of the human race, and our Matches – our cousins – are part of this larger picture. They fit in, too. They are an interlocking part of the whole jigsaw puzzle, and in some (many?) cases, some of them know more than I do. The process above draws on the data they have provided. Often, they have clues to the solutions we seek. Often, they know what’s on the other side of our brick walls.

Edit 6/22/20: I’ve been asked to add a photo of my spreadsheet. Here it is – showing the top two surnames.

Spreadsheet of Cluster Common Ancestors

The 3rd column is Match Names and it has been narrowed for Match privacy. When I started, I had columns for Company and Where (the name of the Cluster run – 20cMCL63: Cluster 63 of the Shared Match run using a 20cM threshold), but it turns out this is a Quick and Dirty spreadsheet, and I didn’t need those columns. The objective is to get started on a Quick and Dirty Tree, and work from there. As soon as I saw the last line – a CRAFT married to an ADAMS – I started the Q&D Tree and found the five Matches who all tied together. Since then, I’ve used the previous blogpost on Searching and have found over a dozen more Matches who descend from this same line. All of the Cluster Matches were over 20cM. However, now knowing what I’m looking for, the Search process let me drop below 20cM and find many more – and most of them have above-20cM Shared Matches from the same Cluster. This is added evidence that I tie into this line somehow.

[19H] Segment-ology: Let the Matches Tell Us the Cluster Common Ancestor by Jim Bartlett 20200620

Using a Group Common Ancestor

A Triangulation (and grouping) Concept

We have spent a lot of time and effort to describe *how* to group our Matches: segment Triangulation, DNA Painting, Shared Match Clustering. Each of these processes results in a group of Matches that should have a Common Ancestor (CA). This is an important concept.

But the main thing is to *use* this concept – to use the information found in these groups. If a group is formed around a CA, then all of the Matches in the group should share a CA. Once a CA is found, each Match in the group should also have that group CA, or be a closer cousin with an MRCA that descends from the group CA, or have a more distant MRCA which is ancestral to the group CA. In other words, all the Matches in a group should have the same distant CA.

So… if we find a CA for a group, the other Matches in the group should have the same CA line. This is a powerful focus – let’s *use* it. We should be able to look at other Matches in the group (who have Trees) and find that CA – either directly through a search, or indirectly by building out their Tree.

I illustrated this in Case 3 of Chapter 1 (Lessons Learned from Triangulating a Genome) of “Advanced Genetic Genealogy: Techniques and Case Studies” – here or here. This was all about one of my TGs which I call [04P36]. At Ancestry, I found a few cousins (who had uploaded to GEDmatch) in that TG who shared my HIGGINBOTHAM ancestry. Armed with that hint, I searched for HIGGINBOTHAMs in other Matches (in that TG) who had trees. I also contacted Matches from FTDNA, 23andMe and MyHeritage – and several replied that they had the same HIGGINBOTHAM Ancestry. In the end I found 14 different Matches ranging from 4C to 8C on this HIGGINBOTHAM line in TG [04P36].

Because TG [04P36] came down a line of descent with the HIGGINBOTHAM surname in 5 generations, this case was an easier example – searching for one distinct surname. If a group represents a CA with a male-female zig-zag line of descent to me, it will be harder – the surname will change often. However, each line of descent (from a given Ancestor) is fixed – and we may find Match cousins with MRCAs of different surnames, but they will all be on the same ancestral line. This is akin to “Genealogy Triangulation” – getting an alignment of multiple cousins on one line.

Finding one Match with a CA in a group is not the end of the story – it’s a clue to the beginning of more research. If we find a CA for a group, but no other Match seems to have that CA, maybe we need to look for a different CA. The “correct” CA for each group should lead to Genealogy Triangulation – agreement by other Matches on the same ancestral line. If you find a CA in a group, *use* it to find more Matches on that same line. Seek CA agreement among Matches in each group.
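Seeking CA agreement within a group is essentially a tally. This Python snippet is a hypothetical illustration rather than a method from this blog: it assumes each Match's Tree has already been reduced to a set of Ancestor labels (the labels and sample data are invented). It counts how many Matches in the group support each candidate CA and ranks the candidates by that support.

```python
from typing import Dict, List, Set

# Hypothetical input: for each Match in one group, the set of Ancestors
# found in (or built out from) that Match's Tree.
Trees = Dict[str, Set[str]]


def ca_support(candidate: str, trees: Trees) -> List[str]:
    """Matches whose Trees contain the candidate Common Ancestor."""
    return sorted(m for m, ancestors in trees.items() if candidate in ancestors)


def rank_candidates(candidates: List[str], trees: Trees) -> List[str]:
    """Order candidate CAs from most to least agreement among the group's Matches."""
    return sorted(candidates, key=lambda c: len(ca_support(c, trees)), reverse=True)


trees = {
    "Match 1": {"Couple A", "Couple C"},
    "Match 2": {"Couple A"},
    "Match 3": {"Couple A", "Couple B"},
    "Match 4": set(),  # no usable Tree yet
}
for ca in rank_candidates(["Couple B", "Couple A"], trees):
    print(ca, ca_support(ca, trees))
# Couple A ['Match 1', 'Match 2', 'Match 3']
# Couple B ['Match 3']
```

A candidate supported by only one Match (Couple B here) is exactly the case where a different CA may be the right one for the group.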

 

[08D] Segment-ology: Using a Group Common Ancestor Concept by Jim Bartlett 20200620

Using Ethnicity to Identify a Cluster

A Segmentology TIDBIT

My Ancestor 14M was John William CAMPBELL, born 1856 NY; died 1916 WV. His parents were Samuel CAMPBELL and Ann CLARK who were married 1851 in Scotland and immigrated to the US in 1853. This 1/8 of my ancestry is the only known part to come from Scotland. Several cousins have done Y-DNA testing and the CAMPBELL line is the Argyll CAMPBELLs.

I have over 125,000 Matches at AncestryDNA. I have identified Common Ancestors with over 4,500 Matches – only 5 of them are on my CAMPBELL line. About 12.5% of my DNA is from my CAMPBELL line, and, all other things being equal, about 12.5% of my Matches should come from my CAMPBELL line. But all things are not equal – this CAMPBELL line is relatively small, and there are no known Ancestors before 1850, and there are no known links to any Ancestors in Scotland.

This doesn’t mean that none of my other Matches are cousins from this CAMPBELL line. However, it does leave me unable to find any more links. I have tens of thousands of Matches with no Trees; I’ve even found some with a CAMPBELL surname – but no way to determine if I am related to them (other than the few who have matching Y-DNA at FamilyTreeDNA).

So, I step back and relook at the big picture: exactly 1/8 of my Ancestry came from Scotland (well, maybe not going way back, but probably within a genealogy timeframe); roughly 1/8 of my DNA came from/through Scotland; and even if not a full 1/8 of my Matches, perhaps 10,000 of them should be on this part of my Ancestry – certainly more than the five close cousins I already knew about.
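The arithmetic behind this expectation is quick to check. This Python snippet only restates the numbers above (over 125,000 Matches, about 4,500 with identified Common Ancestors, 5 of them on the CAMPBELL line). It relies on the simplifying assumption that Matches are spread in proportion to DNA, which the post itself says does not hold here.

```python
# Numbers from the post; 125,000 is a lower bound ("over 125,000").
total_matches = 125_000
identified = 4_500          # Matches with an identified Common Ancestor
campbell_identified = 5     # of those, on the CAMPBELL line
share = 1 / 8               # 14M is one of my 8 great-grandparents (Ahnentafel 8-15)

# Naive expectation if Matches were spread in proportion to DNA.
expected_total = total_matches * share         # 15,625
expected_identified = identified * share       # 562.5
observed_share = campbell_identified / identified

print(f"{expected_total:,.0f} expected overall")
print(f"{expected_identified:.1f} expected among identified, {campbell_identified} found")
print(f"observed share {observed_share:.2%} vs {share:.1%}")
```

The naive figure of 15,625 is well above the more cautious 10,000. Either way, the gap between about 562 expected and 5 found among identified Matches is what makes the CAMPBELL line look under-represented.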

I decided to turn this lemon into lemonade. The lemon is a recent Scottish immigrant Ancestor – the lemonade is Scotland ethnicity. If this is the only part of my ancestry from Scotland, maybe I could use that information. When I Cluster my AncestryDNA Matches at the 20cM Threshold (the lowest cM level at which AncestryDNA shows Shared Matches) I get about 160 Clusters. 1/8 of those is 20 Clusters – a manageable number. So when I see some solid-looking Clusters without any hints of other ancestry, maybe they are from my Scotland line.

Here is one such Cluster. I clicked on the link for each Match and checked their ethnicity:

Every Match in this Cluster has 14% to 62% Scotland ethnicity. A few scattered Matches with Scotland ethnicity might be expected randomly, but for all of them to have significant amounts of Scotland ethnicity is a strong clue.
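Two ideas are at work here: screen for Clusters where *every* Match shows Scotland ethnicity, and recognize that "all of them" is far less likely by chance than "a few". This Python snippet sketches both. The Cluster IDs, percentages, and the 0.3 baseline rate are hypothetical. The chance estimate also assumes Matches are independent, which cousins in one Cluster are not, so treat it as a rough yardstick only.

```python
from typing import Dict, List


def all_scotland_clusters(clusters: Dict[str, List[float]], min_pct: float) -> List[str]:
    """IDs of Clusters in which every Match shows at least min_pct Scotland ethnicity."""
    return [cid for cid, pcts in clusters.items() if pcts and min(pcts) >= min_pct]


def chance_all_by_luck(baseline: float, n: int) -> float:
    """Chance that all n Matches show the trait if each did so independently
    with probability `baseline` (hypothetical; a rough yardstick only)."""
    return baseline ** n


# Hypothetical data: Scotland ethnicity (%) read from each Match's page.
clusters = {
    "CL-X": [18, 25, 31, 47, 55],
    "CL-Y": [0, 4, 33, 12],
    "CL-Z": [],  # not yet reviewed
}
print(all_scotland_clusters(clusters, min_pct=10))  # ['CL-X']
print(f"{chance_all_by_luck(0.3, 10):.1e}")         # 5.9e-06
```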

I think I can safely assume this CL149/14/[Scotland…] Cluster represents my Ancestor, John CAMPBELL – Ahnentafel 14M. If I knew the DNA segment, I could Paint this Cluster. I have several others that also show a pretty clear Cluster “picture”. Next I’ll be looking at some other Clusters which may even have a ThruLines Common Ancestor in them, but also have a lot of Scotland ethnicity – the ThruLines CA may be the outlier… With only one ThruLines CA I don’t have high confidence that it’s right. But high concordance of Scottish ethnicity is a strong clue that the Cluster is on my CAMPBELL line.

The next step is studying any Trees in these Scotland Clusters to see if those Matches have some Common Ancestors among themselves… That will be the sweetest lemonade of all.
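Looking for Common Ancestors among the Matches themselves amounts to counting which Ancestors turn up in more than one of their Trees. This Python snippet is a hypothetical sketch. It assumes each Tree has already been reduced to comparable Ancestor labels; real Trees need names, dates and places reconciled by hand first. The labels and data are invented.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def shared_ancestors(trees: Dict[str, Set[str]], min_trees: int = 2) -> List[Tuple[str, int]]:
    """Ancestors found in at least `min_trees` Match Trees, most widely shared first."""
    counts = Counter(a for ancestors in trees.values() for a in ancestors)
    hits = [(a, n) for a, n in counts.items() if n >= min_trees]
    # Sort by count (descending), then name, so ties come out in a stable order.
    return sorted(hits, key=lambda t: (-t[1], t[0]))


# Hypothetical Trees from Matches in one Scotland Cluster.
trees = {
    "Match 1": {"Couple P", "Couple Q"},
    "Match 2": {"Couple P", "Couple R"},
    "Match 3": {"Couple P", "Couple Q"},
    "Match 4": {"Couple S"},
}
print(shared_ancestors(trees))  # [('Couple P', 3), ('Couple Q', 2)]
```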

 

[22AU] Segment-ology: Using Ethnicity to Identify a Cluster TIDBIT by Jim Bartlett 20200612