Save the Clues!

A Segment-ology TIDBIT

In genealogy, if we look for Thomas BARTLETT in the census and find three entries with that name, we don’t discard them all just because we can’t immediately sort them out. We record them all and look for more evidence. The same concept applies to DNA. Record all the Common Ancestors you find with a Match. Even if the Match is a 2nd cousin (2C), she may also be a 5C or 8C on a different line. One of her shared segments may go back to that more distant ancestor – it’s happened to me! Don’t disregard a “pile up” of shared segments which Triangulate just because they may be from a very distant ancestor. Science tells us that some of those 7-20cM shared segments will be with closer cousins – it’s happened to me, often. Don’t disregard a distant cousinship beyond 5C with a Common Ancestor. Save the clues. As new Matches come in, you may find supporting evidence for that CA in the TG – it’s happened to me. On one TG, 11 of its 35 Matches share the same 7G grandparents with me (some of them are 6C and 7C; the rest are 8C).

 

[22E] Segment-ology: Save the Clues TIDBIT by Jim Bartlett 20170103

Only One Comparison Needed to Add a Segment to a TG

A Segment-ology TIDBIT

Start with a list of your overlapping shared segments – they all match you (two triangle legs). Find two of these shared segments that match each other (the third triangle leg) to form a Triangulated Group! Every other shared segment in your overlapping segment list needs to match only one of the shared segments in the TG – any one of them with a good overlap – to be added to the TG. The explanation will take a long blog post with diagrams – but the thrust is that forming a TG identifies that segment basically as well as trio-phasing does. So trust me! If a shared segment doesn’t match the TG, it will match the other, overlapping TG; or it is IBC (this happens for some shared segments under 15cM). As a Quality Control measure, I often make a second comparison – it always matches!
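A minimal Python sketch of the “one comparison” rule, assuming each overlapping shared segment is kept as a (Match, chromosome, start, end) record and that you have some way to say whether two Matches match each other on that region. `Segment`, `grow_tg`, the `matches_each_other` callback and the 5 Mbp overlap cutoff are all hypothetical stand-ins (the callback could be backed by your own notes from one-to-one comparisons), not any site’s API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Segment:
    match: str   # Match name or kit number
    chrom: int   # chromosome number
    start: int   # start location in base pairs
    end: int     # end location in base pairs


def overlap_bp(a: Segment, b: Segment) -> int:
    """Base pairs shared by two segments (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def grow_tg(
    seed_a: Segment,
    seed_b: Segment,
    candidates: List[Segment],
    matches_each_other: Callable[[Segment, Segment], bool],
    min_overlap_bp: int = 5_000_000,  # hypothetical "good overlap" cutoff
) -> Tuple[List[Segment], List[Segment]]:
    """Form a TG from two segments that match each other, then add each
    candidate that matches ANY ONE current member with a good overlap."""
    if not matches_each_other(seed_a, seed_b):
        raise ValueError("the two seed segments must match each other")
    tg = [seed_a, seed_b]
    leftovers = []
    for seg in candidates:
        # any() stops at the first member that matches: one comparison is enough.
        if any(
            overlap_bp(seg, member) >= min_overlap_bp and matches_each_other(seg, member)
            for member in tg
        ):
            tg.append(seg)
        else:
            # Probably belongs to the other, overlapping TG, or is IBC.
            leftovers.append(seg)
    return tg, leftovers


# Example with made-up Matches: Cal matches only Bob (not Ann) and still joins.
PAIRS = {frozenset(p) for p in [("Ann", "Bob"), ("Cal", "Bob")]}


def one_to_one(x: Segment, y: Segment) -> bool:
    return frozenset((x.match, y.match)) in PAIRS


tg, rest = grow_tg(
    Segment("Ann", 4, 150_000_000, 165_000_000),
    Segment("Bob", 4, 152_000_000, 160_000_000),
    [Segment("Cal", 4, 155_000_000, 170_000_000), Segment("Dee", 4, 151_000_000, 158_000_000)],
    one_to_one,
)
print([s.match for s in tg], [s.match for s in rest])  # ['Ann', 'Bob', 'Cal'] ['Dee']
```

Dee overlaps Cal by only 3 Mbp and matches no one in the TG, so Dee is set aside for the other overlapping TG (or as possible IBC).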

 

[22D] Segment-ology: Only One Comparison Needed to Add a Segment to a TG TIDBIT by Jim Bartlett 20170101

Roughly Right is OK for Genealogy

A Segment-ology TIDBIT

A Chromosome Map divides each chromosome into segments [or TGs] from specific ancestral lines. The data we get is fuzzy (see Fuzzy Segments), so the size and boundaries of these ancestral segments are a little fuzzy. But so what! For genealogy purposes these ancestral segments are large “targets”. Most shared segments with Matches easily hit these targets. The ancestral line “owns” its target segment. For genetic genealogy, that’s what matters – linking Matches and Common Ancestors with segments. It doesn’t make any difference if the shape or size of the target segment (TG) is a little blurry.

 

[22C] Segment-ology: Roughly Right is OK for Genealogy TIDBIT by Jim Bartlett 20170102

Number of Matches on a Segment (or TG)

A Segment-ology TIDBIT

Segments from Ancestors with large families will generally have more Matches (in general, the larger the families, the more cousins you have). Segments from more distant Ancestors will generally have more Matches (in general, the more distant the ancestor, the more cousins you have). Segments from Ancestors in Colonial America will generally have more Matches (in general, more Americans have taken the DNA tests). Segments from endogamous Ancestors will have more Matches (because of endogamy, there is more matching DNA).

 

22B Segment-ology: Number of Matches on a Segment (or TG) TIDBIT by Jim Bartlett 20170101

 

Crossover and Segment Formation

A Segment-ology TIDBIT…

Crossovers and segments are formed by random DNA biology.  They are formed at conception in each of our ancestors and in ourselves. They are at fixed, permanent locations in each of us. They are not affected by family size, geography, wealth, status, intelligence, etc. For each of us, they are a fixed structure of our chromosomes – like a picture or jigsaw puzzle – which is different for each person.

 

22A Segment-ology: Crossover and Segment Formation TIDBIT by Jim Bartlett 20170101

A Targeted Process at AncestryDNA

AncestryDNA does not provide segment info. This is a problem for Segmentologists who want to Triangulate – like me. Triangulation has worked well in grouping my many thousands of segments into specific groups representing specific ancestral lines. These Triangulated Groups (TGs) now cover 94% of my 45 chromosomes, and identify specific maternal and paternal segments of DNA from my ancestors. Over 75% of my 45 chromosomes have some identified Common Ancestor(s) beyond my parents. Are they all correct? Of course not. But many are linked to first, second, third and fourth cousins, who help walk back the Common Ancestors of those segments. These closer cousins reinforce and validate the more distant Common Ancestors shared by fifth through ninth cousins.

I have over 400 Hints at AncestryDNA, and very few of them can be linked to GEDmatch kits. It’s frustrating to know so many Triangulated Groups on one hand; and so many Common Ancestors at AncestryDNA on the other hand; and not be able to merge this information. Knowing the segment data for each of the AncestryDNA Hints would significantly expand my chromosome map. Grrrrrrr:-(

So I’ve tried a new process at AncestryDNA, and am having some amount of success with it – I thought I’d share it in this blog.

I selected a surname – HIGGINBOTHAM – on my mother’s side. This surname is in my Tree 6, 7, 8 and 9 generations back. The patriarch (9 generations back) is thought to be John HIGGINBOTHAM, but there is some controversy about his given name. However, there is general agreement on several of his children, and I have several AncestryDNA Hints at the 6C, 7C and 8C levels going back on this HIGGINBOTHAM line.

So I searched on this surname at my AncestryDNA Results page and got over 100 Matches. I looked at each one and selected 46 which were Hints or had strong Colonial Virginia ties to Amherst Co, VA (where my HIGGINBOTHAMs were for 3 generations) or to nearby counties.

Note that at 9 generations back we have 256 ancestors on our mother’s side. My chromosome mapping currently shows just over 200 TGs on my mother’s side. So I should reasonably expect about one, maybe two or three, of my TGs to be from my HIGGINBOTHAM line. Maybe there are a few Matches at the 6C level which actually then branch off on the HIGGINBOTHAM’s wife’s ancestry.
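The back-of-the-envelope reasoning above can be written out in a few lines of Python. This sketch counts your parents as generation 1 and assumes TGs are spread evenly across ancestral couples, which real inheritance does not guarantee (some ancestors that far back pass down no DNA at all); it only shows why “about one, maybe two” TGs is a reasonable expectation for one couple 9 generations back.

```python
def ancestors_on_one_side(generations_back: int) -> int:
    """Ancestors on one parent's side, counting your parents as generation 1."""
    return 2 ** (generations_back - 1)


def tgs_per_couple(tgs_on_side: int, generations_back: int) -> float:
    """Average TGs per ancestral couple if TGs were spread evenly (a rough guide only)."""
    couples = ancestors_on_one_side(generations_back) // 2
    return tgs_on_side / couples


print(ancestors_on_one_side(9))          # 256 ancestors on the mother's side
print(round(tgs_per_couple(200, 9), 2))  # 1.56 TGs per couple
```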

I was surprised at the number of HIGGINBOTHAM cousins I had – particularly since I was pretty sure they shouldn’t be spread out over many different TGs. And, on the other hand, I don’t believe that many cousins should all pile up on the same ancestor – unless, of course, some of them were related to each other more closely (some Matches were from the same Admin), or they only descended from two or three of the Patriarch’s children at each TG.

I needed to find out the actual segment data each match shared with me….

So I drafted a standard message:

You and I have a DNA match at AncestryDNA, and have a common HIGGINBOTHAM line. I have 46 such Matches at AncestryDNA! I am mapping my DNA (linking Ancestors to specific DNA segments), and would very much like to determine the segment for each of these lines. Most of us are 8th cousins, and I did not expect to get HIGGINBOTHAM DNA from more than one or two different segments. To sort this out, I am requesting that you upload your raw DNA file to www.GEDmatch.com. It’s a free site – easy to register – with complete instructions on their home page. My GEDmatch ID is Mxxxxxx. You will get many more Matches at GEDmatch, with emails. No medical or health info. Please let me know your GEDmatch kit number if you upload at GEDmatch.

I will provide you feedback on what I find, including other cousins who share the same segments. This will help all of us.

FYI, I’m having success with atDNA and would be happy to help you. See my How to Succeed list at: http://boards.rootsweb.com/topics.dnaresearch.autosomal/301/mb.ashx It has some good links at the end. My DNA blog is at www.segmentology.org – written for genealogists in plain language.

Hope to hear from you. Jim Bartlett jim4bartletts@verizon.net

end of message

Note – do not send such a standard message to 46 Matches at one sitting. Ancestry looks for “spam-like” messages, and will block your messaging ability (you have to phone in and talk them out of it, to get reinstated). I sent 5 or 6 a night for a week or so.

Well… this worked out much better than previous attempts to randomly beg Matches to upload to GEDmatch. I now have 23 GEDmatch kit numbers out of this group. And several others are still working on it. The results have been in several categories:

  1. At GEDmatch I can also compare with my deceased father’s kit uploaded from FTDNA. It turns out several of my HIGGINBOTHAM cousins share a DNA segment with both me and my father. So although we may well be genealogy cousins on HIGGINBOTHAM, the shared DNA that formed our match at AncestryDNA is from my father’s side – not from a HIGGINBOTHAM line.
  2. Several TGs are getting most of the shared segments. TG [04P36] has six Matches on two children of the Patriarch (some of them are 5C or 6C to each other from one child of the Patriarch). TG [10I36] has 3 Matches; and TG [04B36] has two Matches. [NB my TG naming system starts with the Chr (04 and 10); the letter roughly represents the 10Mbp block where the segment starts; and 3 and 6 are the Ahnentafel numbers for my mother (3) and her father (6) – the side where my HIGGINBOTHAM ancestry is]
  3. The rest of the Matches are spread out over different TGs – most of these TGs include Matches with other Common Ancestors. So it is entirely possible, indeed probable, that most of these TGs will come from different Common Ancestors. As time allows, I will investigate the several surnames from each of these TGs to see if there is consensus among the Matches.
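Category 1 above amounts to a simple overlap check once both kits are at GEDmatch. A minimal sketch, assuming one shared segment per Match and that the one-to-one results for the tester’s kit and the father’s kit have been copied into two dictionaries; `split_by_father` and `overlaps` are hypothetical names. A Match whose segment with the tester overlaps their segment with the father is on the father’s side; anything else is probably maternal, or IBC.

```python
from typing import Dict, Tuple

Seg = Tuple[int, int, int]  # (chromosome, start_bp, end_bp)


def overlaps(a: Seg, b: Seg) -> bool:
    """True if two segments are on the same chromosome and overlap."""
    return a[0] == b[0] and min(a[2], b[2]) > max(a[1], b[1])


def split_by_father(
    with_me: Dict[str, Seg], with_father: Dict[str, Seg]
) -> Tuple[Dict[str, Seg], Dict[str, Seg]]:
    """with_me: the segment each Match shares with me.
    with_father: the segment each Match shares with my father's kit (if any)."""
    paternal, not_paternal = {}, {}
    for name, seg in with_me.items():
        dad = with_father.get(name)
        if dad is not None and overlaps(seg, dad):
            paternal[name] = seg      # same region also matches Dad: his side
        else:
            not_paternal[name] = seg  # probably maternal (or IBC)
    return paternal, not_paternal


# Example with made-up Matches and positions.
ME = {"Ann": (4, 10_000_000, 20_000_000), "Bob": (10, 80_000_000, 90_000_000)}
DAD = {"Ann": (4, 12_000_000, 25_000_000)}
paternal, not_paternal = split_by_father(ME, DAD)
print(sorted(paternal), sorted(not_paternal))  # ['Ann'] ['Bob']
```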

This process would probably be too overwhelming for common surnames like JONES, SMITH, JOHNSON, etc. And your own surname might not be very helpful in determining a few TGs – your father’s or mother’s surname could be sprinkled all over your chromosomes – so it would be harder to form groups. Since we probably have the majority of our Matches in the sixth to eighth cousin range, I’m thinking that would be a good place to select surnames.

The main point here is that by using a more personal message, I’m getting more cooperation from my AncestryDNA Matches. By selecting a surname, and doing some homework to make sure we have the same Patriarch, the message is targeted. By promising to provide feedback to each Match who uploads to GEDmatch, I’m helping my Matches. The Matches don’t have to understand Triangulation or Segmentology – all they need to do is upload to GEDmatch. It seems to be working…

This process also works for testing educated guesses on new Surnames. It takes advantage of the more than 23,000 Matches I now have at AncestryDNA. I can search for a particular SURNAME and see if it pops up. Out of 23,000 Matches, each of my Ancestral surnames should be shared by some of my AncestryDNA Matches.

For example, I’ve looked for years for the maiden name of the wife of my ancestor Thomas BARTLETT c1705-1783 of Richmond Co, VA. At one point he owned a piece of land between two EIDSON brothers, so I thought perhaps his wife’s maiden name was EIDSON (and I’ve collected a lot of EIDSON records in Richmond Co, VA trying to find the connection – to no avail). So I searched my AncestryDNA Results for the Surname: EIDSON. I only got one EIDSON hit, and that clearly was not a link to Richmond Co, VA in the early 1700s. To me, this suggests that EIDSON is probably not one of my ancestral surnames. I’ll try some other surnames from my FAN list for Thomas BARTLETT. And in a year or two, when I have twice as many Matches at AncestryDNA, I may try EIDSON again. And revisit some of the other surnames…

For me, this targeted approach is turning out to be a good way to get uploads to GEDmatch and to find Triangulated Groups with several Matches who share the same ancestry with me.

 

15G Segment-ology: A Targeted Process at AncestryDNA by Jim Bartlett 20161020

The Attributes of a TG

This article will describe the various attributes of a Triangulated Group (TG). Some have noted that I use the term TG to describe both a Group of Matches and an ancestral segment. Well… yes, I do. Read on.

Once established, each TG has certain attributes which can be used to describe and/or define the TG:

A1. A TG is a group of shared segments from Matches. We often think about the Matches in a TG. They have a Common Ancestor. They can be contacted and encouraged to collaborate on finding the common ancestry. So in this sense a TG is a group of Matches. However, note that any of these Matches could, potentially, also share the same or a different ancestor with you on another segment (in a different TG).

A2. A TG occupies a specific physical space on a chromosome. It is, in effect, a segment in its own right – a segment from one of your ancestors. The TG is on one chromosome with a start location and an end location. These start and end locations are determined by the matching, overlapping, shared segments (from Matches) within the TG. Please review: Anatomy of a TG.

A3. As a segment, a TG has a specific string of SNPs on one chromosome. These would be the same SNPs in a segment on your chromosome, and on the segment each one of your Matches shares with you in the TG. All the SNPs would be the same. The SNPs have to be the same for IBD segments to match.

A4. A TG is the equivalent of phased data. The TG represents an ancestral segment on a chromosome. All of the SNP values (alleles) are on one chromosome, and are the same SNP values you got from your mother or father (depending on the chromosome). What you have in a TG (segment) is exactly what you would have with phased data.

  1. We don’t see the actual ACGT values in a TG that we would get with true phasing (with a child-parents trio), but they are the same values in the TG. The TG segment represents part of one of your chromosomes – it must have the same ACGT values that your parent passed to you on that chromosome.
  2. We can treat the TG as phased data, and any other shared IBD segment which Triangulates with the TG will have the same SNP values.
  3. This is true, even if you have formed a TG (with matching, overlapping, shared segments from Matches) and do not have the genealogy to determine which side it is on. You can be confident that the TG exactly matches the DNA on one of your two chromosomes. In this case the TG is not entirely equivalent to a true phased segment. But, if you had the phased information, you’d already know which side the TG was on. And very often you can determine the side of a TG by imputation – by determining which side it’s not on; or by the admixture of the segment.
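The “which side it’s not on” reasoning can be sketched as a tiny function. It assumes two TGs that genuinely cover the same region (not just the fuzzy 1-2 Mbp edge overlap between neighbours on one side), so one must be maternal (`M`) and the other paternal (`P`); `impute_side` is a hypothetical helper, not part of any tool.

```python
from typing import Optional, Tuple

OPPOSITE = {"M": "P", "P": "M"}


def impute_side(
    side_a: Optional[str], side_b: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """side_a, side_b: 'M', 'P', or None for two TGs that cover the same
    region (one on each of your two chromosomes). Fills in an unknown side."""
    if side_a and side_b:
        if side_a == side_b:
            raise ValueError("two TGs over the same region must be on opposite sides")
        return side_a, side_b
    if side_a:
        return side_a, OPPOSITE[side_a]
    if side_b:
        return OPPOSITE[side_b], side_b
    return None, None  # neither side known yet


print(impute_side("M", None))  # ('M', 'P')
```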

A5. Technically, each TG has a cM value. However, it usually takes a lookup table to determine the cM value for a segment on a given chromosome, between two points. This is what the testing companies and GEDmatch do for each shared segment they report. It’s a lot of work for genetic genealogists – and, in general, our TGs will morph over time as new shared segments are added, and the TG cMs will need to be adjusted. However, we can fairly easily make rough estimates of the TG cMs, which are plenty good enough for genealogy:

  1. Subtract the TG start location from the end location to get the number of base pairs (bps), divide by 1,000,000 to get Mbp, which is roughly equal to the number of cMs.
  2. Or, eyeball the cMs of the larger shared segments in a TG and extrapolate to the full TG (if you’re lucky, you may have a shared segment which nearly fills the TG)
  3. Note that the cM is a fuzzy value anyway – it’s empirically derived (an average of many observations), and it’s an average of the female and male averages. So don’t go to too much trouble, and use round numbers. AND note that there is a wide range of possibilities when trying to use cMs to determine approximate cousinships. See Blaine Bettinger’s chart for the ranges of cMs vs cousinships.
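Both kinds of estimate can be sketched in Python. `rough_cm` is the 1 Mbp ≈ 1 cM rule of thumb from step 1; `interpolated_cm` shows the kind of lookup-table calculation described in A5, by linear interpolation between genetic map points. The map points here are made up for illustration; a real calculation would use a published genetic map, and each company’s exact table may differ.

```python
import bisect
from typing import List


def rough_cm(start_bp: int, end_bp: int) -> float:
    """Rule of thumb: about 1 cM per Mbp."""
    return (end_bp - start_bp) / 1_000_000


def interpolated_cm(start_bp: int, end_bp: int, map_bp: List[int], map_cm: List[float]) -> float:
    """cM between two points using a genetic map: map_bp is sorted, map_cm
    is the cumulative cM at each map_bp. Positions outside the map are clamped."""
    def cm_at(pos: int) -> float:
        i = bisect.bisect_left(map_bp, pos)
        if i == 0:
            return map_cm[0]
        if i == len(map_bp):
            return map_cm[-1]
        x0, x1 = map_bp[i - 1], map_bp[i]
        y0, y1 = map_cm[i - 1], map_cm[i]
        return y0 + (y1 - y0) * (pos - x0) / (x1 - x0)

    return cm_at(end_bp) - cm_at(start_bp)


# Made-up map points for illustration only (not real genetic map values).
MAP_BP = [0, 10_000_000, 20_000_000, 30_000_000]
MAP_CM = [0.0, 12.0, 20.0, 35.0]

print(rough_cm(5_000_000, 25_000_000))                         # 20.0
print(interpolated_cm(5_000_000, 25_000_000, MAP_BP, MAP_CM))  # 21.5
```

The gap between 20.0 and 21.5 is the point of step 3: the rough figure is plenty for genealogy, so use round numbers.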

A6. TGs have fuzzy ends. There is no “signpost” in our DNA to identify crossover points, or where a shared segment starts or ends. The company algorithms find shared segments by looking for areas of DNA that match, and then extend each one until the DNA no longer matches. This results in some, usually small, amount of overrun (a longer segment than is actually there from a Common Ancestor). So my convention is to use the start location of the first shared segment in a TG (the one with the lowest start in bp) as the start location of the TG. The end location is determined several ways:

  1. If there is no overlap with the next TG on that chromosome, then use the largest end location of all the shared segments in the TG – the shared segment which runs the farthest. This often is not the last shared segment in a sorted spreadsheet – you need to look at them all.
  2. If there is a small, fuzzy overlap of 1-2Mbp with the next TG, I use the start location of the next TG as the end location, and accept the fact that there is a fuzzy overlap. We don’t need to be very precise for genealogy. Each TG represents a large block of DNA from an Ancestor – the fact that the edges of the block may be fuzzy should not obscure the big picture: the main TG segment came from an Ancestor!
  3. If there is a large overlap with the next obvious TG (almost always from a large shared segment with a close cousin, which probably spans more than one TG), I start a new TG at the obvious point dictated by the next group of shared segments, and use the same point as the end location of the first TG. This involves judgment – there is no hard rule – and the data will usually indicate where to start a new TG. Just accept that close cousins may share large segments that span more than one TG.
  4. If the shared segments in the TG all end a “few” Mbp before the next TG starts, I will just round up, and use the start location of the next TG as the end location. Again, use judgment.
  5. If there appears to be a large enough gap between two known TGs, I create a “dummy” TG to fill the gap. And then I keep looking for some Matches with shared segments to fill that gap. At this date my dummy TGs are about 7% of my DNA.
  6. Using these conventions will result in TGs that are adjacent to each other over all your chromosomes. Even with some “dummy” TGs, this process will organize all of your IBD Match segments into TGs over all of your DNA. When done, this is a happy day! You can then focus on TGs that should link to specific ancestors. And all new Match segments will generally fit easily into existing TGs.
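Steps 1, 2, 4 and 5 can be sketched as code. This assumes the TGs passed in are the ones that tile one of your two copies of a chromosome, and uses a hypothetical 5 Mbp cutoff for “a few Mbp”; step 3’s large overlaps are simply cut at the next TG’s start here, where in practice that point takes judgment. `TG`, `tg_from_segments` and `make_adjacent` are illustrative names only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TG:
    name: str
    start: int  # bp
    end: int    # bp


def tg_from_segments(name: str, segments: List[Tuple[int, int]]) -> TG:
    """Start = lowest start; end = farthest end among ALL the shared segments
    (not necessarily the last row of a sorted spreadsheet)."""
    return TG(name, min(s for s, _ in segments), max(e for _, e in segments))


def make_adjacent(tgs: List[TG], few_bp: int = 5_000_000) -> List[TG]:
    """Tile one chromosome copy with adjacent TGs.
    few_bp is a hypothetical cutoff for 'a few Mbp'."""
    tgs = sorted(tgs, key=lambda t: t.start)
    result = []
    for i, tg in enumerate(tgs):
        end, dummy = tg.end, None
        if i + 1 < len(tgs):
            next_start = tgs[i + 1].start
            if next_start - tg.end <= few_bp:
                # Overlap (negative gap) or small gap: end where the next TG starts.
                end = next_start
            else:
                # Big gap: keep the farthest end and fill the gap with a dummy TG.
                dummy = TG(f"dummy after {tg.name}", tg.end, next_start)
        result.append(TG(tg.name, tg.start, end))
        if dummy is not None:
            result.append(dummy)
    return result


# Example with made-up positions on one chromosome copy.
a = tg_from_segments("A", [(10_000_000, 25_000_000), (12_000_000, 30_000_000)])
b = tg_from_segments("B", [(31_000_000, 45_000_000)])
c = tg_from_segments("C", [(60_000_000, 70_000_000)])
for t in make_adjacent([c, a, b]):
    print(t.name, t.start, t.end)
```

Here A is stretched from 30 to 31 Mbp to meet B (step 4), and the 15 Mbp gap between B and C becomes a dummy TG (step 5) waiting for new Matches.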

A7. As you work this process of forming TGs and assigning them to sides using genealogy, you are creating a chromosome map. As new Matches are posted, their shared segments may adjust the start and/or end locations of the TGs. When you get lucky, you’ll find a new shared segment that fills a “dummy” TG.

A8. Naming TGs. I label each TG (and all the shared segments in it) with a short code – like 07C25. The 07 means Chr 7; C means the TG starts within the third block of 10Mbp – in this case between 20-30Mbp; 25 means this TG is on my father’s (2) mother’s (5) side – using Ahnentafel numbers. If I were starting over I’d use 07.027PM to indicate Chr 7; start 27Mbp; on my Paternal side, within his Maternal ancestry. Note: before you do any assigning, the label might be 07.027, then add P or M when determined. I usually add this code in the subject line of emails and messages – it just helps me keep organized.
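If you keep TG labels in a spreadsheet or script, both naming styles can be decoded mechanically. A minimal sketch, assuming two-digit chromosome numbers, the letter code read as 10 Mbp blocks (A = 0-10 Mbp, B = 10-20 Mbp, C = 20-30 Mbp), and the numeric code giving the start to the whole Mbp; `parse_tg_label` and `TGLabel` are hypothetical names, and the lineage part is kept as text because Ahnentafel numbers can run to more than one digit.

```python
import re
from typing import NamedTuple


class TGLabel(NamedTuple):
    chrom: int
    start_mbp: int       # lower bound of where the TG starts, in Mbp
    resolution_mbp: int  # 10 for the letter code, 1 for the numeric code
    lineage: str         # e.g. "25" or "PM"; "" if not yet assigned


NUMERIC = re.compile(r"^(\d{2})\.(\d{3})([PM]*)$")  # e.g. 07.027PM or 07.027
LETTER = re.compile(r"^(\d{2})([A-Z])(\d*)$")       # e.g. 07C25


def parse_tg_label(label: str) -> TGLabel:
    m = NUMERIC.match(label)
    if m:
        chrom, start, sides = m.groups()
        return TGLabel(int(chrom), int(start), 1, sides)
    m = LETTER.match(label)
    if m:
        chrom, letter, ahnentafel = m.groups()
        block = ord(letter) - ord("A")  # A = 0-10 Mbp, B = 10-20 Mbp, C = 20-30 Mbp ...
        return TGLabel(int(chrom), 10 * block, 10, ahnentafel)
    raise ValueError(f"not a recognized TG label: {label!r}")


print(parse_tg_label("07C25"))     # TGLabel(chrom=7, start_mbp=20, resolution_mbp=10, lineage='25')
print(parse_tg_label("07.027PM"))  # TGLabel(chrom=7, start_mbp=27, resolution_mbp=1, lineage='PM')
```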

A9. Note also that each TG contains many genes. Each of your 22,000 or so genes has a specific location on your DNA. If you become curious about any particular gene, you can look up where it is located (chromosome and location). You will have two copies of each autosomal gene, one on your maternal chromosome and one on your paternal chromosome. You can then look at your chromosome map of TGs and see which maternal TG and which paternal TG it’s in. If you’ve determined a Common Ancestor for those TGs, you’ll know which ancestor passed that gene down to you. You can also add a gene to your spreadsheet, so that it sorts with all the segments and TGs. Examples: the Short Sleepers gene BHLHE41 would be 12.026M and 12.026P (very close to LRRK2 (Parkinson’s) at 12.031). Also, my Neanderthal segment is 10.130 (I don’t know which side).
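Finding which TGs hold a gene is an interval lookup. A minimal sketch, assuming each side of each chromosome is stored as a sorted list of adjacent (start, end, label) TGs; the TG boundaries and labels below are made up, and only the 26 Mbp gene position comes from the 12.026 label in the text. Unmapped regions come back as None.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# One side of one chromosome: sorted, adjacent TGs as (start_bp, end_bp, label)
TGList = List[Tuple[int, int, str]]


def tg_containing(tgs: TGList, pos_bp: int) -> Optional[str]:
    """Label of the TG whose [start, end) range contains pos_bp, else None."""
    starts = [start for start, _, _ in tgs]
    i = bisect.bisect_right(starts, pos_bp) - 1
    if i >= 0 and tgs[i][0] <= pos_bp < tgs[i][1]:
        return tgs[i][2]
    return None


def gene_tgs(
    chromosome_map: Dict[Tuple[int, str], TGList], chrom: int, pos_bp: int
) -> Dict[str, Optional[str]]:
    """The maternal ('M') and paternal ('P') TG labels at one gene position."""
    return {side: tg_containing(chromosome_map.get((chrom, side), []), pos_bp) for side in ("M", "P")}


# Hypothetical chromosome map for Chr 12 (boundaries made up for illustration).
CHROMOSOME_MAP = {
    (12, "M"): [(20_000_000, 30_000_000, "12.020M"), (30_000_000, 45_000_000, "12.030M")],
    (12, "P"): [(15_000_000, 28_000_000, "12.015P")],
}

print(gene_tgs(CHROMOSOME_MAP, 12, 26_000_000))  # {'M': '12.020M', 'P': '12.015P'}
```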

Have fun with your TGs!

 

15B Segment-ology: The Attributes of a TG by Jim Bartlett 210919