Measuring Segments

For autosomal DNA, segments are measured in 3 ways:

base pairs (bp) – these are the individual building blocks (molecules) that form each chromosome. Over the entire set of chromosomes there are about 3.2 billion base pairs, and each chromosome has from 48 to 250 million base pairs. A segment can therefore be defined by its Start Location and End Location. Think of base pairs as a physical picture of a segment – its physical length and location on a chromosome.

Example: Start at 23,500,000; End at 46,300,000

Many of us round to the nearest megabase pair (Mbp). An Mbp is 1,000,000 bp.

Example: The segment is 23.5-46.3Mbp

Rounding makes these numbers much easier to read and to type. And, in my opinion, virtually nothing is lost in accuracy. Mbp is just fine for genealogy, triangulation and chromosome mapping. If you want to analyze a particular part of a segment for some scientific or medical reason, you may want to use bp. (I’ll discuss “fuzzy data” and “fuzzy segments” in a separate blog post.)
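To make the bp and Mbp descriptions concrete, here is a minimal Python sketch (not taken from any testing company's tools) that stores a segment by its start and end bp, computes its physical length, and rounds positions to Mbp. The Segment class and the to_mbp helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """HYPOTHETICAL record: a shared segment located by bp positions on one chromosome."""
    chromosome: str
    start_bp: int
    end_bp: int

    @property
    def length_bp(self) -> int:
        # Physical length of the segment: End Location minus Start Location.
        return self.end_bp - self.start_bp


def to_mbp(bp: int, places: int = 1) -> float:
    """Convert base pairs to megabase pairs (1 Mbp = 1,000,000 bp), rounded to `places` decimals."""
    return round(bp / 1_000_000, places)


seg = Segment("6", 23_500_000, 46_300_000)
print(f"{to_mbp(seg.start_bp)}-{to_mbp(seg.end_bp)}Mbp")  # 23.5-46.3Mbp
print(to_mbp(seg.length_bp))  # 22.8
```

Rounding to one decimal place keeps the position to the nearest 100,000 bp, which is the level of detail the Mbp shorthand carries.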

centiMorgans (cM) – I think of cM as a “quality” factor, or a “genetic distance” of sorts. The cM is the best measure we have of genetic distance, but it is far from perfect. The cM is empirically derived – that is, scientists have recorded many observations of recombination and compiled them into tables. From these tables the cM value between any two points on a chromosome (as measured by bp) can be determined. In very general terms, more is better: the larger the cM of a shared segment, the closer the Match is likely to be. But DNA is very random, and the cM ranges for each cousinship are wide (with much overlap). See these references for more info [1], [2] and [3].

Example: A segment may be 15.4cM
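The idea of reading cM off a table can be sketched as follows. This assumes a genetic map stored as (bp position, cumulative cM) pairs and uses simple linear interpolation between neighboring points; both the map values and the interpolation choice are assumptions for illustration, so the result (about 13.94 cM) is not the 15.4cM of the example. Real maps are empirical and far denser, and recombination has hot spots and deserts, so interpolating across widely spaced points can be badly off.

```python
from bisect import bisect_right

# HYPOTHETICAL genetic map for one chromosome: (bp position, cumulative cM).
# The values are invented; real maps are empirical and much denser.
GENETIC_MAP = [
    (20_000_000, 30.0),
    (30_000_000, 36.0),
    (40_000_000, 41.0),
    (50_000_000, 49.0),
]


def cm_at(bp, genetic_map=GENETIC_MAP):
    """Cumulative cM at a bp position, linearly interpolated between map points."""
    positions = [pos for pos, _ in genetic_map]
    if not positions[0] <= bp <= positions[-1]:
        raise ValueError("position is outside the map")
    i = bisect_right(positions, bp)
    if i == len(positions):  # bp sits exactly on the last map point
        return genetic_map[-1][1]
    (p0, c0), (p1, c1) = genetic_map[i - 1], genetic_map[i]
    return c0 + (c1 - c0) * (bp - p0) / (p1 - p0)


def segment_cm(start_bp, end_bp, genetic_map=GENETIC_MAP):
    """Genetic length of a segment: cM at its end minus cM at its start."""
    return cm_at(end_bp, genetic_map) - cm_at(start_bp, genetic_map)


print(round(segment_cm(23_500_000, 46_300_000), 2))  # 13.94
```

Note that the cM of a segment comes from the map positions of its two ends, not from its length in bp alone.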

SNPs – the single molecules (nucleotides), or base pairs, that show some amount of variation in human DNA. Most (99%) of our DNA is the same. For genealogy, we are looking for SNPs (sometimes referred to as markers) that are known to vary; the differences in our SNPs are what set us apart. Basically, each SNP can have one of 4 values: A, C, G or T. Each of the autosomal DNA testing companies uses a slightly different “chip” to determine these values, and they each effectively test a different number of SNPs – usually in the range of 600,000 to 700,000. These are spread out over all of your chromosomes; think of them as a sampling of your DNA (a sampling of its most variable parts). The count might range from about 10,000 SNPs on the smallest chromosomes to over 58,000 on Chromosome 1.

Example: A segment may include 2,451 SNPs
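Counting the SNPs in a segment amounts to counting the tested positions that fall between its start and end. This sketch assumes you have the sorted bp positions of the SNPs a chip tests on one chromosome; the evenly spaced chip list below is hypothetical (real chips are not evenly spaced), so its count is not the 2,451 of the example.

```python
from bisect import bisect_left, bisect_right


def snps_in_segment(snp_positions, start_bp, end_bp):
    """Count tested SNP positions that fall within [start_bp, end_bp].

    snp_positions must be sorted in ascending order.
    """
    return bisect_right(snp_positions, end_bp) - bisect_left(snp_positions, start_bp)


# HYPOTHETICAL chip: one SNP every 10,000 bp from 20 Mbp to 50 Mbp.
# Real chips are not evenly spaced.
chip = list(range(20_000_000, 50_000_001, 10_000))
print(snps_in_segment(chip, 23_500_000, 46_300_000))  # 2281
```

Because each company's chip tests a different set of positions, the same segment can contain a different number of SNPs at each company.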

Note that there is no fixed conversion between these measurements. We can convert temperature measured in Centigrade to Fahrenheit because they both measure the same thing, but the measurements above each measure something different. However, on average, there are about 100 cM for every 100 Mbp (roughly 1 cM per Mbp).
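As a worked check of that rule of thumb against the example segment used in this post (23.5-46.3Mbp, 15.4cM):

```latex
\begin{aligned}
\text{physical length} &= 46.3 - 23.5 = 22.8\ \text{Mbp} \\
\text{rule-of-thumb estimate} &\approx 22.8\ \text{Mbp} \times 1\ \tfrac{\text{cM}}{\text{Mbp}} = 22.8\ \text{cM} \\
\text{actual ratio} &= \frac{15.4\ \text{cM}}{22.8\ \text{Mbp}} \approx 0.68\ \tfrac{\text{cM}}{\text{Mbp}}
\end{aligned}
```

So this particular segment carries noticeably fewer cM than its physical length alone would suggest, which is why cM cannot simply be calculated from Mbp.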

So, in summary:

A segment may be described as: Chr 6: 23,500,000 to 46,300,000; 15.4cM; 2451 SNPs

A shortcut description may be 6: 23.5-46.3Mbp; 15.4cM; 2451 SNPs
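Both description styles can be generated from the same five values. A minimal Python sketch, using a hypothetical describe function, that reproduces the long and short descriptions of the example segment:

```python
def describe(chrom: str, start_bp: int, end_bp: int, cm: float, snps: int,
             short: bool = False) -> str:
    """HYPOTHETICAL helper: format a segment in the long or short description style."""
    if short:
        # Shortcut style: positions in Mbp (1 Mbp = 1,000,000 bp) to one decimal place.
        start_mbp, end_mbp = start_bp / 1_000_000, end_bp / 1_000_000
        return f"{chrom}: {start_mbp:.1f}-{end_mbp:.1f}Mbp; {cm}cM; {snps} SNPs"
    # Long style: full bp positions with thousands separators.
    return f"Chr {chrom}: {start_bp:,} to {end_bp:,}; {cm}cM; {snps} SNPs"


print(describe("6", 23_500_000, 46_300_000, 15.4, 2451))
print(describe("6", 23_500_000, 46_300_000, 15.4, 2451, short=True))
```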

References:

[1] http://www.isogg.org/wiki/CentiMorgan

[2] http://en.wikipedia.org/wiki/Centimorgan

[3] http://compgen4.rutgers.edu/mapinterpolator

03 Segmentology: Measuring Segments; by Jim Bartlett 20150513

22 thoughts on “Measuring Segments”

  1. Hi Jim,

    I discovered your table of contents. So, here I am reviewing “basic” concepts.

    I came to understand cM as being the likelihood that the segment might be split in the next meiosis. And that it was strictly calculated on the basis of segment length over chromosome length. It’s a quick and dirty and probably-not-actually-true definition. I have some 70 cM segments on a bunch of related people, whose line seems never to subdivide that segment. And they are not closely related to me, from what I can tell so far.

    DNAPainter converts Mbps to cM. I assume Jonny programmed it as described above. He has a disclaimer to say it is “approx cM”.

    And isn’t it true that the smaller the # of SNPs, the lower the “quality” of the match? I’ve been trying to make sense of MyH joining segments across centromeres where, for the same kit uploaded to GEDmatch, they are 2 separate segments. Does this mean that there are no SNPs sampled in the centromere or pileup regions, but that the algorithm happily joins them, just to confuse us?

    I’ve also been struggling with matches that won’t triangulate; ie >2 groups at the same position on a given chromosome (that was on MyH). I think this might be due to bad reads in one or more of the files, but I don’t know how to check into this, so I can move on with a clear conscience. Can you advise?

    Kate

    • Kate, the cM is an empirical value – it is derived from many observations. There is no formula. As it turns out, the number of cM in a genome and the number of Mbp in a genome are roughly the same, so equating them is only a rough approximation. The companies use a lookup table to get the cM between any given start and stop for a shared segment. In my segment spreadsheet (over 20,000 rows), I have one column with the formula end minus start (in Mbp) divided by cM, so I can see the variation over each chromosome. Also, it is known that there are deserts and hot spots on each chromosome (unique to each one), where the likelihood of recombination is low and high, respectively – it’s not uniform.
      MH (MyHeritage) does impute SNPs in order to be able to compare among the different testing chips being used these days. Jim
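Jim's spreadsheet column can be sketched in a few lines of Python: compute (end minus start, in Mbp) divided by cM for each row, then look at the spread per chromosome. The rows below are invented for illustration (only the first is the example segment from the post), and summarizing by min, max and mean is an assumed way of reading the column, not necessarily how the spreadsheet is used.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL rows: (chromosome, start Mbp, end Mbp, cM).
# Invented except the first, which is the example segment from the post.
rows = [
    ("6", 23.5, 46.3, 15.4),
    ("6", 60.0, 70.0, 12.5),
    ("6", 100.0, 110.0, 8.0),
    ("1", 10.0, 20.0, 10.0),
]


def mbp_per_cm(start_mbp: float, end_mbp: float, cm: float) -> float:
    """The spreadsheet-style column: (end minus start, in Mbp) divided by cM."""
    return (end_mbp - start_mbp) / cm


# Collect the ratio for every row, grouped by chromosome.
by_chromosome = defaultdict(list)
for chromosome, start, end, cm in rows:
    by_chromosome[chromosome].append(mbp_per_cm(start, end, cm))

for chromosome, ratios in sorted(by_chromosome.items()):
    # A wide min-max spread along one chromosome reflects uneven recombination.
    print(f"Chr {chromosome}: min {min(ratios):.2f}, max {max(ratios):.2f}, "
          f"mean {mean(ratios):.2f} Mbp per cM")
```

A ratio well above 1 Mbp per cM points to a recombination desert; well below 1 points to a hot spot.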

      • Kate, my advice is to test at all the companies and upload to GEDmatch, to triangulate the shared segments within each company, and then to merge the TGs (triangulated groups). The TGs represent a mosaic of your DNA – your DNA segments between the crossover points made by your Ancestors. Given this, it’s fairly easy to merge the TGs between companies. I have 371 TGs which fully cover all the tested areas of my 45 chromosomes (well, about 6 of my “TGs” are holding areas where I don’t have shared DNA with any of my Matches). There are many Match segments in each TG, so it is easy to determine which overlapping TG any new Match segment belongs to. Yes, along the journey to full TG coverage, some TGs were indeterminate or iffy – but now they are all pretty rock solid. It’s a lot of work, but the outcome is worth it. Jim
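A rough sketch of that last step, assigning a new Match segment to an existing TG. It assumes a hypothetical TG record (chromosome, start and end in Mbp, and member Matches) and assumes you already know which existing Matches the new Match shares this segment with. Overlap alone is not enough, because at any location a maternal TG and a paternal TG cover the same region; the sketch picks the overlapping TG that shares the most members with the new Match.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TG:
    """HYPOTHETICAL triangulated-group record: a region (in Mbp) plus its member Matches."""
    label: str
    chromosome: str
    start: float
    end: float
    members: set = field(default_factory=set)


def overlaps(tg: TG, chromosome: str, start: float, end: float) -> bool:
    """True if the TG and the segment cover some of the same stretch of a chromosome."""
    return tg.chromosome == chromosome and tg.start < end and start < tg.end


def assign(tgs, chromosome, start, end, triangulates_with) -> Optional[TG]:
    """Pick the overlapping TG that shares the most members with the new Match.

    triangulates_with: names of existing Matches the new Match is known to share
    this segment with (assumed to come from your own triangulation checks).
    Returns None if no overlapping TG shares any member.
    """
    best, best_count = None, 0
    for tg in tgs:
        if overlaps(tg, chromosome, start, end):
            shared = len(tg.members & triangulates_with)
            if shared > best_count:
                best, best_count = tg, shared
    return best


# HYPOTHETICAL TGs and Match names, for illustration only.
tgs = [
    TG("6-maternal", "6", 20.0, 50.0, {"Ann", "Bob", "Cal"}),
    TG("6-paternal", "6", 18.0, 48.0, {"Dee", "Eve"}),
]
print(assign(tgs, "6", 23.5, 46.3, {"Dee", "Eve"}).label)  # 6-paternal
```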

  2. Hello Jim,
    At GEDmatch, on the one-to-one match, when specifying graphics and full resolution, the chromosome bars have two scales. One is Mbp; what is the other scale? Is it a metric scale?

    • Bill, it’s a cM “scale” of sorts. Our chromosomes have hot spots and deserts for recombination crossovers, and the cM values are derived from many observations. There is no conversion factor from cM to SNPs or from cM to Mbp. You’ll note the caret marks are not uniformly spaced like the Mbp marks are. So we cannot interpolate between the cM caret marks – they are just a guide.

  3. Pingback: Getting Started with GEDmatch | segment-ology

  4. Pingback: The Porcupine Chart | segment-ology

  5. Pingback: Fuzzy Data, Fuzzy Segments – No Worry | segment-ology

  6. Jim, what do you make of a match that shows a single large segment match but not much else? Several times I have found a match of over 20 cM on one chromosome but nothing else over the default (7 cM) threshold anywhere else.

  7. Cheryl

    You say you used the Chromosome Browser and found all of these segments “matched”. Yes, you can use the CB and determine they all match you at the same location; but no, this does not mean that they match each other. At FTDNA it is impossible to check two of your matches to see if they match each other. You have to check them using In Common With to see if they are ICW each other.
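The point that matching you at the same location does not mean two Matches match each other can be sketched as a grouping problem. The input is the Matches that overlap you at one location plus a set of pairs known to match each other on that segment. That pair data is hypothetical here and has to come from a direct comparison of the two kits; an ICW result alone does not show which segment two Matches share. Because you have just one maternal and one paternal copy of each chromosome, more than two groups at one location points to a problem segment. Grouping by connected components is a simplification; strict triangulation expects every pair in a group to match.

```python
def group_matches(matches, match_each_other):
    """Split Matches that overlap you at one location into connected groups.

    matches: names of Matches sharing a segment with you at that location.
    match_each_other: set of frozenset pairs known to match each other there
    (HYPOTHETICAL input from your own kit-to-kit comparisons).
    """
    unvisited = set(matches)
    groups = []
    while unvisited:
        first = unvisited.pop()
        group, stack = {first}, [first]
        while stack:
            current = stack.pop()
            # Pull in every remaining Match that matches the current one.
            for other in list(unvisited):
                if frozenset((current, other)) in match_each_other:
                    unvisited.remove(other)
                    group.add(other)
                    stack.append(other)
        groups.append(group)
    return groups


def needs_review(groups):
    """More than two groups cannot all be real: you have only two copies of each chromosome."""
    return len(groups) > 2


# HYPOTHETICAL example: five Matches at one location.
matches = ["Ann", "Bob", "Cal", "Dee", "Eve"]
pairs = {frozenset(p) for p in [("Ann", "Bob"), ("Bob", "Cal"), ("Dee", "Eve")]}
groups = group_matches(matches, pairs)
print(len(groups), needs_review(groups))  # 2 False
```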

  8. Hello Jim,
    I am a newbie to all of this, but I found your article very interesting and understandable to someone with limited knowledge of DNA.
    I tested at Ancestry and 23andMe, then uploaded to Family Tree DNA and GEDmatch. Last night I was reviewing some information.
    I was just reviewing my DNA results on Family Tree DNA. I noticed that 21 of my matches matched me on Chromosome 19 at the exact same range, and they all had 8.15 cM. I also noticed that another seven matches were 9.14 cM at another spot on Chromosome 19. I have many more matches to compare but stopped when I noticed this pattern emerging. Is any of this information relevant? Does it mean anything? My father is unknown to me. Thank you in advance for your help. Cheryl Glenn Bradley
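The pattern Cheryl describes (many matches sharing exactly the same range) is easy to tally. Lots of matches stacking up on one spot can point to a pileup region, as mentioned elsewhere in this thread. This sketch only counts repeated ranges and does not decide anything; the match list, its positions, and the threshold of 5 are hypothetical, chosen to mirror the 21 and 7 matches described above.

```python
from collections import Counter


def repeated_ranges(segments, threshold=5):
    """Return {(chromosome, start_bp, end_bp): count} for ranges shared by >= threshold matches.

    segments: (match name, chromosome, start_bp, end_bp) tuples.
    threshold: HYPOTHETICAL cut-off for "many matches at the same range".
    """
    counts = Counter((chrom, start, end) for _name, chrom, start, end in segments)
    return {rng: n for rng, n in counts.items() if n >= threshold}


# HYPOTHETICAL match list: positions are invented; counts mirror 21 and 7 matches on Chromosome 19.
segments = (
    [(f"A{i}", "19", 30_000_000, 34_000_000) for i in range(21)]
    + [(f"B{i}", "19", 40_000_000, 43_000_000) for i in range(7)]
    + [("C0", "11", 50_000_000, 80_000_000)]
)
for (chrom, start, end), n in repeated_ranges(segments).items():
    print(f"Chr {chrom}: {start:,}-{end:,} shared by {n} matches")
```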

      • Hello Jim, I am not sure what you mean about a pileup. I have checked the matches with me in the chromosome browser on FTDNA, and we all match. I am not sure what to do next. Thank you for answering me.
        I do have a closer match on FTDNA who shares a total of 127.61 cM; the longest segments are 60.22 cM on Chromosome 11 and 49.99 cM on Chromosome 17. There are two other matches on Chromosome 10: one is 28.14 cM and the other is 30.73 cM. From what I read, if I understand it, I might be able to find a common ancestor with them, as their shared amounts are higher. Is that correct? Since I do not know who my father is and have no information on him, is it possible to determine how I am related to these matches? Thank you so much for answering me.
        With Warmest Regards, Cheryl

  9. SNPs may sometimes be a secondary measure, but not often. The 16cM segment is always good. The 9cM segment with the fewest SNPs is suspicious. I’d make sure that one matched at least 3 others.

  10. Hi Jim,
    I’m struggling a bit with SNPs. Based on your comment, if I have results like the ones below, do they tell me anything other than that the 16 cM match is closer (in terms of generations) than the others? Here are a few examples:

    Match A with 16 cM, 700 SNPs
    Match B with 9 cM, 2800 SNPs
    Match C with 9 cM, 500 SNPs

    If SNPs aren’t useful for determining something, I’m wondering why this info is included. Does it matter if the matches above were on the same chromosome (other than Match A being a closer cousin)? Told you I was confused!

    Thanks
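Jim's rule of thumb from his earlier reply (the larger segment is good; among the 9cM segments, the one with the fewest SNPs is suspicious) can be applied to these three examples with a simple filter. The 15 cM and 1,000 SNP cut-offs are hypothetical values chosen only for illustration, not published thresholds.

```python
# The three example segments from the question: name -> (cM, SNPs).
segments = {
    "Match A": (16, 700),
    "Match B": (9, 2800),
    "Match C": (9, 500),
}


def flag_suspicious(segments, good_cm=15.0, min_snps=1000):
    """Return names of segments below good_cm whose SNP count is under min_snps.

    Both thresholds are HYPOTHETICAL illustration values, not published cut-offs.
    Segments at or above good_cm are treated as good regardless of SNP count.
    """
    return [name for name, (cm, snps) in segments.items()
            if cm < good_cm and snps < min_snps]


print(flag_suspicious(segments))  # ['Match C']
```

A flagged segment is not necessarily false; as Jim suggests, check whether it matches at least 3 others.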

  11. Thanks, Jim. I was informed of your blog via Roberta’s. This “Segmentology” series is excellent! I’m able to apply the methodology you’ve outlined immediately. I’m also currently in a family project in which triangulation is proving to be an exceptionally helpful tool in deconstructing and reconstructing the “yellow brick road” to our common ancestor.
