Pile-ups

What’s all the buzz about “pile-ups”?  In my mind there are three kinds of pile-ups: small, medium and large. They are different, so it’s important to understand each one. In this case Goldilocks should prefer the large pile-ups, but let me go through my views of all three kinds.

Alert: This post contains my opinions about small pile-ups and AncestryDNA (based on my own experience) so you should make your own judgments.

Background

I think the two keys to success with autosomal DNA lie in a robust Tree (as many ancestors out to 13 generations as possible) and as many Match-segments as possible (including as many close relatives as you can get). I spent about a year expanding my Tree as best I could, and then posted that GEDCOM in several places. I’ve tested at all three companies and use GEDmatch. I put every single shared segment I can find over 7cM into my spreadsheet, and I periodically run a Quality Control check against a fresh download to pick up any missed Matches or segments. I currently have 5,000 different individuals with segment data in my spreadsheet, and have determined a Common Ancestor (CA) with 309 of them.

I have compared virtually every segment against other overlapping segments, and formed Triangulated Groups (TGs) that cover over 90% of my 45 chromosomes. It is now rare for me to get a new shared segment that changes my chromosome map in any way. This process has provided some insights on medium and large pile-ups.

Pile-ups

My definition of pile-up sizes:

  1. Small is smaller than 5cM
  2. Medium is 5-10cM
  3. Large is greater than 10cM
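
A minimal Python sketch of these three size classes, for anyone who wants to tag segments in a spreadsheet export. The function name is made up for illustration, and the cutoffs are simply the ones in the list above.

```python
def classify_segment(cm: float) -> str:
    """Return the size class (per the list above) for a segment length in cM."""
    if cm < 5:
        return "small"    # smaller than 5cM
    if cm <= 10:
        return "medium"   # 5-10cM
    return "large"        # greater than 10cM


for length in (3.2, 7.5, 18.4):
    print(length, classify_segment(length))
```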

Small pile-ups – by my definition, these pile-ups are composed almost entirely of IBS shared segments. When AncestryDNA first rolled out their autosomal DNA test, their threshold was 5Mbp. This threshold included many shared segments well below 5cM, and resulted in many thousands of bogus Matches. To their credit, they provided a caution about these. When AncestryDNA revised their threshold to 5cM, many of these Matches went away. Part of their explanation was the elimination of “pile-ups”. I agree that these “small pile-ups” should be eliminated, and resetting the threshold to 5cM should have taken care of this problem. However, their explanations continue to stress the elimination of “pile-ups”. I just hope they don’t also toss out Matches in larger pile-ups – throwing the baby out with the bath water.

Medium pile-ups – 5-10cM range. As I gathered as many segments over 5cM as I could and sorted them in my spreadsheet, I noticed a few areas that had many such segments, all in a very narrow chromosome area. Very clearly a pile-up! Virtually none of them matched each other, although they had almost the same segment start/end locations. And there were a lot of them – many more than in large TGs.

In discussions on various email lists, we compared notes, and found that most of these areas were unique to our own experience. In general they were not due to some common feature of most human genomes. A notable exception to this blanket statement is the HLA Region on Chromosome 6 – roughly from 29.8 to 33.1Mbp.
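
A small sketch of how segments touching the HLA Region could be flagged in a segment list. The coordinates are the rough ones quoted above (29.8 to 33.1Mbp on Chromosome 6); the function and constant names are hypothetical.

```python
HLA_CHROM = "6"
HLA_START_BP = 29_800_000   # rough start quoted above
HLA_END_BP = 33_100_000     # rough end quoted above


def overlaps_hla(chrom, start_bp, end_bp):
    """True if a segment on `chrom` from start_bp to end_bp touches the HLA Region."""
    return (
        str(chrom) == HLA_CHROM
        and start_bp <= HLA_END_BP
        and end_bp >= HLA_START_BP
    )


print(overlaps_hla(6, 28_000_000, 31_000_000))   # segment reaching into the region
print(overlaps_hla(6, 40_000_000, 55_000_000))   # segment well past the region
```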

However, most of the other areas were not tied to known issues like the HLA Region. In my analysis, it was not possible for me to link these to one parental side or the other. The fact that these areas include so many IBC segments indicates to me that it’s the combination of both of my chromosomes (maternal and paternal) that allows the “matches”. It’s the unique combination of alleles in these small stretches of DNA that makes matching much easier. And this unique combination is only in my genome. On chromosome 18, I have 307 segments in the 7 to 11 cM range. They are all in a very tight area: from location 5,800,000 to 8,700,000bp. Very few of them triangulate.

Sometimes the pile-up area has been documented. On chromosome 15, I have 281 segments in the 7 to 10cM range. They are at: 24,000,000 to 28,000,000 bp. This area partly overlaps a known pile-up area (20,100,000 to 25,200,000). But the known pile-up area is only partly the cause in my case. See 14 small pile-up areas found by Li et al (2014), listed at the ISOGG Wiki: http://www.isogg.org/wiki/Identical_by_descent These medium pile-up areas, and a few others in my experience, are characterized by a very tall pile-up of many segments about the same size in a narrow area just a little larger than the segments. The Li et al (2014) article refers to “regions where excess IBD is detected…” Virtually all of the segments I have noted above are IBS/IBC – they do NOT triangulate with the other segments.  A few segments in these regions do triangulate with known close relatives, and each other. I’ve kept those segments in maternal and paternal TGs, as appropriate, covering that area. After all, both my mother and father gave me those areas, and they in turn got them from their parents, etc.  It is very probable that these segments are IBD and come from a CA.
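
The partial overlap on chromosome 15 can be put in numbers. The sketch below uses only the two ranges quoted above; the helper name is made up for illustration. It shows that the documented area covers about 1.2Mbp, or roughly 30%, of the 4Mbp stretch where my segments stack up.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Length in bp of the overlap between two ranges (0 if they do not overlap)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


my_pileup = (24_000_000, 28_000_000)      # my chromosome 15 pile-up
known_region = (20_100_000, 25_200_000)   # documented pile-up area

shared = overlap_bp(*my_pileup, *known_region)
fraction = shared / (my_pileup[1] - my_pileup[0])
print(shared, fraction)
```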

My experience is that these are areas with a lot of shared segments in the 7-10cM range that are in a tight area, usually just 10cM wide, and a very high proportion of these segments are IBS/IBC.  A few segments in these areas will be IBD, but they will tend to be larger than the 7-10cM segments.
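
One way to spot such areas in a segment list is to count how many medium-sized segments stack up over the same spot. The sketch below is an assumption about how that could be done, not a description of my spreadsheet: the function name, the tuple layout, and the sample rows are all invented for illustration. It sweeps along each chromosome and reports the deepest stack of 5-10cM segments and where it occurs.

```python
from collections import defaultdict


def max_depth_by_chrom(segments, min_cm=5.0, max_cm=10.0):
    """Find the deepest stack of medium-sized segments on each chromosome.

    segments: iterable of (chrom, start_bp, end_bp, cm) tuples.
    Returns {chrom: (depth, position_bp)} for chromosomes with such segments.
    """
    events = defaultdict(list)
    for chrom, start, end, cm in segments:
        if min_cm <= cm <= max_cm:
            events[chrom].append((start, 1))    # a segment opens here
            events[chrom].append((end, -1))     # a segment closes here

    result = {}
    for chrom, evs in events.items():
        # At the same position, process openings before closings.
        evs.sort(key=lambda e: (e[0], -e[1]))
        depth = best = 0
        best_pos = None
        for pos, delta in evs:
            depth += delta
            if depth > best:
                best, best_pos = depth, pos
        result[chrom] = (best, best_pos)
    return result


# Invented sample rows, for illustration only.
sample = [
    ("18", 5_800_000, 8_700_000, 8.0),
    ("18", 6_000_000, 8_500_000, 7.5),
    ("18", 6_100_000, 8_600_000, 9.0),
    ("18", 40_000_000, 60_000_000, 25.0),   # large segment, ignored
    ("1", 1_000_000, 3_000_000, 7.2),
]
print(max_depth_by_chrom(sample))
```

A very deep stack of medium segments in a narrow area is only a hint. Whether the segments triangulate is still what separates IBD from IBS/IBC.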

My bottom line for these pile-ups: Unless you have a lot of free time, skip over these areas – particularly the shared segments under 10cM. Concentrate on triangulating any larger segments in these areas and then move on to other areas.

Large pile-ups – these are my favorites. Larger shared segments (over 10cM) that spread out and overlap each other over wider areas.  These segments tend to triangulate with each other, forming TGs on both sides.  I have some of these TGs which include over 50 shared segments.  Since the shared segments triangulate with each other, this is a good pile-up. These TGs are large because more people have these shared segments – probably because the Common Ancestors had large families in Colonial America, leaving us with many, many cousins. Another reason could be a more distant Common Ancestor, who would also leave us a large number of cousins.

In some cases we can use this observation to our advantage. I have a 2nd cousin, on his mother’s side, who is also an 8th cousin, on his father’s side. Our close Common Ancestor was an immigrant to the US in the mid-1800s, and I get relatively few Matches on the segments I share with him. However, on one segment, we have many Matches – it turns out our Common Ancestor for that segment is on his father’s side. The tip-off should have been the size of the TG (measured by the number of Matches).

Another observation about large pile-ups…. They will get larger. The number of folks taking an atDNA test is about doubling every 12 months. A consequence of this is that all of our TGs will also double in the next 12 months. So, if you have pile-ups now, they will about double by this time next year. Use these larger TGs to your advantage – work with the Matches to investigate place/time matches, if a Common Ancestor is not easily determined.
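
The doubling arithmetic, as a small sketch. It assumes growth stays at one doubling every 12 months, which is only the rough estimate above; the function name is hypothetical.

```python
def projected_tg_size(current_matches, months, doubling_months=12.0):
    """Project the number of Matches in a TG, assuming steady doubling."""
    return current_matches * 2 ** (months / doubling_months)


print(projected_tg_size(50, 12))   # a 50-Match TG one year out
print(projected_tg_size(50, 24))   # the same TG two years out
```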

Summary

  1. In general, don’t work with shared segments below 5cM. Most are IBS/IBC – even if they appear to triangulate. We don’t have a good test below 5cM to indicate IBD.
  2. Watch for, and avoid, pile-ups in the 5-10cM range. These are characterized by many shared segments in the 5-10cM range in a very tight location, usually only 10 or 11cM wide. Move on to larger shared segments in other locations.
  3. Embrace the large pile-ups. They may come from Common Ancestors with large families and/or more distant Common Ancestors. In either case, work with the Matches in these TGs as a Team to determine the Common Ancestor.

18 Segment-ology: Pile-ups by Jim Bartlett 20151007

20 thoughts on “Pile-ups”

  1. Excellent work and very informative. A very nice 71st birthday present as well.

    In reference to Colonial Ancestry and your suggestion that it skews results: one issue with my tree is that almost all of my ancestors are Colonial American. Most of the rest are Canadian or are not identified yet. I believe that my newest immigrant ancestor arrived in 1829. Probably half of my ancestors were in the Americas pre 1700. A large number of my AncestryDNA matches show multiple, indicated by clickable arrows, up to 5 separate and different matches. What impact do you think this sort of tree might have? The overwhelming number of Ancestors go back to the 1600’s.

    Also note, my tree is a 80,000 plus person “data mining tool” Questionable new data is allowed. But, errors are diligently searched for and when detected ruthlessly corrected. atDNA has been very useful in finding and correcting errors. I’ve bought atDNA kits for 8 people and work with a number of others who administer a similar number of kits.

    • Sam – Thanks and Happy Birthday! It looks like you are well on your way with atDNA. I’m working on a blog post about endogamy. I personally believe it’s an overblown issue – some impact, but not much – but I’m crunching the numbers to give a fair look at it.

      • Have you written anything on working with endogamous groups? Or can you recommend some reading that could help? The lines on my paternal side are very tangled. The group that I’m working with has about 60 kits on gedmatch. I match 88% of this group. The highest kit matches 93% of the group with the lowest matching 50%.

      • Jeannette,
        I’ll start with the effect of one set of first cousins as ancestors and then extrapolate to endogamy – we’ll see how the numbers shake out.

  2. Dear Jim, Thanks for sharing your work and your conclusions. I appreciate same and I have learned a lot from you these past few years. Linda McKee

  3. Thanx again for very timely and informative article. I believe that Ancestry.com has thrown lots of babies out with the bath water. I can point to 4 matches quickly on my list with segments of 18.4 to 19.7 cM that disappeared with Ancestry’s first purge. The same 4 matches remain on the list of my 2nd cousin and I believe all of my segments are larger. I’m sure that there are many more. My confidence in their process is waning!

    • Jeanette,

      Thanks for the encouragement. I’ve heard a number of stories of large segments which dropped out, as well as Ancestry kits found at GEDmatch with large IBD segments that AncestryDNA didn’t report. That’s why I like GEDmatch so much – yes, I get more IBS segments, but with their comparison tools, it’s easy to cull them out.

  4. Very good article! I agree 100% that just calling the large pile-ups “pile-ups” is misleading because they are completely valid and the ones you WANT to focus on. John Abbott.

    Your success with triangulation is very encouraging to the rest of us –thanks!

    • John, That’s exactly why I wrote the blog post. “Pile-ups” was turning into a bad name; and folks were beginning to believe that all pile-ups were bad news. As in many things, we need to look at the details and sort it out.
      Triangulation, I admit, is hard work. So I recommend just starting on some areas that are interesting to you. Gradually you’ll build out the framework of your segments on both sides.

  5. Hi Jim
    Do you have a sample spreadsheet that you use for chromosome painting and triangulation that you would be willing to share? I would like to start building a chromosome map and would appreciate starting with a tool that has been exercised and is set up fit for the purpose…

    Thanks for your blogs!
    Douglas

    • Douglas, I’m working on several blog posts about spreadsheets. Mine has evolved, a lot, over the past 5 years. One caution: whatever you want to see in a spreadsheet, is what you have to enter and maintain – it’s literally a two-edged sword. I’ve got way too much data in mine. Way more than is needed for just Triangulation. But, I also use my spreadsheet as my master atDNA tool, diary and repository – it keeps everything in one place. I emailed you the format – watch for the blog post.

  6. Thanks for your marvelous posting! I truly enjoyed reading it, you could be a great author. I will ensure that I bookmark your blog and will often come back down the road.

    I want to encourage yourself to continue your great job, have a nice day!

  7. I decided to use the GEDmatch chromosome browser to look at the top hundred or so of my X-chromosome matches. As you know, you can’t use X chromosomes to prove much if anything about distant genealogical relationships, but I wanted to look for patterns in what chromosomal regions are matched. I notice that there are definite patterns with strong pile-ups on the X chromosome, with several distinct regions, so it basically looks like 3 stripes on that chromosome, and on a region in chromosome 12 in particular. Some chromosomes look like scattered noise, but not those.
    However – I know that Bayes and the birthday paradox may both play a role in these patterns showing up. I need to do more sophisticated analysis. Has anyone tried anything similar?

    • Lois,
      I don’t know of such a study. IMO, each segment of our DNA came from a specific ancestor. All the DNA on a chromosome, including Chr X, came from a parent, who got it from their parent, etc. Somewhere going up our ancestry, that segment is a recombination from 2 ancestors, and it stops being an IBD segment. Drop back one generation and that ancestor is the last one to pass that entire segment to you. Without chromosome mapping we can’t usually tell which ancestor is the ultimate one. See my post on the porcupine chart. With Matching cousins, we can “walk each segment back”, and find this out. Somebody had to pass down each segment of Chr X. Yes, smaller segments may be from distant ancestors, so we need to look for the larger segments with closer cousins.

  8. Echoing other replies here over a year after this post: this is extremely helpful information as are all your posts. Thank you for sharing and explaining your experiences. This seemed like it was too hard till I started reading and re-reading your blog.
