yDNA whole chromosome testing – The Whats and Whys of FamilyTreeDNA’s BigY test

Recently I got the results back from my Y chromosome test at FamilyTreeDNA. I want to share my results and use them to illustrate the benefits of taking such a test. I’ve written this article for people who are interested in their (or their male relatives’) yDNA and are thinking about how much time, effort and money to invest in it. I’ve split this article into two parts, the “Whats” and the “Whys”, but there is inevitably some overlap between the two. Before I dive into these, I think it’s useful to give a short explanation of the Y chromosome.

yDNA for Beginners

As you may remember from school, we humans normally have 23 pairs of chromosomes. 22 of these are the matching autosomal chromosomes. The last pair are the sex chromosomes. If you are female you have two X chromosomes (one from each parent); if you are male you have one X chromosome from Mum and one Y chromosome from Dad. Because Dad only has one Y chromosome to begin with, it is normally passed down exactly as is. As a result the Y chromosome can be used to trace your paternal line.

STRs and SNPs

There is a big “however” in what I just wrote. The process of copying the Y chromosome is prone to occasional errors. These errors, referred to as mutations, allow us to build a genetic network of how and when the mutations occurred. To plot these mutations, genetic genealogists can use two methods. The first is to look at STRs (Short Tandem Repeats). These are parts of the yDNA that form short repeated patterns (typically repeating somewhere between 8 and 32 times). Rather like an office photocopier, the copying process sometimes makes mistakes and produces one too many, or one too few, repeats. Examining the areas of the Y chromosome that show the most variation in these repeating patterns means you can identify other males who are likely to share a common paternal ancestor in recent history. These tests can be taken through familytreeDNA.com and used to evaluate 12, 37, 67 or 111 STR markers. The results are a set of numbers representing how many repeats there are of each STR, and you will see them referred to as a Haplotype. Below is an example showing my 12 STR marker results:

My 12 STR marker results
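To make the idea of comparing haplotypes concrete, here is a minimal Python sketch. It assumes a simple stepwise model in which the “genetic distance” between two testers is the sum of the differences in repeat counts over the markers they share; FamilyTreeDNA’s own calculation treats some markers differently, so this is an illustration only. The marker names come from the standard 12-marker panel, but the repeat counts are invented and are not my results.

```python
def genetic_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Sum of absolute repeat-count differences over the markers both testers share.

    Simplified stepwise model (an assumption): each one-repeat difference counts as 1.
    """
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)


# Invented repeat counts for five of the 12 standard markers (not real results).
tester = {"DYS393": 13, "DYS390": 22, "DYS19": 14, "DYS391": 10, "DYS385a": 13}
other_tester = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 11, "DYS385a": 13}

print(genetic_distance(tester, other_tester))  # 2
```

Here the two testers differ by one repeat at DYS390 and one at DYS391, giving a distance of 2. The smaller the distance, the more likely a recent shared paternal-line ancestor.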

As an idea of what to expect from this testing: I’ve taken the 37 and 67 marker tests, but have yet to find any significant genealogical matches. I do match 7 people at 25 markers and 191 at 12 markers. These matches allow for a certain amount of genetic variation, so my single best match and I probably (95% chance) share a common paternal-line ancestor within the last 16 generations. Bear in mind that my results are probably a bit of an outlier, i.e. I seem to have very few paternal-line relations. Other people have been much more successful with STR tests, for example when trying to identify shared Colonial American ancestors.

The alternative approach is to start from the beginning (the biological Adam), look at individual points on the Y chromosome and see whether they match the standard reference sequence at those points, i.e. whether they have the expected base: cytosine (C), guanine (G), adenine (A) or thymine (T). Occasionally there will be a mutation at one of these points. These are called Single Nucleotide Polymorphisms (hereafter referred to as SNPs). A number of companies will test for these, either as a separate test or, more typically, as part of an “ancestry” test bundle. The tests I’ve taken and the results are shown below:

Test provider | Cost (1) | yDNA haplogroup
LivingDNA (2) | €129 | I-Z138
Genographic Project v.2 (3) | $99 | I-Z138
23andMe (4) | €99 | I-M253
FamilyTreeDNA (5) | $59 | I-Z138

1 Based on prices as at the end of November 2017. Note: All links are direct links to the vendors’ websites. I DO NOT use affiliate links in this post. All links are https for your privacy.
2 available at www.livingDNA.com
3 available either directly from National Geographic or via Helix.
4 available at www.23andMe.com. This company also provides health reports in some parts of the world, so prices vary depending on which test you take and where in the world you order it from.
5 Based on the yDNA 12 STR marker results. It’s hard to find this test on the ftDNA website, but this link should take you to the correct place.

In addition to these companies, you can also get a top-level haplogroup from YSeq. Their Top-level Orientation panel test costs $159. Based on my results, this would also have tested me with the “I1 Superclade panel” for free (normally costing $99) and would have taken me to the yDNA haplogroup subclade I-S2268. That is 3 nodes further “downstream” (more refined) than the I-Z138 result I got from most other companies.

Navigating the yDNA haplotree

OK, I’ve mentioned haplogroups, sub-clades and upstream/downstream branches a few times, so it’s probably worth briefly explaining yDNA haplogroups and haplotrees. The human yDNA haplotree starts with “yDNA Adam” at the base. He is the most recent ancestor that all living men share along their direct paternal line. At some point one of his male descendants had a mutation, a SNP, in his yDNA. As a result some of “yDNA Adam’s” descendants had this mutation and some didn’t. Each of the major mutations that have occurred in humans since “yDNA Adam” is labelled as a “Haplogroup”; for example, I’m Haplogroup I. The order in which these mutations occurred is mapped into a Haplotree. We also have some idea of where and when these mutations occurred. The structure of the yDNA Haplogroup tree and the likely locations of each haplogroup mutation’s origin are very nicely shown in the results you get from LivingDNA, the Genographic Project or FamilyTreeDNA. As an example, I’m using my results from LivingDNA to show these.

The yDNA Haplotree starting at the yDNA Adam – Screen capture from LivingDNA.com

The likely origin locations of yDNA Haplogroup I – Screen capture from LivingDNA.com

Now the interesting point is that as your results get more refined you get into smaller and smaller groups (hence the tree analogy: from trunk to branches to twigs and leaves). These smaller groupings are commonly referred to as sub-clades (although they are also referred to as haplogroups). The notation of these haplogroups also changes. Whilst the top-level haplogroups are a letter of the alphabet (e.g. I), the next level is identified by a letter plus a number (e.g. I1), and after that you will normally add the name of the most recent known mutation (e.g. I1-Z138). As you may have seen earlier from my results, I am identified as either I1-M253 or I1-Z138. The difference is that I1-Z138 is a more precise result than I1-M253. (Note that I1, I-M253 and I1-M253 are in this case the same haplogroup. Confusing, huh? This is because the naming convention changed a few years back when people realised how diverse the yDNA Haplotree was going to be.) I’m using my results from my LivingDNA test to show where I-Z138 sits within Haplogroup I. The same structure can be seen on the yFull and ISOGG websites.

The main branches of Haplogroup I1 – Screen capture from LivingDNA.com
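The idea of “upstream” and “downstream” branches can be sketched as a simple parent lookup. The Python fragment below is a heavily simplified, partly hypothetical slice of Haplogroup I: many real intermediate branches are omitted, “I-S2275” is a hypothetical label for a newly split branch, and the alias table only covers the naming variants mentioned in this post.

```python
from typing import Optional

# Simplified, partly hypothetical fragment: each branch maps to its parent (upstream) branch.
# Real intermediate branches are omitted; "I-S2275" is a hypothetical label.
PARENT = {
    "I-M253": "I",
    "I-Z138": "I-M253",
    "I-Z2541": "I-Z138",
    "I-PH2658": "I-Z2541",
    "I-S2275": "I-PH2658",
}

# Older/alternative names that refer to the same branch.
ALIASES = {"I1": "I-M253", "I1-M253": "I-M253", "I1-Z138": "I-Z138"}


def canonical(name: str) -> str:
    """Translate an alternative name into the name used in PARENT."""
    return ALIASES.get(name, name)


def lineage(name: str) -> list[str]:
    """Return the path from the root (most upstream) down to `name`."""
    node = canonical(name)
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    path.reverse()
    return path


def most_recent_common_branch(a: str, b: str) -> Optional[str]:
    """Deepest branch shared by both lineages, or None if they share nothing."""
    common = None
    for x, y in zip(lineage(a), lineage(b)):
        if x != y:
            break
        common = x
    return common


print(lineage("I1"))  # ['I', 'I-M253']
print(most_recent_common_branch("I-PH2658", "I-S2275"))  # I-PH2658
```

Because I1 is treated as an alias of I-M253, both names resolve to the same place in the tree. The deeper the shared branch, the more recent the common paternal-line ancestor.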

The other obvious point is that as a haplogroup result becomes more refined, the estimated date of origin of the mutation moves closer to the present. How close it comes depends both on how deeply you test and on how many people there are with similar results to you.

Beyond your Haplogroup Results

Once you have received your top-level Haplogroup results the question is what comes next. There are a few options, depending on your motivation for yDNA testing.

  • Do Nothing – This is a valid option. If you are hoping to connect with genetic relatives through DNA testing, you may well get the most use out of your time by investing it in understanding your autosomal DNA results and contacting your DNA matches.
  • SNP Testing – You can take your results and, based either on your own research or with guidance from others, take individual SNP tests that will lead you to a more precise Haplogroup. Both FamilyTreeDNA.com and yseq.net offer these SNP tests, either as packs or as individual tests. In most cases you will find that joining a Haplogroup project will help guide this testing. The projects are normally run through ftDNA. A list of some of the projects is available here. Having said that, my personal experience has been that the most relevant yDNA project for me is most active as a Facebook group, which has the advantage of allowing people who haven’t tested at FamilyTreeDNA to participate. As you will have seen earlier, the I1-SupercladePack at ySeq can give you a quite precise Haplogroup for a comparatively low cost.
  • Comprehensive SNP testing – This is a rather convoluted title (which I will explain in a while) but this is where the BigY test from familytreeDNA.com comes in.

The BigY Test

In simple terms, the BigY test is a test of the most genetically relevant parts of your Y chromosome. FamilyTreeDNA state that they test “Around 11.5 to 12.5 million base-pairs of reliably mapped positions of non-recombining Y chromosome at 55X to 80X average coverage“. To give you some idea of scale, the whole Y chromosome is over 59 million base pairs. The BigY test checks these results against ~70,000 Named SNPs. Detailed information about the BigY test from FamilyTreeDNA can be found here. Obviously BigY is not a test of the complete Y chromosome, as there are parts of it that are not worth testing, or cannot be tested with current technology. You do need to be aware that this is a relatively expensive test. It costs $475 and requires that you have already done a yDNA STR test.
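As a quick sanity check on those figures, the short Python calculation below compares the quoted 11.5 to 12.5 million tested base pairs with the length of chrY in the hg19 reference build (59,373,566 bp). Using hg19 as the denominator is my assumption, chosen because it matches the “over 59 million” figure.

```python
# Total length of chrY in the hg19 (GRCh37) reference build, in base pairs.
CHRY_HG19_BP = 59_373_566


def coverage_share(tested_bp: int, total_bp: int = CHRY_HG19_BP) -> float:
    """Fraction of the Y chromosome covered by the tested region."""
    return tested_bp / total_bp


for tested in (11_500_000, 12_500_000):
    print(f"{tested:,} bp tested = {coverage_share(tested):.1%} of chrY")
```

This prints roughly 19.4% and 21.1%, i.e. BigY reads about a fifth of the chromosome, but reads each of those positions many times over (the 55X to 80X coverage).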

Results of a Big Y Test

When you get your BigY results the information can be a little overwhelming; however, there are four main elements to your result.

Terminal SNP assignment

This is the single most important part of the test. You will receive a “Terminal SNP” designation. This is the most precise and most recent mutation in your yDNA that you share with other tested males. As you can see, my terminal SNP is I-PH2658. It is 4 branches further down the yDNA haplotree than the I-Z138/I-Z139 result I got from most of my other yDNA testing. The graphic below shows my Terminal SNP assignment, and also the four upstream SNPs that I’m positive for.

My Terminal SNP: I-PH2658

Now, if you are paying attention you may wonder why I wrote I-Z138/I-Z139. This is important: at the moment everybody who has the mutation I-Z138 also has the mutation I-Z139 (as well as two other mutations, S5204 and Z2540). As you can see, FamilyTreeDNA actually designate this haplogroup as I-Z139. Some haplogroups, especially top-level ones, are defined by hundreds of mutations; for example, I-M253 is defined by over 300 mutations. In this case what is believed to have happened is that there was a “population bottleneck” in the I-M253 population. In other words, a small population carrying the I-M253 mutation existed for perhaps hundreds of generations without any significant expansion. The net result was that the males at the end of the bottleneck all carried the same set of genetic mutations, all other sets of mutations having died out during (or possibly after) the bottleneck.
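Why I-Z138, I-Z139, S5204 and Z2540 sit together can be illustrated with a small Python sketch: SNPs that have exactly the same set of positive testers cannot yet be put in order, so they are grouped into one block on one branch. The testers and calls below are hypothetical, and the sketch assumes that a missing call counts as negative, which real tools would handle more carefully.

```python
from collections import defaultdict

# Hypothetical calls: tester -> {SNP: True if positive (derived), False if negative}.
calls = {
    "tester_a": {"Z138": True, "Z139": True, "S5204": True, "Z2540": True, "Z2541": True},
    "tester_b": {"Z138": True, "Z139": True, "S5204": True, "Z2540": True, "Z2541": False},
}


def equivalent_blocks(calls: dict[str, dict[str, bool]]) -> list[list[str]]:
    """Group SNPs that have exactly the same set of positive testers.

    A missing call is treated as negative (a simplifying assumption).
    """
    all_snps = sorted({snp for results in calls.values() for snp in results})
    blocks = defaultdict(list)  # frozenset of positive testers -> SNPs
    for snp in all_snps:
        positive = frozenset(t for t, results in calls.items() if results.get(snp))
        blocks[positive].append(snp)
    return sorted(blocks.values())


print(equivalent_blocks(calls))
# [['S5204', 'Z138', 'Z139', 'Z2540'], ['Z2541']]
```

Only a tester who is positive for some of a block’s SNPs but negative for others can split that block; a bottleneck means no such tester may survive.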

Named and Unnamed Variants

This is a bit of an ugly title (but maybe one that Donald Rumsfeld would have used). These are two lists of your results. The named variants are the more common and older mutations you share with other testers. These SNPs will normally have been placed on the yDNA haplotree, although it is possible that their placement is incorrect. At the time of writing my BigY report indicates I have 1,050 named variants. The unnamed variants are normally the more interesting ones. These are expected to be more recent mutations whose position in the yDNA Haplotree is yet to be determined. These unnamed variants may even be unique to one surname or family. I have 20 unnamed variants from my BigY test results, as shown below.

My Unnamed yDNA Variants (Dec 2017)
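Conceptually, splitting your results into named and unnamed variants is a lookup: each variant (position, reference base and observed base) is checked against a catalogue of known, named SNPs. The Python sketch below uses a hypothetical catalogue with a single invented entry (“HYPO1”); only the unnamed example at position 15250191 T->A comes from my own results.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    position: int  # coordinate on the Y chromosome reference
    ref: str       # base expected from the reference sequence
    alt: str       # base observed in the tester


def split_variants(variants, named_catalogue):
    """Split variants into (named, unnamed) using a catalogue keyed by Variant."""
    named, unnamed = {}, []
    for v in variants:
        name = named_catalogue.get(v)
        if name is None:
            unnamed.append(v)  # not in the catalogue: a candidate novel SNP
        else:
            named[name] = v
    return named, unnamed


# Hypothetical catalogue with one invented entry; the real one holds ~70,000 SNPs.
catalogue = {Variant(1_000_000, "G", "A"): "HYPO1"}
my_variants = [Variant(1_000_000, "G", "A"), Variant(15_250_191, "T", "A")]

named, unnamed = split_variants(my_variants, catalogue)
print(named)    # {'HYPO1': Variant(position=1000000, ref='G', alt='A')}
print(unnamed)  # [Variant(position=15250191, ref='T', alt='A')]
```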

List of Matches

On top of your Haplogroup assignment you will get a list of your closest matches. This provides some of the most interesting information in the whole set of test results, as these are the people with whom you share the most recent paternal-line ancestor. At the time of writing I have three close matches, as shown below:

My three closest paternal matches. Notice they all share a number of Named Variants

A BAM file

A BAM file is simply a compressed binary file containing all the aligned sequence data from your BigY test. Unless you are planning to analyse the raw data yourself, you will not spend much time looking at the BAM file directly. It can, however, be passed to other people or companies with the experience and tools to study it. Specifications for the BAM file can be found here. The BAM file is something you need to request directly from familytreeDNA.com via their contact page.
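For readers curious about what is inside a BAM file, here is a minimal Python sketch that reads only the header and lists the reference sequences (name and length) the reads were aligned to. It follows the layout in the published SAM/BAM format specification and relies on BAM’s BGZF compression being gzip-compatible. In practice you would normally use a dedicated tool such as samtools or pysam rather than hand-written code.

```python
import gzip
import struct


def _read_int32(f) -> int:
    """Read one little-endian signed 32-bit integer."""
    (value,) = struct.unpack("<i", f.read(4))
    return value


def read_bam_references(source):
    """Return [(name, length), ...] for the reference sequences in a BAM header.

    `source` may be a file path or a binary file object. BGZF (the compression
    used by BAM) is gzip-compatible, so the standard gzip module can read it.
    """
    with gzip.open(source, "rb") as f:
        if f.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file (magic bytes missing)")
        l_text = _read_int32(f)
        f.read(l_text)  # plain-text SAM header (@HD, @SQ, ... lines); skipped here
        refs = []
        for _ in range(_read_int32(f)):  # n_ref: number of reference sequences
            l_name = _read_int32(f)
            name = f.read(l_name).rstrip(b"\x00").decode("ascii")  # NUL-terminated name
            refs.append((name, _read_int32(f)))  # l_ref: reference length in bp
        return refs
```

Usage would be along the lines of `read_bam_references("my_bigy.bam")`, where the file name is a hypothetical placeholder for whatever FamilyTreeDNA supply.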

Why take a BigY test?

If you look back at the screenshot showing my BigY matches, one piece of information jumps out. All three matches share three Named SNPs that I don’t have: A6554, A6588 and S2275. These are SNPs that all three matches inherited from a common ancestor who lived after their line split from mine. This means we have a new branch within the I1-Z138 tree. If it helps you to picture this, hopefully the image below helps:

 

Part of the I-Z2541 sub-clade, showing how I branch off from the other testers

In this illustration I have also included all the additional SNPs that are currently grouped together. As you can see, there are 16 SNPs all grouped with S2268. Before I tested, the three additional SNPs (S2275, A6554 and A6588) were also in this group. Each one of these 19 SNPs represents a branch in my part of the yDNA Haplotree. It is only by people like myself doing a BigY test that we will be able to determine the order in which these mutations occurred. As a result of my testing we now know that S2275, A6554 and A6588 form one branch under the I-PH2658 subclade. The aim is that eventually each and every one of these 19 SNPs is placed correctly as a branch of the I-Z2541 Haplotree. What we don’t know is whether there are people still alive whose DNA can sort out the order of every SNP within these groupings. It may be that a population bottleneck occurred, just as is suspected with the I-M253 haplogroup.
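The logic behind spotting this new branch can be written as a couple of set operations: take the SNPs currently grouped in one block, keep those that every match is positive for, and remove those I am positive for. This Python sketch lists only 5 of the 19 SNPs in the block and assumes simplified positive/negative calls. It also assumes, for brevity, that each match is positive for the whole listed block.

```python
def branch_splitting_snps(block, mine, matches):
    """SNPs in `block` that every match is positive for but I am not.

    `block` is a list of SNPs currently grouped on one branch; `mine` and each
    element of `matches` are sets of SNPs that tester is positive for.
    """
    if not matches:
        return []
    shared_by_matches = set(block).intersection(*matches)
    return sorted(shared_by_matches - mine)


# Only 5 of the 19 SNPs in the block are listed here, for brevity.
block = ["S2268", "PH2658", "S2275", "A6554", "A6588"]
mine = {"S2268", "PH2658"}
matches = [set(block) for _ in range(3)]  # three matches, simplified calls

print(branch_splitting_snps(block, mine, matches))  # ['A6554', 'A6588', 'S2275']
```

If even one match were negative for, say, A6588, that SNP would drop out of the result, which is how further testers gradually sort a block into an ordered chain of branches.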

The other result from a BigY test is the list of Unnamed Variants (or Novel SNPs, to use the YFull terminology). These results are just about as useful as the results above, although you may need some patience before you see their value. The “Most Recent Common Ancestor” I happen to share with my three matches lived some generations earlier than the one they share with each other. My yDNA results prove there is at least one male, me, who is descended from a different branch of the tree. The question is: are there more? I know there are some males who share my surname and, at least from our paper trail, share my oldest known paternal ancestor (Henry Grass b.1764 in Brandon, Suffolk, England). It’s possible, indeed probable, that there are other males who descend from the ancestors of “my Henry” via other lines. The hope is that as more people test, each of the Unnamed Variants/Novel SNPs will come to represent a new branch on the I-Haplotree.

Analysis by yFull.com

FamilyTreeDNA isn’t the only organisation that can help you understand your BigY results. You can use your BAM file to transfer your data to yfull.com for analysis. Their analysis costs $50, and you only need to pay to unlock their results once they have completed their analysis. The obvious question is why you should do this. There are a few benefits that are particularly useful.

  • YFull provide detailed reporting on all Novel SNPs, including their quality. Novel SNPs are basically equivalent to the Unnamed Variants that FamilyTreeDNA identify; however, there are cases where only one company identified the SNP mutation. For example, yFull has 4 SNPs at “Best Quality”, but only one of these mutations (at position 15250191 T->A) is actually on my list of Unnamed Variants at FamilyTreeDNA. Below is my report, showing that I have 27 Novel SNPs, grouped by the quality/quantity of DNA test results. As part of this service yFull also register/name all of your Novel SNPs. These are the YFS numbers you see in light grey next to each SNP.

yFull Novel SNPs report

  • YFull regularly update their version of the yDNA Haplotree, typically every couple of months in my experience. Your position in the haplotree may be seen here much sooner than on the FamilyTreeDNA.com tree. It’s worth looking at this report to see if there are any negative results in each Haplogroup. When I first received my report from yFull, it showed that SNPs A6554, A6588 and S2275 were part of S2268 but that I was negative for all three. This was the first time I was able to see the split in this part of the yDNA tree.

yFull.com yReport, showing my terminal SNP in the yDNA tree

  • YFull provide an age estimation of TMRCA (Time to Most Recent Common Ancestor) for each SNP. This isn’t perfect yet, but it does help you understand better when each branch/twig of the yDNA tree formed.

yFull-Age Estimation I-Z2541

  • YFull report on 500 STRs within the Y chromosome, including all the markers FamilyTreeDNA provide in their 111 STR marker test.
  • YFull provide a SNP matching service, which shows me matching many of the same people as at FamilyTreeDNA (this is mostly because we all belong to the same Haplogroup Facebook group, where transferring your BigY results to yFull is encouraged).
  • YFull run a “Groups” function, similar to the facility provided by FamilyTreeDNA.
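YFull’s exact age-estimation method is their own, but the basic idea behind SNP-based dating can be sketched: count the SNPs that have accumulated on each descendant lineage since the split, then convert the average count into years using an assumed mutation rate and the number of reliably comparable base pairs. Every number in this Python example is a placeholder assumption, not a YFull parameter, and the sketch ignores the statistical uncertainty that real estimates report as a confidence interval.

```python
def tmrca_years(snp_counts, snps_per_bp_per_year, callable_bp):
    """Rough TMRCA estimate from SNPs counted on each descendant lineage.

    snp_counts: SNPs accumulated on each lineage since the split (one per tester)
    snps_per_bp_per_year: assumed mutation rate (placeholder)
    callable_bp: number of base pairs that can be reliably compared (placeholder)
    """
    years_per_snp = 1 / (snps_per_bp_per_year * callable_bp)
    mean_snps = sum(snp_counts) / len(snp_counts)
    return mean_snps * years_per_snp


# Placeholder numbers only: 1e-9 SNPs/bp/year over 8 million bp gives
# about one SNP per 125 years, so an average of 5 SNPs is roughly 625 years.
print(round(tmrca_years([4, 6], 1e-9, 8_000_000)))  # 625
```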

The exceptions that disprove the rule

You may have noticed that one other SNP, M10781, is also shared by all three of my matches. Checking my results for this SNP at yFull shows the following:

My results for SNP M10781

There are three useful bits of information here.

  • This SNP is also known by two other names, ZS4770 and S20812
  • The normal (ancestral) value is G (guanine) and the mutation is an A (adenine).
  • My data shows 39 reads at this position, all with the ancestral value of G, so yFull are pretty sure (100% probability) that I have a value G at this position.
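How a position is called from the reads can be sketched in Python: with 39 reads all showing G, the majority base is unambiguous. The thresholds used here (a minimum number of reads and a minimum fraction that must agree) are assumptions chosen for illustration; yFull and FamilyTreeDNA use their own, more sophisticated criteria.

```python
from collections import Counter


def call_snp(read_bases, ancestral, derived, min_reads=3, min_fraction=0.9):
    """Classify a SNP from the bases observed in the reads covering it.

    Returns "ancestral", "derived", or "no call" when coverage is too low,
    the reads disagree too much, or the majority base is neither allele.
    min_reads and min_fraction are illustrative thresholds (assumptions).
    """
    if len(read_bases) < min_reads:
        return "no call"
    base, count = Counter(read_bases).most_common(1)[0]
    if count / len(read_bases) < min_fraction:
        return "no call"
    if base == ancestral:
        return "ancestral"
    if base == derived:
        return "derived"
    return "no call"


# 39 reads, all G, at a SNP whose ancestral value is G and mutation is A.
print(call_snp("G" * 39, ancestral="G", derived="A"))  # ancestral
```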

Given the above, you would think that the SNP M10781 should join the three SNPs S2275, A6554 and A6588 as a separate branch of the Z2541 sub-clade. The answer is not so clear. If you check the SNP M10781 at yFull, you will see that this mutation turns up in many places in the Haplotree, including the S2275 branch. As a result of these numerous “sightings” it’s not considered a reliable SNP. It’s complicated!

The SNP M10781 / ZS4770 / S20812 turns up in approximately 30 different places in the yDNA Haplotree
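Spotting an unreliable SNP like M10781 amounts to asking whether the same mutation has been placed on more than one branch of the tree. The Python sketch below does that for a small, hypothetical mapping of branches to SNPs. A real check would first need to merge alias names such as M10781, ZS4770 and S20812 into one identifier.

```python
from collections import defaultdict


def recurrent_snps(branch_snps):
    """Return {snp: [branches]} for SNPs that appear on more than one branch."""
    seen = defaultdict(set)
    for branch, snps in branch_snps.items():
        for snp in snps:
            seen[snp].add(branch)
    return {snp: sorted(branches) for snp, branches in seen.items() if len(branches) > 1}


# Hypothetical placements: only the I-S2275 SNP list is taken from the text above.
tree = {
    "I-S2275": ["S2275", "A6554", "A6588", "M10781"],
    "HYPOTHETICAL-BRANCH": ["M10781", "HYPO1"],
}
print(recurrent_snps(tree))  # {'M10781': ['HYPOTHETICAL-BRANCH', 'I-S2275']}
```

A SNP that shows up on many unrelated branches has most likely arisen independently several times (or sits in a region that is hard to read reliably), so it cannot be used on its own to define a branch.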

The Alternatives to BigY

FamilyTreeDNA are not the only company offering Y chromosome testing. Full Genomes Corporation offer their “Y Elite Ancestry Test for Men” for $645. Dante Labs will test your whole genome for $750. Another alternative is the “Whole Genome Sequence” available from ySeq, with prices starting at $740. I’ve no doubt that other competitors will join this list, and that prices will come down as the volume of testing goes up. As far as I’m aware, all three sets of test results can be transferred to yFull via a BAM file.

Final Thoughts

It took way longer than I expected to write this blog post. This is perhaps an indication of how difficult it is to explain the whole topic. yDNA testing is an area where many beginner genetic genealogists may need some help and support. I’ve been very lucky that a Facebook group focused on the I-Z138/I-Z139 subclade has been there to help me.

When you first get your BigY results you may find it hard to understand/interpret the results. In part this is because you are doing a discovery test. Both FamilyTreeDNA and yFull will take some months to update their Haplotree models to reflect your results. During this period you may find it useful to work within a Haplogroup project to help interpret your results.

BigY testing isn’t cheap and it doesn’t provide much immediate information for a genealogist. As such, you need to think about how relevant/interesting it is to invest your time and money in yDNA testing. That said, I’m happy to have invested. I’ll admit I first took some individual SNP tests at yseq.net before I bought into the more expensive BigY test. The BigY test made those ySeq tests redundant, so a cheaper path would have been to invest in the BigY test directly. I would suggest that the ySeq testing acted as a “gateway drug”, encouraging me to finally invest in the BigY test.

In theory much of the information you get from yFull is the same as what you receive from FamilyTreeDNA. Nevertheless I consider it a worthwhile investment to have yFull review your results, as the additional reporting they provide is useful and they seem to update their version of the yDNA Haplotree more often than FamilyTreeDNA. Just remember that they maintain their own Haplotree, so it may look different from the one at FamilyTreeDNA. As an example, they consider my Terminal SNP to be I-S2268, rather than I-PH2658. The reality is that both SNPs are part of the pack of 16 SNPs that currently form one branch of the haplotree.

Finally, at the time of writing FamilyTreeDNA were in the process of updating the reference model of the human genome they use from version Hg19 to Hg38. This update means that ftDNA are now using the most current version of the reference genome (released 2013). The downside is that all existing BigY results had to be re-calibrated against the new reference model. At the time of writing (December 2017) this process doesn’t appear to be complete, and therefore not all your yDNA matches will be showing.
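One practical consequence of the Hg19 to Hg38 change is that SNP positions differ between the two builds, so it helps to know which build a given set of results uses. A simple heuristic, sketched here in Python, is to look at the length of the Y chromosome in a list of reference sequences (such as the one in a BAM header): chrY is 59,373,566 bp in hg19 and 57,227,415 bp in hg38. The function name and mapping are my own illustration, not part of any company’s tooling.

```python
# Length of chrY in each reference build, in base pairs.
CHRY_LENGTH_TO_BUILD = {59_373_566: "hg19 (GRCh37)", 57_227_415: "hg38 (GRCh38)"}


def guess_build(references):
    """Guess the reference build from (name, length) pairs, e.g. from a BAM header."""
    for name, length in references:
        if name in ("chrY", "Y"):  # UCSC-style and Ensembl/NCBI-style names
            return CHRY_LENGTH_TO_BUILD.get(length, "unknown")
    return "unknown"


print(guess_build([("chrY", 57_227_415)]))  # hg38 (GRCh38)
```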

 
