Revealing The Power of Whole Genome Sequencing
On March 11, 2010, researchers from the Institute for Systems Biology (ISB) in Seattle, WA, announced that they had successfully analyzed the first whole genome sequences of a family of four. ISB partnered with Complete Genomics, of Mountain View, California, to sequence the genomes of a father, a mother and their two children, both of whom had two recessive genetic disorders: Miller Syndrome (a rare craniofacial disorder) and Primary Ciliary Dyskinesia (PCD, a lung disease). The results demonstrate the tremendous benefits of having the complete genomes of an entire family to work with, allowing the team to minimize sequencing error rates and increase the accuracy of the sequencing data to 99.999%.
This high-quality data allowed the researchers to reduce the list of candidate disease genes associated with Miller Syndrome down to only four. Another exciting discovery from this study is the first direct estimate of how much the genome changes from one human generation to the next, known as the intergenerational mutation rate. The researchers report that gene mutations from parent to child occurred at only half the most widely hypothesized rate. The complete findings are published in the journal Science and are available online through its website.
Dr. Leroy Hood, co-founder and president of the Institute for Systems Biology and one of the paper's corresponding authors, is recognized as a world-leading scientist in genomics and molecular biotechnology, having invented the four instruments that contemporary molecular biology relies upon. As a faculty member at Caltech in the 1980s, Dr. Hood and his team developed the protein sequencer, the protein synthesizer, the DNA synthesizer, and the automated DNA sequencer, the last of which became the key to the successful mapping of the genome in the Human Genome Project of the 1990s. Since co-founding the Institute for Systems Biology in 2000, Dr. Hood has been pioneering systems medicine and the systems approach to disease.
We had an opportunity to catch up with Dr. Hood this week and speak with him not only about the details and significance of this particular genomics study, but also about some of the exciting things he sees in sequencing technologies and the future projects to which his group hopes to apply these family genome sequencing techniques.
LabGrab: Dr. Hood, thank you for taking some time out of your day to chat with us. In this study you had a pretty unique family from which to sample - one where both offspring had two recessive disorders (Miller Syndrome and Primary Ciliary Dyskinesia) - but neither parent showed either of the genetic abnormalities. This study was also the first to sequence the entire genome of multiple members of the same family. Why is this so significant?
Hood: By sequencing entire families you can actually use the principles of Mendelian genetics to correct about 70% of the DNA sequencing errors. Getting DNA sequence that accurate allows us to do many things such as narrowing down the list of candidates for disease genes for simple genetic diseases.
In this particular study, from thousands of possible candidates that might have been forecast if we just had sequenced one member of the family - we were able to narrow that candidate list to just four genes. And then it was easy to make the relative assignments of which of the four went with each disease.
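The idea of using Mendelian inheritance to catch sequencing errors can be illustrated with a minimal Python sketch. This is not the pipeline used in the study; the data layout, the site names and the function names are all hypothetical. It only checks that each child's genotype could be formed from one allele of each parent, and flags sites where that fails. A flagged site may be a sequencing error or a genuine new mutation, so it marks a candidate for review rather than a confirmed error.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's two alleles could come one from each parent.

    Genotypes are pairs of allele strings, e.g. ("A", "G"). Allele order
    within a pair is ignored.
    """
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


def flag_inconsistent_sites(calls):
    """Return the sites where at least one child violates Mendelian inheritance.

    `calls` maps a site name to a dict with keys "father", "mother" and
    "children" (a list of child genotypes). This layout is an assumption
    made for illustration only.
    """
    flagged = []
    for site, genotypes in calls.items():
        consistent = all(
            is_mendelian_consistent(genotypes["father"], genotypes["mother"], child)
            for child in genotypes["children"]
        )
        if not consistent:
            flagged.append(site)
    return flagged


# Made-up genotype calls for a family of four (two parents, two children).
example_calls = {
    "site1": {"father": ("A", "G"), "mother": ("A", "A"),
              "children": [("A", "G"), ("A", "A")]},
    "site2": {"father": ("C", "C"), "mother": ("C", "C"),
              "children": [("C", "T"), ("C", "C")]},
    "site3": {"father": ("G", "T"), "mother": ("T", "T"),
              "children": [("G", "G"), ("T", "T")]},
}

flagged_sites = flag_inconsistent_sites(example_calls)
```

In this made-up data, site2 is flagged because neither parent carries a T, and site3 because the mother carries no G to pass on. With a single genome there is nothing to compare against, which is why sequencing the whole family makes this kind of check possible.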
LabGrab: Do you believe it possible that these gene detection techniques could be effectively used on individuals who might be connected farther out along the limbs of the family tree - or is it necessary to have a parent / child genomic sample for comparison?
Hood: No, I think you can actually do three, four or even five generation families and get an enormous amount of information that would be useful, even for the much more distant relatives. I think that the principles extend to the family as a whole. In fact, we're in the course of getting ready to sequence one family that has five generations - so that will be really interesting to see. And we believe that by sequencing more than just two generations, the accuracy of sequencing will be improved even further.
LabGrab: We noted from the press release that your research group partnered with the company Complete Genomics out of Mountain View, California to handle the sequencing of the family's genome. Can you comment on the types of new sequencing technologies used in this study?
Hood: It is probably too technical for me to comment on in this forum. It uses a really new strategy of sequencing by hybridization together with very clever molecular biology - in fact, I have to say that at first I was very skeptical about whether the technique would work. But they have it working quite beautifully, and they've got very, very high quality data - we are really pleased with it.
Part of the accuracy of the data comes from the fact that we are using the entire family to correct sequencing errors - but getting errors that are less than 1 in 100,000 is pretty powerful.
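To connect the "1 in 100,000" figure with the 99.999% accuracy quoted earlier, here is a small worked calculation in Python. The genome size of roughly 3 billion bases is an assumption added for illustration, not a number from the interview, and the function names are hypothetical.

```python
from fractions import Fraction


def accuracy_percent(errors_per_base):
    """Convert a per-base error rate into an accuracy percentage."""
    return (1 - errors_per_base) * 100


def expected_errors(errors_per_base, bases):
    """Expected number of miscalled bases across `bases` positions."""
    return errors_per_base * bases


# Error rate quoted in the interview: fewer than 1 error per 100,000 bases.
ERROR_RATE = Fraction(1, 100_000)

# Assumption for illustration: a human genome of about 3 billion bases.
ASSUMED_GENOME_BASES = 3_000_000_000

accuracy = accuracy_percent(ERROR_RATE)                      # 99999/1000, i.e. 99.999%
errors = expected_errors(ERROR_RATE, ASSUMED_GENOME_BASES)   # 30000

print(f"Accuracy: {float(accuracy)}%")
print(f"Expected errors at that rate: {int(errors)}")
```

Under that assumed genome size, even a 1 in 100,000 error rate would still leave on the order of 30,000 miscalled bases per genome, which gives a sense of why very low error rates matter when searching for a handful of disease-causing variants.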
LabGrab: One of the goals that the NIH put out last year was to develop revolutionary technologies that would reduce the cost of sequencing an entire genome to under $1000. What do you estimate the cost for each of the genomes sequenced in this project to be?
Hood: About a year ago when we negotiated this work with Complete Genomics the cost came out to around $20,000 per genome. However, I'm fairly confident that with the improvements they've made over the last year or so, that the cost for our next sequencing project with them will be roughly half that amount. The feeling is that within 5 years, we'll be down maybe below the $1000 number.
LabGrab: In your opinion, what are the major hurdles that must be overcome to reduce the cost to the $1000 threshold - is it more a limitation of generating the sequencing data itself - or is it on the back-end processes of the sequence assembly and verification / error-checking?
Hood: I really think it will be a combination of the two. If you look at Illumina or SOLiD (from Applied Biosystems) - which are the two major commercially available kinds of sequencers now - they both claim that by the end of the year, the sequencing itself - the reagents and so forth - will be down to perhaps $3,000 per genome or so. Now you still have to account for the amortization of the instrumentation, the people that are working on the sequences and, most important of all, the whole assembly and analysis end of things. And my feeling is there will be an optimization of all of those things.
Certainly we'll really be able to optimize the assembly and analysis part - that's just very straightforward, creating pipelines that are tuned to the particular sequencing methods, and we've gone a long way towards doing that with the Complete Genomics technology. A lot of the principles that we've developed in terms of new software pipelines can be applied to any kind of sequencing instrumentation.
I think the real challenge that's going to come - and it's going to be a part of the whole discovery process of medical science - is what does this genome mean. Right now there are really limited things that we can infer from a complete genome sequence, just because they haven't been available - we haven't had the information. We've just barely begun to do the correlation between genotype and phenotype. And that's going to change just dramatically in the next ten years. There will be an enormous amount of information that can be mined from an individual genome sequence in the next 10 years. That information will give deep insights into future potential health history of the individual, as well as specific details about things that they really have to watch out for.
I think it's still a race to see which of the approaches is really going to work effectively, and I think the real question is which will scale for the millions of people whose genomes we'd like to do in the future as a part of classic medical records. I think in 10 years, many people will have their sequenced genomes as part of their medical records.
LabGrab: What are the next projects your group will be working on in this area?
Hood: In the area of family genome sequencing, we are collaborating with both the Gladstone Institute and Mass General Hospital to move on to a very extensive study of 60 to 100 individuals from families that have Huntington's Disease. And there what we're looking for is not the disease gene itself - which is already known - but we're looking for genes that modify the behavior of the disease gene. That is, genes that will let them get it early in life, or genes that will let them get it very late, or will ameliorate the effects of the disease. So, in a sense that's the next stage of complexity.
And then, a follow-up study to that eventually will be looking at families who have Alzheimer's, because that's a very complicated disease, for which very little is known about the genetics of how it's inherited. We think actually that with the appropriate stratification of Alzheimer's for its distinct types, paired with these kinds of family studies, we'll be able to move forward and make real progress with that disease once we get started.
LabGrab.com would like to extend a special thank you to Dr. Lee Hood for his generous donation of time.