Thousands of never-before-seen genetic variants in the human genome have been uncovered using a new genome sequencing technology. These discoveries close many human genome mapping gaps that have long resisted sequencing.
The technique, called single-molecule, real-time (SMRT) DNA sequencing, may now make it possible for researchers to identify potential genetic mutations behind many conditions whose genetic causes have long eluded scientists, said Evan Eichler, professor of genome sciences at the University of Washington, who led the team that conducted the study.
To date, scientists have been able to identify the genetic causes of only about half of inherited conditions. This puzzle has been called the “missing heritability problem.” One reason for this problem may be that standard genome sequencing technologies cannot map many parts of the genome precisely. These approaches map genomes by aligning hundreds of millions of small, overlapping snippets of DNA, typically about 100 bases long, and then analyzing their DNA sequences to construct a map of the genome.
This approach has successfully pinpointed millions of small variations in the human genome. These variations arise from the substitution of a single nucleotide base and are called single-nucleotide polymorphisms, or SNPs. The standard approach also made it possible to identify very large variations, typically involving segments of DNA that are 5,000 bases long or longer. But for technical reasons, scientists had previously not been able to reliably detect variations whose lengths fall in between: those ranging from about 50 to 5,000 bases in length.
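The three size classes described above can be written down as a small Python sketch. The thresholds of 50 and 5,000 bases come from the article and are approximate; the function name, the class labels, and the exact handling of the boundary values are assumptions made only for illustration.

```python
# Approximate thresholds taken from the article (both are described as "about").
INTERMEDIATE_MIN = 50    # variants shorter than this are treated here as "small"
LARGE_MIN = 5000         # variants this long or longer are treated as "large"


def size_class(length_bases: int) -> str:
    """Place a variant into one of the three size classes named in the article.

    Hypothetical helper: the labels and boundary handling are illustrative
    assumptions, not part of the study's method.
    """
    if length_bases < 1:
        raise ValueError("a variant must span at least one base")
    if length_bases < INTERMEDIATE_MIN:
        return "small"          # e.g. a SNP; found well by standard sequencing
    if length_bases < LARGE_MIN:
        return "intermediate"   # the range that standard methods miss
    return "large"              # detectable by the standard approach


if __name__ == "__main__":
    for length in (1, 300, 12000):
        print(length, size_class(length))
```

In this sketch, the "intermediate" class is the range in which the article says standard short-read methods had been unreliable.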
The SMRT technology used in the new study makes it possible to sequence and read DNA segments longer than 5,000 bases, far longer than standard gene sequencing technology.
So far unable to find the genetic secret of supercentenarians
If supercentenarians have a magic gene that helps them reach this age, it is lying low. A thorough search for longevity gene variants in 17 supercentenarians, average age 112 (the oldest was 116), has so far drawn a blank. That study may not have used SMRT technology.
The “long-read” SMRT technique, developed by Pacific Biosciences of California, Inc. of Menlo Park, Calif., allowed the researchers to create a much higher-resolution structural variation map of the genome than has previously been achieved. Mark Chaisson, a postdoctoral fellow in Eichler’s lab and lead author on the study, developed the method that made it possible to detect structural variants at base-pair resolution using these data.
To simplify their analysis, the researchers used the genome from a hydatidiform mole, an abnormal growth caused when a sperm fertilizes an egg that lacks the DNA from the mother. Because the mole genome contains only one copy of each gene, instead of the two copies that exist in a normal cell, the search for genetic variation is simpler.
Using the new approach on the hydatidiform mole genome, the researchers were able to identify and sequence 26,079 segments that differed from the standard human reference genome used in genome research. Most of these variants, about 22,000, have never been reported before, Eichler said.
“These findings suggest that there is a lot of variation we are missing,” he said.
The technique also allowed Eichler and his colleagues to map some of the more than 160 segments of the genome, called euchromatic gaps, that have defied previous sequencing attempts. Their efforts closed 50 of the gaps and narrowed 40 others.
The gaps include some important sequences, Eichler said, including parts of genes and regulatory elements that help control gene expression. Some of the DNA segments within the gaps show signatures that are known to be toxic to Escherichia coli, the bacterium that is commonly used in some genome sequencing processes.
Eichler said, “It is likely that if a sequence of this DNA were put into an E. coli, the bacteria would delete the DNA.” This may explain why it could not be sequenced using standard approaches. He added that the gaps also carry complex sequences that are not well reproduced by standard sequencing technologies.
“The sequences vary extensively between people and are likely hotspots of genetic instability,” he explained.
For now, SMRT technology will remain a research tool because of its high cost, about $100,000 per genome.
Eichler predicted, “In five years there might be a long-read sequence technology that will allow clinical laboratories to sequence a patient’s chromosomes from tip to tip and say, ‘Yes, you have about three to four million SNPs and insertions and deletions, but you also have approximately 30,000-40,000 structural variants. Of these, a few structural variants and a few SNPs are the reason why you’re susceptible to this disease.’ Knowing all the variation is going to be a game changer.”
The human genome is arguably the most complete mammalian reference assembly, yet more than 160 euchromatic gaps remain and aspects of its structural variation remain poorly understood ten years after its completion. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome, 78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.
PLOS One – Supercentenarians (110 years or older) are the world’s oldest people. Seventy-four are alive worldwide, with twenty-two in the United States. We performed whole-genome sequencing on 17 supercentenarians to explore the genetic basis underlying extreme human longevity. We found no significant evidence of enrichment for a single rare protein-altering variant or for a gene harboring different rare protein-altering variants in supercentenarian compared to control genomes. We followed up on the gene most enriched for rare protein-altering variants in our cohort of supercentenarians, TSHZ3, by sequencing it in a second cohort of 99 long-lived individuals but did not find a significant enrichment. The genome of one supercentenarian had a pathogenic mutation in DSC2, known to predispose to arrhythmogenic right ventricular cardiomyopathy, which is recommended to be reported to this individual as an incidental finding according to a recent position statement by the American College of Medical Genetics and Genomics. Even with this pathogenic mutation, the proband lived to over 110 years. The entire list of rare protein-altering variants and DNA sequence of all 17 supercentenarian genomes is available as a resource to assist the discovery of the genetic basis of extreme longevity in future studies.