This week, we profile a recent publication in Nature from the lab of Evan Eichler (pictured) at UW Medicine.
Can you provide a brief overview of your lab’s current research focus?
The Eichler lab is focused on understanding the organization and structure of complex repetitive genomic regions and their relationship to human neurodevelopment and the evolution of our species. The lab has developed both computational and experimental methods to characterize structural and copy number variation in humans and apes, including segmental duplications as well as pericentromeric transition regions from euchromatin to heterochromatin. As part of this effort, we also study the genetic basis of autism and developmental delay, identifying dozens of copy number variants and hundreds of genes associated with these disorders based on a significant excess of deleterious mutations (including de novo events) in patients when compared to unaffected siblings. We have applied long-read sequencing technology to better characterize hidden structural variation, assemble complex structural variants, including centromeres, and fully phase and assemble human and nonhuman genomes. We hypothesize that this missing variation will contribute significantly to improving our understanding of the genetic basis of human disease and evolution. To that end, we have worked with the Telomere-to-Telomere (T2T) Consortium, led by Adam Phillippy (NHGRI) and Karen Miga (UCSC), to generate the first truly complete human genome, where every chromosome’s sequence is resolved from telomere to telomere.
What is the significance of the findings in this publication?
This publication reports the first complete sequence of a human autosome: chromosome 8. We leveraged the strengths of two long-read sequencing technologies (Pacific Biosciences high-fidelity long-read sequencing and Oxford Nanopore Technologies ultra-long-read sequencing) to resolve five long-standing gaps in chromosome 8: both telomeres, a structurally dynamic segmental duplication on the p-arm (known as β-defensin), the centromere, and a neocentromeric variable number tandem repeat on the q-arm. The result is a whole-chromosome assembly that is 146.3 Mbp long and estimated to be >99.99% accurate.
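To put the accuracy estimate in more familiar terms, the short sketch below converts a per-base accuracy into a Phred-scaled quality value (QV) and an upper bound on the number of erroneous bases. It is only an illustration of the arithmetic, not the method used in the publication to estimate accuracy; the function names are hypothetical, and it assumes errors are spread uniformly across the assembly. With the figures quoted above, it works out to roughly QV 40 and fewer than about 14,630 erroneous bases across the chromosome.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Convert a per-base accuracy (between 0 and 1) to a Phred-scaled QV."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def max_expected_errors(length_bp: int, accuracy: float) -> float:
    """Upper bound on erroneous bases, assuming uniformly distributed errors."""
    return length_bp * (1.0 - accuracy)


# Figures quoted in the text: a 146.3 Mbp assembly, >99.99% accurate.
ASSEMBLY_LENGTH_BP = 146_300_000
MIN_ACCURACY = 0.9999

qv = phred_from_accuracy(MIN_ACCURACY)
errors = max_expected_errors(ASSEMBLY_LENGTH_BP, MIN_ACCURACY)

print(f"QV of at least {qv:.1f}")
print(f"fewer than about {errors:,.0f} erroneous bases")
```

Because the reported accuracy is a lower bound (>99.99%), both numbers are bounds as well: the true QV is at least this high and the true error count is lower.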
This work uncovered the epigenetic, transcriptional, and evolutionary landscape of these previously unresolved regions. In each new region, we determined the methylation status of the sequence and searched for genes missing from earlier assemblies. As a result, we identified 12 novel genes in the β-defensin segmental duplication; they lie in a newly duplicated block of sequence that is missing from the current human reference genome (GRCh38).
Focusing on the chromosome 8 centromere, an approximately 2 Mbp region composed of tandem α-satellite repeats, we found that almost all of the higher-order α-satellite repeats were methylated, except for a small 73 kbp region that was hypomethylated. This hypomethylated region coincided with the presence of nucleosomes containing the centromeric histone, CENP-A, revealing for the very first time the site of centromeric chromatin, which provides the foundation of the kinetochore.
We also reconstructed the evolution of the chromosome 8 centromere over the last 25 million years by sequencing and assembling the orthologous centromeres in chimpanzee, orangutan, and macaque. We found that each primate centromere is composed of four or five major evolutionary layers, with the evolutionarily youngest layer located in the core of the centromere and more ancient and divergent layers located peripherally. We confirmed that α-satellite higher-order repeat structure evolved specifically in the great ape ancestor. Additionally, we estimated the centromere mutation rate to be at least 2.2- to 3.8-fold greater than that of the rest of the genome, indicating that this region is undergoing rapid evolution.
What are the next steps for this research?
There are four areas of development. First, now that we have resolved the first complete sequence of a human autosome, we and others are focused on resolving the remaining chromosomes in the human genome. The T2T Consortium has been working diligently on this effort, and a release of the first truly complete human genome is imminent, including the first sequence resolution of centromeres, acrocentric short arms, and a complete view of segmental duplications. This will serve as the basis for more detailed functional characterization of these complex regions of our genome. Second, we are currently developing methods (Ebert et al., Science, 2021) to extend our efforts to phase and assemble normal human diploid genomes from telomere to telomere and to better understand the full spectrum of human genetic variation as part of collaborative efforts with the Human Genome Structural Variation and Human Pangenome Reference Consortia. Third, we are applying these methods to other nonhuman primate genomes in order to reconstruct the evolutionary history of every base pair of the genome and discover new events important for our species’ adaptation. Finally, we are beginning to investigate whether these methods can uncover missing variation in these gap regions that underlies unexplained genetic causes of autism and developmental delay. We believe that generating haplotype-resolved genome assemblies, in which the maternal and paternal alleles are completely phased, will improve our understanding of the genetic basis of disease.
This work was funded by:
This research was supported, in part, by funding from the National Institutes of Health (NIH), HG002385 and HG010169 (EEE); National Institute of General Medical Sciences (NIGMS), F32 GM134558 (GAL); Intramural Research Program of the National Human Genome Research Institute at NIH (SK, AMP, AR); National Library of Medicine Big Data Training Grant for Genomics and Neuroscience 5T32LM012419-04 (MRV); NIH/NHGRI Pathway to Independence Award K99HG011041 (PH); NIH/NHGRI R21 1R21HG010548-01 and NIH/NHGRI U01 1U01HG010971 (KHM); and the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research, USA (VL). EEE is an investigator of the Howard Hughes Medical Institute.