2023 has been an incredible year for genomics! Thanks to the brilliance and scientific talent of researchers all over the world, major progress was made across the board, from agrigenomics and cancer biology to neuroscience, microbiology, clinical genomics research applications, and more.
As we look forward to all the electrifying new advances that 2024 is sure to hold, we would like to take one last opportunity to look back at a subset of our PacBio-staff-favorite publications from 2023. This list includes a blend of methodological innovations, foundational breakthroughs, and milestone accomplishments that are sure to enable new discoveries in the new year and many more to come.
Multiomics | Cell atlas research | Human genomics | Human pangenomes
As a side note, those of you who have followed PacBio-powered genomics innovations closely will likely be familiar with these publications. But for those of you who haven’t (or in case you missed them when they were released), we hope you find the science contained within these articles illuminating and inspiring.
From all of us here at PacBio, here’s to an incredible year for genomics and an exciting one to come!
Synchronized long-read genome, methylome, epigenome, and transcriptome for resolving a Mendelian condition
Synchronized long-read genome, methylome, epigenome, and transcriptome for resolving a Mendelian condition describes how to generate four high-quality, haplotype-resolved ‘omes (the genome, CpG methylome, chromatin epigenome, and transcriptome) simultaneously from a single Revio SMRT Cell sequencing run. This marvelously innovative approach was the product of a collaboration led by Dr. Andrew Stergachis and Dr. Mitchell Vollger at the University of Washington.
This new capability was achieved by combining three key technological advancements. The first involved leveraging the higher throughput of the PacBio Revio system to generate high-quality long-read HiFi sequence data at scale. The second involved a machine learning approach developed by University of Washington researchers for chromatin Fiber-seq data (Stergachis et al. 2020) to accurately assemble haplotype-resolved, single-molecule chromatin accessibility, nucleosome positioning, and transcription factor occupancy patterns. Third, the team adapted a library protocol from the MAS-Seq method (Al’Khafaji et al. 2023) to concatenate full-length cDNA molecules, increasing the throughput of full-length RNA sequencing. The concatenated cDNA library targeted a fragment size similar to that of standard HiFi WGS libraries, which allowed the team to combine the Fiber-seq-treated gDNA library with the matching cDNA library in a single HiFi sequencing run on the Revio system.
The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity
The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity from ENCODE project researchers represents the fourth phase of the initiative. This exciting accomplishment adds matching Iso-Seq data in a set of 81 unique human and mouse samples, representing “the first large-scale, cross-species survey of transcript structure diversity using full-length cDNA sequencing on long-read platforms.” The results were dramatic, detecting:
- “…a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains.”
- “…more than one predominant transcript across samples for 73.0% of genes, which is in contrast with prior reports,” and that
- “…for a substantial number of genes, transcript structure diversity and major transcript usage for the same gene differs between tissues, samples, and developmental timepoints.”
This incredibly rich resource marks the beginning of a new era in cell atlas research, with the authors noting that it provides “a foundation for further analyses of alternative transcript usage.” In one example, which the researchers described as “the most surprising to us,” they observed “substantial differences in splicing diversity for orthologous genes between human and mouse,” with well over half of all orthologous mouse and human genes showing “substantial differences in mechanism of diversification in matching tissues.” They noted that this calls for caution in the widely used practice of predicting and interpreting human gene functions from mouse models, both “in genomics and the wider biology community,” and “for both basic and preclinical purposes.”
The complete sequence of a human Y chromosome
The complete sequence of a human Y chromosome, from an international research team led by the Telomere-to-Telomere (T2T) Consortium, employed a combination of sequencing technologies that included PacBio HiFi long reads. The potent combination of read length (15-20 kb) and accuracy (Q30+) that defines HiFi sequencing was instrumental in enabling T2T scientists to access missing gDNA and RNA isoform information in the Y’s highly homologous regions. As a result, the team was able to fill the gaps that had previously accounted for almost 50% of the chromosome’s sequence.
This new assembly spans nearly 62.5 million base pairs and is the first complete, gapless human Y chromosome ever generated. Of the 62.5 million bases, 30 million were new to science, and project contributors also identified 41 additional predicted protein-coding genes. Previously unappreciated gene regulatory networks and structural rearrangements on the Y chromosome were also discovered, shedding light on male infertility and related conditions. This newly completed Y chromosome sequence is likely to be a very valuable resource for researchers launching investigations into everything from sexual development and fertility to genealogy, cancer, evolution, and more.
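For readers who like to see the numbers side by side, here is a minimal sketch of the arithmetic behind the "new to science" figure. It uses only the rounded values quoted above (62.5 million and 30 million base pairs), so the result is approximate; the function and variable names are illustrative and not taken from the publication.

```python
def fraction_new(total_bp: float, new_bp: float) -> float:
    """Return the share of an assembly made up of newly added sequence."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return new_bp / total_bp


# Rounded figures as quoted in the text above (assumed, not exact counts).
ASSEMBLY_BP = 62.5e6   # complete Y chromosome assembly, base pairs
NEW_BP = 30e6          # sequence described as new to science, base pairs

share = fraction_new(ASSEMBLY_BP, NEW_BP)
print(f"{share:.0%} of the assembly is newly added sequence")
```

With these rounded inputs the share comes out at roughly 48%, which lines up with the "almost 50%" of the chromosome that had previously been missing.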
Assembly of 43 human Y chromosomes reveals extensive complexity and variation
Assembly of 43 human Y chromosomes reveals extensive complexity and variation from HGSVC scientists presented the results of a remarkably comprehensive pangenomic analysis of Y chromosomes from 43 individuals with diverse genetic backgrounds. The authors employed PacBio HiFi sequencing as the study’s foundation, along with other advanced techniques, to reveal some astounding insights into the characteristics of the Y chromosome at the population level.
Through their analysis, the team found large inversions and distinct mutation rates in male-specific sequences, among other recurrent genetic variation. Most surprisingly, the study revealed that the size of the Y can vary from individual to individual by nearly two-fold, across a range of approximately 45.2 to 84.9 Mbp! Diving deeper into this groundbreaking dataset, researchers were able to trace approximately 183,000 years of human evolution to reveal even more hidden complexity and remarkable differences in the size and structure of the Y chromosome. Interestingly, the HGSVC’s analysis pointed to low levels of base substitution, suggesting that most of the variation housed within the Y chromosome is likely to be structural in nature.
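As a quick check on the "nearly two-fold" description, the size range quoted above can be worked through in a few lines. This is only a sketch based on the two approximate endpoints given in the text (45.2 and 84.9 Mbp); the helper name is hypothetical.

```python
def fold_range(smallest: float, largest: float) -> float:
    """Return how many times larger the largest value is than the smallest."""
    if smallest <= 0:
        raise ValueError("smallest must be positive")
    return largest / smallest


# Approximate Y chromosome sizes quoted in the text, in megabase pairs (Mbp).
SMALLEST_Y_MBP = 45.2
LARGEST_Y_MBP = 84.9

fold = fold_range(SMALLEST_Y_MBP, LARGEST_Y_MBP)
spread = LARGEST_Y_MBP - SMALLEST_Y_MBP
print(f"{fold:.2f}-fold difference, a spread of {spread:.1f} Mbp")
```

The ratio works out to about 1.88, a spread of roughly 39.7 Mbp between the smallest and largest assemblies.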