Explore scientific publications featuring PacBio long-read sequencing data

Biorxiv  |  2023

De novo genome assemblies from two Indigenous Americans from Arizona identify new polymorphisms in non-reference sequences

Çiğdem Köroğlu, Peng Chen, Michael Traurig, Serdar Altok, Clifton Bogardus, Leslie J Baier

There is a collective push to diversify human genetic studies by including underrepresented populations. However, analyzing DNA sequence reads involves the initial step of aligning the reads to the GRCh38/hg38 reference genome, which is inadequate for non-European ancestries. To help address this issue, we created a modified hg38 reference map using de novo sequence assemblies from Indigenous Americans living in Arizona (IAZ). Using HiFi SMRT long-read sequencing technology, we generated de novo genome assemblies for one female and one male IAZ individual. Each assembly included ∼17 Mb of DNA sequence not present in hg38 (non-reference sequence; NRS), which consists mostly of repeat elements. Forty NRSs totaling 240 kb were uniquely anchored to the hg38 primary assembly, generating a modified hg38-NRS reference genome. DNA sequence alignment and variant calling were then conducted with WGS data from 387 IAZ cohorts using both the hg38 and modified hg38-NRS reference maps. Variant calling with the hg38-NRS map identified ∼50,000 single nucleotide variants present in at least 5% of the WGS samples which were not detected with the hg38 reference map. We also directly assessed the NRSs positioned within genes. Seventeen NRSs anchored to regions including an identical 187 bp NRS found in both de novo assemblies. The NRS is located in HCN2 79 bp downstream of exon 3 and contains several putative transcriptional regulatory elements. Genotyping of the HCN2-NRS revealed that the insertion is enriched in IAZ (MAF = 0.45) compared to Caucasians (MAF = 0.15) and African Americans (MAF = 0.03). This study shows that inclusion of population-specific NRSs can dramatically change the variant profile in underrepresented ethnic groups and thereby lead to the discovery of previously missed common variations.
Biorxiv  |  2023

Characterization of Alternative Splicing During Mammalian Brain Development Reveals the Magnitude of Isoform Diversity and its Effects on Protein Conformational Changes

Leila Haj Abdullah Alieh, Beatriz Cardoso de Toledo, Anna Hadarovich, Agnes Toth-Petroczy, Federico Calegari

Regulation of gene expression is critical for fate commitment of stem and progenitor cells during tissue formation. In the context of mammalian brain development, a plethora of studies have described how changes in the expression of individual genes characterize cell types across ontogeny and phylogeny. However, little attention has been paid to the fact that different transcripts can arise from any given gene through alternative splicing (AS). Although AS is considered a key mechanism expanding transcriptome diversity during evolution, assessing its full potential on isoform diversity and protein function has been notoriously difficult. Here we capitalize on the use of a validated reporter mouse line to isolate neural stem cells, neurogenic progenitors and neurons during corticogenesis and combine the use of short- and long-read sequencing to reconstruct the full transcriptome diversity characterizing neurogenic commitment. Extending available transcriptional profiles of the mammalian brain by nearly 50,000 new isoforms, we found that neurogenic commitment is characterized by a progressive increase in exon inclusion resulting in the profound remodeling of the transcriptional profile of specific cortical cell types. Most importantly, we computationally infer the biological significance of AS on protein structure by using AlphaFold2 and revealing how radical protein conformational changes can arise from subtle changes in isoform sequence. Together, our study reveals that AS has a greater potential to impact protein diversity and function than previously thought, independently from changes in gene expression.
Biorxiv  |  2023

The single-molecule accessibility landscape of newly replicated mammalian chromatin

Megan S Ostrowski, Marty G Yang, Colin P McNally, Nour J Abdulhay, Simai Wang, Elphège P Nora, Hani Goodarzi, Vijay Ramani

The higher-order structure of newly replicated (i.e. ‘nascent’) chromatin fibers remains poorly-resolved, limiting our understanding of how epigenomes are maintained across cell divisions. To address this, we present Replication-Aware Single-molecule Accessibility Mapping (RASAM), a long-read sequencing method that nondestructively measures genome-wide replication-status and protein-DNA interactions simultaneously on intact chromatin templates. We report that individual human and mouse nascent chromatin fibers are ‘hyperaccessible’ compared to steady-state chromatin. This hyperaccessibility occurs at two, coupled length-scales: first, individual nucleosome core particles on nascent DNA exist as a mixture of partially-unwrapped nucleosomes and other subnucleosomal species; second, newly-replicated chromatin fibers are significantly enriched for irregularly-spaced nucleosomes on individual DNA molecules. Focusing on specific cis-regulatory elements (e.g. transcription factor binding sites; active transcription start sites [TSSs]), we discover unique modes by which nascent chromatin hyperaccessibility is resolved at the single-molecule level: at CCCTC-binding factor (CTCF) binding sites, CTCF and nascent nucleosomes compete for motifs on nascent chromatin fibers, resulting in quantitatively-reduced CTCF occupancy and motif accessibility post-replication; at active TSSs, high levels of steady-state chromatin accessibility are preserved, implying that nucleosome free regions (NFRs) are rapidly re-established behind the fork. Our study introduces a new paradigm for studying higher-order chromatin fiber organization behind the replication fork. More broadly, we uncover a unique organization of newly replicated chromatin that must be reset by active processes, providing a substrate for epigenetic reprogramming.
Biorxiv  |  2023

CLN3 transcript complexity revealed by long-read RNA sequencing analysis

Hao-Yu Zhang, Christopher Minnis, Emil Gustavsson, Mina Ryten, Sara E Mole

Background: Batten disease is a group of rare inherited neurodegenerative diseases. Juvenile CLN3 disease is the most prevalent type, and the most common mutation shared by most patients is the “1-kb” deletion which removes two internal coding exons (7 and 8) in CLN3. Previously, we identified two transcripts in patient fibroblasts homozygous for the “1-kb” deletion: the “major” and “minor” transcripts. To understand the full variety of disease transcripts and their role in disease pathogenesis, it is necessary to first investigate CLN3 transcription in “healthy” samples without juvenile CLN3 disease. Methods: We leveraged PacBio long-read RNA sequencing datasets from ENCODE to investigate the full range of CLN3 transcripts across various tissues and cell types in human control samples. Then we sought to validate their existence using data from different sources.
Biorxiv  |  2023

Multi-omic profiling of pathogen-stimulated primary immune cells

Renee Salz, Emil E. Vorsteveld, Caspar I. van der Made, Simone Kersten, Merel Stemerdink, Tsung-han Hsieh, Musa Mhlanga, Mihai G. Netea, Pieter-Jan Volders, Alexander Hoischen, Peter A.C. ’t Hoen

Objectives: To perform long-read transcriptome and proteome profiling of pathogen-stimulated peripheral blood mononuclear cells (PBMCs) from healthy donors. We aim to discover new transcripts and protein isoforms expressed during immune responses to diverse pathogens. Methods: PBMCs were exposed to four microbial stimuli for 24 hours: the TLR4 ligand lipopolysaccharide (LPS), the TLR3 ligand Poly(I:C), heat-inactivated Staphylococcus aureus, and Candida albicans, with RPMI medium as a negative control. Long-read sequencing (PacBio) of one donor and secretome proteomics and short-read sequencing of five donors were performed. IsoQuant was used for transcriptome construction, Metamorpheus/FlashLFQ for proteome analysis, and Illumina short-read 3’-end mRNA sequencing for transcript quantification. Results: Long-read transcriptome profiling reveals the expression of novel sequences and isoform switching induced upon pathogen stimulation, including transcripts that are difficult to detect using traditional short-read sequencing. We observe widespread loss of intron retention as a common result of all pathogen stimulations. We highlight novel transcripts of NFKB1 and CASP1 that may indicate novel immunological mechanisms. In general, RNA expression differences did not result in differences in the amounts of secreted proteins. Interindividual differences in the proteome were larger than the differences between stimulated and unstimulated PBMCs. Clustering analysis of secreted proteins revealed a correlation between chemokine (receptor) expression on the RNA and protein levels in C. albicans- and Poly(I:C)-stimulated PBMCs. Conclusion: Isoform-aware long-read sequencing of pathogen-stimulated immune cells highlights the potential of these methods to identify novel transcripts, revealing a more complex transcriptome landscape than previously appreciated.
Journal of Maternal-Fetal & Neonatal Medicine  |  2023

Identification of a novel 91.5 kb-deletion (αα)FJ in the α-globin gene cluster using single-molecule real-time (SMRT) sequencing

Liangpu Xu, Meihuan Chen, Junhao Zheng, Siwen Zhang, Min Zhang, Lingji Chen, et al

Objectives: To present a novel 91.5-kb deletion of the α-globin gene cluster (αα)FJ identified by genetic assay and prenatal diagnosis in a Chinese family. Subjects and Methods: The proband was a 34-year-old G3P1 (Gravida 3, Para 1) female at the gestational age of 21+ weeks with a history of an edematous fetus. A routine genetic assay (reverse dot blot hybridization, RDB) was performed to detect common thalassemia mutations. Multiplex ligation-dependent probe amplification (MLPA) and single-molecule real-time technology (SMRT) were used to detect rare thalassemia mutations. Results: The hematological phenotypes of the proband, her mother, elder sister, husband, daughter, and nephew were consistent with the phenotype of α-thalassemia trait. No mutations were found in these family members by RDB, except for the proband’s husband, who carried an α-globin gene deletion --SEA/αα. MLPA results showed that the proband and other α-thalassemia-suspected relatives had heterozygous deletions around the POLR3K-3-463nt, HS40-178nt, and HBA-HS40-382nt probes. The 5′-breakpoint was out of probe scope and could not be determined. SMRT was performed and a 91.5-kb deletion (NC_000016.10: g.39268_130758del) in the α-globin gene cluster (αα)FJ was identified in the proband and other suspected relatives, which could explain their phenotypes. At the proband’s gestational age of 22+ weeks, an amniotic fluid sample was collected and analyzed. As only the 91.5-kb deletion (αα)FJ was identified in the fetus with RDB, MLPA, and SMRT, the proband was advised to continue the pregnancy. Conclusion: We report for the first time a 91.5-kb deletion (NC_000016.10: g.39268_130758del) of the HS-40 region in the α-globin gene cluster (αα)FJ identified in a Chinese family. Since the HS-40 loss of heterozygosity in combination with the heterozygous deletion --SEA might result in Hb Bart’s hydrops fetalis, routine genetic assay and SMRT are recommended for prenatal diagnosis in individuals at risk.
Biorxiv  |  2023

The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars

Jarkko Salojärvi, Aditi Rambani, Zhe Yu, Romain Guyot, Susan Strickler, Maud Lepelley, Cui Wang, et al (see full list in article)

Coffea arabica, an allotetraploid hybrid of C. eugenioides and C. canephora, is the source of approximately 60% of coffee products worldwide. Cultivated accessions have undergone several population bottlenecks resulting in low genetic diversity. We present chromosome-level assemblies of a di-haploid C. arabica accession and modern representatives of its diploid progenitors, C. eugenioides and C. canephora. The three species exhibit largely conserved genome structures between diploid parents and descendant subgenomes, which show a mosaic pattern of dominance, similar to other polyploid crop species. Resequencing of 39 wild and cultivated accessions suggests a founding polyploidy event ∼610,000 years ago, followed by several subsequent bottlenecks, including a population split ∼30.5 kya and a period of migration between Arabica populations until ∼8.9 kya. Analysis of lines historically introgressed with C. canephora highlights loci that may contribute to their superior pathogen resistance and lay the groundwork for future genomics-based breeding of C. arabica.
Nature  |  2023

Assembly of 43 human Y chromosomes reveals extensive complexity and variation

Pille Hallast, Peter Ebert, Mark Loftus, Feyza Yilmaz, Peter A. Audano, Glennis A. Logsdon, Marc Jan Bonder, Weichen Zhou, Wolfram Höps, Kwondo Kim, Chong Li, Savannah J. Hoyt, Philip C. Dishuck, et al

The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.
Nature  |  2023

The complete sequence of a human Y chromosome

Arang Rhie, Sergey Nurk, Monika Cechova, Savannah J. Hoyt, Dylan J. Taylor, Nicolas Altemose, Paul W. Hook, Sergey Koren, Mikko Rautiainen, Ivan A. Alexandrov, Jamie Allen, Mobin Asri, Andrey V. Bzikadze, et al

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
AJHG  |  2023

Beyond the exome: What’s next in diagnostic testing for Mendelian conditions

Lea M. Starita, Michael Talkowski, Stephen B. Montgomery, Michael J. Bamshad, Jessica X. Chong, Matthew T. Wheeler, Seth I. Berger, Anne O'Donnell-Luria, Fritz J. Sedlazeck, Danny E. Miller, et al

Despite advances in clinical genetic testing, including the introduction of exome sequencing (ES), more than 50% of individuals with a suspected Mendelian condition lack a precise molecular diagnosis. Clinical evaluation is increasingly undertaken by specialists outside of clinical genetics, often occurring in a tiered fashion and typically ending after ES. The current diagnostic rate reflects multiple factors, including technical limitations, incomplete understanding of variant pathogenicity, missing genotype-phenotype associations, complex gene-environment interactions, and reporting differences between clinical labs. Maintaining a clear understanding of the rapidly evolving landscape of diagnostic tests beyond ES, and their limitations, presents a challenge for non-genetics professionals. Newer tests, such as short-read genome or RNA sequencing, can be challenging to order, and emerging technologies, such as optical genome mapping and long-read DNA sequencing, are not available clinically. Furthermore, there is no clear guidance on the next best steps after inconclusive evaluation. Here, we review why a clinical genetic evaluation may be negative, discuss questions to be asked in this setting, and provide a framework for further investigation, including the advantages and disadvantages of new approaches that are nascent in the clinical sphere. We present a guide for the next best steps after inconclusive molecular testing based upon phenotype and prior evaluation, including when to consider referral to research consortia focused on elucidating the underlying cause of rare unsolved genetic disorders.
Genome Medicine  |  2023

Applications of long-read sequencing to Mendelian genetics

Francesco Kumara Mastrorosa, Danny E. Miller, Evan E. Eichler

Advances in clinical genetic testing, including the introduction of exome sequencing, have uncovered the molecular etiology for many rare and previously unsolved genetic disorders, yet more than half of individuals with a suspected genetic disorder remain unsolved after complete clinical evaluation. A precise genetic diagnosis may guide clinical treatment plans, allow families to make informed care decisions, and permit individuals to participate in N-of-1 trials; thus, there is high interest in developing new tools and techniques to increase the solve rate. Long-read sequencing (LRS) is a promising technology for both increasing the solve rate and decreasing the amount of time required to make a precise genetic diagnosis. Here, we summarize current LRS technologies, give examples of how they have been used to evaluate complex genetic variation and identify missing variants, and discuss future clinical applications of LRS. As costs continue to decrease, LRS will find additional utility in the clinical space fundamentally changing how pathological variants are discovered and eventually acting as a single-data source that can be interrogated multiple times for clinical service.
The Journal of Immunology  |  2023

FLAIRR-Seq: A Method for Single-Molecule Resolution of Near Full-Length Antibody H Chain Repertoires

Easton E. Ford, David Tieri, Oscar L. Rodriguez, Nancy J. Francoeur, Juan Soto, Justin T. Kos, Ayelet Peres, William S. Gibson, Catherine A. Silver, Gintaras Deikus, Elizabeth Hudson, Cassandra R. Woolley, Noam Beckmann, Alexander Charney, Thomas C. Mitchell, Gur Yaari, Robert P. Sebra, Corey T. Watson, Melissa L. Smith

Current Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) using short-read sequencing strategies resolves expressed Ab transcripts with limited resolution of the C region. In this article, we present the near-full-length AIRR-seq (FLAIRR-seq) method that uses targeted amplification by 5′ RACE, combined with single-molecule, real-time sequencing to generate highly accurate (99.99%) human Ab H chain transcripts. FLAIRR-seq was benchmarked by comparing H chain V (IGHV), D (IGHD), and J (IGHJ) gene usage, complementarity-determining region 3 length, and somatic hypermutation to matched datasets generated with standard 5′ RACE AIRR-seq using short-read sequencing and full-length isoform sequencing. Together, these data demonstrate robust FLAIRR-seq performance using RNA samples derived from PBMCs, purified B cells, and whole blood, which recapitulated results generated by commonly used methods, while additionally resolving H chain gene features not documented in IMGT at the time of submission. FLAIRR-seq data provide, for the first time, to our knowledge, simultaneous single-molecule characterization of IGHV, IGHD, IGHJ, and IGHC region genes and alleles, allele-resolved subisotype definition, and high-resolution identification of class switch recombination within a clonal lineage. In conjunction with genomic sequencing and genotyping of IGHC genes, FLAIRR-seq of the IgM and IgG repertoires from 10 individuals resulted in the identification of 32 unique IGHC alleles, 28 (87%) of which were previously uncharacterized. Together, these data demonstrate the capabilities of FLAIRR-seq to characterize IGHV, IGHD, IGHJ, and IGHC gene diversity for the most comprehensive view of bulk-expressed Ab repertoires to date.
Biorxiv  |  2023

Utility of long-read sequencing for All of Us

M. Mahmoud, Y. Huang, K. Garimella, P. A. Audano, W. Wan, N. Prasad, R. E. Handsaker, S. Hall, A. Pionzio, M. C. Schatz, M. E. Talkowski, E. E. Eichler, S. E. Levy, F. J. Sedlazeck

The All of Us (AoU) initiative aims to sequence the genomes of over one million Americans from diverse ethnic backgrounds to improve personalized medical care. In a recent technical pilot, we compared the performance of traditional short-read sequencing with long-read sequencing in a small cohort of samples from the HapMap project and two AoU control samples representing eight datasets. Our analysis revealed substantial differences in the ability of these technologies to accurately sequence complex medically relevant genes, particularly in terms of gene coverage and pathogenic variant identification. We also considered the advantages and challenges of using low coverage sequencing to increase sample numbers in large cohort analysis. Our results show that HiFi reads produced the most accurate results for both small and large variants. Further, we present a cloud-based pipeline to optimize SNV, indel and SV calling at scale for long-reads analysis. These results will lead to widespread improvements across AoU.
Nature Communications  |  2022

Transposable element-mediated rearrangements are prevalent in human genomes

Parithi Balachandran, Isha A. Walawalkar, Jacob I. Flores, Jacob N. Dayton, Peter A. Audano, Christine R. Beck

Transposable elements constitute about half of human genomes, and their role in generating human variation through retrotransposition is broadly studied and appreciated. Structural variants mediated by transposons, which we call transposable element-mediated rearrangements (TEMRs), are less well studied, and the mechanisms leading to their formation as well as their broader impact on human diversity are poorly understood. Here, we identify 493 unique TEMRs across the genomes of three individuals. While homology directed repair is the dominant driver of TEMRs, our sequence-resolved TEMR resource allows us to identify complex inversion breakpoints, triplications or other high copy number polymorphisms, and additional complexities. TEMRs are enriched in genic loci and can create potentially important risk alleles such as a deletion in TRIM65, a known cancer biomarker and therapeutic target. These findings expand our understanding of this important class of structural variation, the mechanisms responsible for their formation, and establish them as an important driver of human diversity.
Nature Communications  |  2022

HiFi metagenomic sequencing enables assembly of accurate and complete genomes from human gut microbiota

Chan Yeong Kim, Junyeong Ma, Insuk Lee

Keywords: Metagenomics, Microbiology, Microbiome, Microbial community, HiFi, complete MAGs, circular contigs

Advances in metagenomic assembly have led to the discovery of genomes belonging to uncultured microorganisms. Metagenome-assembled genomes (MAGs) often suffer from fragmentation and chimerism. Recently, 20 complete MAGs (cMAGs) have been assembled from Oxford Nanopore long-read sequencing of 13 human fecal samples, but with low nucleotide accuracy. Here, we report 102 cMAGs obtained by Pacific Biosciences (PacBio) high-accuracy long-read (HiFi) metagenomic sequencing of five human fecal samples, whose initial circular contigs were selected for complete prokaryotic genomes using our bioinformatics workflow. Nucleotide accuracy of the final cMAGs was as high as that of Illumina sequencing. The cMAGs could exceed 6 Mbp and included complete genomes of diverse taxa, including entirely uncultured RF39 and TANB77 orders. Moreover, cMAGs revealed that regions hard to assemble by short-read sequencing comprised mostly genomic islands and rRNAs. HiFi metagenomic sequencing will facilitate cataloging accurate and complete genomes from complex microbial communities, including uncultured species.