
Explore scientific publications featuring PacBio long-read sequencing data

Nature Methods  |  2022

Metagenome assembly of high-fidelity long reads with hifiasm-meta

Feng, Xiaowen and Cheng, Haoyu and Portik, Daniel and Li, Heng

Metagenomics, Microbiology, Microbiome, Microbial community, Metagenome, MAGs, assembly

De novo assembly of metagenome samples is a common approach to the study of microbial communities. Current metagenome assemblers developed for short sequence reads or noisy long reads were not optimized for accurate long reads. We thus developed hifiasm-meta, a metagenome assembler that exploits the high accuracy of recent data. Evaluated on seven empirical datasets, hifiasm-meta reconstructed tens to hundreds of complete circular bacterial genomes per dataset, consistently outperforming other metagenome assemblers.
Microbial Genomics  |  2022

Finding the right fit: Evaluation of short-read and long-read sequencing approaches to maximize the utility of clinical microbiome data

Gehrig, Jeanette L and Portik, Daniel M and Driscoll, Mark D and Jackson, Eric and Chakraborty, Shreyasee and Gratalo, Dawn and Ashby, Meredith and Valladares, Ricardo

Microbiome, Microbial community, 16S rRNA, live biotherapeutic product, shotgun metagenomics, MAGs

A long-standing challenge in human microbiome research is achieving the taxonomic and functional resolution needed to generate testable hypotheses about the gut microbiota's impact on health and disease. With a growing number of live microbial interventions in clinical development, this challenge is renewed by a need to understand the pharmacokinetics and pharmacodynamics of therapeutic candidates. While short-read sequencing of the bacterial 16S rRNA gene has been the standard for microbiota profiling, recent improvements in the fidelity of long-read sequencing underscore the need for a re-evaluation of the value of distinct microbiome-sequencing approaches. We leveraged samples from participants enrolled in a phase 1b clinical trial of a novel live biotherapeutic product to perform a comparative analysis of short-read and long-read amplicon and metagenomic sequencing approaches to assess their utility for generating clinical microbiome data. Across all methods, overall community taxonomic profiles were comparable and relationships between samples were conserved. Comparison of ubiquitous short-read 16S rRNA amplicon profiling to long-read profiling of the 16S-ITS-23S rRNA amplicon showed that only the latter provided strain-level community resolution and insight into novel taxa. All methods identified an active ingredient strain in treated study participants, though detection confidence was higher for long-read methods. Read coverage from both metagenomic methods provided evidence of active-ingredient strain replication in some treated participants. Compared to short-read metagenomics, approximately twice the proportion of long reads were assigned functional annotations. Finally, compositionally similar bacterial metagenome-assembled genomes (MAGs) were recovered from short-read and long-read metagenomic methods, although a greater number and more complete MAGs were recovered from long reads.
Despite higher costs, both amplicon and metagenomic long-read approaches yielded added microbiome data value in the form of higher confidence taxonomic and functional resolution and improved recovery of microbial genomes compared to traditional short-read methodologies.
AJHG  |  2022

Familial Long-Read Sequencing Increases Yield of De Novo Mutations

Noyes, Michelle D. and Harvey, William T. and Porubsky, David and Sulovari, Arvis and Li, Ruiyang and Rose, Nicholas R. and Audano, Peter A. and Munson, Katherine M. and Lewis, Alexandra P. and Hoekzema, Kendra and Mantere, Tuomo and Graves-Lindsay, Tina A. and Sanders, Ashley D. and Goodwin, Sara and Kramer, Melissa and Mokrab, Younes and Zody, Michael C. and Hoischen, Alexander and Korbel, Jan O. and McCombie, W. Richard and Eichler, Evan E.

Studies of de novo mutation (DNM) have typically excluded some of the most repetitive and complex regions of the genome because these regions cannot be unambiguously mapped with short-read sequencing data. To better understand the genome-wide pattern of DNM, we generated long-read sequence data from an autism parent-child quad with an affected female where no pathogenic variant had been discovered in short-read Illumina sequence data. We deeply sequenced all four individuals by using three sequencing platforms (Illumina, Oxford Nanopore, and Pacific Biosciences) and three complementary technologies (Strand-seq, optical mapping, and 10X Genomics). Using long-read sequencing, we initially discovered and validated 171 DNMs across two children, a 20% increase in the number of de novo single-nucleotide variants (SNVs) and indels when compared to short-read callsets. The number of DNMs further increased by 5% when considering a more complete human reference (T2T-CHM13) because of the recovery of events in regions absent from GRCh38 (e.g., three DNMs in heterochromatic satellites). In total, we validated 195 de novo germline mutations and 23 potential post-zygotic mosaic mutations across both children; the overall true substitution rate based on this integrated callset is at least 1.41 × 10⁻⁸ substitutions per nucleotide per generation. We also identified six de novo insertions and deletions in tandem repeats, two of which represent structural variants. We demonstrate that long-read sequencing and assembly, especially when combined with a more complete reference genome, increases the number of DNMs by >25% compared to previous studies, providing a more complete catalog of DNM compared to short-read data alone.
MDPI  |  2022

Complete Genome Sequence of Herpes Simplex Virus 2 Strain G

Chang, Weizhong and Jiao, Xiaoli and Sui, Hongyan and Goswami, Suranjana and Sherman, Brad T. and Fromont, Caroline and Caravaca, Juan Manuel and Tran, Bao and Imamichi, Tomozumi

Herpes simplex virus type 2 (HSV-2) is a common causative agent of genital tract infections. Moreover, HSV-2 and HIV infection can mutually increase the risk of acquiring another virus infection. Due to the high GC content and highly repetitive regions in HSV-2 genomes, only the genomes of four strains have been completely sequenced (HG52, 333, SD90e, and MS). Strain G is commonly used for HSV-2 research, but only a partial genome sequence has been assembled with Illumina sequencing reads. In the current study, we de novo assembled and annotated the complete genome of strain G using PacBio long sequencing reads, which can span the repetitive regions, analyzed the ‘α’ sequence, which plays key roles in HSV-2 genome circulation, replication, cleavage, and packaging of progeny viral DNA, identified the packaging signals homologous to HSV-1 within the ‘α’ sequence, and determined both termini of the linear genome and cleavage site for the process of concatemeric HSV-2 DNA produced via rolling-circle replication. In addition, using Oxford Nanopore Technology sequencing reads, we visualized four HSV-2 genome isomers at the nucleotide level for the first time. Furthermore, the coding sequences of HSV-2 strain G have been compared with those of HG52, 333, and MS. Moreover, phylogenetic analysis of strain G and other diverse HSV-2 strains has been conducted to determine their evolutionary relationship. The results will aid clinical research and treatment development of HSV-2.
Nature Biotechnology  |  2022

Curated variation benchmarks for challenging medically relevant autosomal genes

Wagner, Justin and Olson, Nathan D. and Harris, Lindsay and McDaniel, Jennifer and Cheng, Haoyu and Fungtammasan, Arkarachai and Hwang, Yih-Chii and Gupta, Richa and Wenger, Aaron M. and Rowell, William J. and Khan, Ziad M. and Farek, Jesse and Zhu, Yiming and Pisupati, Aishwarya and Mahmoud, Medhat and Xiao, Chunlin and Yoo, Byunggil and Sahraeian, Sayed Mohammad Ebrahim and Miller, Danny E. and Jáspez, David and Lorenzo-Salazar, José M. and Muñoz-Barrera, Adrián and Rubio-Rodríguez, Luis A. and Flores, Carlos and Narzisi, Giuseppe and Evani, Uday Shanker and Clarke, Wayne E. and Lee, Joyce and Mason, Christopher E. and Lincoln, Stephen E. and Miga, Karen H. and Ebbert, Mark T. W. and Shumate, Alaina and Li, Heng and Chin, Chen-Shan and Zook, Justin M. and Sedlazeck, Fritz J.

The repetitive nature and complexity of some medically relevant genes poses a challenge for their accurate analysis in a clinical setting. The Genome in a Bottle Consortium has provided variant benchmark sets, but these exclude nearly 400 medically relevant genes due to their repetitiveness or polymorphic complexity. Here, we characterize 273 of these 395 challenging autosomal genes using a haplotype-resolved whole-genome assembly. This curated benchmark reports over 17,000 single-nucleotide variations, 3,600 insertions and deletions and 200 structural variations each for human genome reference GRCh37 and GRCh38 across HG002. We show that false duplications in either GRCh37 or GRCh38 result in reference-specific, missed variants for short- and long-read technologies in medically relevant genes, including CBS, CRYAA and KCNE1. When masking these false duplications, variant recall can improve from 8% to 100%. Forming benchmarks from a haplotype-resolved whole-genome assembly may become a prototype for future benchmarks covering the whole genome.
Nature Biotechnology  |  2022

Generating lineage-resolved, complete metagenome-assembled genomes from complex microbial communities

Bickhart, Derek M. and Kolmogorov, Mikhail and Tseng, Elizabeth and Portik, Daniel M. and Korobeynikov, Anton and Tolstoganov, Ivan and Uritskiy, Gherman and Liachko, Ivan and Sullivan, Shawn T. and Shin, Sung Bong and Zorea, Alvah and Andreu, Victoria Pascal and Panke-Buisse, Kevin and Medema, Marnix H. and Mizrahi, Itzhak and Pevzner, Pavel A. and Smith, Timothy P.

Metagenomics, Microbiology, Microbiome

Microbial communities might include distinct lineages of closely related organisms that complicate metagenomic assembly and prevent the generation of complete metagenome-assembled genomes (MAGs). Here we show that deep sequencing using long (HiFi) reads combined with Hi-C binning can address this challenge even for complex microbial communities. Using existing methods, we sequenced the sheep fecal metagenome and identified 428 MAGs with more than 90% completeness, including 44 MAGs in single circular contigs. To resolve closely related strains (lineages), we developed MAGPhase, which separates lineages of related organisms by discriminating variant haplotypes across hundreds of kilobases of genomic sequence. MAGPhase identified 220 lineage-resolved MAGs in our dataset. The ability to resolve closely related microbes in complex microbial communities improves the identification of biosynthetic gene clusters and the precision of assigning mobile genetic elements to host genomes. We identified 1,400 complete and 350 partial biosynthetic gene clusters, most of which are novel, as well as 424 (298) potential host–viral (host–plasmid) associations using Hi-C data.
medRxiv  |  2021

Germline mosaicism of a missense variant in KCNC2 in a multiplex family with autism and epilepsy

Mehinovic, Elvisa and Gray, Teddi and Campbell, Meghan and Ekholm, Jenny and Wenger, Aaron and Rowell, William and Grudo, Ari and Grimwood, Jane and Korlach, Jonas and Gurnett, Christina and Constantino, John N. and Turner, Tychele N.

Variant detection, Human, Autism, epilepsy, germline, mosaicism, KCNC2, de novo missense variant

Currently, protein-coding de novo variants and large copy number variants have been identified as important for ∼30% of individuals with autism. One approach to identify relevant variation in individuals who lack these types of events is by utilizing newer genomic technologies. In this study, highly accurate PacBio HiFi long-read sequencing was applied to a family with autism, treatment-refractory epilepsy, cognitive impairment, and mild dysmorphic features (two affected female full siblings, parents, and one unaffected sibling) with no known clinical variant. From our long-read sequencing data, a de novo missense variant in the KCNC2 gene (encodes Kv3.2 protein) was identified in both affected children. This variant was phased to the paternal chromosome of origin and is likely a germline mosaic. In silico assessment of the variant revealed it was in the top 0.05% of all conserved bases in the genome, and was predicted damaging by Polyphen2, MutationTaster, and SIFT. It was not present in any controls from public genome databases nor in a joint-call set we generated across 49 individuals with publicly available PacBio HiFi data. This specific missense mutation (Val473Ala) has been shown in both an ortholog and paralog of Kv3.2 to accelerate current decay, shift the voltage dependence of activation, and prevent the channel from entering a long-lasting open state. Seven additional missense mutations have been identified in other individuals with neurodevelopmental disorders (p = 1.03 × 10⁻⁵). KCNC2 is most highly expressed in the brain; in particular, in the thalamus and is enriched in GABAergic neurons. Long-read sequencing was useful in discovering the relevant variant in this family with autism that had remained a mystery for several years and will potentially have great benefits in the clinic once it is widely available.
Wiley  |  2021

Long-read whole genome sequencing reveals HOXD13 alterations in synpolydactyly

Melas, Marilena and Kautto, Esko A. and Franklin, Samuel J. and Mori, Mari and McBride, Kim L. and Mosher, Theresa Mihalic and Pfau, Ruthann B. and Hernandez-Gonzalez, Maria Elena and McGrath, Sean D. and Magrini, Vincent J. and White, Peter and Samora, Julie Balch and Koboldt, Daniel C. and Wilson, Richard K.

Whole genome sequencing, Rare disease, HOXD13, long-read, synpolydactyly 1, SDTY2

Synpolydactyly 1 (SPD; MIM# 186000), also called syndactyly type II (SDTY2), is a genetic limb malformation characterized by polydactyly with syndactyly involving the webbing of the third and fourth fingers, and the fourth and fifth toes. It is caused by heterozygous alterations in HOXD13 with incomplete penetrance and phenotypic variability. In our study, a five-generation family with an SPD phenotype was enrolled in our Rare Disease Genomics Protocol. A comprehensive examination of three generations using Illumina short-read whole-genome sequencing (WGS) did not identify any causative variants. Subsequent WGS using Pacific Biosciences (PacBio) long-read HiFi Circular Consensus Sequencing (CCS) revealed a heterozygous 27-bp duplication in the polyalanine tract of HOXD13. Sanger sequencing of all available family members confirmed that the variant segregates with affected individuals. Re-analysis of an unrelated family with a similar SPD phenotype uncovered a 21-bp (7-alanine) duplication in the same region of HOXD13. Although ExpansionHunter identified these events in most individuals in a retrospective analysis, low sequence coverage due to high GC content in the HOXD13 polyalanine tract makes detection of these events challenging. Our findings highlight the value of long-read WGS in elucidating the molecular etiology of congenital limb malformation disorders.
Microbiology Spectrum  |  2021

Greenhead (Tabanus nigrovittatus) Wolbachia and Its Microbiome: A Preliminary Study

Lefoulon, Emilie and Truchon, Alex and Clark, Travis and Long, Courtney and Frey, Daniel and Slatko, Barton E.

Metagenomics, Microbiology, Microbiome, 16S, Amplicon, Bacteria, Full-length 16S, greenhead, rRNA, Wolbachia

Endosymbiotic Wolbachia bacteria are known to influence the host physiology, microbiota composition, and dissemination of pathogens. We surveyed a population of Tabanus nigrovittatus, commonly referred to as "greenheads," from Crane Beach (Ipswich, MA, USA) for the presence of the alphaproteobacterial symbiont Wolbachia. We studied the COI (mitochondrial cytochrome oxidase) marker gene to evaluate the phylogenetic diversity of the studied specimens. The DNA sequences show strong similarity (between 99.9 and 98%) among the collected specimens but lower similarity to closely related entries in the NCBI database (only between 96.3 and 94.7%), suggesting a more distant relatedness. Low levels of Wolbachia presence necessitated a nested PCR approach, and using 5 markers (ftsZ, fbpA, dnaA, coxA, and gatB), we determined that two recognized "supergroups" of Wolbachia species were represented in the studied specimens, members of clades A and B. Using next-generation sequencing, we also surveyed the insect gut microbiomes of a subset of flies, using Illumina and PacBio 16S rRNA gene sequencing with barcoded primers. The composition of Proteobacteria also varied from fly to fly, with components belonging to Gammaproteobacteria making up the largest percentage of organisms (30 to 70%) among the microbiome samples. Most of the samples showed the presence of Spiroplasma, a member of the phylum Mollicutes, although the frequency of its presence was variable, ranging from 2 to 57%. Another noteworthy bacterial phylum consistently identified was Firmicutes, though the read abundances were typically below 10%. Of interest is an association between Wolbachia presence and higher Alphaproteobacteria representation in the microbiomes, suggesting that the presence of Wolbachia affects the host microbiome. IMPORTANCE Tabanus nigrovittatus greenhead populations contain two supergroups of Wolbachia endosymbionts, members of supergroups A and B. 
Analysis of the greenhead microbiome using next-generation sequencing revealed that the majority of bacterial species detected belonged to Gammaproteobacteria, with most of the samples also showing the presence of Spiroplasma, a member of the Mollicutes phylum also known to infect insects. An association between Wolbachia presence and higher Alphaproteobacteria representation in the microbiomes suggests that Wolbachia presence affects the host microbiome composition.
PLOS Pathogens  |  2021

High-throughput, single-copy sequencing reveals SARS-CoV-2 spike variants coincident with mounting humoral immunity during acute COVID-19

Ko, Sung Hee and Mokhtari, Elham Bayat and Mudvari, Prakriti and Stein, Sydney and Stringham, Christopher D. and Wagner, Danielle and Ramelli, Sabrina and Ramos-Benitez, Marcos J. and Strich, Jeffrey R. and Davey Jr., Richard T. and Zhou, Tongquing and Misasi, John and Kwong, Peter D. and Chertow, Daniel S. and Sullivan, Nancy J. and Boritz, Eli A.

coronavirus, COVID-19, HiFiViral, pathogen surveillance, respiratory virus, SARS-CoV-2, Virus

Tracking evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) within infected individuals will help elucidate coronavirus disease 2019 (COVID-19) pathogenesis and inform use of antiviral interventions. In this study, we developed an approach for sequencing the region encoding the SARS-CoV-2 virion surface proteins from large numbers of individual virus RNA genomes per sample. We applied this approach to the WA-1 reference clinical isolate of SARS-CoV-2 passaged in vitro and to upper respiratory samples from 7 study participants with COVID-19. SARS-CoV-2 genomes from cell culture were diverse, including 18 haplotypes with non-synonymous mutations clustered in the spike NH2-terminal domain (NTD) and furin cleavage site regions. By contrast, cross-sectional analysis of samples from participants with COVID-19 showed fewer virus variants, without structural clustering of mutations. However, longitudinal analysis in one individual revealed 4 virus haplotypes bearing 3 independent mutations in a spike NTD epitope targeted by autologous antibodies. These mutations arose coincident with a 6.2-fold rise in serum binding to spike and a transient increase in virus burden. We conclude that SARS-CoV-2 exhibits a capacity for rapid genetic adaptation that becomes detectable in vivo with the onset of humoral immunity, with the potential to contribute to delayed virologic clearance in the acute setting.
PNAS  |  2021

Rescue of codon-pair deoptimized respiratory syncytial virus by the emergence of genomes with very large internal deletions that complemented replication

Le Nouën, Cyril and McCarty, Thomas and Yang, Lijuan and Brown, Michael and Wimmer, Eckard and Collins, Peter L. and Buchholz, Ursula J.

HiFiViral, pathogen surveillance, respiratory virus, Virus

Recoding viral genomes by introducing numerous synonymous but suboptimal codon pairs—called codon-pair deoptimization (CPD)—provides new types of live-attenuated vaccine candidates. The large number of nucleotide changes resulting from CPD should provide genetic stability to the attenuating phenotype, but this has not been rigorously tested. Human respiratory syncytial virus in which the G and F surface glycoprotein ORFs were CPD (called Min B) was temperature-sensitive and highly restricted in vitro. When subjected to selective pressure by serial passage at increasing temperatures, Min B substantially regained expression of F and replication fitness. Whole-genome deep sequencing showed many point mutations scattered across the genome, including one combination of six linked point mutations. However, their reintroduction into Min B provided minimal rescue. Further analysis revealed viral genomes bearing very large internal deletions (LD genomes) that accumulated after only a few passages. The deletions relocated the CPD F gene to the first or second promoter-proximal gene position. LD genomes amplified de novo in Min B–infected cells were encapsidated, expressed high levels of F, and complemented Min B replication in trans. This study provides insight on a variation of the adaptability of a debilitated negative-strand RNA virus, namely the generation of defective minihelper viruses to overcome its restriction. This is in contrast to the common “defective interfering particles” that interfere with the replication of the virus from which they originated. To our knowledge, defective genomes that promote rather than inhibit replication have not been reported before in RNA viruses.
Science  |  2021

Haplotype-resolved diverse human genomes and integrated analysis of structural variation

Ebert, Peter and Audano, Peter A. and Zhu, Qihui and Rodriguez-Martin, Bernardo and Porubsky, David and Bonder, Marc Jan and Sulovari, Arvis and Ebler, Jana and Zhou, Weichen and Serra Mari, Rebecca and Yilmaz, Feyza and Zhao, Xuefang and Hsieh, PingHsun and Lee, Joyce and Kumar, Sushant and Lin, Jiadong and Rausch, Tobias and Chen, Yu and Ren, Jingwen and Santamarina, Martin and Höps, Wolfram and Ashraf, Hufsah and Chuang, Nelson T. and Yang, Xiaofei and Munson, Katherine M. and Lewis, Alexandra P. and Fairley, Susan and Tallon, Luke J. and Clarke, Wayne E. and Basile, Anna O. and Byrska-Bishop, Marta and Corvelo, André and Evani, Uday S. and Lu, Tsung-Yu and Chaisson, Mark J.P. and Chen, Junjie and Li, Chong and Brand, Harrison and Wenger, Aaron M. and Ghareghani, Maryam and Harvey, William T. and Raeder, Benjamin and Hasenfeld, Patrick and Regier, Allison A. and Abel, Haley J. and Hall, Ira M. and Flicek, Paul and Stegle, Oliver and Gerstein, Mark B. and Tubio, Jose M.C. and Mu, Zepeng and Li, Yang I. and Shi, Xinghua and Hastie, Alex R. and Ye, Kai and Chong, Zechen and Sanders, Ashley D. and Zody, Michael C. and Talkowski, Michael E. and Mills, Ryan E. and Devine, Scott E. and Lee, Charles and Korbel, Jan O. and Marschall, Tobias and Eichler, Evan E.

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent–child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average contig N50: 26 Mbp) integrate all forms of genetic variation even across complex loci. We identify 107,590 structural variants (SVs), of which 68% are not discovered by short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterize 130 of the most active mobile element source elements and find that 63% of all SVs arise by homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1,526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
Cell Host & Microbe  |  2021

Complete Mapping of Mutations to the SARS-CoV-2 Spike Receptor-Binding Domain that Escape Antibody Recognition

Greaney, Allison J. and Starr, Tyler N. and Gilchuk, Pavlo and Zost, Seth J. and Binshtein, Elad and Loes, Andrea N. and Hilton, Sarah K. and Huddleston, John and Eguia, Rachel and Crawford, Katharine H.D. and Dingens, Adam S. and Nargi, Rachel S. and Sutton, Rachel E. and Suryadevara, Naveenchandra and Rothlauf, Paul W. and Liu, Zhuoming and Whelan, Sean P.J. and Carnahan, Robert H. and Crowe Jr., James E. and Bloom, Jesse D.

coronavirus, COVID-19, HiFiViral, pathogen surveillance, respiratory virus, SARS-CoV-2, Virus

Antibodies targeting the SARS-CoV-2 spike receptor-binding domain (RBD) are being developed as therapeutics and are a major contributor to neutralizing antibody responses elicited by infection. Here, we describe a deep mutational scanning method to map how all amino-acid mutations in the RBD affect antibody binding and apply this method to 10 human monoclonal antibodies. The escape mutations cluster on several surfaces of the RBD that broadly correspond to structurally defined antibody epitopes. However, even antibodies targeting the same surface often have distinct escape mutations. The complete escape maps predict which mutations are selected during viral growth in the presence of single antibodies. They further enable the design of escape-resistant antibody cocktails—including cocktails of antibodies that compete for binding to the same RBD surface but have different escape mutations. Therefore, complete escape-mutation maps enable rational design of antibody therapeutics and assessment of the antigenic consequences of viral evolution.
Genomics  |  2021

Pathogenic 12-kb copy-neutral inversion in syndromic intellectual disability identified by high-fidelity long-read sequencing

Mizuguchi, Takeshi and Okamoto, Nobuhiko and Yanagihara, Keiko and Miyatake, Satoko and Uchiyama, Yuri and Tsuchida, Naomi and Hamanaka, Kohei and Fujita, Atsushi and Miyake, Noriko and Matsumoto, Naomichi

We report monozygotic twin girls with syndromic intellectual disability who underwent exome sequencing but with negative pathogenic variants. To search for variants that are unrecognized by exome sequencing, high-fidelity long-read genome sequencing (HiFi LR-GS) was applied. A 12-kb copy-neutral inversion was precisely identified by HiFi LR-GS after trio-based variant filtering. This inversion directly disrupted two genes, CPNE9 and BRPF1, the latter of which attracted our attention because pathogenic BRPF1 variants have been identified in autosomal dominant intellectual developmental disorder with dysmorphic facies and ptosis (IDDDFP), which later turned out to be clinically found in the twins. Trio-based HiFi LR-GS together with haplotype phasing revealed that the 12-kb inversion occurred de novo on the maternally transmitted chromosome. This study clearly indicates that submicroscopic copy-neutral inversions are important but often uncharacterized culprits in monogenic disorders and that long-read sequencing is highly advantageous for detecting such inversions involved in genetic diseases.
European Journal of Human Genetics  |  2020

Long-read trio sequencing of individuals with unsolved intellectual disability

Pauper, Marc and Kucuk, Erdi and Wenger, Aaron M. and Chakraborty, Shreyasee and Baybayan, Primo and Kwint, Michael and van der Sanden, Bart and Nelen, Marcel R. and Derks, Ronny and Brunner, Han G. and Hoischen, Alexander and Vissers, Lisenka E. L. M. and Gilissen, Christian

Long-read sequencing (LRS) has the potential to comprehensively identify all medically relevant genome variation, including variation commonly missed by short-read sequencing (SRS) approaches. To determine this potential, we performed LRS around 15×–40× genome coverage using the Pacific Biosciences Sequel I System for five trios. The respective probands were diagnosed with intellectual disability (ID) whose etiology remained unresolved after SRS exomes and genomes. Systematic assessment of LRS coverage showed that ~35 Mb of the human reference genome was only accessible by LRS and not SRS. Genome-wide structural variant (SV) calling yielded on average 28,292 SV calls per individual, totaling 12.9 Mb of sequence. Trio-based analyses, which allowed us to study segregation, showed concordance for up to 95% of these SV calls across the genome, and 80% of the LRS SV calls were not identified by SRS. De novo mutation analysis did not identify any de novo SVs, confirming that these are rare events. Because of high sequence coverage, we were also able to call single nucleotide substitutions. On average, we identified 3 million substitutions per genome, with a Mendelian inheritance concordance of up to 97%. Of these, ~100,000 were located in the ~35 Mb of the genome that was only captured by LRS. Moreover, these variants affected the coding sequence of 64 genes, including 32 known Mendelian disease genes. Our data show the potential added value of LRS compared to SRS for identifying medically relevant genome variation.