Scientific publications

Publications featuring PacBio long-read + short-read sequencing data

Nature Methods | 2024

SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms

Francisco J. Pardo-Palacios, Angeles Arzalluz-Luque, Liudmyla Kondratova, Pedro Salguero, Jorge Mestre-Tomás, Rocío Amorín, Eva Estevan-Morió, Tianyuan Liu, Adalena Nanni, Lauren McIntyre, Elizabeth Tseng & Ana Conesa

SQANTI3 is a tool designed for the quality control, curation and annotation of long-read transcript models obtained with third-generation sequencing technologies. Leveraging its annotation framework, SQANTI3 calculates quality descriptors of transcript models, junctions and transcript ends. With this information, potential artifacts can be identified and replaced with reliable sequences. Furthermore, the integrated functional annotation feature enables subsequent functional iso-transcriptomics analyses.

bioRxiv | 2024

Addressing technical pitfalls in pursuit of molecular factors that mediate immunoglobulin gene regulation

Eric Engelbrecht¹, Oscar L. Rodriguez¹, Corey T. Watson¹

1) Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, USA.

The expressed antibody repertoire is a critical determinant of immune-related phenotypes. Antibody-encoding transcripts are distinct from other expressed genes because they are transcribed from somatically rearranged gene segments. Human antibodies are composed of two identical heavy and light chain polypeptides derived from genes in the immunoglobulin heavy chain (IGH) locus and one of two light chain loci. The combinatorial diversity that results from antibody gene rearrangement and the pairing of different heavy and light chains contributes to the immense diversity of the baseline antibody repertoire. During rearrangement, antibody gene selection is mediated by factors that influence chromatin architecture, promoter/enhancer activity, and V(D)J recombination. Interindividual variation in the composition of the antibody repertoire associates with germline variation in IGH, implicating polymorphism in antibody gene regulation. Determining how IGH variants directly mediate gene regulation will require integration of these variants with other functional genomic datasets. Here, we argue that standard approaches using short reads have limited utility for characterizing regulatory regions in IGH at haplotype-resolution. Using simulated and ChIP-seq reads, we define features of IGH that limit use of short reads and a single reference genome, namely 1) the highly duplicated nature of DNA sequence in IGH and 2) structural polymorphisms that are frequent in the population. We demonstrate that personalized diploid references enhance performance of short-read data for characterizing mappable portions of the locus, while also showing that long-read profiling tools will ultimately be needed to fully resolve functional impacts of IGH germline variation on expressed antibody repertoires.

bioRxiv | 2024

Structurally divergent and recurrently mutated regions of primate genomes

Yafei Mao, William T. Harvey, David Porubsky, Katherine M. Munson, Kendra Hoekzema, Alexandra P. Lewis, Peter A. Audano, Allison Rozanski, Xiangyu Yang, Shilong Zhang, David S. Gordon, Xiaoxi Wei, Glennis A. Logsdon, Marina Haukness, Philip C. Dishuck, Hyeonsoo Jeong, Ricardo del Rosario, Vanessa L. Bauer, Will T. Fattor, Gregory K. Wilkerson, Qing Lu, Benedict Paten, Guoping Feng, Sara L. Sawyer, Wesley C. Warren, Lucia Carbone, Evan E. Eichler

To better understand the pattern of primate genome structural variation, we sequenced and assembled using multiple long-read sequencing technologies the genomes of eight nonhuman primate species, including New World monkeys (owl monkey and marmoset), Old World monkey (macaque), Asian apes (orangutan and gibbon), and African ape lineages (gorilla, bonobo, and chimpanzee). Compared to the human genome, we identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. Across 50 million years of primate evolution, we estimate that 819.47 Mbp or ~27% of the genome has been affected by SVs based on analysis of these primate lineages. We identify 1,607 structurally divergent regions (SDRs) wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (CARDs, ABCD7, OLAH) and new lineage-specific genes are generated (e.g., CKAP2, NEK5) and have become targets of rapid chromosomal diversification and positive selection (e.g., RGPDs). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species for the first time.

bioRxiv | 2024

CTAT-LR-fusion: accurate fusion transcript identification from long and short read isoform sequencing at bulk or single cell resolution

Qian Qin, Victoria Popic, Houlin Yu, Emily White, Akanksha Khorgade, Asa Shin, Kirsty Wienand, Arthur Dondi, Niko Beerenwinkel, Francisca Vazquez, Aziz M. Al’Khafaji, Brian J. Haas

Gene fusions are found as cancer drivers in diverse adult and pediatric cancers. Accurate detection of fusion transcripts is essential in cancer clinical diagnostics, prognostics, and for guiding therapeutic development. Most currently available methods for fusion transcript detection are compatible with Illumina RNA-seq involving highly accurate short read sequences. Recent advances in long read isoform sequencing enable the detection of fusion transcripts at unprecedented resolution in bulk and single cell samples. Here we developed a new computational tool CTAT-LR-fusion to detect fusion transcripts from long read RNA-seq with or without companion short reads, with applications to bulk or single cell transcriptomes. We demonstrate that CTAT-LR-fusion exceeds fusion detection accuracy of alternative methods as benchmarked with simulated and real long read RNA-seq. Using short and long read RNA-seq, we further apply CTAT-LR-fusion to bulk transcriptomes of nine tumor cell lines, and to tumor single cells derived from a melanoma sample and three metastatic high grade serous ovarian carcinoma samples. In both bulk and in single cell RNA-seq, long isoform reads yielded higher sensitivity for fusion detection than short reads with notable exceptions. By combining short and long reads in CTAT-LR-fusion, we are able to further maximize detection of fusion splicing isoforms and fusion-expressing tumor cells. CTAT-LR-fusion is available at https://github.com/TrinityCTAT/CTAT-LR-fusion/wiki.

bioRxiv | 2024

Stimulated saliva has a distinct composition that influences release of volatiles from wine

Xinwei Ruan, Yipeng Chen, Aafreen Chauhan, Kate Howell

Aroma perception plays an important role in wine preference and evaluation and varies between groups of wine consumers. Saliva influences the release of aroma in the oral cavity. The composition of human saliva varies depending on stimulation; however, the compositional differences of stimulated and unstimulated saliva and their influences on aroma release have not been evaluated. In this study, we recruited healthy adults, of which 15 were Australian and 15 Chinese. Three types of saliva were collected from each participant: before, during, and after salivary stimulation. The collected salivary samples were characterised by flow rate, total protein concentration, esterase activity and microbiome composition by full-length 16S rRNA gene sequencing. The saliva samples were mixed with wine to investigate the differences in released volatiles by headspace solid-phase microextraction gas chromatography–mass spectrometry (HS-SPME-GC-MS). Differences in salivary composition and specific wine volatiles were found between Australian and Chinese participants, and amongst the three stimulation stages. Differential species were identified and significant correlations between the relative abundance of 3 bacterial species and 10 wine volatiles were observed. Our results confirm the influence of host factors and stimulation on salivary composition. Understanding the interactions of salivary components, especially salivary bacteria, on the release of aroma during wine tasting allows nuanced appreciation of the variability of flavour perception in wine consumers.

bioRxiv | 2024

Full-length transcript sequencing traces the brain isoform diversity in house mouse natural populations

The ability to generate multiple RNA isoforms of transcripts from the same gene is a general phenomenon in eukaryotes. However, the complexity and diversity of alternative isoforms in natural populations remains largely unexplored. Using a newly developed full-length transcripts enrichment protocol, we sequenced full-length RNA transcripts of 48 individuals from outbred populations and subspecies of Mus musculus, as well as the closely-related sister species Mus spretus and Mus spicilegus as outgroups. This represents the largest full-length high-quality isoform catalog at the population level to date. In total, we reliably identify 117,728 distinct transcripts, of which only 51% were previously annotated. We show that the population-specific distribution pattern of isoforms is phylogenetically informative and reflects the segregating SNP diversity between the populations. We find that ancient house-keeping genes are the major source to the overall isoform diversity, and the recruiting of alternative first exon plays the dominant role in generating new isoforms. Given that our data allow to distinguish between population-specific isoforms and isoforms that are conserved across multiple populations, it is possible to refine the annotation of the reference mouse genome to a set of about 40,000 isoforms that should be most relevant for comparative functional analysis across species.

bioRxiv | 2024

Adaptive diversification through structural variation in barley

Murukarthick Jayakodi, Qiongxian Lu, Hélène Pidon, M. Timothy Rabanus-Wallace, Micha Bayer, Thomas Lux, Yu Guo, Benjamin Jaegle, Ana Badea, Wubishet Bekele, Gurcharn S. Brar, Katarzyna Braune, Boyke Bunk, Kenneth J. Chalmers, Brett Chapman, Morten Egevang Jørgensen, Jia-Wu Feng, Manuel Feser, Anne Fiebig, Heidrun Gundlach, Wenbin Guo, Georg Haberer, Mats Hansson, Axel Himmelbach, Iris Hoffie, Robert E. Hoffie, Haifei Hu, Sachiko Isobe, Patrick König, Sandip M. Kale, Nadia Kamal, Gabriel Keeble-Gagnère, Beat Keller, Manuela Knauft, Ravi Koppolu, Simon G. Krattinger, Jochen Kumlehn, Peter Langridge, Chengdao Li, Marina P. Marone, Andreas Maurer, Klaus F.X. Mayer, Michael Melzer, Gary J. Muehlbauer, Emiko Murozuka, Sudharsan Padmarasu, Dragan Perovic, Klaus Pillen, Pierre A. Pin, Curtis J. Pozniak, Luke Ramsay, Pai Rosager Pedas, Twan Rutten, Shun Sakuma, Kazuhiro Sato, Danuta Schüler, Thomas Schmutzer, Uwe Scholz, Miriam Schreiber, Kenta Shirasawa, Craig Simpson, Birgitte Skadhauge, Manuel Spannagl, Brian J. Steffenson, Hanne C. Thomsen, Josquin F. Tibbits, Martin Toft Simmelsgaard Nielsen, Corinna Trautewig, Dominique Vequaud, Cynthia Voss, Penghao Wang, Robbie Waugh, Sharon Westcott, Magnus Wohlfahrt Rasmussen, Runxuan Zhang, Xiao-Qi Zhang, Thomas Wicker, Christoph Dockter, Martin Mascher, Nils Stein

Pangenomes are collections of annotated genome sequences of multiple individuals of a species. The structural variants uncovered by these datasets are a major asset to genetic analysis in crop plants. Here, we report a pangenome of barley comprising long-read sequence assemblies of 76 wild and domesticated genomes and short-read sequence data of 1,315 genotypes. An expanded catalogue of sequence variation in the crop includes structurally complex loci that have become hot spots of gene copy number variation in evolutionarily recent times. To demonstrate the utility of the pangenome, we focus on four loci involved in disease resistance, plant architecture, nutrient release, and trichome development. Novel allelic variation at a powdery mildew resistance locus and population-specific copy number gains in a regulator of vegetative branching were found. Expansion of a family of starch-cleaving enzymes in elite malting barleys was linked to shifts in enzymatic activity in micro-malting trials. Deletion of an enhancer motif is likely to change the developmental trajectory of the hairy appendages on barley grains. Our findings indicate that rapid evolution at structurally complex loci may have helped crop plants adapt to new selective regimes in agricultural ecosystems.

medRxiv | 2024

Novel syndromic neurodevelopmental disorder caused by de novo deletion of CHASERR, a long noncoding RNA

Vijay S. Ganesh, Kevin Riquin, Nicolas Chatron, Kay-Marie Lamar, Miriam C. Aziz, Pauline Monin, Melanie O’Leary, Julia K. Goodrich, Kiran V. Garimella, Eleina England, Esther Yoon, Ben Weisburd, Francois Aguet, Carlos A. Bacino, David R. Murdock, Hongzheng Dai, Jill A. Rosenfeld, Lisa T. Emrick, Shamika Ketkar, Undiagnosed Diseases Network, Yael Sarusi, Damien Sanlaville, Saima Kayani, Brian Broadbent, Bertrand Isidor, Alisée Pengam, Benjamin Cogné, Daniel G. MacArthur, Igor Ulitsky, Gemma L. Carvill, Anne O’Donnell-Luria

Genes encoding long non-coding RNAs (lncRNAs) comprise a large fraction of the human genome, yet haploinsufficiency of a lncRNA has not been shown to cause a Mendelian disease. CHASERR is a highly conserved human lncRNA adjacent to CHD2–a coding gene in which de novo loss-of-function variants cause developmental and epileptic encephalopathy. Here we report three unrelated individuals each harboring an ultra-rare heterozygous de novo deletion in the CHASERR locus. We report similarities in severe developmental delay, facial dysmorphisms, and cerebral dysmyelination in these individuals, distinguishing them from the phenotypic spectrum of CHD2 haploinsufficiency. We demonstrate reduced CHASERR mRNA expression and corresponding increased CHD2 mRNA and protein in whole blood and patient-derived cell lines–specifically increased expression of the CHD2 allele in cis with the CHASERR deletion, as predicted from a prior mouse model of Chaserr haploinsufficiency. We show for the first time that de novo structural variants facilitated by Alu-mediated non-allelic homologous recombination led to deletion of a non-coding element (the lncRNA CHASERR) to cause a rare syndromic neurodevelopmental disorder. We also demonstrate that CHD2 has bidirectional dosage sensitivity in human disease. This work highlights the need to carefully evaluate other lncRNAs, particularly those upstream of genes associated with Mendelian disorders.

BMC Genomics | 2024

Full-length isoform concatenation sequencing to resolve cancer transcriptome complexity

Saranga Wijeratne, Maria E. Hernandez Gonzalez, Kelli Roach, Katherine E. Miller, Kathleen M. Schieffer, James R. Fitch, Jeffrey Leonard, Peter White, Benjamin J. Kelly, Catherine E. Cottrell, Elaine R. Mardis, Richard K. Wilson & Anthony R. Miller

Cancers exhibit complex transcriptomes with aberrant splicing that induces isoform-level differential expression compared to non-diseased tissues. Transcriptomic profiling using short-read sequencing has utility in providing a cost-effective approach for evaluating isoform expression, although short-read assembly displays limitations in the accurate inference of full-length transcripts. Long-read RNA sequencing (Iso-Seq), using the Pacific Biosciences (PacBio) platform, can overcome such limitations by providing full-length isoform sequence resolution which requires no read assembly and represents native expressed transcripts. A constraint of the Iso-Seq protocol is due to fewer reads output per instrument run, which, as an example, can consequently affect the detection of lowly expressed transcripts. To address these deficiencies, we developed a concatenation workflow, PacBio Full-Length Isoform Concatemer Sequencing (PB_FLIC-Seq), designed to increase the number of unique, sequenced PacBio long-reads thereby improving overall detection of unique isoforms. In addition, we anticipate that the increase in read depth will help improve the detection of moderate to low-level expressed isoforms.

bioRxiv | 2024

Long-read transcriptome sequencing of CLL and MDS patients uncovers molecular effects of SF3B1 mutations

Alicja Pacholewska, Matthias Lienhard, Mirko Brüggemann, Heike Hänel, Lorina Bilalli, Anja Königs, Kerstin Becker, Karl Köhrer, Jesko Kaiser, Holger Gohlke, Norbert Gattermann, Michael Hallek, Carmen D. Herling, Julian König, Christina Grimm, Ralf Herwig, Kathi Zarnack, Michal R. Schweiger

Background: Mutations in splicing factor 3B subunit 1 (SF3B1) frequently occur in patients with chronic lymphocytic leukemia (CLL) and myelodysplastic syndromes (MDS). These mutations have a different effect on the disease prognosis with beneficial effect in MDS and worse prognosis in CLL patients. A full-length transcriptome approach can expand our knowledge on SF3B1 mutation effects on RNA splicing and its contribution to patient survival and treatment options. Results: We applied long-read transcriptome sequencing to 44 MDS and CLL patients with and without SF3B1 mutations and found > 60% of novel isoforms. Splicing alterations were largely shared between cancer types and specifically affected the usage of introns and 3’ splice sites. Our data highlighted a constrained window at canonical 3’ splice sites in which dynamic splice site switches occurred in SF3B1-mutated patients. Using transcriptome-wide RNA binding maps and molecular dynamics simulations, we showed multimodal SF3B1 binding at 3’ splice sites and predicted reduced RNA binding at the second binding pocket of SF3B1K700E. Conclusions: Our work presents the hitherto most complete long-read transcriptome sequencing study in CLL and MDS and provides a resource to study aberrant splicing in cancer. Moreover, we showed that different disease prognosis results most likely from the different cell types expanded during cancerogenesis rather than different mechanism of action of the mutated SF3B1. These results have important implications for understanding the role of SF3B1 mutations in hematological malignancies and other related diseases.

bioRxiv | 2024

Microsatellite break-induced replication generates highly mutagenized extrachromosomal circular DNAs

Rujuta Yashodhan Gadgil, S. Dean Rider Jr, Resha Shrestha, Venicia Alhawach, David C. Hitch, Michael Leffak

Extrachromosomal circular DNAs (eccDNAs) are produced from all regions of the eucaryotic genome. In tumors, highly transcribed eccDNAs have been implicated in oncogenesis, neoantigen production and resistance to chemotherapy. Here we show that unstable microsatellites capable of forming hairpin, triplex, quadruplex and AT-rich structures generate eccDNAs when integrated at a common ectopic site in human cells. These non-B DNA prone microsatellites form eccDNAs by replication-dependent mechanisms. The microsatellite-based eccDNAs are highly mutagenized and display template switches to sister chromatids and to nonallelic chromosomal sites. High frequency mutagenesis occurs within the eccDNA microsatellites and extends bidirectionally for several kilobases into flanking DNA and nonallelic DNA. Mutations include mismatches, short duplications, longer nontemplated insertions and large deletions. Template switching leads to recurrent deletions and recombination domains within the eccDNAs. Template switching events are microhomology-mediated, but do not occur at all potential sites of complementarity. Each microsatellite exhibits a distinct pattern of recombination, microhomology choice and base substitution signature. Depletion of Rad51, the COPS2 signalosome subunit or POLη alter the eccDNA mutagenic profiles. We propose an asynchronous capture model based on break-induced replication from microsatellite-induced DNA breaks for the generation and circularization of mutagenized eccDNAs and genomic homologous recombination deficiency (HRD) scars.

bioRxiv | 2024

De novo annotation of the wheat pan-genome reveals complexity and diversity within the hexaploid wheat pan-transcriptome

Benjamen White¹†, Thomas Lux²†, Rachel L Rusholme-Pilcher¹†, Angéla Juhász⁶†, Gemy Kaithakottil¹, Susan Duncan¹³, James Simmonds³, Hannah Rees¹, Jonathan Wright¹, Joshua Colmer¹, Sabrina Ward¹, Ryan Joynson¹⁴, Benedict Coombes¹, Naomi Irish¹, Suzanne Henderson¹, Tom Barker¹, Helen Chapman¹, Leah Catchpole¹, Karim Gharbi¹, Moeko Okada⁵¹⁶¹⁷, Hirokazu Handa¹⁸, Shuhei Nasuda¹⁹, Kentaro K. Shimizu⁵¹⁶, Heidrun Gundlach², Daniel Lang², Guy Naamati⁷, Erik J. Legg⁸, Arvind K. Bharti⁸, Michelle L. Colgrave⁶⁹, Wilfried Haerty¹, Cristobal Uauy³, David Swarbreck¹, Philippa Borrill³, Jesse A. Poland¹⁰, Simon Krattinger¹⁰, Nils Stein¹¹¹⁵, Klaus F.X. Mayer²¹², Curtis Pozniak¹³, 10+ Wheat Genome Project, Manuel Spannagl²* and Anthony Hall¹,¹⁴*

1) Earlham Institute, Norwich Research Park, Norwich, NR4 7UH, UK
2) PGSB Plant Genome and Systems Biology, Helmholtz Center Munich, German Research Center for Environmental Health, Neuherberg, Germany
3) John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
4) Limagrain Europe, Clermont-Ferrand, Auvergne-Rhône-Alpes, France
5) Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
6) Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, School of Science, Edith Cowan University, Joondalup, WA, 6027, Australia
7) EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
8) Syngenta Crop Protection, Research Triangle Park, NC, USA
9) CSIRO Agriculture and Food, St Lucia, QLD 4067, Australia
10) Plant Science Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
11) Center of Integrated Breeding Research (CiBreed), Georg-August-University, Göttingen, Germany
12) School of Life Sciences, Technical University Munich, Freising, Germany
13) Crop Development Centre, The University of Saskatchewan, Saskatoon, Canada
14) School of Biological Sciences, University of East Anglia, Norwich, UK
15) Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
16) Kihara Institute for Biological Research, Yokohama City University, Yokohama, Japan
17) Graduate School of Science and Technology, Niigata University, Niigata, Japan
18) Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto, Japan
19) Graduate School of Agriculture, Kyoto University, Kyoto, Japan

Wheat is the most widely cultivated crop in the world with over 215 million hectares grown annually. However, to meet the demands of a growing global population, breeders face the challenge of increasing wheat production by approximately 60% within the next 40 years. The 10+ Wheat Genomes Project recently sequenced and assembled the genomes of 15 wheat cultivars to develop our understanding of genetic diversity and selection within the pan-genome of wheat. Here, we provide a wheat pan-transcriptome with de novo annotation and differential expression analysis for nine of these wheat cultivars, across multiple different tissues and whole seedlings sampled at dusk/dawn. Analysis of these de novo annotations facilitated the discovery of genes absent from the Chinese Spring reference, identified genes specific to particular cultivars and defined the core and dispensable genomes. Expression analysis across cultivars and tissues revealed conservation in expression between a large core set of homoeologous genes, but also widespread changes in sub-genome homoeolog expression bias between cultivars. Co-expression network analysis revealed the impact of divergence of sub-genome homoeolog expression and identified tissue-associated cultivar-specific expression profiles. In summary, this work provides both a valuable resource for the wider wheat community and reveals diversity in gene content and expression patterns between global wheat cultivars.

Nature Biotechnology | 2024

Characterization and visualization of tandem repeats at genome scale

Egor Dolzhenko, Adam English, Harriet Dashnow, Guilherme De Sena Brandine, Tom Mokveld, William J. Rowell, Caitlin Karniski, Zev Kronenberg, Matt C. Danzi, Warren A. Cheung, Chengpeng Bi, Emily Farrow, Aaron Wenger, Khi Pin Chua, Verónica Martínez-Cerdeño, Trevor D. Bartley, Peng Jin, David L. Nelson, Stephan Zuchner, Tomi Pastinen, Aaron R. Quinlan, Fritz J. Sedlazeck & Michael A. Eberle

disease research, HiFi sequencing, cancer research, tandem repeats, repeat expansions, allele phasing, monogenic disease

Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.

Nature Biotechnology | 2024

High-quality metagenome assembly from long accurate reads with metaMDBG

Gaëtan Benoit, Sébastien Raguideau, Robert James, Adam M. Phillippy, Rayan Chikhi and Christopher Quince

We introduce metaMDBG, a metagenomics assembler for PacBio HiFi reads. MetaMDBG combines a de Bruijn graph assembly in a minimizer space with an iterative assembly over sequences of minimizers to address variations in genome coverage depth and an abundance-based filtering strategy to simplify strain complexity. For complex communities, we obtained up to twice as many high-quality circularized prokaryotic metagenome-assembled genomes as existing methods and had better recovery of viruses and plasmids.

ScienceDirect | 2023

Third-generation sequencing for genetic disease

Xiaoting Ling, Chenghan Wang, Linlin Li, Liqiu Pan, Chaoyu Huang, Caixia Zhang, Yunhua Huang, Yuling Qiu, Faquan Lin, and Yifang Huang

Third-generation sequencing (TGS) has led to a brave new revolution in detecting genetic diseases over the last few years. TGS has been rapidly developed for genetic disease applications owing to its significant advantages such as long read length, rapid detection, and precise detection of complex and rare structural variants. This approach greatly improves the efficiency of disease diagnosis and complements the shortcomings of short-read sequencing. In this paper, we first briefly introduce the working mechanism of one of the most important representatives of TGS, single-molecule real-time (SMRT) sequencing by Pacific Biosciences (PacBio), followed by a review and comparison of the advantages and disadvantages of different sequencing technologies. Finally, we focused on the progress of SMRT sequencing applications in genetic disease detection. Future perspectives on the applications of TGS in other fields were also presented. With the continuous innovation of the SMRT technologies and the expansion of their fields of application, SMRT sequencing has broad clinical application prospects in genetic diseases detection, and is expected to become an important tool for the molecular diagnosis of other diseases.
