Scientific publications

Publications featuring PacBio long-read + short-read sequencing data

Biorxiv  |  2024

Long-read transcriptome sequencing of CLL and MDS patients uncovers molecular effects of SF3B1 mutations

Alicja Pacholewska, Matthias Lienhard, Mirko Brüggemann, Heike Hänel, Lorina Bilalli, Anja Königs, Kerstin Becker, Karl Köhrer, Jesko Kaiser, Holger Gohlke, Norbert Gattermann, Michael Hallek, Carmen D. Herling, Julian König, Christina Grimm, Ralf Herwig, Kathi Zarnack, Michal R. Schweiger

Background: Mutations in splicing factor 3B subunit 1 (SF3B1) frequently occur in patients with chronic lymphocytic leukemia (CLL) and myelodysplastic syndromes (MDS). These mutations affect disease prognosis differently, with a beneficial effect in MDS and a worse prognosis in CLL patients. A full-length transcriptome approach can expand our knowledge on SF3B1 mutation effects on RNA splicing and its contribution to patient survival and treatment options. Results: We applied long-read transcriptome sequencing to 44 MDS and CLL patients with and without SF3B1 mutations and found that more than 60% of isoforms were novel. Splicing alterations were largely shared between cancer types and specifically affected the usage of introns and 3’ splice sites. Our data highlighted a constrained window at canonical 3’ splice sites in which dynamic splice site switches occurred in SF3B1-mutated patients. Using transcriptome-wide RNA binding maps and molecular dynamics simulations, we showed multimodal SF3B1 binding at 3’ splice sites and predicted reduced RNA binding at the second binding pocket of SF3B1 K700E. Conclusions: Our work presents the hitherto most complete long-read transcriptome sequencing study in CLL and MDS and provides a resource to study aberrant splicing in cancer. Moreover, we showed that the different disease prognosis most likely results from the different cell types expanded during cancerogenesis rather than from a different mechanism of action of the mutated SF3B1. These results have important implications for understanding the role of SF3B1 mutations in hematological malignancies and other related diseases.
Biorxiv  |  2024

Microsatellite break-induced replication generates highly mutagenized extrachromosomal circular DNAs

Rujuta Yashodhan Gadgil, S. Dean Rider Jr, Resha Shrestha, Venicia Alhawach, David C. Hitch, Michael Leffak

Extrachromosomal circular DNAs (eccDNAs) are produced from all regions of the eucaryotic genome. In tumors, highly transcribed eccDNAs have been implicated in oncogenesis, neoantigen production and resistance to chemotherapy. Here we show that unstable microsatellites capable of forming hairpin, triplex, quadruplex and AT-rich structures generate eccDNAs when integrated at a common ectopic site in human cells. These non-B DNA prone microsatellites form eccDNAs by replication-dependent mechanisms. The microsatellite-based eccDNAs are highly mutagenized and display template switches to sister chromatids and to nonallelic chromosomal sites. High frequency mutagenesis occurs within the eccDNA microsatellites and extends bidirectionally for several kilobases into flanking DNA and nonallelic DNA. Mutations include mismatches, short duplications, longer nontemplated insertions and large deletions. Template switching leads to recurrent deletions and recombination domains within the eccDNAs. Template switching events are microhomology-mediated, but do not occur at all potential sites of complementarity. Each microsatellite exhibits a distinct pattern of recombination, microhomology choice and base substitution signature. Depletion of Rad51, the COPS2 signalosome subunit or POLη alters the eccDNA mutagenic profiles. We propose an asynchronous capture model based on break-induced replication from microsatellite-induced DNA breaks for the generation and circularization of mutagenized eccDNAs and genomic homologous recombination deficiency (HRD) scars.
Biorxiv  |  2024

De novo annotation of the wheat pan-genome reveals complexity and diversity within the hexaploid wheat pan-transcriptome

Benjamen White (1), Thomas Lux (2), Rachel L Rusholme-Pilcher (1)†, Angéla Juhász (6)†, Gemy Kaithakottil (1), Susan Duncan (1,3), James Simmonds (3), Hannah Rees (1), Jonathan Wright (1), Joshua Colmer (1), Sabrina Ward (1), Ryan Joynson (1,4), Benedict Coombes (1), Naomi Irish (1), Suzanne Henderson (1), Tom Barker (1), Helen Chapman (1), Leah Catchpole (1), Karim Gharbi (1), Moeko Okada (5,16,17), Hirokazu Handa (18), Shuhei Nasuda (19), Kentaro K. Shimizu (5,16), Heidrun Gundlach (2), Daniel Lang (2), Guy Naamati (7), Erik J. Legg (8), Arvind K. Bharti (8), Michelle L. Colgrave (6,9), Wilfried Haerty (1), Cristobal Uauy (3), David Swarbreck (1), Philippa Borrill (3), Jesse A. Poland (10), Simon Krattinger (10), Nils Stein (11,15), Klaus F.X. Mayer (2,12), Curtis Pozniak (13), 10+ Wheat Genome Project, Manuel Spannagl (2)* and Anthony Hall (1,14)*

1) Earlham Institute, Norwich Research Park, Norwich, NR4 7UH, UK 2) PGSB Plant Genome and Systems Biology, Helmholtz Center Munich, German Research Center for Environmental Health, Neuherberg, Germany 3) John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK 4) Limagrain Europe, Clermont-Ferrand, Auvergne-Rhône-Alpes, France 5) Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland 6) Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, School of Science, Edith Cowan University, Joondalup, WA, 6027, Australia 7) EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK 8) Syngenta Crop Protection, Research Triangle Park, NC, USA 9) CSIRO Agriculture and Food, St Lucia, QLD 4067, Australia 10) Plant Science Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia 11) Center of Integrated Breeding Research (CiBreed), Georg-August-University, Göttingen, Germany 12) School of Life Sciences, Technical University Munich, Freising, Germany 13) Crop Development Centre, The University of Saskatchewan, Saskatoon, Canada 14) School of Biological Sciences, University of East Anglia, Norwich, UK 15) Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany 16) Kihara Institute for Biological Research, Yokohama City University, Yokohama, Japan 17) Graduate School of Science and Technology, Niigata University, Niigata, Japan 18) Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto, Japan 19) Graduate School of Agriculture, Kyoto University, Kyoto, Japan

Wheat is the most widely cultivated crop in the world with over 215 million hectares grown annually. However, to meet the demands of a growing global population, breeders face the challenge of increasing wheat production by approximately 60% within the next 40 years. The 10+ Wheat Genomes Project recently sequenced and assembled the genomes of 15 wheat cultivars to develop our understanding of genetic diversity and selection within the pan-genome of wheat. Here, we provide a wheat pan-transcriptome with de novo annotation and differential expression analysis for nine of these wheat cultivars, across multiple different tissues and whole seedlings sampled at dusk/dawn. Analysis of these de novo annotations facilitated the discovery of genes absent from the Chinese Spring reference, identified genes specific to particular cultivars and defined the core and dispensable genomes. Expression analysis across cultivars and tissues revealed conservation in expression between a large core set of homoeologous genes, but also widespread changes in sub-genome homoeolog expression bias between cultivars. Co-expression network analysis revealed the impact of divergence of sub-genome homoeolog expression and identified tissue-associated cultivar-specific expression profiles. In summary, this work provides both a valuable resource for the wider wheat community and reveals diversity in gene content and expression patterns between global wheat cultivars.
Nature Biotechnology  |  2024

Characterization and visualization of tandem repeats at genome scale

Egor Dolzhenko, Adam English, Harriet Dashnow, Guilherme De Sena Brandine, Tom Mokveld, William J. Rowell, Caitlin Karniski, Zev Kronenberg, Matt C. Danzi, Warren A. Cheung, Chengpeng Bi, Emily Farrow, Aaron Wenger, Khi Pin Chua, Verónica Martínez-Cerdeño, Trevor D. Bartley, Peng Jin, David L. Nelson, Stephan Zuchner, Tomi Pastinen, Aaron R. Quinlan, Fritz J. Sedlazeck & Michael A. Eberle

tandem repeats, HiFi sequencing, cancer research, repeat expansions, disease research, allele phasing, monogenic disease

Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
Nature Biotechnology  |  2024

High-quality metagenome assembly from long accurate reads with metaMDBG

Gaëtan Benoit, Sébastien Raguideau, Robert James, Adam M. Phillippy, Rayan Chikhi and Christopher Quince

We introduce metaMDBG, a metagenomics assembler for PacBio HiFi reads. MetaMDBG combines a de Bruijn graph assembly in a minimizer space with an iterative assembly over sequences of minimizers to address variations in genome coverage depth and an abundance-based filtering strategy to simplify strain complexity. For complex communities, we obtained up to twice as many high-quality circularized prokaryotic metagenome-assembled genomes as existing methods and had better recovery of viruses and plasmids.
Science Direct  |  2023

Third-generation sequencing for genetic disease

Xiaoting Ling, Chenghan Wang, Linlin Li, Liqiu Pan, Chaoyu Huang, Caixia Zhang, Yunhua Huang, Yuling Qiu, Faquan Lin, and Yifang Huang

Third-generation sequencing (TGS) has led to a brave new revolution in detecting genetic diseases over the last few years. TGS has been rapidly developed for genetic disease applications owing to its significant advantages such as long read length, rapid detection, and precise detection of complex and rare structural variants. This approach greatly improves the efficiency of disease diagnosis and complements the shortcomings of short-read sequencing. In this paper, we first briefly introduce the working mechanism of one of the most important representatives of TGS, single-molecule real-time (SMRT) sequencing by Pacific Bioscience (PacBio), followed by a review and comparison of the advantages and disadvantages of different sequencing technologies. Finally, we focused on the progress of SMRT sequencing applications in genetic disease detection. Future perspectives on the applications of TGS in other fields were also presented. With the continuous innovation of the SMRT technologies and the expansion of their fields of application, SMRT sequencing has broad clinical application prospects in genetic diseases detection, and is expected to become an important tool for the molecular diagnosis of other diseases.
Nature Communications  |  2023

Isoform-resolved transcriptome of the human preimplantation embryo

Denis Torre, Nancy J. Francoeur, Yael Kalma, Ilana Gross Carmel, Betsaida S. Melo, Gintaras Deikus, Kimaada Allette, Ron Flohr, Maya Fridrikh, Konstantinos Vlachos, Kent Madrid, Hardik Shah, Ying-Chih Wang, Shwetha H. Sridhar, Melissa L. Smith, Efrat Eliyahu, Foad Azem, Hadar Amir, Yoav Mayshar, Ivan Marazzi, Ernesto Guccione, Eric Schadt, Dalit Ben-Yosef & Robert Sebra

Human preimplantation development involves extensive remodeling of RNA expression and splicing. However, its transcriptome has been compiled using short-read sequencing data, which fails to capture most full-length mRNAs. Here, we generate an isoform-resolved transcriptome of early human development by performing long- and short-read RNA sequencing on 73 embryos spanning the zygote to blastocyst stages. We identify 110,212 unannotated isoforms transcribed from known genes, including highly conserved protein-coding loci and key developmental regulators. We further identify 17,964 isoforms from 5,239 unannotated genes, which are largely non-coding, primate-specific, and highly associated with transposable elements. These isoforms are widely supported by the integration of published multi-omics datasets, including single-cell 8CLC and blastoid studies. Alternative splicing and gene co-expression network analyses further reveal that embryonic genome activation is associated with splicing disruption and transient upregulation of gene modules. Together, these findings show that the human embryo transcriptome is far more complex than currently known, and will act as a valuable resource to empower future studies exploring development.
Biorxiv  |  2023

De novo genome assemblies from two Indigenous Americans from Arizona identify new polymorphisms in non-reference sequences

Çiğdem Köroğlu, Peng Chen, Michael Traurig, Serdar Altok, Clifton Bogardus, Leslie J Baier

There is a collective push to diversify human genetic studies by including underrepresented populations. However, analyzing DNA sequence reads involves the initial step of aligning the reads to the GRCh38/hg38 reference genome, which is inadequate for non-European ancestries. To help address this issue, we created a modified hg38 reference map using de novo sequence assemblies from Indigenous Americans living in Arizona (IAZ). Using HiFi SMRT long-read sequencing technology, we generated de novo genome assemblies for one female and one male IAZ individual. Each assembly included ∼17 Mb of DNA sequence not present in hg38 (non-reference sequence; NRS), which consists mostly of repeat elements. Forty NRSs totaling 240 kb were uniquely anchored to the hg38 primary assembly, generating a modified hg38-NRS reference genome. DNA sequence alignment and variant calling were then conducted with WGS data from 387 IAZ cohorts using both the hg38 and modified hg38-NRS reference maps. Variant calling with the hg38-NRS map identified ∼50,000 single nucleotide variants present in at least 5% of the WGS samples which were not detected with the hg38 reference map. We also directly assessed the NRSs positioned within genes. Seventeen NRSs anchored to regions including an identical 187 bp NRS found in both de novo assemblies. The NRS is located in HCN2 79 bp downstream of exon 3 and contains several putative transcriptional regulatory elements. Genotyping of the HCN2-NRS revealed that the insertion is enriched in IAZ (MAF = 0.45) compared to Caucasians (MAF = 0.15) and African Americans (MAF = 0.03). This study shows that inclusion of population-specific NRSs can dramatically change the variant profile in under-represented ethnic groups and thereby lead to the discovery of previously missed common variations.
Biorxiv  |  2023

Characterization of Alternative Splicing During Mammalian Brain Development Reveals the Magnitude of Isoform Diversity and its Effects on Protein Conformational Changes

Leila Haj Abdullah Alieh, Beatriz Cardoso de Toledo, Anna Hadarovich, Agnes Toth-Petroczy, Federico Calegari

Regulation of gene expression is critical for fate commitment of stem and progenitor cells during tissue formation. In the context of mammalian brain development, a plethora of studies have described how changes in the expression of individual genes characterize cell types across ontogeny and phylogeny. However, little attention was paid to the fact that different transcripts can arise from any given gene through alternative splicing (AS). Although AS is considered a key mechanism expanding transcriptome diversity during evolution, assessing its full potential for isoform diversity and protein function has been notoriously difficult. Here we capitalize on the use of a validated reporter mouse line to isolate neural stem cells, neurogenic progenitors and neurons during corticogenesis and combine the use of short- and long-read sequencing to reconstruct the full transcriptome diversity characterizing neurogenic commitment. Extending available transcriptional profiles of the mammalian brain by nearly 50,000 new isoforms, we found that neurogenic commitment is characterized by a progressive increase in exon inclusion resulting in the profound remodeling of the transcriptional profile of specific cortical cell types. Most importantly, we computationally infer the biological significance of AS on protein structure by using AlphaFold2, revealing how radical protein conformational changes can arise from subtle changes in isoform sequence. Together, our study reveals that AS has a greater potential to impact protein diversity and function than previously thought, independently of changes in gene expression.
Biorxiv  |  2023

The single-molecule accessibility landscape of newly replicated mammalian chromatin

Megan S Ostrowski, Marty G Yang, Colin P McNally, Nour J Abdulhay, Simai Wang, Elphège P Nora, Hani Goodarzi, Vijay Ramani

The higher-order structure of newly replicated (i.e. ‘nascent’) chromatin fibers remains poorly-resolved, limiting our understanding of how epigenomes are maintained across cell divisions. To address this, we present Replication-Aware Single-molecule Accessibility Mapping (RASAM), a long-read sequencing method that nondestructively measures genome-wide replication-status and protein-DNA interactions simultaneously on intact chromatin templates. We report that individual human and mouse nascent chromatin fibers are ‘hyperaccessible’ compared to steady-state chromatin. This hyperaccessibility occurs at two, coupled length-scales: first, individual nucleosome core particles on nascent DNA exist as a mixture of partially-unwrapped nucleosomes and other subnucleosomal species; second, newly-replicated chromatin fibers are significantly enriched for irregularly-spaced nucleosomes on individual DNA molecules. Focusing on specific cis-regulatory elements (e.g. transcription factor binding sites; active transcription start sites [TSSs]), we discover unique modes by which nascent chromatin hyperaccessibility is resolved at the single-molecule level: at CCCTC-binding factor (CTCF) binding sites, CTCF and nascent nucleosomes compete for motifs on nascent chromatin fibers, resulting in quantitatively-reduced CTCF occupancy and motif accessibility post-replication; at active TSSs, high levels of steady-state chromatin accessibility are preserved, implying that nucleosome free regions (NFRs) are rapidly re-established behind the fork. Our study introduces a new paradigm for studying higher-order chromatin fiber organization behind the replication fork. More broadly, we uncover a unique organization of newly replicated chromatin that must be reset by active processes, providing a substrate for epigenetic reprogramming.
Biorxiv  |  2023

CLN3 transcript complexity revealed by long-read RNA sequencing analysis

Hao-Yu Zhang, Christopher Minnis, Emil Gustavsson, Mina Ryten, Sara E Mole

Background: Batten disease is a group of rare inherited neurodegenerative diseases. Juvenile CLN3 disease is the most prevalent type, and the most common mutation shared by most patients is the “1-kb” deletion which removes two internal coding exons (7 and 8) in CLN3. Previously, we identified two transcripts in patient fibroblasts homozygous for the “1-kb” deletion: the “major” and “minor” transcripts. To understand the full variety of disease transcripts and their role in disease pathogenesis, it is necessary to first investigate CLN3 transcription in “healthy” samples without juvenile CLN3 disease. Methods: We leveraged PacBio long-read RNA sequencing datasets from ENCODE to investigate the full range of CLN3 transcripts across various tissues and cell types in human control samples. Then we sought to validate their existence using data from different sources.
Biorxiv  |  2023

Synchronized long-read genome, methylome, epigenome, and transcriptome for resolving a Mendelian condition

Mitchell R. Vollger (1,2), Jonas Korlach (3), Kiara C. Eldred (4), Elliott Swanson (1), Jason G. Underwood (3), Yong-Han H. Cheng (1), Jane Ranchalis (2), Yizi Mao (2), Elizabeth E. Blue (2,5,6), Ulrike Schwarze (7), Katherine M. Munson (1), Christopher T. Saunders (3), Aaron M. Wenger (3), Aimee Allworth (2), Sirisak Chanprasert (2), Brittney L. Duerden (8), Ian Glass (9,6), Martha Horike-Pyne (2), Michelle Kim (3), Kathleen A. Leppig (10), Ian J. McLaughlin (3), Jessica Ogawa (1), Elisabeth A. Rosenthal (2), Sam Sheppeard (2), Stephanie M. Sherman (2), Samuel Strohbehn (2), Amy L. Yuen (10), University of Washington Center for Mendelian Genomics (UW-CMG), Undiagnosed Diseases Network (UDN), Thomas A. Reh (4), Peter H. Byers (7,2), Michael J. Bamshad (9,6), Fuki M. Hisama (2,6), Gail P. Jarvik (1,2,6), Yasemin Sancak (1,2), Katrina M. Dipple (9,6) and Andrew B. Stergachis (1,2,6)†

1) University of Washington School of Medicine, Department of Genome Sciences, Seattle, WA, USA 2) University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA 3) PacBio, Menlo Park, CA, USA 4) University of Washington School of Medicine, Department of Biological Structure, Seattle, WA, USA 5) Institute for Public Health Genetics, University of Washington, Seattle, WA, USA 6) Brotman Baty Institute for Precision Medicine, Seattle, WA, USA 7) University of Washington School of Medicine, Department of Laboratory Medicine and Pathology, Seattle, WA, USA 8) Mary Bridge/MultiCare, Tacoma, WA, USA 9) University of Washington, Department of Pediatrics, Seattle, WA, USA 10) Genetic Services, Kaiser Permanente Washington, Seattle, Washington, USA 11) Case Western Reserve University, Cleveland, OH 12) University of Washington School of Medicine, Department of Pharmacology, Seattle, WA, USA †) Corresponding author.

Mendelian disease, full-length transcripts, haplotype-resolved CpG methylation, chromatin accessibility

Resolving the molecular basis of a Mendelian condition (MC) remains challenging owing to the diverse mechanisms by which genetic variants cause disease. To address this, we developed a synchronized long-read genome, methylome, epigenome, and transcriptome sequencing approach, which enables accurate single-nucleotide, insertion-deletion, and structural variant calling and diploid de novo genome assembly, and permits the simultaneous elucidation of haplotype-resolved CpG methylation, chromatin accessibility, and full-length transcript information in a single long-read sequencing run. Application of this approach to an Undiagnosed Diseases Network (UDN) participant with a chromosome X;13 balanced translocation of uncertain significance revealed that this translocation disrupted the functioning of four separate genes (NBEA, PDK3, MAB21L1, and RB1) previously associated with single-gene MCs. Notably, the function of each gene was disrupted via a distinct mechanism that required integration of the four ‘omes’ to resolve. These included nonsense-mediated decay, fusion transcript formation, enhancer adoption, transcriptional readthrough silencing, and inappropriate X chromosome inactivation of autosomal genes. Overall, this highlights the utility of synchronized long-read multi-omic profiling for mechanistically resolving complex phenotypes.
Biorxiv  |  2023

Multi-omic profiling of pathogen-stimulated primary immune cells

Renee Salz, Emil E. Vorsteveld, Caspar I. van der Made, Simone Kersten, Merel Stemerdink, Tsung-han Hsieh, Musa Mhlanga, Mihai G. Netea, Pieter-Jan Volders, Alexander Hoischen, Peter A.C. ’t Hoen

Objectives: To perform long-read transcriptome and proteome profiling of pathogen-stimulated peripheral blood mononuclear cells (PBMCs) from healthy donors. We aim to discover new transcripts and protein isoforms expressed during immune responses to diverse pathogens. Methods: PBMCs were exposed to four microbial stimuli for 24 hours: the TLR4 ligand lipopolysaccharide (LPS), the TLR3 ligand Poly(I:C), heat-inactivated Staphylococcus aureus, Candida albicans, and RPMI medium as negative controls. Long-read sequencing (PacBio) of one donor and secretome proteomics and short-read sequencing of five donors were performed. IsoQuant was used for transcriptome construction, Metamorpheus/FlashLFQ for proteome analysis, and Illumina short-read 3’-end mRNA sequencing for transcript quantification. Results: Long-read transcriptome profiling reveals the expression of novel sequences and isoform switching induced upon pathogen stimulation, including transcripts that are difficult to detect using traditional short-read sequencing. We observe widespread loss of intron retention as a common result of all pathogen stimulations. We highlight novel transcripts of NFKB1 and CASP1 that may indicate novel immunological mechanisms. In general, RNA expression differences did not result in differences in the amounts of secreted proteins. Interindividual differences in the proteome were larger than the differences between stimulated and unstimulated PBMCs. Clustering analysis of secreted proteins revealed a correlation between chemokine (receptor) expression on the RNA and protein levels in C. albicans- and Poly(I:C)-stimulated PBMCs. Conclusion: Isoform-aware long-read sequencing of pathogen-stimulated immune cells highlights the potential of these methods to identify novel transcripts, revealing a more complex transcriptome landscape than previously appreciated.
Journal of Maternal-Fetal & Neonatal Medicine  |  2023

Identification of a novel 91.5 kb-deletion (αα)FJ in the α-globin gene cluster using single-molecule real-time (SMRT) sequencing

Liangpu Xu, Meihuan Chen, Junhao Zheng, Siwen Zhang, Min Zhang, Lingji Chen, et al.

Objectives: To present a novel 91.5-kb deletion of the α-globin gene cluster (αα)FJ identified by genetic assay and prenatal diagnosis in a Chinese family. Subjects and Methods: The proband was a 34-year-old G3P1 (Gravida 3, Para 1) female at the gestational age of 21+ weeks with a history of an edematous fetus. A routine genetic assay (reverse dot blot hybridization, RDB) was performed to detect common thalassemia mutations. Multiplex ligation-dependent probe amplification (MLPA) and single-molecule real-time technology (SMRT) were used to detect rare thalassemia mutations. Results: The hematological phenotypes of the proband, her mother, elder sister, husband, daughter, and nephew were consistent with the phenotype of α-thalassemia trait. No mutations were found in these family members by RDB, except for the proband’s husband, who carried an α-globin gene deletion --SEA/αα. MLPA results showed that the proband and other α-thalassemia-suspected relatives had heterozygous deletions around the POLR3K-3-463nt, HS40-178nt, and HBA-HS40-382nt probes. The 5′-breakpoint was out of probe scope and could not be determined. SMRT was performed and a 91.5-kb deletion (NC_000016.10: g.39268_130758del) in the α-globin gene cluster (αα)FJ was identified in the proband and other suspected relatives, which could explain their phenotypes. At the proband’s gestational age of 22+ weeks, an amniotic fluid sample was collected and analyzed. As only the 91.5-kb deletion (αα)FJ was identified in the fetus with RDB, MLPA, and SMRT, the proband was advised to continue the pregnancy. Conclusion: We first reported a 91.5-kb deletion (NC_000016.10: g.39268_130758del) of the HS-40 region in the α-globin gene cluster (αα)FJ identified in a Chinese family. Since the HS-40 loss of heterozygosity in combination with the heterozygous deletion --SEA might result in Hb Bart’s hydrops fetalis, routine genetic assay and SMRT were recommended for the prenatal diagnosis of individuals at risk.
Biorxiv  |  2023

The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars

Jarkko Salojärvi, Aditi Rambani, Zhe Yu, Romain Guyot, Susan Strickler, Maud Lepelley, Cui Wang, et al (see full list in article)

Coffea arabica, an allotetraploid hybrid of C. eugenioides and C. canephora, is the source of approximately 60% of coffee products worldwide. Cultivated accessions have undergone several population bottlenecks resulting in low genetic diversity. We present chromosome-level assemblies of a di-haploid C. arabica accession and modern representatives of its diploid progenitors, C. eugenioides and C. canephora. The three species exhibit largely conserved genome structures between diploid parents and descendant subgenomes, which show a mosaic pattern of dominance, similar to other polyploid crop species. Resequencing of 39 wild and cultivated accessions suggests a founding polyploidy event ∼610,000 years ago, followed by several subsequent bottlenecks, including a population split ∼30.5 kya and a period of migration between Arabica populations until ∼8.9 kya. Analysis of lines historically introgressed with C. canephora highlights loci that may contribute to their superior pathogen resistance and lays the groundwork for future genomics-based breeding of C. arabica.