July 19, 2019

Genome structural diversity among 31 Bordetella pertussis isolates from two recent U.S. whooping cough statewide epidemics

During 2010 and 2012, California and Vermont, respectively, experienced statewide epidemics of pertussis with differences seen in the demographic affected, case clinical presentation, and molecular epidemiology of the circulating strains. To overcome limitations of the current molecular typing methods for pertussis, we utilized whole-genome sequencing to gain a broader understanding of how current circulating strains are causing large epidemics. Through the use of combined next-generation sequencing technologies, this study compared de novo, single-contig genome assemblies from 31 out of 33 Bordetella pertussis isolates collected during two separate pertussis statewide epidemics and 2 resequenced vaccine strains. Final genome architecture assemblies were verified with whole-genome optical mapping. Sixteen distinct genome rearrangement profiles were observed in epidemic isolate genomes, all of which were distinct from the genome structures of the two resequenced vaccine strains. These rearrangements appear to be mediated by repetitive sequence elements, such as high-copy-number mobile genetic elements and rRNA operons. Additionally, novel and previously identified single nucleotide polymorphisms were detected in 10 virulence-related genes in the epidemic isolates. Whole-genome variation analysis identified state-specific variants, and coding regions bearing nonsynonymous mutations were classified into functional annotated orthologous groups. Comprehensive studies on whole genomes are needed to understand the resurgence of pertussis and develop novel tools to better characterize the molecular epidemiology of evolving B. pertussis populations.
IMPORTANCE Pertussis, or whooping cough, is the most poorly controlled vaccine-preventable bacterial disease in the United States, which has experienced a resurgence for more than a decade. Once viewed as a monomorphic pathogen, B. pertussis strains circulating during epidemics exhibit diversity visible on a genome structural level, previously undetectable by traditional sequence analysis using short-read technologies. For the first time, we combine short- and long-read sequencing platforms with restriction optical mapping for single-contig, de novo assembly of 31 isolates to investigate two geographically and temporally independent U.S. pertussis epidemics. These complete genomes reshape our understanding of B. pertussis evolution and strengthen molecular epidemiology toward one day understanding the resurgence of pertussis.


July 19, 2019

Single-molecule sequencing reveals complex genomic variation of hepatitis B virus during 15 years of chronic infection following liver transplantation.

Chronic hepatitis B (CHB) is prevalent worldwide. The infectious agent, hepatitis B virus (HBV), replicates via an RNA intermediate and is error-prone, leading to rapid generation of closely related but not identical viral variants, including those that can escape host immune responses and antiviral treatments. The complexity of CHB can be further enhanced by the presence of HBV variants with large deletions in the genome, generated via splicing (spHBV). Although spHBV variants are incapable of autonomous replication, their replication is rescued by wild-type HBV. SpHBV variants have been shown to enhance wild-type virus replication, and their prevalence increases with liver disease progression. Single-molecule deep sequencing was performed on whole HBV genomes extracted from longitudinal samples of a post-liver transplant CHB subject, collected over a 15-year period that included the liver explant. By employing novel bioinformatics methods, this analysis showed complex dynamics of the viral population across a period of changing treatment regimens. The spHBV detected in the liver explant remained present post-transplantation, along with emergence of a highly diverse novel spHBV population as well as variants with multiple deletions in the preS genes. The identification of novel mutations outside the HBV reverse transcriptase gene that co-occur with known drug resistant mutations highlights the relevance of using full genome deep sequencing and supports the hypothesis that drug resistance involves interactions across the full-length HBV genome.
Single-molecule sequencing allowed characterising, in unprecedented detail, the evolution of HBV populations and offered unique insights into the dynamics of defective and spHBV variants following liver transplantation and complex treatment regimes. This analysis also showed rapid adaptation of HBV populations to treatment regimens with evolving drug resistance phenotypes and evidence of purifying selection across the whole genome. Finally, the new open source bioinformatics tools are freely available, with the capacity to easily identify potential spliced variants from deep sequencing data. Copyright © 2016, American Society for Microbiology. All Rights Reserved.


July 19, 2019

AgIn: Measuring the landscape of CpG methylation of individual repetitive elements.

Determining the methylation state of regions with high copy numbers is challenging for second-generation sequencing, because the read length is insufficient to map reads uniquely, especially when repetitive regions are long and nearly identical to each other. Single-molecule real-time (SMRT) sequencing is a promising method for observing such regions, because it is not vulnerable to GC bias, it produces long read lengths, and its kinetic information is sensitive to DNA modifications. We propose a novel linear-time algorithm that combines the kinetic information for neighboring CpG sites and increases the confidence in identifying the methylation states of those sites. Using a practical read coverage of ~30-fold from an inbred medaka strain (Oryzias latipes), we observed that both the sensitivity and precision of our method on individual CpG sites were ~93.7%. We also observed a high correlation coefficient (R = 0.884) between our method and bisulfite sequencing, and for 92.0% of CpG sites, methylation levels ranging over [0, 1] were in concordance within an acceptable difference of 0.25. Using this method, we characterized the landscape of the methylation status of repetitive elements, such as LINEs, in the human genome, thereby revealing the strong correlation between CpG density and hypomethylation and detecting hypomethylation hot spots of LTRs and LINEs. We uncovered the methylation states for nearly identical active transposons, two novel LINE insertions of identity ~99% and length 6050 base pairs (bp) in the human genome, and 16 Tol2 elements of identity >99.8% and length 4682 bp in the medaka genome. AgIn (Aggregate on Intervals) is available at: https://github.com/hacone/AgIn CONTACT: ysuzuki@cb.k.u-tokyo.ac.jp, moris@cb.k.u-tokyo.ac.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. © The Author(s) 2016. Published by Oxford University Press.
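The concordance figure quoted in this abstract (the share of CpG sites whose two methylation estimates differ by no more than 0.25) can be illustrated with a minimal sketch. This is not the AgIn implementation; the function name `concordant_fraction`, the inclusive treatment of the 0.25 threshold, and the toy values are all assumptions made for illustration.

```python
def concordant_fraction(levels_a, levels_b, max_diff=0.25):
    """Fraction of sites whose two methylation levels differ by <= max_diff.

    levels_a and levels_b are per-site methylation levels in [0, 1],
    given in the same site order. The inclusive threshold is an assumption.
    """
    if len(levels_a) != len(levels_b):
        raise ValueError("both inputs must cover the same sites")
    if not levels_a:
        raise ValueError("need at least one site")
    agree = sum(1 for a, b in zip(levels_a, levels_b) if abs(a - b) <= max_diff)
    return agree / len(levels_a)


# Toy data (hypothetical, not taken from the study).
method_levels = [0.0, 0.5, 1.0, 0.9]
bisulfite_levels = [0.1, 0.9, 0.8, 0.2]
print(concordant_fraction(method_levels, bisulfite_levels))  # 0.5
```

In the toy data, two of the four sites differ by at most 0.25, so the function returns 0.5.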


July 19, 2019

Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum genome through long-read (>11 kb), single molecule, real-time sequencing.

The application of next-generation sequencing to estimate genetic diversity of Plasmodium falciparum, the most lethal malaria parasite, has proved challenging due to the skewed AT-richness [~80.6% (A + T)] of its genome and the lack of technology to assemble highly polymorphic subtelomeric regions that contain clonally variant, multigene virulence families (e.g., var and rifin). To address this, we performed amplification-free, single molecule, real-time sequencing of P. falciparum genomic DNA and generated reads of average length 12 kb, with 50% of the reads between 15.5 and 50 kb in length. Next, using the Hierarchical Genome Assembly Process, we assembled the P. falciparum genome de novo and successfully compiled all 14 nuclear chromosomes telomere-to-telomere. We also accurately resolved centromeres [~90-99% (A + T)] and subtelomeric regions and identified large insertions and duplications that add extra var and rifin genes to the genome, along with smaller structural variants such as homopolymer tract expansions. Overall, we show that amplification-free, long-read sequencing combined with de novo assembly overcomes major challenges inherent to studying the P. falciparum genome. Indeed, this technology may not only identify the polymorphic and repetitive subtelomeric sequences of parasite populations from endemic areas but may also evaluate structural variation linked to virulence, drug resistance and disease transmission. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
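The (A + T) percentages quoted in this abstract are simple base-composition fractions. A minimal sketch of how such a figure could be computed from a sequence string follows; the function name `at_fraction`, the decision to skip ambiguous bases such as N, and the example sequences are assumptions for illustration, not the authors' pipeline.

```python
def at_fraction(sequence):
    """Return the fraction of A and T among the unambiguous bases of a sequence.

    Bases other than A, C, G, T (for example N) are ignored; this handling
    of ambiguous bases is an assumption of the sketch.
    """
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at_count = sum(1 for base in bases if base in "AT")
    return at_count / len(bases)


# Toy sequences (hypothetical, not P. falciparum data).
print(at_fraction("AATTG"))   # 0.8
print(at_fraction("atNNgc"))  # 0.5
```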


July 19, 2019

Separate F-type plasmids have shaped the evolution of the H30 subclone of Escherichia coli sequence type 131.

The extraintestinal pathogenic Escherichia coli (ExPEC) H30 subclone of sequence type 131 (ST131-H30) has emerged abruptly as a dominant lineage of ExPEC responsible for human disease. The ST131-H30 lineage has been well described phylogenetically, yet its plasmid complement is not fully understood. Here, single-molecule, real-time sequencing was used to generate the complete plasmid sequences of ST131-H30 isolates and those belonging to other ST131 clades. Comparative analyses revealed separate F-type plasmids that have shaped the evolution of the main fluoroquinolone-resistant ST131-H30 clades. Specifically, an F1:A2:B20 plasmid is strongly associated with the H30R/C1 clade, whereas an F2:A1:B- plasmid is associated with the H30Rx/C2 clade. A series of plasmid gene losses, gains, and rearrangements involving IS26 likely led to the current plasmid complements within each ST131-H30 sublineage, which contain several overlapping gene clusters with putative functions in virulence and fitness, suggesting plasmid-mediated convergent evolution. Evidence suggests that the H30Rx/C2-associated F2:A1:B- plasmid type was present in strains ancestral to the acquisition of fluoroquinolone resistance and prior to the introduction of a multidrug resistance-encoding gene cassette harboring blaCTX-M-15. In vitro experiments indicated a host strain-independent low frequency of plasmid transfer, differential levels of plasmid stability even between closely related ST131-H30 strains, and possible epistasis for carriage of these plasmids within the H30R/Rx lineages.
IMPORTANCE A clonal lineage of Escherichia coli known as ST131 has emerged as a dominating strain type causing extraintestinal infections in humans. The evolutionary history of ST131 E. coli is now well understood. However, the role of plasmids in ST131’s evolutionary history is poorly defined. This study utilized real-time, single-molecule sequencing to compare plasmids from various current and historical lineages of ST131. From this work, it was determined that a series of plasmid gains, losses, and recombinational events has led to the currently circulating plasmids of ST131 strains. These plasmids appear to have evolved to acquire similar gene clusters on multiple occasions, suggesting possible plasmid-mediated convergent evolution leading to evolutionary success. These plasmids also appear to be better suited to exist in specific strains of ST131 due to coadaptive mutations. Overall, a series of events has enabled the evolution of ST131 plasmids, possibly contributing to the lineage’s success.


July 19, 2019

High-quality assembly of an individual of Yoruban descent

De novo assembly of human genomes is now a tractable effort due in part to advances in sequencing and mapping technologies. We use PacBio single-molecule, real-time (SMRT) sequencing and BioNano genomic maps to construct the first de novo assembly of NA19240, a Yoruban individual from Africa. This chromosome-scaffolded assembly of 3.08 Gb with a contig N50 of 7.25 Mb and a scaffold N50 of 78.6 Mb represents one of the most contiguous high-quality human genomes. We utilize a BAC library derived from NA19240 DNA and novel haplotype-resolving sequencing technologies and algorithms to characterize regions of complex genomic architecture that are normally lost due to compression to a linear haploid assembly. Our results demonstrate that multiple technologies are still necessary for complete genomic representation, particularly in regions of highly identical segmental duplications. Additionally, we show that diploid assembly has utility in improving the quality of de novo human genome assemblies.
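The contig and scaffold N50 values quoted in this abstract follow the usual definition: the length L such that sequences of length L or longer contain at least half of the assembled bases. A minimal sketch, assuming that definition, is shown here; the function name `n50` and the toy lengths are illustrative and do not come from the NA19240 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths (hypothetical): total 30, half is 15.
# Longest first: 10 (running 10), then 6 (running 16 >= 15), so N50 is 6.
print(n50([2, 3, 4, 5, 6, 10]))  # 6
```

The same function applies to phased-block or scaffold lengths, since the statistic depends only on the list of lengths supplied.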


July 19, 2019

Living apart together: crosstalk between the core and supernumerary genomes in a fungal plant pathogen.

Eukaryotes display remarkable genome plasticity, which can include supernumerary chromosomes that differ markedly from the core chromosomes. Despite the widespread occurrence of supernumerary chromosomes in fungi, their origin, relation to the core genome and the reason for their divergent characteristics are still largely unknown. The complexity of genome assembly due to the presence of repetitive DNA partially accounts for this. Here we use single-molecule real-time (SMRT) sequencing to assemble the genome of a prominent fungal wheat pathogen, Fusarium poae, including at least one supernumerary chromosome. The core genome contains limited transposable elements (TEs) and no gene duplications, while the supernumerary genome holds up to 25% TEs and multiple gene duplications. The core genome shows all hallmarks of repeat-induced point mutation (RIP), a defense mechanism against TEs, specific for fungi. The absence of RIP on the supernumerary genome accounts for the differences between the two (sub)genomes, and results in a functional crosstalk between them. The supernumerary genome is a reservoir for TEs that migrate to the core genome, and even large blocks of supernumerary sequence (>200 kb) have recently translocated to the core. Vice versa, the supernumerary genome acts as a refuge for genes that are duplicated from the core genome. For the first time, a mechanism was determined that explains the differences that exist between the core and supernumerary genome in fungi. Different biology rather than origin was shown to be responsible. A “living apart together” crosstalk exists between the core and supernumerary genome, accelerating chromosomal and organismal evolution.


July 19, 2019

A distinct class of chromoanagenesis events characterized by focal copy number gains.

Chromoanagenesis is the process by which a single catastrophic event creates complex rearrangements confined to a single or a few chromosomes. It is usually characterized by the presence of multiple deletions and/or duplications, as well as by copy neutral rearrangements. In contrast, an array CGH screen of patients with developmental anomalies revealed three patients in whom a single chromosome carries from 8 to 11 large copy number gains confined to a single chromosome or chromosomal arm, with no deletions. Subsequent fluorescence in situ hybridization and massively parallel sequencing revealed the duplicons to be clustered together in distinct locations across the altered chromosomes. Breakpoint junction sequences showed both microhomology and non-templated insertions of up to 40 bp. Hence, these patients each demonstrate a single altered chromosome of clustered insertional duplications, no deletions, and breakpoint junction sequences showing microhomology and/or non-templated insertions. These observations are difficult to reconcile with current mechanistic descriptions of chromothripsis and chromoanasynthesis. Therefore, we hypothesize those rearrangements to be of a mechanistically different origin. In addition, we suggest that large untemplated insertional sequences observed at breakpoints are driven by a non-canonical non-homologous end joining mechanism. © 2016 Wiley Periodicals, Inc.


July 19, 2019

Condition-dependent co-regulation of genomic clusters of virulence factors in the grapevine trunk pathogen Neofusicoccum parvum.

The ascomycete Neofusicoccum parvum, one of the causal agents of Botryosphaeria dieback, is a destructive wood-infecting fungus and a serious threat to grape production worldwide. The capability to colonize woody tissue, combined with the secretion of phytotoxic compounds, is thought to underlie its pathogenicity and virulence. Here, we describe the repertoire of virulence factors and their transcriptional dynamics as the fungus feeds on different substrates and colonizes the woody stem. We assembled and annotated a highly contiguous genome using single-molecule real-time DNA sequencing. Transcriptome profiling by RNA sequencing determined the genome-wide patterns of expression of virulence factors both in vitro (potato dextrose agar or medium amended with grape wood as substrate) and in planta. Pairwise statistical testing of differential expression, followed by co-expression network analysis, revealed that physically clustered genes coding for putative virulence functions were induced depending on the substrate or stage of plant infection. Co-expressed gene clusters were significantly enriched not only in genes associated with secondary metabolism, but also in those associated with cell wall degradation, suggesting that dynamic co-regulation of transcriptional networks contributes to multiple aspects of N. parvum virulence. In most of the co-expressed clusters, all genes shared at least one common motif in their promoter region, indicative of co-regulation by the same transcription factor. Co-expression analysis also identified chromatin regulators whose expression correlated with that of inducible clusters of virulence factors, suggesting a complex, multi-layered regulation of the virulence repertoire of N. parvum. © 2016 BSPP and John Wiley & Sons Ltd.


July 19, 2019

De novo assembly and phasing of a Korean human genome.

Advances in genome assembly and phasing provide an opportunity to investigate the diploid architecture of the human genome and reveal the full range of structural variation across population groups. Here we report the de novo assembly and haplotype phasing of the Korean individual AK1 (ref. 1) using single-molecule real-time sequencing, next-generation mapping, microfluidics-based linked reads, and bacterial artificial chromosome (BAC) sequencing approaches. Single-molecule sequencing coupled with next-generation mapping generated a highly contiguous assembly, with a contig N50 size of 17.9 Mb and a scaffold N50 size of 44.8 Mb, resolving 8 chromosomal arms into single scaffolds. The de novo assembly, along with local assemblies and spanning long reads, closes 105 and extends into 72 out of 190 euchromatic gaps in the reference genome, adding 1.03 Mb of previously intractable sequence. High concordance between the assembly and paired-end sequences from 62,758 BAC clones provides strong support for the robustness of the assembly. We identify 18,210 structural variants by direct comparison of the assembly with the human reference, identifying thousands of breakpoints that, to our knowledge, have not been reported before. Many of the insertions are reflected in the transcriptome and are shared across the Asian population. We performed haplotype phasing of the assembly with short reads, long reads and linked reads from whole-genome sequencing and with short reads from 31,719 BAC clones, thereby achieving phased blocks with an N50 size of 11.6 Mb. Haplotigs assembled from single-molecule real-time reads assigned to haplotypes on phased blocks covered 89% of genes. The haplotigs accurately characterized the hypervariable major histocompatibility complex region and demonstrated allele configuration in clinically relevant genes such as CYP2D6. This work presents the most contiguous diploid human genome assembly so far, with extensive investigation of unreported and Asian-specific structural variants, and high-quality haplotyping of clinically relevant alleles for precision medicine.


July 19, 2019

Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules.

De novo sequencing of complex genomes is one of the main challenges for researchers seeking high-quality reference sequences. Many de novo assemblies are based on short reads, producing fragmented genome sequences. Third-generation sequencing, with read lengths >10 kb, will improve the assembly of complex genomes, but these techniques require high-molecular-weight genomic DNA (gDNA), and gDNA extraction protocols used for obtaining smaller fragments for short-read sequencing are not suitable for this purpose. Methods of preparing gDNA for bacterial artificial chromosome (BAC) libraries could be adapted, but these approaches are time-consuming, and commercial kits for these methods are expensive. Here, we present a protocol for rapid, inexpensive extraction of high-molecular-weight gDNA from bacteria, plants, and animals. Our technique was validated using sunflower leaf samples, producing a mean read length of 12.6 kb and a maximum read length of 80 kb.


July 19, 2019

Multiple origins of the pathogenic yeast Candida orthopsilosis by separate hybridizations between two parental species.

Mating between different species produces hybrids that are usually asexual and stuck as diploids, but can also lead to the formation of new species. Here, we report the genome sequences of 27 isolates of the pathogenic yeast Candida orthopsilosis. We find that most isolates are diploid hybrids, products of mating between two unknown parental species (A and B) that are 5% divergent in sequence. Isolates vary greatly in the extent of homogenization between A and B, making their genomes a mosaic of highly heterozygous regions interspersed with homozygous regions. Separate phylogenetic analyses of SNPs in the A- and B-derived portions of the genome produce almost identical trees of the isolates with four major clades. However, the presence of two mutually exclusive genotype combinations at the mating type locus, and recombinant mitochondrial genomes diagnostic of inter-clade mating, show that the species C. orthopsilosis does not have a single evolutionary origin but was created at least four times by separate interspecies hybridizations between parents A and B. Older hybrids have lost more heterozygosity. We also identify two isolates with homozygous genomes derived exclusively from parent A, which are pure non-hybrid strains. The parallel emergence of the same hybrid species from multiple independent hybridization events is common in plant evolution, but is much less documented in pathogenic fungi.


July 19, 2019

Full-length mitochondrial-DNA sequencing on the PacBio RSII.

Conventional mitochondrial-DNA (MT DNA) sequencing approaches use Sanger sequencing of 20-40 partially overlapping PCR fragments per individual, which is a time- and resource-consuming process. We have developed a high-throughput, accurate, fast, and cost-effective human MT DNA sequencing approach. In this setup we first generate long-range PCR products for two partially overlapping 7.7 and 9.2 kb MT DNA-specific amplicons, add sample-specific barcodes, and sequence these on the PacBio RSII system to obtain full-length MT DNA sequences for genotyping/haplotyping purposes.


July 19, 2019

Single-molecule sequencing revealing the presence of distinct JC polyomavirus populations in patients with progressive multifocal leukoencephalopathy.

Progressive multifocal leukoencephalopathy (PML) is a fatal disease caused by reactivation of JC polyomavirus (JCPyV) in immunosuppressed individuals and lytic infection by neurotropic JCPyV in glial cells. The exact content of neurotropic mutations within individual JCPyV strains has not been studied to our knowledge. We exploited the capacity of single-molecule real-time sequencing technology to determine the sequence of complete JCPyV genomes in single reads. The method was used to precisely characterize individual neurotropic JCPyV strains of 3 patients with PML without the bias caused by assembly of short sequence reads. In the cerebrospinal fluid sample of a 73-year-old woman with rapid PML onset, 3 distinct JCPyV populations could be identified. All viral populations were characterized by rearrangements within the noncoding regulatory region (NCCR) and 1 point mutation, S267L in the VP1 gene, suggestive of neurotropic strains. One patient with PML had a single neurotropic strain with rearranged NCCR, and 1 patient had a single strain with small NCCR alterations. We report here, for the first time, full characterization of individual neurotropic JCPyV strains in the cerebrospinal fluid of patients with PML. It remains to be established whether PML pathogenesis is driven by one or several neurotropic strains in an individual.


July 19, 2019

Targeted capture and sequencing of gene-sized DNA molecules.

Targeted capture provides an efficient and sensitive means for sequencing specific genomic regions in a high-throughput manner. To date, this method has mostly been used to capture exons from the genome (the exome) using short insert libraries and short-read sequencing technology, enabling the identification of genetic variants or new members of large gene families. Sequencing larger molecules results in the capture of whole genes, including intronic and intergenic sequences that are typically more polymorphic and allow the resolution of the gene structure of homologous genes, which are often clustered together on the chromosome. Here, we describe an improved method for the capture and single-molecule sequencing of DNA molecules as large as 7 kb by means of size selection and optimized PCR conditions. Our approach can be used to capture, sequence, and distinguish between similar members of the NB-LRR gene family, key genes in plant immune systems.

