July 19, 2019

Single molecule real-time sequencing of Xanthomonas oryzae genomes reveals a dynamic structure and complex TAL (transcription activator-like) effector gene relationships.

Pathogen-injected, direct transcriptional activators of host genes, TAL (transcription activator-like) effectors play determinative roles in plant diseases caused by Xanthomonas spp. A large domain of nearly identical, 33-35 aa repeats in each protein mediates DNA recognition. This modularity makes TAL effectors customizable and thus important also in biotechnology. However, the repeats render TAL effector (tal) genes nearly impossible to assemble using next-generation, short reads. Here, we demonstrate that long-read, single molecule real-time (SMRT) sequencing solves this problem. Taking an ensemble approach to first generate local, tal gene contigs, we correctly assembled de novo the genomes of two strains of the rice pathogen X. oryzae completed previously using the Sanger method and even identified errors in those references. Sequencing two more strains revealed a dynamic genome structure and a striking plasticity in tal gene content. Our results pave the way for population-level studies to inform resistance breeding, improve biotechnology and probe TAL effector evolution.


July 19, 2019

Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum.

Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a ‘near-complete’ draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.


July 19, 2019

Long-read Single-Molecule Real-Time (SMRT) full gene sequencing of cytochrome P450-2D6 (CYP2D6).

The CYP2D6 enzyme metabolizes ~25% of common medications, yet homologous pseudogenes and copy-number variants (CNVs) make interrogating the polymorphic CYP2D6 gene with short-read sequencing challenging. Therefore, we developed a novel long-read, full gene CYP2D6 single-molecule real-time (SMRT) sequencing method using the Pacific Biosciences platform. Long-range PCR and CYP2D6 SMRT sequencing of 10 previously genotyped controls identified expected star (*) alleles, but also enabled suballele resolution, diplotype refinement, and discovery of novel alleles. Coupled with an optimized variant calling pipeline, CYP2D6 SMRT sequencing was highly reproducible as triplicate intra- and inter-run non-reference genotype results were completely concordant. Importantly, targeted SMRT sequencing of upstream and downstream CYP2D6 gene copies characterized the duplicated allele in 15 control samples with CYP2D6 CNVs. The utility of CYP2D6 SMRT sequencing was further underscored by identifying the diplotypes of 14 samples with discordant or unclear CYP2D6 configurations from previous targeted genotyping, which again included suballele resolution, duplicated allele characterization, and discovery of a novel allele and tandem arrangement (CYP2D6*36+*41). Taken together, long-read CYP2D6 SMRT sequencing is an innovative, reproducible, and validated method for full-gene characterization, duplication allele-specific analysis and novel allele discovery, which will likely improve CYP2D6 metabolizer phenotype prediction for both research and clinical testing applications. This article is protected by copyright. All rights reserved.


July 19, 2019

The power of Single Molecule Real-Time sequencing technology in the de novo assembly of a eukaryotic genome.

Second-generation sequencers (SGS) have been game-changing, achieving cost-effective whole genome sequencing in many non-model organisms. However, a large portion of these genomes still remains unassembled. We reconstructed the azuki bean (Vigna angularis) genome using single molecule real-time (SMRT) sequencing technology and achieved the best contiguity and coverage among currently assembled legume crops. The SMRT-based assembly produced 100-fold longer contigs with a 100-fold smaller amount of gaps compared to the SGS-based assemblies. A detailed comparison between the assemblies revealed that the SMRT-based assembly enabled a more comprehensive gene annotation than the SGS-based assemblies, in which thousands of genes were missing or fragmented. A chromosome-scale assembly was generated based on the high-density genetic map, covering 86% of the azuki bean genome. We demonstrated that SMRT technology, though it still needed the support of SGS data, achieved a near-complete assembly of a eukaryotic genome.


July 19, 2019

DNA methylation assessed by SMRT Sequencing is linked to mutations in Neisseria meningitidis isolates.

The Gram-negative bacterium Neisseria meningitidis features extensive genetic variability. To date, proposed virulence genotypes are also detected in isolates from asymptomatic carriers, indicating more complex mechanisms underlying variable colonization modes of N. meningitidis. We applied the Single Molecule, Real-Time (SMRT) sequencing method from Pacific Biosciences to assess the genome-wide DNA modification profiles of two genetically related N. meningitidis strains, both of serogroup A. The resulting DNA methylomes revealed clear divergences, represented by the detection of shared and of strain-specific DNA methylation target motifs. The positional distribution of these methylated target sites within the genomic sequences displayed clear biases, which suggest a functional role of DNA methylation related to the regulation of genes. DNA methylation in N. meningitidis has a likely underestimated potential for variability, as evidenced by a careful analysis of the ORF status of a panel of confirmed and predicted DNA methyltransferase genes in an extended collection of N. meningitidis strains of serogroup A. Based on high coverage short sequence reads, we find phase variability as a major contributor to the variability in DNA methylation. Taking into account the phase variable loci, the inferred functional status of DNA methyltransferase genes matched the observed methylation profiles. Towards an elucidation of presently incompletely characterized functional consequences of DNA methylation in N. meningitidis, we reveal a prominent colocalization of methylated bases with Single Nucleotide Polymorphisms (SNPs) detected within our genomic sequence collection. As a novel observation we report increased mutability also at 6mA methylated nucleotides, complementing mutational hotspots previously described at 5mC methylated nucleotides. These findings suggest a more diverse role of DNA methylation and Restriction-Modification (RM) systems in the evolution of prokaryotic genomes.


July 19, 2019

Quantifying influenza virus diversity and transmission in humans.

Influenza A virus is characterized by high genetic diversity. However, most of what is known about influenza evolution has come from consensus sequences sampled at the epidemiological scale that only represent the dominant virus lineage within each infected host. Less is known about the extent of within-host virus diversity and what proportion of this diversity is transmitted between individuals. To characterize virus variants that achieve sustainable transmission in new hosts, we examined within-host virus genetic diversity in household donor-recipient pairs from the first wave of the 2009 H1N1 pandemic when seasonal H3N2 was co-circulating. Although the same variants were found in multiple members of the community, the relative frequencies of variants fluctuated, with patterns of genetic variation more similar within than between households. We estimated the effective population size of influenza A virus across donor-recipient pairs to be approximately 100-200 contributing members, which enabled the transmission of multiple lineages, including antigenic variants.


July 19, 2019

Major improvements to the Heliconius melpomene genome assembly used to confirm 10 chromosome fusion events in 6 million years of butterfly evolution.

The Heliconius butterflies are a widely studied adaptive radiation of 46 species spread across Central and South America, several of which are known to hybridize in the wild. Here, we present a substantially improved assembly of the Heliconius melpomene genome, developed using novel methods that should be applicable to improving other genome assemblies produced using short read sequencing. First, we whole-genome-sequenced a pedigree to produce a linkage map incorporating 99% of the genome. Second, we incorporated haplotype scaffolds extensively to produce a more complete haploid version of the draft genome. Third, we incorporated ~20x coverage of Pacific Biosciences sequencing, and scaffolded the haploid genome using an assembly of this long-read sequence. These improvements result in a genome of 795 scaffolds, 275 Mb in length, with an N50 length of 2.1 Mb, an N50 number of 34, and with 99% of the genome placed, and 84% anchored on chromosomes. We use the new genome assembly to confirm that the Heliconius genome underwent 10 chromosome fusions since the split with its sister genus Eueides, over a period of about 6 million yr. Copyright © 2016 Davey et al.
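The abstract above reports both an N50 length (2.1 Mb) and an N50 number (34). As a minimal sketch of how these two statistics are conventionally derived from a list of scaffold lengths, the Python below uses a hypothetical helper, n50_stats, and toy lengths; neither comes from the paper, and the authors' actual tooling is not described here.

```python
def n50_stats(lengths):
    """Return (N50 length, N50 number) for a list of contig or scaffold lengths.

    N50 length: length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half the total length.
    N50 number: how many sequences are in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy scaffold lengths (illustrative only, not data from the assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50_length, n50_number = n50_stats(toy_lengths)
print(n50_length, n50_number)  # prints "70 2"
```

With the toy data the total is 300, so the two longest scaffolds (80 + 70 = 150) already reach half of it; the N50 length is therefore 70 and the N50 number is 2.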


July 19, 2019

The complete genome sequence of the murine pathobiont Helicobacter typhlonius.

Immuno-compromised mice infected with Helicobacter typhlonius are used to model microbially induced inflammatory bowel disease (IBD). The specific mechanism through which H. typhlonius induces and promotes IBD is not fully understood. Access to the genome sequence is essential to examine emergent properties of this organism, such as its pathogenicity. To this end, we present the complete genome sequence of H. typhlonius MIT 97-6810, obtained through single-molecule real-time sequencing. The genome was assembled into a single circularized contig measuring 1.92 Mbp with an average GC content of 38.8%. In total 2,117 protein-encoding genes and 43 RNA genes were identified. Numerous pathogenic features were found, including a putative pathogenicity island (PAI) containing components of a type IV secretion system, virulence-associated proteins and cag PAI protein. We compared the genome of H. typhlonius to those of the murine pathobiont H. hepaticus and human pathobiont H. pylori. H. typhlonius resembles H. hepaticus most, with 1,594 (75.3%) of its genes being orthologous to genes in H. hepaticus. Determination of the global methylation state revealed eight distinct recognition motifs for adenine and cytosine methylation. H. typhlonius shares four of its recognition motifs with H. pylori. The complete genome sequence of H. typhlonius MIT 97-6810 enabled us to identify many pathogenic features suggesting that H. typhlonius can act as a pathogen. Follow-up studies are necessary to evaluate the true nature of its pathogenic capabilities. We found many methylated sites and a plethora of restriction-modification systems. The genome, together with the methylome, will provide an essential resource for future studies investigating gene regulation, host interaction and pathogenicity of H. typhlonius. In turn, this work can contribute to unraveling the role of Helicobacter in enteric disease.


July 19, 2019

A role for the bacterial GATC methylome in antibiotic stress survival.

Antibiotic resistance is an increasingly serious public health threat. Understanding pathways allowing bacteria to survive antibiotic stress may unveil new therapeutic targets. We explore the role of the bacterial epigenome in antibiotic stress survival using classical genetic tools and single-molecule real-time sequencing to characterize genomic methylation kinetics. We find that Escherichia coli survival under antibiotic pressure is severely compromised without adenine methylation at GATC sites. Although the adenine methylome remains stable during drug stress, without GATC methylation, methyl-dependent mismatch repair (MMR) is deleterious and, fueled by the drug-induced error-prone polymerase Pol IV, overwhelms cells with toxic DNA breaks. In multiple E. coli strains, including pathogenic and drug-resistant clinical isolates, DNA adenine methyltransferase deficiency potentiates antibiotics from the β-lactam and quinolone classes. This work indicates that the GATC methylome provides structural support for bacterial survival during antibiotic stress and suggests targeting bacterial DNA methylation as a viable approach to enhancing antibiotic activity.


July 19, 2019

A method for near full-length amplification and sequencing for six hepatitis C virus genotypes.

Hepatitis C virus (HCV) is a rapidly evolving RNA virus that has been classified into seven genotypes. All HCV genotypes cause chronic hepatitis, which ultimately leads to liver diseases such as cirrhosis. The genotypes are unevenly distributed across the globe, with genotypes 1 and 3 being the most prevalent. Until recently, molecular epidemiological studies of HCV evolution within the host and at the population level have been limited to the analyses of partial viral genome segments, as it has been technically challenging to amplify and sequence the full length of the 9.6 kb HCV genome. Although recent improvements have been made in full genome sequencing methodologies, these protocols are still either limited to a specific genotype or cost-inefficient. In this study we describe a genotype-specific protocol for the amplification and sequencing of the near-full length genome of all six major HCV genotypes. We applied this protocol to 122 HCV positive clinical samples, and had a successful genome amplification rate of 90% when the viral load was greater than 15,000 IU/ml. The assay was shown to have a detection limit of 1-3 cDNA copies per reaction. The method was tested with both Illumina and PacBio single molecule, real-time (SMRT) sequencing technologies. Illumina sequencing resulted in deep coverage and allowed detection of rare variants as well as HCV co-infection with multiple genotypes. The application of the method with PacBio RS resulted in sequence reads greater than 9 kb that covered the near full-length HCV amplicon in a single read and enabled analysis of the near full-length quasispecies. The protocol described herein can be utilised for rapid amplification and sequencing of the near-full length HCV genome in a cost-efficient manner suitable for a wide range of applications.


July 19, 2019

DNA methylation on N(6)-adenine in mammalian embryonic stem cells.

It has been widely accepted that 5-methylcytosine is the only form of DNA methylation in mammalian genomes. Here we identify N(6)-methyladenine as another form of DNA modification in mouse embryonic stem cells. Alkbh1 encodes a demethylase for N(6)-methyladenine. An increase of N(6)-methyladenine levels in Alkbh1-deficient cells leads to transcriptional silencing. N(6)-methyladenine deposition is inversely correlated with the evolutionary age of LINE-1 transposons; its deposition is strongly enriched at young (<1.5 million years old) but not old (>6 million years old) L1 elements. The deposition of N(6)-methyladenine correlates with epigenetic silencing of such LINE-1 transposons, together with their neighbouring enhancers and genes, thereby resisting the gene activation signals during embryonic stem cell differentiation. As young full-length LINE-1 transposons are strongly enriched on the X chromosome, genes located on the X chromosome are also silenced. Thus, N(6)-methyladenine developed a new role in epigenetic silencing in mammalian evolution distinct from its role in gene activation in other organisms. Our results demonstrate that N(6)-methyladenine constitutes a crucial component of the epigenetic regulation repertoire in mammalian genomes.


July 19, 2019

Long-read sequence assembly of the gorilla genome.

Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome. Copyright © 2016, American Association for the Advancement of Science.


July 19, 2019

Genome structural diversity among 31 Bordetella pertussis isolates from two recent U.S. whooping cough statewide epidemics

During 2010 and 2012, California and Vermont, respectively, experienced statewide epidemics of pertussis with differences seen in the demographic affected, case clinical presentation, and molecular epidemiology of the circulating strains. To overcome limitations of the current molecular typing methods for pertussis, we utilized whole-genome sequencing to gain a broader understanding of how current circulating strains are causing large epidemics. Through the use of combined next-generation sequencing technologies, this study compared de novo, single-contig genome assemblies from 31 out of 33 Bordetella pertussis isolates collected during two separate pertussis statewide epidemics and 2 resequenced vaccine strains. Final genome architecture assemblies were verified with whole-genome optical mapping. Sixteen distinct genome rearrangement profiles were observed in epidemic isolate genomes, all of which were distinct from the genome structures of the two resequenced vaccine strains. These rearrangements appear to be mediated by repetitive sequence elements, such as high-copy-number mobile genetic elements and rRNA operons. Additionally, novel and previously identified single nucleotide polymorphisms were detected in 10 virulence-related genes in the epidemic isolates. Whole-genome variation analysis identified state-specific variants, and coding regions bearing nonsynonymous mutations were classified into functional annotated orthologous groups. Comprehensive studies on whole genomes are needed to understand the resurgence of pertussis and develop novel tools to better characterize the molecular epidemiology of evolving B. pertussis populations.

IMPORTANCE Pertussis, or whooping cough, is the most poorly controlled vaccine-preventable bacterial disease in the United States, which has experienced a resurgence for more than a decade. Once viewed as a monomorphic pathogen, B. pertussis strains circulating during epidemics exhibit diversity visible on a genome structural level, previously undetectable by traditional sequence analysis using short-read technologies. For the first time, we combine short- and long-read sequencing platforms with restriction optical mapping for single-contig, de novo assembly of 31 isolates to investigate two geographically and temporally independent U.S. pertussis epidemics. These complete genomes reshape our understanding of B. pertussis evolution and strengthen molecular epidemiology toward one day understanding the resurgence of pertussis.


July 19, 2019

Large deletions at the SHOX locus in the pseudoautosomal region are associated with skeletal atavism in Shetland ponies.

Skeletal atavism in Shetland ponies is a heritable disorder characterized by abnormal growth of the ulna and fibula that extend the carpal and tarsal joints, respectively. This causes abnormal skeletal structure and impaired movement, and affected foals are usually euthanized. In order to identify the causal mutation we subjected six confirmed Swedish cases and a DNA pool consisting of 21 control individuals to whole genome resequencing. We screened for polymorphisms where the cases and the control pool were fixed for opposite alleles and observed this signature for only 25 SNPs, most of which were scattered on genome assembly unassigned scaffolds. Read depth analysis at these loci revealed homozygosity or compound heterozygosity for two partially overlapping large deletions in the pseudoautosomal region (PAR) of chromosome X/Y in cases but not in the control pool. One of these deletions removes the entire coding region of the SHOX gene and both deletions remove parts of the CRLF2 gene located downstream of SHOX. The horse reference assembly of the PAR is highly fragmented, and in order to characterize this region we sequenced bacterial artificial chromosome (BAC) clones by single-molecule real-time (SMRT) sequencing technology. This considerably improved the assembly and enabled size estimations of the two deletions to 160-180 kb and 60-80 kb, respectively. Complete association between the presence of these deletions and disease status was verified in eight other affected horses. The result of the present study is consistent with previous studies in humans showing crucial importance of SHOX for normal skeletal development. Copyright © 2016 Author et al.


July 19, 2019

Single-molecule sequencing reveals complex genomic variation of hepatitis B virus during 15 years of chronic infection following liver transplantation.

Chronic hepatitis B (CHB) is prevalent worldwide. The infectious agent, hepatitis B virus (HBV), replicates via an RNA intermediate and is error-prone, leading to rapid generation of closely related but not identical viral variants, including those that can escape host immune responses and antiviral treatments. The complexity of CHB can be further enhanced by the presence of HBV variants with large deletions in the genome, generated via splicing (spHBV). Although spHBV variants are incapable of autonomous replication, their replication is rescued by wild-type HBV. SpHBV variants have been shown to enhance wild-type virus replication, and their prevalence increases with liver disease progression. Single-molecule deep sequencing was performed on whole HBV genomes extracted from longitudinal samples of a post-liver transplant CHB subject, collected over a 15-year period that included the liver explant. By employing novel bioinformatics methods, this analysis showed complex dynamics of the viral population across a period of changing treatment regimens. The spHBV detected in the liver explant remained present post-transplantation, along with emergence of a highly diverse novel spHBV population as well as variants with multiple deletions in the preS genes. The identification of novel mutations outside the HBV reverse transcriptase gene that co-occur with known drug resistant mutations highlights the relevance of using full genome deep sequencing and supports the hypothesis that drug resistance involves interactions across the full-length HBV genome. Single-molecule sequencing allowed characterising, in unprecedented detail, the evolution of HBV populations and offered unique insights into the dynamics of defective and spHBV variants following liver transplantation and complex treatment regimes. This analysis also showed rapid adaptation of HBV populations to treatment regimens with evolving drug resistance phenotypes and evidence of purifying selection across the whole genome. Finally, the new open source bioinformatics tools are freely available, with the capacity to easily identify potential spliced variants from deep sequencing data. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

