Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina’s MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3-V5, V1-V3, V1-V5, V1-V6, and V1-V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1-V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina’s MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting.
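As an illustration of how an observed error rate can be measured against a mock-community reference, the following minimal Python sketch counts the mismatched columns in a pairwise alignment. The function name `error_rate`, the gap symbol `-`, and the toy sequences are assumptions made for illustration; the abstract does not specify the authors' alignment or counting procedure.

```python
def error_rate(aligned_read: str, aligned_ref: str) -> float:
    """Fraction of alignment columns where the read differs from the reference.

    Both inputs are assumed to be rows of the same pairwise alignment,
    with '-' marking a gap. Substitutions, insertions, and deletions
    each count as one error per column.
    """
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both rows (possible in slices of a
    # multiple alignment); they carry no information about errors.
    columns = [
        (a, b)
        for a, b in zip(aligned_read.upper(), aligned_ref.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no informative alignment columns")
    errors = sum(1 for a, b in columns if a != b)
    return errors / len(columns)


# Toy example (hypothetical sequences): one substitution and one deletion
# across six alignment columns.
toy_rate = error_rate("ACGT-A", "ACCTTA")

# Fold reduction implied by the error rates reported for the V1-V9 region.
fold_reduction = 0.69 / 0.027
```

With the reported values, `fold_reduction` comes to roughly 26, which is the size of the improvement attributed to curation.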
The advantages of Pacific Biosciences (PacBio) single-molecule real-time (SMRT) technology include long reads, low systematic bias, and high consensus read accuracy. Here we use these attributes to improve on the genome annotation of the parasitic hookworm Ancylostoma ceylanicum using PacBio RNA-Seq. We sequenced 192,888 circular consensus sequences (CCS) derived from cDNAs generated using the Clontech SMARTer system. These SMARTer-SMRT libraries were normalized and size-selected, providing a robust population of expressed structural genes for subsequent genome annotation. We demonstrate that genome annotation based on PacBio mRNA sequences improves on annotation using conventional sequencing-by-synthesis alone, identifying 1609 (9.2%) new genes, extending the length of 3965 (26.7%) genes, and increasing the total genomic exon length by 1.9 Mb (12.4%). Non-coding sequence representation (primarily from UTRs based on dT reverse transcription priming) was particularly improved, increasing in total length by fifteen-fold through increases in both the length and number of UTR exons. In addition, the UTR data provided by these CCS allowed for the identification of a novel SL2 splice leader sequence for A. ceylanicum and an increase in the number and proportion of functionally annotated genes. RNA-seq data also confirmed some of the newly annotated genes and gene features. Overall, PacBio data has supported a significant improvement in gene annotation in this genome, and is an appealing alternative or complementary technique for genome annotation to the other transcript sequencing technologies.
Nearly finished genomes produced using gel microdroplet culturing reveal substantial intraspecies genomic diversity within the human microbiome.
The majority of microbial genomic diversity remains unexplored. This is largely due to our inability to culture most microorganisms in isolation, which is a prerequisite for traditional genome sequencing. Single-cell sequencing has allowed researchers to circumvent this limitation. DNA is amplified directly from a single cell using the whole-genome amplification technique of multiple displacement amplification (MDA). However, MDA from a single chromosome copy suffers from amplification bias and a large loss of specificity from even very small amounts of DNA contamination, which makes assembling a genome difficult and completely finishing a genome impossible except in extraordinary circumstances. Gel microdrop cultivation allows culturing of a diverse microbial community and provides hundreds to thousands of genetically identical cells as input for an MDA reaction. We demonstrate the utility of this approach by comparing sequencing results of gel microdroplets and single cells following MDA. Bias is reduced in the MDA reaction and genome sequencing, and assembly is greatly improved when using gel microdroplets. We acquired multiple near-complete genomes for two bacterial species from human oral and stool microbiome samples. A significant amount of genome diversity, including single nucleotide polymorphisms and genome recombination, is discovered. Gel microdroplets offer a powerful and high-throughput technology for assembling whole genomes from complex samples and for probing the pan-genome of naturally occurring populations.
Analysis of the Aedes albopictus C6/36 genome provides insight into cell line utility for viral propagation.
The 50-year-old Aedes albopictus C6/36 cell line is a resource for the detection, amplification, and analysis of mosquito-borne viruses including Zika, dengue, and chikungunya. The cell line is derived from an unknown number of larvae from an unspecified strain of Aedes albopictus mosquitoes. Toward improved utility of the cell line for research in virus transmission, we present an annotated assembly of the C6/36 genome. The C6/36 genome assembly has the largest contig N50 (3.3 Mbp) of any mosquito assembly, presents the sequences of both haplotypes for most of the diploid genome, reveals independent null mutations in both alleles of the Dicer locus, and indicates a male-specific genome. Gene annotation was computed with publicly available mosquito transcript sequences. Gene expression data from cell line RNA sequence identified enrichment of growth-related pathways and conspicuous deficiency in aquaporins and inward rectifier K+ channels. As a test of utility, RNA sequence data from Zika-infected cells were mapped to the C6/36 genome and transcriptome assemblies. Host subtraction reduced the data set by 89%, enabling faster characterization of nonhost reads. The C6/36 genome sequence and annotation should enable additional uses of the cell line to study arbovirus vector interactions and interventions aimed at restricting the spread of human disease.
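For readers unfamiliar with the contig N50 statistic cited above, the following minimal Python sketch shows the standard definition: the length of the contig at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length. The function name `contig_n50` and the toy contig lengths are assumptions for illustration and are not taken from the C6/36 assembly.

```python
from typing import Iterable


def contig_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of
    the contig at which the running total first reaches at least half
    of the summed length of all contigs.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    if any(length <= 0 for length in ordered):
        raise ValueError("contig lengths must be positive")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Comparing 2 * running with total avoids floating-point division.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches the total")


# Toy example (hypothetical lengths): total is 20, so half is 10;
# 6 + 5 = 11 is the first running total to reach it.
toy_n50 = contig_n50([2, 3, 4, 5, 6])
```

A larger N50 means that more of the assembly sits in long contigs, which is why it is commonly reported as a measure of assembly contiguity.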
With approximately one-third of their genomes consisting of linear and circular plasmids, the Lyme disease agent cluster of species has the most complex genomes among known bacteria. We report here a comparative analysis of plasmids in eleven Borreliella (also known as Borrelia burgdorferi sensu lato) species. We sequenced the complete genomes of two B. afzelii, two B. garinii, and individual B. spielmanii, B. bissettiae, B. valaisiana and B. finlandensis isolates. These individual isolates carry between seven and sixteen plasmids, and together harbor 99 plasmids. We report here a comparative analysis of these plasmids, along with 70 additional Borreliella plasmids available in the public sequence databases. We identify only one new putative plasmid compatibility type (the 30th) among these 169 plasmid sequences, suggesting that all or nearly all such types have now been discovered. We find that the linear plasmids in the non-B. burgdorferi species have undergone the same kinds of apparently random, chaotic rearrangements mediated by non-homologous recombination that we previously discovered in B. burgdorferi. These rearrangements occurred independently in the different species lineages, and they, along with an expanded chromosomal phylogeny reported here, allow the identification of several whole plasmid transfer events among these species. Phylogenetic analyses of the plasmid partition genes show that a majority of the plasmid compatibility types arose early, most likely before separation of the Lyme agent Borreliella and relapsing fever Borrelia clades, and this, with occasional cross species plasmid transfers, has resulted in few if any species-specific or geographic region-specific Borreliella plasmid types. The primordial origin and persistent maintenance of the Borreliella plasmid types support their functional indispensability as well as evolutionary roles in facilitating genome diversity.
The improved resolution of Borreliella plasmid phylogeny based on conserved partition-gene clusters will lead to better determination of gene orthology which is essential for prediction of biological function, and it will provide a basis for inferring detailed evolutionary mechanisms of Borreliella genomic variability including homologous gene and plasmid exchanges as well as non-homologous rearrangements.
Excision-reintegration at a pneumococcal phase-variable restriction-modification locus drives within- and between-strain epigenetic differentiation and inhibits gene acquisition.
Phase-variation of Type I restriction-modification systems can rapidly alter the sequence motifs they target, diversifying both the epigenetic patterns and endonuclease activity within clonally descended populations. Here, we characterize the Streptococcus pneumoniae SpnIV phase-variable Type I RMS, encoded by the translocating variable restriction (tvr) locus, to identify its target motifs, mechanism and regulation of phase variation, and effects on exchange of sequence through transformation. The specificity-determining hsdS genes were shuffled through a recombinase-mediated excision-reintegration mechanism involving circular intermediate molecules, guided by two types of direct repeat. The rate of rearrangements was limited by an attenuator and toxin-antitoxin system homologs that inhibited recombinase gene transcription. Target motifs for both the SpnIV, and multiple Type II, MTases were identified through methylation-sensitive sequencing of a panel of recombinase-null mutants. This demonstrated the species-wide diversity observed at the tvr locus can likely specify nine different methylation patterns. This will reduce sequence exchange in this diverse species, as the native form of the SpnIV RMS was demonstrated to inhibit the acquisition of genomic islands by transformation. Hence the tvr locus can drive variation in genome methylation both within and between strains, and limits the genomic plasticity of S. pneumoniae.
Chromulinavorax destructans, a pathogenic TM6 bacterium with an unusual replication strategy targeting the protist mitochondrion.
Most of the diversity of microbial life is not available in culture, and as such we lack even a fundamental understanding of the biological diversity of several branches on the tree of life. One branch that is highly underrepresented is the candidate phylum TM6, also known as the Dependentiae. Their biology is known only from reduced genomes recovered from metagenomes around the world and two isolates infecting amoebae, all of which suggest that they live highly host-associated lifestyles as parasites or symbionts. Chromulinavorax destructans is an isolate from the TM6/Dependentiae that infects and lyses the abundant heterotrophic flagellate, Spumella elongata. Chromulinavorax destructans is characterized by a high degree of reduction and specialization for infection, so much so that it was discovered in a screen for giant viruses. Its 1.2 Mb genome shows no metabolic potential, and C. destructans instead relies on an extensive transporter system to import nutrients, and even energy in the form of ATP, from the host. Accordingly, it replicates in a viral-like fashion, while extensively reorganizing and expanding the host mitochondrion. Signal sequences for secretion are present in 44% of proteins, including many proteins of unknown function as well as 98 copies of ankyrin-repeat domain proteins, known effectors of host modulation, suggesting the presence of an extensive host-manipulation apparatus.