Background: Understanding the co-evolution of HIV populations and broadly neutralizing antibodies (bNAbs) may inform vaccine design. Novel long-read, next-generation sequencing methods allow, for the first time, full-length deep sequencing of HIV env populations. Methods: We longitudinally examined HIV-1 env populations (12 time points) in a subtype A infected individual from the IAVI primary infection cohort (Protocol C) who developed bNAbs (62% ID50>50 on a diverse panel of 105 viruses) targeting the V1/V2 loop region. We developed a PacBio single molecule, real-time sequencing protocol to deeply sequence full-length env from HIV RNA. Bioinformatics tools were developed to align env sequences, infer phylogenies, and interrogate escape dynamics of key residues and glycosylation sites. PacBio env sequences were compared to env sequences generated through amplification and cloning. Env dynamics and viral escape motif evolution were interpreted in the context of the development of V1/V2-targeting broadly neutralizing antibodies. Results: We collected a median of 6799 (range: 1770-14727) high quality full-length HIV env circular consensus sequences (CCS) per SMRT Cell, per time point. Using only CCS reads comprising 6 or more passes over the HIV env insert (= 16 kb read length) ensured that our median per-base accuracy was 99.7%. A phylogeny inferred with PacBio and 100 cloned env sequences (10 time points) found the cloned sequences evenly distributed among PacBio sequences. Viral escape from the V1/V2 targeted bNAbs was evident at V2 positions 160, 166, 167, 169 and 181 (HxB2 numbering), exhibiting several distinct escape pathways by 40 months post-infection. Conclusions: Our PacBio full-length env sequencing method allowed an unprecedented view of, and ability to characterize, HIV-1 env dynamics throughout the first four years of infection.
Longitudinal full-length env deep sequencing allows accurate phylogenetic inference, provides a detailed picture of escape dynamics in epitope regions, and can identify minority variants, all of which will prove critical for increasing our understanding of how env evolution drives the development of antibody breadth.
Background: Understanding the co-evolution of HIV populations and broadly neutralizing antibody (bNAb) lineages may inform vaccine design. Novel long-read, next-generation sequencing methods allow, for the first time, full-length deep sequencing of HIV env populations. Methods: We longitudinally examined env populations (12 time points) in a subtype A infected individual from the IAVI primary infection cohort (Protocol C) who developed bNAbs (62% ID50>50 on a diverse panel of 105 viruses) targeting the V1/V2 region. We developed a Pacific Biosciences single molecule, real-time sequencing protocol to deeply sequence full-length env from HIV RNA. Bioinformatics tools were developed to align env sequences, infer phylogenies, and interrogate escape dynamics of key residues and glycosylation sites. PacBio env sequences were compared to env sequences generated through amplification and cloning. Env dynamics were interpreted in the context of the development of a V1/V2-targeting bNAb lineage isolated from the donor. Results: We collected a median of 6799 high quality full-length env sequences per timepoint (median per-base accuracy of 99.7%). A phylogeny inferred with PacBio and 100 cloned env sequences (10 time points) found cloned env sequences evenly distributed among PacBio sequences. Phylogenetic analyses also revealed a potential transient intra-clade superinfection visible as a minority variant (~5%) at 9 months post-infection (MPI), and peaking in prevalence at 12MPI (~64%), just preceding the development of heterologous neutralization. Viral escape from the bNAb lineage was evident at V2 positions 160, 166, 167, 169 and 181 (HxB2 numbering), exhibiting several distinct escape pathways by 40MPI. Conclusions: Our PacBio full-length env sequencing method allowed unprecedented characterization of env dynamics and revealed an intra-clade superinfection that was not detected through conventional methods. 
The importance of superinfection in the development of this donor’s V1/V2-directed bNAb lineage is under investigation. Longitudinal full-length env deep sequencing allows accurate phylogenetic inference, provides a detailed picture of escape dynamics in epitope regions, and can identify minority variants, all of which may prove useful for understanding how env evolution can drive the development of antibody breadth.
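The idea of interrogating escape dynamics at key residues can be illustrated with a minimal sketch. It assumes that the reads for each time point are already aligned to a common coordinate system, and it simply tabulates residue frequencies at one alignment column. The function name, the toy sequences, and the column index are hypothetical and are not taken from the study's own bioinformatics tools; mapping columns to HxB2 numbering is not handled here.

```python
from collections import Counter


def residue_frequencies(aligned_seqs, column):
    """Return {residue: fraction} observed at a 0-based alignment column."""
    counts = Counter(seq[column] for seq in aligned_seqs)
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


# Hypothetical toy data: aligned amino-acid fragments keyed by months post-infection.
toy_alignments = {
    9: ["NKT", "NKT", "NKT", "NRT"],
    40: ["NKT", "NRT", "NRT", "SRT"],
}

# Track how the residue mix at column 1 shifts between time points.
freq_by_time = {
    mpi: residue_frequencies(seqs, 1) for mpi, seqs in toy_alignments.items()
}
```

With thousands of reads per time point, the same tabulation would make low-frequency variants visible, provided their frequency is well above the per-base error rate.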
In this webinar, Dr. Ashby gives attendees a brief update on PacBio’s metagenomics solutions on the Sequel II System. Then, Dr. Ma, University of Maryland School of Medicine, discusses her…
Corals comprise a biomineralizing cnidarian, dinoflagellate algal symbionts, and associated microbiome of prokaryotes and viruses. Ongoing efforts to conserve coral reefs by identifying the major stress response pathways and thereby laying the foundation to select resistant genotypes rely on a robust genomic foundation. Here we generated and analyzed a high quality long-read based ~886 Mbp nuclear genome assembly and transcriptome data from the dominant rice coral, Montipora capitata from Hawai’i. Our work provides insights into the architecture of coral genomes and shows how they differ in size and gene inventory, putatively due to population size variation. We describe a recent example of foreign gene acquisition via a bacterial gene transfer agent and illustrate the major pathways of stress response that can be used to predict regulatory components of the transcriptional networks in M. capitata. These genomic resources provide insights into the adaptive potential of these sessile, long-lived species in both natural and human influenced environments and facilitate functional and population genomic studies aimed at Hawaiian reef restoration and conservation.
Alternative splicing profile and sex-preferential gene expression in the female and male Pacific abalone Haliotis discus hannai.
To characterize the female and male transcriptomes of the Pacific abalone and further increase genomic resources, we sequenced the mRNA of full-length complementary DNA (cDNA) libraries derived from pooled tissues of female and male Haliotis discus hannai by employing the Iso-Seq protocol of the PacBio RSII platform. We successfully assembled whole full-length cDNA sequences and constructed a transcriptome database that included isoform information. After clustering, a total of 15,110 and 12,145 genes that coded for proteins were identified in female and male abalones, respectively. A total of 13,057 putative orthologs were retained from each transcriptome in abalones. Overall Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways analyzed in each database showed a similar composition between sexes. In addition, a total of 519 and 391 isoforms, from genes with at least two isoforms, were identified genome-wide in the female and male transcriptome databases, respectively. We found that the number of isoforms and their alternative splicing patterns are variable and sex-dependent. This information represents the first significant contribution to sex-preferential genomic resources of the Pacific abalone. The availability of whole female and male transcriptome databases and their isoform information will be useful to improve our understanding of molecular responses and also for the analysis of population dynamics in the Pacific abalone.
Genome-wide transcriptome profiling of the medicinal plant Zanthoxylum planispinum using a single-molecule direct RNA sequencing approach.
High-throughput RNA sequencing has revolutionized transcriptome-based studies of candidate genes, key pathways and gene regulation in non-model organisms. We analyzed full-length cDNA sequences in Zanthoxylum planispinum (Z. planispinum), a medicinal herb in major parts of East Asia. The full-length mRNA derived from tissues of the leaf, early fruit and maturing fruit stages was sequenced using the PacBio RSII platform to identify the isoform transcriptome. We obtained 51,402 unigenes, with an average of 1781 bp per gene in 82.473 Mb of total gene length. Among the 51,402 unigenes, 3963 showed a variety of isoforms. By selecting one representative gene from each set of isoforms, we finalized a set of 46,306 unique genes for this herb. We identified 76 cytochrome P450 (CYP450) genes and related isoforms that show a wide diversity of molecular functions and biological processes. These transcriptome data of Z. planispinum will provide a good resource to study metabolic engineering for the production of valuable medicinal drugs and phytochemicals. Copyright © 2018. Published by Elsevier Inc.
Molecular genetic diversity and characterization of conjugation genes in the fish parasite Ichthyophthirius multifiliis.
Ichthyophthirius multifiliis is the etiologic agent of “white spot”, a commercially important disease of freshwater fish. As a parasitic ciliate, I. multifiliis infects numerous host species across a broad geographic range. Although Ichthyophthirius outbreaks are difficult to control, recent sequencing of the I. multifiliis genome has revealed a number of potential metabolic pathways for therapeutic intervention, along with likely vaccine targets for disease prevention. Nonetheless, major gaps exist in our understanding of both the life cycle and population structure of I. multifiliis in the wild. For example, conjugation has never been described in this species, and it is unclear whether I. multifiliis undergoes sexual reproduction, despite the presence of a germline micronucleus. In addition, no good methods exist to distinguish strains, leaving phylogenetic relationships between geographic isolates completely unresolved. Here, we compared nucleotide sequences of SSUrDNA, mitochondrial NADH dehydrogenase subunit I and cox-1 genes, and 14 somatic SNP sites from nine I. multifiliis isolates obtained from four different states in the US since 1995. The mitochondrial sequences effectively distinguished the isolates from one another and divided them into at least two genetically distinct groups. Furthermore, none of the nine isolates shared the same composition of the 14 somatic SNP sites, suggesting that I. multifiliis undergoes sexual reproduction at some point in its life cycle. Finally, compared to the well-studied free-living ciliates Tetrahymena thermophila and Paramecium tetraurelia, I. multifiliis has lost 38% and 29%, respectively, of 16 experimentally confirmed conjugation-related genes, indicating that mechanistic differences in sexual reproduction are likely to exist between I. multifiliis and other ciliate species. Copyright © 2015 Elsevier Inc. All rights reserved.
The diversity and complexity of the human brain are widely assumed to be encoded within a constant genome. Somatic gene recombination, which changes germline DNA sequences to increase molecular diversity, could theoretically alter this code but has not been documented in the brain, to our knowledge. Here we describe recombination of the Alzheimer’s disease-related gene APP, which encodes amyloid precursor protein, in human neurons, occurring mosaically as thousands of variant ‘genomic cDNAs’ (gencDNAs). gencDNAs lacked introns and ranged from full-length cDNA copies of expressed, brain-specific RNA splice variants to myriad smaller forms that contained intra-exonic junctions, insertions, deletions, and/or single nucleotide variations. DNA in situ hybridization identified gencDNAs within single neurons that were distinct from wild-type loci and absent from non-neuronal cells. Mechanistic studies supported neuronal ‘retro-insertion’ of RNA to produce gencDNAs; this process involved transcription, DNA breaks, reverse transcriptase activity, and age. Neurons from individuals with sporadic Alzheimer’s disease showed increased gencDNA diversity, including eleven mutations known to be associated with familial Alzheimer’s disease that were absent from healthy neurons. Neuronal gene recombination may allow ‘recording’ of neural activity for selective ‘playback’ of preferred gene variants whose expression bypasses splicing; this has implications for cellular diversity, learning and memory, plasticity, and diseases of the human brain.
MEGAN-LR: new algorithms allow accurate binning and easy interactive exploration of metagenomic long reads and contigs.
There are numerous computational tools for taxonomic or functional analysis of microbiome samples, optimized to run on hundreds of millions of short, high quality sequencing reads. Programs such as MEGAN allow the user to interactively navigate these large datasets. Long read sequencing technologies continue to improve and produce increasing numbers of longer reads (of varying lengths in the range of 10k-1M bps, say), but of low quality. There is an increasing interest in using long reads in microbiome sequencing, and there is a need to adapt short read tools to long read datasets. We describe a new LCA-based algorithm for taxonomic binning, and an interval-tree based algorithm for functional binning, that are explicitly designed for long reads and assembled contigs. We provide a new interactive tool for investigating the alignment of long reads against reference sequences. For taxonomic and functional binning, we propose to use LAST to compare long reads against the NCBI-nr protein reference database so as to obtain frame-shift aware alignments, and then to process the results using our new methods. All presented methods are implemented in the open source edition of MEGAN, and we refer to this new extension as MEGAN-LR (MEGAN long read). We evaluate the LAST+MEGAN-LR approach in a simulation study, and on a number of mock community datasets consisting of Nanopore reads, PacBio reads and assembled PacBio reads. We also illustrate the practical application on a Nanopore dataset that we sequenced from an anammox bio-reactor community. This article was reviewed by Nicola Segata together with Moreno Zolfo, Pete James Lockhart and Serghei Mangul. This work extends the applicability of the widely-used metagenomic analysis software MEGAN to long reads. Our study suggests that the presented LAST+MEGAN-LR pipeline is sufficiently fast and accurate.
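For readers unfamiliar with LCA-based binning, the following minimal sketch shows the plain (naive) lowest-common-ancestor assignment that such methods build on: a read is placed at the deepest taxonomy node shared by all taxa it aligns to. This is not the long-read algorithm described in the abstract, whose details are not given here. The toy taxonomy, the `parent` mapping, and the function names are hypothetical.

```python
def lineage(taxon, parent):
    """Return the path from a taxon up to the root, following the parent map."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa, parent):
    """Return the deepest node shared by the lineages of all given taxa."""
    if not taxa:
        return None
    paths = [lineage(t, parent) for t in taxa]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walk the first lineage from leaf to root; the first shared node is the LCA.
    for node in paths[0]:
        if node in shared:
            return node
    return None


# Hypothetical toy taxonomy (child -> parent).
parent = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "root",
}

# A read with hits to both species is binned at their shared family.
bin_for_read = lowest_common_ancestor(
    ["Escherichia coli", "Salmonella enterica"], parent
)
```

A read whose hits all point to one species stays at that species, while conflicting hits push the assignment toward the root.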
Convergent adaptation provides unique insights into the predictability of evolution and ultimately into processes of biological diversification. Supergenes (beneficial gene linkage) are striking examples of adaptation, but little is known about their prevalence or evolution. A recent study on anther-smut fungi documented supergene formation by rearrangements linking two key mating-type loci, controlling pre- and post-mating compatibility. Here further high-quality genome assemblies reveal four additional independent cases of chromosomal rearrangements leading to regions of suppressed recombination linking these mating-type loci in closely related species. Such convergent transitions in genomic architecture of mating-type determination indicate strong selection favoring linkage of mating-type loci into cosegregating supergenes. We find independent evolutionary strata (stepwise recombination suppression) in several species, with extensive rearrangements, gene losses, and transposable element accumulation. We thus show remarkable convergence in mating-type chromosome evolution, recurrent supergene formation, and repeated evolution of similar phenotypes through different genomic changes.
Integration of genomic data with NMR analysis enables assignment of the full stereostructure of neaumycin B, a potent inhibitor of glioblastoma from a marine-derived Micromonospora.
The microbial metabolites known as the macrolides are some of the most successful natural products used to treat infectious and immune diseases. Describing the structures of these complex metabolites, however, is often extremely difficult due to the presence of multiple stereogenic centers inherent in this class of polyketide-derived metabolites. With the availability of genome sequence data and a better understanding of the molecular genetics of natural product biosynthesis, it is now possible to use bioinformatic approaches in tandem with spectroscopic tools to assign the full stereostructures of these complex metabolites. In our quest to discover and develop new agents for the treatment of cancer, we observed the production of a highly cytotoxic macrolide, neaumycin B, by a marine-derived actinomycete bacterium of the genus Micromonospora. Neaumycin B is a complex polycyclic macrolide possessing 19 asymmetric centers, usually requiring selective degradation, crystallization, derivatization, X-ray diffraction analysis, synthesis, or other time-consuming approaches to assign the complete stereostructure. As an alternative approach, we sequenced the genome of the producing strain and identified the neaumycin gene cluster (neu). By integrating the known stereospecificities of biosynthetic enzymes with comprehensive NMR analysis, the full stereostructure of neaumycin B was confidently assigned. This approach exemplifies how mining gene cluster information while integrating NMR-based structure data can achieve rapid, efficient, and accurate stereostructural assignments for complex macrolides.
Comparison of highly and weakly virulent Dickeya solani strains, with a view on the pangenome and panregulon of this species.
Bacteria belonging to the genera Dickeya and Pectobacterium are responsible for significant economic losses in a wide variety of crops and ornamentals. In recent years, increasing losses in potato production have been attributed to the appearance of Dickeya solani. The D. solani strains investigated so far share genetic homogeneity, although different virulence levels were observed among strains of various origins. The purpose of this study was to investigate the genetic traits possibly related to the diverse virulence levels by means of comparative genomics. First, we developed a new genome assembly pipeline which allowed us to complete the D. solani genomes. Four de novo sequenced and ten publicly available genomes were used to identify the structure of the D. solani pangenome, in which 74.8 and 25.2% of genes were grouped into the core and dispensable genome, respectively. For D. solani panregulon analysis, we performed a binding site prediction for four transcription factors, namely CRP, KdgR, PecS and Fur, to detect the regulons of these virulence regulators. Most of the D. solani potential virulence factors were predicted to belong to the accessory regulons of CRP, KdgR, and PecS. Thus, some differences in gene expression could exist between D. solani strains. The comparison between a highly virulent and a weakly virulent strain, IFB0099 and IFB0223, respectively, disclosed only small differences between their genomes but significant differences in the production of virulence factors like pectinases, cellulases and proteases, and in their mobility. The D. solani strains also diverge in the number and size of prophages present in their genomes. Another relevant difference is the disruption of the adhesin gene fhaB2 in the highly virulent strain. Strain IFB0223, which has a complete adhesin gene, is less mobile and less aggressive than IFB0099. This suggests that in this case, mobility rather than adherence is needed in order to trigger disease symptoms.
This study highlights the utility of comparative genomics in predicting D. solani traits involved in the aggressiveness of this emerging plant pathogen.
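The split of a pangenome into core and dispensable genes can be sketched as a set computation, assuming genes have already been clustered into shared ortholog identifiers. The strain names and gene identifiers below are hypothetical toy values, and this is not the authors' pipeline.

```python
def partition_pangenome(genomes):
    """Split gene clusters into core (present in every genome) and dispensable."""
    gene_sets = [set(genes) for genes in genomes.values()]
    pangenome = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    dispensable = pangenome - core
    return core, dispensable


# Hypothetical toy input: strain name -> ortholog cluster identifiers.
toy_genomes = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g2"},
}

core, dispensable = partition_pangenome(toy_genomes)

# Share of the pangenome that is core, comparable in kind to the 74.8% figure above.
core_fraction = len(core) / (len(core) + len(dispensable))
```

In this toy case half of the clusters are core; the outcome on real genomes depends heavily on how orthologs are clustered.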
Changes in the genetic requirements for microbial interactions with increasing community complexity.
Microbial community structure and function rely on complex interactions whose underlying molecular mechanisms are poorly understood. To investigate these interactions in a simple microbiome, we introduced E. coli into an experimental community based on a cheese rind and identified the differences in E. coli’s genetic requirements for growth in interactive and non-interactive contexts using Random Barcode Transposon Sequencing (RB-TnSeq) and RNASeq. Genetic requirements varied among pairwise growth conditions and between pairwise and community conditions. Our analysis points to mechanisms by which growth conditions change as a result of increasing community complexity and suggests that growth within a community relies on a combination of pairwise and higher-order interactions. Our work provides a framework for using the model organism E. coli as a readout to investigate microbial interactions regardless of the genetic tractability of members of the studied ecosystem. © 2018, Morin et al.