July 19, 2019

The complete and fully assembled genome sequence of Aeromonas salmonicida subsp. pectinolytica and its comparative analysis with other Aeromonas species: investigation of the mobilome in environmental and pathogenic strains.

Due to the predominant use of short-read sequencing to date, most bacterial genome sequences reported in recent years remain at the draft level. This precludes certain types of analyses, such as the in-depth analysis of genome plasticity. Here we report the finalized genome sequence of the environmental strain Aeromonas salmonicida subsp. pectinolytica 34mel, for which only a draft genome with 253 contigs is currently available. Successful completion of the transposon-rich genome critically depended on the PacBio long read sequencing technology. Using finalized genome sequences of A. salmonicida subsp. pectinolytica and other Aeromonads, we report the detailed analysis of the transposon composition of these bacterial species. Mobilome evolution is exemplified by a complex transposon, which has shifted from pathogenicity-related to environmental-related gene content in A. salmonicida subsp. pectinolytica 34mel. Obtaining the complete, circular genome of A. salmonicida subsp. pectinolytica allowed us to perform an in-depth analysis of its mobilome. We demonstrate the mobilome-dependent evolution of this strain’s genetic profile from pathogenic to environmental.


July 19, 2019

Full-Length Envelope Analyzer (FLEA): A tool for longitudinal analysis of viral amplicons.

Next generation sequencing of viral populations has advanced our understanding of viral population dynamics, the development of drug resistance, and escape from host immune responses. Many applications require complete gene sequences, which can be impossible to reconstruct from short reads. HIV env, the protein of interest for HIV vaccine studies, is exceptionally challenging for long-read sequencing and analysis due to its length, high substitution rate, and extensive indel variation. While long-read sequencing is attractive in this setting, the analysis of such data is not well handled by existing methods. To address this, we introduce FLEA (Full-Length Envelope Analyzer), which performs end-to-end analysis and visualization of long-read sequencing data. FLEA consists of both a pipeline (optionally run on a high-performance cluster), and a client-side web application that provides interactive results. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser. We demonstrate how FLEA may be used to process Pacific Biosciences HIV env data and describe recent examples of its use. Simulations show how FLEA dramatically reduces the error rate of this sequencing platform, providing an accurate portrait of complex and variable HIV env populations. A public instance of FLEA is hosted at http://flea.datamonkey.org. The Python source code for the FLEA pipeline can be found at https://github.com/veg/flea-pipeline. 
The client-side application is available at https://github.com/veg/flea-web-app. A live demo of the P018 results can be found at http://flea.murrell.group/view/P018.


July 19, 2019

Expanding an expanded genome: long-read sequencing of Trypanosoma cruzi.

Although the genome of Trypanosoma cruzi, the causative agent of Chagas disease, was first made available in 2005, with additional strains reported later, the intrinsic genome complexity of this parasite (the abundance of repetitive sequences and genes organized in tandem) has traditionally hindered high-quality genome assembly and annotation. This also limits diverse types of analyses that require high degrees of precision. Long reads generated by third-generation sequencing technologies are particularly suitable to address the challenges associated with T. cruzi’s genome since they permit direct determination of the full sequence of large clusters of repetitive sequences without collapsing them. This, in turn, not only allows accurate estimation of gene copy numbers but also circumvents assembly fragmentation. Here, we present the analysis of the genome sequences of two T. cruzi clones: the hybrid TCC (TcVI) and the non-hybrid Dm28c (TcI), determined by PacBio Single Molecule Real-Time (SMRT) technology. The improved assemblies obtained here permitted us to accurately estimate gene copy numbers and the abundance and distribution of repetitive sequences (including satellites and retroelements). We found that the genome of T. cruzi is composed of a ‘core compartment’ and a ‘disruptive compartment’ which exhibit opposite GC content and gene composition. Novel tandem and dispersed repetitive sequences were identified, including some located inside coding sequences. Additionally, homologous chromosomes were separately assembled, allowing us to retrieve haplotypes as separate contigs instead of a unique mosaic sequence. Finally, manual annotation of surface multigene families, mucins and trans-sialidases now allows a better overview of these complex groups of genes.


July 19, 2019

Ultradeep single-molecule real-time sequencing of HIV envelope reveals complete compartmentalization of highly macrophage-tropic R5 proviral variants in brain and CXCR4-using variants in immune and peripheral tissues.

Despite combined antiretroviral therapy (cART), HIV+ patients still develop neurological disorders, which may be due to persistent HIV infection and selective evolution in brain tissues. Single-molecule real-time (SMRT) sequencing technology offers an improved opportunity to study the relationship among HIV isolates in the brain and lymphoid tissues because it is capable of generating thousands of long sequence reads in a single run. Here, we used SMRT sequencing to generate ~50,000 high-quality full-length HIV envelope sequences (>2200 bp) from seven autopsy tissues from an HIV+/cART+ subject, including three brain and four non-brain sites. Sanger sequencing was used for comparison with SMRT data and to clone functional pseudoviruses for in vitro tropism assays. Phylogenetic analysis demonstrated that brain-derived HIV was compartmentalized from HIV outside the brain and that the variants from each of the three brain tissues grouped independently. Variants from all peripheral tissues were intermixed on the tree but independent of the brain clades. Due to the large number of sequences, a clustering analysis at three similarity thresholds (99, 99.5, and 99.9%) was also performed. All brain sequences clustered exclusive of any non-brain sequences at all thresholds; however, frontal lobe sequences clustered independently of occipital and parietal lobes. Translated sequences revealed potentially functional differences between brain and non-brain sequences in the location of putative N-linked glycosylation sites (N-sites), V1 length, V3 charge, and the number of V4 N-sites. All brain sequences were predicted to use the CCR5 co-receptor, while most non-brain sequences were predicted to use CXCR4 co-receptor. Tropism results were confirmed by in vitro infection assays. The study is the first to use a SMRT sequencing approach to study HIV compartmentalization in tissues and supports other reports of limited trafficking between brain and non-brain sequences during cART. 
Due to the long sequence length, we could observe changes along the entire envelope gene, likely caused by differential selective pressure in the brain that may contribute to neurological disease.
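
As an illustration of the kind of feature compared above, putative N-linked glycosylation sites are commonly identified by the sequon N-X-S/T, where X is any residue except proline. The sketch below is a minimal example under that assumption; the function name `find_n_sites` and the toy sequence are hypothetical and are not taken from the study, whose actual method is not described here. It assumes an ungapped, translated protein sequence.

```python
import re

# Sequon N-X-[S/T], where X is any residue except proline.
# The lookahead lets overlapping sequons (e.g. "NNSS") each be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_n_sites(protein: str) -> list[int]:
    """Return 0-based positions of the Asn in each putative N-linked sequon.

    Assumes an ungapped amino-acid string; alignment gap characters
    should be removed before calling (an assumption of this sketch).
    """
    return [m.start() for m in SEQUON.finditer(protein.upper())]

if __name__ == "__main__":
    toy = "ANGTNPSNNSS"  # hypothetical sequence, for illustration only
    sites = find_n_sites(toy)
    print(sites, len(sites))
```

Counting the returned positions within a region of interest (for example, V4) gives a per-sequence N-site count that can be compared between groups of sequences.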


July 19, 2019

RNAi is a critical determinant of centromere evolution in closely related fungi.

The centromere DNA locus on a eukaryotic chromosome facilitates faithful chromosome segregation. Despite performing such a conserved function, centromere DNA sequence as well as the organization of sequence elements is rapidly evolving in all forms of eukaryotes. The driving force that facilitates centromere evolution remains an enigma. Here, we studied the evolution of centromeres in closely related species in the fungal phylum of Basidiomycota. Using ChIP-seq analysis of conserved inner kinetochore proteins, we identified centromeres in three closely related Cryptococcus species: two of which are RNAi-proficient, while the other lost functional RNAi. We find that the centromeres in the RNAi-deficient species are significantly shorter than those of the two RNAi-proficient species. While centromeres are LTR retrotransposon-rich in all cases, the RNAi-deficient species lost all full-length retroelements from its centromeres. In addition, centromeres in RNAi-proficient species are associated with a significantly higher level of cytosine DNA modifications compared with those of RNAi-deficient species. Furthermore, when an RNAi-proficient Cryptococcus species and its RNAi-deficient mutants were passaged under similar conditions, the centromere length was found to be occasionally shortened in RNAi mutants. In silico analysis of predicted centromeres in a group of closely related Ustilago species, also belonging to the Basidiomycota, showed that they have undergone a similar transition in centromere length in an RNAi-dependent fashion. Based on the correlation found in two independent basidiomycetous species complexes, we present evidence suggesting that the loss of RNAi and cytosine DNA methylation triggered transposon attrition, which resulted in shortening of centromere length during evolution. Copyright © 2018 the Author(s). Published by PNAS.


July 19, 2019

HIV envelope glycoform heterogeneity and localized diversity govern the initiation and maturation of a V2 apex broadly neutralizing antibody lineage.

Understanding how broadly neutralizing antibodies (bnAbs) to HIV envelope (Env) develop during natural infection can help guide the rational design of an HIV vaccine. Here, we described a bnAb lineage targeting the Env V2 apex and the Ab-Env co-evolution that led to development of neutralization breadth. The lineage Abs bore an anionic heavy chain complementarity-determining region 3 (CDRH3) of 25 amino acids, among the shortest known for this class of Abs, and achieved breadth with only 10% nucleotide somatic hypermutation and no insertions or deletions. The data suggested a role for Env glycoform heterogeneity in the activation of the lineage germline B cell. Finally, we showed that localized diversity at key V2 epitope residues drove bnAb maturation toward breadth, mirroring the Env evolution pattern described for another donor who developed V2-apex targeting bnAbs. Overall, these findings suggest potential strategies for vaccine approaches based on germline-targeting and serial immunogen design. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.


July 19, 2019

Unexpected diversity in the mobilome of a Pseudomonas aeruginosa strain isolated from a dental unit waterline revealed by SMRT Sequencing.

The Gram-negative bacterium Pseudomonas aeruginosa is found in several habitats, both natural and human-made, and is particularly known for its recurrent presence as a pathogen in the lungs of patients suffering from cystic fibrosis, a genetic disease. Given its clinical importance, several major studies have investigated the genomic adaptation of P. aeruginosa in lungs and its transition as acute infections become chronic. However, our knowledge about the diversity and adaptation of the P. aeruginosa genome to non-clinical environments is still fragmentary, in part due to the lack of accurate reference genomes of strains from the numerous environments colonized by the bacterium. Here, we used PacBio long-read technology to sequence the genome of PPF-1, a strain of P. aeruginosa isolated from a dental unit waterline. Generating this closed genome was an opportunity to investigate genomic features that are difficult to accurately study in a draft genome (contigs state). It was possible to shed light on putative genomic islands, some shared with other reference genomes, new prophages, and the complete content of insertion sequences. In addition, four different group II introns were also found, including two characterized here and not listed in the specialized group II intron database.


July 19, 2019

Long read assemblies of geographically dispersed Plasmodium falciparum isolates reveal highly structured subtelomeres.

Background: Although thousands of clinical isolates of Plasmodium falciparum are being sequenced and analysed by short read technology, the data do not resolve the highly variable subtelomeric regions of the genomes that contain polymorphic gene families involved in immune evasion and pathogenesis. There is also no current standard definition of the boundaries of these variable subtelomeric regions. Methods: Using long-read sequence data (Pacific Biosciences SMRT technology), we assembled and annotated the genomes of 15 P. falciparum isolates, ten of which are newly cultured clinical isolates. We performed comparative analysis of the entire genome with particular emphasis on the subtelomeric regions and the internal var gene clusters. Results: The nearly complete sequence of these 15 isolates has enabled us to define a highly conserved core genome, to delineate the boundaries of the subtelomeric regions, and to compare these across isolates. We found highly structured variable regions in the genome. Some exported gene families purportedly involved in release of merozoites show copy number variation. As an example of ongoing genome evolution, we found a novel CLAG gene in six isolates. We also found a novel gene that was relatively enriched in the South East Asian isolates compared to those from Africa. Conclusions: These 15 manually curated new reference genome sequences with their nearly complete subtelomeric regions and fully assembled genes are an important new resource for the malaria research community. We report the overall conserved structure and pattern of important gene families and the more clearly defined subtelomeric regions.


July 19, 2019

Antigenic variation in the Lyme spirochete: Insights into recombinational switching with a suggested role for error-prone repair.

The Lyme disease spirochete, Borrelia burgdorferi, uses antigenic variation as a strategy to evade the host’s acquired immune response. New variants of surface-localized VlsE are generated efficiently by unidirectional recombination from 15 unexpressed vls cassettes into the vlsE locus. Using algorithms to analyze switching from vlsE sequencing data, we characterize a population of over 45,000 inferred recombination events generated during mouse infection. We present evidence for clustering of these recombination events within the population and along the vlsE gene, a role for the direct repeats flanking the variable region in vlsE, and the importance of sequence homology in determining the location of recombination, despite RecA’s dispensability. Finally, we report that non-templated sequence variation is strongly associated with recombinational switching and occurs predominantly at the 5′ end of conversion tracts. This likely results from an error-prone repair mechanism operational during recombinational switching that elevates the mutation rate > 5,000-fold in switched regions. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.


July 19, 2019

A Borrelia burgdorferi mini-vls system that undergoes antigenic switching in mice: investigation of the role of plasmid topology and the long inverted repeat.

Borrelia burgdorferi evades the host immune system by switching the surface antigen, VlsE, in a process known as antigenic variation. The DNA mechanisms and genetic elements present on the vls locus that participate in the switching process remain to be elucidated. Manipulating the vls locus has been difficult due to its instability on Escherichia coli plasmids. In this study, we generated for the first time a mini-vls system composed of a single silent vlsE variable region (silent cassette 2) through the vlsE gene by performing some cloning steps directly in a highly transformable B. burgdorferi strain. Variants of the mini system were constructed with or without the long inverted repeat (IR) located upstream of vlsE and on both circular and linear plasmids to investigate the importance of the IR and plasmid topology on recombinational switching at vlsE. Amplicon sequencing using PacBio long read technology and analysis of the data with our recently reported pipeline and VAST software showed that the system undergoes switching in mice in both linear and circular versions and that the presence of the hairpin does not seem to be crucial in the linear version; however, it is required when the topology is circular. © 2018 John Wiley & Sons Ltd.


July 19, 2019

Population genomics shows no distinction between pathogenic Candida krusei and environmental Pichia kudriavzevii: One species, four names.

We investigated genomic diversity of a yeast species that is both an opportunistic pathogen and an important industrial yeast. Under the name Candida krusei, it is responsible for about 2% of yeast infections caused by Candida species in humans. Bloodstream infections with C. krusei are problematic because most isolates are fluconazole-resistant. Under the names Pichia kudriavzevii, Issatchenkia orientalis and Candida glycerinogenes, the same yeast, including genetically modified strains, is used for industrial-scale production of glycerol and succinate. It is also used to make some fermented foods. Here, we sequenced the type strains of C. krusei (CBS573T) and P. kudriavzevii (CBS5147T), as well as 30 other clinical and environmental isolates. Our results show conclusively that they are the same species, with collinear genomes 99.6% identical in DNA sequence. Phylogenetic analysis of SNPs does not segregate clinical and environmental isolates into separate clades, suggesting that C. krusei infections are frequently acquired from the environment. Reduced resistance of strains to fluconazole correlates with the presence of one gene instead of two at the ABC11-ABC1 tandem locus. Most isolates are diploid, but one-quarter are triploid. Loss of heterozygosity is common, including at the mating-type locus. Our PacBio/Illumina assembly of the 10.8 Mb CBS573T genome is resolved into 5 complete chromosomes, and was annotated using RNAseq support. Each of the 5 centromeres is a 35 kb gene desert containing a large inverted repeat. This species is a member of the genus Pichia and family Pichiaceae (the methylotrophic yeasts clade), and so is only distantly related to other pathogenic Candida species.


July 19, 2019

Deep genome annotation of the opportunistic human pathogen Streptococcus pneumoniae D39.

A precise understanding of the genomic organization into transcriptional units and their regulation is essential for our comprehension of opportunistic human pathogens and how they cause disease. Using single-molecule real-time (PacBio) sequencing we unambiguously determined the genome sequence of Streptococcus pneumoniae strain D39 and revealed several inversions previously undetected by short-read sequencing. Significantly, a chromosomal inversion results in antigenic variation of PhtD, an important surface-exposed virulence factor. We generated a new genome annotation using automated tools, followed by manual curation, reflecting the current knowledge in the field. By combining sequence-driven terminator prediction, deep paired-end transcriptome sequencing and enrichment of primary transcripts by Cappable-Seq, we mapped 1015 transcriptional start sites and 748 termination sites. We show that the pneumococcal transcriptional landscape is complex and includes many secondary, antisense and internal promoters. Using this new genomic map, we identified several new small RNAs (sRNAs), RNA switches (including sixteen previously misidentified as sRNAs), and antisense RNAs. In total, we annotated 89 new protein-encoding genes, 34 sRNAs and 165 pseudogenes, bringing the S. pneumoniae D39 repertoire to 2146 genetic elements. We report operon structures and observed that 9% of operons are leaderless. The genome data are accessible in an online resource called PneumoBrowse (https://veeninglab.com/pneumobrowse) providing one of the most complete inventories of a bacterial genome to date. PneumoBrowse will accelerate pneumococcal research and the development of new prevention and treatment strategies.


July 19, 2019

Degradation and remobilization of endogenous retroviruses by recombination during the earliest stages of a germ-line invasion.

Endogenous retroviruses (ERVs) are proviral sequences that result from colonization of the host germ line by exogenous retroviruses. The majority of ERVs represent defective retroviral copies. However, for most ERVs, endogenization occurred millions of years ago, obscuring the stages by which ERVs become defective and the changes in both virus and host important to the process. The koala retrovirus, KoRV, only recently began invading the germ line of the koala (Phascolarctos cinereus), permitting analysis of retroviral endogenization on a prospective basis. Here, we report that recombination with host genomic elements disrupts retroviruses during the earliest stages of germ-line invasion. One type of recombinant, designated recKoRV1, was formed by recombination of KoRV with an older degraded retroelement. Many genomic copies of recKoRV1 were detected across koalas. The prevalence of recKoRV1 was higher in northern than in southern Australian koalas, as is the case for KoRV, with differences in recKoRV1 prevalence, but not KoRV prevalence, between inland and coastal New South Wales. At least 15 additional different recombination events between KoRV and the older endogenous retroelement generated distinct recKoRVs with different geographic distributions. All of the identified recombinant viruses appear to have arisen independently and have highly disrupted ORFs, which suggests that recombination with existing degraded endogenous retroelements may be a means by which replication-competent ERVs that enter the germ line are degraded. Copyright © 2018 the Author(s). Published by PNAS.


July 19, 2019

Genome organization and DNA accessibility control antigenic variation in trypanosomes.

Many evolutionarily distant pathogenic organisms have evolved similar survival strategies to evade the immune responses of their hosts. These include antigenic variation, through which an infecting organism prevents clearance by periodically altering the identity of proteins that are visible to the immune system of the host [1]. Antigenic variation requires large reservoirs of immunologically diverse antigen genes, which are often generated through homologous recombination, as well as mechanisms to ensure the expression of one or very few antigens at any given time. Both homologous recombination and gene expression are affected by three-dimensional genome architecture and local DNA accessibility [2,3]. Factors that link three-dimensional genome architecture, local chromatin conformation and antigenic variation have, to our knowledge, not yet been identified in any organism. One of the major obstacles to studying the role of genome architecture in antigenic variation has been the highly repetitive nature and heterozygosity of antigen-gene arrays, which has precluded complete genome assembly in many pathogens. Here we report the de novo haplotype-specific assembly and scaffolding of the long antigen-gene arrays of the model protozoan parasite Trypanosoma brucei, using long-read sequencing technology and conserved features of chromosome folding [4]. Genome-wide chromosome conformation capture (Hi-C) reveals a distinct partitioning of the genome, with antigen-encoding subtelomeric regions that are folded into distinct, highly compact compartments. In addition, we performed a range of analyses (Hi-C, fluorescence in situ hybridization, assays for transposase-accessible chromatin using sequencing and single-cell RNA sequencing) that showed that deletion of the histone variants H3.V and H4.V increases antigen-gene clustering, DNA accessibility across sites of antigen expression and switching of the expressed antigen isoform, via homologous recombination. 
Our analyses identify histone variants as a molecular link between global genome architecture, local chromatin conformation and antigenic variation.


July 19, 2019

Global genetic diversity of var2csa in Plasmodium falciparum with implications for malaria in pregnancy and vaccine development.

Malaria infection during pregnancy, caused by the sequestering of Plasmodium falciparum parasites in the placenta, leads to high infant mortality and maternal morbidity. The parasite-placenta adherence mechanism is mediated by the VAR2CSA protein, a target for naturally occurring immunity. Currently, vaccine development is based on its ID1-DBL2Xb domain; however, little is known about the global genetic diversity of the encoding var2csa gene, which could influence vaccine efficacy. In a comprehensive analysis of the var2csa gene in >2,000 P. falciparum field isolates across 23 countries, we found that var2csa is duplicated in high prevalence (>25%), that African and Oceanian populations harbour a much higher diversity than other regions, and that insertions/deletions are abundant, leading to an underestimation of the diversity of the locus. Further, ID1-DBL2Xb haplotypes associated with adverse birth outcomes are present globally, and African-specific haplotypes exist, which should be incorporated into vaccine design.

