July 7, 2019

Fallacy of the unique genome: sequence diversity within single Helicobacter pylori strains.

Many bacterial genomes are highly variable but nonetheless are typically published as a single assembled genome. Experiments tracking bacterial genome evolution have not looked at the variation present at a given point in time. Here, we analyzed the mouse-passaged Helicobacter pylori strain SS1 and its parent PMSS1 to assess intra- and intergenomic variability. Using high sequence coverage depth and experimental validation, we detected extensive genome plasticity within these H. pylori isolates, including movement of the transposable element IS607, large and small inversions, multiple single nucleotide polymorphisms, and variation in cagA copy number. The cagA gene was found as 1 to 4 tandem copies located off the cag island in both SS1 and PMSS1; this copy number variation correlated with protein expression. To gain insight into the changes that occurred during mouse adaptation, we also compared SS1 and PMSS1 and observed 46 differences that were distinct from the within-genome variation. The most substantial was an insertion in cagY, which encodes a protein required for a type IV secretion system function. We detected modifications in genes coding for two proteins known to affect mouse colonization, the HpaA neuraminyllactose-binding protein and the FutB α-1,3 lipopolysaccharide (LPS) fucosyltransferase, as well as genes predicted to modulate diverse properties. In sum, our work suggests that data from consensus genome assemblies from single colonies may be misleading by failing to represent the variability present. Furthermore, we show that high-depth genomic sequencing data of a population can be analyzed to gain insight into the normal variation within bacterial strains. IMPORTANCE Although it is well known that many bacterial genomes are highly variable, it is nonetheless traditional to refer to, analyze, and publish “the genome” of a bacterial strain. 
Variability is usually reduced (“only sequence from a single colony”), ignored (“just publish the consensus”), or placed in the “too-hard” basket (“analysis of raw read data is more robust”). Now that whole-genome sequences are regularly used to assess virulence and track outbreaks, a better understanding of the baseline genomic variation present within single strains is needed. Here, we describe the variability seen in typical working stocks and colonies of pathogen Helicobacter pylori model strains SS1 and PMSS1 as revealed by use of high-coverage mate pair next-generation sequencing (NGS) and confirmed by traditional laboratory techniques. This work demonstrates that reliance on a consensus assembly as “the genome” of a bacterial strain may be misleading. Copyright © 2017 Draper et al.


July 7, 2019

Complete genome sequence of Mycoplasma pneumoniae type 2 reference strain FH using single-molecule real-time sequencing technology.

Mycoplasma pneumoniae type 2 strain FH was previously sequenced with Illumina (FH-Illumina) and 454 (FH-454) technologies according to Xiao et al. (2015) and Krishnakumar et al. (2010). Comparative analyses revealed differences in genomic content between these sequences, including a 6-kb region absent from the FH-454 submission. Here, we present a complete genome sequence of FH sequenced with the Pacific Biosciences RSII platform. Copyright © 2017 Desai et al.


July 7, 2019

Assessment of insertion sequence mobilization as an adaptive response to oxidative stress in Acinetobacter baumannii using IS-Seq.

Insertion sequence (IS) elements are found throughout bacterial genomes and contribute to genome variation by interrupting genes or altering gene expression. Few of the more than thirty IS elements described in Acinetobacter baumannii have been characterized for transposition activity or expression effects. A targeted sequencing method, IS-seq, was developed to efficiently map the locations of new insertion events in A. baumannii genomes and was used to identify novel IS sites following growth in the presence of hydrogen peroxide, which causes oxidative stress. Serial subculture in the presence of sub-inhibitory concentrations of hydrogen peroxide led to rapid selection of cells carrying an ISAba1 element upstream of the catalase/peroxidase gene katG. Several additional sites for the elements ISAba1, ISAba13, ISAba25, ISAba26, and ISAba125 were found at low abundance after serial subculture, indicating that each element is active and contributes to genetic variation that may be subject to selection. Following hydrogen peroxide exposure, rapid changes in gene expression were observed in genes related to iron homeostasis. The IS insertions adjacent to katG resulted in more than 20-fold overexpression of the gene and increased hydrogen peroxide tolerance. IMPORTANCE Insertion sequences (IS) contribute to genomic and phenotypic variation in many bacterial species, but little is known about how transposition rates vary among elements or how selective pressure influences this process. A new method, termed “IS-seq,” for identifying new insertion locations that arise in the genome under experimental growth conditions was developed and tested with cells grown in the presence of hydrogen peroxide, which causes oxidative stress. Gene expression changes in response to hydrogen peroxide exposure are similar to those observed in other species and include genes that control free iron concentrations. 
New IS insertions adjacent to a gene encoding a catalase enzyme confirm that IS elements can rapidly contribute to adaptive variation in the presence of selection. Copyright © 2017 Wright et al.


July 7, 2019

Patterns of polymorphism at the self-incompatibility locus in 1,083 Arabidopsis thaliana genomes.

Although the transition to selfing in the model plant Arabidopsis thaliana involved the loss of the self-incompatibility (SI) system, it clearly did not occur due to the fixation of a single inactivating mutation at the locus determining the specificities of SI (the S-locus). At least three groups of divergent haplotypes (haplogroups), corresponding to ancient functional S-alleles, have been maintained at this locus, and extensive functional studies have shown that all three carry distinct inactivating mutations. However, the historical process of loss of SI is not well understood, in particular its relation with the last glaciation. Here, we took advantage of recently published genomic resequencing data in 1,083 Arabidopsis thaliana accessions that we combined with BAC sequencing to obtain polymorphism information for the whole S-locus region at a species-wide scale. The accessions differed by several major rearrangements including large deletions and interhaplogroup recombinations, forming a set of haplogroups that are widely distributed throughout the native range and largely overlap geographically. “Relict” A. thaliana accessions that directly derive from glacial refugia are polymorphic at the S-locus, suggesting that the three haplogroups were already present when glacial refugia from the last Ice Age became isolated. Interhaplogroup recombinant haplotypes were highly frequent, and detailed analysis of recombination breakpoints suggested multiple independent origins. These findings suggest that the complete loss of SI in A. thaliana involved independent self-compatible mutants that arose prior to the last Ice Age, and experienced further rearrangements during postglacial colonization. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 7, 2019

Review of the algal biology program within the National Alliance for Advanced Biofuels and Bioproducts

In 2010, when the National Alliance for Advanced Biofuels and Bioproducts (NAABB) consortium began, little was known about the molecular basis of algal biomass or oil production. Very few algal genome sequences were available and efforts to identify the best-producing wild species through bioprospecting approaches had largely stalled after the U.S. Department of Energy’s Aquatic Species Program. This lack of knowledge included how reduced carbon was partitioned into storage products like triglycerides or starch and the role played by metabolite remodeling in the accumulation of energy-dense storage products. Furthermore, genetic transformation and metabolic engineering approaches to improve algal biomass and oil yields were in their infancy. Genome sequencing and transcriptional profiling were becoming less expensive, however; and the tools to annotate gene expression profiles under various growth and engineered conditions were just starting to be developed for algae. It was in this context that an integrated algal biology program was introduced in the NAABB to address the greatest constraints limiting algal biomass yield. This review describes the NAABB algal biology program, including hypotheses, research objectives, and strategies to move algal biology research into the twenty-first century and to realize the greatest potential of algae biomass systems to produce biofuels.


July 7, 2019

The unique genomic landscape surrounding the EPSPS gene in glyphosate resistant Amaranthus palmeri: a repetitive path to resistance.

The expanding number and global distributions of herbicide resistant weedy species threaten food, fuel, fiber and bioproduct sustainability and agroecosystem longevity. Amongst the most competitive weeds, Amaranthus palmeri S. Wats has rapidly evolved resistance to glyphosate primarily through massive amplification and insertion of the 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) gene across the genome. Increased EPSPS gene copy numbers result in higher titers of the EPSPS enzyme, the target of glyphosate, and confer resistance to glyphosate treatment. To understand the genomic unit and mechanism of EPSPS gene copy number proliferation, we developed and used a bacterial artificial chromosome (BAC) library from a highly resistant biotype to sequence the local genomic landscape flanking the EPSPS gene. By sequencing overlapping BACs, a 297 kb sequence was generated, hereafter referred to as the “EPSPS cassette.” This region included several putative genes, dense clusters of tandem and inverted repeats, putative helitron and autonomous replication sequences, and regulatory elements. Whole genome shotgun sequencing (WGS) of two biotypes exhibiting high and no resistance to glyphosate was performed to compare genomic representation across the EPSPS cassette. Mapping of sequences for both biotypes to the reference EPSPS cassette revealed significant differences in upstream and downstream sequences relative to EPSPS with regard to both repetitive units and coding content between these biotypes. The differences in sequence may have resulted from a compounded-building mechanism such as repetitive transpositional events. The association of putative helitron sequences with the cassette suggests a possible amplification and distribution mechanism. 
Flow cytometry revealed that the EPSPS cassette added measurable genomic content. The adoption of glyphosate resistant cropping systems in major crops such as corn, soybean, cotton and canola coupled with excessive use of glyphosate herbicide has led to evolved glyphosate resistance in several important weeds. In Amaranthus palmeri, the amplification of the EPSPS cassette, characterized by a complex array of repetitive elements and putative helitron sequences, suggests an adaptive structural genomic mechanism that drives amplification and distribution around the genome. The added genomic content not found in glyphosate sensitive plants may be driving evolution through genome expansion.


July 7, 2019

A genomic view of short tandem repeats.

Short tandem repeats (STRs) are some of the fastest mutating loci in the genome. Tools for accurately profiling STRs from high-throughput sequencing data have enabled genome-wide interrogation of more than a million STRs across hundreds of individuals. These catalogs have revealed that STRs are highly multiallelic and may contribute more de novo mutations than any other variant class. Recent studies have leveraged these catalogs to show that STRs play a widespread role in regulating gene expression and other molecular phenotypes. These analyses suggest that STRs are an underappreciated but rich reservoir of variation that likely make significant contributions to Mendelian diseases, complex traits, and cancer. Copyright © 2017 Elsevier Ltd. All rights reserved.


July 7, 2019

Complete gene sequence of spider attachment silk protein (PySp1) reveals novel linker regions and extreme repeat homogenization.

Spiders use a myriad of silk types for daily survival, and each silk type has a unique suite of task-specific mechanical properties. Of all spider silk types, pyriform silk is distinct because it is a combination of a dry protein fiber and wet glue. Pyriform silk fibers are coated with wet cement and extruded into “attachment discs” that adhere silks to each other and to substrates. The mechanical properties of spider silk types are linked to the primary and higher-level structures of spider silk proteins (spidroins). Spidroins are often enormous molecules (>250 kDa) and have a lengthy repetitive region that is flanked by relatively short (~100 amino acids), non-repetitive amino- and carboxyl-terminal regions. The amino acid sequence motifs in the repetitive region vary greatly between spidroin type, while motif length and number underlie the remarkable mechanical properties of spider silk fibers. Existing knowledge of pyriform spidroins is fragmented, making it difficult to define links between the structure and function of pyriform spidroins. Here, we present the full-length sequence of the gene encoding pyriform spidroin 1 (PySp1) from the silver garden spider Argiope argentata. The predicted protein is similar to previously reported PySp1 sequences but the A. argentata PySp1 has a uniquely long and repetitive “linker”, which bridges the amino-terminal and repetitive regions. Predictions of the hydrophobicity and secondary structure of A. argentata PySp1 identify regions important to protein self-assembly. Analysis of the full complement of A. argentata PySp1 repeats reveals extreme intragenic homogenization, and comparison of A. argentata PySp1 repeats with other PySp1 sequences identifies variability in two sub-repetitive expansion regions. Overall, the full-length A. argentata PySp1 sequence provides new evidence for understanding how pyriform spidroins contribute to the properties of pyriform silk fibers. Copyright © 2017 The Authors. 
Published by Elsevier Ltd. All rights reserved.


July 7, 2019

Neurotrophin biology at NGF 2016: From fundamental science to clinical applications.

In 1986, members of the growing neurotrophin community came together to honor the scientific contributions (and 77th birthday) of Dr. Rita Levi-Montalcini. The celebration took the form of a conference dedicated to the field birthed by Dr. Levi-Montalcini’s discovery of nerve growth factor (NGF), for which she shared the Nobel Prize later that year with Stanley Cohen. The meeting proved to be a great success, and eventually became an ongoing series. The NGF 2016 meeting, held at the beautiful Asilomar conference center in Monterey, California, was the 13th meeting in this series, and marked the 30th anniversary of the original meeting. A diverse collection of investigators, representing academia and industry across 4 continents, gathered to celebrate the past 30 years, discuss the current state of the art, and share in the excitement of envisioning the next 30 years of neurotrophic factor research and applications.


July 7, 2019

Identification of a Pseudomonas aeruginosa PAO1 DNA methyltransferase, its targets, and physiological roles.

DNA methylation is widespread among prokaryotes, and most DNA methylation reactions are catalyzed by adenine DNA methyltransferases, which are part of restriction-modification (R-M) systems. R-M systems are known for their role in the defense against foreign DNA; however, DNA methyltransferases also play functional roles in gene regulation. In this study, we used single-molecule real-time (SMRT) sequencing to uncover the genome-wide DNA methylation pattern in the opportunistic pathogen Pseudomonas aeruginosa PAO1. We identified a conserved sequence motif targeted by an adenine methyltransferase of a type I R-M system and quantified the presence of N(6)-methyladenine using liquid chromatography-tandem mass spectrometry (LC-MS/MS). Changes in the PAO1 methylation status were dependent on growth conditions and affected P. aeruginosa pathogenicity in a Galleria mellonella infection model. Furthermore, we found that methylated motifs in promoter regions led to shifts in sense and antisense gene expression, emphasizing the role of enzymatic DNA methylation as an epigenetic control of phenotypic traits in P. aeruginosa. Since the DNA methylation enzymes are not encoded in the core genome, our findings illustrate how the acquisition of accessory genes can shape the global P. aeruginosa transcriptome and thus may facilitate adaptation to new and challenging habitats. IMPORTANCE With the introduction of advanced technologies, epigenetic regulation by DNA methyltransferases in bacteria has become a subject of intense studies. Here we identified an adenosine DNA methyltransferase in the opportunistic pathogen Pseudomonas aeruginosa PAO1, which is responsible for DNA methylation of a conserved sequence motif. The methylation level of all target sequences throughout the PAO1 genome was approximated to be in the range of 65 to 85% and was dependent on growth conditions. 
Inactivation of the methyltransferase revealed an attenuated-virulence phenotype in the Galleria mellonella infection model. Furthermore, differential expression of more than 90 genes was detected, including the small regulatory RNA prrF1, which contributes to a global iron-sparing response via the repression of a set of gene targets. Our finding of a methylation-dependent repression of the antisense transcript of the prrF1 small regulatory RNA significantly expands our understanding of the regulatory mechanisms underlying active DNA methylation in bacteria. Copyright © 2017 Doberenz et al.


July 7, 2019

A murine herpesvirus closely related to ubiquitous human herpesviruses causes T-cell depletion.

The human roseoloviruses human herpesvirus 6A (HHV-6A), HHV-6B, and HHV-7 comprise the Roseolovirus genus of the human Betaherpesvirinae subfamily. Infections with these viruses have been implicated in many diseases; however, it has been challenging to establish infections with roseoloviruses as direct drivers of pathology, because they are nearly ubiquitous and display species-specific tropism. Furthermore, controlled study of infection has been hampered by the lack of experimental models, and until now, a mouse roseolovirus has not been identified. Herein we describe a virus that causes severe thymic necrosis in neonatal mice, characterized by a loss of CD4(+) T cells. These phenotypes resemble those caused by the previously described mouse thymic virus (MTV), a putative herpesvirus that has not been molecularly characterized. By next-generation sequencing of infected tissue homogenates, we assembled a contiguous 174-kb genome sequence containing 128 unique predicted open reading frames (ORFs), many of which were most closely related to herpesvirus genes. Moreover, the structure of the virus genome and phylogenetic analysis of multiple genes strongly suggested that this virus is a betaherpesvirus more closely related to the roseoloviruses, HHV-6A, HHV-6B, and HHV-7, than to another murine betaherpesvirus, mouse cytomegalovirus (MCMV). As such, we have named this virus murine roseolovirus (MRV) because these data strongly suggest that MRV is a mouse homolog of HHV-6A, HHV-6B, and HHV-7. IMPORTANCE Herein we describe the complete genome sequence of a novel murine herpesvirus. By sequence and phylogenetic analyses, we show that it is a betaherpesvirus most closely related to the roseoloviruses, human herpesviruses 6A, 6B, and 7. These data combined with physiological similarities with human roseoloviruses collectively suggest that this virus is a murine roseolovirus (MRV), the first definitively described rodent roseolovirus, to our knowledge. 
Many biological and clinical ramifications of roseolovirus infection in humans have been hypothesized, but studies showing definitive causative relationships between infection and disease susceptibility are lacking. Here we show that MRV infects the thymus and causes T-cell depletion, suggesting that other roseoloviruses may have similar properties. Copyright © 2017 American Society for Microbiology.


July 7, 2019

Quantifying the importance of the rare biosphere for microbial community response to organic pollutants in a freshwater ecosystem.

A single liter of water contains hundreds, if not thousands, of bacterial and archaeal species, each of which typically makes up a very small fraction of the total microbial community (<0.1%), the so-called "rare biosphere." How often, and via what mechanisms, e.g., clonal amplification versus horizontal gene transfer, the rare taxa and genes contribute to microbial community response to environmental perturbations represent important unanswered questions toward better understanding the value and modeling of microbial diversity. We tested whether rare species frequently responded to changing environmental conditions by establishing 20-liter planktonic mesocosms with water from Lake Lanier (Georgia, USA) and perturbing them with organic compounds that are rarely detected in the lake, including 2,4-dichlorophenoxyacetic acid (2,4-D), 4-nitrophenol (4-NP), and caffeine. The populations of the degraders of these compounds were initially below the detection limit of quantitative PCR (qPCR) or metagenomic sequencing methods, but they increased substantially in abundance after perturbation. Sequencing of several degraders (isolates) and time-series metagenomic data sets revealed distinct cooccurring alleles of degradation genes, frequently carried on transmissible plasmids, especially for the 2,4-D mesocosms, and distinct species dominating the post-enrichment microbial communities from each replicated mesocosm. This diversity of species and genes also underlies distinct degradation profiles among replicated mesocosms. Collectively, these results supported the hypothesis that the rare biosphere can serve as a genetic reservoir, which can be frequently missed by metagenomics but enables community response to changing environmental conditions caused by organic pollutants, and they provided insights into the size of the pool of rare genes and species. 
IMPORTANCE A single liter of water or gram of soil contains hundreds of low-abundance bacterial and archaeal species, the so-called rare biosphere. The value of this astonishing biodiversity for ecosystem functioning remains poorly understood, primarily because microbial community analysis frequently focuses on abundant organisms. Using a combination of culture-dependent and culture-independent (metagenomics) techniques, we showed that rare taxa and genes commonly contribute to the microbial community response to organic pollutants. Our findings should have implications for future studies that aim to study the role of rare species in environmental processes, including environmental bioremediation efforts of oil spills or other contaminants. Copyright © 2017 American Society for Microbiology.


July 7, 2019

Surveillance of bat coronaviruses in Kenya identifies relatives of human coronaviruses NL63 and 229E and their recombination history.

Bats harbor a large diversity of coronaviruses (CoVs), several of which are related to zoonotic pathogens that cause severe disease in humans. Our screening of bat samples collected in Kenya from 2007 to 2010 not only detected RNA from several novel CoVs but, more significantly, identified sequences that were closely related to human CoVs NL63 and 229E, suggesting that these two human viruses originate from bats. We also demonstrated that human CoV NL63 is a recombinant between NL63-like viruses circulating in Triaenops bats and 229E-like viruses circulating in Hipposideros bats, with the breakpoint located near 5′ and 3′ ends of the spike (S) protein gene. In addition, two further interspecies recombination events involving the S gene were identified, suggesting that this region may represent a recombination “hot spot” in CoV genomes. Finally, using a combination of phylogenetic and distance-based approaches, we showed that the genetic diversity of bat CoVs is primarily structured by host species and subsequently by geographic distances. IMPORTANCE Understanding the driving forces of cross-species virus transmission is central to understanding the nature of disease emergence. Previous studies have demonstrated that bats are the ultimate reservoir hosts for a number of coronaviruses (CoVs), including ancestors of severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV), and human CoV 229E (HCoV-229E). However, the evolutionary pathways of bat CoVs remain elusive. We provide evidence for natural recombination between distantly related African bat coronaviruses associated with Triaenops afer and Hipposideros sp. bats that resulted in a NL63-like virus, an ancestor of the human pathogen HCoV-NL63. These results suggest that interspecies recombination may play an important role in CoV evolution and the emergence of novel CoVs with zoonotic potential. Copyright © 2017 American Society for Microbiology.


July 7, 2019

Hybrid sequencing and map finding (HySeMaFi): optional strategies for extensively deciphering gene splicing and expression in organisms without reference genome.

Using second-generation sequencing (SGS) RNA-Seq strategies, extensive alternative splicing prediction is impractical and high variability in isoform expression quantification is inevitable in organisms without a true reference dataset. We report the development of a novel analysis method, termed hybrid sequencing and map finding (HySeMaFi), which combines the specific strengths of third-generation sequencing (TGS) (PacBio SMRT sequencing) and SGS (Illumina Hi-Seq/MiSeq sequencing) to effectively decipher gene splicing and to reliably estimate isoform abundance. Error-corrected long reads from TGS are capable of capturing full-length transcripts or large partial transcript fragments. Both true and false isoforms from a particular gene, as well as those containing all possible exons, could be generated by employing different assembly methods in SGS. We first developed an effective method that establishes the mapping relationship between the error-corrected long reads and the longest assembled contig for every corresponding gene. According to the mapping data, the true splicing pattern of the genes was reliably detected, and quantification of the isoforms was also effectively determined. HySeMaFi is also the optimal strategy by which to decipher the full exon expression of a specific gene when the longest mapped contigs are chosen as the reference set.


July 7, 2019

ThermoAlign: a genome-aware primer design tool for tiled amplicon resequencing.

Isolating and sequencing specific regions in a genome is a cornerstone of molecular biology. This has been facilitated by computationally encoding the thermodynamics of DNA hybridization for automated design of hybridization and priming oligonucleotides. However, the repetitive composition of genomes challenges the identification of target-specific oligonucleotides, which limits genetics and genomics research on many species. Here, a tool called ThermoAlign was developed that ensures the design of target-specific primer pairs for DNA amplification. This is achieved by evaluating the thermodynamics of hybridization for full-length oligonucleotide-template alignments – thermoalignments – across the genome to identify primers predicted to bind specifically to the target site. For amplification-based resequencing of regions that cannot be amplified by a single primer pair, a directed graph analysis method is used to identify minimum amplicon tiling paths. Laboratory validation by standard and long-range polymerase chain reaction and amplicon resequencing with maize, one of the most repetitive genomes sequenced to date (~85% repeat content), demonstrated the specificity-by-design functionality of ThermoAlign. ThermoAlign is released under an open source license and bundled in a dependency-free container for wide distribution. It is anticipated that this tool will facilitate multiple applications in genetics and genomics and be useful in the workflow of high-throughput targeted resequencing studies.

