July 7, 2019

The genomic sequence of the oral pathobiont strain NI1060 reveals unique strategies for bacterial competition and pathogenicity.

Strain NI1060 is an oral bacterium responsible for periodontitis in a murine ligature-induced disease model. To better understand its pathogenicity, we have determined the complete sequence of its 2,553,982 bp genome. Although its 16S rRNA places it close to Pasteurella pneumotropica, a pneumonia-associated rodent commensal, the NI1060 genomic content suggests that they are different species thriving on different energy sources via alternative metabolic pathways. Genomic and phylogenetic analyses showed that strain NI1060 is distinct from the genera currently described in the family Pasteurellaceae, and is likely to represent a novel species. In addition, we found putative virulence genes involved in lipooligosaccharide synthesis, adhesins and bacteriotoxic proteins. These genes are potentially important for host adaptation and for the induction of dysbiosis through bacterial competition and pathogenicity. Importantly, strain NI1060 strongly stimulates Nod1, an innate immune receptor, but is defective in two peptidoglycan recycling genes due to a frameshift mutation. The in-depth analysis of its genome thus provides critical insights for the development of NI1060 as a prime model system for infectious disease.


July 7, 2019

Challenges, solutions, and quality metrics of personal genome assembly in advancing precision medicine.

Even though each of us shares more than 99% of the DNA sequence in our genome, millions of small sequence and structural differences distinguish individuals, giving us different appearances and different responses to medical treatments. Currently, genetic variants in diseased tissues, such as tumors, are uncovered by exploring the differences between the reference genome and the sequences detected in the diseased tissue. However, the public reference genome was derived from the DNA of multiple individuals. As a result, the reference genome is incomplete and may misrepresent the sequence variants of the general population. A more reliable solution is to compare sequences of diseased tissue with the individual's own genome sequence derived from tissue in a normal state. As the price of sequencing a human genome has dropped dramatically to around $1000, documenting a personal genome for every individual is becoming a realistic prospect. However, de novo assembly of individual genomes at an affordable cost is still challenging, and to date only a few human genomes have been fully assembled. In this review, we introduce the history of human genome sequencing and the evolution of sequencing platforms, from Sanger sequencing to emerging “third generation sequencing” technologies. We present the currently available de novo assembly and post-assembly software packages for human genome assembly and their requirements for computational infrastructure. We recommend hybrid assembly combining long and short reads as a promising way to generate good quality human genome assemblies, and we specify parameters for the quality assessment of assembly outcomes. We provide a perspective on the benefit of using personal genomes as references and suggestions for obtaining a quality personal genome. Finally, we discuss the use of the personal genome in aiding vaccine design and development, monitoring host immune response, tailoring drug therapy and detecting tumors. We believe precision medicine would benefit greatly from bioinformatics solutions, particularly for personal genome assembly.
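The abstract mentions parameters for assessing assembly quality without naming them. One widely used contiguity metric is N50, and the following is a minimal sketch of how it can be computed from a list of contig lengths. The function name `n50` and the example lengths are illustrative assumptions, not taken from the review.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L of the contig at which, walking contigs from
    longest to shortest, the running total first reaches at least half
    of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid fractional arithmetic.
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical contig lengths (in bp) used only to illustrate the call.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

A higher N50 indicates a more contiguous assembly, but it says nothing about correctness, so it is normally read alongside other measures such as completeness and base accuracy.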


July 7, 2019

Representing genetic variation with synthetic DNA standards.

The identification of genetic variation with next-generation sequencing is confounded by the complexity of the human genome sequence and by biases that arise during library preparation, sequencing and analysis. We have developed a set of synthetic DNA standards, termed ‘sequins’, that emulate human genetic features and constitute qualitative and quantitative spike-in controls for genome sequencing. Sequencing reads derived from sequins align exclusively to an artificial in silico reference chromosome, rather than the human reference genome, which allows them to be partitioned for parallel analysis. Here we use this approach to represent common and clinically relevant genetic variation, ranging from single nucleotide variants to large structural rearrangements and copy-number variation. We validate the design and performance of sequin standards by comparison to examples in the NA12878 reference genome, and we demonstrate their utility during the detection and quantification of variants. We provide sequins as a standardized, quantitative resource against which human genetic variation can be measured and diagnostic performance assessed.


July 7, 2019

The two chromosomes of the mitochondrial genome of a sugarcane cultivar: assembly and recombination analysis using long PacBio reads.

Sugarcane accounts for a large portion of the world's sugar production. Modern commercial cultivars are complex hybrids of S. officinarum and several other Saccharum species. Historical records identify New Guinea as the origin of S. officinarum and indicate that a small number of plants originating from there were used to generate all modern commercial cultivars. The mitochondrial genome can be a useful way to identify the maternal origin of commercial cultivars. We have used the PacBio RSII to sequence and assemble the mitochondrial genome of a South East Asian commercial cultivar, known as Khon Kaen 3. The long read length of this sequencing technology allowed the mitochondrial genome to be assembled into two distinct circular chromosomes with all repeat sequences spanned by individual reads. Comparison of five commercial hybrids, two S. officinarum and one S. spontaneum to our assembly reveals no structural rearrangements between our assembly, the commercial hybrids and an S. officinarum from New Guinea. The S. spontaneum, from India, and one sample of S. officinarum (unknown origin) are substantially rearranged and have a large number of homozygous variants. This supports the record that S. officinarum plants from New Guinea are the maternal source of all modern commercial hybrids.


July 7, 2019

Comparative genomics and transcriptomics of Pichia pastoris.

Pichia pastoris has emerged as an important alternative host for producing recombinant biopharmaceuticals, owing to its high cultivation density, low host cell protein burden, and the development of strains with humanized glycosylation. Despite its demonstrated utility, relatively little strain engineering has been performed to improve Pichia, due in part to the limited number and inconsistent frameworks of reported genomes and transcriptomes. Furthermore, the co-mingling of genomic, transcriptomic and fermentation data collected about Komagataella pastoris and Komagataella phaffii, the two strains co-branded as Pichia, has generated confusion about host performance for these genetically distinct species. Generation of comparative high-quality genomes and transcriptomes will enable meaningful comparisons between the organisms, and potentially inform distinct biotechnological utilities for each species. Here, we present a comprehensive and standardized comparative analysis of the genomic features of the three most commonly used strains comprising the tradename Pichia: K. pastoris wild-type, K. phaffii wild-type, and K. phaffii GS115. We used a combination of long-read (PacBio) and short-read (Illumina) sequencing technologies to achieve over 1000X coverage of each genome. Construction of individual genomes was then performed using as few as seven individual contigs to create gap-free assemblies. We found substantial syntenic rearrangements between the species and characterized a linear plasmid present in K. phaffii. Comparative analyses between K. phaffii genomes enabled the characterization of the mutational landscape of the GS115 strain. We identified and examined 35 non-synonymous coding mutations present in GS115, many of which are likely to impact strain performance. Additionally, we investigated transcriptomic profiles of gene expression for both species during cultivation on various carbon sources. We observed that the most highly transcribed genes in both organisms were consistently highly expressed in all three carbon sources examined. We also observed selective expression of certain genes in each carbon source, including many sequences not previously reported as promoters for expression of heterologous proteins in yeasts. Our studies establish a foundation for understanding critical relationships between genome structure, cultivation conditions and gene expression. The resources we report here will inform and facilitate rational, organism-wide strain engineering for improved utility as a host for protein production.


July 7, 2019

Isolation and genomic characterization of ‘Desulfuromonas soudanensis WTL’, a metal- and electrode-respiring bacterium from anoxic deep subsurface brine.

Reaching a depth of 713 m below the surface, the Soudan Underground Iron Mine (Soudan, MN, USA) transects a massive Archaean (2.7 Ga) banded iron formation, providing a remarkably accessible window into the terrestrial deep biosphere. Despite organic carbon limitation, metal-reducing microbial communities are present in potentially ancient anoxic brines continuously emanating from exploratory boreholes on Level 27. Using graphite electrodes deposited in situ as bait, we electrochemically enriched and isolated a novel halophilic iron-reducing Deltaproteobacterium, ‘Desulfuromonas soudanensis’ strain WTL, from an acetate-fed three-electrode bioreactor poised at +0.24 V (vs. standard hydrogen electrode). Cyclic voltammetry revealed that ‘D. soudanensis’ releases electrons at redox potentials approximately 100 mV more positive than the model freshwater surface isolate Geobacter sulfurreducens, suggesting that its extracellular respiration is tuned for higher potential electron acceptors. ‘D. soudanensis’ contains a 3,958,620-bp circular genome, assembled to completion using single-molecule real-time (SMRT) sequencing reads, which encodes a complete TCA cycle, 38 putative multiheme c-type cytochromes, one of which contains 69 heme-binding motifs, and a LuxI/LuxR quorum sensing cassette that produces an unidentified N-acyl homoserine lactone. Another cytochrome is predicted to lie within a putative prophage, suggesting that horizontal gene transfer plays a role in respiratory flexibility among metal reducers. Isolation of ‘D. soudanensis’ underscores the utility of electrode-based approaches for enriching rare metal reducers from a wide range of habitats.


July 7, 2019

Assemblytics: a web analytics tool for the detection of variants from an assembly.

Assemblytics is a web app for detecting and analyzing variants from a de novo genome assembly aligned to a reference genome. It incorporates a unique anchor filtering approach to increase robustness to repetitive elements, and identifies six classes of variants based on their distinct alignment signatures. Assemblytics can be applied both to compare aberrant genomes, such as human cancers, to a reference and to identify differences between related species. Multiple interactive visualizations enable in-depth explorations of the genomic distributions of variants. Availability: http://assemblytics.com, https://github.com/marianattestad/assemblytics. Contact: mnattest@cshl.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.


July 7, 2019

Short tandem repeat number estimation from paired-end reads for multiple individuals by considering coalescent tree.

Two types of approaches are mainly considered for estimating repeat numbers in short tandem repeat (STR) regions from high-throughput sequencing data: approaches that directly count repeat patterns in sequence reads spanning the region, and approaches that detect the difference between the insert size inferred from aligned paired-end reads and the actual insert size. Although repeat numbers estimated with the former approaches are highly accurate, the size of target STR regions is limited to the length of sequence reads. The latter approaches can handle STR regions longer than the sequence reads, but their repeat number estimates are less accurate. We proposed a new statistical model named coalescentSTR that estimates repeat numbers from paired-end read distances for multiple individuals simultaneously by connecting the read generative model for each individual with their genealogy. In the model, the genealogy is represented by handling coalescent trees as hidden variables, and the summation over the hidden variables is taken over coalescent trees sampled with Markov chain Monte Carlo based on phased genotypes located around the target STR region. In the sampled coalescent trees, repeat number information from insert size data is propagated, and more accurate estimation of repeat numbers is expected for STR regions longer than the length of sequence reads. To find the repeat numbers that maximize the likelihood of the model, we proposed a state-of-the-art belief propagation algorithm on sampled coalescent trees. We verified the effectiveness of the proposed approach by comparison with existing methods, using simulation datasets and real whole genome and whole exome data for HapMap individuals analyzed in the 1000 Genomes Project.
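The insert-size idea described above can be illustrated with a deliberately naive single-sample estimate. This sketch is not coalescentSTR and involves no coalescent trees or belief propagation. It assumes a haploid sample, a known library mean insert size, and that an expansion relative to the reference makes read pairs spanning the STR map closer together on the reference by the length of the extra repeats. The function and parameter names are hypothetical.

```python
from statistics import mean


def estimate_repeat_number(mapped_distances, library_mean_insert,
                           ref_repeat_number, unit_length):
    """Naive insert-size-based estimate of the STR repeat number.

    mapped_distances: insert sizes inferred from read pairs aligned
        across the STR region on the reference (bp).
    library_mean_insert: mean actual insert size of the library (bp).
    ref_repeat_number: number of repeat units in the reference.
    unit_length: length of one repeat unit (bp).
    """
    if not mapped_distances:
        raise ValueError("need at least one spanning read pair")
    if unit_length <= 0:
        raise ValueError("unit_length must be positive")
    # Extra sequence in the sample shortens the apparent distance on the
    # reference, so the shortfall approximates the added repeat length.
    shift = library_mean_insert - mean(mapped_distances)
    return ref_repeat_number + shift / unit_length


# Hypothetical values: pairs map 20 bp closer than the library mean,
# which corresponds to five extra copies of a 4 bp unit.
print(estimate_repeat_number([380, 390, 370, 380], 400, 10, 4))
```

Because the estimate depends on a mean over noisy insert sizes, its accuracy is limited when few pairs span the region, which is the weakness the abstract's multi-individual model is meant to address.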


July 7, 2019

Hyper-eccentric structural genes in the mitochondrial genome of the algal parasite Hemistasia phaeocysticola.

Diplonemid mitochondria are considered to have very eccentric structural genes. Coding regions of individual diplonemid mitochondrial genes are fragmented into small pieces and found on different circular DNAs. Short RNAs transcribed from each DNA molecule mature through a unique RNA maturation process involving assembly and three types of RNA editing (i.e., U insertion and A-to-I & C-to-U substitutions), although the molecular mechanism(s) of RNA maturation and the evolutionary history of these eccentric structural genes still remain to be understood. Since the gene fragmentation pattern is generally conserved among the diplonemid species studied to date, it was considered that their structural complexity has plateaued and further gene fragmentation could not occur. Here, we show the mitochondrial gene structure of Hemistasia phaeocysticola, which was recently identified as a member of a novel lineage in diplonemids, by comparison of the mitochondrial DNA sequences with cDNA sequences synthesized from mature mRNA. The genes of H. phaeocysticola are fragmented much more finely than those of other diplonemids studied to date. Furthermore, in addition to all known types of RNA editing, it is suggested that a novel processing step (i.e., secondary RNA insertion) is involved in the RNA maturation in the mitochondria of H. phaeocysticola. Our findings demonstrate the tremendous plasticity of mitochondrial gene structures. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 7, 2019

Effector diversification contributes to Xanthomonas oryzae pv. oryzae phenotypic adaptation in a semi-isolated environment.

Understanding the processes that shaped contemporary pathogen populations in agricultural landscapes is important for defining appropriate management strategies and for supporting crop improvement efforts. Here, we took advantage of a historical record to examine the adaptation pathway of the rice pathogen Xanthomonas oryzae pv. oryzae (Xoo) in a semi-isolated environment represented by the Philippine archipelago. By comparing genomes of key Xoo groups we showed that modern populations derived from three Asian lineages. We also showed that diversification of virulence factors occurred within each lineage, most likely driven by host adaptation, and that it was essential in shaping contemporary pathogen races. This finding is particularly important because it expands our understanding of pathogen adaptation to modern agriculture.


July 7, 2019

A viral immunity chromosome in the marine picoeukaryote, Ostreococcus tauri.

Micro-algae of the genus Ostreococcus and related species of the order Mamiellales are globally distributed in the photic zone of the world’s oceans, where they contribute to fixation of atmospheric carbon and production of oxygen, besides providing a primary source of nutrition in the food web. Their tiny size, simple cells, ease of culture, compact genomes and susceptibility to the most abundant large DNA viruses in the sea render them attractive as models for integrative marine biology. In culture, spontaneous resistance to viruses occurs frequently. Here, we show that virus-producing resistant cell lines arise in many independent cell lines during lytic infections, but over two years, more and more of these lines stop producing viruses. We observed sweeping over-expression of all genes in more than half of chromosome 19 in resistant lines, and karyotypic analyses showed physical rearrangements of this chromosome. Chromosome 19 has an unusual genetic structure whose equivalent is found in all of the sequenced genomes in this ecologically important group of green algae.


July 7, 2019

Complete circular genome sequence of successful ST8/SCCmecIV community-associated methicillin-resistant Staphylococcus aureus (OC8) in Russia: one-megabase genomic inversion, IS256’s spread, and evolution of Russia ST8-IV.

ST8/SCCmecIV community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) has been a common threat, with large USA300 epidemics in the United States. The global geographical structure of ST8/SCCmecIV has not yet been fully elucidated. We herein determined the complete circular genome sequence of ST8/SCCmecIVc strain OC8 from Siberian Russia. We found that 36.0% of the genome was inverted relative to USA300. Two oppositely oriented IS256 copies at IS256-enriched hot spots were implicated in the one-megabase genomic inversion (MbIN) and the νSaβ split. The behavior of IS256 was flexible: its insertion site (att) sequences on the genome and junction sequences of extrachromosomal circular DNA were all divergent, albeit with fixed sizes. A similar multi-IS256 system was detected even in prevalent ST239 healthcare-associated MRSA in Russia, suggesting IS256’s strong transmission potential and advantage in evolution. Regarding epidemiology, all ST8/SCCmecIVc strains examined from European, Siberian, and Far Eastern Russia had MbIN, and geographical expansion was accompanied by divergent spa types and resistance to fluoroquinolones, chloramphenicol, and often rifampicin. Russia ST8/SCCmecIVc has been associated with life-threatening infections such as pneumonia and sepsis in both community and hospital settings. Regarding virulence, the OC8 genome carried a series of toxin and immune evasion genes, a truncated giant surface protein gene, and an IS256 insertion adjacent to a pan-regulatory gene. These results suggest that a unique single ST8/spa1(t008)/SCCmecIVc CA-MRSA clade (Russia ST8-IVc) emerged in Russia, and that this was followed by large geographical expansion, with MbIN as an epidemiological marker, and fluoroquinolone resistance, multiple virulence factors, and possibly a multi-IS256 system as selective advantages.


July 7, 2019

LongISLND: in silico sequencing of lengthy and noisy datatypes.

LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. LongISLND is implemented in Java and available at http://bioinform.github.io/longislnd. Contact: hugo.lam@roche.com. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.


July 7, 2019

Emergence of endemic MLST non-typeable vancomycin-resistant Enterococcus faecium.

Enterococcus faecium is a major nosocomial pathogen causing significant morbidity and mortality worldwide. Assessment of E. faecium using MLST to understand the spread of this organism is an important component of hospital infection control measures. Recent studies, however, suggest that MLST might be inadequate for E. faecium surveillance. The objective was to use WGS to characterize recently identified vancomycin-resistant E. faecium (VREfm) isolates non-typeable by MLST that appear to be causing a multi-jurisdictional outbreak in Australia. Illumina NextSeq and Pacific Biosciences SMRT sequencing platforms were used to determine the genome sequences of 66 non-typeable E. faecium (NTEfm) isolates. Phylogenetic and bioinformatics analyses were subsequently performed using a number of in silico tools. Sixty-six E. faecium isolates were identified by WGS from multiple health jurisdictions in Australia that could not be typed by MLST due to a missing pstS allele. SMRT sequencing and complete genome assembly revealed a large chromosomal rearrangement in representative strain DMG1500801, which likely facilitated the deletion of the pstS region. Phylogenomic analysis of this population suggests that deletion of pstS within E. faecium has arisen independently on at least three occasions. Importantly, the majority of these isolates displayed a vancomycin-resistant genotype. We have identified NTEfm isolates that appear to be causing a multi-jurisdictional outbreak in Australia. Identification of these isolates has important implications for MLST-based typing activities designed to monitor the spread of VREfm and provides further evidence supporting the use of WGS for hospital surveillance of E. faecium. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy.
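To make concrete why a missing pstS allele leaves an isolate non-typeable, the sketch below shows MLST as a lookup of a seven-locus allele profile. The seven loci are those of the standard E. faecium MLST scheme. The allele numbers, the profile table and the sequence type label are hypothetical placeholders, and the function is an illustration rather than the tooling used in the study.

```python
# Housekeeping loci of the E. faecium MLST scheme.
LOCI = ("adk", "atpA", "ddl", "gdh", "gyd", "purK", "pstS")


def sequence_type(alleles, profiles):
    """Look up a sequence type from per-locus allele numbers.

    alleles: dict mapping locus name to allele number, or None when the
        locus could not be found in the genome.
    profiles: dict mapping a 7-tuple of allele numbers (in LOCI order)
        to a sequence type label.
    """
    missing = [locus for locus in LOCI if alleles.get(locus) is None]
    if missing:
        # A single absent locus prevents assignment of any profile.
        return "non-typeable (missing: " + ", ".join(missing) + ")"
    profile = tuple(alleles[locus] for locus in LOCI)
    return profiles.get(profile, "novel profile")


# Hypothetical profile table and isolates, for illustration only.
profiles = {(1, 1, 1, 1, 1, 1, 1): "ST-A"}
typed = {locus: 1 for locus in LOCI}
deleted = dict(typed, pstS=None)

print(sequence_type(typed, profiles))
print(sequence_type(deleted, profiles))
```

Whole-genome comparison does not depend on any single locus being present, which is why the abstract argues for WGS over MLST in this setting.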

