July 7, 2019

The hidden perils of read mapping as a quality assessment tool in genome sequencing.

This article provides a comparative analysis of the various methods of genome sequencing, focusing on verification of the assembly quality. The results of a comparative assessment of various de novo assembly tools, as well as sequencing technologies, are presented using a recently completed sequence of the genome of Lactobacillus fermentum 3872. In particular, quality of assemblies is assessed by using CLC Genomics Workbench read mapping and optical mapping developed by OpGen. Over-extension of contigs without prior knowledge of contig location can lead to misassembled contigs, even when commonly used quality indicators such as read mapping suggest that a contig is well assembled. Precautions must also be taken when using long read sequencing technology, which may also lead to misassembled contigs.


July 7, 2019

HySA: a Hybrid Structural variant Assembly approach using next-generation and single-molecule sequencing technologies.

Achieving complete, accurate, and cost-effective assembly of human genomes is of great importance for realizing the promise of precision medicine. The abundance of repeats and genetic variations in human genomes and the limitations of existing sequencing technologies call for the development of novel assembly methods that can leverage the complementary strengths of multiple technologies. We propose a Hybrid Structural variant Assembly (HySA) approach that integrates sequencing reads from next-generation sequencing and single-molecule sequencing technologies to accurately assemble and detect structural variants (SVs) in human genomes. By identifying homologous SV-containing reads from different technologies through a bipartite-graph-based clustering algorithm, our approach turns a whole genome assembly problem into a set of independent SV assembly problems, each of which can be effectively solved to enhance the assembly of structurally altered regions in human genomes. We used data generated from a haploid hydatidiform mole genome (CHM1) and a diploid human genome (NA12878) to test our approach. The result showed that, compared with existing methods, our approach had a low false discovery rate and substantially improved the detection of many types of SVs, particularly novel large insertions, small indels (10-50 bp), and short tandem repeat expansions and contractions. Our work highlights the strengths and limitations of current approaches and provides an effective solution for extending the power of existing sequencing technologies for SV discovery. © 2017 Fan et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm.

Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use this data alone, particularly with highly repetitive plant genomes. Errors in the raw data can lead to insertion or deletion errors (indels) in the consensus genome sequence, which in turn create significant problems for downstream analysis; for example, a single indel may shift the reading frame and incorrectly truncate a protein sequence. Here, we describe an algorithm that solves the high error rate problem by combining long, high-error reads with shorter but much more accurate Illumina sequencing reads, whose error rates average <1%. Our hybrid assembly algorithm combines these two types of reads to construct mega-reads, which are both long and accurate, and then assembles the mega-reads using the CABOG assembler, which was designed for long reads. We apply this technique to a large data set of Illumina and PacBio sequences from the species Aegilops tauschii, a large and extremely repetitive plant genome that has resisted previous attempts at assembly. We show that the resulting assembled contigs are far larger than in any previous assembly, with an N50 contig size of 486,807 nucleotides. We compare the contigs to independently produced optical maps to evaluate their large-scale accuracy, and to a set of high-quality bacterial artificial chromosome (BAC)-based assemblies to evaluate base-level accuracy. © 2017 Zimin et al.; Published by Cold Spring Harbor Laboratory Press.
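The frameshift effect mentioned in the abstract above can be illustrated with a minimal sketch. The toy sequence and the `translate` helper are assumptions made for illustration only; they are not part of the MaSuRCA mega-reads algorithm. The sketch uses the standard genetic code and shows how a single inserted base can introduce a premature stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (illustrative helper).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy gene: ATG GCT GAA GCT TGG AAA TAA
reference = "ATGGCTGAAGCTTGGAAATAA"
# The same gene with one erroneous base ("T") inserted after the second codon.
with_indel = reference[:6] + "T" + reference[6:]

print(translate(reference))   # full-length peptide
print(translate(with_indel))  # truncated peptide: the shifted frame reads TGA (stop)
```

In this toy case the reference translates to six residues, while the single-base insertion shifts the frame so that the third codon becomes a stop, which is the kind of downstream problem the abstract describes.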


July 7, 2019

Genome sequence of Plasmopara viticola and insight into the pathogenic mechanism.

Plasmopara viticola causes downy mildew disease of grapevine, which is one of the most devastating diseases of viticulture worldwide. Here we report a 101.3 Mb whole genome sequence of P. viticola isolate ‘JL-7-2’ obtained by a combination of Illumina and PacBio sequencing technologies. The P. viticola genome contains 17,014 putative protein-coding genes and has ~26% repetitive sequences. A total of 1,301 putative secreted proteins, including 100 putative RXLR effectors and 90 CRN effectors, were identified in this genome. In the secretome, 261 potential pathogenicity genes and 95 carbohydrate-active enzymes were predicted. Transcriptional analysis revealed that most of the RXLR effectors, pathogenicity genes and carbohydrate-active enzymes were significantly up-regulated during infection. Comparative genomic analysis revealed that P. viticola evolved independently from the Arabidopsis downy mildew pathogen Hyaloperonospora arabidopsidis. The availability of the P. viticola genome not only provides a valuable resource for comparative genomic analysis and evolutionary studies among oomycetes, but also enhances our knowledge of the mechanism of interactions between this biotrophic pathogen and its host.


July 7, 2019

An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing.

The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25,361, more than three times as large as achieved in the original assembly, and an N50 scaffold size of 107,821, 61% larger than the previous assembly. © The Author 2017. Published by Oxford University Press.
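Several abstracts on this page report N50 contig or scaffold sizes. As a minimal sketch of how that statistic is commonly computed (the `n50` function name and the toy lengths are assumptions for illustration, not taken from any of the papers): sort the contig lengths from longest to shortest and return the length at which the running total first reaches half of the total assembly length.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembly
            return length


# Hypothetical toy assembly: 20 bases in five contigs.
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))
```

For the toy list, the two longest contigs (6 and 5) already cover 11 of 20 bases, so the function returns 5. Because long contigs dominate the sum, N50 rewards contiguity, which is why it is used to compare assemblies of the same genome.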


July 7, 2019

Characterization of Class IIa bacteriocin resistance in Enterococcus faecium.

Vancomycin-resistant enterococci, particularly resistant Enterococcus faecium, pose an escalating threat in nosocomial environments because of their innate resistance to many antibiotics, including vancomycin, a treatment of last resort. Many class IIa bacteriocins strongly target these enterococci and may offer a potential alternative for the management of this pathogen. However, E. faecium’s resistance to these peptides remains relatively uncharacterized. Here, we explored the development of resistance of E. faecium to a cocktail of three class IIa bacteriocins: enterocin A, enterocin P, and hiracin JM79. We started by quantifying the frequency of resistance to these peptides in four clinical isolates of E. faecium. We then investigated the levels of resistance of E. faecium 6E6 mutants as well as their fitness in different carbon sources. In order to elucidate the mechanism of resistance of E. faecium to class IIa bacteriocins, we completed whole-genome sequencing of resistant mutants and performed reverse transcription-quantitative PCR (qRT-PCR) of a suspected target mannose phosphotransferase (ManPTS). We then verified this ManPTS’s role in bacteriocin susceptibility by showing that expression of the ManPTS in Lactococcus lactis results in susceptibility to the peptide cocktail. Based on the evidence found from these studies, we conclude that, in accord with other studies in E. faecalis and Listeria monocytogenes, resistance to class IIa bacteriocins in E. faecium 6E6 is likely caused by the disruption of a particular ManPTS, which we believe we have identified. Copyright © 2017 American Society for Microbiology.


July 7, 2019

Sequencing and de novo assembly of a near complete indica rice genome.

A high-quality reference genome is critical for understanding genome structure, genetic variation and evolution of an organism. Here we report the de novo assembly of an indica rice genome Shuhui498 (R498) through the integration of single-molecule sequencing and mapping data, genetic map and fosmid sequence tags. The 390.3 Mb assembly is estimated to cover more than 99% of the R498 genome and is more continuous than the current reference genomes of japonica rice Nipponbare (MSU7) and Arabidopsis thaliana (TAIR10). We annotate high-quality protein-coding genes in R498 and identify genetic variations between R498 and Nipponbare and presence/absence variations by comparing them to 17 draft genomes in cultivated rice and its closest wild relatives. Our results demonstrate how to de novo assemble a highly contiguous and near-complete plant genome through an integrative strategy. The R498 genome will serve as a reference for the discovery of genes and structural variations in rice.


July 7, 2019

De novo genome and transcriptome assembly of the Canadian beaver (Castor canadensis).

The Canadian beaver (Castor canadensis) is the largest indigenous rodent in North America. We report a draft annotated assembly of the beaver genome, the first for a large rodent and the first mammalian genome assembled directly from uncorrected and moderate coverage (< 30 ×) long reads generated by single-molecule sequencing. The genome size is 2.7 Gb estimated by k-mer analysis. We assembled the beaver genome using the new Canu assembler optimized for noisy reads. The resulting assembly was refined using Pilon supported by short reads (80 ×) and checked for accuracy by congruency against an independent short read assembly. We scaffolded the assembly using the exon-gene models derived from 9805 full-length open reading frames (FL-ORFs) constructed from the beaver leukocyte and muscle transcriptomes. The final assembly comprised 22,515 contigs with an N50 of 278,680 bp and an N50-scaffold of 317,558 bp. Maximum contig and scaffold lengths were 3.3 and 4.2 Mb, respectively, with a combined scaffold length representing 92% of the estimated genome size. The completeness and accuracy of the scaffold assembly was demonstrated by the precise exon placement for 91.1% of the 9805 assembled FL-ORFs and 83.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set used to assess the quality of genome assemblies. Well-represented were genes involved in dentition and enamel deposition, defining characteristics of rodents with which the beaver is well-endowed. The study provides insights for genome assembly and an important genomics resource for Castoridae and rodent evolutionary biology. Copyright © 2017 Lok et al.


July 7, 2019

Genome sequence of the fungal strain 14919 producing 3-hydroxy-3-methylglutaryl–coenzyme A reductase inhibitor FR901512.

Fungal strain 14919 was originally isolated from a soil sample collected at Mt. Kiyosumi, Chiba Prefecture, Japan. It produces FR901512, a potent 3-hydroxy-3-methylglutaryl-coenzyme A (HMG-CoA) reductase inhibitor. The genome sequence of fungal strain 14919 was determined and annotated to improve the productivity of FR901512. Copyright © 2017 Itoh et al.


July 7, 2019

Genome sequences of Cyberlindnera fabianii 65, Pichia kudriavzevii 129, and Saccharomyces cerevisiae 131 isolated from fermented masau fruits in Zimbabwe.

Cyberlindnera fabianii 65, Pichia kudriavzevii 129, and Saccharomyces cerevisiae 131 have been isolated from the microbiota of fermented masau fruits. C. fabianii and P. kudriavzevii especially harbor promising features for biotechnology and food applications. Here, we present the draft annotated genome sequences of these isolates. Copyright © 2017 van Rijswijck et al.


July 7, 2019

Complete genome sequence of Amycolatopsis orientalis CPCC200066, the producer of norvancomycin.

Amycolatopsis orientalis CPCC200066 is an actinomycete exploited commercially in China for the production of norvancomycin, an important glycopeptide antibiotic structurally close to the well-known vancomycin. The availability of the complete genome sequence of CPCC200066 would greatly strengthen our understanding of the regulation pattern of norvancomycin biosynthesis and ultimately improve its production, as well as potentiate discoveries of novel bioactive compounds. Here we report the complete genome sequence of A. orientalis CPCC200066, a circular chromosome consisting of 9,490,992 bp. Forty putative secondary metabolite biosynthetic gene clusters, including norvancomycin, were predicted, covering 20.3% of the whole genome. To facilitate genetic manipulation of this strain, an efficient transformation system was established by constructing a novel integrative vector pIMBT1, which could be transferred into CPCC200066 by electroporation with high efficiency. ΦBT1 attB sites were also identified in other known Amycolatopsis genomes, indicating pIMBT1’s prospect to be a novel vector for genus Amycolatopsis. Copyright © 2017 Elsevier B.V. All rights reserved.


July 7, 2019

High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development.

Using the latest sequencing and optical mapping technologies, we have produced a high-quality de novo assembly of the apple (Malus domestica Borkh.) genome. Repeat sequences, which represented over half of the assembly, provided an unprecedented opportunity to investigate the uncharacterized regions of a tree genome; we identified a new hyper-repetitive retrotransposon sequence that was over-represented in heterochromatic regions and estimated that a major burst of different transposable elements (TEs) occurred 21 million years ago. Notably, the timing of this TE burst coincided with the uplift of the Tian Shan mountains, which is thought to be the center of the location where the apple originated, suggesting that TEs and associated processes may have contributed to the diversification of the apple ancestor and possibly to its divergence from pear. Finally, genome-wide DNA methylation data suggest that epigenetic marks may contribute to agronomically relevant aspects, such as apple fruit development.


July 7, 2019

Complete genome sequence of Staphylococcus epidermidis 1457.

Staphylococcus epidermidis 1457 is a frequently utilized strain that is amenable to genetic manipulation and has been widely used for biofilm-related research. We report here the whole-genome sequence of this strain, which encodes 2,277 protein-coding genes and 81 RNAs within its 2.4-Mb genome and plasmid. Copyright © 2017 Galac et al.


July 7, 2019

Genome sequence of Escherichia coli E28, a multidrug-resistant strain isolated from a chicken carcass, and its spontaneously inducible prophage.

In this study, we sequenced the complete genome of the multidrug-resistant Escherichia coli strain E28, which was used as an indicator strain for phage therapy in vivo. We used a combination of single-molecule real-time and Illumina sequencing technology to reveal the presence of a spontaneously inducible prophage. Copyright © 2017 Schmidt et al.

