July 7, 2019

HISEA: HIerarchical SEed Aligner for PacBio data.

The next generation sequencing (NGS) techniques have been around for over a decade. Many of their fundamental applications rely on the ability to compute good genome assemblies. As the technology evolves, the assembly algorithms and tools have to continuously adjust and improve. The currently dominant technology of Illumina produces reads that are too short to bridge many repeats, setting limits on what can be successfully assembled. The emerging SMRT (Single Molecule, Real-Time) sequencing technique from Pacific Biosciences produces uniform coverage and long reads of length up to sixty thousand base pairs, enabling significantly better genome assemblies. However, SMRT reads are much more expensive and have a much higher error rate than Illumina’s – around 10-15% – mostly due to indels. New algorithms are very much needed to take advantage of the long reads while mitigating the effect of high error rate and lowering the required coverage. An essential step in assembling SMRT data is the detection of alignments, or overlaps, between reads. High error rate and very long reads make this a much more challenging problem than for Illumina data. We present a new pairwise read aligner, or overlapper, HISEA (Hierarchical SEed Aligner) for SMRT sequencing data. HISEA uses a novel two-step k-mer search, employing consistent clustering, k-mer filtering, and read alignment extension. We compare HISEA against several state-of-the-art programs – BLASR, DALIGNER, GraphMap, MHAP, and Minimap – on real datasets from five organisms. We compare their sensitivity, precision, specificity, F1-score, as well as time and memory usage. We also introduce a new, more precise, evaluation method. Finally, we compare the two leading programs, MHAP and HISEA, for their genome assembly performance in the Canu pipeline. Our algorithm has the best alignment detection sensitivity among all programs for SMRT data, significantly higher than the current best.
The currently best assembler for SMRT data is the Canu program, which uses the MHAP aligner in its pipeline. We have incorporated our new HISEA aligner in the Canu pipeline and benchmarked it against the best pipeline for multiple datasets at two relevant coverage levels: 30x and 50x. Our assemblies are better than those using MHAP for both coverage levels. Moreover, Canu+HISEA assemblies for 30x coverage are comparable with Canu+MHAP assemblies for 50x coverage, while being faster and cheaper. The HISEA algorithm produces alignments with the highest sensitivity compared with the current state-of-the-art algorithms. Integrated in the Canu pipeline, currently the best for assembling PacBio data, it produces better assemblies than Canu+MHAP.
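
The abstract above names sensitivity, precision, specificity, and F1-score as its comparison metrics. As a rough illustration only, these can be computed from counts of true and false overlap calls using the standard textbook definitions. The function name `overlap_metrics` and the example counts are hypothetical, and this sketch is not the paper's new evaluation method.

```python
def overlap_metrics(tp, fp, tn, fn):
    """Standard classification metrics from overlap-call counts.

    tp: read pairs reported as overlapping that truly overlap
    fp: read pairs reported as overlapping that do not overlap
    tn: read pairs correctly not reported
    fn: true overlaps that were missed
    (Hypothetical helper; the counts come from whatever ground truth is used.)
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # also called recall
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up counts, purely to show the call shape.
    for name, value in overlap_metrics(tp=80, fp=20, tn=880, fn=20).items():
        print(f"{name}: {value:.3f}")
```

Sensitivity rewards finding true overlaps, precision penalizes spurious ones, and F1 is their harmonic mean, which is why an aligner can lead on sensitivity without necessarily leading on every other metric.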


July 7, 2019

Complete genome sequence of esterase-producing bacterium Croceicoccus marinus E4A9T.

Croceicoccus marinus E4A9T was isolated from deep-sea sediment collected from the East Pacific polymetallic nodule area. The strain is able to produce esterase, which is widely used in the food, perfume, cosmetic, chemical, agricultural and pharmaceutical industries. Here we describe the characteristics of strain E4A9T, including the genome sequence and annotation, presence of esterases, and metabolic pathways of the organism. The genome of strain E4A9T comprises 4,109,188 bp, with one chromosome (3,001,363 bp) and two large circular plasmids (761,621 bp and 346,204 bp, respectively). The complete genome contains 3653 coding sequences, 48 tRNAs, two 16S-23S-5S rRNA operons and three ncRNAs. Strain E4A9T encodes 10 genes related to esterase, and three of the esterases (E3, E6 and E10) were successfully cloned and expressed in Escherichia coli Rosetta in a soluble form, revealing their potential application in the biotechnology industry. Moreover, the genome provides clues to the metabolic pathways of strain E4A9T, reflecting its adaptations to the ambient environment. The genome sequence of C. marinus E4A9T now provides the fundamental information for future studies.


July 7, 2019

Efficient transgenesis and annotated genome sequence of the regenerative flatworm model Macrostomum lignano.

Regeneration-capable flatworms are informative research models to study the mechanisms of stem cell regulation, regeneration, and tissue patterning. However, the lack of transgenesis methods considerably hampers their wider use. Here we report development of a transgenesis method for Macrostomum lignano, a basal flatworm with excellent regeneration capacity. We demonstrate that microinjection of DNA constructs into fertilized one-cell stage eggs, followed by a low dose of irradiation, frequently results in random integration of the transgene in the genome and its stable transmission through the germline. To facilitate selection of promoter regions for transgenic reporters, we assembled and annotated the M. lignano genome, including genome-wide mapping of transcription start regions, and show its utility by generating multiple stable transgenic lines expressing fluorescent proteins under several tissue-specific promoters. The reported transgenesis method and annotated genome sequence will permit sophisticated genetic studies on stem cells and regeneration using M. lignano as a model organism.


July 7, 2019

Resequencing of the Leishmania infantum (strain JPCM5) genome and de novo assembly into 36 contigs.

Leishmania parasites are the causative agents of leishmaniasis, a group of potentially fatal human diseases. Control strategies for leishmaniasis can be enhanced by genome-based investigations. The publication in 2005 of the Leishmania major genome sequence and, two years later, of the genomes of the species Leishmania braziliensis and Leishmania infantum were major milestones. Since then, the L. infantum genome, although highly fragmented and incomplete, has been used widely as the reference genome for whole transcriptomics and proteomics studies. Here, we report the sequencing of the L. infantum genome by two NGS methodologies and, as a result, the complete genome assembly into 36 contigs (chromosomes). In the present L. infantum genome draft, 495 new genes have been annotated, a hundred have been corrected and 75 previously annotated genes have been discontinued. These changes are not only the result of an increase in the genome size; a significant contribution derives from the existence of a large number of incorrectly assembled regions in current chromosomal scaffolds. Furthermore, an improved assembly of tandemly repeated genes has been obtained. All these analyses support that the de novo assembled L. infantum genome represents a robust assembly and should replace the one currently available in the databases.


July 7, 2019

Complete genome sequences of two plant-associated Pseudomonas putida isolates with increased heavy-metal tolerance.

We report here the complete genome sequences of two Pseudomonas putida isolates recovered from surface-sterilized roots of Sida hermaphrodita. The two isolates were characterized by an increased tolerance to zinc, cadmium, and lead. Furthermore, the strains showed typical plant growth-promoting properties, such as the production of indole acetic acid, cellulolytic enzymes, and siderophores. Copyright © 2017 Nesme et al.


July 7, 2019

Whole-genome sequencing of Lactobacillus salivarius strains BCRC 14759 and BCRC 12574.

Lactobacillus salivarius BCRC 14759 has been identified as a high-exopolysaccharide-producing strain with potential as a probiotic or fermented dairy product. Here, we report the genome sequences of L. salivarius BCRC 14759 and the comparable strain BCRC 12574, isolated from human saliva. The PacBio RSII sequencing platform was used to obtain high-quality assemblies for characterization of this probiotic candidate. Copyright © 2017 Chiu et al.


July 7, 2019

Complete genome sequencing and diversity analysis of lipolytic enzymes in Stenotrophomonas maltophilia OUC_Est10

[Objective] The aim of this study was to investigate the diversity of lipolytic enzymes in Stenotrophomonas maltophilia OUC_Est10. [Methods] Ion exchange chromatography, genome sequencing and heterologous expression were used to study the diversity of lipolytic enzymes in Stenotrophomonas maltophilia OUC_Est10. [Results] Stenotrophomonas maltophilia OUC_Est10 could secrete a wide range of lipolytic enzymes (lipases and esterases), as revealed by ion exchange chromatography. The complete genome is 4,668,743 bp in length, with an average GC content of 66.25%. Genome annotation indicated the presence of 33 candidate genes whose products are predicted to have lipolytic enzyme activity. Catalytic features were analyzed by expressing five putative lipolytic enzyme genes, and the lipolytic enzymes in OUC_Est10 had different catalytic properties. [Conclusion] We showed that Stenotrophomonas maltophilia OUC_Est10 is a good candidate for producing diverse lipolytic enzymes, with potential applications in various fields.


July 7, 2019

Complete genome sequences of two strains of the meat spoilage bacterium Brochothrix thermosphacta isolated from ground chicken.

Brochothrix thermosphacta is an important meat spoilage bacterium. Here we report the genome sequences of two strains of B. thermosphacta isolated from ground chicken. The genome sequences were determined using long-read PacBio single-molecule real-time (SMRT) technology and are the first complete genome sequences reported for B. thermosphacta.


July 7, 2019

Scaffolding of long read assemblies using long range contact information.

Long read technologies have revolutionized de novo genome assembly by generating contigs orders of magnitude longer than those of short read assemblies. Although assembly contiguity has increased, these assemblies usually do not reconstruct a full chromosome or an arm of a chromosome, resulting in an unfinished chromosome-level assembly. To increase the contiguity of the assembly to the chromosome level, different strategies are used which exploit long range contact information between chromosomes in the genome. We develop a scalable and computationally efficient scaffolding method that can boost the assembly contiguity to a large extent using genome-wide chromatin interaction data such as Hi-C. We demonstrate an algorithm that uses Hi-C data for longer-range scaffolding of de novo long read genome assemblies. We tested our methods on the human and goat genome assemblies. We compare our scaffolds with the scaffolds generated by LACHESIS based on various metrics. Our new algorithm SALSA produces more accurate scaffolds compared to the existing state-of-the-art method LACHESIS.
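
Assembly contiguity, which the abstract above discusses, is commonly summarized with the N50 statistic: the length of the contig or scaffold at which the cumulative length, taken from longest to shortest, first reaches half of the total assembly length. The abstract does not say which metrics were used, so the following is only a minimal sketch of that common measure; the function name `n50` and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    Returns 0 for an empty list.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    # Made-up contig lengths in kb, before and after joining two contigs.
    before = [80, 70, 50, 40, 30, 20]
    after = [150, 50, 40, 30, 20]   # 80 and 70 joined into one scaffold
    print(n50(before), n50(after))
```

Joining contigs into scaffolds leaves the total length unchanged but moves more of it into longer sequences, which is how scaffolding raises N50.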


July 7, 2019

Automation of PacBio SMRTbell NGS library preparation for bacterial genome sequencing.

The PacBio RS II provides for single molecule, real-time DNA technology to sequence genomes and detect DNA modifications. The starting point for high-quality sequence production is high molecular weight genomic DNA. To automate the library preparation process, there must be high-throughput methods in place to assess the genomic DNA, to ensure the size and amounts of the sheared DNA fragments and final library. The library construction automation was accomplished using the Agilent NGS workstation with Bravo accessories for heating, shaking, cooling, and magnetic bead manipulations for template purification. The quality control methods from gDNA input to final library using the Agilent Bioanalyzer System and Agilent TapeStation System were evaluated. Automated protocols of PacBio 10 kb library preparation produced libraries with similar technical performance to those generated manually. The TapeStation System proved to be a reliable method that could be used in a 96-well plate format to QC the DNA equivalent to the standard Bioanalyzer System results. The DNA Integrity Number that is calculated in the TapeStation System software upon analysis of genomic DNA is quite helpful to assure that the starting genomic DNA is not degraded. In this respect, the gDNA assay on the TapeStation System is preferable to the DNA 12000 assay on the Bioanalyzer System, which cannot run genomic DNA, nor can the Bioanalyzer work directly from the 96-well plates.


July 7, 2019

Complete genome sequence of Bacillus velezensis YJ11-1-4, a strain with broad-spectrum antimicrobial activity, isolated from traditional Korean fermented soybean paste.

Bacillus velezensis YJ11-1-4 is a strain that exhibits broad-spectrum antimicrobial activity against various pathogens. It was isolated from doenjang, a traditional Korean fermented soybean paste. The genome comprises a single circular chromosome of 4,006,637 bp with a G+C content of 46.42% and no plasmids. Copyright © 2017 Lee et al.


July 7, 2019

Complete genome sequence of Acetobacter pomorum Oregon-R-modENCODE strain BDGP5, an acetic acid bacterium found in the Drosophila melanogaster gut.

Acetobacter pomorum Oregon-R-modENCODE strain BDGP5 was isolated from Drosophila melanogaster for functional host-microbe interaction studies. The complete genome is composed of a single chromosomal circle of 2,848,089 bp, with a G+C content of 53%, and three plasmids of 131,455 bp, 19,216 bp, and 9,160 bp. Copyright © 2017 Wan et al.


July 7, 2019

Complete genome sequence of Bacillus vallismortis NBIF-001, a novel strain from Shangri-La, China, that has high activity against Fusarium oxysporum.

Bacillus vallismortis NBIF-001, a Gram-positive bacterium, was isolated from soil in Shangri-La, China. Here, we provide the complete genome sequence of this bacterium, which has a 3,929,787-bp-long genome, including 4,030 protein-coding genes and 195 RNA genes. This strain possesses a number of genes encoding virulence factors of pathogens. Copyright © 2017 Liu et al.


July 7, 2019

Closed genome sequence of Chryseobacterium piperi strain CTMT/ATCC BAA-1782, a Gram-negative bacterium with clostridial neurotoxin-like coding sequences.

Clostridial neurotoxins, including botulinum and tetanus neurotoxins, are among the deadliest known bacterial toxins. Until recently, the horizontal mobility of this toxin gene family appeared to be limited to the genus Clostridium. We report here the closed genome sequence of Chryseobacterium piperi, a Gram-negative bacterium containing coding sequences with homology to clostridial neurotoxin family proteins. Copyright © 2017 Wentz et al.

