July 7, 2019  |  

A comparison of single-molecule emission in aluminum and gold zero-mode waveguides.

The effect of gold and aluminum zero-mode waveguides (ZMWs) on the brightness of immobilized single emitters was characterized by probing fluorophores that absorb in the green and red regions of the visible spectrum. Aluminum ZMWs enhance the emission of Atto565 fluorophores upon green excitation, but they do not enhance the emission of Atto647N fluorophores upon red excitation. Gold ZMWs increase emission of both fluorophores with Atto647N showing enhancement that is threefold higher than that observed for Atto565. This work indicates that 200 nm gold ZMWs are better suited for single-molecule fluorescence studies in the red region of the visible spectrum, while aluminum appears more suited for the green region of the visible spectrum.


July 7, 2019  |  

Complete genome sequence and transcriptome regulation of the pentose utilizing yeast Sugiyamaella lignohabitans.

Efficient conversion of hexoses and pentoses into value-added chemicals represents one core step for establishing economically feasible biorefineries from lignocellulosic material. While extensive research efforts have recently provided advances in the overall process performance, the quest for new microbial cell factories and novel enzyme sources is still open. As demonstrated recently, the yeast Sugiyamaella lignohabitans (formerly Candida lignohabitans) represents a promising microbial cell factory for the production of organic acids from lignocellulosic hydrolysates. We report here the de novo genome assembly of S. lignohabitans using the Single Molecule Real-Time platform, with gene prediction refined by using RNA-seq. The sequencing revealed a 15.98 Mb genome, subdivided into four chromosomes. By phylogenetic analysis, Blastobotrys (Arxula) adeninivorans and Yarrowia lipolytica were found to be close relatives of S. lignohabitans. Differential gene expression was evaluated in typical growth conditions on glucose and xylose and allowed a first insight into the transcriptional response of S. lignohabitans to different carbon sources and different oxygenation conditions. Novel sequences for enzymes and transporters involved in the central carbon metabolism, and therefore of potential biotechnological interest, were identified. These data open the way for a better understanding of the metabolism of S. lignohabitans and provide resources for further metabolic engineering. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.


July 7, 2019  |  

RepLong: de novo repeat identification using long read sequencing data.

The identification of repetitive elements is important in genome assembly and phylogenetic analyses. Existing de novo repeat identification methods that rely on short reads are ineffective at identifying long repeats. Since long reads are more likely to cover repeat regions completely, they are better suited to recognizing long repeats. In this study, we propose RepLong, a novel de novo repeat element identification method based on PacBio long reads. Given that the reads mapped to the repeat regions overlap heavily with each other, the identification of repeat elements is equivalent to the discovery of consensus overlaps between reads, which can be further cast as a community detection problem in the network of read overlaps. In RepLong, we first construct a network of read overlaps based on pairwise alignment of the reads, where each vertex represents a read and each edge represents a substantial overlap between the corresponding two reads. Second, the communities whose intra-connectivity is greater than their inter-connectivity are extracted based on network modularity optimization. Finally, representative reads in each community are extracted to form the repeat library. Comparison studies on Drosophila melanogaster and human long read sequencing data with genome-based and short-read-based methods demonstrate the efficiency of RepLong in identifying long repeats. RepLong can handle lower coverage data and serve as a complementary solution to the existing methods to improve repeat identification performance on long-read sequencing data. The RepLong software is freely available at https://github.com/ruiguo-bio/replong. Contact: ywsun@szu.edu.cn or zhuzx@szu.edu.cn. Supplementary data are available at Bioinformatics online.
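The overlap-network idea in the RepLong abstract can be illustrated with a small Python sketch. This is not the RepLong implementation: exact suffix-prefix matching stands in for pairwise alignment, connected components stand in for modularity optimization, and the function names and toy reads are hypothetical. The sketch only shows how reads become vertices, overlaps become edges, and how the modularity of a candidate partition is scored.

```python
from collections import defaultdict
from itertools import combinations


def overlap_len(a, b, min_len):
    """Length of the longest suffix of a that exactly equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_len):
    """Vertices are read indices; an edge joins two reads that overlap by >= min_len."""
    graph = defaultdict(set)
    for i, j in combinations(range(len(reads)), 2):
        if overlap_len(reads[i], reads[j], min_len) or overlap_len(reads[j], reads[i], min_len):
            graph[i].add(j)
            graph[j].add(i)
    return graph


def components(graph, n):
    """Connected components, used here as a simple stand-in for communities."""
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            v = stack.pop()
            group.append(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        groups.append(sorted(group))
    return groups


def modularity(graph, groups):
    """Newman modularity Q of a partition of an unweighted, undirected graph."""
    m = sum(len(neighbors) for neighbors in graph.values()) / 2
    if m == 0:
        return 0.0
    q = 0.0
    for group in groups:
        members = set(group)
        inside = sum(1 for v in group for w in graph[v] if w in members) / 2
        degree = sum(len(graph[v]) for v in group)
        q += inside / m - (degree / (2 * m)) ** 2
    return q


# Toy reads (hypothetical): reads 0-2 chain together, reads 3-4 chain together.
reads = ["ACGTACGTGG", "CGTGGTTAAC", "TTAACCCGGA", "GGGGGTTTTT", "TTTTTCACAC"]
graph = build_overlap_graph(reads, min_len=5)
groups = components(graph, len(reads))
print(groups, round(modularity(graph, groups), 4))
```

On real data the partition would come from a modularity-optimizing method rather than from connected components, since repeat-derived reads and unique-sequence reads usually sit in the same connected component.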


July 7, 2019  |  

Current advances in genome sequencing of common wheat and its ancestral species

Common wheat is an important and widely cultivated food crop throughout the world. Much progress has been made in wheat genome sequencing in the last decade. Starting from the sequencing of single chromosomes and chromosome arms, whole genome sequences of common wheat and its diploid and tetraploid ancestors have been decoded along with the development of sequencing and assembly technologies. In this review, we give a brief summary of international progress in wheat genome sequencing, and mainly focus on reviewing the efforts and contributions made by Chinese scientists.


July 7, 2019  |  

FMLRC: Hybrid long read error correction using an FM-index.

Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging “hybrid” assemblies that use long reads for scaffolding and short reads for accuracy. We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read only de novo assembly methods. Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Improved throughput and computational efficiency relative to existing methods will help make more economical use of emerging long read sequencing technologies.
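As background for the FM-index mentioned in the FMLRC abstract, the following Python sketch shows how a Burrows-Wheeler Transform plus two auxiliary tables supports counting occurrences of a pattern by backward search. It is a textbook illustration under simplifying assumptions (a single string, naive rotation sorting, uncompressed tables) and is not FMLRC's multi-string data structure; all function names are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler Transform via sorted rotations; '$' marks the end of the text."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def fm_index(bwt_str):
    """Build the two FM-index tables from a BWT string.

    first[c]  : number of characters in the text that sort strictly before c
    occ[c][i] : number of occurrences of c in bwt_str[:i]
    """
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (c == ch)
    return first, occ


def count(pattern, first, occ, n):
    """Count occurrences of pattern by backward search over a BWT of length n."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("ACGTACGTAC")
first, occ = fm_index(transformed)
print(count("ACG", first, occ, len(transformed)))  # 2
print(count("TT", first, occ, len(transformed)))   # 0
```

Counting how often a k-mer occurs in a short-read collection is the kind of query such an index answers quickly, which is why it is a natural fit for deciding whether a stretch of a long read is supported by short-read evidence.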


July 7, 2019  |  

Hercules: a profile HMM-based hybrid error correction algorithm for long reads.

Choosing whether to use second or third generation sequencing platforms can lead to trade-offs between accuracy and read length. Several types of studies require long and accurate reads. In such cases, researchers often combine both technologies, and the erroneous long reads are corrected using the short reads. Current approaches rely on various graph- or alignment-based techniques and do not take the error profile of the underlying technology into account. Efficient machine learning algorithms that address these shortcomings have the potential to achieve more accurate integration of these two technologies. We propose Hercules, the first machine learning-based long read error correction algorithm. Hercules models every long read as a profile Hidden Markov Model with respect to the underlying platform’s error profile. The algorithm learns a posterior transition/emission probability distribution for each long read to correct errors in these reads. We show on two DNA-seq BAC clones (CH17-157L1 and CH17-227A2) that Hercules-corrected reads have the highest mapping rate among all competing algorithms and have the highest accuracy when the breadth of coverage is high. On a large human CHM1 cell line WGS data set, Hercules is one of the few scalable algorithms; and among those, it achieves the highest accuracy.


July 7, 2019  |  

Smooth q-Gram, and its applications to detection of overlaps among long, error-prone sequencing reads

We propose smooth q-gram, the first variant of q-gram that captures q-gram pairs within a small edit distance. We apply smooth q-gram to the problem of detecting overlapping pairs of error-prone reads produced by single molecule real time sequencing (SMRT), which is the first and most critical step of the de novo fragment assembly of SMRT reads. We have implemented and tested our algorithm on a set of real world benchmarks. Our empirical results demonstrate the significant superiority of our algorithm over the existing q-gram based algorithms in accuracy.
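To see why tolerating a small edit distance between q-grams matters for error-prone reads, consider the Python sketch below. It is a brute-force illustration only, not the smooth q-gram algorithm from the paper: it compares every q-gram pair with a dynamic-programming edit distance, and the function names and toy sequences are hypothetical.

```python
def qgrams(s, q):
    """All length-q substrings of s, in order of position."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def matched_qgrams(x, y, q, max_dist):
    """Count q-grams of x that lie within max_dist edits of some q-gram of y."""
    y_grams = qgrams(y, q)
    return sum(
        1 for g in qgrams(x, q)
        if any(edit_distance(g, h) <= max_dist for h in y_grams)
    )


x = "ACGTTGCATGCA"
y = "ACGTTACATGCA"  # same sequence with one substitution (G -> A at index 5)

print(matched_qgrams(x, y, q=4, max_dist=0))  # 6: exact matching loses 3 of 9 q-grams
print(matched_qgrams(x, y, q=4, max_dist=1))  # 9: allowing one edit recovers them
```

A single substitution removes several exact q-gram matches at once, so at high error rates exact q-gram signatures thin out quickly; matching within a small edit distance keeps more of the shared signal, at the price of a more expensive comparison that a practical method has to avoid doing by brute force.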


July 7, 2019  |  

The complete genome sequence of Colwellia sp. NB097-1 reveals evidence for the potential genetic basis for its adaptation to cold environment

Colwellia sp. NB097-1, isolated from a marine sediment sample from the Bering Sea, is a psychrophilic bacterium whose optimal and maximal growth temperatures were 13 and 25 °C, respectively. Here, we present the complete genome of Colwellia sp. NB097-1, which was 4,661,274 bp in length with a GC content of 38.5%. The genome provided evidence for the potential genetic basis for its adaptation to a cold environment, such as producing compatible solutes and cold-shock proteins, increasing membrane fluidity and synthesizing glycogen. Some cold-adaptive proteases were also detected in the genome of Colwellia sp. NB097-1. Protease activity analysis further showed that extracellular proteases of Colwellia sp. NB097-1 remained active at low temperatures. The complete genome sequence may be helpful to reveal how this strain survives at low temperature and to find cold-adaptive proteases that may be useful to industry.


July 7, 2019  |  

Complete genome of Halomonas aestuarii Hb3, isolated from tidal flat

Halomonas aestuarii Hb3, a moderately halophilic bacterium belonging to the class Gammaproteobacteria, was isolated from a tidal flat. Herein, we report the complete genome sequence of strain Hb3. Its size is estimated at 3.54 Mbp with a mean G+C content of 67.9%. The genome includes 3238 open reading frames, 65 transfer RNAs, and four ribosomal RNA gene operons. Genes related to the degradation of monoaromatic compounds, detoxification of arsenic, and production of polymers were identified. These features indicate that this strain may be important for ecological and industrial applications.


July 7, 2019  |  

A fast approximate algorithm for mapping long reads to large reference databases.

Emerging single-molecule sequencing technologies from Pacific Biosciences and Oxford Nanopore have revived interest in long-read mapping algorithms. Alignment-based seed-and-extend methods demonstrate good accuracy, but face limited scalability, while faster alignment-free methods typically trade decreased precision for efficiency. In this article, we combine a fast approximate read mapping algorithm based on minimizers with a novel MinHash identity estimation technique to achieve both scalability and precision. In contrast to prior methods, we develop a mathematical framework that defines the types of mapping targets we uncover, establish probabilistic estimates of p-value and sensitivity, and demonstrate tolerance for alignment error rates up to 20%. With this framework, our algorithm automatically adapts to different minimum length and identity requirements and provides both positional and identity estimates for each mapping reported. For mapping human PacBio reads to the hg38 reference, our method is 290× faster than Burrows-Wheeler Aligner-MEM with a lower memory footprint and recall rate of 96%. We further demonstrate the scalability of our method by mapping noisy PacBio reads (each ≥5 kbp in length) to the complete NCBI RefSeq database containing 838 Gbp of sequence and >60,000 genomes.
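The two ingredients named in this abstract, minimizers and a Jaccard-based identity estimate, can be sketched in a few lines of Python. This is a simplified illustration, not the authors' algorithm: it uses `zlib.crc32` as a stand-in hash, compares whole minimizer sets instead of a winnowed MinHash sketch, and converts Jaccard similarity to identity with the Mash-style formula `1 + ln(2J / (1 + J)) / k`, which is assumed here for illustration. Function names and sequences are hypothetical.

```python
import math
import zlib


def kmer_hash(kmer):
    """Deterministic stand-in hash (Python's built-in hash() is salted per process)."""
    return zlib.crc32(kmer.encode())


def minimizers(seq, k, w):
    """Set of minimum k-mer hashes over every window of w consecutive k-mers."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(max(1, len(hashes) - w + 1)):
        window = hashes[start:start + w]
        if window:
            picked.add(min(window))
    return picked


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def identity_estimate(j, k):
    """Mash-style identity estimate from Jaccard similarity j and k-mer size k."""
    if j <= 0:
        return 0.0
    return max(0.0, 1.0 + math.log(2 * j / (1 + j)) / k)


read = "ACGTTGCATGCAAGCTTAGGCTAACGT"
region = "ACGTTGCATGCAAGCTTAGGCTAACGT"
other = "ACGTTGCATGCTAGCTTAGGCTAACGT"  # one substitution relative to read

for target in (region, other):
    j = jaccard(minimizers(read, k=4, w=3), minimizers(target, k=4, w=3))
    print(round(j, 3), round(identity_estimate(j, k=4), 3))
```

Because only one k-mer per window is kept, the sets being compared are much smaller than the full k-mer sets, which is where the speed of this family of methods comes from; the accuracy of the identity estimate then depends on k, w, and read length.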


July 7, 2019  |  

Tigmint: correcting assembly errors using linked reads from large molecules.

Genome sequencing yields the sequence of many short snippets of DNA (reads) from a genome. Genome assembly attempts to reconstruct the original genome from which these reads were derived. This task is difficult due to gaps and errors in the sequencing data, repetitive sequence in the underlying genome, and heterozygosity. As a result, assembly errors are common. In the absence of a reference genome, these misassemblies may be identified by comparing the sequencing data to the assembly and looking for discrepancies between the two. Once identified, these misassemblies may be corrected, improving the quality of the assembled sequence. Although tools exist to identify and correct misassemblies using Illumina paired-end and mate-pair sequencing, no such tool yet exists that makes use of the long distance information of the large molecules provided by linked reads, such as those offered by the 10x Genomics Chromium platform. We have developed the tool Tigmint to address this gap. To demonstrate the effectiveness of Tigmint, we applied it to assemblies of a human genome using short reads assembled with ABySS 2.0 and other assemblers. Tigmint reduced the number of misassemblies identified by QUAST in the ABySS assembly by 216 (27%). While scaffolding with ARCS alone more than doubled the scaffold NGA50 of the assembly from 3 to 8 Mbp, the combination of Tigmint and ARCS improved the scaffold NGA50 of the assembly over five-fold to 16.4 Mbp. This notable improvement in contiguity highlights the utility of assembly correction in refining assemblies. We demonstrate the utility of Tigmint in correcting the assemblies of multiple tools, as well as in using Chromium reads to correct and scaffold assemblies of long single-molecule sequencing. Scaffolding an assembly that has been corrected with Tigmint yields a final assembly that is both more correct and substantially more contiguous than an assembly that has not been corrected.
Using single-molecule sequencing in combination with linked reads enables a genome sequence assembly that achieves both a high sequence contiguity as well as high scaffold contiguity, a feat not currently achievable with either technology alone.


July 7, 2019  |  

Complete genome sequence of the poly-γ-glutamate-synthesizing bacterium Bacillus subtilis Bs-115.

Bacillus subtilis Bs-115 was isolated from the soil of a corn field in Yutai County, Jinan City, Shandong Province, People’s Republic of China, and is characterized by the efficient synthesis of poly-γ-glutamate (γ-PGA), with corn saccharification liquid as the sole energy and carbon source during the process of γ-PGA formation. Here, we report the complete genome sequence of Bacillus subtilis Bs-115 and the genes associated with poly-γ-glutamate synthesis. Copyright © 2018 Wang et al.


July 7, 2019  |  

Reevaluation of the complete genome sequence of Magnetospirillum gryphiswaldense MSR-1 with Single-Molecule Real-Time Sequencing data.

Magnetospirillum gryphiswaldense is a key organism for understanding magnetosome formation and magnetotaxis. As earlier studies suggested a high genomic plasticity, we (re)sequenced the type strain MSR-1 and the laboratory strain R3/S1. Both sequences differ by only 11 point mutations, but organization of the magnetosome island deviates from that of previous genome sequences. Copyright © 2018 Uebe et al.


July 7, 2019  |  

A draft genome sequence for the Ixodes scapularis cell line, ISE6

Background: The tick cell line ISE6, derived from Ixodes scapularis, is commonly used for amplification and detection of arboviruses in environmental or clinical samples. Methods: To assist with sequence-based assays, we sequenced the ISE6 genome with single-molecule, long-read technology. Results: The draft assembly appears near complete based on gene content analysis, though it appears to lack some instances of repeats in this highly repetitive genome. The assembly appears to have separated the haplotypes at many loci. DNA short read pairs, used for validation only, mapped to the cell line assembly at a higher rate than they mapped to the Ixodes scapularis reference genome sequence. Conclusions: The assembly could be useful for filtering host genome sequence from sequence data obtained from cells infected with pathogens.

