July 7, 2019

A comprehensive model of DNA fragmentation for the preservation of High Molecular Weight DNA

During DNA extraction the DNA molecule undergoes physical and chemical shearing, causing the DNA to fragment into progressively shorter pieces. Under common laboratory conditions this fragmentation yields DNA fragments of 5-35 kilobases (kb) in length. This fragment length is more than sufficient for DNA sequencing using short-read technologies, which generate reads 50-600 bp in length, but insufficient for long-read sequencing and linked reads, where fragment lengths of more than 40 kb may be desirable. This study provides a theoretical framework for quality management to ensure access to high molecular weight DNA in samples. Shearing can be divided into physical and chemical shearing, which generate different patterns of fragmentation. Exposure to physical shearing creates a characteristic fragment length, where DNA fragments are cut in half by shear stress. This characteristic length can be measured using gel electrophoresis or instruments for DNA fragment analysis. Chemical shearing generates randomly distributed fragment lengths, visible as a smear of DNA below the peak fragment length. By measuring the peak DNA fragment length and the proportion of very short DNA fragments, both sources of shearing can be quantified using common laboratory techniques, providing a suitable measure of DNA integrity for sequencing with long-read technologies.
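The two fragmentation patterns described in this abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the model from the study: the function names, the 40 kb threshold, and the fixed number of random cuts are all hypothetical choices made for illustration.

```python
import random

def physical_shear(fragments, max_len):
    """Halve every fragment longer than max_len until none exceeds it.

    Assumption: shear stress always cuts a fragment at its midpoint.
    """
    out = []
    stack = list(fragments)
    while stack:
        f = stack.pop()
        if f > max_len:
            half = f // 2
            stack.extend([half, f - half])
        else:
            out.append(f)
    return out

def chemical_shear(fragments, n_cuts, rng):
    """Apply n_cuts breaks at uniformly random bonds across all fragments.

    Assumption: every internal bond is equally likely to break, so a
    fragment is chosen with probability proportional to its bond count.
    """
    frags = list(fragments)
    for _ in range(n_cuts):
        weights = [f - 1 for f in frags]  # internal bonds per fragment
        if sum(weights) == 0:
            break  # nothing left that can be cut
        i = rng.choices(range(len(frags)), weights=weights)[0]
        f = frags[i]
        cut = rng.randint(1, f - 1)  # position of the break
        frags[i] = cut
        frags.append(f - cut)
    return frags

rng = random.Random(0)
sheared = physical_shear([200_000], 40_000)  # sharp peak at one length
smeared = chemical_shear(sheared, 20, rng)   # adds a tail of shorter pieces
```

In this sketch a single 200 kb molecule with a 40 kb threshold ends up as eight 25 kb fragments, which corresponds to the characteristic peak, while the random cuts produce the broader distribution of shorter fragments that appears as a smear.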


July 7, 2019

Ten steps to get started in Genome Assembly and Annotation.

As a part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to facilitate researchers getting started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR).


July 7, 2019

Supergene evolution triggered by the introgression of a chromosomal inversion.

Supergenes are groups of tightly linked loci whose variation is inherited as a single Mendelian locus and are a common genetic architecture for complex traits under balancing selection [1-8]. Supergene alleles are long-range haplotypes with numerous mutations underlying distinct adaptive strategies, often maintained in linkage disequilibrium through the suppression of recombination by chromosomal rearrangements [1, 5, 7-9]. However, the mechanism governing the formation of supergenes is not well understood and poses the paradox of establishing divergent functional haplotypes in the face of recombination. Here, we show that the formation of the supergene alleles encoding mimicry polymorphism in the butterfly Heliconius numata is associated with the introgression of a divergent, inverted chromosomal segment. Haplotype divergence and linkage disequilibrium indicate that supergene alleles, each allowing precise wing-pattern resemblance to distinct butterfly models, originate from over a million years of independent chromosomal evolution in separate lineages. These “superalleles” have evolved from a chromosomal inversion captured by introgression and maintained in balanced polymorphism, triggering supergene inheritance. This mode of evolution involving the introgression of a chromosomal rearrangement is likely to be a common feature of complex structural polymorphisms associated with the coexistence of distinct adaptive syndromes. This shows that the reticulation of genealogies may have a powerful influence on the evolution of genetic architectures in nature. Copyright © 2018 Elsevier Ltd. All rights reserved.


July 7, 2019

A fast approximate algorithm for mapping long reads to large reference databases.

Emerging single-molecule sequencing technologies from Pacific Biosciences and Oxford Nanopore have revived interest in long-read mapping algorithms. Alignment-based seed-and-extend methods demonstrate good accuracy, but face limited scalability, while faster alignment-free methods typically trade decreased precision for efficiency. In this article, we combine a fast approximate read mapping algorithm based on minimizers with a novel MinHash identity estimation technique to achieve both scalability and precision. In contrast to prior methods, we develop a mathematical framework that defines the types of mapping targets we uncover, establish probabilistic estimates of p-value and sensitivity, and demonstrate tolerance for alignment error rates up to 20%. With this framework, our algorithm automatically adapts to different minimum length and identity requirements and provides both positional and identity estimates for each mapping reported. For mapping human PacBio reads to the hg38 reference, our method is 290× faster than Burrows-Wheeler Aligner-MEM with a lower memory footprint and recall rate of 96%. We further demonstrate the scalability of our method by mapping noisy PacBio reads (each ≥5 kbp in length) to the complete NCBI RefSeq database containing 838 Gbp of sequence and >60,000 genomes.
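The two building blocks named in this abstract, minimizer sampling and MinHash-based similarity estimation, can be sketched generically as follows. This is an illustrative sketch, not the authors' implementation: the hash function, the function names, and the parameters `k`, `w`, and `s` are assumptions, and the Jaccard-to-identity conversion is the formula popularized by Mash, which the abstract does not confirm as the exact estimator used here.

```python
import hashlib
import math

def kmer_hash(kmer):
    """64-bit hash of a k-mer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minimizers(seq, k, w):
    """Set of minimizer hashes: the smallest k-mer hash in each window
    of w consecutive k-mers."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}

def minhash_jaccard(a, b, s):
    """Bottom-s MinHash estimate of the Jaccard similarity of two hash sets."""
    sketch_a = set(sorted(a)[:s])
    sketch_b = set(sorted(b)[:s])
    union_sketch = sorted(sketch_a | sketch_b)[:s]
    if not union_sketch:
        return 0.0
    shared = sum(1 for h in union_sketch if h in sketch_a and h in sketch_b)
    return shared / len(union_sketch)

def jaccard_to_identity(j, k):
    """Convert a Jaccard estimate to approximate sequence identity using
    the Mash distance formula (assumed here, not taken from the abstract)."""
    if j <= 0.0:
        return 0.0
    return 1.0 + math.log(2.0 * j / (1.0 + j)) / k

read = "ACGTTGCATGCCGATAGCTAGGCTTACGATCGGATCCA"
ref = "ACGTTGCATGCCGATAGCTAGGCTTACGATCGGATCCA"
j = minhash_jaccard(minimizers(read, 5, 4), minimizers(ref, 5, 4), 10)
identity = jaccard_to_identity(j, 5)
```

Comparing only the smallest `s` hashes keeps the cost of each comparison independent of read and reference length, which is the general reason sketch-based methods scale to large reference collections.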


July 7, 2019

Tigmint: correcting assembly errors using linked reads from large molecules.

Genome sequencing yields the sequence of many short snippets of DNA (reads) from a genome. Genome assembly attempts to reconstruct the original genome from which these reads were derived. This task is difficult due to gaps and errors in the sequencing data, repetitive sequence in the underlying genome, and heterozygosity. As a result, assembly errors are common. In the absence of a reference genome, these misassemblies may be identified by comparing the sequencing data to the assembly and looking for discrepancies between the two. Once identified, these misassemblies may be corrected, improving the quality of the assembled sequence. Although tools exist to identify and correct misassemblies using Illumina paired-end and mate-pair sequencing, no such tool yet exists that makes use of the long distance information of the large molecules provided by linked reads, such as those offered by the 10x Genomics Chromium platform. We have developed the tool Tigmint to address this gap. To demonstrate the effectiveness of Tigmint, we applied it to assemblies of a human genome using short reads assembled with ABySS 2.0 and other assemblers. Tigmint reduced the number of misassemblies identified by QUAST in the ABySS assembly by 216 (27%). While scaffolding with ARCS alone more than doubled the scaffold NGA50 of the assembly from 3 to 8 Mbp, the combination of Tigmint and ARCS improved the scaffold NGA50 of the assembly over five-fold to 16.4 Mbp. This notable improvement in contiguity highlights the utility of assembly correction in refining assemblies. We demonstrate the utility of Tigmint in correcting the assemblies of multiple tools, as well as in using Chromium reads to correct and scaffold assemblies of long single-molecule sequencing. Scaffolding an assembly that has been corrected with Tigmint yields a final assembly that is both more correct and substantially more contiguous than an assembly that has not been corrected. Using single-molecule sequencing in combination with linked reads enables a genome sequence assembly that achieves both a high sequence contiguity as well as high scaffold contiguity, a feat not currently achievable with either technology alone.
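The contiguity figures in this abstract are reported as NGA50, which is computed from aligned blocks after breaking scaffolds at misassemblies and therefore requires a reference alignment. As a simplified illustration of the underlying idea only, the sketch below computes the related N50 and NG50 statistics from a list of lengths; the function name and the example values are hypothetical.

```python
def n50(lengths, genome_size=None):
    """Length L such that sequences of length >= L cover at least half
    of the total assembly (N50) or of genome_size if given (NG50).

    Returns 0 if the sequences cover less than half of genome_size.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0

scaffolds = [2, 3, 4, 5, 6]           # toy scaffold lengths
print(n50(scaffolds))                 # N50 of the assembly itself
print(n50(scaffolds, genome_size=30)) # NG50 against an assumed genome size
```

Splitting a scaffold at a misassembly replaces one long entry with two shorter ones, which is why correction alone can lower such a statistic while correction followed by scaffolding can raise it.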


July 7, 2019

Genome sequences of five Mycobacterium bovis strains isolated from farmed animals and wildlife in Canada.

Mycobacterium bovis is the causative agent of bovine tuberculosis, an infectious disease that affects both animals and humans and thus presents a risk to public health and the livestock industry. Here, we report the genome sequences of five Mycobacterium bovis strains that represent major genotype clusters observed in farmed animals and wildlife in Canada. © Crown copyright 2018.


July 7, 2019

Draft genome sequence and annotation of the phytopathogenic Ralstonia pickettii (previously Burkholderia glumae) strain ICMP-8657.

Strain ICMP-8657 was formerly taxonomically classified as Burkholderia glumae and reported to be the producer of an antibacterial pyrazole derivative. Here, we report the draft genome sequence of ICMP-8657, which failed to demonstrate the biosynthetic capacity to produce the stated antibacterial compound, leading to its taxonomic reclassification as Ralstonia pickettii ICMP-8657. Copyright © 2018 Paterson and Gross.


July 7, 2019

Draft genome sequence of Streptomyces sp. strain DH-12, a soilborne isolate from the Thar Desert with broad-spectrum antibacterial activity.

Strain DH-12 exhibits broad-spectrum antibacterial activity toward Gram-positive and Gram-negative pathogens. The 7.6-Mb draft genome sequence gives insight into the complete secondary metabolite production capacity and reveals genes putatively responsible for its antibacterial activity, as well as genes which enable the survival of the organism in an extreme arid environment. Copyright © 2018 Jiao et al.


July 7, 2019

Complete genome sequence of the environmental Burkholderia pseudomallei sequence type 131 isolate MSHR1435, associated with a chronic melioidosis infection.

The Burkholderia pseudomallei isolate MSHR1435 is a fully virulent environmental sequence type 131 (ST131) isolate that is epidemiologically associated with a 17.5-year chronic melioidosis infection. The completed genome will serve as a reference for studies of environmental ecology, virulence, and chronic B. pseudomallei infections. Copyright © 2018 Sahl et al.


July 7, 2019

FusorSV: an algorithm for optimally combining data from multiple structural variation detection methods.

Comprehensive and accurate identification of structural variations (SVs) from next generation sequencing data remains a major challenge. We develop FusorSV, which uses a data mining approach to assess performance and merge callsets from an ensemble of SV-calling algorithms. It includes a fusion model built using analysis of 27 deep-coverage human genomes from the 1000 Genomes Project. We identify 843 novel SV calls that were not reported by the 1000 Genomes Project for these 27 samples. Experimental validation of a subset of these calls yields a validation rate of 86.7%. FusorSV is available at https://github.com/TheJacksonLaboratory/SVE .


July 7, 2019

Genome sequences of two cyanobacterial strains, toxic green Microcystis aeruginosa KW (KCTC 18162P) and nontoxic brown Microcystis sp. strain MC19, under xenic culture conditions.

Bloom-forming cyanobacteria pose concerns for the environment and the health of humans and animals by producing toxins and thus lowering water quality. Here, we report near-complete genome sequences of two Microcystis strains under xenic culture conditions, which were originally isolated from two separate freshwater reservoirs from the Republic of Korea. Copyright © 2018 Jeong et al.


July 7, 2019

Complete genome sequence of Lelliottia nimipressuralis type strain SGAir0187, isolated from tropical air collected in Singapore.

Lelliottia nimipressuralis type strain SGAir0187 was isolated from tropical air samples collected in Singapore. The genome was assembled with an average coverage of 180-fold using Pacific Biosciences long reads and Illumina MiSeq paired-end reads. The genome measures 4.8 Mb and contains 4,424 protein-coding genes, 83 tRNAs, and 25 rRNAs. Copyright © 2018 Heinle et al.


July 7, 2019

Complete genome sequence of Acinetobacter indicus type strain SGAir0564 isolated from tropical air collected in Singapore.

Acinetobacter indicus (Gammaproteobacteria) is a strictly aerobic, nonmotile bacterium. The strain SGAir0564 was isolated from air samples collected in Singapore. The complete genome is 3.1 Mb and was assembled using a combination of short and long reads. The genome contains 2,808 protein-coding genes, 80 tRNAs, and 21 rRNA subunits. Copyright © 2018 Vettath et al.


July 7, 2019

Genome sequence of Bacillus cereus strain TG1-6, a plant-beneficial rhizobacterium that is highly salt tolerant.

The complete genome sequence of Bacillus cereus strain TG1-6, which is a highly salt-tolerant rhizobacterium that enhances plant tolerance to drought stress, is reported here. The sequencing process was performed based on a combination of pyrosequencing and single-molecule sequencing. The complete genome is estimated to be approximately 5.42 Mb, containing a total of 5,610 predicted protein-coding DNA sequences (CDSs). Copyright © 2018 Vílchez et al.

