July 7, 2019

Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads.

Motivation. The third generation sequencing (3GS) technology generates long sequences of thousands of bases. However, its current error rates are estimated in the range of 15-40%, significantly higher than those of the prevalent next generation sequencing (NGS) technologies (less than 1%). Fundamental bioinformatics tasks such as de novo genome assembly and variant calling require high-quality sequences that need to be extracted from these long but erroneous 3GS sequences. Results. We describe a versatile and efficient linear complexity consensus algorithm Sparc to facilitate de novo genome assembly. Sparc builds a sparse k-mer graph using a collection of sequences from a targeted genomic region. The heaviest path which approximates the most likely genome sequence is searched through a sparsity-induced reweighted graph as the consensus sequence. Sparc supports using NGS and 3GS data together, which leads to significant improvements in both cost efficiency and computational efficiency. Experiments with Sparc show that our algorithm can efficiently provide high-quality consensus sequences using both PacBio and Oxford Nanopore sequencing technologies. With only 30× PacBio data, Sparc can reach a consensus with error rate <0.5%. With the more challenging Oxford Nanopore data, Sparc can also achieve similar error rate when combined with NGS data. Compared with the existing approaches, Sparc calculates the consensus with higher accuracy, and uses approximately 80% less memory and time. Availability. The source code is available for download at https://github.com/yechengxi/Sparc.
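The "heaviest path" step in the abstract above can be illustrated with a generic dynamic-programming sketch. This is not Sparc's implementation: the graph layout, the function name `heaviest_path`, and the toy edge weights are assumptions made for illustration, and the sparsity-induced reweighting described in the abstract is not modeled here. The sketch only shows how a maximum-weight path through a directed acyclic graph can be found in time linear in the number of edges.

```python
from collections import defaultdict


def heaviest_path(edges, source, sink):
    """Return (total_weight, path) for the heaviest source-to-sink path.

    `edges` maps (u, v) -> weight and must describe a directed acyclic
    graph. In a consensus setting the weight could be the number of
    reads supporting that edge (a hypothetical choice for this sketch).
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for (u, v), weight in edges.items():
        successors[u].append((v, weight))
        indegree[v] += 1
        nodes.update((u, v))

    # Kahn's algorithm: visit every node after all of its predecessors.
    order = []
    ready = [n for n in nodes if indegree[n] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v, _ in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    # Relax edges in topological order, keeping the best score per node.
    best = {source: 0}
    previous = {}
    for u in order:
        if u not in best:
            continue  # not reachable from the source
        for v, weight in successors[u]:
            if v not in best or best[u] + weight > best[v]:
                best[v] = best[u] + weight
                previous[v] = u

    if sink not in best:
        raise ValueError("sink is not reachable from source")

    # Walk the back-pointers from the sink to recover the path.
    path = [sink]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[sink], path[::-1]


# Toy graph: two alternative branches between S and T.
toy_edges = {("S", "A"): 3, ("A", "T"): 3, ("S", "B"): 1, ("B", "T"): 1}
weight, path = heaviest_path(toy_edges, "S", "T")
```

In the toy graph the branch through A carries more support than the branch through B, so it is the one selected.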


July 7, 2019

The draft genome of MD-2 pineapple using hybrid error correction of long reads.

The introduction of the elite pineapple variety, MD-2, has caused a significant market shift in the pineapple industry. Better productivity, an overall increase in fruit quality and taste, resilience to chilled storage and resistance to internal browning are among the key advantages of the MD-2 as compared with its predecessor, the Smooth Cayenne. Here, we present the genome sequence of the MD-2 pineapple (Ananas comosus (L.) Merr.) by using the hybrid sequencing technology from two highly reputable platforms, i.e. the PacBio long sequencing reads and the accurate Illumina short reads. Our draft genome achieved 99.6% genome coverage with 27,017 predicted protein-coding genes while 45.21% of the genome was identified as repetitive elements. Furthermore, differential expression of ripening RNASeq library of pineapple fruits revealed ethylene-related transcripts, believed to be involved in regulating the process of non-climacteric pineapple fruit ripening. The MD-2 pineapple draft genome serves as an example of how a complex heterozygous genome is amenable to whole genome sequencing by using a hybrid technology that is both economical and accurate. The genome will make genomic applications more feasible as a medium to understand complex biological processes specific to pineapple. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.


July 7, 2019

Comparative genomic analysis of Klebsiella pneumoniae subsp. pneumoniae KP617 and PittNDM01, NUHL24835, and ATCC BAA-2146 reveals unique evolutionary history of this strain.

Klebsiella pneumoniae subsp. pneumoniae KP617 is a pathogenic strain that coproduces OXA-232 and NDM-1 carbapenemases. We sequenced the genome of KP617, which was isolated from the wound of a Korean burn patient, and performed a comparative genomic analysis with three additional strains: PittNDM01, NUHL24835 and ATCC BAA-2146. The complete genome of KP617 was obtained via multi-platform whole-genome sequencing. Phylogenetic analysis along with whole genome and multi-locus sequence typing of genes of the Klebsiella pneumoniae species showed that KP617 belongs to the WGLW2 group, which includes PittNDM01 and NUHL24835. Comparison of annotated genes showed that KP617 shares 98.3% of its genes with PittNDM01. Nineteen antibiotic resistance genes were identified in the KP617 genome: bla OXA-1 and bla SHV-28 in the chromosome, bla NDM-1 in plasmid 1, and bla OXA-232 in plasmid 2 conferred resistance to beta-lactams; however, colistin- and tetracycline-resistance genes were not found. We identified 117 virulence factors in the KP617 genome, and discovered that the genes encoding these factors were also harbored by the reference strains; eight genes were lipopolysaccharide-related and four were capsular polysaccharide-related. A comparative analysis of phage-associated regions indicated that two phage regions are specific to the KP617 genome and that prophages did not act as a vehicle for transfer of antimicrobial resistance genes in this strain. Whole-genome sequencing and bioinformatics analysis revealed similarity in the genome sequences and content, and differences in phage-related genes, plasmids and antimicrobial resistance genes between KP617 and the references. In order to elucidate the precise role of these factors in the pathogenicity of KP617, further studies are required.


July 7, 2019

Challenges, solutions, and quality metrics of personal genome assembly in advancing precision medicine.

Even though each of us shares more than 99% of the DNA sequences in our genome, there are millions of sequence codes or structure in small regions that differ between individuals, giving us different characteristics of appearance or responsiveness to medical treatments. Currently, genetic variants in diseased tissues, such as tumors, are uncovered by exploring the differences between the reference genome and the sequences detected in the diseased tissue. However, the public reference genome was derived with the DNA from multiple individuals. As a result of this, the reference genome is incomplete and may misrepresent the sequence variants of the general population. The more reliable solution is to compare sequences of diseased tissue with its own genome sequence derived from tissue in a normal state. As the price to sequence the human genome has dropped dramatically to around $1000, it shows a promising future of documenting the personal genome for every individual. However, de novo assembly of individual genomes at an affordable cost is still challenging. Thus, till now, only a few human genomes have been fully assembled. In this review, we introduce the history of human genome sequencing and the evolution of sequencing platforms, from Sanger sequencing to emerging “third generation sequencing” technologies. We present the currently available de novo assembly and post-assembly software packages for human genome assembly and their requirements for computational infrastructures. We recommend that a combined hybrid assembly with long and short reads would be a promising way to generate good quality human genome assemblies and specify parameters for the quality assessment of assembly outcomes. We provide a perspective view of the benefit of using personal genomes as references and suggestions for obtaining a quality personal genome. 
Finally, we discuss the usage of the personal genome in aiding vaccine design and development, monitoring host immune-response, tailoring drug therapy and detecting tumors. We believe the precision medicine would largely benefit from bioinformatics solutions, particularly for personal genome assembly.


July 7, 2019

Genomic insight into the host-endosymbiont relationship of Endozoicomonas montiporae CL-33(T) with its coral host.

The bacterial genus Endozoicomonas was commonly detected in healthy corals in many coral-associated bacteria studies in the past decade. Although it is likely to be a core member of coral microbiota, little is known about its ecological roles. To decipher potential interactions between bacteria and their coral hosts, we sequenced and investigated the first culturable endozoicomonal bacterium from coral, E. montiporae CL-33(T). Its genome showed potential signs of ongoing genome erosion and gene exchange with its host. Testosterone degradation and type III secretion system are commonly present in Endozoicomonas and may have roles to recognize and deliver effectors to their hosts. Moreover, genes of eukaryotic ephrin ligand B2 are present in its genome; presumably, this bacterium could move into coral cells via endocytosis after binding to coral’s Eph receptors. In addition, 7,8-dihydro-8-oxoguanine triphosphatase and isocitrate lyase are possible type III secretion effectors that might help coral to prevent mitochondrial dysfunction and promote gluconeogenesis, especially under stress conditions. Based on all these findings, we inferred that E. montiporae was a facultative endosymbiont that can recognize, translocate, communicate and modulate its coral host.


July 7, 2019

Evidence for an opportunistic and endophytic lifestyle of the Bursaphelenchus xylophilus-associated bacteria Serratia marcescens PWN146 isolated from wilting Pinus pinaster.

Pine wilt disease (PWD) results from the interaction of three elements: the pathogenic nematode, Bursaphelenchus xylophilus; the insect-vector, Monochamus sp.; and the host tree, mostly Pinus species. Bacteria isolated from B. xylophilus may be a fourth element in this complex disease. However, the precise role of bacteria in this interaction is unclear as both plant-beneficial and as plant-pathogenic bacteria may be associated with PWD. Using whole genome sequencing and phenotypic characterization, we were able to investigate in more detail the genetic repertoire of Serratia marcescens PWN146, a bacterium associated with B. xylophilus. We show clear evidence that S. marcescens PWN146 is able to withstand and colonize the plant environment, without having any deleterious effects towards a susceptible host (Pinus thunbergii), B. xylophilus nor to the nematode model C. elegans. This bacterium is able to tolerate growth in presence of xenobiotic/organic compounds, and use phenylacetic acid as carbon source. Furthermore, we present a detailed list of S. marcescens PWN146 potentials to interfere with plant metabolism via hormonal pathways and/or nutritional acquisition, and to be competitive against other bacteria and/or fungi in terms of resource acquisition or production of antimicrobial compounds. Further investigation is required to understand the role of bacteria in PWD. We have now reinforced the theory that B. xylophilus-associated bacteria may have a plant origin.


July 7, 2019

Improved hybrid de novo genome assembly of domesticated apple (Malus x domestica).

Domesticated apple (Malus × domestica Borkh) is a popular temperate fruit with high nutrient levels and diverse flavors. In 2012, global apple production accounted for at least one tenth of all harvested fruits. A high-quality apple genome assembly is crucial for the selection and breeding of new cultivars. Currently, a single reference genome is available for apple, assembled from 16.9× genome coverage short reads via Sanger and 454 sequencing technologies. Although a useful resource, this assembly covers only ~89% of the non-repetitive portion of the genome, and has a relatively short (16.7 kb) contig N50 length. These downsides make it difficult to apply this reference in transcriptive or whole-genome re-sequencing analyses. Here we present an improved hybrid de novo genomic assembly of apple (Golden Delicious), which was obtained from 76 Gb (~102× genome coverage) Illumina HiSeq data and 21.7 Gb (~29× genome coverage) PacBio data. The final draft genome is approximately 632.4 Mb, representing ~90% of the estimated genome. The contig N50 size is 111,619 bp, representing a 7 fold improvement. Further annotation analyses predicted 53,922 protein-coding genes and 2,765 non-coding RNA genes. The new apple genome assembly will serve as a valuable resource for investigating complex apple traits at the genomic level. It is not only suitable for genome editing and gene cloning, but also for RNA-seq and whole-genome re-sequencing studies.
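The contig N50 figures quoted in the abstract above follow the usual definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The sketch below is a minimal illustration of that definition, not the authors' pipeline; the function name `n50` and the toy contig lengths are assumptions. The last lines reproduce the improvement ratio from the two N50 values given in the abstract (16.7 kb and 111,619 bp).

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    total = sum(contig_lengths)
    running = 0
    # Accumulate from the longest contig down until half the total is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy assembly (hypothetical lengths): total 32, half reached at the second 8.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])

# Ratio of the two contig N50 values reported in the abstract.
old_n50_bp = 16_700
new_n50_bp = 111_619
fold_improvement = round(new_n50_bp / old_n50_bp, 1)
```

The ratio comes out slightly below 7, consistent with the abstract's rounded "7 fold improvement".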


July 7, 2019

Assemblytics: a web analytics tool for the detection of variants from an assembly.

Assemblytics is a web app for detecting and analyzing variants from a de novo genome assembly aligned to a reference genome. It incorporates a unique anchor filtering approach to increase robustness to repetitive elements, and identifies six classes of variants based on their distinct alignment signatures. Assemblytics can be applied both to comparing aberrant genomes, such as human cancers, to a reference, or to identify differences between related species. Multiple interactive visualizations enable in-depth explorations of the genomic distributions of variants. http://assemblytics.com, https://github.com/marianattestad/assemblytics Contact: mnattest@cshl.edu Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.


July 7, 2019

The Ditylenchus destructor genome provides new insights into the evolution of plant parasitic nematodes.

Plant-parasitic nematodes were found in 4 of the 12 clades of phylum Nematoda. These nematodes in different clades may have originated independently from their free-living fungivorous ancestors. However, the exact evolutionary process of these parasites is unclear. Here, we sequenced the genome of a migratory plant nematode, Ditylenchus destructor. We performed comparative genomics among the free-living nematode, Caenorhabditis elegans, and all the plant nematodes with genome sequences available. We found that, compared with C. elegans, the core developmental control processes underwent heavy reduction, though most signal transduction pathways were conserved. We also found D. destructor contained more homologies of the key genes in the above processes than the other plant nematodes. We suggest that Ditylenchus spp. may be an intermediate evolutionary history stage from free-living nematodes that feed on fungi to obligate plant-parasitic nematodes. Based on the facts that D. destructor can feed on fungi and has a relatively short life cycle, and that it has similar features to both C. elegans and sedentary plant-parasitic nematodes from clade 12, we propose it as a new model to study the biology, biocontrol of plant nematodes and the interaction between nematodes and plants. © 2016 The Author(s).


July 7, 2019

Whole-genome sequencing recommendations

Recent technological developments have revolutionized the way we perform genetic analyses. In particular whole-genome sequencing provides access to the entire genetic makeup of an individual, and it is now an affordable approach for many research groups. As a consequence genome sequencing is pervading many fields of biological research. Sequencing technologies are evolving rapidly and so do their applications. Here we provide a first primer on whole-genome sequencing, focusing on two of the most popular applications: (1) de novo genome sequencing, in which the objective is obtaining a high-quality genome assembly that can serve as a reference for a species or variety, and (2) genome resequencing, when there is an available reference genome and the objective is to map sequence variation of an individual or a set of individuals. It is not our intention to provide a comprehensive overview of current methodologies that will likely soon become obsolete, but rather focus on general principles that will have a more general applicability.


July 7, 2019

Strategies for sequence assembly of plant genomes

The field of plant genome assembly has greatly benefited from the development and widespread adoption of next-generation DNA sequencing platforms. Very high sequencing throughputs and low costs per nucleotide have considerably reduced the technical and budgetary constraints associated with early assembly projects done primarily with a traditional Sanger-based approach. Those improvements led to a sharp increase in the number of plant genomes being sequenced, including large and complex genomes of economically important crops. Although next-generation DNA sequencing has considerably improved our understanding of the overall structure and dynamics of many plant genomes, severe limitations still remain because next-generation DNA sequencing reads typically are shorter than Sanger reads. In addition, the software tools used to de novo assemble sequences are not necessarily designed to optimize the use of short reads. These cause challenges, common to many plant species with large genome sizes, high repeat contents, polyploidy and genome-wide duplications. This chapter provides an overview of historical and current methods used to sequence and assemble plant genomes, along with new solutions offered by the emergence of technologies such as single molecule sequencing and optical mapping to address the limitations of current sequence assemblies.


July 7, 2019

Complete genome sequence of Spiroplasma turonicum Tab4cT, a bacterium isolated from horse flies (Haematopota sp.).

Spiroplasma turonicum Tab4c(T) was isolated from a horse fly (Haematopota sp.; probably Haematopota pluvialis) collected at Champchevrier, Indre-et-Loire, Touraine, France, in 1991. Here, we report the complete genome sequence of this bacterium to facilitate the investigation of its biology and the comparative genomics among Spiroplasma spp. Copyright © 2016 Lo et al.


July 7, 2019

Comparative evaluation of the genomes of three common Drosophila-associated bacteria.

Drosophila melanogaster is an excellent model to explore the molecular exchanges that occur between an animal intestine and associated microbes. Previous studies in Drosophila uncovered a sophisticated web of host responses to intestinal bacteria. The outcomes of these responses define critical events in the host, such as the establishment of immune responses, access to nutrients, and the rate of larval development. Despite our steady march towards illuminating the host machinery that responds to bacterial presence in the gut, there are significant gaps in our understanding of the microbial products that influence bacterial association with a fly host. We sequenced and characterized the genomes of three common Drosophila-associated microbes: Lactobacillus plantarum, Lactobacillus brevis and Acetobacter pasteurianus. For each species, we compared the genomes of Drosophila-associated strains to the genomes of strains isolated from alternative sources. We found that environmental Lactobacillus strains readily associated with adult Drosophila and were similar to fly isolates in terms of genome organization. In contrast, we identified a strain of A. pasteurianus that apparently fails to associate with adult Drosophila due to an inability to grow on fly nutrient food. Comparisons between association competent and incompetent A. pasteurianus strains identified a short list of candidate genes that may contribute to survival on fly medium. Many of the gene products unique to fly-associated strains have established roles in the stabilization of host-microbe interactions. These data add to a growing body of literature that examines the microbial perspective of host-microbe relationships. © 2016. Published by The Company of Biologists Ltd.


July 7, 2019

Mobile genetic elements: in silico, in vitro, in vivo.

Mobile genetic elements (MGEs), also called transposable elements (TEs), represent universal components of most genomes and are intimately involved in nearly all aspects of genome organization, function and evolution. However, there is currently a gap between the fast pace of TE discovery in silico, driven by the exponential growth of comparative genomic studies, and a limited number of experimental models amenable to more traditional in vitro and in vivo studies of structural, mechanistic and regulatory properties of diverse MGEs. Experimental and computational scientists came together to bridge this gap at a recent conference, ‘Mobile Genetic Elements: in silico, in vitro, in vivo’, held at the Marine Biological Laboratory (MBL) in Woods Hole, MA, USA. © 2016 John Wiley & Sons Ltd.


July 7, 2019

BAC-pool sequencing and analysis confirms growth-associated QTLs in the Asian seabass genome.

The Asian seabass is an important marine food fish that has been cultured for several decades in Asia Pacific. However, the lack of a high quality reference genome has hampered efforts to improve its selective breeding. A 3D BAC pool set generated in this study was screened using 22 SSR markers located on linkage group 2 which contains a growth-related QTL region. Seventy-two clones corresponding to 22 FPC contigs were sequenced by Illumina MiSeq technology. We co-assembled the MiSeq-derived scaffolds from each FPC contig with error-corrected PacBio reads, resulting in 187 sequences covering 9.7 Mb. Eleven genes annotated within this region were found to be potentially associated with growth and their tissue-specific expression was investigated. Correlation analysis demonstrated that SNPs in ctsb, skp1 and ppp2ca can be potentially used as markers for selecting fast-growing fingerlings. Conserved syntenies between seabass LG2 and five other teleosts were identified. This study i) provided a 10 Mb targeted genome assembly; ii) demonstrated NGS of BAC pools as a potential approach for mining candidates underlying QTLs of this species; iii) detected eleven genes potentially responsible for growth in the QTL region; and iv) identified useful SNP markers for selective breeding programs of Asian seabass.

