Search Results
Significant advances in bioinformatics tool development have been made to more efficiently leverage and deliver high-quality genome assemblies with PacBio long-read data. Current data throughput of SMRT Sequencing delivers average read lengths ranging from 10-15 kb, with the longest reads exceeding 40 kb. This has resulted in consistent demonstration of a minimum 10-fold improvement in genome assemblies, with contig N50 in the megabase range, compared to assemblies generated using only short-read technologies. This poster highlights recent advances and resources available for advanced bioinformaticians and developers interested in the current state-of-the-art large genome solutions available as open-source code from PacBio and third-party solutions, including HGAP, MHAP, and ECTools. Resources and tools available on GitHub are reviewed, as well as datasets representing major model research organisms made publicly available for community evaluation or interested developers.
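For readers unfamiliar with the contig N50 metric mentioned in the abstract above, the following is a minimal sketch of how it is commonly computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the poster or from any PacBio tool.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy contig lengths (hypothetical, in arbitrary units).
    toy_contigs = [2, 3, 4, 5, 6, 10]
    print(n50(toy_contigs))
```

A higher N50 means that more of the assembly sits in long contigs, which is why it is often quoted as a contiguity measure.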
Extract It may be hard to believe, but the idea of sequence assembly is around 40 years old. Consider this pair of quotes from Rodger Staden (Staden 1979): “With modern fast sequencing techniques and suitable computer programs it is now possible to sequence whole genomes without the need of restriction maps.” “If the 5′ end of the sequence from one gel reading is the same as the 3′ end of the sequence from another the data is said to overlap. If the overlap is of sufficient length to distinguish it from being a repeat in the sequence the two sequences must be contiguous. The data from the two gel readings can then be joined to form one longer continuous sequence.” Replace “gel reading” with “read” and these sentences would go unnoticed in the introduction of any paper today. Here you can also see the birth of jargon that now pervades the field: overlaps between reads form contigs (contiguous sequences). Just a few months later, Gingeras et al. (1979) described “Computer programs for the assembly of DNA sequences.” It all sounds so modern, until the discussion mentions FORTRAN code stored on magnetic tapes. How, then, can we fill an entire special issue of Genome Research with “new advances” so many years later? To me, this reflects the beauty of the problem—simple enough to be stated in a single paragraph, yet complex enough to sustain a field of research for decades. This dichotomy is common to many famous computational problems; indeed, mathematical formulations of sequence assembly fall into a class of problems known as “NP-hard” that do not admit an easy solution (Medvedev et al. 2007). There is another reason for continued advances in sequence assembly—advances in sequencing technology. As evident from the Staden quotes above, the first assembly methods were …
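The overlap rule in the Staden quotes above can be illustrated with a short sketch: if a sufficiently long suffix of one read equals a prefix of another, the two reads are joined into one longer sequence. This is only a toy illustration using exact string matching; the function names, the `min_length` threshold, and the example reads are assumptions made here, and real assemblers must also handle sequencing errors and repeats.

```python
def suffix_prefix_overlap(a, b, min_length=3):
    """Return the length of the longest suffix of `a` that equals a
    prefix of `b`, or 0 if no overlap of at least `min_length` exists."""
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    longest_possible = min(len(a), len(b))
    # Try the longest candidate overlap first, then shrink.
    for k in range(longest_possible, min_length - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def join_reads(a, b, min_length=3):
    """Join `a` and `b` into one contiguous sequence if they overlap,
    otherwise return None."""
    k = suffix_prefix_overlap(a, b, min_length)
    if k == 0:
        return None
    # Keep all of `a`, then append the part of `b` beyond the overlap.
    return a + b[k:]


if __name__ == "__main__":
    # Hypothetical reads sharing the 4-base overlap "TGCA".
    read_1 = "ACGTTGCA"
    read_2 = "TGCAGGTC"
    print(join_reads(read_1, read_2))
```

The `min_length` parameter plays the role of Staden's requirement that the overlap be "of sufficient length to distinguish it from being a repeat".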
High-throughput sequencing platforms are generating massive amounts of genomic data from nonmodel species, and these data sets are valuable resources that can be mined to advance a number of research areas. An example is the growing amount of transcriptome data that allow for examination of gene expression in nonmodel species. Here, we show how publicly available transcriptome data from nonmodel primates can be used to design novel research focused on immunogenomics. We mined transcriptome data from the world’s most endangered group of primates, the lemurs of Madagascar, for sequences corresponding to immunoglobulins. Our results confirmed homology between strepsirrhine and haplorrhine primate immunoglobulins and allowed for high-throughput sequencing of expressed antibodies (Ig-seq) in Coquerel’s sifaka (Propithecus coquereli). Using both Pacific Biosciences RS and Ion Torrent PGM sequencing, we performed Ig-seq on two individuals of Coquerel’s sifaka. We generated over 150 000 sequences of expressed antibodies, allowing for molecular characterization of the antigen-binding region. Our analyses suggest that similar VDJ expression patterns exist across all primates, with sequences closely related to the human VH 3 immunoglobulin family being heavily represented in sifaka antibodies. Moreover, the antigen-binding region of sifaka antibodies exhibited similar amino acid variation with respect to haplorrhine primates. Our study represents the first attempt to characterize sequence diversity of the expressed antibody repertoire in a species of lemur. We anticipate that methods similar to ours will provide the framework for investigating the adaptive immune response in wild populations of other nonmodel organisms and can be used to advance the burgeoning field of eco-immunology. © 2014 John Wiley & Sons Ltd.
In the past few years, we have contributed efforts to ~1/5 of the reported fish genomes. Based on our related experience, here we outline recent advances in bioinformatics for fish genomics, with an emphasis on the development of software for genome assembly, genome annotation and evolutionary analysis. This review will be helpful for newcomers to genome analysis of both animals and plants. In the past decade, whole genome sequences of approximately 50 fish species have been reported [1]. We have been involved in ~1/5 of these international works from 2014 to 2017, such as mudskippers (2014) [2], Chinese large yellow croaker [3], Chinese barbel fishes [4], Asian arowana [5,6], Channel catfish [7], seahorses [8], Japanese flounder [9], Chinese clearhead icefish [10] and Northern snakehead [11]. We are also in charge of the China Aquatic 10-100-1,000 Genomics Program [12], in which ~100 fish genomes are sequencing targets for the next 3-5 years. Based on our previous experience with fish genomic studies, here we outline recent advances in related bioinformatics for fish genomics to share with general readers. Since the basic informatics includes genome assembly, genome annotation and evolutionary analysis, we discuss them one by one in this order.
The development of high-throughput whole genome sequencing (WGS) technologies is changing the face of microbiology, facilitating the comparison of large numbers of genomes from different lineages of the same organism. Our aim was to review the main advances in Helicobacter pylori “omics” and to understand how this is improving our knowledge of the biology, diversity and pathogenesis of H. pylori. Since the first H. pylori isolate was sequenced in 1997, 510 genomes have been deposited in the NCBI archive, providing a basis for improved understanding of the epidemiology and evolution of this important pathogen. This review focuses on works published between April 2015 and March 2016. Helicobacter “omics” is already making an impact and is a growing research field. Ultimately these advances will be translated into a routine clinical laboratory setting in order to improve public health. © 2016 John Wiley & Sons Ltd.
The latest advancements in Sequel II SMRT Sequencing have increased average read lengths by up to 50% compared to Sequel II chemistry 1.0. In a single SMRT Cell 8M, this allows multiplexing of 2-3 small organisms (<500 Mb) such as insects and worms for producing reference-quality assemblies, calling structural variants for up to 2 samples with ~3 Gb genomes, analysis of 48 microbial genomes, and metagenomic profiling of up to 8 communities. With the improved processivity of the new Sequel II sequencing polymerase, more SMRTbell molecules reach rolling circle mode, resulting in longer overall read lengths and thus allowing efficient detection of barcodes (up to 80%) in the SMRTbell templates. Multiplexing of genomes larger than those of microbial organisms is now achievable. In collaboration with the Wellcome Sanger Institute, we have developed a workflow for multiplexing two individual Anopheles coluzzii using as little as 150 ng genomic DNA per individual. The resulting assemblies had high contiguity (contig N50s over 3 Mb) and completeness (>98% of conserved genes) for both individuals. For microbial multiplexing, we multiplexed 48 microbes with varying complexities and sizes ranging from 1.6 to 8.0 Mb in a single SMRT Cell 8M. Using a new end-to-end analysis (Microbial Assembly Analysis, SMRT Link 8.0), assemblies resulted in complete circularized genomes (>200-fold coverage) and efficient detection of >3-200 kb plasmids. Finally, the long read lengths (>90 kb) allow detection of barcodes in large-insert SMRTbell templates (>15 kb), thus facilitating multiplexing of two human samples in 1 SMRT Cell 8M for detecting SVs, indels and CNVs. Here, we present results and describe workflows for multiplexing samples for specific applications of SMRT Sequencing.
Impressive progress has been made in the field of Next Generation Sequencing (NGS). Through advancements in the fields of molecular biology and technical engineering, parallelization of the sequencing reaction has profoundly increased the total number of sequence reads produced per run. Current sequencing platforms allow for an unprecedented view into complex mixtures of RNA and DNA samples. NGS is currently evolving into a molecular microscope, finding its way into virtually every field of biomedical research. In this chapter we review the technical background of the different commercially available NGS platforms with respect to template generation and the sequencing reaction and take a small step towards what the upcoming NGS technologies will bring. We close with an overview of different implementations of NGS in biomedical research. This article is part of a Special Issue entitled: From Genome to Function. Copyright © 2014 Elsevier B.V. All rights reserved.
Advances in Genome Biology and Technology (AGBT) delivers a premier experience where heads of labs, institutions, businesses, financial analysts and other high-level stakeholders come together to advance the field and…
The macaque simian or simian/human immunodeficiency virus (SIV/SHIV) challenge model has been widely used to inform and guide human vaccine trials. Substantial advances have been made recently in the application of the repeated-low-dose (RLD) challenge approach to assess SIV/SHIV vaccine efficacies (VE). Some candidate HIV vaccines have shown protective effects in preclinical studies using the macaque SIV/SHIV model, but the model’s true predictive value for screening potential HIV vaccine candidates needs to be evaluated further. Here, we review key parameters used in the RLD approach and discuss their relevance for evaluating VE to improve preclinical studies of candidate HIV vaccines. Crown Copyright © 2019. Published by Elsevier Ltd. All rights reserved.
Improving traits in wheat has historically been challenging due to its large and polyploid genome, limited genetic diversity and in-field phenotyping constraints. However, within recent years many of these barriers have been lowered. The availability of a chromosome-level assembly of the wheat genome now facilitates a step-change in wheat genetics and provides a common platform for resources, including variation data, gene expression data and genetic markers. The development of sequenced mutant populations and gene-editing techniques now enables the rapid assessment of gene function in wheat directly. The ability to alter gene function in a targeted manner will unmask the effects of homoeolog redundancy and allow the hidden potential of this polyploid genome to be discovered. New techniques to identify and exploit the genetic diversity within wheat wild relatives now enable wheat breeders to take advantage of these additional sources of variation to address challenges facing food production. Finally, advances in phenomics have unlocked rapid screening of populations for many traits of interest both in greenhouses and in the field. Looking forwards, integrating diverse data types, including genomic, epigenetic and phenomics data, will take advantage of big data approaches including machine learning to understand trait biology in wheat in unprecedented detail. © 2018 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.
Perennial crops, such as fruit trees, are infected by many viruses, which are transmitted through vegetative propagation and grafting of infected plant material. Some of these pathogens cause severe crop losses and often reduce the productive life of the orchards. Detection and characterization of these agents in fruit trees is challenging; however, in recent years, the wide application of high-throughput sequencing (HTS) technologies has significantly facilitated this task. In this review, we present recent advances in the discovery, detection, and characterization of fruit tree viruses and virus-like agents accomplished by HTS approaches. A large number of new viruses have been described in the last 5 years, some of them exhibiting novel genomic features that have led to proposals to create new genera and to revise the current virus taxonomy. Interestingly, several of the newly identified viruses belong to virus genera previously unknown to infect fruit tree species (e.g., Fabavirus, Luteovirus), a fact that challenges our perspective of plant viruses in general. Finally, applied methodologies, including the use of different molecules as templates, as well as the advantages, disadvantages and future directions of HTS in fruit tree virology are discussed.
Probiotics have many significant applications, which has created a need for genome analysis of probiotic strains. Various omics methods and systems biology approaches enable us to understand and optimize metabolic processes. These techniques have drawn researchers’ attention to the gut microbiome and provided a new source for uncovering uncharacterized biosynthetic pathways, which in turn enables novel metabolic engineering approaches. In recent years, the broad and quantitative analysis of modified strains has relied on systems biology tools such as in silico design, which are commonly used to improve strain performance. The genetic manipulation of probiotic microorganisms is crucial for defining their role in the intestinal microbiota and exploring their beneficial properties. This review gives an overview of gene editing and systems biology approaches, highlighting the advent of omics methods that open new routes for studying probiotic bacteria. We also summarize gene editing tools such as TALENs, ZFNs and CRISPR-Cas that edit or cleave specific target DNA. Finally, we propose a design for advanced customized probiotics intended to improve on existing probiotics.
In 2010, when the National Alliance for Advanced Biofuels and Bioproducts (NAABB) consortium began, little was known about the molecular basis of algal biomass or oil production. Very few algal genome sequences were available, and efforts to identify the best-producing wild species through bioprospecting approaches had largely stalled after the U.S. Department of Energy’s Aquatic Species Program. This lack of knowledge included how reduced carbon was partitioned into storage products like triglycerides or starch and the role played by metabolite remodeling in the accumulation of energy-dense storage products. Furthermore, genetic transformation and metabolic engineering approaches to improve algal biomass and oil yields were in their infancy. Genome sequencing and transcriptional profiling were becoming less expensive, however, and the tools to annotate gene expression profiles under various growth and engineered conditions were just starting to be developed for algae. It was in this context that an integrated algal biology program was introduced in the NAABB to address the greatest constraints limiting algal biomass yield. This review describes the NAABB algal biology program, including hypotheses, research objectives, and strategies to move algal biology research into the twenty-first century and to realize the greatest potential of algae biomass systems to produce biofuels.
Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging due to short, error-prone reads. In order to account for uncertainties originating from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus populations. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. Challenges posed by aligning reads, as well as the impact of reference biases on diversity estimates, are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue being complemented by technological developments. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.