Pacific Biosciences

Search scientific conference presentations and posters

Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum genome using long-read sequencing

2016 ASM Microbe

2016

Abstract

Sequence-based estimation of genetic diversity of Plasmodium falciparum, the most lethal malarial parasite, has proved challenging due to the lack of a complete genomic assembly. The skewed AT-richness (~80.6% (A+T)) of its genome and the lack of technology to assemble highly polymorphic sub-telomeric regions that contain clonally variant, multigene virulence families (i.e., var and rifin) have confounded attempts using short-read NGS technologies. Using Single Molecule, Real-Time (SMRT) Sequencing, we successfully compiled all 14 nuclear chromosomes of the P. falciparum genome from telomere to telomere in single contigs. Specifically, amplification-free sequencing generated reads of average length 12 kb, with 50% of the reads between 15.5 and 50 kb in length. A hierarchical genome assembly process (HGAP) was used to assemble the P. falciparum genome de novo. This assembly accurately resolved centromeres (~90-99% (A+T)) and sub-telomeric regions, and identified large insertions and duplications in the genome that added extra genes to the var and rifin virulence families, along with smaller structural variants such as homopolymer tract expansions. These regions can be used as markers for genetic diversity during comparative genome analyses. Moreover, identifying the polymorphic and repetitive sub-telomeric sequences of parasite populations from endemic areas might inform the link between structural variation and phenotypes such as virulence, drug resistance and disease transmission.

An improved circular consensus algorithm with an application to detect HIV-1 Drug Resistance Associated Mutations (DRAMs)

2016 ASM Microbe

2016

Abstract

Scientists who require confident resolution of heterogeneous populations across complex regions have been unable to transition to short-read sequencing methods. They continue to depend on Sanger sequencing despite its cost and time inefficiencies. Here we present a redesigned algorithm that allows the generation of circular consensus sequences (CCS) from individual SMRT Sequencing reads. With this new algorithm, dubbed CCS2, it is possible to reach high quality across longer insert lengths at a lower cost and higher throughput than Sanger sequencing. We applied CCS2 to the characterization of the HIV-1 K103N drug-resistance associated mutation in both clonal and patient samples. This particular DRAM has previously proved to be clinically relevant, but challenging to characterize due to regional sequence context. First, a mutation was introduced into the 3rd position of amino acid position 103 (A>C substitution) of the RT gene on a pNL4-3 backbone by site-directed mutagenesis. Regions spanning ~1.3 kb were PCR amplified from both the non-mutated and mutant (K103N) plasmids, and were sequenced individually and as a 50:50 mixture. Additionally, the proviral reservoir of a subject with known dates of virologic failure of an efavirenz-based regimen and with documented emergence of drug-resistant (K103N) viremia was sequenced at several time points as a proof-of-concept study to determine the kinetics of retention and decay of K103N. Sequencing data were analyzed using the new CCS2 algorithm, which uses a fully generative probabilistic model of our SMRT Sequencing process to polish consensus sequences to high accuracy. With CCS2, we are able to achieve a per-read empirical quality of QV30 (99.9% accuracy) at 19X coverage. A total of ~5000 1.3 kb consensus sequences with a collective empirical quality of ~QV40 (99.99%) were obtained for each sample. We demonstrate a 0% miscall rate in both unmixed control samples, and estimate a 48:52 frequency for the K103N mutation in the mixed (50:50) plasmid sample, consistent with data produced by orthogonal platforms. Additionally, the K103N escape variant was detected only in proviral samples from time points after the emergence of drug-resistant viremia (19%). This tool might be used to monitor the HIV reservoir for stable evolutionary changes throughout infection.
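The quality values quoted in this abstract (QV30 as 99.9% accuracy, QV40 as 99.99%) match the standard Phred scale, where QV = -10 * log10(error probability). The following is a minimal sketch of that conversion, assuming the abstract's QV follows this convention; the function names are illustrative and are not part of CCS2 or SMRT Analysis.

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to per-base accuracy."""
    return 1.0 - 10.0 ** (-qv / 10.0)


def accuracy_to_qv(accuracy: float) -> float:
    """Convert per-base accuracy (0 <= accuracy < 1) to a Phred-scaled QV."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def expected_errors(qv: float, length_bp: int) -> float:
    """Expected number of erroneous bases in a sequence of the given length."""
    return length_bp * 10.0 ** (-qv / 10.0)


if __name__ == "__main__":
    for qv in (30, 40):
        print(f"QV{qv}: accuracy {qv_to_accuracy(qv):.4%}, "
              f"expected errors in 1.3 kb: {expected_errors(qv, 1300):.2f}")
```

On this scale, a single 1.3 kb consensus read at QV30 is expected to carry roughly one error, while QV40 corresponds to roughly one error per ten such reads.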

Minimization of chimera formation and substitution errors in full-length 16S PCR amplification

2016 ASM Microbe

2016

Abstract

The constituents and intra-communal interactions of microbial populations have garnered increasing interest in areas such as water remediation, agriculture and human health. Amplification and sequencing of the evolutionarily conserved 16S rRNA gene is an efficient method of profiling communities. Currently, most targeted amplification focuses on short, hypervariable regions of the 16S sequence. Distinguishing information not spanned by the targeted region is lost, and species-level classification is often not possible. PacBio SMRT Sequencing easily spans the entire 1.5 kb 16S gene in a single read, producing highly accurate single-molecule sequences that can improve the identification of individual species in a metapopulation. However, this process still relies upon PCR amplification from a mixture of similar sequences, which may result in chimeras, or recombinant molecules, at rates upwards of 20%. These PCR artifacts make it difficult to identify novel species, and reduce the amount of informative sequences. We investigated multiple factors that may contribute to chimera formation, such as template damage, denaturation time before and during thermocycling, polymerase extension time, and reaction volume. We found two related factors that contribute to chimera formation: the amount of input template into the PCR reaction, and the number of PCR cycles. A second problem that can confound analysis is sequence errors generated during amplification and sequencing. With the updated algorithm for circular consensus sequencing (CCS2), single-molecule reads can be filtered to 99.99% predicted accuracy. Substitution errors in these highly filtered reads may be dominated by mis-incorporations during amplification. Sequence differences in full-length 16S amplicons from several commercial high-fidelity PCR kits were compared. We show results of our experiments and describe our optimized protocol for full-length 16S amplification for SMRT Sequencing. These optimizations have broader implications for other applications that use PCR amplification to phase variations across targeted regions and generate highly accurate reference sequences.

Multiplexing strategies for microbial whole genome SMRT Sequencing

2016 ASM Microbe

2016

Abstract

As the throughput of the PacBio Systems continues to increase, so has the desire to fully utilize SMRT Cell sequencing capacity to multiplex microbes for whole genome sequencing. Multiplexing is readily achieved by incorporating a unique barcode for each microbe into the SMRTbell adapters and using a streamlined library preparation process. Incorporating barcodes without PCR amplification prevents the loss of epigenetic information and the generation of chimeric sequences, while eliminating the need to generate separate SMRTbell libraries. We multiplexed the genomes of up to 8 unique strains of H. pylori. Each genome was sheared and processed through adapter ligation in a single, addition-only reaction. The barcoded samples were pooled in equimolar quantities and a single SMRTbell library was prepared. We demonstrate successful de novo microbial assembly from all multiplexes tested (2- through 8-plex) using data generated from a single SMRTbell library, run on a single SMRT Cell with the PacBio RS II, and analyzed with standard SMRT Analysis assembly methods. This strategy was successful using both small (1.6 Mb, H. pylori) and medium (5 Mb, E. coli) genomes. This protocol facilitates the sequencing of multiple microbial genomes in a single run, greatly increasing throughput and reducing costs per genome.

WGS SMRT Sequencing of patient samples from a fecal microbiota transplant trial

2016 ASM Microbe

2016

Abstract

Fecal samples were obtained from human subjects in the first blinded, placebo-controlled trial to evaluate the efficacy and safety of fecal microbiota transplant (FMT) for treatment of recurrent C. difficile infection. Samples included pre- and post-FMT, post-placebo transplant, and the donor control; samples were taken at 2 and 8 weeks post-FMT. Sequencing was done on the PacBio Sequel System, with the goal of obtaining high-quality sequences covering whole genes or gene clusters, which will be used to better understand the relationship between the composition and functional capabilities of intestinal microbiomes and patient health. Methods: Samples were randomly sheared to 2-3 kb fragments, a sufficient length to cover most genes, and SMRTbell libraries were prepared using standard protocols. Libraries were run on the Sequel System, which has a throughput of hundreds of thousands of reads per SMRT Cell, adequate yield to sample the complex microbiomes of post-transplant and donor samples. Results: Here we characterize samples, describe library prep methods and detail Sequel System operation, including run conditions. Descriptive statistics of data output (primary analysis) are presented, along with SMRT Analysis reports on circular consensus sequence (CCS) reads generated using an updated algorithm (CCS2). Final sequencing yields are filtered at various levels of predicted accuracy from 90% to 99.9%. Previous studies done using the PacBio RS II System demonstrated the ability to profile at the species level, and in some cases the strain level, and provided functional insight. Conclusions: These results demonstrate that the Sequel System is well-suited for characterization of complex microbial communities, with the ability for high-throughput generation of extremely accurate single-molecule sequences, each several kilobases in length. The entire process from shearing and library prep through sequencing and CCS analysis can be completed in less than 48 hours.

Workflow for processing high-throughput, Single Molecule, Real-Time Sequencing data for analyzing the microbiome of patients undergoing fecal microbiota transplantation

2016 ASM Microbe

2016

Abstract

There are many sequencing-based approaches to understanding complex metagenomic communities, spanning targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR. Whole-sample shotgun experiments generally use short-read sequencing, which results in data processing difficulties. For example, reads less than 500 bp in length will rarely cover a complete gene or region of interest, and will require assembly. This not only introduces the possibility of incorrectly combining sequence from different community members, it requires a high depth of coverage. As such, rare community members may not be represented in the resulting assembly. Circular-consensus, Single Molecule, Real-Time (SMRT) Sequencing reads in the 1-3 kb range, with >99% accuracy, can be generated using the previous-generation PacBio RS II or, in much higher throughput, using the new Sequel System. While throughput is lower compared to short-read sequencing methods, the reads are a true random sampling of the underlying community since SMRT Sequencing has been shown to have very low sequence-context bias. With single-molecule reads >1 kb at >99% consensus accuracy, it is reasonable to expect a high percentage of reads to include genes or gene fragments useful for analysis without the need for de novo assembly. Here we present the results of circular consensus sequencing for an individual's microbiome, before and after undergoing fecal microbiota transplantation (FMT) in order to treat a chronic Clostridium difficile infection. We show that even with relatively low sequencing depth, the long-read, assembly-free, random sampling allows us to profile low-abundance community members at the species level. We also show that using shotgun sampling with long reads allows a level of functional insight not possible with classic targeted 16S or short-read sequencing, due to entire genes being covered in single reads.

Resolving KIR genotypes and haplotypes simultaneously using Single Molecule, Real-Time Sequencing

The European Human Genetics Conference 2016

2016

Abstract

The killer immunoglobulin-like receptor (KIR) genes belong to the immunoglobulin superfamily and are widely studied due to the critical role they play in coordinating the innate immune response to infection and disease. Highly accurate, contiguous, long reads, like those generated by SMRT Sequencing, when combined with target-enrichment protocols, provide a straightforward strategy for generating complete de novo assembled KIR haplotypes. We have explored two different methods to capture the KIR region: one using fosmid clones and one using NimbleGen capture.

Highly sensitive and cost-effective detection of somatic cancer variants using single-molecule, real-time sequencing

American Association for Cancer Research

2016

Abstract

Next-Generation Sequencing (NGS) technologies allow for molecular profiling of cancer samples with high sensitivity and speed at reduced cost. For efficient profiling of cancer samples, it is important that the NGS methods used are not only robust, but capable of accurately detecting low-frequency somatic mutations. Single Molecule, Real-Time (SMRT) Sequencing offers several advantages, including the ability to sequence single molecules with very high accuracy (>QV40) using the circular consensus sequencing (CCS) approach. The availability of genetically defined, human genomic reference standards provides an industry standard for the development and quality control of molecular assays for studying cancer variants. Here we characterize SMRT Sequencing for the detection of low-frequency somatic variants using the Quantitative Multiplex DNA Reference Standards from Horizon Discovery, combined with amplification of the variants using the Multiplicom Tumor Hotspot MASTR Plus assay. First, we sequenced a reference standard containing precise allelic frequencies from 1% to 24.5% for major oncology targets verified using digital PCR. This reference material recapitulates the complexity of tumor composition and serves as a well-characterized control. The control sample was amplified using the Multiplicom Tumor Hotspot MASTR Plus assay that targets 252 amplicons (121-254 bp) from 26 relevant cancer genes, which includes all 11 variants in the control sample. Next, we sequenced control samples prepared by SeraCare Life Sciences, which contained a defined mutation at allelic frequencies from 10% down to 0.1%. The wild type and mutant amplicons were serially diluted, sequenced and analyzed using SMRT Sequencing to identify the variants and determine the observed frequency. The random error profile and high-accuracy CCS reads make it possible to accurately detect low-frequency somatic variants.

Reconstruction of the spinach coding genome using full-length transcriptome without a reference genome

11th Annual DOE Joint Genome Institute Genomics of Energy & Environment Meeting

2016

Abstract

For highly complex and large genomes, producing a well-annotated genome may be computationally challenging and costly, yet the study of alternative splicing events and gene annotations usually relies on the existence of a genome. Long-read sequencing technology provides new opportunities to sequence full-length cDNAs, avoiding the computational challenges that short-read transcript assembly brings. The use of Single Molecule, Real-Time Sequencing from PacBio to sequence transcriptomes (the Iso-Seq method), which produces de novo, high-quality, full-length transcripts, has revealed an astonishing amount of alternative splicing in eukaryotic species. With the Iso-Seq method, it is now possible to reconstruct the transcribed regions of the genome using just the transcripts themselves. We present Cogent, a tool for finding gene families and reconstructing the coding genome in the absence of a high-quality reference genome. Cogent uses k-mer similarities to first partition the transcripts into different gene families. Then, for each gene family, the transcripts are used to build a splice graph. Cogent identifies bubbles resulting from sequencing errors, minor variants, and exon skipping events, and attempts to resolve each splice graph down to the minimal set of reconstructed contigs. We apply Cogent to the Iso-Seq data for spinach, Spinacia oleracea, for which there is also a PacBio-based draft genome to validate the reconstruction. The Iso-Seq dataset consists of 68,263 full-length, Quiver-polished transcript sequences ranging from 528 bp to 6 kbp long (mean: 2.1 kbp). Using the genome mapping as ground truth, we found that 95% (8045/8446) of the Cogent gene families corresponded to a single genomic locus. Families that contained multiple loci were often homologous genes that would be categorized as belonging to the same gene family. Coding genome reconstruction was then performed individually for each gene family. A total of 86% (7283/8446) of the gene families were resolved to a single contig by Cogent, and were validated as also being a single contig in the genome. In 59 cases, Cogent reconstructed a single contig, but the contig corresponded to 2 or more loci in the genome, suggesting possible scaffolding opportunities. In 24 cases, the transcripts had no hits to the genome, though Pfam and BLAST searches of the transcripts show that they were indeed coding, suggesting that the genome is missing certain coding portions. Given the high quality of the spinach genome, we were not surprised to find that Cogent improved the genome space only slightly. However, the ability of Cogent to accurately identify gene families and reconstruct the coding genome in a de novo fashion shows that it will be extremely powerful when applied to datasets for which there is no reference genome or only a low-quality one.
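The first Cogent step described above, partitioning transcripts into gene families by k-mer similarity, can be illustrated with a small sketch. The abstract does not say which similarity measure, k, or clustering rule Cogent uses, so everything here is an assumption made for illustration: Jaccard similarity over k-mer sets, a fixed threshold, single-linkage grouping via union-find, and the function and parameter names.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by total distinct k-mers (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def partition_by_kmer_similarity(transcripts: dict[str, str],
                                 k: int = 8,
                                 min_similarity: float = 0.5) -> list[list[str]]:
    """Group transcript names into families (hypothetical rule, not Cogent's).

    Two transcripts are linked when their k-mer Jaccard similarity reaches
    min_similarity; families are the connected components of those links.
    """
    names = list(transcripts)
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kmers = {name: kmer_set(transcripts[name], k) for name in names}
    for a, b in combinations(names, 2):
        if jaccard(kmers[a], kmers[b]) >= min_similarity:
            parent[find(a)] = find(b)

    families: dict[str, list[str]] = {}
    for name in names:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


if __name__ == "__main__":
    toy = {
        "t1": "ATGGCGTACCTGAAGTTCGA",
        "t2": "ATGGCGTACCTGAAGT",      # truncated isoform of t1
        "t3": "CCCCAAAATTTTGGGGCCAA",  # unrelated sequence
    }
    print(partition_by_kmer_similarity(toy, k=4))
```

The all-pairs comparison is quadratic in the number of transcripts, so it only suits toy inputs; a dataset of tens of thousands of transcripts would need an indexed or sketch-based approach.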

An improved circular consensus algorithm with an application to detection of HIV-1 Drug-Resistance Associated Mutations (DRAMs)

Advances in Genome Biology and Technology

2016

Abstract

Scientists who require confident resolution of heterogeneous populations across complex regions have been unable to transition to short-read sequencing methods. They continue to depend on Sanger sequencing despite its cost and time inefficiencies. Here we present a redesigned algorithm that allows the generation of circular consensus sequences (CCS) from individual SMRT Sequencing reads. With this new algorithm, dubbed CCS2, it is possible to reach arbitrarily high quality across longer insert lengths at a lower cost and higher throughput than Sanger sequencing. We apply CCS2 to the characterization of the HIV-1 K103N drug-resistance associated mutation, which is clinically important and challenging to characterize due to regional sequence context. A mutation was introduced into the 3rd position of amino acid position 103 (A>C substitution) of the RT gene on a pNL4-3 backbone by site-directed mutagenesis. Regions spanning ~1,300 bp were PCR amplified from both the non-mutated and mutant (K103N) plasmids, and were sequenced individually and as a 50:50 mixture. Sequencing data were analyzed using the new CCS2 algorithm, which uses a fully generative probabilistic model of our SMRT Sequencing process to polish consensus sequences to arbitrarily high accuracy. This result, previously demonstrated for multi-molecule consensus sequences with the Quiver algorithm, is made possible by incorporating per-Zero Mode Waveguide (ZMW) characteristics, thus accounting for the intrinsic changes in the sequencing process that are unique to each ZMW. With CCS2, we are able to achieve a per-read empirical quality of QV30 with 19X coverage. This yields ~5000 1.3 kb consensus sequences with a collective empirical quality of ~QV40. Additionally, we demonstrate a 0% miscall rate in both unmixed samples, and estimate a 48:52 frequency for the K103N mutation in the mixed sample, consistent with data produced by orthogonal platforms.

Application specific barcoding strategies for SMRT Sequencing

Advances in Genome Biology and Technology

2016

Abstract

Over the last few years, several advances were implemented in the PacBio RS II System to maximize throughput and efficiency while reducing the cost per sample. The number of usable bases per SMRT Cell now exceeds 1 Gb with the latest P6-C4 chemistry and 6-hour movies. For applications such as microbial sequencing, targeted sequencing, Iso-Seq (full-length isoform sequencing) and NimbleGen's target enrichment method, current SMRT Cell yields can exceed project requirements. In these cases, barcoding is a viable option for multiplexing samples. For microbial sequencing, multiplexing can be accomplished by tagging sheared genomic DNA during library construction with modified SMRTbell adapters. We studied the performance of 2- to 8-plex microbial sequencing. For full-length amplicon sequencing such as HLA typing, amplicons as large as 5 kb may be barcoded during amplification using barcoded locus-specific primers. Alternatively, amplicons may be barcoded during SMRTbell library construction using barcoded SMRTbell adapters. The preferred barcoding strategy depends on the user's existing workflow and the flexibility to change or update it. Using barcoded adapters, five Class I and II genes (3.3-5.8 kb) x 96 patients can be multiplexed and typed. For Iso-Seq full-length cDNA sequencing, barcodes are incorporated during 1st-strand synthesis by tailing the oligo-dT primer with any of the PacBio published 16-bp barcode sequences. RNA samples from 6 maize tissues were multiplexed to generate barcoded cDNA libraries. The NimbleGen SeqCap Target Enrichment method, combined with PacBio's long-read sequencing, provides a comprehensive view of multi-kilobase contiguous regions, both exonic and intronic. To make this cost-effective, we recommend barcoding samples for pooling prior to target enrichment and capture. Here, we present specific examples of strategies and best practices for multiplexing samples for different applications for SMRT Sequencing. Additionally, we describe recommendations for analyzing barcoded samples.
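To make the analysis side of barcoding concrete, the sketch below assigns reads to samples by comparing the first 16 bases of each read to a table of 16-bp barcodes, allowing a small number of mismatches. It is a simplified illustration and not the SMRT Analysis barcoding workflow: the barcode sequences and sample names are made up, and it ignores reverse-complement orientation, barcodes at both ends of an insert, and indel errors.

```python
BARCODE_LEN = 16  # barcode length stated in the abstract


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads: list[str],
                barcodes: dict[str, str],
                max_mismatches: int = 1) -> dict[str, list[str]]:
    """Bin reads by the barcode at their 5' end (hypothetical helper).

    A read is assigned when exactly one barcode is closest to its prefix and
    that barcode is within max_mismatches; the barcode is then trimmed off.
    Everything else goes to the "unassigned" bin untrimmed.
    """
    if not barcodes:
        raise ValueError("at least one barcode is required")
    bins: dict[str, list[str]] = {sample: [] for sample in barcodes}
    bins["unassigned"] = []
    for read in reads:
        prefix = read[:BARCODE_LEN]
        scored = sorted((hamming(prefix, bc), sample)
                        for sample, bc in barcodes.items())
        best_dist, best_sample = scored[0]
        ambiguous = len(scored) > 1 and scored[1][0] == best_dist
        if (len(prefix) == BARCODE_LEN
                and best_dist <= max_mismatches
                and not ambiguous):
            bins[best_sample].append(read[BARCODE_LEN:])
        else:
            bins["unassigned"].append(read)
    return bins


if __name__ == "__main__":
    # Made-up barcodes for illustration only.
    table = {"s1": "AAAACCCCGGGGTTTT", "s2": "ACGTACGTTGCATGCA"}
    reads = [
        "AAAACCCCGGGGTTTT" + "GATTACA",  # exact match to s1
        "ACGTACGTTGCATGCT" + "CCGG",     # one mismatch to s2
        "T" * 20,                        # matches neither barcode
    ]
    for sample, inserts in demultiplex(reads, table).items():
        print(sample, inserts)
```

Requiring a unique best match keeps reads that sit equally close to two barcodes out of both bins, which matters more as the number of pooled samples grows.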

Immune regions are no longer incomprehensible with SMRT Sequencing

Advances in Genome Biology and Technology

2016

Abstract

The complex immune regions of the genome, including MHC and KIR, contain large copy number variants (CNVs), a high density of genes, hyper-polymorphic gene alleles, and conserved extended haplotypes (CEH) with enormous linkage disequilibrium (LD). This level of complexity and the inherent biases of short-read sequencing make it challenging to extract immune region haplotype information from reference-reliant shotgun sequencing and GWAS methods. As NGS-based genome and exome sequencing and SNP arrays have become routine for population studies, numerous efforts are being made to develop software to extract and/or impute the immune gene information from these datasets. Despite these efforts, the fine mapping of causal variants of immune genes for their well-documented association with cancer, drug-induced hypersensitivity and immune-related diseases has been slower than expected. This has in many ways limited our understanding of the mechanisms leading to immune disease. In the present work, we demonstrate the advantages of long reads delivered by SMRT Sequencing for assembling complete haplotypes of MHC and KIR gene clusters, as well as calling correct genotypes of the genes within them. All the genotype information is detected at allele level with full phasing information across SNP-poor regions. Genotypes were called correctly from targeted gene amplicons, haplotypes, as well as from a completely assembled 5 Mb contig of the MHC region from a de novo assembly of whole genome shotgun data. The de novo analysis pipeline used in all these approaches allowed for reference-free analysis without imputation, which is key for interrogation without prior knowledge about ethnic backgrounds. These methods are thus easily adoptable for previously uncharacterized human or non-human species.

Long-read assembly of the Aedes aegypti Aag2 cell line genome resolves ancient endogenous viral elements

Advances in Genome Biology and Technology

2016

Abstract

Transmission of arboviruses such as Dengue and Zika viruses by Aedes aegypti causes widespread and debilitating disease across the globe. Disease in humans can include severe acute symptoms such as hemorrhagic fever, organ failure, and encephalitis; and yet, mosquitoes tolerate high titers of virus in a persistent infection. The mechanisms responsible for tolerance to viral infection in mosquitoes are still unclear. Recent publications have highlighted the integration of genetic material from non-retroviral RNA viruses into the genome of the host during infection, which relies upon endogenous retro-transcriptase activity from transposons. These endogenous viral elements (EVEs) found in the genome are predicted to be ancient and at least some EVEs are under purifying selection, which suggests that they are beneficial to the host. In order to characterize EVE biogenesis in a tractable system, we sequenced the Ae. aegypti cell line, Aag2, to 58X coverage and here present a de novo assembly of the genome. The assembly consists of 1.7 Gb of genomic and 255 Mb of alternative haplotype-specific sequence, made up of contigs with an N50 of 1.4 Mb, a value that is 1-3 orders of magnitude longer than that of other assemblies of the Aedes genus. The Aag2 genome is highly repetitive (70%), most of which is classified as transposable elements (60%). We identify a plethora of EVEs in the genome homologous to a diverse range of extant viruses, many of which cluster in these regions of highly repetitive DNA. The highly contiguous nature of this assembly allows for a more comprehensive identification of the transposable elements and EVEs that are most likely to be lost in assemblies lacking the read length of SMRT Sequencing.

Low-input long-read sequencing for complete microbial genomes and metagenomic community analysis

Advances in Genome Biology and Technology

2016

Abstract

Microbial genome sequencing can be done quickly, easily, and efficiently with the PacBio® sequencing instruments, resulting in complete de novo assemblies. Alternative protocols have been developed to reduce the amount of purified DNA required for SMRT Sequencing, to broaden applicability to lower-abundance samples. If 50-100 ng of microbial DNA is available, a 10-20 kb SMRTbell library can be made. The resulting library can be loaded onto multiple SMRT Cells, yielding more than enough data for complete assembly of microbial genomes using the SMRT Portal assembly program HGAP, plus base modification analysis. The entire process can be done in less than 3 days by standard laboratory personnel. This approach is particularly important for analysis of metagenomic communities, in which genomic DNA is often limited. From these samples, full-length 16S amplicons can be generated, prepped with the standard SMRTbell library prep protocol, and sequenced. Alternatively, a 2 kb sheared library, made from a few ng of input DNA, can also be used to elucidate the microbial composition of a community, and may provide information about biochemical pathways present in the sample. In both these cases, 1-2 kb reads with >99.9% accuracy can be obtained from Circular Consensus Sequencing.

Profiling the microbiome in fecal microbiota transplantation using circular consensus and Single Molecule, Real-Time Sequencing

Advances in Genome Biology and Technology

2016

Abstract

There are many sequencing-based approaches to understanding complex metagenomic communities, ranging from targeted amplification to whole-sample shotgun sequencing. While targeted approaches provide valuable data at low sequencing depth, they are limited by primer design and PCR. Whole-sample shotgun experiments generally use short-read sequencing, which results in data processing difficulties. For example, reads less than 500 bp in length will rarely cover a complete gene or region of interest, and will require assembly. This not only introduces the possibility of incorrectly combining sequence from different community members, it also requires a high depth of coverage. As such, rare community members may not be represented in the resulting assembly. Circular-consensus, Single Molecule, Real-Time (SMRT®) Sequencing reads in the 1-3 kb range with >99% accuracy can be efficiently generated from low amounts of input DNA. 10 ng of input DNA sequenced in 4 SMRT Cells on the PacBio RS II would generate >100,000 such reads. While throughput is lower compared to short-read sequencing methods, the reads are a true random sampling of the underlying community, since SMRT Sequencing has been shown to have very low sequence-context bias. With reads >1 kb at >99% accuracy, it is reasonable to expect that a high percentage of reads include gene fragments useful for analysis without the need for de novo assembly. Here we present the results of circular consensus sequencing for an individual's microbiome, before and after undergoing fecal microbiota transplantation (FMT) to treat a chronic Clostridium difficile infection. We show that even with relatively low sequencing depth, the long-read, assembly-free, random sampling allows us to profile low-abundance community members at the species level. We also show that shotgun sampling with long reads allows a level of functional insight not possible with classic targeted 16S or short-read sequencing, because entire genes are covered in single reads.

Event

Microbiology and Infection 2017

March 5 - March 8, 2017
