PacBio sample Archives

July 19, 2019 |

Preparation of next-generation DNA sequencing libraries from ultra-low amounts of input DNA: Application to single-molecule, real-time (SMRT) sequencing on the Pacific Biosciences RS II.

We have developed and validated an amplification-free method for generating DNA sequencing libraries from very low amounts of input DNA (500 picograms – 20 nanograms) for single-molecule sequencing on the Pacific Biosciences (PacBio) RS II sequencer. The common challenge of high input requirements for single-molecule sequencing is overcome by using a carrier DNA in conjunction with optimized sequencing preparation conditions and re-use of the MagBead-bound complex. Here we describe how this method can be used to produce sequencing yields comparable to those generated from standard input amounts, but by using 1000-fold less starting material.

July 19, 2019 |

Amplification and thrifty single-molecule sequencing of recurrent somatic structural variations.

Deletion of tumor-suppressor genes as well as other genomic rearrangements pervade cancer genomes across numerous types of solid tumor and hematologic malignancies. However, even for a specific rearrangement, the breakpoints may vary between individuals, such as the recurrent CDKN2A deletion. Characterizing the exact breakpoints for structural variants (SVs) is useful for designating patient-specific tumor biomarkers. We propose AmBre (Amplification of Breakpoints), a method to target SV breakpoints occurring in samples composed of heterogeneous tumor and germline DNA. Additionally, AmBre validates SVs called by whole-exome/genome sequencing and hybridization arrays. AmBre involves a PCR-based approach to amplify the DNA segment containing an SV’s breakpoint and then confirms breakpoints using sequencing by Pacific Biosciences RS. To amplify breakpoints with PCR, primers tiling specified target regions are carefully selected with a simulated annealing algorithm to minimize off-target amplification and maximize efficiency at capturing all possible breakpoints within the target regions. To confirm correct amplification and obtain breakpoints, PCR amplicons are combined without barcoding and simultaneously long-read sequenced using a single SMRT cell. Our algorithm efficiently separates reads based on breakpoints. Each read group supporting the same breakpoint corresponds to an amplicon, and a consensus amplicon sequence is called. AmBre was used to discover CDKN2A deletion breakpoints in cancer cell lines: A549, CEM, Detroit562, MOLT4, MCF7, and T98G. Also, we successfully assayed RUNX1-RUNX1T1 reciprocal translocations by finding both breakpoints in the Kasumi-1 cell line. AmBre successfully targets SVs where DNA harboring the breakpoints is present in 1:1000 mixtures.
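The abstract names simulated annealing as the primer-selection strategy but does not give the objective function. The sketch below only illustrates the general technique on a toy problem: choosing `k` positions from a candidate list so that the largest gap between neighbouring positions is small. The function names, the cooling schedule, and the `spacing_cost` objective are all hypothetical stand-ins, not AmBre's actual implementation, which would also have to score off-target amplification.

```python
import math
import random


def anneal_primer_subset(candidates, k, cost, steps=2000,
                         t_start=1.0, t_end=0.01, seed=0):
    """Choose k candidates by simulated annealing; return (best_subset, best_cost)."""
    rng = random.Random(seed)
    current = rng.sample(candidates, k)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose swapping one chosen candidate for an unchosen one.
        unused = [c for c in candidates if c not in current]
        if not unused:
            break
        proposal = list(current)
        proposal[rng.randrange(k)] = rng.choice(unused)
        proposal_cost = cost(proposal)
        delta = proposal_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = proposal, proposal_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return sorted(best), best_cost


def spacing_cost(positions, region_length=10_000):
    """Toy objective (hypothetical): the largest gap between neighbouring positions."""
    points = [0] + sorted(positions) + [region_length]
    return max(b - a for a, b in zip(points, points[1:]))


# Hypothetical candidate primer sites every 250 bp across a 10 kb target region.
candidates = list(range(0, 10_000, 250))
subset, subset_cost = anneal_primer_subset(candidates, 5, spacing_cost)
print(subset, subset_cost)
```

Tracking the best subset separately from the current one means the returned solution is never worse than the random starting point, even though the search accepts some uphill moves along the way.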

July 19, 2019 |

Long-read, whole-genome shotgun sequence data for five model organisms.

Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely-studied model systems in biological research.

July 19, 2019 |

Accurate detection of complex structural variations using single-molecule sequencing.

Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr ) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles ) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings.

July 7, 2019 |

Complete genome sequence of Salmonella enterica subsp. enterica serovar Agona 460004 2-1, associated with a multistate outbreak in the United States.

Within the last several years, Salmonella enterica subsp. enterica serovar Agona has been among the 20 most frequently isolated serovars in clinical cases of salmonellosis. In this report, the complete genome of S. Agona strain 460004 2-1, isolated from unsweetened puffed-rice cereal during a multistate outbreak in 2008, was sequenced using single-molecule real-time DNA sequencing. Copyright © 2015 Hoffmann et al.

July 7, 2019 |

Complete chloroplast genome sequence of MD-2 pineapple and its comparative analysis among nine other plants from the subclass Commelinidae.

Pineapple (Ananas comosus var. comosus) is known as the king of fruits for its crown and is the third most important tropical fruit after banana and citrus. The plant, which is indigenous to South America, is the most important species in the Bromeliaceae family and is largely traded for fresh fruit consumption. Here, we report the complete chloroplast sequence of the MD-2 pineapple that was sequenced using the PacBio sequencing technology.

In this study, the high error rate of PacBio long reads from A. comosus total genomic DNA was reduced by using highly accurate but short Illumina reads for error correction via the latest error-correction module from Novocraft. Error-corrected long PacBio reads were assembled using a single tool to produce a contig representing the pineapple chloroplast genome. The genome, 159,636 bp in length, has the conserved quadripartite structure of chloroplasts, containing a large single-copy region (LSC) of 87,482 bp, a small single-copy region (SSC) of 18,622 bp, and two inverted repeat regions (IRA and IRB) of 26,766 bp each. Overall, the genome contained 117 unique coding regions, and 30 were repeated in the IR region, with gene content, structure, and arrangement similar to those of its sister taxon, Typha latifolia. A total of 35 repeat structures were detected in both the coding and non-coding regions, the majority being tandem repeats. In addition, 205 SSRs were detected in the genome, with six protein-coding genes containing more than two SSRs. Comparison of chloroplast genomes from the subclass Commelinidae revealed a conserved protein-coding gene, albeit located in a highly divergent region. Analysis of selection pressure on protein-coding genes using the Ka/Ks ratio showed significant positive selection exerted on the rps7 gene of the pineapple chloroplast with P less than 0.05. Phylogenetic analysis confirmed the recent taxonomical relation among the members of the commelinids, which supports the monophyletic relationship between Arecales and Dasypogonaceae and between Zingiberales and Poales, which includes A. comosus.

The complete sequence of the chloroplast of pineapple provides insights into the divergence of genic chloroplast sequences among the members of the subclass Commelinidae. The complete pineapple chloroplast will serve as a reference for in-depth taxonomical studies in the Bromeliaceae family when more species under the family are sequenced in the future. The genetic sequence information will also make feasible other molecular applications of the pineapple chloroplast for plant genetic improvement.
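As a quick consistency check on the figures reported in this abstract, the four regions of the quadripartite structure should add up to the stated genome length. The snippet below is only an arithmetic sketch; the variable names are ours, and the numbers are copied from the abstract.

```python
# Region sizes (bp) as reported in the abstract.
LSC = 87_482               # large single-copy region
SSC = 18_622               # small single-copy region
IR = 26_766                # each of the two inverted repeats (IRA, IRB)
REPORTED_TOTAL = 159_636   # stated genome length

total = LSC + SSC + 2 * IR
ir_fraction = 2 * IR / total  # share of the genome occupied by the two IR copies

print(total, total == REPORTED_TOTAL)
print(f"{ir_fraction:.1%}")
```

The sum reproduces the reported 159,636 bp exactly, so the region sizes and the genome length in the abstract are mutually consistent.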

July 7, 2019 |

First fully closed genome sequence of Salmonella enterica subsp. enterica serovar Cubana associated with a food-borne outbreak.

Salmonella enterica subsp. enterica serovar Cubana (Salmonella serovar Cubana) is associated with human and animal disease. Here, we used third-generation, single-molecule, real-time DNA sequencing to determine the first complete genome sequence of Salmonella serovar Cubana CFSAN002050, which was isolated from fresh alfalfa sprouts during a multistate outbreak in 2012. Copyright © 2014 Hoffmann et al.

July 7, 2019 |

A precise chloroplast genome of Nelumbo nucifera (Nelumbonaceae) evaluated with Sanger, Illumina MiSeq, and PacBio RS II sequencing platforms: insight into the plastid evolution of basal eudicots.

Background: The chloroplast genome is important for plant development and plant evolution. Nelumbo nucifera is one member of relict plants surviving from the late Cretaceous. Recently, a new sequencing platform, PacBio RS II, known as 'SMRT (Single Molecule, Real-Time) sequencing', has been developed. Using SMRT sequencing to investigate the chloroplast genome of N. nucifera will help to elucidate the plastid evolution of basal eudicots.

Results: The sizes of the de novo assembled complete chloroplast genome of N. nucifera were 163,307 bp, 163,747 bp and 163,600 bp with average depths of coverage of 7×, 712× and 105× sequenced by Sanger, Illumina MiSeq and PacBio RS II, respectively. The precise chloroplast genome of N. nucifera was obtained from PacBio RS II data proofread by Illumina MiSeq reads, with a quadripartite structure containing a large single copy region (91,846 bp) and a small single copy region (19,626 bp) separated by two inverted repeat regions (26,064 bp each). The genome contains 113 different genes, including four distinct rRNAs, 30 distinct tRNAs and 79 distinct peptide-coding genes. A phylogenetic analysis of 133 taxa from 56 orders indicated that Nelumbo, with an age of 177 million years, is a sister clade to Platanus, which belongs to the basal eudicots. Basal eudicots began to emerge during the early Jurassic, with estimated divergence times at 197 million years using MCMCTree. IR expansions/contractions within the basal eudicots seem to have occurred independently.

Conclusions: Because of long reads and lack of bias in coverage of AT-rich regions, PacBio RS II showed great promise for highly accurate 'finished' genomes, especially for de novo assembly of genomes. N. nucifera is one member of the basal eudicots; however, evolutionary analyses of IR structural variations of N. nucifera and other basal eudicots suggested that IR expansions/contractions occurred independently in these basal eudicots or were caused by independent insertions and deletions. The precise chloroplast genome of N. nucifera will present new information for structural variation of chloroplast genomes and provide new insight into the evolution of basal eudicots at the primary sequence and structural level.
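The abstract reports three assembly sizes, one per platform, alongside the region sizes of the final genome. A short sketch, using only the numbers quoted above and our own variable names, shows which platform's assembly length agrees with the sum of the reported regions and how far the other two differ.

```python
# Assembly sizes (bp) per platform, as quoted in the abstract.
assembly_sizes = {
    "Sanger": 163_307,
    "Illumina MiSeq": 163_747,
    "PacBio RS II": 163_600,
}

# Region sizes (bp) of the final genome: LSC, SSC, and each inverted repeat.
lsc, ssc, ir = 91_846, 19_626, 26_064
quadripartite_total = lsc + ssc + 2 * ir

# Signed difference between each platform's assembly and the region sum.
differences = {
    platform: size - quadripartite_total
    for platform, size in assembly_sizes.items()
}

for platform, diff in differences.items():
    print(f"{platform}: {diff:+d} bp")
```

The region sum equals the PacBio RS II assembly length, which is consistent with the statement that the final genome was derived from PacBio data proofread with MiSeq reads.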

July 7, 2019 |

Draft genome sequences of Escherichia coli strains isolated from septic patients.

We present the draft genome sequences of six strains of Escherichia coli isolated from blood cultures collected from patients with sepsis. The strains were collected from two patient sets, those with a high severity of illness, and those with a low severity of illness. Each genome was sequenced by both Illumina and PacBio for comparison. Copyright © 2014 Dunitz et al.

July 7, 2019 |

Complete genome sequence and methylome analysis of Acinetobacter calcoaceticus 65.

Acinetobacter calcoaceticus 65 is the original source strain for the restriction enzyme Acc65I. Its complete sequence and full methylome were determined using single-molecule real-time (SMRT) sequencing. Copyright © 2017 Fomenkov et al.

July 7, 2019 |

Whole-genome comparative analysis of Salmonella enterica serovar Newport strains reveals lineage-specific divergence.

Salmonella enterica subsp. enterica serovar Newport has been associated with various foodborne outbreaks in humans and animals. Phylogenetically, serovar Newport is one of several Salmonella serovars that are polyphyletic. To understand more about the polyphyletic nature of this serovar, six food, environment, and human isolates from different Newport lineages were selected for genome comparison analyses. Whole genome comparisons demonstrated that heterogeneity mostly occurred in the prophage regions. Lineage-specific characteristics were also present in the Salmonella pathogenicity islands and fimbrial operons. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2017. This work is written by US Government employees and is in the public domain in the US.

July 7, 2019 |

Comparative sequence analysis of multidrug-resistant IncA/C plasmids from Salmonella enterica

Determinants of multidrug resistance (MDR) are often encoded on mobile elements, such as plasmids, transposons, and integrons, which have the potential to transfer among foodborne pathogens, as well as to other virulent pathogens, increasing the threats these traits pose to human and veterinary health. Our understanding of MDR among Salmonella has been limited by the lack of closed plasmid genomes for comparisons across resistance phenotypes, due to difficulties in effectively separating the DNA of these high-molecular weight, low-copy-number plasmids from chromosomal DNA. To resolve this problem, we demonstrate an efficient protocol for isolating, sequencing and closing IncA/C plasmids from Salmonella sp. using single molecule real-time sequencing on a Pacific Biosciences (PacBio) RS II Sequencer. We obtained six Salmonella enterica isolates from poultry, representing six different serovars, each exhibiting the MDR-Ampc resistance profile. Salmonella plasmids were obtained using a modified mini preparation and transformed with Escherichia coli DH10Br. A Qiagen Large-Construct kit™ was used to recover highly concentrated and purified plasmid DNA that was sequenced using PacBio technology. These six closed IncA/C plasmids ranged in size from 104 to 191 kb and shared a stable, conserved backbone containing 98 core genes, with only six differences among those core genes. The plasmids encoded a number of antimicrobial resistance genes, including those for quaternary ammonium compounds and mercury. We then compared our six IncA/C plasmid sequences: first with 14 IncA/C plasmids derived from S. enterica available at the National Center for Biotechnology Information (NCBI), and then with an additional 38 IncA/C plasmids derived from different taxa. These comparisons allowed us to build an evolutionary picture of how antimicrobial resistance may be mediated by this common plasmid backbone. Our project provides detailed genetic information about resistance genes in plasmids, advances in plasmid sequencing, and phylogenetic analyses, and important insights about how MDR evolution occurs across diverse serotypes from different animal sources, particularly in agricultural settings where antimicrobial drug use practices vary.

July 7, 2019 |

Complete genome sequence of Salmonella enterica subsp. enterica serovar Minnesota strain

Mango has been implicated as a food vehicle in several foodborne Salmonella outbreaks. Here, Salmonella enterica subsp. enterica serovar Minnesota was isolated from fresh mango fruit imported from Mexico in 2014. The complete genome of S. Minnesota CFSAN017963 was sequenced using single-molecule real-time DNA sequencing. Distinct prophage regions, Salmonella pathogenicity islands, and fimbrial gene clusters were observed in a comparative genomic analysis of S. Minnesota CFSAN017963 with other phylogenetically closely related Salmonella serovars. Core genome multilocus sequence typing analysis of all the S. Minnesota isolates in GenBank and EnteroBase also revealed high genomic diversity among the genomes analyzed.

July 7, 2019 |

Complete genome and methylome sequences of Salmonella enterica subsp. enterica serovar Panama (ATCC 7378) and Salmonella enterica subsp. enterica serovar Sloterdijk (ATCC 15791).

Salmonella enterica spp. are pathogenic bacteria commonly associated with food-borne outbreaks in humans and animals. Salmonella enterica spp. are characterized into more than 2,500 different serotypes, which makes epidemiological surveillance and outbreak control more difficult. In this report, we announce the first complete genome and methylome sequences from two Salmonella type strains associated with food-borne outbreaks, Salmonella enterica subsp. enterica serovar Panama (ATCC 7378) and Salmonella enterica subsp. enterica serovar Sloterdijk (ATCC 15791). Copyright © 2016 Yao et al.

July 7, 2019 |

Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences.

Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore technologies (ONT) produce reads over 10 kb in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10-15%. Complex and computationally intensive pipelines are required to assemble such reads.

We present a new mapper, minimap, and a de novo assembler, miniasm, for efficiently mapping and assembling SMRT and ONT reads without an error correction stage. They can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold Caenorhabditis elegans data in 9 min, orders of magnitude faster than the existing pipelines, though the consensus sequence error rate is as high as raw reads. We also introduce a pairwise read mapping format and a graphical fragment assembly format, and demonstrate the interoperability between ours and current tools.

https://github.com/lh3/minimap and https://github.com/lh3/miniasm

hengli@broadinstitute.org

Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
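The pairwise read mapping format mentioned in this abstract is known as PAF: a tab-separated text format whose first 12 columns are mandatory. The sketch below parses those 12 columns from a single line. The `PafRecord` field names are our own labels for the columns, and the example line is made up for illustration rather than taken from real minimap output.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns (field names are illustrative labels)."""
    query_name: str
    query_length: int
    query_start: int    # 0-based
    query_end: int
    strand: str         # "+" or "-"
    target_name: str
    target_length: int
    target_start: int   # 0-based
    target_end: int
    matches: int        # number of matching bases in the mapping
    block_length: int   # total length of the alignment block
    mapq: int           # mapping quality; 255 means missing


def parse_paf_line(line: str) -> PafRecord:
    """Parse the mandatory columns of one PAF line; optional tags are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(fields)}")
    q, ql, qs, qe, strand, t, tl, ts, te, m, bl, mq = fields[:12]
    return PafRecord(q, int(ql), int(qs), int(qe), strand,
                     t, int(tl), int(ts), int(te), int(m), int(bl), int(mq))


# Made-up example line describing an overlap between two reads.
example = "read1\t12000\t150\t11800\t+\tread2\t15000\t3000\t14700\t9500\t11700\t255"
record = parse_paf_line(example)
approx_identity = record.matches / record.block_length
print(record.query_name, record.target_name, round(approx_identity, 3))
```

Dividing the match count by the block length gives only a rough identity estimate, which is usually enough for filtering overlaps before assembly.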
