Advances in sequence consensus and clustering algorithms for effective de novo assembly and haplotyping applications.
One of the major applications of DNA sequencing technology is to bring together information that is distant in sequence space so that understanding genome structure and function becomes easier on a large scale. The Single Molecule, Real-Time (SMRT) Sequencing platform provides direct sequencing data that can span several thousand to tens of thousands of bases in a high-throughput fashion. In contrast to solving genomic puzzles by patching together smaller pieces of information, long sequence reads can decrease potential computational complexity by significantly reducing combinatorial factors. We demonstrate algorithmic approaches to construct accurate consensus when the differences between reads are dominated by insertions and deletions. High-performance implementations of such algorithms allow more efficient de novo assembly with a pre-assembly step that generates highly accurate, consensus-based reads, which can be used as input for existing genome assemblers. In contrast to recent hybrid assembly approaches, only a single ~10 kb or longer SMRTbell library is necessary for the hierarchical genome assembly process (HGAP). Meanwhile, by combining a sensitive read-clustering algorithm with the consensus algorithms, one is able to discern haplotypes that differ by less than 1% from each other over a large region. One of the related applications is to generate accurate haplotype sequences for HLA loci. Long sequence reads that can cover the whole 3 kb to 4 kb diploid genomic regions will simplify the haplotyping process. These algorithms can also be applied to resolve individual populations within mixed pools of DNA molecules that are similar to each other, e.g., by sequencing viral quasi-species samples.
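To make the idea of consensus under indel-dominated errors concrete, here is a toy sketch of a column-wise majority vote over reads that have already been aligned, with `-` marking gaps. This is an illustrative simplification, not the consensus algorithm used in HGAP. The function name `majority_consensus` and the example reads are assumptions made up for the illustration.

```python
from collections import Counter
from typing import Sequence


def majority_consensus(aligned_reads: Sequence[str], gap: str = "-") -> str:
    """Column-wise majority vote over pre-aligned, equal-length reads.

    Toy illustration only: a real consensus caller must also build the
    alignment itself. Ties are broken by first occurrence in the column.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        symbol, _count = Counter(column).most_common(1)[0]
        # A column whose majority symbol is a gap is treated as a spurious
        # insertion in a minority of reads and is dropped from the consensus.
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Hypothetical aligned reads: one carries an insertion, one a deletion.
reads = [
    "ACG-TTGA",
    "ACGATTGA",  # extra 'A' (insertion error)
    "ACG-T-GA",  # missing 'T' (deletion error)
    "ACG-TTGA",
]
print(majority_consensus(reads))
```

With enough coverage, isolated insertions and deletions are outvoted in each column, which is why random indel errors can be averaged out even when individual reads are noisy.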
Single molecule RNA sequencing uncovers trans-splicing and improves annotations in Anopheles stephensi.
Single molecule real-time (SMRT) sequencing has recently been used to obtain full-length cDNA sequences that improve genome annotation and reveal RNA isoforms. Here, we used one such method called isoform sequencing from Pacific Biosciences (PacBio) to sequence a cDNA library from the Asian malaria mosquito Anopheles stephensi. More than 600 000 full-length cDNAs, referred to as reads of insert, were identified. Owing to the inherently high error rate of PacBio sequencing, we tested different approaches for error correction. We found that error correction using Illumina RNA sequencing (RNA-seq) generated more data than using the default SMRT pipeline. The full-length error-corrected PacBio reads greatly improved the gene annotation of Anopheles stephensi: 4867 gene models were updated and 1785 alternatively spliced isoforms were added to the annotation. In addition, six trans-splicing events, where exons from different primary transcripts were joined together, were identified in An. stephensi. All six trans-splicing events appear to be conserved in Culicidae, as they are also found in Anopheles gambiae and Aedes aegypti. The proteins encoded by trans-splicing events are also highly conserved and the orthologues of these proteins are cis-spliced in outgroup species, indicating that trans-splicing may arise as a mechanism to rescue genes that broke up during evolution. © 2017 The Royal Entomological Society.
Translating genomics into practice for real-time surveillance and response to carbapenemase-producing Enterobacteriaceae: evidence from a complex multi-institutional KPC outbreak.
Until recently, Klebsiella pneumoniae carbapenemase (KPC)-producing Enterobacteriaceae were rarely identified in Australia. Following an increase in the number of incident cases across the state of Victoria, we undertook a real-time combined genomic and epidemiological investigation. The scope of this study included identifying risk factors and routes of transmission, and investigating the utility of genomics to enhance traditional field epidemiology for informing management of established widespread outbreaks. All KPC-producing Enterobacteriaceae isolates referred to the state reference laboratory from 2012 onwards were included. Whole-genome sequencing was performed in parallel with a detailed descriptive epidemiological investigation of each case, using Illumina sequencing on each isolate. This was complemented with PacBio long-read sequencing on selected isolates to establish high-quality reference sequences and interrogate characteristics of KPC-encoding plasmids. Initial investigations indicated that the outbreak was widespread, with 86 KPC-producing Enterobacteriaceae isolates (K. pneumoniae 92%) identified from 35 different locations across metropolitan and rural Victoria between 2012 and 2015. Initial combined analyses of the epidemiological and genomic data resolved the outbreak into distinct nosocomial transmission networks, and identified healthcare facilities at the epicentre of KPC transmission. New cases were assigned to transmission networks in real-time, allowing focussed infection control efforts. PacBio sequencing confirmed a secondary transmission network arising from inter-species plasmid transmission.
Insights from Bayesian transmission inference and analyses of within-host diversity informed the development of state-wide public health and infection control guidelines, including interventions such as an intensive approach to screening contacts following new case detection to minimise unrecognised colonisation. A real-time combined epidemiological and genomic investigation proved critical to identifying and defining multiple transmission networks of KPC Enterobacteriaceae, while data from either investigation alone were inconclusive. The investigation was fundamental to informing infection control measures in real-time and the development of state-wide public health guidelines on carbapenemase-producing Enterobacteriaceae surveillance and management.
The human genome is arguably the most complete mammalian reference assembly, yet more than 160 euchromatic gaps remain and aspects of its structural variation remain poorly understood ten years after its completion. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome, 78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.
Multiplexed highly-accurate DNA sequencing of closely-related HIV-1 variants using continuous long reads from single molecule, real-time sequencing.
Single Molecule, Real-Time (SMRT®) Sequencing (Pacific Biosciences, Menlo Park, CA, USA) provides the longest continuous DNA sequencing reads currently available. However, the relatively high error rate in the raw read data requires novel analysis methods to deconvolute sequences derived from complex samples. Here, we present a workflow of novel computer algorithms able to reconstruct viral variant genomes present in mixtures with an accuracy of >QV50. This approach relies exclusively on Continuous Long Reads (CLR), which are the raw reads generated during SMRT Sequencing. We successfully implement this workflow for simultaneous sequencing of mixtures containing up to forty different >9 kb HIV-1 full genomes. This was achieved using a single SMRT Cell for each mixture and desktop computing power. This novel approach opens the possibility of solving complex sequencing tasks that currently lack a solution. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
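The ">QV50" figure can be read on the Phred scale, where a quality value QV corresponds to an error probability p through QV = -10 log10(p). The sketch below assumes that standard Phred definition. The function names are illustrative, and the 9,000-base genome length is a stand-in (the abstract states only ">9 kb").

```python
import math


def phred_to_error_prob(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(p)


# Assumed, illustrative genome length; the abstract gives only ">9 kb".
genome_length = 9_000

per_base_error = phred_to_error_prob(50)
expected_errors = genome_length * per_base_error

print(f"QV50 per-base error probability: {per_base_error:.0e}")
print(f"Expected errors in a {genome_length}-base genome: {expected_errors:.2f}")
```

Under these assumptions, QV50 means roughly one error per 100,000 bases, so a reconstructed genome of about 9 kb would usually contain no errors at all.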
Complete genome sequence of Vibrio campbellii strain 20130629003S01 isolated from shrimp with acute hepatopancreatic necrosis disease.
Vibrio campbellii is widely distributed in the marine environment and is an important pathogen of aquatic organisms such as shrimp, fish, and mollusks. An isolate of V. campbellii carrying the pirAB(vp) gene, causing acute hepatopancreatic necrosis disease (AHPND), has been reported. There are no previous reports about the complete genome of V. campbellii causing AHPND (VCAHPND). To extend our understanding of the pathogenesis of VCAHPND at the genomic level, the genome of V. campbellii 20130629003S01 isolated from a shrimp with AHPND was sequenced and analysed. The complete genome sequence of V. campbellii 20130629003S01 was generated using the PacBio RSII platform with single molecule, real-time sequencing. The 20130629003S01 strain consists of two circular chromosomes (3,621,712 bp in chromosome 1 and 2,245,751 bp in chromosome 2) and four plasmids of 70,066, 204,531, 143,140, and 86,121 bp. The genome contains a total of 5855 protein coding genes, 134 tRNA genes and 37 rRNA genes. The average nucleotide identity value of 20130629003S01 and other reference V. campbellii strains was 97.46%, suggesting that they are closely related. The genome sequence of V. campbellii 20130629003S01 and its comparative analysis with other V. campbellii strains that we present here are important for a better understanding of the genomic characteristics of VCAHPND.
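The replicon sizes quoted above can be added up to give the total genome size and the share carried on plasmids. The sketch below uses only the figures from the abstract. The labels `plasmid 1` to `plasmid 4` are placeholders, since the abstract does not name the plasmids.

```python
# Replicon sizes in base pairs, as reported in the abstract.
# Plasmid labels are placeholders; the abstract lists sizes only.
replicons = {
    "chromosome 1": 3_621_712,
    "chromosome 2": 2_245_751,
    "plasmid 1": 70_066,
    "plasmid 2": 204_531,
    "plasmid 3": 143_140,
    "plasmid 4": 86_121,
}

total_bp = sum(replicons.values())
plasmid_bp = sum(
    size for name, size in replicons.items() if name.startswith("plasmid")
)
plasmid_fraction = plasmid_bp / total_bp

print(f"Total genome size: {total_bp:,} bp")
print(f"Plasmid content:   {plasmid_bp:,} bp ({plasmid_fraction:.1%} of the genome)")
```

The total comes to 6,371,321 bp, of which 503,858 bp (just under 8%) is plasmid sequence.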
Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.
Complete genome sequences of three Neisseria gonorrhoeae laboratory reference strains, determined using PacBio Single-Molecule Real-Time technology.
Neisseria gonorrhoeae, the etiological agent that causes the sexually transmitted infection gonorrhea, is a significant public health concern due to the emergence of antimicrobial resistance. We report the complete genome sequences of three reference isolates with varied antimicrobial susceptibility that will aid in elucidating the genetic mechanisms that confer resistance. Copyright © 2015 Abrams et al.
Complete genome of Pandoraea pnomenusa RB-38, an oxalotrophic bacterium isolated from municipal solid waste landfill site.
Pandoraea pnomenusa RB-38 is a bacterium isolated from a former sanitary landfill site. Here, we present the complete genome of P. pnomenusa RB-38, in which an oxalate utilization pathway was identified. The genome analysis suggested the potential of this strain as an effective biocontrol agent against oxalate-producing phytopathogens. Copyright © 2015 Elsevier B.V. All rights reserved.
The Southern Ocean houses a diverse and productive community of organisms. Unicellular eukaryotic diatoms are the main primary producers in this environment, where photosynthesis is limited by low concentrations of dissolved iron and large seasonal fluctuations in light, temperature and the extent of sea ice. How diatoms have adapted to this extreme environment is largely unknown. Here we present insights into the genome evolution of a cold-adapted diatom from the Southern Ocean, Fragilariopsis cylindrus, based on a comparison with temperate diatoms. We find that approximately 24.7 per cent of the diploid F. cylindrus genome consists of genetic loci with alleles that are highly divergent (15.1 megabases of the total genome size of 61.1 megabases). These divergent alleles were differentially expressed across environmental conditions, including darkness, low iron, freezing, elevated temperature and increased CO2. Alleles with the largest ratio of non-synonymous to synonymous nucleotide substitutions also show the most pronounced condition-dependent expression, suggesting a correlation between diversifying selection and allelic differentiation. Divergent alleles may be involved in adaptation to environmental fluctuations in the Southern Ocean.
Elucidation of the evolutionary history and interrelatedness of Plasmodium species that infect humans has been hampered by a lack of genetic information for three human-infective species: P. malariae and two P. ovale species (P. o. curtisi and P. o. wallikeri). These species are prevalent across most regions in which malaria is endemic and are often undetectable by light microscopy, rendering their study in human populations difficult. The exact evolutionary relationship of these species to the other human-infective species has been contested. Using a new reference genome for P. malariae and a manually curated draft P. o. curtisi genome, we are now able to accurately place these species within the Plasmodium phylogeny. Sequencing of a P. malariae relative that infects chimpanzees reveals similar signatures of selection in the P. malariae lineage to another Plasmodium lineage shown to be capable of colonization of both human and chimpanzee hosts. Molecular dating suggests that these host adaptations occurred over similar evolutionary timescales. In addition to the core genome that is conserved between species, differences in gene content can be linked to their specific biology. The genome suggests that P. malariae expresses a family of heterodimeric proteins on its surface that have structural similarities to a protein crucial for invasion of red blood cells. The data presented here provide insight into the evolution of the Plasmodium genus as a whole.
Comparative genomics and transcriptome analysis of Aspergillus niger and metabolic engineering for citrate production.
Despite a long and successful history of citrate production in Aspergillus niger, the molecular mechanism of citrate accumulation is only partially understood. In this study, we used comparative genomics and transcriptome analysis of citrate-producing strains, namely A. niger H915-1 (citrate titer: 157 g L⁻¹), A1 (117 g L⁻¹), and L2 (76 g L⁻¹), to gain a genome-wide view of the mechanism of citrate accumulation. Compared with A. niger A1 and L2, A. niger H915-1 contained 92 mutated genes, including a succinate-semialdehyde dehydrogenase in the γ-aminobutyric acid shunt pathway and an aconitase family protein involved in citrate synthesis. Furthermore, transcriptome analysis of A. niger H915-1 revealed that the transcription levels of 479 genes changed between the cell growth stage (6 h) and the citrate synthesis stage (12 h, 24 h, 36 h, and 48 h). In the glycolysis pathway, triosephosphate isomerase was up-regulated, whereas pyruvate kinase was down-regulated. Two cytosol ATP-citrate lyases, which take part in the cycle of citrate synthesis, were up-regulated, and may coordinate with the alternative oxidases in the alternative respiratory pathway for energy balance. Finally, deletion of the oxaloacetate acetylhydrolase gene in H915-1 eliminated oxalate formation, but no influence on the pH decrease and no difference in citrate production were observed.
Herein, we report the genome sequence of a Clostridium difficile strain isolated from the feces of antibiotic-treated C57BL/6 mice. We have named this strain, which differs considerably from the previously sequenced C. difficile strains, LEM1. Copyright © 2017 Etienne-Mesmin et al.
Complete genome sequences of the xylose-fermenting Candida intermedia strains CBS 141442 and PYCC 4715.
Sustainable biofuel production from lignocellulosic materials requires efficient and complete use of all abundant sugars in the biomass, including xylose. Here, we report on the de novo genome assemblies of two strains of the xylose-fermenting yeast Candida intermedia: CBS 141442 and PYCC 4715. Copyright © 2017 Moreno et al.