August 19, 2021  |  

Technical Note: Preparing DNA for PacBio HiFi sequencing – Extraction and quality control

Single Molecule, Real-Time (SMRT) Sequencing uses the natural process of DNA replication to sequence long fragments of native DNA in order to produce highly accurate long reads, or HiFi reads. As such, starting with high-quality, high molecular weight (HMW) genomic DNA (gDNA) will result in longer libraries and better performance during sequencing. This technical note is intended to give recommendations, tips and tricks for the extraction of DNA, as well as assessing and preserving the quality and size of your DNA sample to be used for HiFi sequencing.


August 19, 2021  |  Infectious disease research

Infographic: A brief history of microbiology

Our understanding of microbiology has evolved enormously over the last 150 years. Few institutions have witnessed our collective progress more closely than the National Collection of Type Cultures (NCTC). In fact, the collection itself is a record of the many milestones microbiologists have crossed, building on the discoveries of those who came before. To date, 60% of NCTC’s historic collection has a closed, finished reference genome, thanks to PacBio Single Molecule, Real-Time (SMRT) Sequencing. We are excited to be their partner in crossing this latest milestone on their quest to improve human and animal health by understanding the microscopic world.


August 19, 2021  |  

Case Study: Mining complex metagenomes for protein discovery with long-read sequencing

The bacteria living on and within us can impact health, disease, and even our behavior, but there is still much to learn about the breadth of their effects. The torrent of new discoveries unleashed by high-throughput sequencing has captured the imagination of scientists and the public alike. Scientists at Second Genome are hoping to apply these insights to improve human health, leveraging their bioinformatics expertise to mine bacterial communities for potential therapeutics. Recently they teamed up with scientists at PacBio to explore how long-read sequencing might supplement their short-read-based pipeline for gene discovery, using an environmental sample as a test case. They were especially interested in identifying unique, complete, and error-free gene clusters in metagenomic assemblies.


August 19, 2021  |  

Case Study: Diving Deep – Revealing the mysteries of marine life with SMRT Sequencing

Many scientists are using PacBio Single Molecule, Real-Time (SMRT) Sequencing to explore the genomes and transcriptomes of a wide variety of marine species and ecosystems. These studies are already adding to our understanding of how marine species adapt and evolve, contributing to conservation efforts, and informing how we can optimize food production through efficient aquaculture.


August 19, 2021  |  

Case Study: Sequencing an historic bacterial collection for the future

The UK’s National Collection of Type Cultures (NCTC) is a unique collection of more than 5,000 expertly preserved and authenticated bacterial cultures, many of historical significance. Founded in 1920, NCTC is the longest established collection of its type anywhere in the world, with a history of its own that has reflected — and contributed to — the evolution of microbiology for more than 100 years.


June 1, 2021  |  

Comparative genomics of Shiga toxin-producing Escherichia coli O145:H28 strains associated with the 2007 Belgium and 2010 US outbreaks.

Shiga toxin-producing Escherichia coli (STEC) is an emerging pathogen. Recently there has been a global increase in the number of outbreaks caused by non-O157 STECs, typically involving six serogroups: O26, O45, O103, O111, O121, and O145. STEC O145:H28 has been associated with severe human disease including hemolytic-uremic syndrome (HUS), as demonstrated by the 2007 Belgian ice-cream-associated outbreak and the 2010 US lettuce-associated outbreak, with over 10% of patients developing HUS in each. The goal of this work was to perform comparative genomics of clinical and environmental strains to investigate genome diversity and virulence evolution of this important foodborne pathogen.


June 1, 2021  |  

Genome sequencing of endosymbiotic bacterial Streptomyces sp. from Antarctic lichen using Single Molecule Real-time Sequencing (SMRT) technology.

With the advent of next-generation sequencing (NGS) techniques, it has become possible to sequence a microbial genome very quickly with high coverage. Recently, Pacific Biosciences developed single molecule real-time (SMRT) sequencing technology, a third-generation sequencing platform, which provides much longer reads (average read length: 1.5 kb) without PCR amplification. We performed de novo sequencing of Streptomyces sp. using the Illumina GAIIx, Roche 454, and PacBio RS systems and compared the data. The endosymbiotic bacterium Streptomyces sp. PAMC 26508 was isolated from the Antarctic lichen Psoroma sp., which grows attached to rocks on Barton Peninsula, King George Island, Antarctica (62°13’S, 58°47’W). With 4 SMRT Cells, we obtained more than 15x coverage of corrected sequence data for de novo assembly. Compared with the other sequencing platforms, the PacBio platform generated data in a manner similar to that for a typical mid-level GC content organism. In conclusion, the PacBio RS system with SMRT technology shows better performance with high GC content organisms and is expected to be a new tool to improve de novo sequencing and assembly.


June 1, 2021  |  

Genome sequencing of microbial genomes using Single Molecule Real-time sequencing (SMRT) technology.

In the last year, high-throughput sequencing technologies have progressed from proof-of-concept to production quality. Although each technology is able to produce vast quantities of sequence information, in every case the underlying chemistry limits reads to very short lengths. We present a comparison of de novo assemblies of bacterial genomes that vary in genome size (from 3.1 Mb to 7.6 Mb) and G+C content (from 43% to 71%). We analyzed Solexa reads, 454 reads, and PacBio RS reads from Streptomyces sp. (genome size, 7.6 Mb; G+C content, 71%), Psychrobacter sp. (genome size, 3.5 Mb; G+C content, 43%), Salinibacterium sp. (genome size, 3.1 Mb; G+C content, 61%), and Frigoribacterium sp. (genome size, 3.3 Mb; G+C content, 63%). We assembled each bacterial genome using Celera Assembler 7.0 with and without PacBio RS reads. We found that the assemblies that included PacBio RS reads had fewer contigs and scaffolds and better N50 values.
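The N50 statistic used above to compare assemblies can be illustrated with a minimal sketch. N50 is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size. The function name `n50` and the contig lengths below are hypothetical and chosen only for illustration; they are not data from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths in bp."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2            # half of the total assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This contig is the one that crosses the halfway point.
            return length


# Hypothetical assemblies of the same 3.1 Mb genome (illustrative values only).
fragmented = [900_000, 700_000, 500_000, 400_000, 300_000, 200_000, 100_000]
contiguous = [2_500_000, 600_000]

print(len(fragmented), n50(fragmented))  # many contigs, smaller N50
print(len(contiguous), n50(contiguous))  # few contigs, larger N50
```

Under these assumed inputs, the assembly with fewer, longer contigs has the larger N50, which is the sense in which a higher N50 is read as a more contiguous assembly.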


June 1, 2021  |  

Advances in sequence consensus and clustering algorithms for effective de novo assembly and haplotyping applications.

One of the major applications of DNA sequencing technology is to bring together information that is distant in sequence space so that understanding genome structure and function becomes easier on a large scale. The Single Molecule Real Time (SMRT) Sequencing platform provides direct sequencing data that can span several thousand bases to tens of thousands of bases in a high-throughput fashion. In contrast to solving genomic puzzles by patching together smaller pieces of information, long sequence reads can decrease potential computational complexity by reducing combinatorial factors significantly. We demonstrate algorithmic approaches to construct accurate consensus when the differences between reads are dominated by insertions and deletions. High-performance implementations of such algorithms allow more efficient de novo assembly with a pre-assembly step that generates highly accurate, consensus-based reads which can be used as input for existing genome assemblers. In contrast to recent hybrid assembly approaches, only a single ~10 kb or longer SMRTbell library is necessary for the hierarchical genome assembly process (HGAP). Meanwhile, by combining a sensitive read-clustering algorithm with the consensus algorithms, one is able to discern haplotypes that differ from each other by less than 1% over a large region. One of the related applications is to generate accurate haplotype sequences for HLA loci. Long sequence reads that can cover the whole 3 kb to 4 kb diploid genomic regions will simplify the haplotyping process. These algorithms can also be applied to resolve individual populations within mixed pools of DNA molecules that are similar to each other, e.g., by sequencing viral quasi-species samples.


June 1, 2021  |  

Automated, non-hybrid de novo genome assemblies and epigenomes of bacterial pathogens.

Understanding the genetic basis of infectious diseases is critical to enacting effective treatments, and several large-scale sequencing initiatives are underway to collect this information. Sequencing bacterial samples is typically performed by mapping sequence reads against genomes of known reference strains. While such resequencing informs on the spectrum of single-nucleotide differences relative to the chosen reference, it can miss numerous other forms of variation known to influence pathogenicity: structural variations (duplications, inversions), acquisition of mobile elements (phages, plasmids), homonucleotide length variation causing phase variation, and epigenetic marks (methylation, phosphorothioation) that influence gene expression to switch bacteria from non-pathogenic to pathogenic states. Therefore, sequencing methods which provide complete, de novo genome assemblies and epigenomes are necessary to fully characterize infectious disease agents in an unbiased, hypothesis-free manner. Hybrid assembly methods have been described that combine long sequence reads from SMRT DNA Sequencing with short reads (SMRT CCS (circular consensus) or second-generation reads), wherein the short reads are used to error-correct the long reads which are then used for assembly. We have developed a new paradigm for microbial de novo assemblies in which SMRT sequencing reads from a single long insert library are used exclusively to close the genome through a hierarchical genome assembly process, thereby obviating the need for a second sample preparation, sequencing run, and data set. We have applied this method to achieve closed de novo genomes with accuracies exceeding QV50 (>99.999%) for numerous disease outbreak samples, including E. coli, Salmonella, Campylobacter, Listeria, Neisseria, and H. pylori. The kinetic information from the same SMRT Sequencing reads is utilized to determine epigenomes.
Approximately 70% of all methyltransferase specificities we have determined to date represent previously unknown bacterial epigenetic signatures. With relatively short sequencing run times and automated analysis pipelines, it is possible to go from an unknown DNA sample to its complete de novo genome and epigenome in about a day.
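The relationship between the QV50 figure and the ">99.999%" accuracy quoted above follows from the standard Phred-style definition, QV = -10 * log10(error probability). The sketch below converts between the two. The function names and the 5 Mb genome size are assumptions chosen for illustration and do not come from the abstract.

```python
import math


def qv_to_accuracy(qv):
    """Convert a Phred-scaled quality value to per-base accuracy (0..1)."""
    error_probability = 10 ** (-qv / 10)
    return 1.0 - error_probability


def accuracy_to_qv(accuracy):
    """Convert per-base accuracy (0 <= accuracy < 1) to a Phred-scaled QV."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10 * math.log10(1.0 - accuracy)


def expected_errors(qv, genome_size_bp):
    """Expected number of erroneous bases in a consensus of the given size."""
    return genome_size_bp * 10 ** (-qv / 10)


# QV50 corresponds to an error probability of 1 in 100,000 bases.
print(qv_to_accuracy(50))              # approximately 0.99999
print(expected_errors(50, 5_000_000))  # hypothetical 5 Mb genome: about 50 errors
```

Each additional 10 QV units reduces the expected error count tenfold, so under the same assumed 5 Mb genome a QV60 consensus would carry about 5 expected errors.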


June 1, 2021  |  

Automated, non-hybrid de novo genome assemblies and epigenomes of bacterial pathogens

Understanding the genetic basis of infectious diseases is critical to enacting effective treatments, and several large-scale sequencing initiatives are underway to collect this information. Sequencing bacterial samples is typically performed by mapping sequence reads against genomes of known reference strains. While such resequencing informs on the spectrum of single nucleotide differences relative to the chosen reference, it can miss numerous other forms of variation known to influence pathogenicity: structural variations (duplications, inversions), acquisition of mobile elements (phages, plasmids), homonucleotide length variation causing phase variation, and epigenetic marks (methylation, phosphorothioation) that influence gene expression to switch bacteria from non-pathogenic to pathogenic states. Therefore, sequencing methods which provide complete, de novo genome assemblies and epigenomes are necessary to fully characterize infectious disease agents in an unbiased, hypothesis-free manner. Hybrid assembly methods have been described that combine long sequence reads from SMRT DNA sequencing with short, high-accuracy reads (SMRT circular consensus sequencing (CCS) or second-generation reads) to generate long, highly accurate reads that are then used for assembly. We have developed a new paradigm for microbial de novo assemblies in which long SMRT sequencing reads (average read lengths >5,000 bases) are used exclusively to close the genome through a hierarchical genome assembly process, thereby obviating the need for a second sample preparation, sequencing run and data set. We have applied this method to achieve closed de novo genomes with accuracies exceeding QV50 (>99.999%) for numerous disease outbreak samples, including E. coli, Salmonella, Campylobacter, Listeria, Neisseria, and H. pylori. The kinetic information from the same SMRT sequencing reads is utilized to determine epigenomes.
Approximately 70% of all methyltransferase specificities we have determined to date represent previously unknown bacterial epigenetic signatures. The process has been automated and requires less than 1 day from an unknown DNA sample to its complete de novo genome and epigenome.


June 1, 2021  |  

Getting the most out of your PacBio libraries with size selection.

PacBio RS II sequencing chemistries provide read lengths beyond 20 kb with high consensus accuracy. The long read lengths of P4-C2 chemistry and demonstrated consensus accuracy of 99.999% are ideal for applications such as de novo assembly, targeted sequencing and isoform sequencing. The recently launched P5-C3 chemistry generates even longer reads with N50 often >10,000 bp, making it the best choice for scaffolding and spanning structural rearrangements. With these chemistry advances, PacBio’s read length performance is now primarily determined by the SMRTbell library itself. Size selection of a high-quality, sheared 20 kb library using the BluePippin™ System has been demonstrated to increase the N50 read length by as much as 5 kb with C3 chemistry. BluePippin size selection or a more stringent AMPure® PB selection cutoff can be used to recover long fragments from degraded genomic material. The selection of chemistries, P4-C2 versus P5-C3, is highly dependent on the final size distribution of the SMRTbell library and experimental goals. PacBio’s long read lengths also allow for the sequencing of full-length cDNA libraries at single-molecule resolution. However, longer transcripts are difficult to detect due to lower abundance, amplification bias, and preferential loading of smaller SMRTbell constructs. Without size selection, most sequenced transcripts are 1-1.5 kb. Size selection dramatically increases the number of transcripts >1.5 kb, and is essential for >3 kb transcripts.

