Quality control Archives

August 19, 2021 | Sample + library preparation

Technical note — Preparing DNA for PacBio HiFi sequencing – Extraction and quality control

Single Molecule, Real-Time (SMRT) Sequencing uses the natural process of DNA replication to sequence long fragments of native DNA in order to produce highly accurate long reads, or HiFi reads. As such, starting with high-quality, high molecular weight (HMW) genomic DNA (gDNA) will result in longer libraries and better performance during sequencing. This technical note is intended to give recommendations, tips and tricks for the extraction of DNA, as well as assessing and preserving the quality and size of your DNA sample to be used for HiFi sequencing.

July 19, 2019

A benchmark study on error assessment and quality control of CCS reads derived from the PacBio RS.

PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-nitch sequencing technology that can generate very long reads (up to 20 kb) in contrast to the shorter reads produced by the first and second generation sequencing technologies. As a new platform, it is important to assess the sequencing error rate, as well as the quality control (QC) parameters associated with the PacBio sequence data. In this study, a mixture of 10 previously known, closely related DNA amplicons was sequenced using the PacBio RS sequencing platform. After aligning Circular Consensus Sequence (CCS) reads derived from the above sequencing experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC, and improved to 1.3% with an SVM-based multi-parameter QC method. In addition, a de novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are post error-corrected, it is still necessary to perform appropriate QC on CCS reads in order to produce successful downstream bioinformatics analytical results.

July 19, 2019

Comparative genomics reveals the diversity of restriction-modification systems and DNA methylation sites in Listeria monocytogenes.

Listeria monocytogenes is a bacterial pathogen that is found in a wide variety of anthropogenic and natural environments. Genome sequencing technologies are rapidly becoming a powerful tool in facilitating our understanding of how genotype, classification phenotypes, and virulence phenotypes interact to predict the health risks of individual bacterial isolates. Currently, 57 closed L. monocytogenes genomes are publicly available, representing three of the four phylogenetic lineages, and they suggest that L. monocytogenes has high genomic synteny. This study contributes an additional 15 closed L. monocytogenes genomes that were used to determine the associations between the genome and methylome with host invasion magnitude. In contrast to previous findings, large chromosomal inversions and rearrangements were detected in five isolates at the chromosome terminus and within rRNA genes, including a previously undescribed inversion within rRNA-encoding regions. Each isolate’s epigenome contained highly diverse methyltransferase recognition sites, even within the same serotype and methylation pattern. Eleven strains contained a single chromosomally encoded methyltransferase, one strain contained two methylation systems (one system on a plasmid), and three strains exhibited no methylation, despite the occurrence of methyltransferase genes. In three isolates a new, unknown DNA modification was observed in addition to diverse methylation patterns, accompanied by a novel methylation system. Neither chromosome rearrangement nor strain-specific patterns of epigenome modification observed within virulence genes were correlated with serotype designation, clonal complex, or in vitro infectivity. These data suggest that genome diversity is larger than previously considered in L. monocytogenes and that as more genomes are sequenced, additional structure and methylation novelty will be observed in this organism. Listeria monocytogenes is the causative agent of listeriosis, a disease which manifests as gastroenteritis, meningoencephalitis, and abortion. Among Salmonella, Escherichia coli, Campylobacter, and Listeria, which cause the most prevalent foodborne illnesses, infection by L. monocytogenes carries the highest mortality rate. The ability of L. monocytogenes to regulate its response to various harsh environments enables its persistence and transmission. Small-scale comparisons of L. monocytogenes focusing solely on genome contents reveal a highly syntenic genome yet fail to address the observed diversity in phenotypic regulation. This study provides a large-scale comparison of 302 L. monocytogenes isolates, revealing the importance of the epigenome and restriction-modification systems as major determinants of L. monocytogenes phylogenetic grouping and subsequent phenotypic expression. Further examination of virulence genes of select outbreak strains reveals an unprecedented diversity in methylation statuses despite high degrees of genome conservation. Copyright © 2017 American Society for Microbiology.

July 19, 2019

Quality control of the traditional patent medicine Yimu Wan based on SMRT Sequencing and DNA barcoding.

Substandard traditional patent medicines may lead to global safety-related issues. Protecting consumers from the health risks associated with the integrity and authenticity of herbal preparations is of great concern. Of particular concern is quality control for traditional patent medicines. Here, we establish an effective approach for verifying the biological composition of traditional patent medicines based on single-molecule real-time (SMRT) sequencing and DNA barcoding. Yimu Wan (YMW), a classical herbal prescription recorded in the Chinese Pharmacopoeia, was chosen to test the method. Two reference YMW samples were used to establish a standard method for analysis, which was then applied to three different batches of commercial YMW samples. A total of 3703 and 4810 circular-consensus sequencing (CCS) reads from two reference and three commercial YMW samples were mapped to the ITS2 and psbA-trnH regions, respectively. Moreover, comparison of intraspecific genetic distances based on SMRT sequencing data with reference data from Sanger sequencing revealed an ITS2 and psbA-trnH intergenic spacer that exhibited high intraspecific divergence, with the sites of variation showing significant differences within species. Using the CCS strategy for SMRT sequencing analysis was adequate to guarantee the accuracy of identification. This study demonstrates the application of SMRT sequencing to detect the biological ingredients of herbal preparations. SMRT sequencing provides an affordable way to monitor the legality and safety of traditional patent medicines.

July 7, 2019

StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics.

Modern sequencing platforms generate enormous quantities of data in ever-decreasing amounts of time. Additionally, techniques such as multiplex sequencing allow one run to contain hundreds of different samples. With such data comes a significant challenge to understand its quality and to understand how the quality and yield are changing across instruments and over time. As well as the desire to understand historical data, sequencing centres often have a duty to provide clear summaries of individual run performance to collaborators or customers. We present StatsDB, an open-source software package for storage and analysis of next generation sequencing run metrics. The system has been designed for incorporation into a primary analysis pipeline, either at the programmatic level or via integration into existing user interfaces. Statistics are stored in an SQL database and APIs provide the ability to store and access the data while abstracting the underlying database design. This abstraction allows simpler, wider querying across multiple fields than is possible by the manual steps and calculation required to dissect individual reports, e.g. “provide metrics about nucleotide bias in libraries using adaptor barcode X, across all runs on sequencer A, within the last month”. The software is supplied with modules for storage of statistics from FastQC, a commonly used tool for analysis of sequence reads, but the open nature of the database schema means it can be easily adapted to other tools. Currently at The Genome Analysis Centre (TGAC), reports are accessed through our LIMS system or through a standalone GUI tool, but the API and supplied examples make it easy to develop custom reports and to interface with other packages.
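The kind of cross-run query quoted in the abstract can be illustrated with a small sketch. It is not StatsDB's actual schema or API: the `run_metrics` table, its columns, the `gc_bias` metric name, and the demo rows are all hypothetical, and SQLite stands in for whatever SQL database a real deployment would use.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical run-metrics table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE run_metrics ("
        " run_id TEXT, sequencer TEXT, barcode TEXT,"
        " metric TEXT, value REAL, run_date TEXT)"
    )
    # Made-up rows for illustration only; dates are ISO strings so that
    # plain string comparison orders them chronologically.
    rows = [
        ("run1", "A", "X", "gc_bias", 0.10, "2019-06-20"),
        ("run2", "A", "X", "gc_bias", 0.20, "2019-06-28"),
        ("run3", "A", "Y", "gc_bias", 0.50, "2019-06-25"),
        ("run4", "B", "X", "gc_bias", 0.90, "2019-06-27"),
        ("run5", "A", "X", "gc_bias", 0.70, "2019-03-01"),
    ]
    conn.executemany("INSERT INTO run_metrics VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def mean_metric(conn, metric, barcode, sequencer, since):
    """Return (mean value, row count) for one metric, filtered by
    barcode, sequencer, and earliest run date (inclusive)."""
    cur = conn.execute(
        "SELECT AVG(value), COUNT(*) FROM run_metrics"
        " WHERE metric = ? AND barcode = ? AND sequencer = ?"
        " AND run_date >= ?",
        (metric, barcode, sequencer, since),
    )
    return cur.fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    print(mean_metric(conn, "gc_bias", "X", "A", "2019-06-01"))
```

Keeping one row per run, metric, and barcode is what lets a single parameterised query replace the manual step of opening individual reports, which is the benefit the abstract attributes to abstracting the database behind an API.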

July 7, 2019

A spontaneous mutation in kdsD, a biosynthesis gene for 3 Deoxy-D-manno-Octulosonic Acid, occurred in a ciprofloxacin resistant strain of Francisella tularensis and caused a high level of attenuation in murine models of tularemia.

Francisella tularensis, a gram-negative facultative intracellular bacterial pathogen, is the causative agent of tularemia and able to infect many mammalian species, including humans. Because of its ability to cause a lethal infection, low infectious dose, and aerosolizable nature, F. tularensis subspecies tularensis is considered a potential biowarfare agent. Due to its in vitro efficacy, ciprofloxacin is one of the antibiotics recommended for post-exposure prophylaxis of tularemia. In order to identify therapeutics that will be efficacious against infections caused by drug resistant select-agents and to better understand the threat, we sought to characterize an existing ciprofloxacin resistant (CipR) mutant in the Schu S4 strain of F. tularensis by determining its phenotypic characteristics and sequencing the chromosome to identify additional genetic alterations that may have occurred during the selection process. In addition to the previously described genetic alterations, the sequence of the CipR mutant strain revealed several additional mutations. Of particular interest was a frameshift mutation within kdsD which encodes for an enzyme necessary for the production of 3-Deoxy-D-manno-Octulosonic Acid (KDO), an integral component of the lipopolysaccharide (LPS). A kdsD mutant was constructed in the Schu S4 strain. Although it was not resistant to ciprofloxacin, the kdsD mutant shared many phenotypic characteristics with the CipR mutant, including growth defects under different conditions, sensitivity to hydrophobic agents, altered LPS profiles, and attenuation in multiple models of murine tularemia. This study demonstrates that the KdsD enzyme is essential for Francisella virulence and may be an attractive therapeutic target for developing novel medical countermeasures.

July 7, 2019

Complete genome sequence and comparative genomics of the probiotic yeast Saccharomyces boulardii.

The probiotic yeast Saccharomyces boulardii (Sb) is known to be effective against many gastrointestinal disorders and antibiotic-associated diarrhea. To understand the molecular basis of the probiotic properties ascribed to Sb, we determined the complete genomes of two strains of Sb, i.e. Biocodex and unique28, and the draft genomes of three other Sb strains that are marketed as probiotics in India. We compared these genomes with 145 strains of S. cerevisiae (Sc) to understand genome-level similarities and differences between these yeasts. A feature that distinguishes Sb from other Sc is the absence of Ty elements Ty1, Ty3, Ty4 and associated LTR. However, we could identify complete Ty2 and Ty5 elements in Sb. The genes for the hexose transporters HXT11 and HXT9 and for asparagine utilization are absent in all Sb strains. We find differences in repeat periods and copy numbers of repeats in flocculin genes that are likely related to the differential adhesion of Sb as compared to Sc. Core-proteome based taxonomy places Sb strains along with wine strains of Sc. We find the introgression of five genes from Z. bailii into chromosome IV of Sb and wine strains of Sc. Intriguingly, genes involved in conferring known probiotic properties to Sb are conserved in most Sc strains.

July 7, 2019

Genome sequence of Plasmopara viticola and insight into the pathogenic mechanism.

Plasmopara viticola causes downy mildew disease of grapevine, which is one of the most devastating diseases of viticulture worldwide. Here we report a 101.3 Mb whole genome sequence of P. viticola isolate ‘JL-7-2’ obtained by a combination of Illumina and PacBio sequencing technologies. The P. viticola genome contains 17,014 putative protein-coding genes and has ~26% repetitive sequences. A total of 1,301 putative secreted proteins, including 100 putative RXLR effectors and 90 CRN effectors, were identified in this genome. In the secretome, 261 potential pathogenicity genes and 95 carbohydrate-active enzymes were predicted. Transcriptional analysis revealed that most of the RXLR effectors, pathogenicity genes and carbohydrate-active enzymes were significantly up-regulated during infection. Comparative genomic analysis revealed that P. viticola evolved independently from the Arabidopsis downy mildew pathogen Hyaloperonospora arabidopsidis. The availability of the P. viticola genome not only provides a valuable resource for comparative genomic analysis and evolutionary studies among oomycetes, but also enhances our knowledge of the mechanism of interactions between this biotrophic pathogen and its host.

July 7, 2019

Genetic and genomic tools for Cannabis sativa

The Cannabis industry is currently one of the fastest growing industries in the United States. Given the changing legal status of the plant, and the rapidly advancing research, updated information on the advancement of Cannabis genomics is needed. This versatile plant is used as medicine and for food, fiber, and bioremediation. Insights from modern, high-throughput genomic technology are revolutionizing our understanding of the plant and are providing new tools to further improve our knowledge and utilization of this unique species. This review quantifies and evaluates the currently available genomic resources for Cannabis research, including six whole-genome assemblies, two transcriptomes, and 393 other substantial genomic resources, as well as other smaller publicly available genetic and genomic resources. The open-source approaches followed by many leading scientists in the field promote collaboration and facilitate these rapid advances.

July 7, 2019

Genomic and phenotypic analyses of Pseudomonas psychrotolerans PRS08-11306 reveal a turnerbactin biosynthesis gene cluster that contributes to nitrogen fixation.

Plant-microbe interactions can provide agronomic benefits, such as enhancing nutrient uptake and providing fixed nitrogen. The Pseudomonas psychrotolerans strain PRS08-11306 was isolated from rice seeds and can enhance plant growth. Here, we analyzed the P. psychrotolerans genome, which is ~5 Mb, with 4389 coding sequences, 77 tRNAs, and 7 rRNAs. Genome analysis identified a cluster of turnerbactin biosynthetic genes, which are responsible for the production of a catecholate siderophore and contribute to nitrogen fixation for the host. Analysis of the transcription factor mutant ΔrpoS, which does not express this gene cluster, confirmed the relationship between the gene cluster and siderophore production. The nitrogen fixation characteristics of the cluster were confirmed in a plant growth-promoting experiment. The annotated full genome sequence of this strain sheds light on the role of P. psychrotolerans PRS08-11306 as a plant beneficial bacterium. Copyright © 2017. Published by Elsevier B.V.

July 7, 2019

Complete genome sequence of the cellulose-producing strain Komagataeibacter nataicola RZS01.

Komagataeibacter nataicola is an acetic acid bacterium (AAB) that can produce abundant bacterial cellulose and tolerate high concentrations of acetic acid. To globally understand its fermentation characteristics, we present a high-quality complete genome sequence of K. nataicola RZS01. The genome consists of a 3,485,191-bp chromosome and 6 plasmids, which encode 3,514 proteins and bear three cellulose synthase operons. Phylogenetic analysis at the genome level provides convincing evidence of the evolutionary position of K. nataicola with respect to related taxa. Genomic comparisons revealed that RZS01 shares 36.1% to 75.1% sequence similarity with other AAB. The sequence data were also used for metabolic analysis of biotechnological substrates. Analysis of the resistance to acetic acid at the genomic level indicated a synergistic mechanism responsible for acetic acid tolerance. The genomic data provide a viable platform that can be used to understand and manipulate the phenotype of K. nataicola RZS01 to further improve bacterial cellulose production.

July 7, 2019

Microsatellite length scoring by Single Molecule Real Time Sequencing – Effects of sequence structure and PCR regime.

Microsatellites are DNA sequences consisting of repeated, short (1-6 bp) sequence motifs that are highly mutable by enzymatic slippage during replication. Due to their high intrinsic variability, microsatellites have important applications in population genetics, forensics, genome mapping, as well as cancer diagnostics and prognosis. The current analytical standard for microsatellites is based on length scoring by high precision electrophoresis, but due to increasing efficiency next-generation sequencing techniques may provide a viable alternative. Here, we evaluated single molecule real time (SMRT) sequencing, implemented in the PacBio series of sequencing apparatuses, as a means of microsatellite length scoring. To this end we carried out multiplexed SMRT sequencing of plasmid-carried artificial microsatellites of varying structure under different pre-sequencing PCR regimes. For each repeat structure, reads corresponding to the target length dominated. We found that pre-sequencing amplification had large effects on scoring accuracy and error distribution relative to controls, but that the effects of the number of amplification cycles were generally weak. In line with expectations enzymatic slippage decreased proportionally with microsatellite repeat unit length and increased with repetition number. Finally, we determined directional mutation trends, showing that PCR and SMRT sequencing introduced consistent but opposing error patterns in contraction and expansion of the microsatellites on the repeat motif and single nucleotide level.
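As a rough illustration of what length scoring from reads can look like, the sketch below counts the longest uninterrupted run of a repeat motif in each read and reports the most common count. It is a simplified stand-in and not the scoring procedure used in the study; the function names, the `CA` motif, and the toy reads are assumptions made for this example.

```python
import re
from collections import Counter


def repeat_count(read, motif):
    """Number of repeat units in the longest uninterrupted run of `motif`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", read)
    return max((len(run) // len(motif) for run in runs), default=0)


def modal_repeat_count(reads, motif):
    """Return (most common repeat count, Counter of all counts).

    The modal count is None when no reads are supplied.
    """
    counts = Counter(repeat_count(read, motif) for read in reads)
    if not counts:
        return None, counts
    return counts.most_common(1)[0][0], counts


if __name__ == "__main__":
    # Toy reads: hypothetical flanks around (CA)n with n = 10 in most reads,
    # plus one contraction (n = 9) and one expansion (n = 11).
    reads = ["GGT" + "CA" * n + "TTG" for n in (10, 10, 10, 9, 11)]
    mode, counts = modal_repeat_count(reads, "CA")
    print(mode, dict(counts))
```

The spread of counts around the mode is where contraction and expansion errors of the kind described in the abstract would show up; a real analysis would also need to handle interrupted repeats and sequencing errors inside the repeat tract, which this exact-match approach ignores.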

July 7, 2019

Representing genetic variation with synthetic DNA standards.

The identification of genetic variation with next-generation sequencing is confounded by the complexity of the human genome sequence and by biases that arise during library preparation, sequencing and analysis. We have developed a set of synthetic DNA standards, termed ‘sequins’, that emulate human genetic features and constitute qualitative and quantitative spike-in controls for genome sequencing. Sequencing reads derived from sequins align exclusively to an artificial in silico reference chromosome, rather than the human reference genome, which allows them to be partitioned for parallel analysis. Here we use this approach to represent common and clinically relevant genetic variation, ranging from single nucleotide variants to large structural rearrangements and copy-number variation. We validate the design and performance of sequin standards by comparison to examples in the NA12878 reference genome, and we demonstrate their utility during the detection and quantification of variants. We provide sequins as a standardized, quantitative resource against which human genetic variation can be measured and diagnostic performance assessed.

July 7, 2019

Understanding the pathogenicity of Burkholderia contaminans, an emerging pathogen in cystic fibrosis.

Several bacterial species from the Burkholderia cepacia complex (Bcc) are feared opportunistic pathogens that lead to debilitating lung infections with a high risk of developing fatal septicemia in cystic fibrosis (CF) patients. However, the pathogenic potential of other Bcc species is yet unknown. To elucidate clinical relevance of Burkholderia contaminans, a species frequently isolated from CF respiratory samples in Ibero-American countries, we aimed to identify its key virulence factors possibly linked with an unfavorable clinical outcome. We performed a genome-wide comparative analysis of two isolates of B. contaminans ST872 from sputum and blood culture of a female CF patient in Argentina. RNA-seq data showed significant changes in expression for quorum sensing-regulated virulence factors and motility and chemotaxis. Furthermore, we detected expression changes in a recently described low-oxygen-activated (lxa) locus which encodes stress-related proteins, and for two clusters responsible for the biosynthesis of antifungal and hemolytic compounds pyrrolnitrin and occidiofungin. Based on phenotypic assays that confirmed changes in motility and in proteolytic, hemolytic and antifungal activities, we were able to distinguish two phenotypes of B. contaminans that coexisted in the host and entered her bloodstream. Whole genome sequencing revealed that the sputum and bloodstream isolates (each representing a distinct phenotype) differed by over 1,400 mutations as a result of a mismatch repair-deficient hypermutable state of the sputum isolate. The inferred lack of purifying selection against nonsynonymous mutations and the high rate of pseudogenization in the derived isolate indicated limited evolutionary pressure during evolution in the nutrient-rich, stable CF sputum environment. The present study is the first to examine the genomic and transcriptomic differences between longitudinal isolates of B. contaminans. Detected activity of a number of putative virulence factors implies a genuine pathogenic nature of this novel Bcc species.

July 7, 2019

The complete genome sequence of the nicotine-degrading bacterium Shinella sp. HZN7.

Nicotine is a natural alkaloid that is very toxic to humans. To eliminate the harmful effects of nicotine in the environment, biological methods employing microbes to degrade nicotine are required (Brandsch, 2006; Liu et al., 2015). Shinella sp. HZN7 can degrade nicotine efficiently via a variant of the pyridine and pyrrolidine pathways (VPP; Ma et al., 2013; Qiu et al., 2014, 2015). The main intermediates in this pathway include 6-hydroxy-nicotine, 6-hydroxy-N-methylmyosmine, 6-hydroxypseudooxynicotine, 6-hydroxy-3-succinoyl-pyridine, and 2,5-dihydroxypyridine. This strain is the first nicotine-degrading bacterium to be isolated from the genus Shinella.
