2015 SMRT Informatics Developers Conference Presentation Slides: Shinichi Morishita of the University of Tokyo presented on how his team has been using SMRT Sequencing to better understand methylomes, metagenomes and structural variation of various eukaryotic genomes.
In addition to the genome and transcriptome, epigenetic information is essential to understand biological processes and their regulation, and their misregulation underlying disease. Traditionally, epigenetic DNA modifications are detected using upfront sample preparation steps such as bisulfite conversion, followed by sequencing. Bisulfite sequencing has provided a wealth of knowledge about human epigenetics; however, it does not access the entire genome due to limitations in read length and the GC bias of the sequencing technologies used. In contrast, Single Molecule, Real-Time (SMRT) DNA Sequencing is unique in that it can detect DNA base modifications as part of the sequencing process. It can thereby leverage the long read lengths and lack of GC bias for more comprehensive views of the human epigenome. I will highlight several examples of this capability towards the generation of new biological insights, including the resolution of methylation states in repetitive and GC-rich regions of the genome, and large-scale changes in the methylation status across a cancer genome as a function of drug sensitivity.
In this ASHG 2020 PacBio Workshop, Emily Farrow of Children’s Mercy Kansas City shares how the incorporation of long-read sequencing into the Genomic Answers for Kids research study is increasing…
Characterization of Reference Materials for Genetic Testing of CYP2D6 Alleles: A GeT-RM Collaborative Project.
Pharmacogenetic testing is increasingly available from clinical and research laboratories. However, only a limited number of quality control and other reference materials currently are available for the complex rearrangements and rare variants that occur in the CYP2D6 gene. To address this need, the Division of Laboratory Systems, CDC-based Genetic Testing Reference Material Coordination Program, in collaboration with members of the pharmacogenetic testing and research communities and the Coriell Cell Repositories (Camden, NJ), has characterized 179 DNA samples derived from Coriell cell lines. Testing included the recharacterization of 137 genomic DNAs that were genotyped in previous Genetic Testing Reference Material Coordination Program studies and 42 additional samples that had not been characterized previously. DNA samples were distributed to volunteer testing laboratories for genotyping using a variety of commercially available and laboratory-developed tests. These publicly available samples will support the quality-assurance and quality-control programs of clinical laboratories performing CYP2D6 testing. Published by Elsevier Inc.
Rational development of transformation in Clostridium thermocellum ATCC 27405 via complete methylome analysis and evasion of native restriction-modification systems.
A major barrier to both metabolic engineering and fundamental biological studies is the lack of genetic tools in most microorganisms. One example is Clostridium thermocellum ATCC 27405T, where genetic tools are not available to help validate decades of hypotheses. A significant barrier to DNA transformation is restriction-modification systems, which defend against foreign DNA methylated differently than the host. To determine the active restriction-modification systems in this strain, we performed complete methylome analysis via single-molecule, real-time sequencing to detect 6-methyladenine and 4-methylcytosine and the rarely used whole-genome bisulfite sequencing to detect 5-methylcytosine. Multiple active systems were identified, and corresponding DNA methyltransferases were expressed from the Escherichia coli chromosome to mimic the C. thermocellum methylome. Plasmid methylation was experimentally validated and successfully electroporated into C. thermocellum ATCC 27405. This combined approach enabled genetic modification of the C. thermocellum-type strain and acts as a blueprint for transformation of other non-model microorganisms.
DNA Methylation at the Schizophrenia and Intelligence GWAS-Implicated MIR137HG Locus May Be Associated with Disease and Cognitive Functions
The largest genome-wide association studies have identified schizophrenia and intelligence associated variants in the MIR137HG locus containing genes encoding microRNA-137 and microRNA-2682. In the present study, we investigated DNA methylation in the MIR137HG intragenic CpG island (CGI) in the peripheral blood of 44 patients with schizophrenia and 50 healthy controls. The CGI included the entire MIR137 gene and the region adjacent to the 5′-end of MIR2682. The aim of the study was to examine the relationship of the CGI methylation with schizophrenia and cognitive functioning. The methylation level of 91 CpGs located in the selected region was established for each participant by means of single-molecule real-time bisulfite sequencing. All subjects completed the battery of neuropsychological tests. We found that the CGI was hypomethylated in both groups, except for one site, CpG (chr1: 98,511,049), with significant interindividual variability in methylation. A higher level of methylation of this CpG was seen in male patients and was associated with a decrease in the cognitive index in the combined sample of patients and controls. Our data suggest that further investigation of mechanisms that regulate expression of the MIR137 and MIR2682 genes might help to understand the molecular basis of cognitive deficits in schizophrenia.
DNA methylation is a process by which methyl groups are added to cytosine or adenine. DNA methylation can change the activity of the DNA molecule without changing the sequence. Methylation of cytosine to 5-methylcytosine (5mC) is widespread in both eukaryotes and prokaryotes, and it is a very important epigenetic modification event, which can regulate gene activity and influence a number of key processes such as genomic imprinting, cell differentiation, transcriptional regulation, and chromatin remodeling. Profiling DNA methylation across the genome is critical to understanding the influence of methylation in normal biology and diseases including cancer. Recent discoveries of 5mC oxidation derivatives including 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxycytosine (5caC) in the mammalian genome further expand our understanding of the methylation regulation. Genome-wide analyses such as microarrays and next-generation sequencing technologies have been used to assess large fractions of the methylome. A number of different quantitative approaches have also been established to map the DNA epigenomes with single-base resolution, as represented by the bisulfite-based methods, such as classical bisulfite sequencing, pyrosequencing etc. These methods have been used to generate base-resolution maps of 5mC and its oxidation derivatives in genomic samples. The focus of this chapter is to provide the methodologies that have been developed to detect the cytosine derivatives in the genomic DNA.
Prokaryotic DNA contains three types of methylation: N6-methyladenine, N4-methylcytosine and 5-methylcytosine. The lack of tools to analyse the frequency and distribution of methylated residues in bacterial genomes has prevented a full understanding of their functions. Now, advances in DNA sequencing technology, including single-molecule, real-time sequencing and nanopore-based sequencing, have provided new opportunities for systematic detection of all three forms of methylated DNA at a genome-wide scale and offer unprecedented opportunities for achieving a more complete understanding of bacterial epigenomes. Indeed, as the number of mapped bacterial methylomes approaches 2,000, increasing evidence supports roles for methylation in regulation of gene expression, virulence and pathogen-host interactions.
Long-read sequencing unveils IGH-DUX4 translocation into the silenced IGH allele in B-cell acute lymphoblastic leukemia.
IGH@ proto-oncogene translocation is a common oncogenic event in lymphoid lineage cancers such as B-ALL, lymphoma and multiple myeloma. Here, to investigate the interplay between IGH@ proto-oncogene translocation and IGH allelic exclusion, we perform long-read whole-genome and transcriptome sequencing along with epigenetic and 3D genome profiling of Nalm6, an IGH-DUX4 positive B-ALL cell line. We detect significant allelic imbalance on the wild-type over the IGH-DUX4 haplotype in expression and epigenetic data, showing IGH-DUX4 translocation occurs on the silenced IGH allele. In vitro, this reduces the oncogenic stress of DUX4 high-level expression. Moreover, patient samples of IGH-DUX4 B-ALL have similar expression profile and IGH breakpoints as Nalm6, suggesting a common mechanism to allow optimal dosage of non-toxic DUX4 expression.
Corals comprise a biomineralizing cnidarian, dinoflagellate algal symbionts, and associated microbiome of prokaryotes and viruses. Ongoing efforts to conserve coral reefs by identifying the major stress response pathways and thereby laying the foundation to select resistant genotypes rely on a robust genomic foundation. Here we generated and analyzed a high quality long-read based ~886 Mbp nuclear genome assembly and transcriptome data from the dominant rice coral, Montipora capitata from Hawai’i. Our work provides insights into the architecture of coral genomes and shows how they differ in size and gene inventory, putatively due to population size variation. We describe a recent example of foreign gene acquisition via a bacterial gene transfer agent and illustrate the major pathways of stress response that can be used to predict regulatory components of the transcriptional networks in M. capitata. These genomic resources provide insights into the adaptive potential of these sessile, long-lived species in both natural and human influenced environments and facilitate functional and population genomic studies aimed at Hawaiian reef restoration and conservation.
The DNA base modification N6-methyladenine (m6A) is involved in many pathways related to the survival of bacteria and their interactions with hosts. Nanopore sequencing offers a new, portable method to detect base modifications. Here, we show that a neural network can improve m6A detection at trained sequence contexts compared to previously published methods using deviations between measured and expected current values as each adenine travels through a pore. The model, implemented as the mCaller software package, can be extended to detect known or confirm suspected methyltransferase target motifs based on predictions of methylation at untrained contexts. We use PacBio, Oxford Nanopore, methylated DNA immunoprecipitation sequencing (MeDIP-seq), and whole-genome bisulfite sequencing data to generate and orthogonally validate methylomes for eight microbial reference species. These well-characterized microbial references can serve as controls in the development and evaluation of future methods for the identification of base modifications from single-molecule sequencing data.
Development of a metabolic pathway transfer and genomic integration system for the syngas-fermenting bacterium Clostridium ljungdahlii.
Clostridium spp. can synthesize valuable chemicals and fuels by utilizing diverse waste-stream substrates, including starchy biomass, lignocellulose, and industrial waste gases. However, metabolic engineering in Clostridium spp. is challenging due to the low efficiency of gene transfer and genomic integration of entire biosynthetic pathways. We have developed a reliable gene transfer and genomic integration system for the syngas-fermenting bacterium Clostridium ljungdahlii based on the conjugal transfer of donor plasmids containing large transgene cassettes (>5 kb) followed by the inducible activation of Himar1 transposase to promote integration. We established a conjugation protocol for the efficient generation of transconjugants using the Gram-positive origins of replication repL and repH. We also investigated the impact of DNA methylation on conjugation efficiency by testing donor constructs with all possible combinations of Dam and Dcm methylation patterns, and used bisulfite conversion and PacBio sequencing to determine the DNA methylation profile of the C. ljungdahlii genome, resulting in the detection of four sequence motifs with N6-methyladenosine. As proof of concept, we demonstrated the transfer and genomic integration of a heterologous acetone biosynthesis pathway using a Himar1 transposase system regulated by a xylose-inducible promoter. The functionality of the integrated pathway was confirmed by detecting enzyme proteotypic peptides and the formation of acetone and isopropanol by C. ljungdahlii cultures utilizing syngas as a carbon and energy source. The developed multi-gene delivery system offers a versatile tool to integrate and stably express large biosynthetic pathways in the industrially promising syngas-fermenting microorganism C. ljungdahlii. The simple transfer and stable integration of large gene clusters (like entire biosynthetic pathways) is expanding the range of possible fermentation products of heterologously expressing recombinant strains. We also believe that the developed gene delivery system can be adapted to other clostridial strains as well.
Transcription activator-like effector nucleases (TALENs) have become a powerful tool for genome editing due to the simple code linking the amino acid sequences of their DNA-binding domains to TALEN nucleotide targets. While the initial TALEN-design guidelines are very useful, user-friendly tools defining optimal TALEN designs for robust genome editing need to be developed. Here we evaluated existing guidelines and developed new design guidelines for TALENs based on 205 TALENs tested, and established the scoring algorithm for predicting TALEN activity (SAPTA) as a new online design tool. For any input gene of interest, SAPTA gives a ranked list of potential TALEN target sites, facilitating the selection of optimal TALEN pairs based on predicted activity. SAPTA-based TALEN designs increased the average intracellular TALEN monomer activity by >3-fold, and resulted in an average endogenous gene-modification frequency of 39% for TALENs containing the repeat variable di-residue NK that favors specificity rather than activity. It is expected that SAPTA will become a useful and flexible tool for designing highly active TALENs for genome-editing applications. SAPTA can be accessed via the website at http://baolab.bme.gatech.edu/Research/BioinformaticTools/TAL_targeter.html.
It has recently become possible to rapidly and accurately detect epigenetic signatures in bacterial genomes using third generation sequencing data. Monitoring the speed at which a single polymerase inserts a base in the read strand enables one to infer whether a modification is present at that specific site on the template strand. These sites can be challenging to detect in the absence of high coverage and reliable reference genomes. Here we provide a new method for detecting epigenetic motifs in bacteria on datasets with low-coverage, with incomplete references, and with mixed samples (i.e. metagenomic data). Our approach treats motif inference as a kmer comparison problem. First, genomes (or contigs) are deconstructed into kmers. Then, native genome-wide distributions of interpulse durations (IPDs) for kmers are compared with corresponding whole genome amplified (WGA, modification free) IPD distributions using log likelihood ratios. Finally, kmers are ranked and greedily selected by iteratively correcting for sequences within a particular kmer’s neighborhood. Our method can detect multiple types of modifications, even at very low-coverage and in the presence of mixed genomes. Additionally, we are able to predict modified motifs when genomes with “neighbor” modified motifs exist within the sample. Lastly, we show that these motifs can provide an alternative source of information by which to cluster metagenomics contigs and that iterative refinement on these clustered contigs can further improve both sensitivity and specificity of motif detection. https://github.com/alibashir/EMMCKmer.
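The kmer comparison step described in this abstract can be illustrated with a minimal Python sketch. It is not the EMMCKmer implementation: the function names (`collect`, `rank_kmers`), the Gaussian model on log-IPD values, the assignment of each IPD to the centre base of its kmer, and the `min_obs` and `sd_floor` parameters are all assumptions made for illustration, and the greedy neighborhood-correction step is omitted.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def collect(reads, k):
    """Group log-IPD values by kmer.

    `reads` is a list of (sequence, ipds) pairs with one IPD per base.
    Each kmer is paired with the IPD at its centre base (an assumption).
    """
    table = defaultdict(list)
    half = k // 2
    for seq, ipds in reads:
        for i in range(len(seq) - k + 1):
            table[seq[i:i + k]].append(math.log(ipds[i + half]))
    return table


def gaussian_loglik(xs, mu, sd):
    """Log-likelihood of the values xs under a normal(mu, sd) model."""
    return sum(
        -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)
        for x in xs
    )


def rank_kmers(native_reads, wga_reads, k=3, min_obs=2, sd_floor=0.05):
    """Rank kmers by a log-likelihood ratio of native vs. WGA IPD models.

    For each kmer, the native log-IPDs are scored under a Gaussian fitted
    to the native data and under one fitted to the WGA (modification-free)
    data; the difference is the kmer's score. Hypothetical simplification.
    """
    native = collect(native_reads, k)
    wga = collect(wga_reads, k)
    scores = {}
    for kmer, xs in native.items():
        ys = wga.get(kmer, [])
        if len(xs) < min_obs or len(ys) < min_obs:
            continue  # too few observations to fit either model
        mu_n, sd_n = mean(xs), max(pstdev(xs), sd_floor)
        mu_w, sd_w = mean(ys), max(pstdev(ys), sd_floor)
        scores[kmer] = gaussian_loglik(xs, mu_n, sd_n) - gaussian_loglik(xs, mu_w, sd_w)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data: the A in GATC shows a slowed polymerase (long IPD) in the native sample.
    native = [("GATCGATC", [1, 5, 1, 1, 1, 5, 1, 1])]
    wga = [("GATCGATC", [1, 1, 1, 1, 1, 1, 1, 1])]
    for kmer, score in rank_kmers(native, wga):
        print(kmer, round(score, 2))
```

On the toy input, the kmer centred on the slowed adenine ranks first, while kmers whose IPDs match the WGA control score zero. A fuller treatment would then discount kmers that overlap an already selected kmer, as the abstract's greedy selection step describes.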
PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II’s sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.