We address the problem of observing personal diploid methylomes, CpG methylome pairs of homologous chromosomes that are distinguishable with respect to phased heterozygous variants (PHVs), which is challenging due to the scarcity of PHVs in personal genomes. Single-molecule real-time (SMRT) sequencing is promising as it outputs long reads with CpG methylation information, but a serious concern is whether reliable PHVs are available in erroneous SMRT reads with an error rate of ~15%. To overcome the issue, we propose a statistical model that reduces the error rate of phasing CpG sites to 1%, thereby calling CpG hypomethylation in each haplotype with…
DNA methylation is the most common form of DNA modification in prokaryotic and eukaryotic genomes. We have applied the method of single-molecule, real-time (SMRT) DNA sequencing that is capable of direct detection of modified bases at single-nucleotide resolution to characterize the specificity of several bacterial DNA methyltransferases (MTases). In addition to previously described SMRT sequencing of N6-methyladenine and 5-methylcytosine, we show that N4-methylcytosine also has a specific kinetic signature and is therefore identifiable using this approach. We demonstrate for all three prokaryotic methylation types that SMRT sequencing confirms the identity and position of the methylated base in cases where the…
DNA methylation has essential roles in transcriptional regulation, imprinting, X chromosome inactivation and other cellular processes, and aberrant CpG methylation is directly involved in the pathogenesis of human imprinting disorders and many cancers. To address the need for a quantitative and highly multiplexed bisulfite sequencing method with long read lengths for targeted CpG methylation analysis, we developed single-molecule real-time bisulfite sequencing (SMRT-BS). Optimized bisulfite conversion and PCR conditions enabled the amplification of DNA fragments up to ~1.5 kb, and subjecting overlapping 625-1491 bp amplicons to SMRT-BS indicated high reproducibility across all amplicon lengths (r = 0.972) and low standard deviations (≤0.10) between individual CpG sites…
CGGBP1 is a repetitive DNA-binding transcription regulator with target sites at CpG-rich sequences such as CGG repeats, Alu-SINEs and L1-LINEs. The role of CGGBP1 as a possible mediator of CpG methylation, however, remains unknown. At CpG-rich sequences, cytosine methylation is a major mechanism of transcriptional repression. Concordantly, gene-rich regions typically carry lower levels of CpG methylation than the repetitive elements. It is well known that at the interspersed repeats Alu-SINEs and L1-LINEs, high levels of CpG methylation constitute a transcriptional silencing and retrotransposon inactivating mechanism. Here, we have studied genome-wide CpG methylation with or without CGGBP1-depletion. By high throughput sequencing of…
Genome instability is a hallmark of many tumors and recently, next-generation sequencing methods have enabled analyses of tumor genomes at an unprecedented level. Studying rearrangement-prone chromosomal regions (putative “breakpoint hotspots”) in detail, however, necessitates molecular assays that can detect de novo DNA fusions arising from these hotspots. Here we demonstrate the utility of a long-distance inverse PCR-based method for the detection and screening of de novo DNA rearrangements in uterine leiomyomas, one of the most common types of human neoplasm. This assay allows in principle any genomic region suspected of instability to be queried for DNA rearrangements originating there. No…
Determining the methylation state of regions with high copy numbers is challenging for second-generation sequencing, because the read length is insufficient to map reads uniquely, especially when repetitive regions are long and nearly identical to each other. Single-molecule real-time (SMRT) sequencing is a promising method for observing such regions, because it is not vulnerable to GC bias, it produces long read lengths, and its kinetic information is sensitive to DNA modifications. We propose a novel linear-time algorithm that combines the kinetic information for neighboring CpG sites and increases the confidence in identifying the methylation states of those sites. Using a practical…
For the majority of congenital heart diseases (CHDs), the full complexity of the causative molecular network, which is driven by genetic, epigenetic, and environmental factors, is yet to be elucidated. Epigenetic alterations are suggested to play a pivotal role in modulating the phenotypic expression of CHDs and their clinical course during life. Candidate approaches implied that DNA methylation might have a developmental role in CHD and might contribute to the long-term progress of non-structural cardiac diseases. The aim of the present study is to define the postnatal epigenome of two common cardiac malformations, representing epigenetic memory, and adaptation to hemodynamic alterations,…
Testing for conserved and novel mechanisms underlying phenotypic evolution requires a diversity of genomes available for comparison spanning multiple independent lineages. For example, complex social behavior in insects has been investigated primarily with eusocial lineages, nearly all of which are Hymenoptera. If conserved genomic influences on sociality do exist, we need data from a wider range of taxa that also vary in their levels of sociality. Here, we present the assembled and annotated genome of the subsocial beetle Nicrophorus vespilloides, a species long used to investigate evolutionary questions of complex social behavior. We used this genome to address two questions.…
For the past two decades, bisulfite sequencing has been a widely used method for quantitative CpG methylation detection of genomic DNA. Coupled with PCR amplicon cloning, bisulfite Sanger sequencing allows for allele-specific CpG methylation assessment; however, its time-consuming protocol and inability to multiplex have recently been overcome by next-generation bisulfite sequencing techniques. Although high-throughput sequencing platforms have enabled greater accuracy in CpG methylation quantitation as a result of increased bisulfite sequencing depth, most common sequencing platforms generate reads that are similar in length to the typical bisulfite PCR size range (~300-500 bp). Using the Pacific Biosciences (PacBio) sequencing platform, we developed…
Butanol is currently one of the most discussed biofuels. Its use provides many benefits in comparison to bio-ethanol, but the price of its fermentative production is still high. Genetic improvements could help solve many problems associated with butanol production during ABE fermentation, such as its toxicity, low concentration achievable in the cultivation medium, the need for a relatively expensive substrate, and many more. Clostridium pasteurianum NRRL B-598 is a non-type strain producing butanol, acetone, and a negligible amount of ethanol. Its main benefits are high oxygen tolerance, utilization of a wide range of carbon and nitrogen sources, and the availability of…
A gene-level targeted enrichment method for direct detection of epigenetic modifications is described. The approach is demonstrated on the CGG-repeat region of the FMR1 gene, for which large repeat expansions, hitherto refractory to sequencing, are known to cause fragile X syndrome. In addition to achieving a single-locus enrichment of nearly 700,000-fold, the elimination of all amplification steps removes PCR-induced bias in the repeat count and preserves the native epigenetic modifications of the DNA. In conjunction with the single-molecule real-time sequencing approach, this enrichment method enables direct readout of the methylation status and the CGG repeat number of the FMR1 allele(s)…
Micro-algae of the genus Ostreococcus and related species of the order Mamiellales are globally distributed in the photic zone of the world’s oceans, where they contribute to fixation of atmospheric carbon and production of oxygen, besides providing a primary source of nutrition in the food web. Their tiny size, simple cells, ease of culture, compact genomes and susceptibility to the most abundant large DNA viruses in the sea render them attractive as models for integrative marine biology. In culture, spontaneous resistance to viruses occurs frequently. Here, we show that virus-producing resistant cell lines arise in many independent cell lines during lytic…