Stanford University Archives - Page 2 of 4

September 22, 2019

Finding function in mystery transcripts.

Little is known about the function of most long non-coding RNAs. But a suite of new tools might change that.

September 21, 2019

Long-read genome sequencing identifies causal structural variation in a Mendelian disease.

Purpose: Current clinical genomics assays primarily utilize short-read sequencing (SRS), but SRS has limited ability to evaluate repetitive regions and structural variants. Long-read sequencing (LRS) has complementary strengths, and we aimed to determine whether LRS could offer a means to identify overlooked genetic variation in patients undiagnosed by SRS.

Methods: We performed low-coverage genome LRS to identify structural variants in a patient who presented with multiple neoplasia and cardiac myxomata, in whom the results of targeted clinical testing and genome SRS were negative.

Results: This LRS approach yielded 6,971 deletions and 6,821 insertions > 50 bp. Filtering for variants that are absent in an unrelated control and overlap a disease gene coding exon identified three deletions and three insertions. One of these, a heterozygous 2,184 bp deletion, overlaps the first coding exon of PRKAR1A, which is implicated in autosomal dominant Carney complex. RNA sequencing demonstrated decreased PRKAR1A expression. The deletion was classified as pathogenic based on guidelines for interpretation of sequence variants.

Conclusion: This first successful application of genome LRS to identify a pathogenic variant in a patient suggests that LRS has significant potential for the identification of disease-causing structural variation. Larger studies will ultimately be required to evaluate the potential clinical utility of LRS.
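The filtering step in the Results (keep variants larger than 50 bp that are absent from an unrelated control and overlap a disease-gene coding exon) can be sketched as below. This is a minimal illustration, not the authors' pipeline: the `SV` record, the helper names, the 100 bp breakpoint tolerance used to match a control call, and the convention of giving an insertion a 1 bp footprint at its insertion point are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical structural-variant record (0-based, half-open coordinates).
    Insertions are assumed to span 1 bp at their insertion point."""
    chrom: str
    start: int
    end: int
    kind: str    # "DEL" or "INS"
    length: int  # number of deleted or inserted bases


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def in_control(sv, control, slop=100):
    """Treat a control call of the same type whose breakpoints lie within
    `slop` bp of the patient call as the same variant (assumed matching rule)."""
    return any(
        c.kind == sv.kind
        and c.chrom == sv.chrom
        and abs(c.start - sv.start) <= slop
        and abs(c.end - sv.end) <= slop
        for c in control
    )


def candidate_svs(patient, control, coding_exons, min_len=50):
    """Keep patient SVs longer than `min_len` bp that are absent from the
    control and overlap at least one (chrom, start, end) coding exon."""
    return [
        sv for sv in patient
        if sv.length > min_len
        and not in_control(sv, control)
        and any(chrom == sv.chrom and overlaps(sv.start, sv.end, s, e)
                for chrom, s, e in coding_exons)
    ]
```

For example, with made-up coordinates, `candidate_svs([SV("chr17", 1000, 3184, "DEL", 2184)], [], [("chr17", 2000, 2100)])` keeps the deletion, because it is longer than 50 bp, has no control match and overlaps the exon interval.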

July 19, 2019

Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing.

Single-molecule real-time (SMRT) DNA sequencing allows the systematic detection of chemical modifications such as methylation but has not previously been applied on a genome-wide scale. We used this approach to detect 49,311 putative 6-methyladenine (m6A) residues and 1,407 putative 5-methylcytosine (m5C) residues in the genome of a pathogenic Escherichia coli strain. We obtained strand-specific information for methylation sites and a quantitative assessment of the frequency of methylation at each modified position. We deduced the sequence motifs recognized by the methyltransferase enzymes present in this strain without prior knowledge of their specificity. Furthermore, we found that deletion of a phage-encoded methyltransferase-endonuclease (restriction-modification; RM) system induced global transcriptional changes and led to gene amplification, suggesting that the role of RM systems extends beyond protecting host genomes from foreign DNA.
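The abstract does not say how the methyltransferase recognition motifs were deduced. As a toy illustration of why motifs can emerge without prior knowledge of enzyme specificity, the sketch below tallies the sequence windows centred on putative modified bases, so a motif shared by many sites rises to the top of the tally. The function names, the `(position, strand)` input format and the fixed window size are assumptions. Real motif finders also handle degenerate positions and background base composition.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def context_counts(genome, sites, flank=3):
    """Count windows of 2*flank+1 bases centred on putative modified bases.
    `sites` holds (0-based position, strand) pairs with strand "+" or "-".
    Minus-strand windows are reverse-complemented so every window reads
    5'->3' on the strand that carries the modification."""
    counts = Counter()
    for pos, strand in sites:
        lo, hi = pos - flank, pos + flank + 1
        if lo < 0 or hi > len(genome):
            continue  # skip sites too close to the sequence ends
        window = genome[lo:hi]
        if strand == "-":
            window = revcomp(window)
        counts[window] += 1
    return counts
```

For the toy sequence `TTGATCTTGATCTT` with the adenine of each `GATC` site called on both strands, `context_counts(genome, sites, flank=1).most_common(1)` is `[("GAT", 4)]`, pointing straight at the palindromic GATC site.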

July 19, 2019

Modeling kinetic rate variation in third generation DNA sequencing data to detect putative modifications to DNA bases.

Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types.
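The conditional random field itself is beyond a short example, but the underlying idea (letting kinetic evidence at neighbouring positions strengthen the test at a position of interest) can be illustrated with a much simpler stand-in. The sketch below computes a per-site z statistic on log interpulse durations (IPDs, the per-base kinetic measurement SMRT sequencing reports) from native and control reads, then combines the statistics in a window with a weighted Stouffer sum. This is not the authors' model: the helper names and weights are hypothetical, and unlike a CRF it treats neighbouring sites as independent, an assumption polymerase kinetics likely violate.

```python
import math
import statistics


def site_z(native_ipds, control_ipds):
    """Welch-style z statistic for the difference in mean log IPD at one site.
    Each list needs at least two IPD observations."""
    ln = [math.log(x) for x in native_ipds]
    lc = [math.log(x) for x in control_ipds]
    se = math.sqrt(statistics.variance(ln) / len(ln)
                   + statistics.variance(lc) / len(lc))
    return (statistics.fmean(ln) - statistics.fmean(lc)) / se


def pooled_z(z_scores, center, weights):
    """Weighted Stouffer combination of per-site z-scores around `center`.
    `weights` maps an offset from the test position to a weight; offsets
    that fall outside the sequence are skipped."""
    num = den = 0.0
    for offset, w in weights.items():
        i = center + offset
        if 0 <= i < len(z_scores):
            num += w * z_scores[i]
            den += w * w
    if den == 0.0:
        raise ValueError("no position of the window falls inside the sequence")
    return num / math.sqrt(den)
```

With weights such as `{-1: 0.5, 0: 1.0, 1: 0.5}`, a moderate signal at the test site is reinforced when its neighbours shift in the same direction and diluted when they do not.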

July 19, 2019

Detecting DNA modifications from SMRT sequencing data by modeling sequence context dependence of polymerase kinetic.

DNA modifications such as methylation and DNA damage can play critical regulatory roles in biological systems. Single molecule, real time (SMRT) sequencing technology generates DNA sequences as well as DNA polymerase kinetic information that can be used for the direct detection of DNA modifications. We demonstrate that local sequence context has a strong impact on DNA polymerase kinetics in the neighborhood of the incorporation site during the DNA synthesis reaction. This makes it possible to estimate the expected kinetic rate of the enzyme at the incorporation site from kinetic rate information collected in existing SMRT sequencing data (historical data) covering the same local sequence contexts. We develop an empirical Bayesian hierarchical model for incorporating historical data. Our results show that the model can greatly increase DNA modification detection accuracy and reduce the control data coverage required. For some DNA modifications that have a strong signal, a control sample is not needed at all when historical data are used in its place. Thus, sequencing costs can be greatly reduced by using the model. We implemented the model in an R package named seqPatch, which is available at https://github.com/zhixingfeng/seqPatch.
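seqPatch is an R package and its exact model is not reproduced here. The Python sketch below only illustrates the general empirical Bayes idea under simple assumptions: log IPDs at a given sequence context are normally distributed, a prior for each context is estimated from historical runs, and any control reads update that prior by a conjugate normal-normal step. The class and function names and the known within-run variance are hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass
class ContextPrior:
    mean: float  # expected log IPD for one sequence context, from historical runs
    var: float   # spread of that expectation across historical runs (tau^2)


def prior_from_history(historical_means):
    """Method-of-moments prior from per-run mean log IPDs for one context
    (a simplification that ignores the sampling noise in each run's mean)."""
    return ContextPrior(statistics.fmean(historical_means),
                        statistics.variance(historical_means))


def posterior_expected(prior, control_log_ipds, within_var):
    """Normal-normal update of the expected log IPD for one context.
    `within_var` is the (assumed known) read-to-read variance of log IPD.
    With no control reads the result falls back to the historical prior."""
    precision = 1.0 / prior.var + len(control_log_ipds) / within_var
    mean = (prior.mean / prior.var + sum(control_log_ipds) / within_var) / precision
    return mean, 1.0 / precision
```

The fallback case shows why a control can sometimes be dropped: with an empty control list the expected rate is simply the historical estimate, while each added control read narrows the posterior variance.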

July 19, 2019

Polymorphic microsatellite markers for a wind-dispersed tropical tree species, Triplaris cumingiana (Polygonaceae).

Novel microsatellite markers were characterized in the wind-dispersed and dioecious neotropical tree Triplaris cumingiana (Polygonaceae) for use in understanding the ecological processes and genetic impacts of pollen- and seed-mediated gene flow in tropical forests.

Sixty-two microsatellite primer pairs were screened, from which 12 markers showing five or more alleles per locus (range 5-17) were tested on 47 individuals. Observed and expected heterozygosities averaged 0.692 and 0.731, respectively. Polymorphism information content was between 0.417 and 0.874. Linkage disequilibrium was observed in one of the 66 pairwise comparisons between loci. Two loci showed deviation from Hardy-Weinberg equilibrium. An additional 14 markers exhibiting lower polymorphism were characterized on a smaller number of individuals.

These microsatellite markers have high levels of polymorphism and reproducibility and will be useful in studying gene flow and population structure in T. cumingiana.
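The summary statistics reported for these markers follow standard population-genetic definitions. The abstract does not name the software used, so the sketch below is only a minimal illustration of those formulas for a single locus (the reported values are averages or ranges across loci). Function names and the allele labels in the docstring are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def allele_frequencies(genotypes):
    """Allele frequencies at one locus from diploid genotypes such as (152, 158)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return [n / total for n in counts.values()]


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    return expected_heterozygosity(freqs) - sum(
        2 * p * p * q * q for p, q in combinations(freqs, 2))


# 12 loci give C(12, 2) = 66 pairwise linkage-disequilibrium tests.
n_pairs = comb(12, 2)
```

For a locus with two equally common alleles, He is 0.5 and PIC is 0.375; PIC is always at most He because it subtracts the cases where a heterozygous parent and its mate are uninformative.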

July 19, 2019

Quantifying genome-editing outcomes at endogenous loci with SMRT sequencing.

Targeted genome editing with engineered nucleases has transformed the ability to introduce precise sequence modifications at almost any site within the genome. A major obstacle to probing the efficiency and consequences of genome editing is that no existing method enables the frequency of different editing events to be simultaneously measured across a cell population at any endogenous genomic locus. We have developed a novel method for quantifying individual genome editing outcomes at any site of interest using single molecule real time (SMRT) DNA sequencing. We show that this approach can be applied at various loci, using multiple engineered nuclease platforms including TALENs, RNA-guided endonucleases (CRISPR/Cas9), and ZFNs, and in different cell lines to identify conditions and strategies in which the desired engineering outcome has occurred. This approach facilitates the evaluation of new gene editing technologies and permits sensitive quantification of editing outcomes in almost every experimental system used.
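The abstract does not describe how reads are assigned to outcomes. A minimal sketch, assuming each SMRT read spans the whole amplicon and has already been trimmed to the same start and end as the reference, could tally outcomes by exact match, presence of a donor-specific sequence (homology-directed repair, HDR) and length change. Real analyses align each read to the reference; the outcome categories, helper names and the `donor_tag` parameter here are hypothetical.

```python
from collections import Counter


def classify_read(read, reference, donor_tag=None):
    """Assign one full-length amplicon read to a coarse editing outcome."""
    if read == reference:
        return "unmodified"
    if donor_tag is not None and donor_tag in read:
        return "HDR"
    if len(read) > len(reference):
        return "insertion"
    if len(read) < len(reference):
        return "deletion"
    return "substitution/other"


def outcome_frequencies(reads, reference, donor_tag=None):
    """Fraction of reads in each outcome class across the cell population."""
    if donor_tag is not None and donor_tag in reference:
        raise ValueError("donor_tag must be absent from the unedited reference")
    counts = Counter(classify_read(r, reference, donor_tag) for r in reads)
    return {outcome: n / len(reads) for outcome, n in counts.items()}
```

Because each long read covers the whole edited region, every read contributes exactly one outcome, so the frequencies describe the population directly without stitching short fragments together.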

July 19, 2019

Global methylation state at base-pair resolution of the Caulobacter genome throughout the cell cycle.

The Caulobacter DNA methyltransferase CcrM is one of five master cell-cycle regulators. CcrM is transiently present near the end of DNA replication when it rapidly methylates the adenine in hemimethylated GANTC sequences. The timing of transcription of two master regulator genes and two cell division genes is controlled by the methylation state of GANTC sites in their promoters. To explore the global extent of this regulatory mechanism, we determined the methylation state of the entire chromosome at every base pair at five time points in the cell cycle using single-molecule, real-time sequencing. The methylation state of 4,515 GANTC sites, preferentially positioned in intergenic regions, changed progressively from full to hemimethylation as the replication forks advanced. However, 27 GANTC sites remained unmethylated throughout the cell cycle, suggesting that these protected sites could participate in epigenetic regulatory functions. An analysis of the time of activation of every cell-cycle regulatory transcription start site, coupled to both the position of a GANTC site in their promoter regions and the time in the cell cycle when the GANTC site transitions from full to hemimethylation, allowed the identification of 59 genes as candidates for epigenetic regulation. In addition, we identified two previously unidentified N6-methyladenine motifs and showed that they maintained a constant methylation state throughout the cell cycle. The cognate methyltransferase was identified for one of these motifs as well as for one of two 5-methylcytosine motifs.
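Locating GANTC sites is the starting point for a per-site analysis like the one described above. The sketch below is an illustration rather than the authors' code: it finds sites in a DNA string and reports, for each site, the two adenine positions (one per strand) within the motif. The function names are hypothetical.

```python
import re

# N in GANTC may be any base. GANTC cannot overlap itself, so non-overlapping
# matching finds every site.
GANTC = re.compile(r"GA[ACGT]TC")


def gantc_sites(seq):
    """0-based start positions of GANTC sites on the given strand. GANTC is its
    own reverse complement, so each hit is also a site on the opposite strand."""
    return [m.start() for m in GANTC.finditer(seq.upper())]


def target_adenines(seq):
    """For each site, the index of the top-strand A (G[A]NTC) and the index of
    the top-strand T whose complement is the bottom-strand A (GAN[T]C)."""
    return [(p + 1, p + 3) for p in gantc_sites(seq)]
```

Because the motif is palindromic, one scan of the top strand is enough; the bottom-strand adenine is read off at the complementary `T`.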

July 19, 2019

Resolving complex tandem repeats with long reads.

Resolving tandemly repeated genomic sequences is a necessary step in improving our understanding of the human genome. Short tandem repeats (TRs), or microsatellites, are often used as molecular markers in genetics, and clinically, variation in microsatellites can lead to genetic disorders such as Huntington’s disease. Accurately resolving repeats, and TRs in particular, remains a challenging task in genome alignment, assembly and variation calling. Although tools have been developed for detecting microsatellites in short-read sequencing data, these are limited in the size and types of events they can resolve. Single-molecule sequencing technologies may potentially resolve a broader spectrum of TRs given their greater read length, but their significantly higher raw error rates make accurately identifying TRs and estimating their copy number a unique challenge that requires new approaches. Here we present PacmonSTR, a reference-based probabilistic approach to identify the TR region and estimate the number of TR elements in long DNA reads. Our multistep approach requires as input a reference region and the reference TR element. First, the TR region is identified in the long DNA reads via a three-stage modified Smith-Waterman approach; then the expected number of TR elements is calculated using a pair hidden Markov model (pair-HMM)-based method. Finally, TR-based genotype selection (clustering into homozygous or heterozygous) is performed with Gaussian mixture models, using the Akaike information criterion and coverage expectations. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
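The final genotype-selection step can be illustrated in a few dozen lines. The sketch below assumes each read has already been reduced to a (possibly fractional) TR copy-number estimate and skips the Smith-Waterman and pair-HMM stages entirely. It fits one Gaussian (homozygous) and a two-component Gaussian mixture by expectation-maximization (heterozygous), then keeps whichever model has the lower AIC. It does not model coverage expectations, and the variance floor, iteration count and function names are assumptions rather than PacmonSTR's settings.

```python
import math
import statistics

VAR_FLOOR = 0.25  # assumed minimum variance (copies^2) so no component collapses onto one read


def normal_logpdf(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def fit_one(xs):
    """Single Gaussian (homozygous model): returns its log-likelihood and mean."""
    mu = statistics.fmean(xs)
    var = max(statistics.pvariance(xs, mu), VAR_FLOOR)
    return sum(normal_logpdf(x, mu, var) for x in xs), mu


def fit_two(xs, iters=200):
    """Two-component 1D Gaussian mixture fitted by EM (heterozygous model)."""
    n = len(xs)
    mu = [min(xs), max(xs)]
    var = [max(statistics.pvariance(xs), VAR_FLOOR)] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability that each read comes from each allele.
        resp = []
        for x in xs:
            logs = [math.log(w[k]) + normal_logpdf(x, mu[k], var[k]) for k in (0, 1)]
            top = max(logs)
            e = [math.exp(v - top) for v in logs]
            resp.append([v / sum(e) for v in e])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in (0, 1):
            nk = max(sum(r[k] for r in resp), 1e-12)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                         VAR_FLOOR)
    loglik = 0.0
    for x in xs:
        logs = [math.log(w[k]) + normal_logpdf(x, mu[k], var[k]) for k in (0, 1)]
        top = max(logs)
        loglik += top + math.log(sum(math.exp(v - top) for v in logs))
    return loglik, mu


def call_genotype(copy_numbers):
    """Pick the model with the lower AIC = 2k - 2 ln L."""
    ll1, mu1 = fit_one(copy_numbers)
    ll2, mu2 = fit_two(copy_numbers)
    aic_hom = 2 * 2 - 2 * ll1  # parameters: mean, variance
    aic_het = 2 * 5 - 2 * ll2  # parameters: two means, two variances, one weight
    if aic_het < aic_hom:
        return "heterozygous", tuple(sorted(round(m) for m in mu2))
    return "homozygous", (round(mu1), round(mu1))
```

Per-read estimates clustered around 10 and 15 copies are called heterozygous `(10, 15)`, while a single tight cluster around 10 is called homozygous `(10, 10)` because the three extra mixture parameters are not justified by the small likelihood gain.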

July 7, 2019

Late Pleistocene Australian marsupial DNA clarifies the affinities of extinct megafaunal kangaroos and wallabies.

Understanding the evolution of Australia’s extinct marsupial megafauna has been hindered by a relatively incomplete fossil record and by convergent or highly specialized morphology, which confound phylogenetic analyses. Further, the harsh Australian climate and the early date of most megafaunal extinctions (39-52 ka) mean that the vast majority of fossil remains are unsuitable for ancient DNA analyses. Here, we apply cross-species DNA capture to fossils from relatively high-latitude, high-altitude caves in Tasmania. Using low-stringency hybridization and high-throughput sequencing, we were able to retrieve mitochondrial sequences from two extinct megafaunal macropodid species. The two specimens, Simosthenurus occidentalis (giant short-faced kangaroo) and Protemnodon anak (giant wallaby), have been radiocarbon dated to 46-50 and 40-45 ka, respectively. This is significantly older than any Australian fossil that has previously yielded DNA sequence information. Processing the raw sequence data from these samples posed a bioinformatic challenge due to the poor preservation of DNA. We explored several approaches to maximize the signal-to-noise ratio in retained sequencing reads. Our findings demonstrate the critical importance of adopting stringent processing criteria when distant outgroups are used as references for mapping highly fragmented DNA. Based on the most stringent nucleotide data sets (879 bp for S. occidentalis and 2,383 bp for P. anak), total-evidence phylogenetic analyses confirm that macropodids consist of three primary lineages: the sthenurines, such as Simosthenurus (extinct short-faced kangaroos), the macropodines (all other wallabies and kangaroos), and the enigmatic living banded hare-wallaby Lagostrophus fasciatus (Lagostrophinae). Protemnodon emerges as a close relative of Macropus (large living kangaroos), a position not supported by recent morphological phylogenetic analyses. © The Authors 2014. Published by Oxford University Press on behalf of Molecular Biology and Evolution. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
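The abstract stresses stringent processing when highly fragmented DNA is mapped to distant outgroup references but does not list the criteria used. A minimal sketch of the kind of per-read filter involved, with hypothetical field names and placeholder thresholds (not the values used in the study), might look like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedRead:
    """Hypothetical summary of one read mapped to a distant reference."""
    name: str
    length: int      # aligned bases
    mismatches: int  # mismatches against the reference within the alignment
    mapq: int        # mapping quality


def stringent_filter(reads, min_len=30, max_mismatch_frac=0.05, min_mapq=30):
    """Keep reads that are long enough, map confidently, and differ little
    from the reference. All thresholds are placeholders to be tuned."""
    return [
        r for r in reads
        if r.length >= min_len
        and r.mapq >= min_mapq
        and r.mismatches / r.length <= max_mismatch_frac
    ]
```

Tightening `max_mismatch_frac` or `min_len` discards more reads but removes more spurious mappings, the kind of signal-to-noise trade-off the abstract describes.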

July 7, 2019

Do read errors matter for genome assembly?

While most current high-throughput DNA sequencing technologies generate short reads with low error rates, emerging sequencing technologies generate long reads with high error rates. A basic question of interest is the tradeoff between read length and error rate in terms of the information needed for the perfect assembly of the genome. Using an adversarial erasure error model, we make progress on this problem by establishing a critical read length, as a function of the genome and the error rate, above which perfect assembly is guaranteed. For several real genomes, including those from the GAGE dataset, we verify that this critical read length is not significantly greater than the read length required for perfect assembly from reads without errors.
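Even with error-free reads, the read length needed for perfect assembly is governed by the genome's repeat structure, which is the baseline the critical read length above is compared against. As a small illustration, the sketch below computes one simple summary of that structure, the length of the longest exact repeat on a single strand, by binary search over the repeat length. It is not the paper's analysis: finer repeat statistics that appear in assembly bounds (such as interleaved repeats) are not computed, reverse-complement repeats are ignored, the function names are hypothetical, and the set-based search is only practical for small sequences.

```python
def has_repeat(genome, k):
    """True if some substring of length k occurs at least twice (overlaps allowed)."""
    seen = set()
    for i in range(len(genome) - k + 1):
        s = genome[i:i + k]
        if s in seen:
            return True
        seen.add(s)
    return False


def longest_repeat(genome):
    """Length of the longest exact repeat. Binary search works because a
    repeat of length k implies a repeat of every shorter length."""
    lo, hi = 0, len(genome) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_repeat(genome, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

For example, `longest_repeat("ACGTTACGTA")` is 4 (the two copies of `ACGT`), and a sequence with no repeated base, such as `ACGT`, gives 0.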

July 7, 2019

Complete sequences of six IncA/C plasmids of multidrug-resistant Salmonella enterica subsp. enterica serotype Newport.

Multidrug-resistant (MDR) Salmonella enterica subsp. enterica serotype Newport has been a long-standing public health concern in the United States. We present the complete sequences of six IncA/C plasmids from animal-derived MDR S. Newport ranging from 80.1 to 158.5 kb. They shared a genetic backbone with S. Newport IncA/C plasmids pSN254 and pAM04528. Copyright © 2015 Cao et al.

July 7, 2019

svviz: a read viewer for validating structural variants.

Visualizing read alignments is the most effective way to validate candidate structural variants (SVs) with existing data. We present svviz, a sequencing read visualizer for SVs that sorts and displays only reads relevant to a candidate SV. svviz works by searching the input BAM file(s) for potentially relevant reads, realigning them against the inferred sequence of the putative variant allele as well as the reference allele, and identifying reads that match one allele better than the other. Separate views of the two alleles are then displayed in a scrollable web browser view, enabling a more intuitive visualization of each allele than the single reference genome-based view common to most current read browsers. The browser view facilitates examining the evidence for or against a putative variant, estimating zygosity, visualizing affected genomic annotations and manually refining breakpoints. svviz supports data from most modern sequencing platforms. svviz is implemented in Python and freely available from http://svviz.github.io/. Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US.
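The core of the sorting step (realigning a read against both the reference allele and the inferred variant allele, then keeping reads that fit one allele better) can be sketched with a unit-cost "fitting" alignment, in which the whole read must align but may start and end anywhere within an allele sequence. This is an illustration under stated assumptions, not svviz's implementation: the scoring scheme, the `margin` threshold and the function names are hypothetical.

```python
def fit_distance(read, allele):
    """Minimum edit distance between `read` and any substring of `allele`."""
    prev = [0] * (len(allele) + 1)  # free start anywhere in the allele
    for i, r in enumerate(read, 1):
        cur = [i]  # read bases placed before the allele starts cost one each
        for j, a in enumerate(allele, 1):
            cur.append(min(prev[j] + 1,             # read base unmatched in allele
                           cur[j - 1] + 1,          # allele base skipped
                           prev[j - 1] + (r != a)))  # match or mismatch
        prev = cur
    return min(prev)  # free end anywhere in the allele


def assign_allele(read, ref_allele, alt_allele, margin=2):
    """Assign a read to the allele it fits better by at least `margin` edits."""
    d_ref = fit_distance(read, ref_allele)
    d_alt = fit_distance(read, alt_allele)
    if d_alt + margin <= d_ref:
        return "alt"
    if d_ref + margin <= d_alt:
        return "ref"
    return "ambiguous"
```

With a reference allele `AAAACCCCGGGGTTTT` and a deletion allele `AAAATTTT`, a read spanning the deleted bases is assigned to `ref`, a read spanning the junction is assigned to `alt`, and a read lying entirely in the shared flank (`AAAA`) stays `ambiguous`, which is why only informative reads end up in each allele view.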

July 7, 2019

Lesions from patients with sporadic cerebral cavernous malformations harbor somatic mutations in the CCM genes: evidence for a common biochemical pathway for CCM pathogenesis.

Cerebral cavernous malformations (CCMs) are vascular lesions affecting the central nervous system. CCM occurs either sporadically or in an inherited, autosomal dominant manner. Constitutional (germline) mutations in any of three genes, KRIT1, CCM2 and PDCD10, can cause the inherited form. Analysis of CCM lesions from inherited cases revealed biallelic somatic mutations, indicating that CCM follows a Knudsonian two-hit mutation mechanism. It is still unknown, however, if the sporadic cases of CCM also follow this genetic mechanism. We extracted DNA from 11 surgically excised lesions from sporadic CCM patients, and sequenced the three CCM genes in each specimen using a next-generation sequencing approach. Four sporadic CCM lesion samples (36%) were found to contain novel somatic mutations. Three of the lesions contained a single somatic mutation, and one lesion contained two biallelic somatic mutations. Herein, we also describe evidence of somatic mosaicism in a patient presenting with over 130 CCM lesions localized to one hemisphere of the brain. Finally, in a lesion regrowth sample, we found that the regrown CCM lesion contained the same somatic mutation as the original lesion. Together, these data bolster the idea that all forms of CCM share the same genetic basis: the two-hit mutation mechanism acting on the known CCM genes. Recent studies have found aberrant Rho kinase activation in inherited CCM pathogenesis, and we present evidence that this pathway is activated in sporadic CCM patients. These results suggest that all CCM patients, including those with the more common sporadic form, are potentially amenable to the same therapy. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

July 7, 2019

The functions of DNA methylation by CcrM in Caulobacter crescentus: a global approach.

DNA methylation is involved in a diversity of processes in bacteria, including maintenance of genome integrity and regulation of gene expression. Here, using Caulobacter crescentus as a model, we exploit genome-wide experimental methods to uncover the functions of CcrM, a DNA methyltransferase conserved in most Alphaproteobacteria. Using single molecule sequencing, we provide evidence that most CcrM target motifs (GANTC) switch from a fully methylated to a hemi-methylated state when they are replicated, and back to a fully methylated state at the onset of cell division. We show that DNA methylation by CcrM is not required for the control of the initiation of chromosome replication or for DNA mismatch repair. By contrast, our transcriptome analysis shows that >10% of the genes are misexpressed in cells lacking or constitutively over-expressing CcrM. Strikingly, GANTC methylation is needed for the efficient transcription of dozens of genes that are essential for cell cycle progression, in particular for DNA metabolism and cell division. Many of them are controlled by promoters methylated by CcrM and co-regulated by other global cell cycle regulators, demonstrating an extensive cross talk between DNA methylation and the complex regulatory network that controls the cell cycle of C. crescentus and, presumably, of many other Alphaproteobacteria.
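The switch between fully methylated and hemi-methylated GANTC sites described above reduces, per site, to combining two strand-specific methylation calls. A minimal sketch of that bookkeeping, assuming the per-strand calls have already been made (for example from SMRT kinetic data) and using hypothetical function names:

```python
from collections import Counter


def site_state(top_methylated, bottom_methylated):
    """Combine the two strand calls at one GANTC site into a single state."""
    if top_methylated and bottom_methylated:
        return "full"
    if top_methylated or bottom_methylated:
        return "hemi"
    return "unmethylated"


def state_counts(calls):
    """Tally site states from (top, bottom) boolean pairs, one per GANTC site."""
    return Counter(site_state(top, bottom) for top, bottom in calls)
```

Right after a replication fork passes a site, only the parental strand carries the methyl group, so the site moves from `"full"` to `"hemi"`. Tracking these counts across time points would show the replication-driven switch and the return to full methylation at cell division.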
