2015 SMRT Informatics Developers Conference Presentation Slides: Adam English, from the Human Genome Sequencing Center at Baylor College of Medicine, presents on the structural variation tools being developed at Baylor.
Structural variant calling combining Illumina and low-coverage PacBio
Detection of large genomic variation (structural variants) has proven challenging using short-read methods. Long-read approaches, which can span these large events, promise to dramatically expand the ability to accurately call structural variants. Although sequencing with Pacific Biosciences (PacBio) long-read technology has become increasingly high throughput, generating high coverage with the technology can still be limiting, and investigators often would like to know what PacBio coverages are adequate to call structural variants. Here, we present a method to identify a substantially higher fraction of structural variants in the human genome from low-coverage PacBio data, using multiple strategies for ensembling data types and algorithms. Algorithmically, we combine three structural variant callers: PBHoney by Adam English, Sniffles by Fritz Sedlazeck, and Parliament by Adam English (which we have modified to improve speed). Parliament itself uses a combination of PacBio and Illumina data with a number of short-read callers (Breakdancer, Pindel, Crest, CNVnator, Delly, and Lumpy). We show that the outputs of these three programs are largely complementary to each other, with each able to uniquely access different sets of structural variants at different coverages. Combining them can more than double the recall of true structural variants from a truth set relative to sequencing with Illumina alone, with substantial improvements even at low PacBio coverages (3x – 7x). This allows us to present, for the first time, cost-benefit tradeoffs to investigators about how much PacBio sequencing will yield what improvements in SV-calling. This work also builds upon the foundational work of Genome in a Bottle, led by Justin Zook, in establishing a truth set for structural variants in the recently released Ashkenazim-Jewish trio data.
This work demonstrates the power of this benchmark set – one of the first of its kind for structural variation data – to help understand and refine the accuracies of calling structural variants with a number of approaches.
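The recall comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `SV` record, the 50% reciprocal-overlap matching rule, and the toy coordinates are all assumptions made for illustration, and the sketch only handles events with positive length (such as deletions).

```python
from typing import Iterable, List, NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal structural-variant record (not a real file format)."""
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 if chrom or type differ."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def recall(calls: Iterable[SV], truth: List[SV], min_overlap: float = 0.5) -> float:
    """Fraction of truth-set events matched by at least one call.

    The 0.5 threshold is an assumed matching rule, not one stated in the abstract.
    """
    calls = list(calls)
    if not truth:
        return 0.0
    found = sum(
        1 for t in truth
        if any(reciprocal_overlap(c, t) >= min_overlap for c in calls)
    )
    return found / len(truth)


# Toy truth set and call sets (made-up coordinates, for illustration only).
truth = [
    SV("chr1", 1000, 2000, "DEL"),
    SV("chr1", 5000, 5600, "DEL"),
    SV("chr2", 300, 900, "DEL"),
    SV("chr3", 100, 400, "DEL"),
]
short_read_calls = [SV("chr1", 1010, 1990, "DEL")]
long_read_calls = [SV("chr1", 4950, 5650, "DEL"), SV("chr2", 350, 950, "DEL")]

# Ensembling here is a plain union of the call sets.
combined_calls = short_read_calls + long_read_calls

print(recall(short_read_calls, truth))  # 0.25
print(recall(long_read_calls, truth))   # 0.5
print(recall(combined_calls, truth))    # 0.75
```

In this toy example the two call sets find different truth events, so their union recovers more of the truth set than either alone, which is the sense in which complementary callers raise recall.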
Brett Hannigan, Computational Biology Project Leader at DNAnexus, demonstrates a fast, accurate, and cost-efficient solution for diploid-aware de novo genome assembly utilizing FALCON on the DNAnexus platform.
Andrew Carroll, Director of Science at DNAnexus, presents how to greatly improve the accuracy of SV-calling by using long-read PacBio sequencing and fast and easy-to-run cloud-optimized apps like PBHoney, Parliament,…
Jonas Korlach spoke about recent SMRT Sequencing updates, such as the latest Sequel System chemistry release (1.2.1) and updates to the Integrative Genomics Viewer that’s now optimized for PacBio data….
High throughput random mutagenesis and Single Molecule Real Time Sequencing of the muscle nicotinic acetylcholine receptor.
High throughput random mutagenesis is a powerful tool to identify which residues are important for the function of a protein, and gain insight into its structure-function relation. The human muscle nicotinic acetylcholine receptor was used to test whether this technique, previously used for monomeric receptors, can be applied to a pentameric ligand-gated ion channel. A mutant library for the α1 subunit of the channel was generated by error-prone PCR, and full length sequences of all 2816 mutants were retrieved using single molecule real time sequencing. Each α1 mutant was co-transfected with wildtype β1, δ, and ε subunits, and the channel function characterized by an ion flux assay. To test whether the strategy could map the structure-function relation of this receptor, we attempted to identify mutations that conferred resistance to competitive antagonists. Mutant hits were defined as receptors that responded to the nicotinic agonist epibatidine, but were not inhibited by either α-bungarotoxin or tubocurarine. Eight α1 subunit mutant hits were identified, six of which contained mutations at position Y233 or V275 in the transmembrane domain. Three single point mutations (Y233N, Y233H, and V275M) were studied further, and found to enhance the potencies of five channel agonists tested. This suggests that the mutations made the channel resistant to the antagonists, not by impairing antagonist binding, but rather by producing a gain-of-function phenotype, e.g. increased agonist sensitivity. Our data show that random high throughput mutagenesis is applicable to multimeric proteins to discover novel functional mutants, and outline the benefits of using single molecule real time sequencing with regard to quality control of the mutant library as well as downstream mutant data interpretation.
CGG repeat-induced FMR1 silencing depends on the expansion size in human iPSCs and neurons carrying unmethylated full mutations.
In fragile X syndrome (FXS), CGG repeat expansion greater than 200 triplets is believed to trigger FMR1 gene silencing and disease etiology. However, FXS siblings have been identified with more than 200 CGGs, termed unmethylated full mutation (UFM) carriers, without gene silencing and disease symptoms. Here, we show that hypomethylation of the FMR1 promoter is maintained in induced pluripotent stem cells (iPSCs) derived from two UFM individuals. However, a subset of iPSC clones with large CGG expansions carries silenced FMR1. Furthermore, we demonstrate de novo silencing upon expansion of the CGG repeat size. FMR1 does not undergo silencing during neuronal differentiation of UFM iPSCs, and expression of large unmethylated CGG repeats has phenotypic consequences resulting in neurodegenerative features. Our data suggest that UFM individuals do not lack the cell-intrinsic ability to silence FMR1 and that inter-individual variability in the CGG repeat size required for silencing exists in the FXS population. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
How well can we create phased, diploid, human genomes?: An assessment of FALCON-Unzip phasing using a human trio
Long-read sequencing technology has allowed researchers to create de novo assemblies with impressive continuity[1,2]. This advancement has dramatically increased the number of reference genomes available and hints at the possibility of a future where personal genomes are assembled rather than resequenced. In 2016, Pacific Biosciences released the FALCON-Unzip framework, which can provide long, phased haplotype contigs from de novo assemblies. This phased genome algorithm enhances the accuracy of assemblies of highly heterozygous organisms and allows researchers to explore questions that require haplotype information such as allele-specific expression and regulation. However, validation of this technique has been limited to small genomes or inbred individuals. As a roadmap to personal genome assembly and phasing, we assess the phasing accuracy of FALCON-Unzip in humans using publicly available data for the Ashkenazi trio from the Genome in a Bottle Consortium. To assess the accuracy of the Unzip algorithm, we assembled the genome of the son using FALCON and FALCON-Unzip, genotyped publicly available short-read data for the mother and the father, and observed the inheritance pattern of the parental SNPs along the phased genome of the son. We found that 72.8% of haplotype contigs share SNPs with only one parent, suggesting that these contigs are correctly phased. Most mis-phased SNPs are random but occur at high frequency toward the ends of haplotype contigs. Approximately 20.7% of mis-phased haplotype contigs contain clusters of mis-phased SNPs, suggesting that haplotypes were mis-joined by FALCON-Unzip. Mis-joined boundaries in those contigs are located in areas of low SNP density. This research demonstrates that the FALCON-Unzip algorithm can be used to create long and accurate haplotypes for humans and identifies problematic regions that could benefit from future improvement.
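The trio-based check described in this abstract can be sketched roughly as follows. This is an illustrative assumption about how such a classification might be coded, not the authors' implementation: each haplotype contig is reduced to an ordered list of parent-of-origin labels for its informative SNPs (`"M"` for mother, `"F"` for father), and the function names and toy contigs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def classify_contig(snp_origins: List[str]) -> str:
    """Label a haplotype contig by the parents its informative SNPs match."""
    parents = set(snp_origins)
    if not parents:
        return "uninformative"   # no parent-specific SNPs on this contig
    if len(parents) == 1:
        return "single_parent"   # consistent with correct phasing
    return "mixed"               # at least one mis-phased SNP


def count_switches(snp_origins: List[str]) -> int:
    """Number of parent-of-origin changes between adjacent SNPs along a contig."""
    return sum(1 for a, b in zip(snp_origins, snp_origins[1:]) if a != b)


def summarize(contigs: Dict[str, List[str]]) -> Tuple[Counter, float]:
    """Return per-class counts and the single-parent fraction of informative contigs."""
    counts = Counter(classify_contig(origins) for origins in contigs.values())
    informative = counts["single_parent"] + counts["mixed"]
    fraction = counts["single_parent"] / informative if informative else 0.0
    return counts, fraction


# Toy data (made up): SNP origins listed in order along each haplotype contig.
contigs = {
    "hap_000": list("MMMMMMMM"),  # all maternal
    "hap_001": list("FFFFFF"),    # all paternal
    "hap_002": list("MMMMFFFF"),  # one switch, long runs: possible mis-join
    "hap_003": list("MMFMMMM"),   # isolated mismatch: two switches
}

counts, fraction = summarize(contigs)
print(counts["single_parent"], counts["mixed"], fraction)  # 2 2 0.5
print(count_switches(contigs["hap_002"]))                  # 1
print(count_switches(contigs["hap_003"]))                  # 2
```

A single switch separating two long runs is the kind of pattern the abstract attributes to mis-joined haplotypes, while an isolated mismatch inside an otherwise consistent run resembles the randomly mis-phased SNPs. Telling these apart on real data would need thresholds that the abstract does not specify.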