July 19, 2019

Mapping the landscape of tandem repeat variability by targeted long read single molecule sequencing in familial X-linked intellectual disability.

The etiology of more than half of all patients with X-linked intellectual disability remains elusive, despite array-based comparative genomic hybridization, whole exome or genome sequencing. Since short read massive parallel sequencing approaches do not allow the detection of larger tandem repeat expansions, we hypothesized that such expansions could be a hidden cause of X-linked intellectual disability. We selectively captured over 1800 tandem repeats on the X chromosome and characterized them by long read single molecule sequencing in 3 families with idiopathic X-linked intellectual disability. In male DNA samples, full tandem repeat length sequences were obtained for 88-93% of the targets and up to 99.6% of the repeats with a moderate guanine-cytosine content. The read length and analysis pipeline allow detection of tandem repeat expansions of >900 bp. In one family, one repeat expansion co-occurs with down-regulation of the neighboring MIR222 gene. This gene has previously been implicated in intellectual disability and is apparently linked to FMR1 and NEFH overexpression associated with neurological disorders. This study demonstrates the power of single molecule sequencing to measure tandem repeat lengths and detect expansions, and suggests that tandem repeat mutations may be a hidden cause of X-linked intellectual disability.


July 19, 2019

Whole-genome sequencing reveals principles of brain retrotransposition in neurodevelopmental disorders.

Neural progenitor cells undergo somatic retrotransposition events, mainly involving L1 elements, which can be potentially deleterious. Here, we analyze the whole genomes of 20 brain samples and 80 non-brain samples, and characterize the retrotransposition landscape of patients affected by a variety of neurodevelopmental disorders including Rett syndrome, tuberous sclerosis, ataxia-telangiectasia and autism. We report that the number of retrotranspositions in brain tissues is higher than that observed in non-brain samples and even higher in pathologic vs normal brains. The majority of somatic brain retrotransposons integrate into pre-existing repetitive elements, preferentially A/T rich L1 sequences, resulting in nested insertions. Our findings document the fingerprints of encoded endonuclease independent mechanisms in the majority of L1 brain insertion events. The insertions are “non-classical” in that they are truncated at both ends, integrate in the same orientation as the host element, and their target sequences are enriched with a CCATT motif in contrast to the classical endonuclease motif of most other retrotranspositions. We show that L1Hs elements integrate preferentially into genes associated with neural functions and diseases. We propose that pre-existing retrotransposons act as “lightning rods” for novel insertions, which may give fine modulation of gene expression while safeguarding from deleterious events. Overwhelmingly uncontrolled retrotransposition may breach this safeguard mechanism and increase the risk of harmful mutagenesis in neurodevelopmental disorders.


July 19, 2019

The Dominant and Poorly Penetrant Phenotypes of Maize Unstable factor for orange1 Are Caused by DNA Methylation Changes at a Linked Transposon.

The maize (Zea mays) mutant Unstable factor for orange1 (Ufo1) has been implicated in the epigenetic modifications of pericarp color1 (p1), which regulates the production of the flavonoid pigments phlobaphenes. Here, we show that the ufo1 gene maps to a genetically recalcitrant region near the centromere of chromosome 10. Transcriptome analysis of Ufo1-1 mutant and wild-type plants identified a candidate gene in the mapping region using a comparative sequence-based approach. The candidate gene, GRMZM2G053177, is overexpressed by >45-fold in multiple tissues of Ufo1-1, explaining the dominance of Ufo1-1 and its phenotypes. In the mutant stock, GRMZM2G053177 has a unique transcript originating within a CACTA transposon inserted in its first intron, and it is missing the first four codons of the wild-type transcript. GRMZM2G053177 expression is regulated by the DNA methylation status of the CACTA transposon, explaining the incomplete penetrance and poor expressivity of Ufo1-1. Transgenic overexpression lines of GRMZM2G053177 (Ufo1-1) phenocopy the p1-induced pigmentation in coleoptiles, tassels, leaf sheaths, husks, pericarps, and cob glumes. Transcriptome analysis of Ufo1 versus wild-type tissues revealed changes in several pathways related to abiotic and biotic stress. Thus, this study addresses the enigma of Ufo1 identity in maize, which had gone unsolved for more than 50 years. © 2018 American Society of Plant Biologists. All rights reserved.


July 7, 2019

Exploring possible DNA structures in real-time polymerase kinetics using Pacific Biosciences sequencer data.

Background: Pausing of DNA polymerase can indicate the presence of a DNA structure that differs from the canonical double-helix. Here we detail a method to investigate how polymerase pausing in the Pacific Biosciences sequencer reads can be related to DNA sequences. The Pacific Biosciences sequencer uses optics to view a polymerase and its interaction with a single DNA molecule in real-time, offering a unique way to detect potential alternative DNA structures. Results: We have developed a new way to examine polymerase kinetics data and relate it to the DNA sequence by using a wavelet transform of read information from the sequencer. We use this method to examine how polymerase kinetics are related to nucleotide base composition. We then examine tandem repeat sequences known for their ability to form different DNA structures: (CGG)n and (CG)n repeats which can, respectively, form G-quadruplex DNA and Z-DNA. We find pausing around the (CGG)n repeat that may indicate the presence of G-quadruplexes in some of the sequencer reads. The (CG)n repeat does not appear to cause polymerase pausing, but its kinetics signature nevertheless suggests the possibility that alternative nucleotide conformations may sometimes be present. Conclusion: We discuss the implications of using our method to discover DNA sequences capable of forming alternative structures. The analyses presented here can be reproduced on any Pacific Biosciences kinetics data for any DNA pattern of interest using an R package that we have made publicly available.


July 7, 2019

A unique chromatin complex occupies young α-satellite arrays of human centromeres.

The intractability of homogeneous α-satellite arrays has impeded understanding of human centromeres. Artificial centromeres are produced from higher-order repeats (HORs) present at centromere edges, although the exact sequences and chromatin conformations of centromere cores remain unknown. We use high-resolution chromatin immunoprecipitation (ChIP) of centromere components followed by clustering of sequence data as an unbiased approach to identify functional centromere sequences. We find that specific dimeric α-satellite units shared by multiple individuals dominate functional human centromeres. We identify two recently homogenized α-satellite dimers that are occupied by precisely positioned CENP-A (cenH3) nucleosomes with two ~100-base pair (bp) DNA wraps in tandem separated by a CENP-B/CENP-C-containing linker, whereas pericentromeric HORs show diffuse positioning. Precise positioning is largely maintained, whereas abundance decreases exponentially with divergence, which suggests that young α-satellite dimers with paired ~100-bp particles mediate evolution of functional human centromeres. Our unbiased strategy for identifying functional centromeric sequences should be generally applicable to tandem repeat arrays that dominate the centromeres of most eukaryotes.


July 7, 2019

Do echinoderm genomes measure up?

Echinoderm genome sequences are a corpus of useful information about a clade of animals that serve as research models in fields ranging from marine ecology to cell and developmental biology. Genomic information from echinoids has contributed to insights into the gene interactions that drive the developmental process at the molecular level. Such insights often rely heavily on genomic information and the kinds of questions that can be asked thus depend on the quality of the sequence information. Here we describe the history of echinoderm genomic sequence assembly and present details about the quality of the data obtained. All of the sequence information discussed here is posted on the echinoderm information web system, Echinobase.org. Copyright © 2015 Elsevier B.V. All rights reserved.


July 7, 2019

Best practices in insect genome sequencing: What works and what doesn’t.

The last decade of decreasing DNA sequencing costs and proliferating sequencing services in core labs and companies has brought the de novo genome sequencing and assembly of insect species within reach for many entomologists. However, sequence production alone is not enough to generate a high quality reference genome, and in many cases, poor planning can lead to extremely fragmented genome assemblies preventing high quality gene annotation and other desired analyses. Insect genomes can be problematic to assemble, due to combinations of high polymorphism, inability to breed for genome homozygosity, and small physical sizes limiting the quantity of DNA that can be isolated from a single individual. Recent advances in sequencing technology and assembly strategies are enabling a revolution for insect genome reference sequencing and assembly. Here we review historical and new genome sequencing and assembly strategies, with a particular focus on their application to arthropod genomes. We highlight both the need to design sequencing strategies for the requirements of the assembly software, and new long-read technologies that are enabling a return to traditional assembly approaches. Finally, we compare and contrast very cost effective short read draft genome strategies with the long read approaches that, although entailing additional cost, bring a higher likelihood of success and the possibility of archival assembly qualities approaching that of finished genomes.


July 7, 2019

Genomes of ‘Candidatus Liberibacter solanacearum’ Haplotype A from New Zealand and the United States Suggest Significant Genome Plasticity in the Species.

‘Candidatus Liberibacter solanacearum’ contains two solanaceous crop-infecting haplotypes, A and B. Two haplotype A draft genomes were assembled and compared with ZC1 (haplotype B), revealing inversion and relocation genomic rearrangements, numerous single-nucleotide polymorphisms, and differences in phage-related regions. Differences in prophage location and sequence were seen both within and between haplotype comparisons. OrthoMCL and BLAST analyses identified 46 putative coding sequences present in haplotype A that were not present in haplotype B. Thirty-eight of these loci were not found in sequences from other Liberibacter spp. Quantitative polymerase chain reaction (qPCR) assays designed to amplify sequences from 15 of these loci were screened against a panel of ‘Ca. L. solanacearum’-positive samples to investigate genetic diversity. Seven of the assays demonstrated within-haplotype diversity; five failed to amplify loci in at least one haplotype A sample while three assays produced amplicons from some haplotype B samples. Eight of the loci assays showed consistent A-B differentiation. Differences in genome arrangements, prophage, and qPCR results suggesting locus diversity within the haplotypes provide more evidence for genetic complexity in this emerging bacterial species.


July 7, 2019

Human gene-centered transcription factor networks for enhancers and disease variants.

Gene regulatory networks (GRNs) comprising interactions between transcription factors (TFs) and regulatory loci control development and physiology. Numerous disease-associated mutations have been identified, the vast majority residing in non-coding regions of the genome. As current GRN mapping methods test one TF at a time and require the use of cells harboring the mutation(s) of interest, they are not suitable to identify TFs that bind to wild-type and mutant loci. Here, we use gene-centered yeast one-hybrid (eY1H) assays to interrogate binding of 1,086 human TFs to 246 enhancers, as well as to 109 non-coding disease mutations. We detect both loss and gain of TF interactions with mutant loci that are concordant with target gene expression changes. This work establishes eY1H assays as a powerful addition to the toolkit of mapping human GRNs and for the high-throughput characterization of genomic variants that are rapidly being identified by genome-wide association studies. Copyright © 2015 Elsevier Inc. All rights reserved.


July 7, 2019

It’s more than stamp collecting: how genome sequencing can unify biological research.

The availability of reference genome sequences, especially the human reference, has revolutionized the study of biology. However, while the genomes of some species have been fully sequenced, a wide range of biological problems still cannot be effectively studied for lack of genome sequence information. Here, I identify neglected areas of biology and describe how both targeted species sequencing and more broad taxonomic surveys of the tree of life can address important biological questions. I enumerate the significant benefits that would accrue from sequencing a broader range of taxa, as well as discuss the technical advances in sequencing and assembly methods that would allow for wide-ranging application of whole-genome analysis. Finally, I suggest that in addition to ‘big science’ survey initiatives to sequence the tree of life, a modified infrastructure-funding paradigm would better support reference genome sequence generation for research communities most in need. Copyright © 2015 Elsevier Ltd. All rights reserved.


July 7, 2019

Covalent modification of bacteriophage T4 DNA inhibits CRISPR-Cas9.

The genomic DNAs of tailed bacteriophages are commonly modified by the attachment of chemical groups. Some forms of DNA modification are known to protect phage DNA from cleavage by restriction enzymes, but others are of unknown function. Recently, the CRISPR-Cas nuclease complexes were shown to mediate bacterial adaptive immunity by RNA-guided target recognition, raising the question of whether phage DNA modifications may also block attack by CRISPR-Cas9. We investigated phage T4 as a model system, where cytosine is replaced with glucosyl-hydroxymethylcytosine (glc-HMC). We first quantified the extent and distribution of covalent modifications in T4 DNA by single-molecule DNA sequencing and enzymatic probing. We then designed CRISPR spacer sequences targeting T4 and found that wild-type T4 containing glc-HMC was insensitive to attack by CRISPR-Cas9 but mutants with unmodified cytosine were sensitive. Phage with HMC showed only intermediate sensitivity. While this work was in progress, another group reported examples of heavily engineered CRISPR-Cas9 complexes that could, in fact, overcome the effects of T4 DNA modification, indicating that modifications can inhibit but do not always fully block attack. Bacteria were recently found to have a form of adaptive immunity, the CRISPR-Cas systems, which use nucleic acid pairing to recognize and cleave genomic DNA of invaders such as bacteriophage. Historic work with tailed phages has shown that phage DNA is often modified by covalent attachment of large chemical groups. Here we demonstrate that DNA modification in phage T4 inhibits attack by the CRISPR-Cas9 system. This finding provides insight into mechanisms of host-virus competition and also a new set of tools that may be useful in modulating the activity of CRISPR-Cas9 in genome engineering applications. Copyright © 2015 Bryson et al.


July 7, 2019

BreakSeek: a breakpoint-based algorithm for full spectral range INDEL detection.

Although recently developed algorithms have integrated multiple signals to improve sensitivity for insertion and deletion (INDEL) detection, they are far from being perfect and still have great limitations in detecting a full size range of INDELs. Here we present BreakSeek, a novel breakpoint-based algorithm, which can unbiasedly and efficiently detect both homozygous and heterozygous INDELs, ranging from several base pairs to over thousands of base pairs, with accurate breakpoint and heterozygosity rate estimations. Comprehensive evaluations on both simulated and real datasets revealed that BreakSeek outperformed other existing methods on both sensitivity and specificity in detecting both small and large INDELs, and uncovered a significant number of novel INDELs that were missed before. In addition, by incorporating sophisticated statistical models, we for the first time investigated and demonstrated the importance of handling false and conflicting signals for multi-signal integrated methods. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.


July 7, 2019

Library construction for high-throughput mobile element identification and genotyping.

Mobile genetic elements are discrete DNA elements that can move around and copy themselves in a genome. As a ubiquitous component of the genome, mobile elements contribute to both genetic and epigenetic variation. Therefore, it is important to determine the genome-wide distribution of mobile elements. Here we present a targeted high-throughput sequencing protocol called Mobile Element Scanning (ME-Scan) for genome-wide mobile element detection. We describe oligonucleotide design, sequencing library construction, and computational analysis for the ME-Scan protocol.

