July 7, 2019

Structural variation detection using next-generation sequencing data: A comparative technical review.

Structural variations (SVs) are mutations in the genome of at least fifty nucleotides in size. They contribute to phenotypic differences among healthy individuals and cause severe diseases, including cancers, by breaking or linking genes. Thus, it is crucial to systematically profile SVs in the genome. In the past decade, many next-generation sequencing (NGS)-based SV detection methods have been proposed, owing to the significant cost reduction of NGS experiments and their ability to detect SVs without bias at base-pair resolution. These SV detection methods vary in both sensitivity and specificity, since they use different SV-property-dependent and library-property-dependent features. As a result, predictions from different SV callers are often inconsistent. In addition, noise in the data (both platform-specific sequencing errors and artificial chimeric reads) reduces the specificity of SV detection. Poorly characterized regions of the human genome (e.g., repeat regions) greatly affect read mapping and, in turn, SV calling accuracy. Calling complex SVs requires specialized SV callers. Apart from accuracy, the processing speed of an SV caller is another factor that determines its usability. Knowing the pros and cons of different SV calling techniques and the objectives of the biological study is essential for biologists and bioinformaticians to make informed decisions. This paper describes the different components of the SV calling pipeline and reviews the techniques used by existing SV callers. Through a simulation study, we also demonstrate that library properties, especially insert size, greatly impact the sensitivity of different SV callers. We hope the community can benefit from this work both in designing new SV calling methods and in selecting the appropriate SV caller for specific biological studies. Copyright © 2016 Elsevier Inc. All rights reserved.
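
One way to see why insert size matters is to look at the read-pair signal that many SV callers rely on. The sketch below is a minimal illustration, not the method of any particular caller or of the review itself. It assumes insert sizes have already been extracted from properly mapped read pairs, and the function names, the median/MAD estimator, and the cutoff `k` are all hypothetical choices made for this example.

```python
from statistics import median


def insert_size_bounds(insert_sizes, k=3.0):
    """Estimate a concordant insert-size window as median +/- k * scaled MAD.

    Assumption: the median absolute deviation (MAD), scaled by 1.4826 to
    approximate a standard deviation, is used because it is robust to the
    outlier pairs that the SVs themselves produce.
    """
    sizes = list(insert_sizes)
    med = median(sizes)
    mad = median([abs(x - med) for x in sizes])
    spread = 1.4826 * mad
    return med - k * spread, med + k * spread


def classify_pair(insert_size, low, high):
    """Label one read pair by comparing its mapped insert size to the window."""
    if insert_size > high:
        # Mates map farther apart than the library allows: sequence is
        # likely missing from the sample relative to the reference.
        return "deletion_candidate"
    if insert_size < low:
        # Mates map closer together than expected: the sample may carry
        # extra sequence between them.
        return "insertion_candidate"
    return "concordant"


if __name__ == "__main__":
    library = [300, 310, 290, 305, 295, 300, 302, 298]
    low, high = insert_size_bounds(library)
    for size in (300, 900, 100):
        print(size, classify_pair(size, low, high))
```

Under these assumptions, a deletion only produces a discordant pair when it pushes the mapped distance past the upper bound, so a library with a wide insert-size distribution hides small deletions that a tight library would expose.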


July 7, 2019

Unbiased identification of signal-activated transcription factors by barcoded synthetic tandem repeat promoter screening (BC-STAR-PROM).

The discovery of transcription factors (TFs) controlling pathways in health and disease is of paramount interest. We designed a widely applicable method, dubbed barcoded synthetic tandem repeat promoter screening (BC-STAR-PROM), to identify signal-activated TFs without any a priori knowledge about their properties. The BC-STAR-PROM library consists of ~3000 luciferase expression vectors, each harboring a promoter (composed of six tandem repeats of synthetic random DNA) and an associated barcode of 20 base pairs (bp) within the 3′ untranslated mRNA region. Together, the promoter sequences encompass >400,000 bp of random DNA, a sequence complexity sufficient to capture most TFs. Cells transfected with the library are exposed to a signal, and the mRNAs that it encodes are counted by next-generation sequencing of the barcodes. This allows the simultaneous activity tracking of each of the ~3000 synthetic promoters in a single experiment. Here we establish proof of concept for BC-STAR-PROM by applying it to the identification of TFs induced by drugs affecting actin and tubulin cytoskeleton dynamics. BC-STAR-PROM revealed that serum response factor (SRF) is the only immediate early TF induced by both actin polymerization and microtubule depolymerization. Such changes in cytoskeleton dynamics are known to occur during the cell division cycle, and real-time bioluminescence microscopy indeed revealed cell-autonomous SRF-myocardin-related TF (MRTF) activity bouts in proliferating cells. © 2016 Gosselin et al.; Published by Cold Spring Harbor Laboratory Press.
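
The barcode-counting step can be pictured roughly as follows. This is a hedged sketch and not the authors' pipeline: it assumes each sequencing read has already been trimmed to its 20 bp barcode, that only exact matches to known barcodes are counted, and that promoter induction is summarized as a depth-normalized log2 ratio with a pseudocount. The function names and the pseudocount are hypothetical.

```python
import math
from collections import Counter


def count_barcodes(reads, known_barcodes):
    """Count reads that exactly match a barcode from the library."""
    counts = Counter()
    for read in reads:
        if read in known_barcodes:
            counts[read] += 1
    return counts


def log2_fold_change(treated, control, pseudocount=1.0):
    """Per-barcode log2 ratio of depth-normalized counts (treated / control).

    Assumption: a pseudocount keeps barcodes with zero reads in one
    condition from producing a division by zero or log of zero.
    """
    treated_total = sum(treated.values()) or 1
    control_total = sum(control.values()) or 1
    changes = {}
    for barcode in set(treated) | set(control):
        t = (treated[barcode] + pseudocount) / treated_total
        c = (control[barcode] + pseudocount) / control_total
        changes[barcode] = math.log2(t / c)
    return changes


if __name__ == "__main__":
    library = {"ACGTACGTACGTACGTACGT", "TTGCATTGCATTGCATTGCA"}
    control_reads = ["ACGTACGTACGTACGTACGT"] * 10 + ["TTGCATTGCATTGCATTGCA"] * 10
    treated_reads = ["ACGTACGTACGTACGTACGT"] * 70 + ["TTGCATTGCATTGCATTGCA"] * 10
    control = count_barcodes(control_reads, library)
    treated = count_barcodes(treated_reads, library)
    for barcode, change in sorted(log2_fold_change(treated, control).items()):
        print(barcode, round(change, 2))
```

Because every promoter carries its own barcode, a single sequencing run per condition yields one count per promoter, and promoters whose share of reads rises after the signal become candidates for follow-up.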


July 7, 2019

New high copy tandem repeat in the content of the chicken W chromosome.

The repetitive DNA content of avian genomes is considerably lower than in other investigated vertebrates. The first descriptions of tandem repeats were based on the results of routine biochemical and molecular biological experiments. Both satellite DNA and interspersed repetitive elements were annotated using library-based approaches and de novo repeat identification in the assembled genome. The development of deep-sequencing methods provides high-quality datasets without preassembly, allowing one to annotate repetitive elements from the unassembled part of genomes. In this work, we search the chicken assembly and annotate high copy number tandem repeats from unassembled short raw reads. The tandem repeat (GGAAA)n has been identified and found to be the second most abundant in the chicken genome, after the telomeric repeat (TTAGGG)n. Furthermore, the (GGAAA)n repeat forms expanded arrays on both arms of the chicken W chromosome. Our results highlight the complexity of repetitive sequences and update data on the organization of the W sex chromosome in chicken.
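
A rough idea of how repeat abundance can be estimated directly from raw reads, without an assembly, is sketched below. This is an illustrative assumption rather than the authors' procedure: a read counts as a hit when it contains at least `min_copies` perfect consecutive copies of the unit on either strand, and abundance is reported as the fraction of reads that are hits. The function names and the threshold are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_tandem_array(read, unit, min_copies=4):
    """True if the read holds min_copies perfect tandem copies of unit.

    Both strands are checked, because a short read may come from either
    one. An array sampled in a shifted phase still matches as long as it
    spans roughly one extra copy of the unit.
    """
    target = unit * min_copies
    return target in read or target in reverse_complement(read)


def fraction_of_reads_with_repeat(reads, unit, min_copies=4):
    """Share of reads containing the tandem array (0.0 for empty input)."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(has_tandem_array(r, unit, min_copies) for r in reads)
    return hits / len(reads)


if __name__ == "__main__":
    reads = [
        "ACGT" + "GGAAA" * 5 + "TTG",   # forward-strand array
        "TTTCC" * 5,                     # same repeat seen on the other strand
        "ACGTACGTACGTACGTACGTACGT",      # unrelated sequence
    ]
    print(fraction_of_reads_with_repeat(reads, "GGAAA"))
    print(fraction_of_reads_with_repeat(reads, "TTAGGG"))
```

Comparing such fractions across candidate units is one simple way to rank repeats by abundance. Real data would also need tolerance for sequencing errors and imperfect copies, which this sketch omits.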


July 7, 2019

STRetch: detecting and discovering pathogenic short tandem repeat expansions.

Short tandem repeat (STR) expansions have been identified as the causal DNA mutation in dozens of Mendelian diseases. Most existing tools for detecting STR variation with short reads do so within the read length and so are unable to detect the majority of pathogenic expansions. Here we present STRetch, a new genome-wide method to scan for STR expansions at all loci across the human genome. We demonstrate the use of STRetch for detecting STR expansions using short-read whole-genome sequencing data at known pathogenic loci as well as novel STR loci. STRetch is open source software, available from github.com/Oshlack/STRetch.


July 7, 2019

Mitochondrial genomes of two diplectanids (Platyhelminthes: Monogenea) expose paraphyly of the order Dactylogyridea and extensive tRNA gene rearrangements.

Recent mitochondrial phylogenomics studies have reported a sister-group relationship of the orders Capsalidea and Dactylogyridea, which is inconsistent with previous morphology- and molecular-based phylogenies. As Dactylogyridea mitochondrial genomes (mitogenomes) are currently represented by only one family, to improve the phylogenetic resolution, we sequenced and characterized two dactylogyridean parasites, Lamellodiscus spari and Lepidotrema longipenis, belonging to a non-represented family Diplectanidae. The L. longipenis mitogenome (15,433 bp) contains the standard 36 flatworm mitochondrial genes (atp8 is absent), whereas we failed to detect trnS1, trnC and trnG in L. spari (14,614 bp). Both mitogenomes exhibit unique gene orders (among the Monogenea), with a number of tRNA rearrangements. Both long non-coding regions contain a number of different (partially overlapping) repeat sequences. Intriguingly, these include putative tRNA pseudogenes in a tandem array (17 trnV pseudogenes in L. longipenis, 13 trnY pseudogenes in L. spari). Combined nucleotide diversity, non-synonymous/synonymous substitutions ratio and average sequence identity analyses consistently showed that nad2, nad5 and nad4 were the most variable PCGs, whereas cox1, cox2 and cytb were the most conserved. Phylogenomic analysis showed that the newly sequenced species of the family Diplectanidae formed a sister-group with the Dactylogyridae + Capsalidae clade. Thus Dactylogyridea (represented by the Diplectanidae and Dactylogyridae) was rendered paraphyletic (with high statistical support) by the nested Capsalidea (represented by the Capsalidae) clade. Our results show that nad2, nad5 and nad4 (fast-evolving) would be better candidates than cox1 (slow-evolving) for species identification and population genetics studies in the Diplectanidae. The unique gene order pattern further suggests discontinuous evolution of mitogenomic gene order arrangement in the Class Monogenea.
This first report of paraphyly of the Dactylogyridea highlights the need to generate more molecular data for monogenean parasites, in order to be able to clarify their relationships using large datasets, as single-gene markers appear to provide a phylogenetic resolution which is too low for the task.
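
For readers unfamiliar with the nucleotide diversity measure mentioned in the abstract, the sketch below shows one common definition: the mean proportion of differing sites over all pairs of aligned sequences. It is a generic illustration under stated assumptions, not the software or settings used in the study. It assumes the sequences are already aligned to equal length and that any column with a gap (`-`) in either sequence of a pair is skipped for that pair; the function name is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Mean pairwise proportion of differing sites across aligned sequences."""
    seqs = list(aligned_seqs)
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    total = 0.0
    pairs_used = 0
    for a, b in combinations(seqs, 2):
        compared = 0
        differences = 0
        for x, y in zip(a, b):
            if x == "-" or y == "-":
                continue  # assumption: gapped columns are ignored per pair
            compared += 1
            if x != y:
                differences += 1
        if compared:
            total += differences / compared
            pairs_used += 1
    return total / pairs_used if pairs_used else 0.0


if __name__ == "__main__":
    # Toy alignments standing in for a variable and a conserved gene.
    variable_gene = ["ACGTACGT", "ACTTACAT", "GCGTACGA"]
    conserved_gene = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
    print(nucleotide_diversity(variable_gene))
    print(nucleotide_diversity(conserved_gene))
```

Computed per gene, a higher value marks a faster-evolving marker, which is the sense in which the abstract contrasts nad2, nad5 and nad4 with cox1.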

