September 22, 2019

Structural variants exhibit allelic heterogeneity and shape variation in complex traits

Despite extensive effort to reveal the genetic basis of complex phenotypic variation, studies typically explain only a fraction of trait heritability. It has been hypothesized that individually rare hidden structural variants (SVs) could account for a significant fraction of variation in complex traits. To investigate this hypothesis, we assembled 14 Drosophila melanogaster genomes and systematically identified more than 20,000 euchromatic SVs, of which ~40% are invisible to high specificity short read genotyping approaches. SVs are common in Drosophila genes, with almost one third of diploid individuals harboring an SV in genes larger than 5kb, and nearly a quarter harboring multiple SVs in genes larger than 10kb. We show that SV alleles are rarer than amino acid polymorphisms, implying that they are more strongly deleterious. A number of functionally important genes harbor previously hidden structural variants that likely affect complex phenotypes (e.g., Cyp6g1, Drsl5, Cyp28d1&2, InR, and Gss1&2). Furthermore, SVs are overrepresented in quantitative trait locus candidate genes from eight Drosophila Synthetic Population Resource (DSPR) mapping experiments. We conclude that SVs are pervasive in genomes, are frequently present as heterogeneous allelic series, and can act as rare alleles of large effect.


September 22, 2019

Parliament2: Fast structural variant calling using optimized combinations of callers

Here we present Parliament2: a structural variant caller which combines multiple best-in-class structural variant callers to create a highly accurate callset. This captures more events than the individual callers achieve independently. Parliament2 uses a call-overlap-genotype approach that is highly extensible to new methods and lets users choose to run some or all of Breakdancer, Breakseq, CNVnator, Delly, Lumpy, and Manta. Parliament2 applies an additional parallelization framework to speed up certain callers and executes them in parallel, taking advantage of their different resource requirements to complete structural variant calling much faster than running the programs individually. Parliament2 is available as a Docker container, which pre-installs all required dependencies. This allows users to run any caller with easy installation and execution. This Docker container can easily be deployed in cloud or local environments and is available as an app on DNAnexus.
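The "overlap" step of a call-overlap-genotype approach can be illustrated with a minimal sketch. This is not Parliament2's code: the `SVCall` record, the reciprocal-overlap criterion, and the thresholds `min_ro` and `min_callers` are all assumptions chosen for illustration, and the coordinates below are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    """Hypothetical record for one call emitted by one caller."""
    caller: str
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if incompatible."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    inter = min(a.end, b.end) - max(a.start, b.start)
    if inter <= 0:
        return 0.0
    return min(inter / (a.end - a.start), inter / (b.end - b.start))


def supported_calls(calls, min_ro=0.5, min_callers=2):
    """Keep calls that at least `min_callers` distinct callers agree on.

    A call counts its own caller as one supporter, because it overlaps itself.
    """
    kept = []
    for call in calls:
        supporters = {
            other.caller
            for other in calls
            if reciprocal_overlap(call, other) >= min_ro
        }
        if len(supporters) >= min_callers:
            kept.append((call, sorted(supporters)))
    return kept


# Illustrative input only: two callers agree on chr1, one call is unsupported.
calls = [
    SVCall("manta", "chr1", 100, 200, "DEL"),
    SVCall("lumpy", "chr1", 110, 210, "DEL"),
    SVCall("delly", "chr2", 100, 200, "DEL"),
]
kept = supported_calls(calls)
```

In this sketch the pairwise comparison is quadratic in the number of calls, which is acceptable for a demonstration; a production merger would typically index calls by chromosome and position first.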


September 22, 2019

Deletions linked to PROG1 gene participate in plant architecture domestication in Asian and African rice.

Improving the yield by modifying plant architecture was a key step during crop domestication. Here, we show that a 110-kb deletion on the short arm of chromosome 7 in Asian cultivated rice (Oryza sativa), which is closely linked to the previously identified PROSTRATE GROWTH 1 (PROG1) gene, harbors a tandem repeat of seven zinc-finger genes. Three of these genes regulate the plant architecture, suggesting that the deletion also promoted the critical transition from the prostrate growth and low yield of wild rice (O. rufipogon) to the erect growth and high yield of Asian cultivated rice. We refer to this locus as RICE PLANT ARCHITECTURE DOMESTICATION (RPAD). Further, a similar but independent 113-kb deletion is detected at the RPAD locus in African cultivated rice. These results indicate that the deletions, eliminating a tandem repeat of zinc-finger genes, may have been involved in the parallel domestication of plant architecture in Asian and African rice.


September 22, 2019

Genomic structural variations within five continental populations of Drosophila melanogaster.

Chromosomal structural variations (SV) including insertions, deletions, inversions, and translocations occur within the genome and can have a significant effect on organismal phenotype. Some of these effects are caused by structural variations containing genes. Large structural variations represent a significant amount of the genetic diversity within a population. We used a global sampling of Drosophila melanogaster (Ithaca, Zimbabwe, Beijing, Tasmania, and Netherlands) to represent diverse populations within the species. We used long-read sequencing and optical mapping technologies to identify SVs in these genomes. Among the five lines examined, we found an average of 2,928 structural variants within these genomes. These structural variations varied greatly in size and location, included many exonic regions, and could impact adaptation and genomic evolution. Copyright © 2018 Long et al.


September 22, 2019

How long are long tandem repeats? A challenge for current methods of whole-genome sequence assembly: The case of satellites in Caenorhabditis elegans.

Repetitive genome regions have been difficult to sequence, mainly because of the comparatively small size of the fragments used in assembly. Satellites or tandem repeats are very abundant in nematodes and offer an excellent playground to evaluate different assembly methods. Here, we compare the structure of satellites found in three different assemblies of the Caenorhabditis elegans genome: the original sequence obtained by Sanger sequencing, an assembly based on PacBio technology, and an assembly using Nanopore sequencing reads. In general, satellites were found in equivalent genomic regions, but the new long-read methods (PacBio and Nanopore) tended to result in longer assembled satellites. Important differences exist between the assemblies resulting from the two long-read technologies, such as the sizes of long satellites. Our results also suggest that the lengths of some annotated genes with internal repeats which were assembled using Sanger sequencing are likely to be incorrect.


September 22, 2019

Targeted genotyping of variable number tandem repeats with adVNTR.

Whole-genome sequencing is increasingly used to identify Mendelian variants in clinical pipelines. These pipelines focus on single-nucleotide variants (SNVs) and also structural variants, while ignoring more complex repeat sequence variants. Here, we consider the problem of genotyping Variable Number Tandem Repeats (VNTRs), composed of inexact tandem duplications of short (6-100 bp) repeating units. VNTRs span 3% of the human genome, are frequently present in coding regions, and have been implicated in multiple Mendelian disorders. Although existing tools recognize VNTR-carrying sequence, genotyping VNTRs (determining repeat unit count and sequence variation) from whole-genome sequencing reads remains challenging. We describe a method, adVNTR, that uses hidden Markov models to model each VNTR, count repeat units, and detect sequence variation. adVNTR models can be developed for short-read (Illumina) and single-molecule (Pacific Biosciences [PacBio]) whole-genome and whole-exome sequencing, and show good results on multiple simulated and real data sets. © 2018 Bakhtiari et al.; Published by Cold Spring Harbor Laboratory Press.
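To make "determining repeat unit count" concrete, here is a deliberately naive sketch that counts consecutive inexact copies of a repeat unit using a Hamming-distance tolerance. It is not adVNTR and does not use a hidden Markov model: it ignores insertions and deletions, which an HMM can model. The function names, the `max_mismatch` parameter, and the example sequence are assumptions for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_repeat_units(seq: str, unit: str, max_mismatch: int = 1) -> int:
    """Longest run of back-to-back copies of `unit` found anywhere in `seq`.

    Each copy may differ from `unit` by at most `max_mismatch` substitutions.
    """
    if not unit:
        raise ValueError("unit must be non-empty")
    k = len(unit)
    best = 0
    for offset in range(len(seq) - k + 1):
        count = 0
        pos = offset
        # Extend the run one unit at a time while each window stays close to `unit`.
        while pos + k <= len(seq) and hamming(seq[pos:pos + k], unit) <= max_mismatch:
            count += 1
            pos += k
        best = max(best, count)
    return best


# Made-up read: three exact copies of ACGTAC, then one copy with a substitution.
read = "TTT" + "ACGTAC" * 3 + "ACGTAG" + "GGG"
exact_units = count_repeat_units(read, "ACGTAC", max_mismatch=0)
inexact_units = count_repeat_units(read, "ACGTAC", max_mismatch=1)
```

The difference between the two counts shows why inexact duplications make genotyping harder than exact string matching would suggest.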


September 22, 2019

Combining probabilistic alignments with read pair information improves accuracy of split-alignments.

Split-alignments provide base-pair-resolution evidence of genomic rearrangements. In practice, they are found by first computing high-scoring local alignments, parts of which are then combined into a split-alignment. This approach is challenging when aligning a short read to a large and repetitive reference, as it tends to produce many spurious local alignments leading to ambiguities in identifying the correct split-alignment. This problem is further exacerbated by the fact that rearrangements tend to occur in repeat-rich regions. We propose a split-alignment technique that combats the issue of ambiguous alignments by combining information from probabilistic alignment with positional information from paired-end reads. We demonstrate that our method finds accurate split-alignments, and that this translates into improved performance of variant-calling tools that rely on split-alignments. An open-source implementation is freely available at: https://bitbucket.org/splitpairedend/last-split-pe. Supplementary data are available at Bioinformatics online.


September 22, 2019

Computational tools to unmask transposable elements.

A substantial proportion of the genome of many species is derived from transposable elements (TEs). Moreover, through various self-copying mechanisms, TEs continue to proliferate in the genomes of most species. TEs have contributed numerous regulatory, transcript and protein innovations and have also been linked to disease. However, notwithstanding their demonstrated impact, many genomic studies still exclude them because their repetitive nature results in various analytical complexities. Fortunately, a growing array of methods and software tools are being developed to cater for them. This Review presents a summary of computational resources for TEs and highlights some of the challenges and remaining gaps to perform comprehensive genomic analyses that do not simply ‘mask’ repeats.


September 22, 2019

TranSurVeyor: an improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data.

Transpositions transfer DNA segments between different loci within a genome; in particular, when a transposition is found in a sample but not in a reference genome, it is called a non-reference transposition. They are important structural variations that have clinical impact. Transpositions can be called by analyzing second generation high-throughput sequencing datasets. Current methods follow either a database-based or a database-free approach. Database-based methods require a database of transposable elements. Some of them have good specificity; however this approach cannot detect novel transpositions, and it requires a good database of transposable elements, which is not yet available for many species. Database-free methods perform de novo calling of transpositions, but their accuracy is low. We observe that this is due to the misalignment of the reads; since reads are short and the human genome has many repeats, false alignments create false positive predictions while missing alignments reduce the true positive rate. This paper proposes new techniques to improve database-free non-reference transposition calling: first, we propose a realignment strategy called one-end remapping that corrects the alignments of reads in interspersed repeats; second, we propose a SNV-aware filter that removes some incorrectly aligned reads. By combining these two techniques and other techniques like clustering and positive-to-negative ratio filter, our proposed transposition caller TranSurVeyor shows at least 3.1-fold improvement in terms of F1-score over existing database-free methods. More importantly, even though TranSurVeyor does not use databases of prior information, its performance is at least as good as existing database-based methods such as MELT, Mobster and Retroseq. We also illustrate that TranSurVeyor can discover transpositions that are not known in the current database.
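For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes it from counts of true positives, false positives, and false negatives; the function name and the example counts are hypothetical and are not results from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    if denom == 0:
        return 0.0  # No predictions and no true events: define F1 as 0.
    return 2 * tp / denom


# Hypothetical counts: false positives (e.g. from misaligned reads) and
# missed events both lower the score.
baseline = f1_score(tp=20, fp=60, fn=80)
improved = f1_score(tp=80, fp=20, fn=20)
fold_change = improved / baseline
```

Because F1 penalizes both false positives and false negatives, reducing spurious alignments and recovering missed alignments each raise it.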


September 22, 2019

3D molecular cytology of Hop (Humulus lupulus) meiotic chromosomes reveals non-disomic pairing and segregation, aneuploidy, and genomic structural variation.

Hop (Humulus lupulus L.) is an important crop worldwide, known as the main flavoring ingredient in beer. The diversifying brewing industry demands variation in flavors, superior process properties, and sustainable agronomics, which are the focus of advanced molecular breeding efforts in hops. Hop breeders have been limited in their ability to create strains with desirable traits, however, because of the unusual and unpredictable inheritance patterns and associated non-Mendelian genetic marker segregation. Cytogenetic analysis of meiotic chromosome behavior has also revealed conspicuous and prevalent occurrences of multiple, atypical, non-disomic chromosome complexes, including those involving autosomes in late prophase. To explore the role of meiosis in segregation distortion, we undertook 3D cytogenetic analysis of hop pollen mother cells stained with DAPI and FISH. We used telomere FISH to demonstrate that hop exhibits a normal telomere clustering bouquet. We also identified and characterized a new sub-terminal 180 bp satellite DNA tandem repeat family called HSR0, located proximal to telomeres. Highly variable 5S rDNA FISH patterns within and between plants, together with the detection of anaphase chromosome bridges, reflect extensive departures from normal disomic signal composition and distribution. Subsequent FACS analysis revealed variable DNA content in a cultivated pedigree. Together, these findings implicate multiple phenomena, including aneuploidy, segmental aneuploidy, or chromosome rearrangements, as contributing factors to segregation distortion in hop.


September 22, 2019

Noise-Cancelling Repeat Finder: Uncovering tandem repeats in error-prone long-read sequencing data

Tandem DNA repeats can be sequenced with long-read technologies, but cannot be accurately deciphered due to the lack of computational tools taking high error rates of these technologies into account. Here we introduce Noise-Cancelling Repeat Finder (NCRF) to uncover putative tandem repeats of specified motifs in noisy long reads produced by Pacific Biosciences and Oxford Nanopore sequencers. Using simulations, we validated the use of NCRF to locate tandem repeats with motifs of various lengths and demonstrated its superior performance as compared to two alternative tools. Using real human whole-genome sequencing data, NCRF identified long arrays of the (AATGG)n repeat involved in heat shock stress response.


September 22, 2019

Analysis of structural variants in four African cichlids highlights an association with developmental and immune related genes

African Lakes Cichlids are one of the most impressive examples of adaptive radiation. Independently in Lake Victoria, Tanganyika, and Malawi, several hundreds of species arose within the last 10 million to 100,000 years. Whereas most analyses in cichlids focused on nucleotide substitutions across species to investigate the genetic bases of this explosive radiation, to date, no study has investigated the contribution of structural variants (SVs) to speciation events (through a reduction of gene flow) and adaptation to different ecological niches. Here, we annotate and characterize the repertoires and evolutionary potential of different SV classes (deletions, duplications, inversions, insertions and translocations) in five cichlid species (Astatotilapia burtoni, Metriaclima zebra, Neolamprologus brichardi, Pundamilia nyererei and Oreochromis niloticus). We investigate the patterns of gain/loss evolution across the phylogeny for each SV type, enabling the identification of both lineage-specific events and a set of conserved SVs, common to all four species in the radiation. Both deletion and inversion events show a significant overlap with SINE elements, while inversions additionally show a limited, but significant association with DNA transposons. Genes lying inside inverted regions are enriched for genes regulating behaviour, or involved in skeletal and visual system development. Moreover, we find that duplicated genes show enrichment for 'antigen processing and presentation' (GO:0019882) and other immune-related categories. Altogether, we provide the first comprehensive overview of rearrangement evolution in East African Cichlids, and some initial insights into their possible contribution to adaptation.


September 22, 2019

Long-read sequencing technology indicates genome-wide effects of non-B DNA on polymerization speed and error rate.

DNA conformation may deviate from the classical B-form in ~13% of the human genome. Non-B DNA regulates many cellular processes; however, its effects on DNA polymerization speed and accuracy have not been investigated genome-wide. Such an inquiry is critical for understanding neurological diseases and cancer genome instability. Here, we present the first simultaneous examination of DNA polymerization kinetics and errors in the human genome sequenced with Single-Molecule Real-Time (SMRT) technology. We show that polymerization speed differs between non-B and B-DNA: It decelerates at G-quadruplexes and fluctuates periodically at disease-causing tandem repeats. Analyzing polymerization kinetics profiles, we predict and validate experimentally non-B DNA formation for a novel motif. We demonstrate that several non-B motifs affect sequencing errors (e.g., G-quadruplexes increase error rates), and that sequencing errors are positively associated with polymerase slowdown. Finally, we show that highly divergent G4 motifs have pronounced polymerization slowdown and high sequencing error rates, suggesting similar mechanisms for sequencing errors and germline mutations. © 2018 Guiblet et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

Evolutionary conservation of Y Chromosome ampliconic gene families despite extensive structural variation.

Despite claims that the mammalian Y Chromosome is on a path to extinction, comparative sequence analysis of primate Y Chromosomes has shown the decay of the ancestral single-copy genes has all but ceased in this eutherian lineage. The suite of single-copy Y-linked genes is highly conserved among the majority of eutherian Y Chromosomes due to strong purifying selection to retain dosage-sensitive genes. In contrast, the ampliconic regions of the Y Chromosome, which contain testis-specific genes that encode the majority of the transcripts on eutherian Y Chromosomes, are rapidly evolving and are thought to undergo species-specific turnover. However, ampliconic genes are known from only a handful of species, limiting insights into their long-term evolutionary dynamics. We used a clone-based sequencing approach employing both long- and short-read sequencing technologies to assemble ~2.4 Mb of representative ampliconic sequence dispersed across the domestic cat Y Chromosome, and identified the major ampliconic gene families and repeat units. We analyzed fluorescence in situ hybridization, qPCR, and whole-genome sequence data from 20 cat species and revealed that ampliconic gene families are conserved across the cat family Felidae but show high transcript diversity, copy number variation, and structural rearrangement. Our analysis of ampliconic gene evolution unveils a complex pattern of long-term gene content stability despite extensive structural variation on a nonrecombining background. © 2018 Brashear et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

Regulation of yeast-to-hyphae transition in Yarrowia lipolytica.

The yeast Yarrowia lipolytica undergoes a morphological transition from yeast-to-hyphal growth in response to environmental conditions. A forward genetic screen was used to identify mutants that reliably remain in the yeast phase, which were then assessed by whole-genome sequencing. All the smooth mutants identified, so named because of their colony morphology, exhibit independent loss of DNA at a repetitive locus made up of interspersed ribosomal DNA and short 10- to 40-mer telomere-like repeats. The loss of repetitive DNA is associated with downregulation of genes with stress response elements (5′-CCCCT-3′) and upregulation of genes with cell cycle box (5′-ACGCG-3′) motifs in their promoter region. The stress response element is bound by the transcription factor Msn2p in Saccharomyces cerevisiae. We confirmed that the Y. lipolytica msn2 (Ylmsn2) ortholog is required for hyphal growth and found that overexpression of Ylmsn2 enables hyphal growth in smooth strains. The cell cycle box is bound by the Mbp1p/Swi6p complex in S. cerevisiae to regulate G1-to-S phase progression. We found that overexpression of either the Ylmbp1 or Ylswi6 homologs decreased hyphal growth and that deletion of either Ylmbp1 or Ylswi6 promotes hyphal growth in smooth strains. A second forward genetic screen for reversion to hyphal growth was performed with the smooth-33 mutant to identify additional genetic factors regulating hyphal growth in Y. lipolytica. Thirteen of the mutants sequenced from this screen had coding mutations in five kinases, including the histidine kinases Ylchk1 and Ylnik1 and kinases of the high-osmolarity glycerol response (HOG) mitogen-activated protein (MAP) kinase cascade Ylssk2, Ylpbs2, and Ylhog1. Together, these results demonstrate that Y. lipolytica transitions to hyphal growth in response to stress through multiple signaling pathways.
IMPORTANCE: Many yeasts undergo a morphological transition from yeast-to-hyphal growth in response to environmental conditions. We used forward and reverse genetic techniques to identify genes regulating this transition in Yarrowia lipolytica. We confirmed that the transcription factor Ylmsn2 is required for the transition to hyphal growth and found that signaling by the histidine kinases Ylchk1 and Ylnik1 as well as the MAP kinases of the HOG pathway (Ylssk2, Ylpbs2, and Ylhog1) regulates the transition to hyphal growth. These results suggest that Y. lipolytica transitions to hyphal growth in response to stress through multiple kinase pathways. Intriguingly, we found that a repetitive portion of the genome containing telomere-like and rDNA repeats may be involved in the transition to hyphal growth, suggesting a link between this region and the general stress response. Copyright © 2018 Pomraning et al.

