September 22, 2019

Genomic structural variations within five continental populations of Drosophila melanogaster.

Chromosomal structural variations (SVs), including insertions, deletions, inversions, and translocations, occur within the genome and can have a significant effect on organismal phenotype. Some of these effects are caused by structural variations that contain genes. Large structural variations represent a significant amount of the genetic diversity within a population. We used a global sampling of Drosophila melanogaster lines (Ithaca, Zimbabwe, Beijing, Tasmania, and the Netherlands) to represent diverse populations within the species. We used long-read sequencing and optical mapping technologies to identify SVs in these genomes. Among the five lines examined, we found an average of 2,928 structural variants per genome. These structural variations varied greatly in size and location, included many exonic regions, and could impact adaptation and genomic evolution. Copyright © 2018 Long et al.


September 22, 2019

How long are long tandem repeats? A challenge for current methods of whole-genome sequence assembly: The case of satellites in Caenorhabditis elegans.

Repetitive genome regions have been difficult to sequence, mainly because of the comparatively small size of the fragments used in assembly. Satellites, or tandem repeats, are very abundant in nematodes and offer an excellent playground for evaluating different assembly methods. Here, we compare the structure of satellites found in three different assemblies of the Caenorhabditis elegans genome: the original sequence obtained by Sanger sequencing, an assembly based on PacBio technology, and an assembly using Nanopore sequencing reads. In general, satellites were found in equivalent genomic regions, but the newer long-read methods (PacBio and Nanopore) tended to yield longer assembled satellites. Important differences exist between the assemblies produced by the two long-read technologies, such as the sizes of long satellites. Our results also suggest that the lengths of some annotated genes with internal repeats that were assembled using Sanger sequencing are likely to be incorrect.


September 22, 2019

Targeted genotyping of variable number tandem repeats with adVNTR.

Whole-genome sequencing is increasingly used to identify Mendelian variants in clinical pipelines. These pipelines focus on single-nucleotide variants (SNVs) and structural variants, while ignoring more complex repeat sequence variants. Here, we consider the problem of genotyping Variable Number Tandem Repeats (VNTRs), composed of inexact tandem duplications of short (6-100 bp) repeating units. VNTRs span 3% of the human genome, are frequently present in coding regions, and have been implicated in multiple Mendelian disorders. Although existing tools recognize VNTR-carrying sequences, genotyping VNTRs (determining repeat unit count and sequence variation) from whole-genome sequencing reads remains challenging. We describe a method, adVNTR, that uses hidden Markov models to model each VNTR, count repeat units, and detect sequence variation. adVNTR models can be developed for short-read (Illumina) and single-molecule (Pacific Biosciences [PacBio]) whole-genome and whole-exome sequencing, and they show good results on multiple simulated and real data sets. © 2018 Bakhtiari et al.; Published by Cold Spring Harbor Laboratory Press.
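adVNTR counts repeat units with per-VNTR hidden Markov models. As a much simpler, hypothetical stand-in that illustrates what "counting repeat units despite sequence variation" means, the sketch below picks the copy number n that minimizes the edit distance between an already-extracted VNTR region and n tandem copies of the repeat unit. This is not adVNTR's algorithm; the function names and the `max_copies` cap are illustrative assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitution
            )
        prev = curr
    return prev[-1]


def estimate_repeat_count(region: str, unit: str, max_copies: int = 50) -> int:
    """Hypothetical helper: copy number n whose tandem array unit * n is closest to region.

    Ties are broken toward the smaller copy number.
    """
    best_n, best_d = 0, len(region)  # zero copies: every base of region is unexplained
    for n in range(1, max_copies + 1):
        d = edit_distance(region, unit * n)
        if d < best_d:
            best_n, best_d = n, d
    return best_n


# Four copies of a 6 bp unit, carrying one substitution and one deleted base.
observed = "ACGTAT" + "ACGTAC" * 2 + "ACGTA"
print(estimate_repeat_count(observed, "ACGTAC"))  # 4
```

A real genotyper also has to locate the VNTR boundaries within each read and combine evidence across reads into a diploid genotype; this sketch leaves both steps out.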


September 22, 2019

Combining probabilistic alignments with read pair information improves accuracy of split-alignments.

Split-alignments provide base-pair-resolution evidence of genomic rearrangements. In practice, they are found by first computing high-scoring local alignments, parts of which are then combined into a split-alignment. This approach is challenging when aligning a short read to a large and repetitive reference, as it tends to produce many spurious local alignments leading to ambiguities in identifying the correct split-alignment. This problem is further exacerbated by the fact that rearrangements tend to occur in repeat-rich regions. We propose a split-alignment technique that combats the issue of ambiguous alignments by combining information from probabilistic alignment with positional information from paired-end reads. We demonstrate that our method finds accurate split-alignments, and that this translates into improved performance of variant-calling tools that rely on split-alignments. An open-source implementation is freely available at: https://bitbucket.org/splitpairedend/last-split-pe. Supplementary data are available at Bioinformatics online.
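The core idea, weighting each candidate local alignment by how well it agrees with the mate's position, can be sketched as below. This is a hypothetical scoring scheme, not the model used in last-split-pe: it assumes each candidate arrives with an alignment probability from a probabilistic aligner, models the pair distance as a normal distribution with assumed `mean` and `sd`, and gives candidates on a different chromosome from the mate a small assumed constant weight (`discordant_weight`).

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    chrom: str
    pos: int            # reference start of this local alignment
    align_prob: float   # alignment probability reported by the aligner (assumed input)


def pair_distance_density(distance: float, mean: float, sd: float) -> float:
    """Normal density of the observed distance between the two reads of a pair."""
    z = (distance - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def rescore(candidates, mate_chrom, mate_pos, mean=500.0, sd=50.0, discordant_weight=1e-6):
    """Combine alignment probability with mate-position evidence; returns normalized weights."""
    if not candidates:
        return []
    scores = []
    for c in candidates:
        if c.chrom == mate_chrom:
            pair_term = pair_distance_density(abs(c.pos - mate_pos), mean, sd)
        else:
            pair_term = discordant_weight  # hypothetical constant for inter-chromosomal pairs
        scores.append(c.align_prob * pair_term)
    total = sum(scores)
    if total == 0.0:
        # No usable evidence: fall back to equal weights.
        return [1.0 / len(candidates)] * len(candidates)
    return [s / total for s in scores]


# Two equally good local alignments; the mate's position resolves the ambiguity.
cands = [Candidate("chr1", 10_000, 0.5), Candidate("chr1", 50_000, 0.5)]
print(rescore(cands, "chr1", 10_500))  # the first candidate receives almost all the weight
```

Multiplying the two terms treats alignment evidence and pair-distance evidence as independent, which is a simplifying assumption of this sketch.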


September 22, 2019

Computational tools to unmask transposable elements.

A substantial proportion of the genome of many species is derived from transposable elements (TEs). Moreover, through various self-copying mechanisms, TEs continue to proliferate in the genomes of most species. TEs have contributed numerous regulatory, transcript and protein innovations and have also been linked to disease. Yet, despite their demonstrated impact, many genomic studies still exclude them because their repetitive nature results in various analytical complexities. Fortunately, a growing array of methods and software tools is being developed to cater for them. This Review presents a summary of computational resources for TEs and highlights some of the challenges and remaining gaps in performing comprehensive genomic analyses that do not simply ‘mask’ repeats.


September 22, 2019

TranSurVeyor: an improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data.

Transpositions transfer DNA segments between different loci within a genome; in particular, when a transposition is found in a sample but not in a reference genome, it is called a non-reference transposition. They are important structural variations with clinical impact. Transpositions can be called by analyzing second-generation high-throughput sequencing datasets. Current methods follow either a database-based or a database-free approach. Database-based methods require a database of transposable elements. Some of them have good specificity; however, this approach cannot detect novel transpositions, and it requires a good database of transposable elements, which is not yet available for many species. Database-free methods perform de novo calling of transpositions, but their accuracy is low. We observe that this is due to the misalignment of reads: since reads are short and the human genome has many repeats, false alignments create false-positive predictions, while missing alignments reduce the true-positive rate. This paper proposes new techniques to improve database-free non-reference transposition calling: first, a realignment strategy called one-end remapping that corrects the alignments of reads in interspersed repeats; second, an SNV-aware filter that removes some incorrectly aligned reads. By combining these two techniques with others, such as clustering and a positive-to-negative ratio filter, our proposed transposition caller TranSurVeyor shows at least a 3.1-fold improvement in F1-score over existing database-free methods. More importantly, even though TranSurVeyor does not use databases of prior information, its performance is at least as good as that of existing database-based methods such as MELT, Mobster, and Retroseq. We also illustrate that TranSurVeyor can discover transpositions that are not present in current databases.
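F1-score, the metric behind the reported 3.1-fold improvement, is the harmonic mean of precision and recall, so a caller cannot score well by trading many false positives for sensitivity (or the reverse). A minimal sketch; the counts in the example are made up for illustration and are not results from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for a set of calls."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: a caller swamped by false positives vs. a more specific one.
noisy = f1_score(tp=60, fp=540, fn=40)
specific = f1_score(tp=70, fp=30, fn=30)
print(round(noisy, 3), round(specific, 3), round(specific / noisy, 2))
```

Equivalently, F1 = 2TP / (2TP + FP + FN), which makes clear that true negatives play no role.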


September 22, 2019

Nonmutational mechanism of inheritance in the Archaeon Sulfolobus solfataricus.

Epigenetic phenomena have not yet been reported in archaea, which are presumed to use a classical genetic process of heritability. Here, analysis of independent lineages of Sulfolobus solfataricus evolved for enhanced fitness implicated a non-Mendelian basis for trait inheritance. The evolved strains, called super acid-resistant Crenarchaeota (SARC), acquired traits of extreme acid resistance and genome stability relative to their wild-type parental lines. Acid resistance was heritable: it was retained despite extensive passage without selection. Despite this hereditary pattern, in one strain the SARC traits could not have resulted from mutation, because its resequenced genome carried no mutations. All strains also had conserved, heritable transcriptomes implicated in acid resistance. In addition, they had improved genome stability, with absent or greatly decreased mutation and transposition relative to a passaged control. A mechanism that would confer these traits without DNA sequence alteration could involve posttranslationally modified archaeal chromatin proteins. To test this idea, homologous recombination with isogenic DNA was used to perturb native chromatin structure. Recombination at up-regulated loci from the heritable SARC transcriptome reduced acid resistance and gene expression in the majority of recombinants. In contrast, recombination at a control locus that was not part of the heritable transcriptome changed neither acid resistance nor gene expression. Variation in the amount of phenotypic and expression change across individuals was consistent with Rad54-dependent chromatin remodeling that dictated crossover location and branch migration. These data support an epigenetic model implicating chromatin structure as a contributor to heritable traits.


September 22, 2019

3D molecular cytology of Hop (Humulus lupulus) meiotic chromosomes reveals non-disomic pairing and segregation, aneuploidy, and genomic structural variation.

Hop (Humulus lupulus L.) is an important crop worldwide, known as the main flavoring ingredient in beer. The diversifying brewing industry demands variation in flavors, superior process properties, and sustainable agronomics, which are the focus of advanced molecular breeding efforts in hops. Hop breeders have been limited in their ability to create strains with desirable traits, however, because of unusual and unpredictable inheritance patterns and the associated non-Mendelian segregation of genetic markers. Cytogenetic analysis of meiotic chromosome behavior has also revealed conspicuous and prevalent occurrences of multiple, atypical, non-disomic chromosome complexes, including those involving autosomes in late prophase. To explore the role of meiosis in segregation distortion, we undertook 3D cytogenetic analysis of hop pollen mother cells stained with DAPI and probed by FISH. We used telomere FISH to demonstrate that hop exhibits a normal telomere clustering bouquet. We also identified and characterized a new sub-terminal 180 bp satellite DNA tandem repeat family, called HSR0, located proximal to telomeres. Highly variable 5S rDNA FISH patterns within and between plants, together with the detection of anaphase chromosome bridges, reflect extensive departures from normal disomic signal composition and distribution. Subsequent FACS analysis revealed variable DNA content in a cultivated pedigree. Together, these findings implicate multiple phenomena, including aneuploidy, segmental aneuploidy, or chromosome rearrangements, as contributing factors to segregation distortion in hop.


September 22, 2019

Leishmania genome dynamics during environmental adaptation reveal strain-specific differences in gene copy number variation, karyotype instability, and telomeric amplification.

Protozoan parasites of the genus Leishmania adapt to environmental change through chromosome and gene copy number variations. Little is known about the external or intrinsic factors that govern Leishmania genomic adaptation. Here, by conducting longitudinal genome analyses of 10 new Leishmania clinical isolates, we uncovered important differences in gene copy number among genetically highly related strains and revealed gain and loss of gene copies as potential drivers of long-term environmental adaptation in the field. In contrast, chromosome rather than gene amplification was associated with short-term environmental adaptation to in vitro culture. Karyotypic solutions were highly reproducible but unique for a given strain, suggesting that chromosome amplification is under positive selection and dependent on species- and strain-specific intrinsic factors. We revealed a progressive increase in read depth towards the chromosome ends for various Leishmania isolates, which may represent a nonclassical mechanism of telomere maintenance that can preserve the integrity of chromosome ends during selection for fast in vitro growth. Together, our data draw a complex picture of Leishmania genomic adaptation in the field and in culture, driven by a combination of intrinsic genetic factors that generate strain-specific phenotypic variations, which are under environmental selection and allow for fitness gain.

IMPORTANCE Protozoan parasites of the genus Leishmania cause severe human and veterinary diseases worldwide, termed leishmaniases. A hallmark of Leishmania biology is its capacity to adapt to a variety of unpredictable fluctuations inside its human host, notably pharmacological interventions, an adaptability that causes drug resistance. Here we investigated mechanisms of environmental adaptation using a comparative genomics approach by sequencing 10 new clinical isolates of the L. donovani, L. major, and L. tropica complexes that were sampled across eight distinct geographical regions. Our data provide new evidence that parasites adapt to environmental change in the field and in culture through a combination of chromosome and gene amplification that likely causes phenotypic variation and drives parasite fitness gains in response to environmental constraints. This novel form of gene expression regulation through genomic change compensates for the absence of classical transcriptional control in these early-branching eukaryotes and opens new avenues for biomarker discovery. Copyright © 2018 Bussotti et al.


September 22, 2019

Noise-Cancelling Repeat Finder: Uncovering tandem repeats in error-prone long-read sequencing data

Tandem DNA repeats can be sequenced with long-read technologies, but they cannot be accurately deciphered because computational tools that account for the high error rates of these technologies are lacking. Here we introduce Noise-Cancelling Repeat Finder (NCRF) to uncover putative tandem repeats of specified motifs in noisy long reads produced by Pacific Biosciences and Oxford Nanopore sequencers. Using simulations, we validated the use of NCRF to locate tandem repeats with motifs of various lengths and demonstrated its superior performance compared with two alternative tools. Using real human whole-genome sequencing data, NCRF identified long arrays of the (AATGG)n repeat involved in the heat shock stress response.
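The task NCRF addresses, finding the stretch of a noisy read that best matches a tandem array of a specified motif, can be sketched as a Smith-Waterman local alignment of the read against enough concatenated copies of the motif. This is a simplified illustration, not NCRF's implementation; the function name and the scoring values (`match`, `mismatch`, `gap`) are assumptions.

```python
def best_repeat_segment(read: str, motif: str, match: int = 1,
                        mismatch: int = -1, gap: int = -1):
    """Local alignment of `read` against tandem copies of `motif`.

    Returns (score, start, end) such that read[start:end] is the best-scoring
    segment. Scoring values are illustrative, not NCRF's.
    """
    # Two spare copies so the alignment can begin at any phase of the motif.
    ref = motif * (len(read) // len(motif) + 2)
    n, m = len(read), len(ref)
    H = [[0] * (m + 1) for _ in range(n + 1)]   # local alignment scores
    S = [[i] * (m + 1) for i in range(n + 1)]   # read start of each cell's alignment
    best = (0, 0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # extra base in the read (insertion)
            left = H[i][j - 1] + gap    # motif base missing from the read (deletion)
            score = max(0, diag, up, left)
            H[i][j] = score
            if score == 0:
                S[i][j] = i             # restart: a new alignment would begin at read[i]
            elif score == diag:
                S[i][j] = S[i - 1][j - 1]
            elif score == up:
                S[i][j] = S[i - 1][j]
            else:
                S[i][j] = S[i][j - 1]
            if score > best[0]:
                best = (score, S[i][j], i)
    return best


# An (AATGG)n array with one sequencing error, flanked by unrelated sequence.
read = "TTTTTTTT" + "AATGG" * 3 + "AATCG" + "AATGG" * 2 + "CCCCCCCC"
print(best_repeat_segment(read, "AATGG"))  # (28, 8, 38)
```

Quadratic time and memory make this unsuitable for reads tens of kilobases long; a practical tool would typically use wraparound dynamic programming over the motif positions instead of an explicit concatenated reference.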


September 22, 2019

Analysis of structural variants in four African cichlids highlights an association with developmental and immune related genes

African lake cichlids are one of the most impressive examples of adaptive radiation. Independently in Lakes Victoria, Tanganyika, and Malawi, several hundred species arose within the last 10 million to 100,000 years. Whereas most analyses in cichlids have focused on nucleotide substitutions across species to investigate the genetic bases of this explosive radiation, to date no study has investigated the contribution of structural variants (SVs) to speciation events (through a reduction of gene flow) and adaptation to different ecological niches. Here, we annotate and characterize the repertoires and evolutionary potential of different SV classes (deletions, duplications, inversions, insertions, and translocations) in five cichlid species (Astatotilapia burtoni, Metriaclima zebra, Neolamprologus brichardi, Pundamilia nyererei, and Oreochromis niloticus). We investigate the patterns of gain/loss evolution across the phylogeny for each SV type, enabling the identification of both lineage-specific events and a set of conserved SVs common to all four species in the radiation. Both deletion and inversion events show a significant overlap with SINE elements, while inversions additionally show a limited, but significant, association with DNA transposons. Genes lying inside inverted regions are enriched for genes regulating behaviour or involved in skeletal and visual system development. Moreover, we find that duplicated genes show enrichment for ‘antigen processing and presentation’ (GO:0019882) and other immune-related categories. Altogether, we provide the first comprehensive overview of rearrangement evolution in East African cichlids, and some initial insights into their possible contribution to adaptation.
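Testing whether SVs overlap SINE elements more often than expected starts with counting the observed overlaps. A minimal sketch, assuming half-open (chrom, start, end) intervals for both SVs and annotated repeats; an enrichment test would then compare this count against a null distribution (for example, SVs shuffled across the genome), which is not shown here and may differ from what the authors did. Function names and coordinates are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_intervals(intervals):
    """Sort (start, end) pairs and merge overlapping or touching ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def count_sv_overlaps(svs, repeats):
    """Number of SVs overlapping at least one repeat on the same chromosome.

    Both inputs are iterables of half-open (chrom, start, end) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in repeats:
        by_chrom[chrom].append((start, end))
    merged = {chrom: merge_intervals(iv) for chrom, iv in by_chrom.items()}
    starts = {chrom: [s for s, _ in iv] for chrom, iv in merged.items()}

    hits = 0
    for chrom, start, end in svs:
        iv = merged.get(chrom)
        if not iv:
            continue
        # Last merged repeat that starts before the SV ends; merged intervals are
        # disjoint and sorted, so it also has the largest end among such repeats.
        k = bisect_left(starts[chrom], end) - 1
        if k >= 0 and iv[k][1] > start:
            hits += 1
    return hits


# Hypothetical coordinates for illustration.
sines = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
svs = [("chr1", 250, 260), ("chr1", 300, 400), ("chr2", 49, 60), ("chr3", 0, 10)]
print(count_sv_overlaps(svs, sines))  # 2
```

Merging the repeats first is what lets a single binary search per SV decide the overlap, since after merging both starts and ends are sorted.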


September 22, 2019

Extensive and deep sequencing of the Venter/HuRef genome for developing and benchmarking genome analysis tools.

We produced an extensive collection of deep re-sequencing datasets for the Venter/HuRef genome using the Illumina massively parallel DNA sequencing platform. The original Venter genome sequence is a very high-quality phased assembly based on Sanger sequencing. Therefore, researchers developing novel computational tools for the analysis of human genome sequence variation for the dominant Illumina sequencing technology can test and hone their algorithms by making variant calls from these Venter/HuRef datasets and then immediately confirm the detected variants in the Sanger assembly, freeing them of the need for further experimental validation. This process also applies to implementing and benchmarking existing genome analysis pipelines. We prepared and sequenced 200 bp and 350 bp short-insert whole-genome sequencing libraries (sequenced to 100x and 40x genomic coverage, respectively) as well as 2 kb, 5 kb, and 12 kb mate-pair libraries (49x, 122x, and 145x physical coverage, respectively). Lastly, we produced a linked-read library (128x physical coverage) from which we also performed haplotype phasing.
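The two kinds of coverage quoted for these libraries count different things: sequence coverage counts sequenced bases, whereas physical coverage counts the bases spanned by whole inserts, including the unsequenced gap between the two reads. This is why long mate-pair libraries reach high physical coverage with comparatively few read pairs. The helper names and the 3.1 Gb genome size used in the example are assumptions for illustration.

```python
def sequence_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of sequenced bases covering each genome position."""
    return n_reads * read_length / genome_size


def physical_coverage(n_pairs: int, insert_size: int, genome_size: int) -> float:
    """Average number of inserts (both reads plus the gap between them) spanning each position."""
    return n_pairs * insert_size / genome_size


def pairs_for_physical_coverage(target: int, insert_size: int, genome_size: int) -> int:
    """Smallest number of pairs reaching `target`x physical coverage (integer ceiling division)."""
    return -(-(target * genome_size) // insert_size)


GENOME_SIZE = 3_100_000_000  # assumed approximate human genome size
for insert_kb, target in [(2, 49), (5, 122), (12, 145)]:
    pairs = pairs_for_physical_coverage(target, insert_kb * 1000, GENOME_SIZE)
    print(f"{insert_kb} kb inserts: ~{pairs:,} pairs for {target}x physical coverage")
```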


September 22, 2019

Long-read sequencing technology indicates genome-wide effects of non-B DNA on polymerization speed and error rate.

DNA conformation may deviate from the classical B-form in ~13% of the human genome. Non-B DNA regulates many cellular processes; however, its effects on DNA polymerization speed and accuracy have not been investigated genome-wide. Such an inquiry is critical for understanding neurological diseases and cancer genome instability. Here, we present the first simultaneous examination of DNA polymerization kinetics and errors in the human genome sequenced with Single-Molecule Real-Time (SMRT) technology. We show that polymerization speed differs between non-B and B-DNA: it decelerates at G-quadruplexes and fluctuates periodically at disease-causing tandem repeats. Analyzing polymerization kinetics profiles, we predict, and experimentally validate, non-B DNA formation for a novel motif. We demonstrate that several non-B motifs affect sequencing errors (e.g., G-quadruplexes increase error rates), and that sequencing errors are positively associated with polymerase slowdown. Finally, we show that highly divergent G4 motifs have pronounced polymerization slowdown and high sequencing error rates, suggesting similar mechanisms for sequencing errors and germline mutations. © 2018 Guiblet et al.; Published by Cold Spring Harbor Laboratory Press.
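G-quadruplex (G4) motifs are commonly located with the pattern G3+N1-7G3+N1-7G3+N1-7G3+ (four runs of at least three guanines separated by loops of 1 to 7 nucleotides), with the C-rich equivalent marking a motif on the opposite strand. A minimal regex sketch; the study's exact non-B motif definitions may differ, the loop-length thresholds are assumptions, and `re.finditer` reports only non-overlapping matches.

```python
import re

# Widely used putative G4 pattern; loop lengths of 1-7 nt are assumed thresholds.
G4_PLUS = re.compile(r"G{3,}[ACGTN]{1,7}?G{3,}[ACGTN]{1,7}?G{3,}[ACGTN]{1,7}?G{3,}")
# C-rich runs on the given strand correspond to a G4 motif on the reverse strand.
G4_MINUS = re.compile(r"C{3,}[ACGTN]{1,7}?C{3,}[ACGTN]{1,7}?C{3,}[ACGTN]{1,7}?C{3,}")


def find_g4_motifs(seq: str):
    """Return sorted (start, end, strand) tuples for non-overlapping G4 motifs."""
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


print(find_g4_motifs("TTGGGAGGGAGGGAGGGTT"))  # [(2, 17, '+')]
```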


September 22, 2019

Evolutionary conservation of Y Chromosome ampliconic gene families despite extensive structural variation.

Despite claims that the mammalian Y Chromosome is on a path to extinction, comparative sequence analysis of primate Y Chromosomes has shown that the decay of the ancestral single-copy genes has all but ceased in this eutherian lineage. The suite of single-copy Y-linked genes is highly conserved among the majority of eutherian Y Chromosomes due to strong purifying selection to retain dosage-sensitive genes. In contrast, the ampliconic regions of the Y Chromosome, which contain testis-specific genes that encode the majority of the transcripts on eutherian Y Chromosomes, are rapidly evolving and are thought to undergo species-specific turnover. However, ampliconic genes are known from only a handful of species, limiting insights into their long-term evolutionary dynamics. We used a clone-based sequencing approach employing both long- and short-read sequencing technologies to assemble ~2.4 Mb of representative ampliconic sequence dispersed across the domestic cat Y Chromosome, and identified the major ampliconic gene families and repeat units. We analyzed fluorescence in situ hybridization, qPCR, and whole-genome sequence data from 20 cat species and revealed that ampliconic gene families are conserved across the cat family Felidae but show high transcript diversity, copy number variation, and structural rearrangement. Our analysis of ampliconic gene evolution unveils a complex pattern of long-term gene content stability despite extensive structural variation on a nonrecombining background. © 2018 Brashear et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

Regulation of yeast-to-hyphae transition in Yarrowia lipolytica.

The yeast Yarrowia lipolytica undergoes a morphological transition from yeast to hyphal growth in response to environmental conditions. A forward genetic screen was used to identify mutants that reliably remain in the yeast phase, which were then assessed by whole-genome sequencing. All the smooth mutants identified, so named because of their colony morphology, exhibit independent loss of DNA at a repetitive locus made up of interspersed ribosomal DNA and short 10- to 40-mer telomere-like repeats. The loss of repetitive DNA is associated with downregulation of genes with stress response elements (5′-CCCCT-3′) and upregulation of genes with cell cycle box (5′-ACGCG-3′) motifs in their promoter region. The stress response element is bound by the transcription factor Msn2p in Saccharomyces cerevisiae. We confirmed that the Y. lipolytica msn2 (Ylmsn2) ortholog is required for hyphal growth and found that overexpression of Ylmsn2 enables hyphal growth in smooth strains. The cell cycle box is bound by the Mbp1p/Swi6p complex in S. cerevisiae to regulate G1-to-S phase progression. We found that overexpression of either the Ylmbp1 or the Ylswi6 homolog decreased hyphal growth and that deletion of either Ylmbp1 or Ylswi6 promotes hyphal growth in smooth strains. A second forward genetic screen for reversion to hyphal growth was performed with the smooth-33 mutant to identify additional genetic factors regulating hyphal growth in Y. lipolytica. Thirteen of the mutants sequenced from this screen had coding mutations in five kinases, including the histidine kinases Ylchk1 and Ylnik1 and the kinases of the high-osmolarity glycerol response (HOG) mitogen-activated protein (MAP) kinase cascade, Ylssk2, Ylpbs2, and Ylhog1. Together, these results demonstrate that Y. lipolytica transitions to hyphal growth in response to stress through multiple signaling pathways.

IMPORTANCE Many yeasts undergo a morphological transition from yeast to hyphal growth in response to environmental conditions. We used forward and reverse genetic techniques to identify genes regulating this transition in Yarrowia lipolytica. We confirmed that the transcription factor Ylmsn2 is required for the transition to hyphal growth and found that signaling by the histidine kinases Ylchk1 and Ylnik1 as well as the MAP kinases of the HOG pathway (Ylssk2, Ylpbs2, and Ylhog1) regulates the transition to hyphal growth. These results suggest that Y. lipolytica transitions to hyphal growth in response to stress through multiple kinase pathways. Intriguingly, we found that a repetitive portion of the genome containing telomere-like and rDNA repeats may be involved in the transition to hyphal growth, suggesting a link between this region and the general stress response. Copyright © 2018 Pomraning et al.
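Relating expression changes to promoter motifs starts with scanning each promoter for the stress response element (5′-CCCCT-3′) and the cell cycle box (5′-ACGCG-3′) on both strands. A minimal sketch; the gene names, sequences, and whatever upstream window defines a "promoter" are hypothetical inputs, not data from the study.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

MOTIFS = {"STRE": "CCCCT", "CCB": "ACGCG"}  # motif sequences quoted in the abstract


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str) -> int:
    """Count (possibly overlapping) occurrences of motif on either strand."""
    seq = seq.upper()
    targets = {motif, reverse_complement(motif)}  # a set avoids double-counting palindromes
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)


def promoter_motif_counts(promoters: dict) -> dict:
    """Map each gene to its STRE and CCB counts."""
    return {gene: {name: count_motif(seq, m) for name, m in MOTIFS.items()}
            for gene, seq in promoters.items()}


# Hypothetical upstream sequences for illustration only.
example = {"geneA": "CCCCTAAAGGGG", "geneB": "ACGCGT"}
print(promoter_motif_counts(example))
# {'geneA': {'STRE': 2, 'CCB': 0}, 'geneB': {'STRE': 0, 'CCB': 2}}
```

Counting both strands matters because a motif such as CCCCT is equally present when the promoter carries its reverse complement, AGGGG.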

