Structural variation Archives - Page 26 of 31

July 7, 2019

Disease onset in X-linked dystonia-parkinsonism correlates with expansion of a hexameric repeat within an SVA retrotransposon in TAF1.

X-linked dystonia-parkinsonism (XDP) is a neurodegenerative disease associated with an antisense insertion of a SINE-VNTR-Alu (SVA)-type retrotransposon within an intron of TAF1. This unique insertion coincides with six additional noncoding sequence changes in TAF1, the gene that encodes TATA-binding protein-associated factor-1, which appear to be inherited together as an identical haplotype in all reported cases. Here we examined the sequence of this SVA in XDP patients (n = 140) and detected polymorphic variation in the length of a hexanucleotide repeat domain, (CCCTCT)n. The number of repeats in these cases ranged from 35 to 52 and showed a highly significant inverse correlation with age at disease onset. Because other SVAs exhibit intrinsic promoter activity that depends in part on the hexameric domain, we assayed the transcriptional regulatory effects of varying hexameric lengths found in the unique XDP SVA retrotransposon using luciferase reporter constructs. When inserted sense or antisense to the luciferase reading frame, the XDP variants repressed or enhanced transcription, respectively, to an extent that appeared to vary with length of the hexamer. Further in silico analysis of this SVA sequence revealed multiple motifs predicted to form G-quadruplexes, with the greatest potential detected for the hexameric repeat domain. These data directly link sequence variation within the XDP-specific SVA sequence to phenotypic variability in clinical disease manifestation and provide insight into potential mechanisms by which this intronic retroelement may induce transcriptional interference in TAF1 expression. Copyright © 2017 the Author(s). Published by PNAS.
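As a rough illustration of what measuring the (CCCTCT)n repeat length involves, the sketch below counts the longest uninterrupted run of the hexamer in a sequence string. It is not the method used in the study: the function name, the toy sequence, and the choice to ignore strand orientation (an antisense read would show the reverse complement, AGAGGG) are all assumptions made for illustration.

```python
import re

HEXAMER = "CCCTCT"  # repeat unit named in the abstract


def longest_hexamer_run(sequence: str, unit: str = HEXAMER) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`.

    Hypothetical helper: it only looks at the given strand and requires
    perfect, uninterrupted copies of the unit.
    """
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [
        len(match.group(0)) // len(unit)
        for match in pattern.finditer(sequence.upper())
    ]
    return max(runs, default=0)


# Toy sequence (not patient data): 41 repeats between arbitrary flanking bases.
toy = "GATTACA" + HEXAMER * 41 + "TTGACC"
print(longest_hexamer_run(toy))  # 41
```

Real reads would also need handling for sequencing errors and interrupted repeats, which this sketch deliberately leaves out.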

July 7, 2019

Comparative whole-genomic analysis of an ancient L2 lineage Mycobacterium tuberculosis reveals a novel phylogenetic clade and common genetic determinants of hypervirulent strains.

Background: Development of improved therapeutics against tuberculosis (TB) is hindered by an inadequate understanding of the relationship between disease severity and genetic diversity of its causative agent, Mycobacterium tuberculosis. We previously isolated a hypervirulent M. tuberculosis strain H112 from an HIV-negative patient with an aggressive disease progression from pulmonary TB to tuberculous meningitis, the most severe manifestation of tuberculosis. A human macrophage challenge experiment demonstrated that strain H112 exhibited significantly better intracellular survivability and induced a lower level of TNF-α than the reference virulent strain H37Rv and 123 other clinical isolates. Aim: The present study aimed to identify the potential genetic determinants of mycobacterial virulence that were common to strain H112 and hypervirulent M. tuberculosis strains of the same phylogenetic clade isolated in other global regions. Methods: A low-virulent M. tuberculosis strain H54, which belonged to the same phylogenetic lineage (L2) as strain H112, was selected from a collection of 115 clinical isolates. Both H112 and H54 were whole-genome-sequenced using PacBio sequencing technology. A comparative genomics approach was adopted to identify mutations present in strain H112 but absent in strain H54. Subsequently, an extensive phylogenetic analysis was conducted by including all publicly available M. tuberculosis genomes. Single-nucleotide polymorphisms (SNPs) and structural variations (SVs) common to hypervirulent strains in the global collection of genomes were considered as potential genetic determinants of hypervirulence. Results: Sequencing data revealed that both H112 and H54 were members of the same sub-lineage L2.2.1. After excluding the lineage-related mutations shared between H112 and H54, we analyzed the phylogenetic relatedness of H112 with a global collection of M. tuberculosis genomes (n = 4,338), and identified a novel phylogenetic clade in which four hypervirulent strains isolated from geographically diverse regions were clustered together. All hypervirulent strains in the clade shared 12 SNPs and 5 SVs with H112, including those affecting key virulence-associated loci, notably a deleterious SNP (rv0178 p.D150E) within the mce1 operon and an intergenic deletion (854259_854261delCC) in close proximity to phoP. Conclusion: The present study identified common genetic factors in a novel phylogenetic clade of hypervirulent M. tuberculosis. The causative role of these mutations in mycobacterial virulence should be validated in future studies.

July 7, 2019

A recurrence-based approach for validating structural variation using long-read sequencing technology.

Although numerous algorithms have been developed to identify structural variations (SVs) in genomic sequences, there is a dearth of approaches that can be used to evaluate their results. This is significant as the accurate identification of structural variation is still an outstanding but important problem in genomics. The emergence of new sequencing technologies that generate longer sequence reads can, in theory, provide direct evidence for all types of SVs regardless of the length of the region they span. However, current efforts to use these data in this manner require the use of large computational resources to assemble these sequences as well as visual inspection of each region. Here we present VaPoR, a highly efficient algorithm that autonomously validates large SV sets using long-read sequencing data. We assessed the performance of VaPoR on SVs in both simulated and real genomes and report a high-fidelity rate for overall accuracy across different levels of sequence depths. We show that VaPoR can interrogate a much larger range of SVs while still matching existing methods in terms of false positive validations and providing additional features considering breakpoint precision and predicted genotype. We further show that VaPoR can run quickly and efficiently without requiring a large processing or assembly pipeline. VaPoR provides a long read-based validation approach for genomic SVs that requires relatively low read depth and computing resources and thus will provide utility with targeted or low-pass sequencing coverage for accurate SV assessment. The VaPoR software is available at: https://github.com/mills-lab/vapor. © The Authors 2017. Published by Oxford University Press.

July 7, 2019

The state of whole-genome sequencing

Over the last decade, a technological paradigm shift has slashed the cost of DNA sequencing by over five orders of magnitude. Today, the cost of sequencing a human genome is a few thousand dollars, and it continues to fall. Here, we review the most cost-effective platforms for whole-genome sequencing (WGS) as well as emerging technologies that may displace or complement these. We also discuss the practical challenges of generating and analyzing WGS data, and how WGS has unlocked new strategies for discovering genes and variants underlying both rare and common human diseases.

July 7, 2019

Effects of genome structure variation, homeologous genes and repetitive DNA on polyploid crop research in the age of genomics.

Compared to diploid species, allopolyploid crop species possess more complex genomes, higher productivity, and greater adaptability to changing environments. Next generation sequencing techniques have produced high-density genetic maps, whole genome sequences, transcriptomes and epigenomes for important polyploid crops. However, several problems interfere with the full application of next generation sequencing techniques to these crops. Firstly, different types of genomic variation affect sequence assembly and QTL mapping. Secondly, duplicated or homoeologous genes can diverge in function and then lead to emergence of many minor QTL, which increases difficulties in fine mapping, cloning and marker assisted selection. Thirdly, repetitive DNA sequences arising in polyploid crop genomes also impact sequence assembly, and are increasingly being shown to produce small RNAs to regulate gene expression and hence phenotypic traits. We propose that these three key features should be considered together when analyzing polyploid crop genomes. It is apparent that dissection of genomic structural variation, elucidation of the function and mechanism of interaction of homoeologous genes, and investigation of the de novo roles of repeat sequences in agronomic traits are necessary for genomics-based crop breeding in polyploids. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

July 7, 2019

Genomic resources and their influence on the detection of the signal of positive selection in genome scans.

Genome scans represent powerful approaches to investigate the action of natural selection on the genetic variation of natural populations and to better understand local adaptation. This is very useful, for example, in the field of conservation biology and evolutionary biology. Thanks to Next Generation Sequencing, genomic resources are growing exponentially, improving genome scan analyses in non-model species. Thousands of SNPs called using Reduced Representation Sequencing are increasingly used in genome scans. Besides, genome sequences are also becoming increasingly available, allowing better processing of short-read data, offering physical localization of variants, and improving haplotype reconstruction and data imputation. Ultimately, genome sequences are also becoming the raw material for selection inferences. Here, we discuss how the increasing availability of such genomic resources, notably genome sequences, influences the detection of signals of selection. Mainly, increasing data density and having the information of physical linkage data expand genome scans by (i) improving the overall quality of the data, (ii) helping the reconstruction of demographic history for the population studied to decrease false-positive rates and (iii) improving the statistical power of methods to detect the signal of selection. Of particular importance, the availability of a high-quality reference genome can improve the detection of the signal of selection by (i) allowing matching the potential candidate loci to linked coding regions under selection, (ii) rapidly moving the investigation to the gene and function and (iii) ensuring that the highly variable regions of the genomes that include functional genes are also investigated. For all those reasons, using reference genomes in genome scan analyses is highly recommended. © 2015 John Wiley & Sons Ltd.

July 7, 2019

Resolving complex structural genomic rearrangements using a randomized approach.

Complex chromosomal rearrangements are structural genomic alterations involving multiple instances of deletions, duplications, inversions, or translocations that co-occur either on the same chromosome or represent different overlapping events on homologous chromosomes. We present SVelter, an algorithm that identifies regions of the genome suspected to harbor a complex event and then resolves the structure by iteratively rearranging the local genome structure, in a randomized fashion, with each structure scored against characteristics of the observed sequencing data. SVelter is able to accurately reconstruct complex chromosomal rearrangements when compared to well-characterized genomes that have been deeply sequenced with both short and long reads.

July 7, 2019

Rapid evolution of citrate utilization by Escherichia coli by direct selection requires citT and dctA.

The isolation of aerobic citrate-utilizing Escherichia coli (Cit(+)) in long-term evolution experiments (LTEE) has been termed a rare, innovative, presumptive speciation event. We hypothesized that direct selection would rapidly yield the same class of E. coli Cit(+) mutants and follow the same genetic trajectory: potentiation, actualization, and refinement. This hypothesis was tested with wild-type E. coli strain B and with K-12 and three K-12 derivatives: an E. coli ΔrpoS::kan mutant (impaired for stationary-phase survival), an E. coli ΔcitT::kan mutant (deleted for the anaerobic citrate/succinate antiporter), and an E. coli ΔdctA::kan mutant (deleted for the aerobic succinate transporter). E. coli underwent adaptation to aerobic citrate metabolism that was readily and repeatedly achieved using minimal medium supplemented with citrate (M9C), M9C with 0.005% glycerol, or M9C with 0.0025% glucose. Forty-six independent E. coli Cit(+) mutants were isolated from all E. coli derivatives except the E. coli ΔcitT::kan mutant. Potentiation/actualization mutations occurred within as few as 12 generations, and refinement mutations occurred within 100 generations. Citrate utilization was confirmed using Simmons, Christensen, and LeMaster Richards citrate media and quantified by mass spectrometry. E. coli Cit(+) mutants grew in clumps and in long incompletely divided chains, a phenotype that was reversible in rich media. Genomic DNA sequencing of four E. coli Cit(+) mutants revealed the required sequence of mutational events leading to a refined Cit(+) mutant. These events showed amplified citT and dctA loci followed by DNA rearrangements consistent with promoter capture events for citT. These mutations were equivalent to the amplification and promoter capture CitT-activating mutations identified in the LTEE. Importance: E. coli cannot use citrate aerobically. Long-term evolution experiments (LTEE) performed by Blount et al. (Z. D. Blount, J. E. Barrick, C. J. Davidson, and R. E. Lenski, Nature 489:513-518, 2012, http://dx.doi.org/10.1038/nature11514) found a single aerobic, citrate-utilizing E. coli strain after 33,000 generations (15 years). This was interpreted as a speciation event. Here we show why it probably was not a speciation event. Using similar media, 46 independent citrate-utilizing mutants were isolated in as few as 12 to 100 generations. Genomic DNA sequencing revealed an amplification of the citT and dctA loci and DNA rearrangements to capture a promoter to express CitT aerobically. These are members of the same class of mutations identified by the LTEE. We conclude that the rarity of the LTEE mutant was an artifact of the experimental conditions and not a unique evolutionary event. No new genetic information (novel gene function) evolved. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

July 7, 2019

Structural insights into DNA sequence recognition by Type ISP restriction-modification enzymes.

Engineering restriction enzymes with new sequence specificity has been an unaccomplished challenge, presumably because of the complexity of target recognition. Here we report detailed analyses of target recognition by Type ISP restriction-modification enzymes. We determined the structure of the Type ISP enzyme LlaGI bound to its target and compared it with the previously reported structure of a close homologue that binds to a distinct target, LlaBIII. The comparison revealed that, although the two enzymes use a very similar set of structural elements for target recognition, the residues that read the bases vary. Change in specificity resulted not only from appropriate substitution of amino acids that contacted the bases but also from new contacts made by positionally distinct residues directly or through a water bridge. Sequence analyses of 552 Type ISP enzymes showed that the structural elements involved in target recognition of LlaGI and LlaBIII were structurally well-conserved but sequentially less-conserved. In addition, the residue positions within these structural elements were under strong evolutionary constraint, highlighting the functional importance of these regions. The comparative study helped decipher a partial consensus code for target recognition by Type ISP enzymes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

July 7, 2019

Third-generation sequencing and the future of genomics

Third-generation long-range DNA sequencing and mapping technologies are creating a renaissance in high-quality genome sequencing. Unlike second-generation sequencing, which produces short reads a few hundred base-pairs long, third-generation single-molecule technologies generate over 10,000 bp reads or map over 100,000 bp molecules. We analyze how increased read lengths can be used to address long-standing problems in de novo genome assembly, structural variation analysis and haplotype phasing.

July 7, 2019

Draft genome sequences of three European laboratory derivatives from enterohemorrhagic Escherichia coli O157:H7 strain EDL933, including two plasmids.

Escherichia coli O157:H7 EDL933, isolated in 1982 in the United States, was the first enterohemorrhagic E. coli (EHEC) strain sequenced. Unfortunately, European labs can no longer receive the original strain. We checked three European EDL933 derivatives and found major genetic deviations (deletions, inversions) in two strains. All EDL933 strains contain the cryptic EHEC-plasmid, not reported before. Copyright © 2016 Fellner et al.

July 7, 2019

Exploring structural variants in environmentally sensitive gene families.

Environmentally sensitive plant gene families, such as NBS-LRRs, receptor kinases, defensins and others, are known to be highly variable. However, most existing strategies for discovering and describing structural variation in complex gene families provide incomplete and imperfect results. The move to de novo genome assemblies for multiple accessions or individuals within a species is enabling more comprehensive and accurate insights about gene family variation. Earlier array-based genome hybridization and sequence-based read mapping methods were limited by their reliance on a reference genome and by misplacement of paralogous sequences. Variant discovery based on de novo genome assemblies overcomes the problems arising from a reference genome and reduces sequence misplacement. As de novo genome sequencing moves to the use of longer reads, artifacts will be minimized, intact tandem gene clusters will be constructed accurately, and insights into rapid evolution will become feasible. Copyright © 2016 Elsevier Ltd. All rights reserved.

July 7, 2019

The Atlantic salmon genome provides insights into rediploidization.

The whole-genome duplication 80 million years ago of the common ancestor of salmonids (salmonid-specific fourth vertebrate whole-genome duplication, Ss4R) provides unique opportunities to learn about the evolutionary fate of a duplicated vertebrate genome in 70 extant lineages. Here we present a high-quality genome assembly for Atlantic salmon (Salmo salar), and show that large genomic reorganizations, coinciding with bursts of transposon-mediated repeat expansions, were crucial for the post-Ss4R rediploidization process. Comparisons of duplicate gene expression patterns across a wide range of tissues with orthologous genes from a pre-Ss4R outgroup unexpectedly demonstrate far more instances of neofunctionalization than subfunctionalization. Surprisingly, we find that genes that were retained as duplicates after the teleost-specific whole-genome duplication 320 million years ago were not more likely to be retained after the Ss4R, and that the duplicate retention was not influenced to a great extent by the nature of the predicted protein interactions of the gene products. Finally, we demonstrate that the Atlantic salmon assembly can serve as a reference sequence for the study of other salmonids for a range of purposes.

July 7, 2019

Conservation of the essential genome among Caulobacter and Brevundimonas species.

When the genomes of Caulobacter isolates NA1000 and K31 were compared, numerous genome rearrangements were observed. In contrast, similar comparisons of closely related species of other bacterial genera revealed nominal rearrangements. A phylogenetic analysis of the 16S rRNA indicated that K31 is more closely related to Caulobacter henricii CB4 than to other known Caulobacters. Therefore, we sequenced the CB4 genome and compared it to all of the available Caulobacter genomes to study genome rearrangements, discern the conservation of the NA1000 essential genome, and address concerns about using 16S rRNA to group Caulobacter species. We also sequenced a novel bacterium, Brevundimonas DS20, a representative of the genus most closely related to Caulobacter, and used it as part of an outgroup for phylogenetic comparisons. We expected to find that there would be fewer rearrangements when comparing more closely related Caulobacters. However, we found that relatedness was not correlated with the amount of observed “genome scrambling.” We also discovered that nearly all of the essential genes previously identified for C. crescentus are present in the other Caulobacter genomes and in the Brevundimonas genomes as well. However, a few of these essential genes were only found in NA1000, and some were missing in a combination of one or more species, while other proteins were 100% identical across species. Also, phylogenetic comparisons of highly conserved genomic regions revealed clades similar to those identified by 16S rRNA-based phylogenies, verifying that 16S rRNA sequence comparisons are a valid method for grouping Caulobacters.

July 7, 2019

Gene duplication confers enhanced expression of 27-kDa γ-zein for endosperm modification in quality protein maize.

The maize opaque2 (o2) mutant has a high nutritional value but it develops a chalky endosperm that limits its practical use. Genetic selection for o2 modifiers can convert the normally chalky endosperm of the mutant into a hard, vitreous phenotype, yielding what is known as quality protein maize (QPM). Previous studies have shown that enhanced expression of 27-kDa γ-zein in QPM is essential for endosperm modification. Taking advantage of genome-wide association study analysis of a natural population, linkage mapping analysis of a recombinant inbred line population, and map-based cloning, we identified a quantitative trait locus (qγ27) affecting expression of 27-kDa γ-zein. qγ27 was mapped to the same region as the major o2 modifier (o2 modifier1) on chromosome 7 near the 27-kDa γ-zein locus. qγ27 resulted from a 15.26-kb duplication at the 27-kDa γ-zein locus, which increases the level of gene expression. This duplication occurred before maize domestication; however, the gene structure of qγ27 appears to be unstable and DNA rearrangement frequently occurs at this locus. Because enhanced expression of 27-kDa γ-zein is critical for endosperm modification in QPM, qγ27 is expected to be under artificial selection. This discovery provides a useful molecular marker that can be used to accelerate QPM breeding.
