Variant detection Archives

September 22, 2019

Genomic structural variations within five continental populations of Drosophila melanogaster.

Chromosomal structural variations (SV) including insertions, deletions, inversions, and translocations occur within the genome and can have a significant effect on organismal phenotype. Some of these effects are caused by structural variations containing genes. Large structural variations represent a significant amount of the genetic diversity within a population. We used a global sampling of Drosophila melanogaster (Ithaca, Zimbabwe, Beijing, Tasmania, and Netherlands) to represent diverse populations within the species. We used long-read sequencing and optical mapping technologies to identify SVs in these genomes. Among the five lines examined, we found an average of 2,928 structural variants within these genomes. These structural variations varied greatly in size and location, included many exonic regions, and could impact adaptation and genomic evolution. Copyright © 2018 Long et al.

September 22, 2019

How long are long tandem repeats? A challenge for current methods of whole-genome sequence assembly: The case of satellites in Caenorhabditis elegans.

Repetitive genome regions have been difficult to sequence, mainly because of the comparatively small size of the fragments used in assembly. Satellites or tandem repeats are very abundant in nematodes and offer an excellent playground to evaluate different assembly methods. Here, we compare the structure of satellites found in three different assemblies of the Caenorhabditis elegans genome: the original sequence obtained by Sanger sequencing, an assembly based on PacBio technology, and an assembly using Nanopore sequencing reads. In general, satellites were found in equivalent genomic regions, but the new long-read methods (PacBio and Nanopore) tended to result in longer assembled satellites. Important differences exist between the assemblies resulting from the two long-read technologies, such as the sizes of long satellites. Our results also suggest that the lengths of some annotated genes with internal repeats which were assembled using Sanger sequencing are likely to be incorrect.

September 22, 2019

Targeted genotyping of variable number tandem repeats with adVNTR.

Whole-genome sequencing is increasingly used to identify Mendelian variants in clinical pipelines. These pipelines focus on single-nucleotide variants (SNVs) and also structural variants, while ignoring more complex repeat sequence variants. Here, we consider the problem of genotyping Variable Number Tandem Repeats (VNTRs), composed of inexact tandem duplications of short (6-100 bp) repeating units. VNTRs span 3% of the human genome, are frequently present in coding regions, and have been implicated in multiple Mendelian disorders. Although existing tools recognize VNTR carrying sequence, genotyping VNTRs (determining repeat unit count and sequence variation) from whole-genome sequencing reads remains challenging. We describe a method, adVNTR, that uses hidden Markov models to model each VNTR, count repeat units, and detect sequence variation. adVNTR models can be developed for short-read (Illumina) and single-molecule (Pacific Biosciences [PacBio]) whole-genome and whole-exome sequencing, and show good results on multiple simulated and real data sets. © 2018 Bakhtiari et al.; Published by Cold Spring Harbor Laboratory Press.
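To make the genotyping task concrete, the following is a minimal Python sketch of counting inexact repeat units in a read. It is not adVNTR's hidden Markov model: it swaps in a much simpler greedy scan that tolerates substitutions only (no indels), and the function names, the example read, and the `max_mismatches` threshold are all assumptions made for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_repeat_units(seq: str, unit: str, max_mismatches: int = 1) -> int:
    """Longest run of consecutive inexact copies of `unit` found in `seq`.

    Simplification (assumption): each copy may differ from `unit` by at most
    `max_mismatches` substitutions; insertions and deletions are not modelled.
    """
    k = len(unit)
    best = 0
    for start in range(len(seq) - k + 1):
        count = 0
        pos = start
        # Extend the run one unit at a time while the next window still matches.
        while pos + k <= len(seq) and hamming(seq[pos:pos + k], unit) <= max_mismatches:
            count += 1
            pos += k
        best = max(best, count)
    return best


# Hypothetical read: flank + four copies of a 6 bp unit (third copy has one
# substitution) + flank.
read = "TTAGC" + "ACGTGA" + "ACGTGA" + "ACCTGA" + "ACGTGA" + "GGCAT"
print(count_repeat_units(read, "ACGTGA"))                    # 4
print(count_repeat_units(read, "ACGTGA", max_mismatches=0))  # 2
```

With exact matching the run is broken at the variant copy, which is one reason a probabilistic model that tolerates sequence variation within units is attractive for this problem.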

September 22, 2019

Combining probabilistic alignments with read pair information improves accuracy of split-alignments.

Split-alignments provide base-pair-resolution evidence of genomic rearrangements. In practice, they are found by first computing high-scoring local alignments, parts of which are then combined into a split-alignment. This approach is challenging when aligning a short read to a large and repetitive reference, as it tends to produce many spurious local alignments leading to ambiguities in identifying the correct split-alignment. This problem is further exacerbated by the fact that rearrangements tend to occur in repeat-rich regions. We propose a split-alignment technique that combats the issue of ambiguous alignments by combining information from probabilistic alignment with positional information from paired-end reads. We demonstrate that our method finds accurate split-alignments, and that this translates into improved performance of variant-calling tools that rely on split-alignments. An open-source implementation is freely available at: https://bitbucket.org/splitpairedend/last-split-pe. Supplementary data are available at Bioinformatics online.

September 22, 2019

Computational tools to unmask transposable elements.

A substantial proportion of the genome of many species is derived from transposable elements (TEs). Moreover, through various self-copying mechanisms, TEs continue to proliferate in the genomes of most species. TEs have contributed numerous regulatory, transcript and protein innovations and have also been linked to disease. However, notwithstanding their demonstrated impact, many genomic studies still exclude them because their repetitive nature results in various analytical complexities. Fortunately, a growing array of methods and software tools are being developed to cater for them. This Review presents a summary of computational resources for TEs and highlights some of the challenges and remaining gaps to perform comprehensive genomic analyses that do not simply ‘mask’ repeats.

September 22, 2019

TranSurVeyor: an improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data.

Transpositions transfer DNA segments between different loci within a genome; in particular, when a transposition is found in a sample but not in a reference genome, it is called a non-reference transposition. They are important structural variations that have clinical impact. Transpositions can be called by analyzing second generation high-throughput sequencing datasets. Current methods follow either a database-based or a database-free approach. Database-based methods require a database of transposable elements. Some of them have good specificity; however this approach cannot detect novel transpositions, and it requires a good database of transposable elements, which is not yet available for many species. Database-free methods perform de novo calling of transpositions, but their accuracy is low. We observe that this is due to the misalignment of the reads; since reads are short and the human genome has many repeats, false alignments create false positive predictions while missing alignments reduce the true positive rate. This paper proposes new techniques to improve database-free non-reference transposition calling: first, we propose a realignment strategy called one-end remapping that corrects the alignments of reads in interspersed repeats; second, we propose a SNV-aware filter that removes some incorrectly aligned reads. By combining these two techniques and other techniques like clustering and positive-to-negative ratio filter, our proposed transposition caller TranSurVeyor shows at least 3.1-fold improvement in terms of F1-score over existing database-free methods. More importantly, even though TranSurVeyor does not use databases of prior information, its performance is at least as good as existing database-based methods such as MELT, Mobster and Retroseq. We also illustrate that TranSurVeyor can discover transpositions that are not known in the current database.
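The abstract compares callers by F1-score and attributes low accuracy to false positives and missed true positives. As a reminder of how those quantities combine, here is a small Python sketch of the standard F1 computation. The counts in the usage example are invented for illustration and are not results from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from raw call counts.

    tp: calls that match a true transposition
    fp: calls with no matching true transposition (hurts precision)
    fn: true transpositions that were not called (hurts recall)
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two callers evaluated on the same 100 true events.
baseline = f1_score(tp=20, fp=60, fn=80)   # many false and missed calls
improved = f1_score(tp=80, fp=20, fn=20)
print(f"baseline F1 = {baseline:.3f}, improved F1 = {improved:.3f}")
print(f"fold improvement = {improved / baseline:.1f}")
```

Because F1 is a harmonic mean, a caller must reduce both false positives and missed calls to score well; improving only one of them moves the score much less.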

September 22, 2019

Full-length extension of HLA allele sequences by HLA allele-specific hemizygous Sanger sequencing (SSBT).

The gold standard for typing at the allele level of the highly polymorphic Human Leucocyte Antigen (HLA) gene system is sequence-based typing. Since sequencing strategies have mainly focused on identification of the peptide binding groove, full-length sequence information is lacking for >90% of the HLA alleles. One of the goals of the 17th IHIWS workshop is to establish full-length sequences for as many HLA alleles as possible. In our component “Extension of HLA sequences by full-length HLA allele-specific hemizygous Sanger sequencing” we have used full-length hemizygous Sanger Sequence Based Typing to achieve this goal. We selected samples for which full-length sequences were not available in the IPD-IMGT/HLA database. In total we have generated the full-length sequences of 48 HLA-A, 45 -B and 31 -C alleles. For HLA-A extended alleles, 39/48 showed no intron differences compared to the first allele of the corresponding allele group; for HLA-B this was 26/45 and for HLA-C 20/31. Comparing the intron sequences to other alleles of the same allele group revealed that in 5/48 HLA-A, 16/45 HLA-B and 8/31 HLA-C alleles the intron sequence was identical to another allele of the same allele group. In the remaining 10 cases, the sequence either showed polymorphism at a conserved nucleotide or was the result of a gene conversion event. Elucidation of the full-length sequence gives insight into the polymorphic content of the alleles and facilitates the identification of its evolutionary origin. Copyright © 2018 American Society for Histocompatibility and Immunogenetics. All rights reserved.
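The allele counts in this abstract can be tallied as a quick consistency check. The Python sketch below uses only the numbers quoted above; the per-locus breakdown of the "remaining" alleles is derived here under the assumption that the two intron categories do not overlap, and is not stated in the abstract itself.

```python
# Counts quoted in the abstract, per locus.
totals = {"HLA-A": 48, "HLA-B": 45, "HLA-C": 31}
same_as_first_allele = {"HLA-A": 39, "HLA-B": 26, "HLA-C": 20}
same_as_other_allele = {"HLA-A": 5, "HLA-B": 16, "HLA-C": 8}

# Assumption: the two categories are disjoint, so the rest are the
# "remaining" cases (conserved-site polymorphism or gene conversion).
remaining = {
    locus: totals[locus] - same_as_first_allele[locus] - same_as_other_allele[locus]
    for locus in totals
}

print(sum(totals.values()))     # 124 alleles sequenced in total
print(remaining)                # {'HLA-A': 4, 'HLA-B': 3, 'HLA-C': 3}
print(sum(remaining.values()))  # 10, matching "the remaining 10 cases"
```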

September 22, 2019

Full gene HLA class I sequences of 79 novel and 519 mostly uncommon alleles from a large United States registry population.

HLA class I assignments were obtained at single genotype, G-level resolution from 98,855 volunteers for an unrelated donor registry in the United States. In spite of the diverse ancestry of the volunteers, over 99% of the assignments at each locus are common. Within this population, 52 novel alleles differing in exons 2 and 3 are identified and characterized. Previously reported alleles with incomplete sequences in the IPD-IMGT/HLA database (n = 519) were selected for full gene sequencing and, from this sampling, another 27 novel alleles are described. © 2018 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

September 22, 2019

3D molecular cytology of Hop (Humulus lupulus) meiotic chromosomes reveals non-disomic pairing and segregation, aneuploidy, and genomic structural variation.

Hop (Humulus lupulus L.) is an important crop worldwide, known as the main flavoring ingredient in beer. The diversifying brewing industry demands variation in flavors, superior process properties, and sustainable agronomics, which are the focus of advanced molecular breeding efforts in hops. Hop breeders have been limited in their ability to create strains with desirable traits, however, because of the unusual and unpredictable inheritance patterns and associated non-Mendelian genetic marker segregation. Cytogenetic analysis of meiotic chromosome behavior has also revealed conspicuous and prevalent occurrences of multiple, atypical, non-disomic chromosome complexes, including those involving autosomes in late prophase. To explore the role of meiosis in segregation distortion, we undertook 3D cytogenetic analysis of hop pollen mother cells stained with DAPI and FISH. We used telomere FISH to demonstrate that hop exhibits a normal telomere clustering bouquet. We also identified and characterized a new sub-terminal 180 bp satellite DNA tandem repeat family called HSR0, located proximal to telomeres. Highly variable 5S rDNA FISH patterns within and between plants, together with the detection of anaphase chromosome bridges, reflect extensive departures from normal disomic signal composition and distribution. Subsequent FACS analysis revealed variable DNA content in a cultivated pedigree. Together, these findings implicate multiple phenomena, including aneuploidy, segmental aneuploidy, or chromosome rearrangements, as contributing factors to segregation distortion in hop.

September 22, 2019

Noise-Cancelling Repeat Finder: Uncovering tandem repeats in error-prone long-read sequencing data

Tandem DNA repeats can be sequenced with long-read technologies, but cannot be accurately deciphered due to the lack of computational tools taking high error rates of these technologies into account. Here we introduce Noise-Cancelling Repeat Finder (NCRF) to uncover putative tandem repeats of specified motifs in noisy long reads produced by Pacific Biosciences and Oxford Nanopore sequencers. Using simulations, we validated the use of NCRF to locate tandem repeats with motifs of various lengths and demonstrated its superior performance as compared to two alternative tools. Using real human whole-genome sequencing data, NCRF identified long arrays of the (AATGG)n repeat involved in heat shock stress response.

September 22, 2019

Diagnostic and Therapeutic Strategies for Fluoropyrimidine Treatment of Patients Carrying Multiple DPYD Variants.

DPYD genotyping prior to fluoropyrimidine treatment is increasingly implemented in clinical care. Without phasing information (i.e., allelic location of variants), current genotype-based dosing guidelines cannot be applied to patients carrying multiple DPYD variants. The primary aim of this study is to examine diagnostic and therapeutic strategies for fluoropyrimidine treatment of patients carrying multiple DPYD variants. A case series of patients carrying multiple DPYD variants is presented. Different genotyping techniques were used to determine phasing information. Phenotyping was performed by dihydropyrimidine dehydrogenase (DPD) enzyme activity measurements. Publicly available databases were queried to explore the frequency and phasing of variants of patients carrying multiple DPYD variants. Four out of seven patients carrying multiple DPYD variants received a full dose of fluoropyrimidines and experienced severe toxicity. Phasing information could be retrieved for four patients. In three patients, variants were located on two different alleles, i.e., in trans. Recommended dose reductions based on the phased genotype differed from the phenotype-derived dose reductions in three out of four cases. Data from publicly available databases show that the frequency of patients carrying multiple DPYD variants is low (< 0.2%), but higher than the frequency of the commonly tested DPYD*13 variant (0.1%). Patients carrying multiple DPYD variants are at high risk of developing severe toxicity. Additional analyses are required to determine the correct dose of fluoropyrimidine treatment. In patients carrying multiple DPYD variants, we recommend that a DPD phenotyping assay be carried out to determine a safe starting dose.
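The distinction between variants in cis (same allele) and in trans (different alleles) can be illustrated with phased genotype strings. The Python sketch below assumes VCF-style `GT` values, where `|` marks a phased genotype and `/` an unphased one, and assumes both variants belong to the same phase set. The function name is hypothetical, and the sketch only classifies phase; it does not derive any dosing recommendation.

```python
def phase_relation(gt_a: str, gt_b: str) -> str:
    """Classify two heterozygous genotypes as 'cis', 'trans', or 'unknown'.

    Assumes VCF-style GT strings ('0|1', '1|0', '0/1') for two variants in
    the same phase set; anything else is reported as 'unknown'.
    """
    if "|" not in gt_a or "|" not in gt_b:
        return "unknown"  # at least one genotype is unphased
    a = gt_a.split("|")
    b = gt_b.split("|")
    if sorted(a) != ["0", "1"] or sorted(b) != ["0", "1"]:
        return "unknown"  # only simple heterozygous calls are handled
    # Same haplotype carries both alternate alleles -> cis; otherwise trans.
    return "cis" if a == b else "trans"


print(phase_relation("0|1", "0|1"))  # cis
print(phase_relation("0|1", "1|0"))  # trans
print(phase_relation("0/1", "0/1"))  # unknown
```

The `unknown` case corresponds to the situation described in the abstract, where a patient carries multiple variants but the allelic location cannot be determined from the genotype alone.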

September 22, 2019

Analysis of structural variants in four African cichlids highlights an association with developmental and immune related genes

African Lake cichlids are one of the most impressive examples of adaptive radiation. Independently in Lakes Victoria, Tanganyika, and Malawi, several hundreds of species arose within the last 10 million to 100,000 years. Whereas most analyses in cichlids focused on nucleotide substitutions across species to investigate the genetic bases of this explosive radiation, to date, no study has investigated the contribution of structural variants (SVs) to speciation events (through a reduction of gene flow) and adaptation to different ecological niches. Here, we annotate and characterize the repertoires and evolutionary potential of different SV classes (deletions, duplications, inversions, insertions and translocations) in five cichlid species (Astatotilapia burtoni, Metriaclima zebra, Neolamprologus brichardi, Pundamilia nyererei and Oreochromis niloticus). We investigate the patterns of gain/loss evolution across the phylogeny for each SV type, enabling the identification of both lineage-specific events and a set of conserved SVs, common to all four species in the radiation. Both deletion and inversion events show a significant overlap with SINE elements, while inversions additionally show a limited, but significant association with DNA transposons. Genes lying inside inverted regions are enriched for genes regulating behaviour, or involved in skeletal and visual system development. Moreover, we find that duplicated genes show enrichment for ‘antigen processing and presentation’ (GO:0019882) and other immune related categories. Altogether, we provide the first, comprehensive overview of rearrangement evolution in East African Cichlids, and some initial insights into their possible contribution to adaptation.

September 22, 2019

Two novel alleles, HLA-A*32:01:01:09 and 32:01:01:10, identified by Pacific Bioscience’s SMRT sequencing.

September 22, 2019

Report from the Killer-cell Immunoglobulin-like Receptors (KIR) component of the 17th International HLA and Immunogenetics Workshop.

The goals of the KIR component of the 17th International HLA and Immunogenetics Workshop (IHIW) were to encourage and educate researchers to begin analyzing KIR at allelic resolution, and to survey the nature and extent of KIR allelic diversity across human populations. To represent worldwide diversity, we analyzed 1269 individuals from ten populations, focusing on the most polymorphic KIR genes, which express receptors having three immunoglobulin (Ig)-like domains (KIR3DL1/S1, KIR3DL2 and KIR3DL3). We identified 13 novel alleles of KIR3DL1/S1, 13 of KIR3DL2 and 18 of KIR3DL3. Previously identified alleles, corresponding to 33 alleles of KIR3DL1/S1, 38 of KIR3DL2, and 43 of KIR3DL3, represented over 90% of the observed allele frequencies for these genes. In total we observed 37 KIR3DL1/S1 allotypes, 40 for KIR3DL2 and 44 for KIR3DL3. As KIR allotype diversity can affect NK cell function, this demonstrates potential for high functional diversity worldwide. Allelic variation further diversifies KIR haplotypes. We determined KIR3DL3 ~ KIR3DL1/S1 ~ KIR3DL2 haplotypes from five of the studied populations, and observed multiple population-specific haplotypes in each. This included 234 distinct haplotypes in European Americans, 191 in Ugandans, 35 in Papuans, 95 in Egyptians and 86 in Spanish populations. For another 35 populations, encompassing 642,105 individuals we focused on KIR3DL2 and identified another 375 novel alleles, with approximately half of them observed in more than one individual. The KIR allelic level data gathered from this project represents the most comprehensive summary of global KIR allelic diversity to date, and continued analysis will improve understanding of KIR allelic polymorphism in global populations. Further, the wealth of new data gathered in the course of this workshop component highlights the value of collaborative, community-based efforts in immunogenetics research, exemplified by the IHIW. Copyright © 2018. Published by Elsevier Inc.

September 22, 2019

Long-read sequencing technology indicates genome-wide effects of non-B DNA on polymerization speed and error rate.

DNA conformation may deviate from the classical B-form in ~13% of the human genome. Non-B DNA regulates many cellular processes; however, its effects on DNA polymerization speed and accuracy have not been investigated genome-wide. Such an inquiry is critical for understanding neurological diseases and cancer genome instability. Here, we present the first simultaneous examination of DNA polymerization kinetics and errors in the human genome sequenced with Single-Molecule Real-Time (SMRT) technology. We show that polymerization speed differs between non-B and B-DNA: It decelerates at G-quadruplexes and fluctuates periodically at disease-causing tandem repeats. Analyzing polymerization kinetics profiles, we predict and validate experimentally non-B DNA formation for a novel motif. We demonstrate that several non-B motifs affect sequencing errors (e.g., G-quadruplexes increase error rates), and that sequencing errors are positively associated with polymerase slowdown. Finally, we show that highly divergent G4 motifs have pronounced polymerization slowdown and high sequencing error rates, suggesting similar mechanisms for sequencing errors and germline mutations. © 2018 Guiblet et al.; Published by Cold Spring Harbor Laboratory Press.

