Phasing Archives - Page 30 of 30

July 7, 2019

HECIL: A Hybrid Error Correction Algorithm for Long Reads with Iterative Learning.

Second-generation DNA sequencing techniques generate short reads that can result in fragmented genome assemblies. Third-generation sequencing platforms mitigate this limitation by producing longer reads that span across complex and repetitive regions. However, the usefulness of such long reads is limited because of high sequencing error rates. To exploit the full potential of these longer reads, it is imperative to correct the underlying errors. We propose HECIL (Hybrid Error Correction with Iterative Learning), a hybrid error correction framework that determines a correction policy for erroneous long reads, based on optimal combinations of decision weights obtained from short read alignments. We demonstrate that HECIL outperforms state-of-the-art error correction algorithms for an overwhelming majority of evaluation metrics on diverse, real-world data sets including E. coli, S. cerevisiae, and the malaria vector mosquito A. funestus. Additionally, we provide an optional avenue of improving the performance of HECIL’s core algorithm by introducing an iterative learning paradigm that enhances the correction policy at each iteration by incorporating knowledge gathered from previous iterations via data-driven confidence metrics assigned to prior corrections.
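To make the idea of correcting a long read from short-read alignments concrete, here is a minimal sketch. It is not HECIL's method: it swaps in a plain majority vote for the optimized decision weights described in the abstract, handles substitutions only, and assumes gap-free alignments given as hypothetical (offset, sequence) pairs. The function name and the `min_support` parameter are illustrative assumptions.

```python
from collections import Counter


def correct_long_read(long_read, short_read_alignments, min_support=2):
    """Majority-vote substitution correction (illustrative, not HECIL itself).

    short_read_alignments: list of (offset, sequence) pairs, where offset is
    the 0-based position on the long read at which a short read aligns
    without gaps (an assumption made to keep the sketch small).
    """
    # One counter of observed short-read bases per long-read position.
    pileup = [Counter() for _ in long_read]
    for offset, seq in short_read_alignments:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(long_read):
                pileup[pos][base] += 1

    corrected = []
    for original, counts in zip(long_read, pileup):
        if counts:
            base, support = counts.most_common(1)[0]
            # Replace only when the most frequent base disagrees with the
            # long read and is backed by enough short reads.
            if base != original and support >= min_support:
                corrected.append(base)
                continue
        corrected.append(original)
    return "".join(corrected)
```

For example, a long read `ACGTTCGA` covered by short reads `GTAC` at offset 2, `TACG` at offset 3 and `ACGA` at offset 4 would have position 4 changed from `T` to `A`. An iterative scheme like the one the abstract describes would additionally track how confident each correction is and feed that back into later rounds, which this sketch does not attempt.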

July 7, 2019

TriPoly: haplotype estimation for polyploids using sequencing data of related individuals.

Knowledge of haplotypes, i.e. phased and ordered marker alleles on a chromosome, is essential to answer many questions in genetics and genomics. By generating short pieces of DNA sequence, high-throughput modern sequencing technologies make estimation of haplotypes possible for single individuals. In polyploids, however, haplotype estimation methods usually require deep coverage to achieve sufficient accuracy. This often renders sequencing-based approaches too costly to be applied to large populations needed in studies of Quantitative Trait Loci. We propose a novel haplotype estimation method for polyploids, TriPoly, that combines sequencing data with Mendelian inheritance rules to infer haplotypes in parent-offspring trios. Using realistic simulations of both short and long-read sequencing data for banana (Musa acuminata) and potato (Solanum tuberosum) trios, we show that TriPoly yields more accurate progeny haplotypes at low coverages compared to existing methods that work on single individuals. We also apply TriPoly to phase Single Nucleotide Polymorphisms on chromosome 5 for a family of tetraploid potato with 2 parents and 37 offspring sequenced with an RNA capture approach. We show that TriPoly haplotype estimates differ from those of the other methods mainly in regions with imperfect sequencing or mapping difficulties, as it does not rely solely on sequence reads and aims to avoid phasings that are not likely to have been passed from the parents to the offspring. TriPoly has been implemented in Python 3.5.2 (also compatible with Python 2.7.3 and higher) and can be freely downloaded at https://github.com/EhsanMotazedi/TriPoly. Supplementary data are available at Bioinformatics online.
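The Mendelian constraint mentioned above can be illustrated with a small check. This is a sketch under simplifying assumptions, not TriPoly's algorithm: haplotypes are plain strings of alleles, recombination within the phased block is ignored, and each parent is assumed to contribute exactly half of the child's haplotypes. The function name `mendelian_compatible` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def mendelian_compatible(mother, father, child):
    """Return True if the child's haplotypes can be formed by taking half
    from the mother and half from the father (no recombination assumed)."""
    ploidy = len(child)
    if ploidy % 2 or len(mother) != ploidy or len(father) != ploidy:
        raise ValueError("expected equal, even ploidy in all three individuals")
    half = ploidy // 2
    target = Counter(child)
    father_counts = Counter(father)

    for maternal_gamete in combinations(mother, half):
        # Counter subtraction drops non-positive counts, so the remainder has
        # exactly `half` haplotypes only if the gamete is contained in the child.
        remainder = target - Counter(maternal_gamete)
        if sum(remainder.values()) != half:
            continue
        # The remaining haplotypes must all be available in the father.
        if all(father_counts[h] >= n for h, n in remainder.items()):
            return True
    return False
```

For a tetraploid trio, a candidate child phasing that takes three haplotypes from one parent and one from the other would be rejected by this check, which is the kind of phasing the abstract says the method aims to avoid.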

July 7, 2019

Meeting report: mobile genetic elements and genome plasticity 2018

The Mobile Genetic Elements and Genome Plasticity conference was hosted by Keystone Symposia in Santa Fe, NM USA, February 11–15, 2018. The organizers were Marlene Belfort, Evan Eichler, Henry Levin and Lynn Maquat. The goal of this conference was to bring together scientists from around the world to discuss the function of transposable elements and their impact on host species. Central themes of the meeting included recent innovations in genome analysis and the role of mobile DNA in disease and evolution. The conference included 200 scientists who participated in poster presentations, short talks selected from abstracts, and invited talks. A total of 58 talks were organized into eight sessions and two workshops. The topics varied from mechanisms of mobilization, to the structure of genomes and their defense strategies to protect against transposable elements.

July 7, 2019

Fast-SG: an alignment-free algorithm for hybrid assembly.

Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms the state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878). Fast-SG opens a door to achieve accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.
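One way to picture an alignment-free route to a scaffolding graph is exact k-mer lookup. The sketch below is an illustration only and makes assumptions the abstract does not state: it indexes k-mers that occur in a single contig, places each read of a pair on the contig with the most k-mer hits, and counts pairs that bridge two contigs as edge support. Reverse complements, orientation and insert-size estimation are ignored, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def index_unique_kmers(contigs, k):
    """Map each k-mer that occurs in exactly one contig to that contig's name."""
    seen = defaultdict(set)
    for name, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            seen[seq[i:i + k]].add(name)
    return {kmer: next(iter(names)) for kmer, names in seen.items() if len(names) == 1}


def locate(read, index, k):
    """Return the contig with the most exact k-mer hits for this read, or None."""
    hits = Counter(
        index[read[i:i + k]]
        for i in range(len(read) - k + 1)
        if read[i:i + k] in index
    )
    return hits.most_common(1)[0][0] if hits else None


def scaffold_edges(read_pairs, contigs, k):
    """Count read pairs whose two ends land on different contigs."""
    index = index_unique_kmers(contigs, k)
    edges = Counter()
    for left, right in read_pairs:
        a, b = locate(left, index, k), locate(right, index, k)
        if a is not None and b is not None and a != b:
            edges[tuple(sorted((a, b)))] += 1
    return edges
```

Because the lookup is a dictionary access rather than an alignment, the same routine could in principle be fed synthetic pairs cut from long reads, which is in the spirit of the abstract's claim that the graph can be built from either short or long reads.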

July 7, 2019

Clustering of circular consensus sequences: accurate error correction and assembly of single molecule real-time reads from multiplexed amplicon libraries.

Targeted resequencing with high-throughput sequencing (HTS) platforms can be used to efficiently interrogate the genomes of large numbers of individuals. A critical issue for research and applications using HTS data, especially from long-read platforms, is error in base calling arising from technological limits and bioinformatic algorithms. We found that the community standard long amplicon analysis (LAA) module from Pacific Biosciences is prone to substantial bioinformatic errors that raise concerns about findings based on this pipeline, prompting the need for a new method. A single molecule real-time (SMRT) sequencing-error correction and assembly pipeline, C3S-LAA, was developed for libraries of pooled amplicons. By uniquely leveraging the structure of SMRT sequence data (comprised of multiple low quality subreads from which higher quality circular consensus sequences are formed) to cluster raw reads, C3S-LAA produced accurate consensus sequences and assemblies of overlapping amplicons from single sample and multiplexed libraries. In contrast, despite read depths in excess of 100X per amplicon, the standard long amplicon analysis module from Pacific Biosciences generated unexpected numbers of amplicon sequences with substantial inaccuracies in the consensus sequences. A bootstrap analysis showed that the C3S-LAA pipeline per se was effective at removing bioinformatic sources of error, but in rare cases a read depth of nearly 400X was not sufficient to overcome minor but systematic errors inherent to amplification or sequencing. C3S-LAA uses a divide and conquer processing algorithm for SMRT amplicon-sequence data that generates accurate consensus sequences and local sequence assemblies. Solving the confounding bioinformatic source of error in LAA allowed for the identification of limited instances of errors due to DNA amplification or sequencing of homopolymeric nucleotide tracts. For research and development in genomics, C3S-LAA allows meaningful conclusions and biological inferences to be made from accurately polished sequence output.

July 7, 2019

Implementation of pharmacogenomics in everyday clinical settings.

Currently, germline pharmacogenomics (PGx) is successfully implemented within certain specialties in clinical care. Integrating PGx into pharmacotherapy involves multiple stakeholders, who are identified in this chapter. Clinically relevant pharmacogenes with their related PGx tests are discussed, along with diagnostic test criteria to guide clinicians and policy makers in PGx test selection. The chapter further reviews the similarities and differences between the guidelines of the Dutch Pharmacogenetics Working Group and the Clinical Pharmacogenetics Implementation Consortium, both of which support healthcare professionals in understanding PGx test results and help guide pharmacotherapy by providing evidence-based dosing recommendations. Finally, clinical studies that provide scientific evidence and information on cost-effectiveness supporting the implementation of PGx in clinical care are discussed, along with the remaining barriers to adoption of PGx testing by healthcare professionals. © 2018 Elsevier Inc. All rights reserved.

July 7, 2019

Traditional Norwegian kveik are a genetically distinct group of domesticated Saccharomyces cerevisiae brewing yeasts.

The widespread production of fermented food and beverages has resulted in the domestication of Saccharomyces cerevisiae yeasts specifically adapted to beer production. While there is evidence beer yeast domestication was accelerated by industrialization of beer, there also exists a farmhouse brewing culture in western Norway which has passed down yeasts referred to as kveik for generations. This practice has resulted in ale yeasts which are typically highly flocculant, phenolic off flavor negative (POF-), and exhibit a high rate of fermentation, similar to previously characterized lineages of domesticated yeast. Additionally, kveik yeasts are reportedly high-temperature tolerant, likely due to the traditional practice of pitching yeast into warm (>28°C) wort. Here, we characterize kveik yeasts from 9 different Norwegian sources via PCR fingerprinting, whole genome sequencing of selected strains, phenotypic screens, and lab-scale fermentations. Phylogenetic analysis suggests that kveik yeasts form a distinct group among beer yeasts. Additionally, we identify a novel POF- loss-of-function mutation, as well as SNPs and CNVs potentially relevant to the thermotolerance, high ethanol tolerance, and high fermentation rate phenotypes of kveik strains. We also identify domestication markers related to flocculation in kveik. Taken together, the results suggest that Norwegian kveik yeasts are a genetically distinct group of domesticated beer yeasts with properties highly relevant to the brewing sector.

July 7, 2019

Recombination hotspots in an extended human pseudoautosomal domain predicted from double-strand break maps and characterized by sperm-based crossover analysis.

The human X and Y chromosomes are heteromorphic but share a region of homology at the tips of their short arms, pseudoautosomal region 1 (PAR1), that supports obligate crossover in male meiosis. Although the boundary between pseudoautosomal and sex-specific DNA has traditionally been regarded as conserved among primates, it was recently discovered that the boundary position varies among human males, due to a translocation of ~110 kb from the X to the Y chromosome that creates an extended PAR1 (ePAR). This event has occurred at least twice in human evolution. So far, only limited evidence has been presented to suggest this extension is recombinationally active. Here, we sought direct proof by examining thousands of gametes from each of two ePAR-carrying men, for two subregions chosen on the basis of previously published male X-chromosomal meiotic double-strand break (DSB) maps. Crossover activity comparable to that seen at autosomal hotspots was observed between the X and the ePAR borne on the Y chromosome both at a distal and a proximal site within the 110-kb extension. Other hallmarks of classic recombination hotspots included evidence of transmission distortion and GC-biased gene conversion. We observed good correspondence between the male DSB clusters and historical recombination activity of this region in the X chromosomes of females, as ascertained from linkage disequilibrium analysis; this suggests that this region is similarly primed for crossover in both male and female germlines, although sex-specific differences may also exist. Extensive resequencing and inference of ePAR haplotypes, placed in the framework of the Y phylogeny as ascertained by both Y microsatellites and single nucleotide polymorphisms, allowed us to estimate a minimum rate of crossover over the entire ePAR region 6-fold greater than the genome average, comparable with pedigree estimates of PAR1 activity generally. We conclude that ePAR very likely contributes to the critical crossover function of PAR1.

July 7, 2019

Picky comprehensively detects high-resolution structural variants in nanopore long reads.

Acquired genomic structural variants (SVs) are major hallmarks of cancer genomes, but they are challenging to reconstruct from short-read sequencing data. Here we exploited the long reads of the nanopore platform using our customized pipeline, Picky (https://github.com/TheJacksonLaboratory/Picky), to reveal SVs of diverse architecture in a breast cancer model. We identified the full spectrum of SVs with superior specificity and sensitivity relative to short-read analyses, and uncovered repetitive DNA as the major source of variation. Examination of genome-wide breakpoints at nucleotide resolution uncovered micro-insertions as the common structural features associated with SVs. Breakpoint density across the genome is associated with the propensity for interchromosomal connectivity and was found to be enriched in promoters and transcribed regions of the genome. Furthermore, we observed an over-representation of reciprocal translocations from chromosomal double-crossovers through phased SVs. We demonstrate that Picky analysis is an effective tool for comprehensive detection of SVs in cancer genomes from long-read data.

July 7, 2019

Single-phase PacBio de novo assembly of the genome of the chytrid fungus Batrachochytrium dendrobatidis, a pathogen of Amphibia.

Here, we present an updated genome assembly of the diploid chytrid fungus Batrachochytrium dendrobatidis strain RTP6. This strain is part of the global panzootic lineage (BdGPL) and was isolated in Dunedin, New Zealand. The assembly was generated using PacBio long-read and Illumina short-read data, allowing for the accurate phasing of heterozygosities.

July 7, 2019

Signatures of selection and environmental adaptation across the goat genome post-domestication.

Since the goat was domesticated 10,000 years ago, many factors have contributed to the differentiation of goat breeds and these are classified mainly into two types: (i) adaptation to different breeding systems and/or purposes and (ii) adaptation to different environments. As a result, approximately 600 goat breeds have developed worldwide; they differ considerably from one another in terms of phenotypic characteristics and are adapted to a wide range of climatic conditions. In this work, we analyzed the AdaptMap goat dataset, which is composed of data from more than 3000 animals collected worldwide and genotyped with the CaprineSNP50 BeadChip. These animals were partitioned into groups based on geographical area, production uses, available records on solid coat color and environmental variables including the sampling geographical coordinates, to investigate the role of natural and/or artificial selection in shaping the genome of goat breeds. Several signatures of selection on different chromosomal regions were detected across the different breeds, sub-geographical clusters, phenotypic and climatic groups. These regions contain genes that are involved in important biological processes, such as milk-, meat- or fiber-related production, coat color, glucose pathway, oxidative stress response, size, and circadian clock differences. Our results confirm previous findings in other species on adaptation to extreme environments and human purposes and provide new genes that could explain some of the differences between goat breeds according to their geographical distribution and adaptation to different environments. These analyses of signatures of selection provide a comprehensive first picture of the global domestication process and adaptation of goat breeds and highlight possible genes that may have contributed to the differentiation of this species worldwide.

July 7, 2019

Allele-level KIR genotyping of more than a million samples: Workflow, algorithm, and observations.

The killer-cell immunoglobulin-like receptor (KIR) genes regulate natural killer cell activity, influencing predisposition to immune mediated disease, and affecting hematopoietic stem cell transplantation (HSCT) outcome. Owing to the complexity of the KIR locus, with extensive gene copy number variation (CNV) and allelic diversity, high-resolution characterization of KIR has so far been applied only to relatively small cohorts. Here, we present a comprehensive high-throughput KIR genotyping approach based on next generation sequencing. Through PCR amplification of specific exons, our approach delivers both copy numbers of the individual genes and allelic information for every KIR gene. Ten-fold replicate analysis of a set of 190 samples revealed a precision of 99.9%. Genotyping of an independent set of 360 samples resulted in an accuracy of more than 99% taking into account consistent copy number prediction. We applied the workflow to genotype 1.8 million stem cell donor registry samples. We report on the observed KIR allele diversity and relative abundance of alleles based on a subset of more than 300,000 samples. Furthermore, we identified more than 2,000 previously unreported KIR variants repeatedly in independent samples, underscoring the large diversity of the KIR region that awaits discovery. This cost-efficient high-resolution KIR genotyping approach is now applied to samples of volunteers registering as potential donors for HSCT. This will facilitate the utilization of KIR as additional selection criterion to improve unrelated donor stem cell transplantation outcome. In addition, the approach may serve studies requiring high-resolution KIR genotyping, like population genetics and disease association studies.

July 7, 2019

The Draft Genome of the MD-2 Pineapple

The main challenge in assembling a plant genome lies in its ploidy level, repeat content, and polymorphism. Second-generation sequencing delivered the throughput and accuracy that are crucial to whole-genome sequencing, but these were insufficient for some plant species, whose assembly remained challenging. Genomes produced by next-generation sequencing are known to yield small contigs that inflate the number of annotated genes (Varshney et al. 2011) and to miss the transposable elements that are abundant in plant genomes, owing to their repetitive nature (Michael and Jackson 2013).

January 23, 2017

Tutorial: Long Amplicon Analysis application

This tutorial provides an overview of the Long Amplicon Analysis (LAA) application. The LAA algorithm generates highly accurate, phased and full-length consensus sequences from long amplicons. Applications of LAA include…

