September 22, 2019

TranSurVeyor: an improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data.

Transpositions transfer DNA segments between different loci within a genome; when a transposition is found in a sample but not in the reference genome, it is called a non-reference transposition. Non-reference transpositions are important structural variations that have clinical impact. They can be called by analyzing second-generation high-throughput sequencing datasets. Current methods follow either a database-based or a database-free approach. Database-based methods require a database of transposable elements. Some of them have good specificity; however, this approach cannot detect novel transpositions, and it requires a good database of transposable elements, which is not yet available for many species. Database-free methods perform de novo calling of transpositions, but their accuracy is low. We observe that this is due to the misalignment of reads: since reads are short and the human genome has many repeats, false alignments create false positive predictions while missing alignments reduce the true positive rate. This paper proposes new techniques to improve database-free non-reference transposition calling: first, we propose a realignment strategy called one-end remapping that corrects the alignments of reads in interspersed repeats; second, we propose an SNV-aware filter that removes some incorrectly aligned reads. By combining these two techniques with others, such as clustering and a positive-to-negative ratio filter, our proposed transposition caller TranSurVeyor shows at least a 3.1-fold improvement in F1-score over existing database-free methods. More importantly, even though TranSurVeyor does not use databases of prior information, its performance is at least as good as that of existing database-based methods such as MELT, Mobster and Retroseq. We also illustrate that TranSurVeyor can discover transpositions that are not known in the current database.
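For readers unfamiliar with the metric, the F1-score comparison mentioned above can be sketched as follows. This is a minimal illustration, not code from TranSurVeyor: the function names `f1_score` and `fold_improvement` are hypothetical, and the call counts in the usage lines are invented for illustration and are not results from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, computed from call counts.

    tp: predicted transpositions that match a true one
    fp: predicted transpositions with no true match
    fn: true transpositions that were not predicted
    """
    if tp == 0:
        # No correct calls: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fold_improvement(new_f1: float, baseline_f1: float) -> float:
    """Ratio of a new caller's F1-score to a baseline caller's F1-score."""
    if baseline_f1 <= 0:
        raise ValueError("baseline F1-score must be positive")
    return new_f1 / baseline_f1


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    new_caller = f1_score(tp=80, fp=20, fn=20)
    baseline = f1_score(tp=20, fp=60, fn=80)
    print(f"fold improvement: {fold_improvement(new_caller, baseline):.2f}")
```

Because F1 combines precision and recall, a caller improves its score only by reducing false positives and false negatives together, which is the balance the abstract describes between false alignments and missing alignments.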


September 22, 2019

Integrative haplotype estimation with sub-linear complexity

The number of human genomes being genotyped or sequenced is increasing exponentially, and efficient haplotype estimation methods able to handle this amount of data are now required. Here, we present a new method, SHAPEIT4, which substantially improves upon other methods for processing large genotype and high-coverage sequencing datasets. It notably exhibits sub-linear scaling with sample size, provides highly accurate haplotypes, and allows integrating external phasing information such as large reference panels of haplotypes, collections of pre-phased variants and long sequencing reads. We provide SHAPEIT4 in an open source format at https://odelaneau.github.io/shapeit4/ and demonstrate its performance in terms of accuracy and running times on two gold standard datasets: the UK Biobank data and the Genome in a Bottle data.


September 22, 2019

CompStor Novos: a low cost yet fast assembly-based variant calling for personal genomes

Application of assembly methods to personal genome analysis from next-generation sequencing data has been limited by the requirement for expensive supercomputer hardware, or by long computation times when using ordinary resources. We describe CompStor Novos, which achieves supercomputer-class performance in de novo assembly computation time on standard server hardware, based on a tiered-memory algorithm. Run on commercial off-the-shelf servers, Novos assembly is more precise and 10-20 times faster than existing assembly algorithms. Furthermore, we integrated Novos into a variant calling pipeline and demonstrate that both the compute times and the precision of calling point variants and indels compare well with standard alignment-based pipelines. Additionally, assembly eliminates bias in the estimation of allele frequency for indels and naturally enables discovery of breakpoints for structural variants at base-pair resolution. Thus, Novos bridges the gap between alignment-based and assembly-based genome analyses. Extension and adaptation of its underlying algorithm will help quickly and fully harvest the information in sequencing reads for personal genome reconstruction.

