July 7, 2019

BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper

De novo assembly is the process of reconstructing genomes from DNA fragments (reads), which may contain redundancy and errors. Longer reads simplify assembly and improve the contiguity of the output, but current long-read technologies come with high error rates. A crucial step of de novo genome assembly for long reads is finding overlapping reads. We present Berkeley Long-Read to Long-Read Aligner and Overlapper (BELLA), which implements a novel approach to computing overlaps using Sparse Generalized Matrix Multiplication (SpGEMM). We present a probabilistic model that demonstrates the soundness of using short, fixed-length k-mers to detect overlaps, avoiding expensive pairwise alignment of all reads against all others. We then introduce a notion of reliable k-mers based on our probabilistic model. The use of reliable k-mers eliminates both the k-mer set explosion that would otherwise happen with highly erroneous reads and the spurious overlaps due to k-mers originating from repetitive regions. Finally, we present a new method to separate true alignments from false positives depending on the alignment score. Using this methodology, which is employed in BELLA's precise mode, the probability of false positives drops exponentially as the length of overlap between sequences increases. On simulated data, BELLA achieves an average of 2.26% higher recall than state-of-the-art tools in its sensitive mode and 18.90% higher precision than state-of-the-art tools in its precise mode, while remaining competitive in performance.
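The idea of finding overlap candidates through a sparse matrix product can be illustrated with a minimal Python sketch. It is not BELLA's implementation: the function names, the toy reads, and the `min_freq`/`max_freq` bounds (a crude stand-in for the reliable k-mer filter) are assumptions made for illustration. Conceptually, A is a sparse reads-by-k-mers 0/1 matrix, and the nonzero entries of A·Aᵀ give the number of k-mers shared by each pair of reads.

```python
from collections import defaultdict
from itertools import combinations


def kmers(read, k):
    """Return the set of distinct k-mers in a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def shared_kmer_counts(reads, k, min_freq=2, max_freq=4):
    """Count shared k-mers for every pair of reads.

    This computes the off-diagonal nonzeros of A * A^T, where A is a sparse
    reads-by-k-mers 0/1 matrix. Only k-mers found in between min_freq and
    max_freq reads are kept; these bounds are hypothetical and only mimic
    the effect of a reliable k-mer filter.
    """
    # Column view of A: k-mer -> indices of the reads that contain it.
    columns = defaultdict(list)
    for idx, read in enumerate(reads):
        for kmer in kmers(read, k):
            columns[kmer].append(idx)

    overlaps = defaultdict(int)
    for read_ids in columns.values():
        # Skip k-mers that are too rare (likely errors) or too common
        # (likely repeats).
        if not (min_freq <= len(read_ids) <= max_freq):
            continue
        # Each retained k-mer contributes 1 to every pair of reads sharing it.
        for i, j in combinations(read_ids, 2):
            overlaps[(i, j)] += 1
    return dict(overlaps)


# Toy reads: the first two overlap, the third is unrelated.
reads = ["ACGTACGGA", "TACGGATTC", "GGGGCCCCA"]
candidates = shared_kmer_counts(reads, k=4)
```

Only the read pairs that appear in `candidates` would then be passed on to pairwise alignment, which is what avoids aligning all reads against all others.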


July 7, 2019

Reference genes for RT-qPCR normalisation in different tissues, developmental stages and stress conditions of Hypericum perforatum

Hypericum perforatum is a widely known medicinal herb used mostly as a remedy for depression because of its abundant secondary metabolites. Quantitative real-time PCR (qRT-PCR) is an optimized method for the efficient and reliable quantification of gene expression. In general, reference genes are used in qRT-PCR analysis because of their known or suspected housekeeping roles. However, their expression level cannot be assumed to remain stable under all possible experimental conditions. Thus, the identification of high-quality reference genes is essential for the interpretation of qRT-PCR data. In this study, we investigated the expression of fourteen candidate genes, including nine housekeeping genes and five potential candidate genes. Additionally, the HpHYP1 gene, belonging to the PR-10 family associated with stress control, was used for validation of the candidate reference genes. Three programs were applied to evaluate the gene expression stability across four different plant tissues, three developmental stages and a set of abiotic stress and hormonal treatments. The candidate genes showed a wide range of Ct values in all samples, indicating that they are differentially expressed. Integrating all of the algorithms and evaluations, ACT2 and TUB-β were the most stable combination overall and for samples from different developmental stages. Moreover, ACT2 and EF1-α were considered to be the two most applicable reference genes for different tissues and for stress samples. The majority of the conventional housekeeping genes performed better than the potential reference genes. The obtained results will contribute to improving the credibility of standardization and quantification of transcription levels in future expression research on H. perforatum.


July 7, 2019

De novo genome assembly of the olive fruit fly (Bactrocera oleae) developed through a combination of linked-reads and long-read technologies

Long-read sequencing has greatly contributed to the generation of high-quality assemblies, albeit at a high cost. It is also not always clear how to combine sequencing platforms. We sequenced the genome of the olive fruit fly (Bactrocera oleae), the most important pest in the olive fruit agribusiness industry, using Illumina short reads, mate pairs, 10x Genomics linked reads, Pacific Biosciences (PacBio), and Oxford Nanopore Technologies (ONT). The 10x linked-reads assembly gave the most contiguous assembly with an N50 of 2.16 Mb. Scaffolding the linked-reads assembly using long reads from ONT gave a more contiguous assembly with a scaffold N50 of 4.59 Mb. We also present the most extensive transcriptome datasets of the olive fly derived from different tissues and stages of development. Finally, we used the Chromosome Quotient method to identify Y-chromosome scaffolds and show that the long-read-based assembly generates a highly contiguous Y-chromosome assembly.
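For readers unfamiliar with the N50 statistic quoted above, the following minimal Python sketch shows the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name and the toy lengths are illustrative assumptions and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
```

A higher N50 means that half of the assembly sits in longer sequences, which is why scaffolding that merges sequences raises the value.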

