Circular consensus sequencing Archives - Page 32 of 33

July 7, 2019

TERRA promotes telomerase-mediated telomere elongation in Schizosaccharomyces pombe.

Telomerase-mediated telomere elongation provides cell populations with the ability to proliferate indefinitely. Telomerase is capable of recognizing and extending the shortest telomeres in cells; nevertheless, how this mechanism is executed remains unclear. Here, we show that, in the fission yeast Schizosaccharomyces pombe, shortened telomeres are highly transcribed into the evolutionarily conserved long noncoding RNA TERRA. A fraction of TERRA produced upon telomere shortening is polyadenylated and largely devoid of telomeric repeats, and furthermore, telomerase physically interacts with this polyadenylated TERRA in vivo. We also show that experimentally enhanced transcription of a manipulated telomere promotes its association with telomerase and concomitant elongation. Our data represent the first direct evidence that TERRA stimulates telomerase recruitment and activity at chromosome ends in an organism with human-like telomeres. © 2016 The Authors.

July 7, 2019

Long single-molecule reads can resolve the complexity of the influenza virus composed of rare, closely related mutant variants

As a result of a high rate of mutations and recombination events, an RNA virus exists as a heterogeneous “swarm” of mutant variants. The long read length offered by single-molecule sequencing technologies allows each mutant variant to be sequenced in a single pass. However, the high error rate limits the ability to reconstruct a heterogeneous viral population composed of rare, related mutant variants. In this paper, we present 2SNV, a method able to tolerate the high error rate of the single-molecule protocol and reconstruct mutant variants. 2SNV uses linkage between single nucleotide variations to efficiently distinguish them from read errors. To benchmark the sensitivity of 2SNV, we performed a single-molecule sequencing experiment on a sample containing a titrated level of known viral mutant variants. Our method is able to accurately reconstruct clones with a frequency of 0.2% and distinguish clones that differ in only two nucleotides distantly located on the genome. 2SNV outperforms existing methods for full-length viral mutant reconstruction. The open source implementation of 2SNV is freely available for download at http://alan.cs.gsu.edu/NGS/?q=content/2snv.
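The linkage idea can be illustrated with a toy sketch. This is not the 2SNV implementation: the function name `linked_pairs`, the thresholds, and the simple observed-versus-expected comparison are assumptions made for illustration. It assumes the reads are already aligned to the same coordinates and have equal length, and it flags pairs of positions whose non-consensus bases co-occur in more reads than independent sequencing errors would predict.

```python
from collections import Counter
from itertools import combinations


def linked_pairs(reads, consensus, min_ratio=3.0, min_support=2):
    """Return position pairs whose mismatches co-occur more often than chance.

    Hypothetical helper: reads are equal-length strings aligned to consensus.
    """
    n = len(reads)
    # For each read, record the positions that differ from the consensus.
    mismatches = [
        {i for i, (r, c) in enumerate(zip(read, consensus)) if r != c}
        for read in reads
    ]
    # Count how often each position, and each pair of positions, is mismatched.
    single = Counter(p for m in mismatches for p in m)
    pair = Counter(pq for m in mismatches for pq in combinations(sorted(m), 2))

    linked = []
    for (p, q), observed in pair.items():
        # Expected co-occurrence if mismatches at p and q were independent errors.
        expected = single[p] * single[q] / n
        if observed >= min_support and observed > min_ratio * expected:
            linked.append((p, q))
    return sorted(linked)
```

In this sketch a rare variant carrying two linked changes stands out because both changes appear in the same reads, whereas random errors at the same two positions would rarely coincide.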

July 7, 2019

Chloroplast genomes: diversity, evolution, and applications in genetic engineering.

Chloroplasts play a crucial role in sustaining life on earth. The availability of over 800 sequenced chloroplast genomes from a variety of land plants has enhanced our understanding of chloroplast biology, intracellular gene transfer, conservation, diversity, and the genetic basis by which chloroplast transgenes can be engineered to enhance plant agronomic traits or to produce high-value agricultural or biomedical products. In this review, we discuss the impact of chloroplast genome sequences on understanding the origins of economically important cultivated species and changes that have taken place during domestication. We also discuss the potential biotechnological applications of chloroplast genomes.

July 7, 2019

Microsatellite length scoring by Single Molecule Real Time Sequencing – Effects of sequence structure and PCR regime.

Microsatellites are DNA sequences consisting of repeated, short (1-6 bp) sequence motifs that are highly mutable by enzymatic slippage during replication. Due to their high intrinsic variability, microsatellites have important applications in population genetics, forensics, genome mapping, as well as cancer diagnostics and prognosis. The current analytical standard for microsatellites is based on length scoring by high precision electrophoresis, but, owing to their increasing efficiency, next-generation sequencing techniques may provide a viable alternative. Here, we evaluated single molecule real time (SMRT) sequencing, implemented in the PacBio series of sequencing apparatuses, as a means of microsatellite length scoring. To this end, we carried out multiplexed SMRT sequencing of plasmid-carried artificial microsatellites of varying structure under different pre-sequencing PCR regimes. For each repeat structure, reads corresponding to the target length dominated. We found that pre-sequencing amplification had large effects on scoring accuracy and error distribution relative to controls, but that the effects of the number of amplification cycles were generally weak. In line with expectations, enzymatic slippage decreased proportionally with microsatellite repeat unit length and increased with repetition number. Finally, we determined directional mutation trends, showing that PCR and SMRT sequencing introduced consistent but opposing error patterns in contraction and expansion of the microsatellites on the repeat motif and single nucleotide level.
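Length scoring from a sequence read, as opposed to electrophoresis, amounts to counting consecutive copies of the repeat motif. The following is a minimal sketch under that assumption, not the authors' pipeline; the function name `longest_repeat_run` is hypothetical, and the motif is assumed to be a non-empty string.

```python
import re


def longest_repeat_run(read, motif):
    """Return the largest number of back-to-back copies of motif found in read."""
    # Match one or more consecutive copies of the motif.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(read)]
    return max(runs, default=0)
```

For example, a read made of a short flank, twelve copies of `CA`, and another flank would be scored as 12 repeat units; a slippage error that drops one copy would be scored as 11.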

July 7, 2019

Challenges, solutions, and quality metrics of personal genome assembly in advancing precision medicine.

Even though each of us shares more than 99% of the DNA sequences in our genome, there are millions of sequence codes or structure in small regions that differ between individuals, giving us different characteristics of appearance or responsiveness to medical treatments. Currently, genetic variants in diseased tissues, such as tumors, are uncovered by exploring the differences between the reference genome and the sequences detected in the diseased tissue. However, the public reference genome was derived with the DNA from multiple individuals. As a result of this, the reference genome is incomplete and may misrepresent the sequence variants of the general population. The more reliable solution is to compare sequences of diseased tissue with its own genome sequence derived from tissue in a normal state. As the price to sequence the human genome has dropped dramatically to around $1000, it shows a promising future of documenting the personal genome for every individual. However, de novo assembly of individual genomes at an affordable cost is still challenging. Thus, till now, only a few human genomes have been fully assembled. In this review, we introduce the history of human genome sequencing and the evolution of sequencing platforms, from Sanger sequencing to emerging “third generation sequencing” technologies. We present the currently available de novo assembly and post-assembly software packages for human genome assembly and their requirements for computational infrastructures. We recommend that a combined hybrid assembly with long and short reads would be a promising way to generate good quality human genome assemblies and specify parameters for the quality assessment of assembly outcomes. We provide a perspective view of the benefit of using personal genomes as references and suggestions for obtaining a quality personal genome. 
Finally, we discuss the usage of the personal genome in aiding vaccine design and development, monitoring host immune response, tailoring drug therapy and detecting tumors. We believe that precision medicine would benefit greatly from bioinformatics solutions, particularly for personal genome assembly.

July 7, 2019

The complete chloroplast genome sequence of the medicinal plant Swertia mussotii using the PacBio RS II platform.

Swertia mussotii is an important medicinal plant that has great economic and medicinal value and is found on the Qinghai-Tibetan Plateau. The complete chloroplast (cp) genome of S. mussotii is 153,431 bp in size, with a pair of inverted repeat (IR) regions of 25,761 bp each that separate a large single-copy (LSC) region of 83,567 bp and a small single-copy (SSC) region of 18,342 bp. The S. mussotii cp genome encodes 84 protein-coding genes, 37 transfer RNA (tRNA) genes, and eight ribosomal RNA (rRNA) genes. The identity, number, and GC content of S. mussotii cp genes were similar to those in the genomes of other Gentianales species. Via analysis of the repeat structure, 11 forward repeats, eight palindromic repeats, and one reverse repeat were detected in the S. mussotii cp genome. There are 45 SSRs in the S. mussotii cp genome, the majority of which are mononucleotides found in all other Gentianales species. An entire cp genome comparison study of S. mussotii and two other species in Gentianaceae was conducted. The complete cp genome sequence provides intragenic information for the cp genetic engineering of this medicinal plant.
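The reported region sizes can be cross-checked with simple arithmetic: one LSC, one SSC, and two IR copies should add up to the full genome length. The variable names below are ours and are used only for this check.

```python
# Region lengths in bp, as reported in the abstract above.
lsc = 83_567   # large single-copy region
ssc = 18_342   # small single-copy region
ir = 25_761    # one inverted repeat; the genome carries two copies

total = lsc + ssc + 2 * ir
print(total)  # 153431, matching the reported genome size
```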

July 7, 2019

Privacy-preserving read mapping using locality sensitive hashing and secure kmer voting

The recent explosion in the amount of available genome sequencing data imposes high computational demands on the tools designed to analyze it. Low-cost cloud computing has the potential to alleviate this burden. However, moving personal genome data analysis to the cloud raises serious privacy concerns. Read alignment is a critical and computationally intensive first step of most genomic data analysis pipelines. While significant effort has been dedicated to optimize the sensitivity and runtime efficiency of this step, few approaches have addressed outsourcing this computation securely to an untrusted party. The few secure solutions that have been proposed either do not scale to whole genome sequencing datasets or are not competitive with the state of the art in read mapping. In this paper, we present BALAUR, a privacy-preserving read mapping algorithm based on locality sensitive hashing and secure kmer voting. BALAUR securely outsources a significant portion of the computation to the public cloud by formulating the alignment task as a voting scheme between encrypted read and reference kmers. Our approach can easily handle typical genome-scale datasets and is highly competitive with non-cryptographic state-of-the-art read aligners in both accuracy and runtime performance on simulated and real read data. Moreover, our approach is significantly faster than state-of-the-art read aligners in long read mapping.
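The voting idea on its own can be sketched without any cryptography. The code below is a plain, unencrypted illustration and is not BALAUR: it omits locality sensitive hashing and kmer encryption entirely, and the function names `index_kmers` and `vote_position` are hypothetical. Each kmer shared between the read and the reference votes for the reference position at which the read would have to start.

```python
from collections import Counter, defaultdict


def index_kmers(reference, k):
    """Map every kmer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def vote_position(read, index, k):
    """Return the read start position with the most kmer votes, or None."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            # A kmer at read offset `offset` matching reference position `pos`
            # implies the read starts at pos - offset.
            votes[pos - offset] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Because each kmer votes independently, a read with a few mismatches still accumulates most of its votes at the correct start position.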

July 7, 2019

Hyper-eccentric structural genes in the mitochondrial genome of the algal parasite Hemistasia phaeocysticola.

Diplonemid mitochondria are considered to have very eccentric structural genes. Coding regions of individual diplonemid mitochondrial genes are fragmented into small pieces and found on different circular DNAs. Short RNAs transcribed from each DNA molecule mature through a unique RNA maturation process involving assembly and three types of RNA editing (i.e., U insertion and A-to-I and C-to-U substitutions), although the molecular mechanism(s) of RNA maturation and the evolutionary history of these eccentric structural genes still remain to be understood. Since the gene fragmentation pattern is generally conserved among the diplonemid species studied to date, it was considered that their structural complexity has plateaued and further gene fragmentation could not occur. Here, we show the mitochondrial gene structure of Hemistasia phaeocysticola, which was recently identified as a member of a novel lineage in diplonemids, by comparison of the mitochondrial DNA sequences with cDNA sequences synthesized from mature mRNA. The genes of H. phaeocysticola are fragmented much more finely than those of other diplonemids studied to date. Furthermore, in addition to all known types of RNA editing, it is suggested that a novel processing step (i.e., secondary RNA insertion) is involved in the RNA maturation in the mitochondria of H. phaeocysticola. Our findings demonstrate the tremendous plasticity of mitochondrial gene structures. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

July 7, 2019

Comparative genomics reveals Lysinibacillus sphaericus group comprises a novel species.

Early in the 1990s, it was recognized that Lysinibacillus sphaericus, one of the most popular and effective entomopathogenic bacteria, was a highly heterogeneous group. Many authors have even proposed it comprises more than one species, but the lack of phenotypic traits that guarantee an accurate differentiation has not allowed this issue to be clarified. Now that genomic technologies are rapidly advancing, it is possible to address the problem from a whole genome perspective, getting insights into the phylogeny, evolutionary history and biology itself. The genome of the Colombian strain L. sphaericus OT4b.49 was sequenced, assembled and annotated, obtaining 3 chromosomal contigs and no evidence of plasmids. Using these sequences and the 13 other L. sphaericus genomes available on the NCBI database, we carried out comparative genomic analyses that included whole genome alignments, searching for mobile elements, phylogenomic metrics (TETRA, ANI and in-silico DDH) and pan-genome assessments. The results support the hypothesis about this species as a very heterogeneous group. The entomopathogenic lineage is actually a single and independent species with 3728 core genes and 2153 accessory genes, whereas each non-toxic strain seems to be a separate species, though without a clear circumscription. Toxin-encoding genes, binA, B and mtx1, 2, 3 could be acquired via horizontal gene transfer in a single evolutionary event. The non-toxic strain OT4b.31 is the most closely related to the type strain KCTC 3346. The current L. sphaericus is actually a sensu lato due to a sub-estimation of diversity accrued using traditional non-genomics based classification strategies. The toxic lineage is the most studied with regards to its larvicidal activity, which is a greatly conserved trait among these strains and thus, their differentiating feature.
Further studies are needed in order to establish a univocal classification of the non-toxic strains that, according to our results, seem to be a paraphyletic group.

July 7, 2019

Conservation genetics of an endangered grassland butterfly (Oarisma poweshiek) reveals historically high gene flow despite recent and rapid range loss

1. In poorly dispersing species gene flow can be facilitated when suitable habitat is widespread, allowing for increased dispersal between neighbouring locations. The Poweshiek skipperling [Oarisma poweshiek (Parker)], a federally endangered butterfly, has undergone a rapid, recent demographic decline following the loss of tallgrass prairie and fen habitats range wide. The loss of habitat, now restricted geographic range, and poor dispersal ability have left O. poweshiek at increased risk of extinction. 2. We studied the population genetics of six remaining populations of O. poweshiek in order to test the hypothesis that gene flow was historically high despite limited long-distance dispersal capability. Utilising nine microsatellite loci developed by PacBio sequencing, we tested for patterns of isolation by distance, low population genetic structure and alternative gene flow models. 3. Populations from southern Manitoba, Canada to the Lower Peninsula of Michigan, USA are only weakly genetically differentiated despite having low diversity. We found no support for isolation by distance, and Bayesian estimates of historical gene flow support our hypothesis that high levels of gene flow previously connected populations from Michigan to Wisconsin. 4. Prairie grasslands have been reduced tremendously over the past century, but the low mobility of O. poweshiek suggests that rapid loss of populations over the past decade cannot be simply explained by fragmentation of habitat. 5. As a species at high risk of extinction, understanding historical processes of gene flow will allow for informed management decisions with respect to head-starting individuals for population reintroductions and for conserving networks of habitat that will allow for high levels of gene flow.

July 7, 2019

Probabilistic viral quasispecies assembly

Viruses are pathogens that cause infectious diseases. The swarm of virions is subject to the host’s immune pressure and possibly antiviral therapy. It may escape this selective pressure and gain selective advantage by acquiring one or more of the genomic alterations: single-nucleotide variants (SNVs), loss or gain of one or more amino acids, large deletions, for example, due to alternative splicing, or recombination of different strains. Genotypic antiretroviral drug resistance testing is performed via sequencing. Next-generation sequencing (NGS) technologies revolutionized assessing viral genetic diversity experimentally. In viral quasispecies analysis, there are two main goals: the identification of low-frequency variants and haplotype assembly on a whole-genome scale. PacBio performs single-molecule sequencing. This chapter elaborates human haplotyping and its relationship to probabilistic viral haplotype reconstruction methods. Viral quasispecies assembly has the potential to replace the current de facto diversity estimation by SNV calling. With advances in library preparation, increasing sensitivity of sequencing platforms, and more sophisticated models, it might be possible to detect all or most viral strains in a single individual.

July 7, 2019

Capturing pairwise and multi-way chromosomal conformations using chromosomal walks.

Chromosomes are folded into highly compacted structures to accommodate physical constraints within nuclei and to regulate access to genomic information. Recently, global mapping of pairwise contacts showed that loops anchoring topological domains (TADs) are highly conserved between cell types and species. Whether pairwise loops synergize to form higher-order structures is still unclear. Here we develop a conformation capture assay to study higher-order organization using chromosomal walks (C-walks) that link multiple genomic loci together into proximity chains in human and mouse cells. This approach captures chromosomal structure at varying scales. Inter-chromosomal contacts constitute only 7-10% of the pairs and are restricted by interfacing TADs. About half of the C-walks stay within one chromosome, and almost half of those are restricted to intra-TAD spaces. C-walks that couple 2-4 TADs indicate stochastic associations between transcriptionally active, early replicating loci. Targeted analysis of thousands of 3-walks anchored at highly expressed genes supports pairwise, rather than hub-like, chromosomal topology at active loci. Polycomb-repressed Hox domains are shown by the same approach to enrich for synergistic hubs. Together, the data indicate that chromosomal territories, TADs, and intra-TAD loops are primarily driven by nested, possibly dynamic, pairwise contacts.

July 7, 2019

MICADo – Looking for mutations in targeted PacBio cancer data: an alignment-free method.

Targeted sequencing is commonly used in clinical application of NGS technology since it enables generation of sufficient sequencing depth in the targeted genes of interest and thus ensures the best possible downstream analysis. This notwithstanding, the accurate discovery and annotation of disease-causing mutations remains a challenging problem even in such a favorable context. The difficulty is particularly salient in the case of third generation sequencing technology, such as PacBio. We present MICADo, a de Bruijn graph based method, implemented in Python, that makes it possible to distinguish between patient-specific mutations and other alterations for targeted sequencing of a cohort of patients. MICADo analyses NGS reads for each sample within the context of the data of the whole cohort in order to capture the differences between specificities of the sample with respect to the cohort. MICADo is particularly suitable for sequencing data from highly heterogeneous samples, especially when it involves high rates of non-uniform sequencing errors. It was validated on PacBio sequencing datasets from several cohorts of patients. The comparison with two widely used available tools, namely VarScan and GATK, shows that MICADo is more accurate, especially when true mutations have frequencies close to background noise. The source code is available at http://github.com/cbib/MICADo.
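For readers unfamiliar with the data structure, a de Bruijn graph can be sketched in a few lines. The code below is a generic textbook construction and is not taken from MICADo; the function names `de_bruijn_edges` and `branching_nodes` are hypothetical. Nodes are (k-1)-mers, and an edge links two nodes whenever they overlap within some kmer of a read, so positions where reads disagree show up as nodes with more than one successor.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # The kmer's prefix and suffix are the two nodes joined by this edge.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def branching_nodes(graph):
    """Return nodes with more than one successor, i.e. where reads diverge."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)
```

With the reads `ACGTAC` and `ACGGAC` and k = 3, the only branching node is `CG`, which is where the two sequences differ. Whether such a branch reflects a true mutation or a sequencing error is the question a tool like MICADo has to answer using the rest of the cohort.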

July 7, 2019

Improve homology search sensitivity of PacBio data by correcting frameshifts.

Single-molecule, real-time sequencing (SMRT) developed by Pacific Biosciences produces longer reads than second-generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data have a high sequencing error rate, and most of the errors are insertion or deletion errors. During alignment-based homology search, insertion or deletion errors in genes will cause frameshifts and may only lead to marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random alignments, and the ambiguity will incur errors in structural and functional annotation. Existing frameshift correction tools are designed for data with a much lower error rate and are not optimized for PacBio data. As an increasing number of groups are using SMRT, there is an urgent need for dedicated homology search tools for PacBio data. In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results showed that our method enables more sensitive homology search, especially for PacBio data sets of low sequencing coverage. In addition, we can correct more errors when compared with a popular error correction tool that does not rely on hybrid sequencing. The source code is freely available at https://sourceforge.net/projects/frame-pro/. Contact: yannisun@msu.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

July 7, 2019

TeloPCR-seq: a high-throughput sequencing approach for telomeres.

We have developed a high-throughput sequencing approach that enables us to determine terminal telomere sequences from tens of thousands of individual Schizosaccharomyces pombe telomeres. This method provides unprecedented coverage of telomeric sequence complexity in fission yeast. S. pombe telomeres are composed of modular degenerate repeats that can be explained by variation in usage of the TER1 RNA template during reverse transcription. Taking advantage of this deep sequencing approach, we find that ‘like’ repeat modules are highly correlated within individual telomeres. Moreover, repeat module preference varies with telomere length, suggesting that existing repeats promote the incorporation of like repeats and/or that specific conformations of the telomerase holoenzyme efficiently and/or processively add repeats of like nature. After the loss of telomerase activity, this sequencing and analysis pipeline defines a population of telomeres with altered sequence content. This approach will be adaptable to study telomeric repeats in other organisms and also to interrogate repetitive sequences throughout the genome that are inaccessible to other sequencing methods. © 2016 Federation of European Biochemical Societies.
