University of California San Diego Archives - Page 5 of 6

July 7, 2019 |

Length-independent DNA packing into nanopore zero-mode waveguides for low-input DNA sequencing.

Compared with conventional methods, single-molecule real-time (SMRT) DNA sequencing exhibits longer read lengths, less GC bias, and the ability to read DNA base modifications. However, reading DNA sequence from sub-nanogram quantities is impractical owing to inefficient delivery of DNA molecules into the confines of zero-mode waveguides, the zeptolitre optical cavities in which DNA sequencing proceeds. Here, we show that the efficiency of voltage-induced DNA loading into waveguides equipped with nanopores at their floors is five orders of magnitude greater than existing methods. In addition, we find that DNA loading is nearly length-independent, unlike diffusive loading, which is biased towards shorter fragments. We demonstrate here loading and proof-of-principle four-colour sequence readout of a polymerase-bound 20,000-base-pair-long DNA template within seconds from a sub-nanogram input quantity, a step towards low-input DNA sequencing and mammalian epigenomic mapping of native DNA samples.

July 7, 2019 |

Variant review with the Integrative Genomics Viewer.

Manual review of aligned reads for confirmation and interpretation of variant calls is an important step in many variant calling pipelines for next-generation sequencing (NGS) data. Visual inspection can greatly increase the confidence in calls, reduce the risk of false positives, and help characterize complex events. The Integrative Genomics Viewer (IGV) was one of the first tools to provide NGS data visualization, and it currently provides a rich set of tools for inspection, validation, and interpretation of NGS datasets, as well as other types of genomic data. Here, we present a short overview of IGV’s variant review features for both single-nucleotide variants and structural variants, with examples from both cancer and germline datasets. IGV is freely available at https://www.igv.org. Cancer Res; 77(21); e31-34. ©2017 American Association for Cancer Research.

July 7, 2019 |

Characterization of four multidrug resistance plasmids captured from the sediments of an urban coastal wetland.

Self-transmissible and mobilizable plasmids contribute to the emergence and spread of multidrug-resistant bacteria by enabling the horizontal transfer of acquired antibiotic resistance. The objective of this study was to capture and characterize self-transmissible and mobilizable resistance plasmids from a coastal wetland impacted by urban stormwater runoff and human wastewater during the rainy season. Four plasmids were captured, two self-transmissible and two mobilizable, using both mating and enrichment approaches. Plasmid genomes, sequenced with either Illumina or PacBio platforms, revealed representatives of incompatibility groups IncP-6, IncR, IncN3, and IncF. The plasmids ranged in size from 36 to 144 kb and encoded known resistance genes for most of the major classes of antibiotics used to treat Gram-negative infections (tetracyclines, sulfonamides, β-lactams, fluoroquinolones, aminoglycosides, and amphenicols). The mobilizable IncP-6 plasmid pLNU-11 was discovered in a strain of Citrobacter freundii enriched from the wetland sediments with tetracycline and nalidixic acid, and encodes a novel AmpC-like β-lactamase (blaWDC-1), which shares less than 62% amino acid sequence identity with the PDC class of β-lactamases found in Pseudomonas aeruginosa. Although the IncR plasmid pTRE-1611 was captured by mating wetland bacteria with P. putida KT2440 as recipient, it was found to be mobilizable rather than self-transmissible. Two self-transmissible multidrug-resistance plasmids were also captured: the small (48 kb) IncN3 plasmid pTRE-131 was captured by mating wetland bacteria with Escherichia coli HY842, where it seemed to be maintained at nearly 240 copies per cell, while the large (144 kb) IncF plasmid pTRE-2011, which was isolated from a cefotaxime-resistant environmental strain of E. coli ST744, exists at just a single copy per cell. Furthermore, pTRE-2011 bears the globally epidemic blaCTX-M-55 extended-spectrum β-lactamase downstream of ISEcp1. Our results indicate that urban coastal wetlands are reservoirs of diverse self-transmissible and mobilizable plasmids of relevance to human health.

July 7, 2019 |

SV2: Accurate structural variation genotyping and de novo mutation detection from whole genomes.

Structural variation (SV) detection from short-read whole genome sequencing is error prone, presenting significant challenges for population or family-based studies of disease. Here we describe SV2, a machine-learning algorithm for genotyping deletions and duplications from paired-end sequencing data. SV2 can rapidly integrate variant calls from multiple structural variant discovery algorithms into a unified call set with high genotyping accuracy and capability to detect de novo mutations. SV2 is freely available on GitHub (https://github.com/dantaki/SV2). Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

July 7, 2019 |

Ultraaccurate genome sequencing and haplotyping of single human cells.

Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10⁻⁸ and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs. Copyright © 2017 the Author(s). Published by PNAS.
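
For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed from a list of contig lengths. The function name `n50` and the example lengths are illustrative assumptions, not taken from the SISSOR paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L
    together cover at least half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths: total is 20, so the longest contigs
# are added until the running sum reaches at least 10.
example = n50([2, 3, 4, 5, 6])
```

A higher N50 means that more of the assembly sits in long contigs, which is why it is reported alongside fragment length.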

July 7, 2019 |

Integrating mass spectrometry and genomics for cyanobacterial metabolite discovery.

Filamentous marine cyanobacteria produce bioactive natural products with both potential therapeutic value and capacity to be harmful to human health. Genome sequencing has revealed that cyanobacteria have the capacity to produce many more secondary metabolites than have been characterized. The biosynthetic pathways that encode cyanobacterial natural products are mostly uncharacterized, and lack of cyanobacterial genetic tools has largely prevented their heterologous expression. Hence, a combination of cutting edge and traditional techniques has been required to elucidate their secondary metabolite biosynthetic pathways. Here, we review the discovery and refined biochemical understanding of the olefin synthase and fatty acid ACP reductase/aldehyde deformylating oxygenase pathways to hydrocarbons, and the curacin A, jamaicamide A, lyngbyabellin, columbamide, and a trans-acyltransferase macrolactone pathway encoding phormidolide. We integrate into this discussion the use of genomics, mass spectrometric networking, biochemical characterization, and isolation and structure elucidation techniques.

July 7, 2019 |

hybridSPAdes: an algorithm for hybrid assembly of short and long reads.

Recent advances in single-molecule real-time (SMRT) and nanopore sequencing technologies have enabled high-quality assemblies from long and inaccurate reads. However, these approaches require high coverage by long reads and remain expensive. On the other hand, the inexpensive short-read technologies produce accurate but fragmented assemblies. Thus, a hybrid approach that assembles long reads (with low coverage) and short reads has the potential to generate high-quality assemblies at reduced cost. We describe the hybridSPAdes algorithm for assembling short and long reads and benchmark it on a variety of bacterial assembly projects. Our results demonstrate that hybridSPAdes generates accurate assemblies (even in projects with relatively low coverage by long reads), thus reducing the overall cost of genome sequencing. We further present the first complete assembly of a genome from single cells using SMRT reads. hybridSPAdes is implemented in C++ as a part of the SPAdes genome assembler and is publicly available at http://bioinf.spbau.ru/en/spades. Contact: d.antipov@spbu.ru. Supplementary information: supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

July 7, 2019 |

Assembly of long error-prone reads using de Bruijn graphs.

The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and the OLC approaches and results in accurate genome reconstructions.
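
As background for the abstract above, here is a minimal sketch of the classic de Bruijn graph construction for short, accurate reads. It is not the ABruijn generalization described in the paper; the function name `de_bruijn_graph` and the toy read are assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a classic de Bruijn graph from reads.

    Nodes are (k-1)-mers; every k-mer found in a read adds one edge
    from its (k-1)-length prefix to its (k-1)-length suffix.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy example: the read "ACGT" contains the 3-mers ACG and CGT.
toy_graph = de_bruijn_graph(["ACGT"], 3)
```

With exact k-mers, a single sequencing error creates spurious nodes and edges, which is one reason this plain construction is usually associated with short, accurate reads.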

July 7, 2019 |

Cloche is a bHLH-PAS transcription factor that drives haemato-vascular specification.

Vascular and haematopoietic cells organize into specialized tissues during early embryogenesis to supply essential nutrients to all organs and thus play critical roles in development and disease. At the top of the haemato-vascular specification cascade lies cloche, a gene that when mutated in zebrafish leads to the striking phenotype of loss of most endothelial and haematopoietic cells and a significant increase in cardiomyocyte numbers. Although this mutant has been analysed extensively to investigate mesoderm diversification and differentiation and continues to be broadly used as a unique avascular model, the isolation of the cloche gene has been challenging due to its telomeric location. Here we used a deletion allele of cloche to identify several new cloche candidate genes within this genomic region, and systematically genome-edited each candidate. Through this comprehensive interrogation, we succeeded in isolating the cloche gene and discovered that it encodes a PAS-domain-containing bHLH transcription factor, and that it is expressed in a highly specific spatiotemporal pattern starting during late gastrulation. Gain-of-function experiments show that it can potently induce endothelial gene expression. Epistasis experiments reveal that it functions upstream of etv2 and tal1, the earliest expressed endothelial and haematopoietic transcription factor genes identified to date. A mammalian cloche orthologue can also rescue blood vessel formation in zebrafish cloche mutants, indicating a highly conserved role in vertebrate vasculogenesis and haematopoiesis. The identification of this master regulator of endothelial and haematopoietic fate enhances our understanding of early mesoderm diversification and may lead to improved protocols for the generation of endothelial and haematopoietic cells in vivo and in vitro.

July 7, 2019 |

Short tandem repeat number estimation from paired-end reads for multiple individuals by considering coalescent tree.

Two types of approaches are mainly considered for repeat number estimation in short tandem repeat (STR) regions from high-throughput sequencing data: approaches that directly count repeat patterns in sequence reads spanning the region, and approaches based on detecting the difference between the insert size inferred from aligned paired-end reads and the actual insert size. Although the accuracy of repeat numbers estimated with the former approaches is high, the size of target STR regions is limited to the length of sequence reads. On the other hand, the latter approaches can handle STR regions longer than the length of sequence reads. However, repeat numbers estimated with the latter approaches are less accurate than those with the former approaches. We proposed a new statistical model named coalescentSTR that estimates repeat numbers from paired-end read distances for multiple individuals simultaneously by connecting the read generative model for each individual with their genealogy. In the model, the genealogy is represented by handling coalescent trees as hidden variables, and the summation of the hidden variables is taken on coalescent trees sampled based on phased genotypes located around a target STR region with Markov chain Monte Carlo. In the sampled coalescent trees, repeat number information from insert size data is propagated, and more accurate estimation of repeat numbers is expected for STR regions longer than the length of sequence reads. For finding the repeat numbers maximizing the likelihood of the model on the estimation of repeat numbers, we proposed a state-of-the-art belief propagation algorithm on sampled coalescent trees. We verified the effectiveness of the proposed approach from the comparison with existing methods by using simulation datasets and real whole genome and whole exome data for HapMap individuals analyzed in the 1000 Genomes Project.
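
The insert-size idea behind the second family of approaches can be illustrated with a naive single-individual estimate. This sketch is not the coalescentSTR model: it ignores genealogy and noise modeling, and the function name, parameters, and numbers are all hypothetical.

```python
from statistics import mean


def estimate_repeat_number(ref_repeats, motif_len, library_insert_mean,
                           aligned_insert_sizes):
    """Naive repeat-number estimate from paired-end insert sizes.

    If the sample carries more repeat units than the reference, read
    pairs spanning the STR appear closer together on the reference than
    the true library insert size. The shortfall, divided by the motif
    length, approximates the number of extra repeat units.
    """
    shift = library_insert_mean - mean(aligned_insert_sizes)
    return ref_repeats + shift / motif_len


# Hypothetical values: reference has 10 copies of a 4-bp motif, the
# library mean insert is 500 bp, and spanning pairs align 480 bp apart.
estimate = estimate_repeat_number(10, 4, 500, [480, 480])
```

Because individual insert sizes vary widely around the library mean, an estimate like this is noisy for one sample, which is the limitation the abstract addresses by sharing information across related individuals.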

July 7, 2019 |

Complete genome sequence of a cylindrospermopsin-producing cyanobacterium, Cylindrospermopsis raciborskii CS505, containing a circular chromosome and a single extrachromosomal element.

Cylindrospermopsis raciborskii is a freshwater cyanobacterium producing bloom events and toxicity in drinking water source reservoirs. We present the first genome sequence for C. raciborskii CS505 (Australia), containing one 4.1-Mbp chromosome and one 110-Kbp plasmid having G+C contents of 40.3% (3933 genes) and 39.3% (111 genes), respectively. Copyright © 2016 Fuentes-Valdés et al.

July 7, 2019 |

Chromosome assembly of large and complex genomes using multiple references

Despite the rapid development of sequencing technologies, assembly of mammalian-scale genomes into complete chromosomes remains one of the most challenging problems in bioinformatics. To help address this difficulty, we developed Ragout, a reference-assisted assembly tool that now works for large and complex genomes. Taking one or more target assemblies (generated from an NGS assembler) and one or multiple related reference genomes, Ragout infers the evolutionary relationships between the genomes and builds the final assemblies using a genome rearrangement approach. Using Ragout, we transformed NGS assemblies of 15 different Mus musculus and one Mus spretus genomes into sets of complete chromosomes, leaving less than 5% of sequence unlocalized per set. Various benchmarks, including PCR testing and realigning of long PacBio reads, suggest only a small number of structural errors in the final assemblies, comparable with direct assembly approaches. Additionally, we applied Ragout to Mus caroli and Mus pahari genomes, which exhibit karyotype-scale variations compared to other genomes from the Muridae family. Chromosome color maps confirmed most large-scale rearrangements that Ragout detected.

July 7, 2019 |

Genomewide Dam methylation in Escherichia coli during long-term stationary phase.

DNA methylation in prokaryotes is widespread. The most common modification of the genome is the methylation of adenine at the N-6 position. In Escherichia coli K-12 and many gammaproteobacteria, this modification is catalyzed by DNA adenine methyltransferase (Dam) at the GATC consensus sequence and is known to modulate cellular processes including transcriptional regulation of gene expression, initiation of chromosomal replication, and DNA mismatch repair. While studies thus far have focused on the motifs associated with methylated adenine (meA), the frequency of meA across the genome, and temporal dynamics during early periods of incubation, here we conduct the first study on the temporal dynamics of adenine methylation in E. coli by Dam throughout all five phases of the bacterial life cycle in the laboratory. Using single-molecule real-time sequencing, we show that virtually all GATC sites are significantly methylated over time; nearly complete methylation of the chromosome was confirmed by mass spectroscopy analysis. However, we also detect 66 sites whose methylation patterns change significantly over time within a population, including three sites associated with sialic acid transport and catabolism, suggesting a potential role for Dam regulation of these genes; differential expression of this subset of genes was confirmed by quantitative real-time PCR. Further, we show significant growth defects of the dam mutant during long-term stationary phase (LTSP). Together these data suggest that the cell places a high premium on fully methylating the chromosome and that alterations in methylation patterns may have significant impact on patterns of transcription, maintenance of genetic fidelity, and cell survival.

IMPORTANCE: While it has been shown that methylation remains relatively constant into early stationary phase of E. coli, this study goes further through death phase and long-term stationary phase, a unique time in the bacterial life cycle due to nutrient limitation and strong selection for mutants with increased fitness. The absence of methylation at GATC sites can influence the mutation frequency within a population due to aberrant mismatch repair. Therefore, it is important to investigate the methylation status of GATC sites in an environment where cells may not prioritize methylation of the chromosome. This study demonstrates that chromosome methylation remains a priority even under conditions of nutrient limitation, indicating that continuous methylation at GATC sites could be under positive selection.
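
Locating candidate Dam target sites is a simple motif scan. The sketch below lists the 0-based start positions of every GATC occurrence in a sequence; the function name `find_gatc_sites` and the toy sequence are illustrative assumptions, and the code says nothing about whether a site is actually methylated.

```python
def find_gatc_sites(sequence):
    """Return 0-based start positions of every GATC motif.

    GATC is its own reverse complement, so scanning one strand
    finds the sites on both strands.
    """
    seq = sequence.upper()
    sites = []
    start = seq.find("GATC")
    while start != -1:
        sites.append(start)
        start = seq.find("GATC", start + 1)
    return sites


# Toy sequence with one upper-case and one lower-case GATC.
positions = find_gatc_sites("aaGATCttgatc")
```

Methylation status at each of these positions would come from the sequencing data itself, for example from polymerase kinetics in single-molecule real-time sequencing, and is outside the scope of this sketch.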

July 7, 2019 |

Next-generation sequencing of Haematococcus lacustris reveals an extremely large 1.35-megabase chloroplast genome.

Haematococcus lacustris is an industrially relevant microalga that is used for the production of the carotenoid astaxanthin. Here, we report the use of PacBio long-read sequencing to assemble the chloroplast genome of H. lacustris strain UTEX:2505. At 1.35 Mb, this is the largest assembled chloroplast of any plant or alga known to date. Copyright © 2018 Bauman et al.

July 7, 2019 |

RIFRAF: a frame-resolving consensus algorithm.

Protein coding genes can be studied using long-read next generation sequencing. However, high rates of indel sequencing errors are problematic, corrupting the reading frame. Even the consensus of multiple independent sequence reads retains indel errors. To solve this problem, we introduce Reference-Informed Frame-Resolving multiple-Alignment Free template inference algorithm (RIFRAF), a sequence consensus algorithm that takes a set of error-prone reads and a reference sequence and infers an accurate in-frame consensus. RIFRAF uses a novel structure, analogous to a two-layer hidden Markov model: the consensus is optimized to maximize alignment scores with both the set of noisy reads and with a reference. The template-to-reads component of the model encodes the preponderance of indels, and is sensitive to the per-base quality scores, giving greater weight to more accurate bases. The reference-to-template component of the model penalizes frame-destroying indels. A local search algorithm proceeds in stages to find the best consensus sequence for both objectives. Using Pacific Biosciences SMRT sequences from an HIV-1 env clone, NL4-3, we compare our approach to other consensus and frame correction methods. RIFRAF consistently finds a consensus sequence that is more accurate and in-frame, especially with small numbers of reads. It was able to perfectly reconstruct over 80% of consensus sequences from as few as three reads, whereas the best alternative required twice as many. RIFRAF is able to achieve these results and keep the consensus in-frame even with a distantly related reference sequence. Moreover, unlike other frame correction methods, RIFRAF can detect and keep true indels while removing erroneous ones. RIFRAF is implemented in Julia, and source code is publicly available at https://github.com/MurrellGroup/Rifraf.jl. Supplementary data are available at Bioinformatics online.
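
To make the notion of an "in-frame" consensus concrete, here is a rough check one might run on a candidate coding sequence. It is not part of RIFRAF (which is written in Julia) and does not reproduce its model; the function name `is_in_frame`, the stop-codon test, and the example sequences are assumptions for illustration only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame(consensus):
    """Heuristic frame check for a coding sequence read in frame 0.

    Returns True when the length is a multiple of three and no stop
    codon appears before the final codon. A stray indel typically
    breaks one or both conditions.
    """
    seq = consensus.upper()
    if len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return not any(codon in STOP_CODONS for codon in codons[:-1])


# Hypothetical sequences: the second has lost one base to a deletion.
intact = is_in_frame("ATGAAATAA")
broken = is_in_frame("ATGAATAA")
```

A check like this only flags a corrupted frame; deciding which indel to remove while preserving true indels is the harder problem the abstract describes.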
