Pacific Biosciences technology offers a fundamentally new data type with the potential to overcome some limitations of current next-generation sequencing platforms through significantly longer reads, single-molecule sequencing, low composition bias and an error profile that is orthogonal to other platforms. With these potential advantages in mind, we here evaluate the utility of the Pacific Biosciences RS platform for human medical amplicon resequencing projects. We evaluated the Pacific Biosciences technology for SNP discovery in medical resequencing projects using the Genome Analysis Toolkit, observing high sensitivity and specificity for calling differences in amplicons containing known true or false SNPs. We assessed data quality: most errors were indels (~14%) with few apparent miscalls (~1%). In this work, we define a custom data processing pipeline for Pacific Biosciences data for human data analysis. Critically, the error properties were largely free of the context-specific effects that affect other sequencing technologies. These data show excellent utility for follow-up validation and extension studies in human data and medical genetics projects, and the approach can be extended to other organisms with a reference genome.
A genome sequence assembly represents a model of a genome. This article explores some tools and methods for assessing the quality of an assembly, using publicly available data for Streptomyces species as the example. There is great variability in quality of assemblies deposited in GenBank. Only in a small minority of these assemblies are the raw data available, enabling full appraisal of the assembly quality. © 2015 The Authors. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.
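As an illustration of the kind of summary statistic used when appraising an assembly, the sketch below computes N50 from a list of contig lengths. The abstract does not name the metrics the article uses, so the choice of N50 here is an assumption, and the function name `n50` and the example lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly: the length L such that contigs of
    length >= L together contain at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths (bp), not taken from any real assembly.
example_lengths = [8, 8, 4, 3, 3, 2, 2, 2]
example_n50 = n50(example_lengths)
```

N50 describes contiguity only. It says nothing about base-level accuracy or misassemblies, which is one reason access to the raw reads matters for a full appraisal.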
Background: Numerous completed or on-going whole genome sequencing projects have highlighted the fact that obtaining a high quality genome sequence is necessary to address comparative genomics questions such as structural variations among genotypes and gain or loss of specific function. Despite the spectacular progress that has been made in sequencing technologies, obtaining accurate and reliable data is still a challenge, both at the whole genome scale and when targeting specific genomic regions. These problems are even more noticeable for complex plant genomes. Most plant genomes are known to be particularly challenging due to their size, high density of repetitive elements and various levels of ploidy. To overcome these problems, we have developed a strategy to reduce genome complexity by using the large insert BAC libraries combined with next generation sequencing technologies. Results: We compared two different technologies (Roche-454 and Pacific Biosciences PacBio RS II) to sequence pools of BAC clones in order to obtain the best quality sequence. We targeted nine BAC clones from different species (maize, wheat, strawberry, barley, sugarcane and sunflower) known to be complex in terms of sequence assembly. We sequenced the pools of the nine BAC clones with both technologies. We compared assembly results and highlighted differences due to the sequencing technologies used. Conclusions: We demonstrated that the long reads obtained with the PacBio RS II technology serve to obtain a better and more reliable assembly, notably by preventing errors due to duplicated or repetitive sequences in the same region.
Monoclonal antibody 10-1074 targets the V3 glycan supersite on the HIV-1 envelope (Env) protein. It is among the most potent anti-HIV-1 neutralizing antibodies isolated so far. Here we report on its safety and activity in 33 individuals who received a single intravenous infusion of the antibody. 10-1074 was well tolerated and had a half-life of 24.0 d in participants without HIV-1 infection and 12.8 d in individuals with HIV-1 infection. Thirteen individuals with viremia received the highest dose of 30 mg/kg 10-1074. Eleven of these participants were 10-1074-sensitive and showed a rapid decline in viremia by a mean of 1.52 log10 copies/ml. Virologic analysis revealed the emergence of multiple independent 10-1074-resistant viruses in the first weeks after infusion. Emerging escape variants were generally resistant to the related V3-specific antibody PGT121, but remained sensitive to antibodies targeting nonoverlapping epitopes, such as the anti-CD4-binding-site antibodies 3BNC117 and VRC01. The results demonstrate the safety and activity of 10-1074 in humans and support the idea that antibodies targeting the V3 glycan supersite might be useful for the treatment and prevention of HIV-1 infection.
Utricularia gibba, the humped bladderwort, is a carnivorous plant that retains a tiny nuclear genome despite at least two rounds of whole genome duplication (WGD) since common ancestry with grapevine and other species. We used a third-generation genome assembly with several complete chromosomes to reconstruct the two most recent lineage-specific ancestral genomes that led to the modern U. gibba genome structure. Patterns of subgenome dominance in the most recent WGD, both architectural and transcriptional, are suggestive of allopolyploidization, which may have generated genomic novelty and led to instantaneous speciation. Syntenic duplicates retained in polyploid blocks are enriched for transcription factor functions, whereas gene copies derived from ongoing tandem duplication events are enriched in metabolic functions potentially important for a carnivorous plant. Among these are tandem arrays of cysteine protease genes with trap-specific expression that evolved within a protein family known to be useful in the digestion of animal prey. Further enriched functions among tandem duplicates (also with trap-enhanced expression) include peptide transport (intercellular movement of broken-down prey proteins), ATPase activities (bladder-trap acidification and transmembrane nutrient transport), hydrolase and chitinase activities (breakdown of prey polysaccharides), and cell-wall dynamic components possibly associated with active bladder movements. Whereas independently polyploid Arabidopsis syntenic gene duplicates are similarly enriched for transcriptional regulatory activities, Arabidopsis tandems are distinct from those of U. gibba, while still metabolic and likely reflecting unique adaptations of that species. Taken together, these findings highlight the special importance of tandem duplications in the adaptive landscapes of a carnivorous plant genome.
SMRT Gate: A method for validation of synthetic constructs on Pacific Biosciences sequencing platforms.
Current DNA assembly methods are prone to sequence errors, requiring rigorous quality control (QC) to identify incorrect assemblies or synthesized constructs. Such errors can lead to misinterpretation of phenotypes. Because of this intrinsic problem, routine QC analysis is generally performed on three or more clones using a combination of restriction endonuclease assays, colony PCR, and Sanger sequencing. However, as new automation methods emerge that enable high-throughput assembly, QC using these techniques has become a major bottleneck. Here, we describe a quick and affordable methodology for the QC of synthetic constructs. Our method involves a one-pot digestion-ligation DNA assembly reaction, based on the Golden Gate assembly methodology, that is coupled with Pacific Biosciences’ Single Molecule, Real-Time (PacBio SMRT) sequencing technology.
Highly sensitive detection of mutations in CHO cell recombinant DNA using multi-parallel single molecule real-time DNA sequencing.
High-fidelity replication of biologic-encoding recombinant DNA sequences by engineered mammalian cell cultures is an essential pre-requisite for the development of stable cell lines for the production of biotherapeutics. However, immortalized mammalian cells characteristically exhibit an increased point mutation frequency compared to mammalian cells in vivo, both across their genomes and at specific loci (hotspots). Thus unforeseen mutations in recombinant DNA sequences can arise and be maintained within producer cell populations. These may affect the stability of recombinant gene expression and give rise to protein sequence variants with variable bioactivity and immunogenicity. Rigorous quantitative assessment of recombinant DNA integrity should therefore form part of the cell line development process and be an essential quality assurance metric for instances where synthetic/multi-component assemblies are utilized to engineer mammalian cells, such as the assessment of recombinant DNA fidelity or the mutability of single-site integration target loci. Based on Pacific Biosciences (Menlo Park, CA) single molecule real-time (SMRT™) circular consensus sequencing (CCS) technology, we developed an rDNA sequence analysis tool to process the multi-parallel sequencing of ~40,000 single recombinant DNA molecules. After statistical filtering of raw sequencing data, we show that this analytical method is capable of detecting single point mutations in rDNA to a minimum single mutation frequency of 0.0042% (<1/24,000 bases). Using a stable CHO transfectant pool harboring a randomly integrated 5 kb plasmid construct encoding GFP, we found that 28% of recombinant plasmid copies contained at least one low frequency (<0.3%) point mutation. These mutations were predominantly found in GC base pairs (85%), and there was no positional bias in mutation across the plasmid sequence. There was no discernible difference between the mutation frequencies of coding and non-coding DNA.
The putative ratio of non-synonymous to synonymous changes within the open reading frames (ORFs) in the plasmid sequence indicates that natural selection does not impact upon the prevalence of these mutations. Here we have demonstrated the abundance of mutations that fall outside of the reported range of detection of next generation sequencing (NGS) and second generation sequencing (SGS) platforms, providing a methodology capable of being utilized in cell line development platforms to identify the fidelity of recombinant genes throughout the production process. © 2018 Wiley Periodicals, Inc.
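The detection limit quoted above is given both as a percentage and as a "one in N bases" rate. The short sketch below shows the conversion between the two forms. The function name `percent_to_one_in_n` is hypothetical, and the only inputs used are the figures stated in the abstract.

```python
def percent_to_one_in_n(freq_percent):
    """Convert a mutation frequency in percent into the equivalent
    'one mutation per N bases' figure."""
    if freq_percent <= 0:
        raise ValueError("frequency must be positive")
    return 100.0 / freq_percent


# Minimum detectable single mutation frequency reported in the abstract.
detection_limit_n = percent_to_one_in_n(0.0042)

# Upper bound used in the abstract for a "low frequency" point mutation.
low_frequency_n = percent_to_one_in_n(0.3)
```

A frequency of 0.0042% corresponds to about one mutation per 23,800 bases, which the abstract reports in rounded form as 1/24,000.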
Sensitive detection of mitochondrial DNA variants for analysis of mitochondrial DNA-enriched extracts from frozen tumor tissue.
Large variation exists in mitochondrial DNA (mtDNA) not only between but also within individuals. Tumor-specific mtDNA variation also exists in human cancer. In this work, we compare four methods for extracting mtDNA as pure as possible from frozen tumor tissue. We also evaluated three state-of-the-art methods for sensitive detection of mtDNA variants. The main aim was to develop a procedure to detect low-frequency single-nucleotide mtDNA-specific variants in frozen tumor tissue. We show that, of the methods evaluated, DNA extracted from cytosol fractions following exonuclease treatment gives the highest mtDNA yield and purity from frozen tumor tissue (270-fold mtDNA enrichment). Next, we demonstrate the sensitivity of detection of low-frequency single-nucleotide mtDNA variants (≤1% allele frequency) in breast cancer cell lines MDA-MB-231 and MCF-7 by single-molecule real-time (SMRT) sequencing, UltraSEEK chemistry based mass spectrometry, and digital PCR. We also show de novo detection and allelic phasing of variants by SMRT sequencing. We conclude that our sensitive procedure to detect low-frequency single-nucleotide mtDNA variants from frozen tumor tissue is based on extraction of DNA from cytosol fractions followed by exonuclease treatment to obtain high mtDNA purity, and subsequent SMRT sequencing for (de novo) detection and allelic phasing of variants.
Global genetic diversity of var2csa in Plasmodium falciparum with implications for malaria in pregnancy and vaccine development.
Malaria infection during pregnancy, caused by the sequestering of Plasmodium falciparum parasites in the placenta, leads to high infant mortality and maternal morbidity. The parasite-placenta adherence mechanism is mediated by the VAR2CSA protein, a target for naturally occurring immunity. Currently, vaccine development is based on its ID1-DBL2Xb domain; however, little is known about the global genetic diversity of the encoding var2csa gene, which could influence vaccine efficacy. In a comprehensive analysis of the var2csa gene in >2,000 P. falciparum field isolates across 23 countries, we found that var2csa is duplicated at high prevalence (>25%), that African and Oceanian populations harbour a much higher diversity than other regions, and that insertions/deletions are abundant, leading to an underestimation of the diversity of the locus. Further, ID1-DBL2Xb haplotypes associated with adverse birth outcomes are present globally, and African-specific haplotypes exist, which should be incorporated into vaccine design.
Retrohoming of a mobile group II intron in human cells suggests how eukaryotes limit group II intron proliferation.
Mobile bacterial group II introns are evolutionary ancestors of spliceosomal introns and retroelements in eukaryotes. They consist of an autocatalytic intron RNA (a “ribozyme”) and an intron-encoded reverse transcriptase, which function together to promote intron integration into new DNA sites by a mechanism termed “retrohoming”. Although mobile group II introns splice and retrohome efficiently in bacteria, all examined thus far function inefficiently in eukaryotes, where their ribozyme activity is limited by low Mg2+ concentrations, and intron-containing transcripts are subject to nonsense-mediated decay (NMD) and translational repression. Here, by using RNA polymerase II to express a humanized group II intron reverse transcriptase and T7 RNA polymerase to express intron transcripts resistant to NMD, we find that simply supplementing culture medium with Mg2+ induces the Lactococcus lactis Ll.LtrB intron to retrohome into plasmid and chromosomal sites, the latter at frequencies up to ~0.1%, in viable HEK-293 cells. Surprisingly, under these conditions, the Ll.LtrB intron reverse transcriptase is required for retrohoming but not for RNA splicing as in bacteria. By using a genetic assay for in vivo selections combined with deep sequencing, we identified intron RNA mutations that enhance retrohoming in human cells, but <4-fold and not without added Mg2+. Further, the selected mutations lie outside the ribozyme catalytic core, which appears not readily modified to function efficiently at low Mg2+ concentrations. Our results reveal differences between group II intron retrohoming in human cells and bacteria and suggest constraints on critical nucleotide residues of the ribozyme core that limit how much group II intron retrohoming in eukaryotes can be enhanced. 
These findings have implications for group II intron use for gene targeting in eukaryotes and suggest how differences in intracellular Mg2+ concentrations between bacteria and eukarya may have impacted the evolution of introns and gene expression mechanisms.
Incorrect alignments are a severe problem in variant calling and remain a challenging computational issue in bioinformatics. Although some methods use a re-alignment approach to tackle misalignments, a standalone re-alignment tool for long sequencing reads is lacking. Hence, we present a standalone tool to correct misalignments, called ProbAlign. It can be integrated into the pipelines of not only variant calling but also other genomic applications. We demonstrate the use of re-alignment in two diverse and important genomics fields: variant calling and viral quasispecies reconstruction. First, variant calling results in the Pacific Biosciences SMRT re-sequencing data of NA12878 show that, after re-alignment, false positives can be reduced by 43.5% and true positives can be increased by 24.8% on average. Second, results in reconstructing a 5-virus-mix show that, after re-alignment, the viral population can be completely unraveled and the estimation of quasispecies frequencies is improved. ProbAlign is freely available in the PyroTools toolkit (https://github.com/homopolymer/PyroTools).
Clostridium chauvoei is the etiological agent of blackleg, a disease of cattle and sheep with high mortality rates, causing severe economic losses in livestock production. Here, we report the draft genome sequence of the virulent C. chauvoei strain JF4335 (2.8 Mbp and 28% G+C content) and the annotation of the genome.
The human roseoloviruses human herpesvirus 6A (HHV-6A), HHV-6B, and HHV-7 comprise the Roseolovirus genus of the human Betaherpesvirinae subfamily. Infections with these viruses have been implicated in many diseases; however, it has been challenging to establish infections with roseoloviruses as direct drivers of pathology, because they are nearly ubiquitous and display species-specific tropism. Furthermore, controlled study of infection has been hampered by the lack of experimental models, and until now, a mouse roseolovirus has not been identified. Herein we describe a virus that causes severe thymic necrosis in neonatal mice, characterized by a loss of CD4(+) T cells. These phenotypes resemble those caused by the previously described mouse thymic virus (MTV), a putative herpesvirus that has not been molecularly characterized. By next-generation sequencing of infected tissue homogenates, we assembled a contiguous 174-kb genome sequence containing 128 unique predicted open reading frames (ORFs), many of which were most closely related to herpesvirus genes. Moreover, the structure of the virus genome and phylogenetic analysis of multiple genes strongly suggested that this virus is a betaherpesvirus more closely related to the roseoloviruses, HHV-6A, HHV-6B, and HHV-7, than to another murine betaherpesvirus, mouse cytomegalovirus (MCMV). As such, we have named this virus murine roseolovirus (MRV) because these data strongly suggest that MRV is a mouse homolog of HHV-6A, HHV-6B, and HHV-7. IMPORTANCE Herein we describe the complete genome sequence of a novel murine herpesvirus. By sequence and phylogenetic analyses, we show that it is a betaherpesvirus most closely related to the roseoloviruses, human herpesviruses 6A, 6B, and 7. These data combined with physiological similarities with human roseoloviruses collectively suggest that this virus is a murine roseolovirus (MRV), the first definitively described rodent roseolovirus, to our knowledge. 
Many biological and clinical ramifications of roseolovirus infection in humans have been hypothesized, but studies showing definitive causative relationships between infection and disease susceptibility are lacking. Here we show that MRV infects the thymus and causes T-cell depletion, suggesting that other roseoloviruses may have similar properties. Copyright © 2017 American Society for Microbiology.
Lost in plasmids: next generation sequencing and the complex genome of the tick-borne pathogen Borrelia burgdorferi.
Borrelia (B.) burgdorferi sensu lato, including the tick-transmitted agents of human Lyme borreliosis, have particularly complex genomes, consisting of a linear main chromosome and numerous linear and circular plasmids. The number and structure of plasmids are variable even in strains within a single genospecies. Genes on these plasmids are known to play essential roles in virulence and pathogenicity as well as host and vector associations. For this reason, it is essential to explore methods for rapid and reliable characterisation of molecular level changes on plasmids. In this study we used three strains: a low passage isolate of B. burgdorferi sensu stricto strain B31(-NRZ) and two closely related strains (PAli and PAbe) that were isolated from human patients. Sequences of these strains were compared to the previously sequenced reference strain B31 (available in GenBank) to obtain proof-of-principle information on the suitability of next generation sequencing (NGS) library construction and sequencing methods for the assembly of bacterial plasmids. We tested the effectiveness of different short read assemblers on Illumina sequences, and of long read generation methods on sequence data from Pacific Biosciences single-molecule real-time (SMRT) and nanopore (Oxford Nanopore Technologies) sequencing technology. Inclusion of mate pair library reads improved the assembly in some plasmids, as did prior enrichment of plasmids. While cp32 plasmids remained refractory to assembly using only short reads, they were effectively assembled by long read sequencing methods. The long read SMRT and nanopore sequences came, however, at the cost of indels (insertions or deletions) appearing in an unpredictable manner. Using long and short read technologies together allowed us to show that the three B. burgdorferi s.s.
strains investigated here, whilst having similar plasmid structures to each other (apart from fusion of cp32 plasmids), differed significantly from the reference strain B31-GB, especially in the case of cp32 plasmids. Short read methods are sufficient to assemble the main chromosome and many of the plasmids in B. burgdorferi. However, a combination of short and long read sequencing methods is essential for proper assembly of all plasmids including cp32 and thus, for gaining an understanding of host or vector adaptations. An important conclusion from our work is that the evolution of Borrelia plasmids appears to be dynamic. This has important implications for the development of useful research strategies to monitor the risk of Lyme disease occurrence and to manage it medically.
Draft genome sequence of Plantibacter flavus strain 251 isolated from a plant growing in a chronically hydrocarbon-contaminated site.
Plantibacter flavus isolate 251 is a bacterial endophyte isolated from an Achillea millefolium plant growing in a natural oil seep soil located in Oil Springs, Ontario, Canada. We present here a draft genome sequence of a member of the infrequently reported genus Plantibacter, highlighting an endophytic lifestyle and biotechnological potential. Copyright © 2017 Lumactud et al.