Background: The HIV-1 proviral reservoir is highly stable, even during antiretroviral therapy, and is seen as the major barrier to HIV-1 eradication. Identifying and comprehensively characterizing this reservoir will be critical to achieving an HIV cure. Historically, this has been a tedious and labor-intensive process, requiring high-replicate single-genome amplification reactions or overlapping amplicons that are then reconstructed into full-length genomes by algorithmic imputation. Here, we present a deep sequencing and analysis method able to determine the exact identity and relative abundances of near-full-length HIV genomes from samples containing mixtures of genomes without shearing or complex bioinformatic reconstruction. Methods: We generated clonal near-full-length (~9 kb) amplicons derived from single genome amplification (SGA) of primary proviral isolates or PCR of well-documented control strains. These clonal products were mixed at various abundances and sequenced as near-full-length (~9 kb) amplicons without shearing. Each mixture yielded many near-full-length HIV-1 reads. Mathematical analysis techniques resolved the complex mixture of reads into estimates of distinct near-full-length viral genomes with their relative abundances. Results: Single Molecule, Real-Time (SMRT) Sequencing data contained near-full-length (~9 kb) continuous reads for each sample, including some runs with more than 10,000 near-full-length-genome reads in a three-hour sequencing run. Our methods exactly recapitulated the originating genomes at single-base resolution, along with their relative abundances, in both mixtures of clonal controls and SGAs, and these results were validated using independent sequencing methods. Correct resolution was achieved even when genomes differed by only a single base. Minor abundances of 5% were reliably detected.
Conclusions: SMRT Sequencing yields long-read sequencing results from individual DNA molecules with a rapid time-to-result. The single-molecule, full-length nature of this sequencing method allows us to estimate variant subspecies and relative abundances with single-nucleotide resolution. This method allows for reference-agnostic and cost-effective full-genome sequencing of HIV-1, which could both further our understanding of latent infection and support the development of novel and improved tools for quantifying HIV provirus, which will be critical to curing HIV.
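The abstract above does not describe the mathematical analysis it uses, so the following is only a minimal sketch of the underlying idea: when each read spans a whole genome, relative abundances can be estimated by counting reads per distinct sequence. It assumes the reads are already error-corrected, so that identical genomes give identical strings; the function name, the `min_fraction` parameter, and the toy sequences are all hypothetical.

```python
from collections import Counter


def relative_abundances(reads, min_fraction=0.0):
    """Return {sequence: fraction of reads} for each distinct sequence.

    Assumption (not from the abstract): reads are error-corrected and
    full length, so two reads from the same genome are identical strings.
    """
    if not reads:
        return {}
    counts = Counter(reads)
    total = sum(counts.values())
    # Keep only sequences at or above the requested minimum abundance.
    return {
        seq: n / total
        for seq, n in counts.most_common()
        if n / total >= min_fraction
    }


# Toy mixture: two "genomes" that differ by a single base, mixed 95:5.
toy_reads = ["ACGTACGT"] * 95 + ["ACGTACGA"] * 5
abundances = relative_abundances(toy_reads, min_fraction=0.01)
```

Real single-molecule reads contain errors, so a practical pipeline would need a consensus or clustering step before any counting of this kind.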
We have developed barcoding reagents and workflows for multiplexing amplicons or fragmented native genomic DNA prior to Single Molecule, Real-Time (SMRT) Sequencing. The long reads of PacBio’s SMRT Sequencing enable detection of linked mutations across multiple kilobases (kb) of sequence. This feature is particularly useful in the context of mutational analysis or SNP confirmation, where a large number of samples are generated routinely. To validate this workflow, a set of 384 1.7-kb amplicons, each derived from variants of the Phi29 DNA polymerase gene, were barcoded during amplification, pooled, and sequenced on a single SMRT Cell. To demonstrate the applicability of the method to longer inserts, a library of 96 5-kb clones derived from the E. coli genome was sequenced.
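Pooling barcoded samples implies a demultiplexing step after sequencing. The sketch below shows one simple way this could work, assigning each read to a sample by an exact match to a barcode at its start. The barcode sequences, sample names, and function name are hypothetical, and exact prefix matching is an assumption; real barcode callers tolerate sequencing errors and check both read ends.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by the barcode found at their 5' end (exact match).

    `barcodes` maps a sample name to its barcode sequence. The barcode is
    trimmed from each assigned read; reads that match no barcode are
    collected under "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for name, barcode in barcodes.items():
            if read.startswith(barcode):
                bins[name].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(read)
    return dict(bins)


# Hypothetical barcodes and pooled reads, for illustration only.
barcodes = {"sample_1": "ACGT", "sample_2": "TGCA"}
reads = ["ACGTGGGCCC", "TGCAAAATTT", "ACGTGGGCCA", "NNNNGGGCCC"]
bins = demultiplex(reads, barcodes)
```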
Outside of the simplest cases (haploid, bacteria, or inbreds), genomic information is not carried in a single reference per individual, but rather has higher ploidy (n ≥ 2) for almost all organisms. The existence of two or more highly related sequences within an individual makes it extremely difficult to build high quality, highly contiguous genome assemblies from short DNA fragments. Based on the earlier work on a polyploidy aware assembler, FALCON (https://github.com/PacificBiosciences/FALCON), we developed new algorithms and software (“FALCON-unzip”) for de novo haplotype reconstructions from SMRT Sequencing data. We generated two datasets for developing the algorithms and the prototype software: (1) whole genome sequencing data from a highly repetitive diploid fungus (Clavicorona pyxidata) and (2) whole genome sequencing data from an F1 hybrid from two inbred Arabidopsis strains: Cvi-0 and Col-0. For the fungal genome, we achieved an N50 of 1.53 Mb (of the 1n assembly contigs) of the ~42 Mb 1n genome and an N50 of the haplotigs (haplotype specific contigs) of 872 kb from a 95X dataset with a read length N50 of ~16 kb. We found that ~45% of the genome was highly heterozygous and ~55% of the genome was highly homozygous. We developed methods to assess the base-level accuracy and local haplotype phasing accuracy of the assembly with short-read data from the Illumina® platform. For the Arabidopsis F1 hybrid genome, we found that 80% of the genome could be separated into haplotigs. The long range accuracy of phasing haplotigs was evaluated by comparing them to the assemblies from the two inbred parental lines. We show that a more complete view of all haplotypes could provide useful biological insights through improved annotation, characterization of heterozygous variants of all sizes, and resolution of differential allele expression.
The current FALCON-unzip method should help clarify how to solve more difficult polyploid genome assembly problems and how to improve computational efficiency for large genome assemblies. Based on this work, we can develop a pipeline for routinely assembling diploid or polyploid genomes as haplotigs, giving a comprehensive view of the genomes that can be studied with the information at hand.
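The contig, haplotig, and read-length figures above are reported as N50 values. As a reminder of what that statistic measures, here is a minimal sketch using the standard definition: the length L such that sequences of length L or longer contain at least half of the total bases. The function name and the toy lengths are illustrative and are not taken from FALCON.

```python
def nx(lengths, x=50):
    """Return the Nx statistic (N50 by default) for a list of lengths.

    Sequences are sorted from longest to shortest and summed until the
    running total reaches x percent of all bases; the length of the
    sequence that crosses that threshold is the Nx value.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


# Toy contig lengths (arbitrary units), 30 in total.
contig_lengths = [2, 10, 4, 8, 6]
n50 = nx(contig_lengths)        # 10 + 8 = 18 reaches half of 30
n90 = nx(contig_lengths, x=90)  # 10 + 8 + 6 + 4 = 28 reaches 27
```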
Un-zipping diploid genomes – revealing all kinds of heterozygous variants from comprehensive haplotig assemblies
Outside of the simplest cases (haploid, bacteria, or inbreds), genomic information is not carried in a single reference per individual, but rather has higher ploidy (n ≥ 2) for almost all organisms. The existence of two or more highly related sequences within an individual makes it extremely difficult to build high quality, highly contiguous genome assemblies from short DNA fragments. Based on the earlier work on a polyploidy aware assembler, FALCON (https://github.com/PacificBiosciences/FALCON), we developed new algorithms and software (FALCON-unzip) for de novo haplotype reconstructions from SMRT Sequencing data. We applied the algorithms and the prototype software to (1) a highly repetitive diploid fungal genome (Clavicorona pyxidata) and (2) an F1 hybrid from two inbred Arabidopsis strains: Cvi-0 and Col-0. For the fungal genome, we achieved an N50 of 1.53 Mb (of the 1n assembly contigs) of the ~42 Mb 1n genome and an N50 of the haplotigs of 872 kb from a 95X dataset with a read length N50 of ~16 kb. We found that ~45% of the genome was highly heterozygous and ~55% of the genome was highly homozygous. We developed methods to assess the base-level accuracy and local haplotype phasing accuracy of the assembly with short-read data from the Illumina platform. For the Arabidopsis F1 hybrid genome, we found that 80% of the genome could be separated into haplotigs. The long range accuracy of phasing haplotigs was evaluated by comparing them to the assemblies from the two inbred parental lines. We show that a more complete view of all haplotypes could provide useful biological insights through improved annotation, characterization of heterozygous variants of all sizes, and resolution of differential allele expression. Finally, we applied this method to WGS human data sets to demonstrate the potential for resolving complicated, medically-relevant genomic regions.
Grant Cramer from the University of Nevada, Reno, and Dario Cantu from the University of California, Davis, discuss past challenges with sequencing Clone 8 of Cabernet Sauvignon (Vitis vinifera). An…
Arthrinium phaeospermum (Corda) M.B. Ellis is a globally distributed pathogenic fungus with a wide host range; its hosts include not only plants, but also humans and animals. This study aimed to develop genomic resources for A. phaeospermum to provide solid data and a theoretical basis for further studies of its pathogenesis, transcriptomics, proteomics, metabolomics and RNA genomics. The genome was obtained from the mycelia of the strain AP-Z13 using a combination of analyses with the high-throughput Illumina HiSeq 4000 system and PacBio RSII LongRead sequencing platform. Functional annotation was performed by BLASTing protein sequences against those in different publicly available databases to obtain their corresponding annotations. The genome is 48.45 Mb in size, with an N90 scaffold size of 1,931,147 bp, and encodes 19,836 putative predicted genes. This is the first report of the genome-scale assembly and annotation for A. phaeospermum, the first species in the genus Arthrinium to be subjected to whole genome sequencing. Copyright © 2019 Elsevier Inc. All rights reserved.
Puccinia novopanici is an important biotrophic fungal pathogen that causes rust disease in switchgrass. Lack of genomic resources for P. novopanici has hampered progress towards developing effective disease resistance against this pathogen. Therefore, we have sequenced the whole genome of P. novopanici and generated a framework for future work to understand pathogenicity mechanisms, identify effectors, and study repeat element invasion, genome evolution, and comparative genomics among Puccinia species. Long and short read sequences were generated from P. novopanici genomic DNA by PacBio and Illumina technologies, respectively, and assembled into a 99.9 megabase (Mb) genome. Transcripts of P. novopanici were predicted from the assembled genome using MAKER and were further validated by RNAseq data. The genome sequence information of P. novopanici will be a valuable resource for researchers working on monocot rusts and plant disease resistance in general.
The genomes of polyextremophilic Cyanidiales contain 1% horizontally transferred genes with diverse adaptive functions.
The role and extent of horizontal gene transfer (HGT) in eukaryotes are hotly disputed topics that impact our understanding of the origin of metabolic processes and the role of organelles in cellular evolution. We addressed this issue by analyzing 10 novel Cyanidiales genomes and determined that 1% of their gene inventory is HGT-derived. Numerous HGT candidates share a close phylogenetic relationship with prokaryotes that live in similar habitats as the Cyanidiales and encode functions related to polyextremophily. HGT candidates differ from native genes in GC-content, number of splice sites, and gene expression. HGT candidates are more prone to loss, which may explain the absence of a eukaryotic pan-genome. Therefore, the lack of a pan-genome and cumulative effects fail to provide substantive arguments against our hypothesis of recurring HGT followed by differential loss in eukaryotes. The maintenance of 1% HGTs, even under selection for genome reduction, underlines the importance of non-endosymbiosis related foreign gene acquisition. © 2019, Rossoni et al.
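GC-content is one of the features the abstract above uses to distinguish HGT candidates from native genes. As an illustration of that one measurement only, the sketch below computes the GC fraction of a sequence; the function name and the example sequences are hypothetical, and ignoring non-ACGT characters is an assumption rather than the authors' stated procedure.

```python
def gc_content(seq):
    """Return the fraction of G and C among the A, C, G, T bases in `seq`.

    Ambiguous characters such as N are ignored (an assumption made for
    this sketch); a sequence with no A/C/G/T bases returns 0.0.
    """
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


# Made-up sequences, for illustration only.
native_gc = gc_content("ATATGCATAT")     # 2 of 10 bases are G or C
candidate_gc = gc_content("GCGCGCATGC")  # 8 of 10 bases are G or C
```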
Complete Genome Sequence of Leisingera aquamixtae R2C4, Isolated from a Self-Regenerating Biocathode Consortium.
Here, we present the complete genome sequence of Leisingera aquamixtae R2C4, isolated from the electroautotrophic microbial consortium biocathode MCL (Marinobacter-Chromatiaceae-Labrenzia). As an isolate of a current-producing system, the genome sequence of L. aquamixtae will yield insights regarding electrode-associated microorganisms and communities. A dark pigment is also observed during cultivation. Copyright © 2019 Bird et al.
Draft Genome Sequence of Streptomyces sp. Strain RKND-216, an Antibiotic Producer Isolated from Marine Sediment in Prince Edward Island, Canada.
Streptomyces sp. strain RKND-216 was isolated from marine sediment collected in Prince Edward Island, Canada, and produces a putatively novel bioactive natural product with antitubercular activity. The genome assembly consists of two contigs covering 5.61 Mb. Genome annotation identified 4,618 predicted protein-coding sequences and 19 predicted natural product biosynthetic gene clusters. Copyright © 2019 Liang et al.
Finished Genome Sequence of the Indole-3-Acetic Acid-Catabolizing Bacterium Pseudomonas putida 1290.
Use of indole-3-acetic acid (IAA) as a carbon, nitrogen, and energy source by Pseudomonas putida 1290 is linked to the possession of a gene cluster that codes for conversion to catechol. Here, we present the genomic context of this iac gene cluster, which includes genes for IAA chemotaxis/transport and catechol catabolism. Copyright © 2019 Laird and Leveau.
Methylomes of Two Extremely Halophilic Archaea Species, Haloarcula marismortui and Haloferax mediterranei.
The genomes of two extremely halophilic Archaea species, Haloarcula marismortui and Haloferax mediterranei, were sequenced using single-molecule real-time sequencing. The ~4-Mbp genomes are GC rich with multiple large plasmids and two 4-methyl-cytosine patterns. Methyl transferases were incorporated into the Restriction Enzymes Database (REBASE), and gene annotation was incorporated into the Haloarchaeal Genomes Database (HaloWeb). Copyright © 2019 DasSarma et al.
Draft Genome Sequence of Alteromonas sp. Strain RKMC-009, Isolated from Xestospongia muta via In Situ Culturing Using an Isolation Chip Diffusion Chamber.
We report the draft whole-genome sequence of Alteromonas sp. strain RKMC-009, which was isolated in situ from the sponge Xestospongia muta in San Salvador, The Bahamas, using an isolation chip (ichip). Automated biosynthetic gene cluster analysis using antiSMASH 4.0 predicted the presence of 22 biosynthetic gene clusters. Copyright © 2019 MacIntyre et al.
Deinococcus wulumuqiensis 479 (formerly known as Deinococcus radiodurans 479) is the original source strain for the restriction enzyme DrdI. Its complete sequence and full methylome were determined using Pacific Biosciences single-molecule real-time (SMRT) sequencing. Copyright © 2019 Fomenkov et al.
Genomic and transcriptomic characterization of Pseudomonas aeruginosa small colony variants derived from a chronic infection model.
Phenotypic change is a hallmark of bacterial adaptation during chronic infection. In the case of chronic Pseudomonas aeruginosa lung infection in patients with cystic fibrosis, well-characterized phenotypic variants include mucoid and small colony variants (SCVs). It has previously been shown that SCVs can be reproducibly isolated from the murine lung following the establishment of chronic infection with mucoid P. aeruginosa strain NH57388A. Using a combination of single-molecule real-time (PacBio) and Illumina sequencing, we identify a large genomic inversion in the SCV, arising through recombination between homologous regions of two rRNA operons, together with an associated truncation of one of the 16S rRNA genes, and suggest this may be the genetic switch for conversion to the SCV phenotype. This phenotypic conversion is associated with large-scale transcriptional changes distributed throughout the genome. This global rewiring of the cellular transcriptomic output results in changes to normally differentially regulated genes that modulate resistance to oxidative stress, central metabolism and virulence. These changes are of clinical relevance because the appearance of SCVs during chronic infection is associated with declining lung function.