Background: HIV-1 proviruses in peripheral blood mononuclear cells (PBMCs) are thought to be an important reservoir of HIV-1 infection. Given that this pool represents an archival library, it can be used to study virus evolution and CD4+ T cell survival. Accurate study of this pool is hampered by the difficulty of sequencing a full-length proviral genome, which is typically accomplished by assembling overlapping pieces and imputing the full genome. Methodology: Cryopreserved PBMCs collected from a total of 8 HIV+ patients from 1997-2001 were used for genomic DNA extraction. Patients had been receiving cART for 2-8 years at the time samples were obtained. Seven patients had pVL >50 copies/mL (mean: 312,282, range: 18,372-683,400) and 1 had pVL <50. Genomic DNA was subjected to limiting dilution prior to amplification of near-full-length genomes by a newly developed nested PCR. The predicted size of the PCR product was 9.0 kb, spanning from the 5’ LTR through the 3’ LTR. Single molecules were sequenced as near-full-length amplicons directly from PCR products without shearing using commercially available P4-C2 reagents and standard protocols on a PacBio RS II instrument. Quality of the genomes was validated by clonal positive controls and synthetic mixtures. Results: Near-full-length provirus genome sequences were successfully obtained from all 8 patients as continuous long reads from single molecules. PacBio sequencing required approximately 10% of the PCR product needed for Sanger sequencing and generated 325 MB per 3-hour run including 1,800 full-length intact genome reads on average. One patient’s sample was not at a limiting dilution and analysis revealed multiple subspecies. For 8 near-full-length provirus genomes derived from the other 7 patients, large internal deletions were noted in 2 proviruses; APOBEC-mediated hypermutations were seen in 2 proviruses; and 4 proviruses appeared to be intact genomes.
All of the defective proviruses showed a complete absence of resistance mutations in either RT or protease, even after 2-8 years of cART. In contrast, all of the intact proviruses contained evidence of ART-resistance-associated mutations, suggesting that they represented relatively recent variants. Conclusions: Combining a novel protocol for full-length limiting dilution amplification of proviruses with PacBio SMRT sequencing allowed for the generation of near-full-length genomes with good quality and an ability to detect minor variants at the 1-10% level. Preliminary data analyses suggest that defective proviruses may represent archival variants that persist long-term in host cells, while intact proviruses within the PBMC pool showing evidence of active virus replication may represent more recent variants.
Complete resequencing of extended genomic regions using fosmid target capture and single molecule real-time (SMRT) long read sequencing technology.
A longstanding goal of genomic analysis is the identification of causal genetic factors contributing to disease. While the common disease/common variant hypothesis has been tested in many genome-wide association studies, few advancements in identifying causal variation have been realized, and instead recent findings point away from common variants towards aggregate rare variants as causal. A challenge is obtaining complete phased genomic sequences over extended genomic regions from sufficient numbers of cases and controls to identify all potential variation causal of a disease. To address this, we modified methods for targeted DNA isolation using fosmid technology and single-molecule, long-sequence-read generation that combine for complete, haplotype-resolved resequencing across extended genomic subregions. As proof of principle, we validated the approach by resequencing four 800 kbp segments that span a major histocompatibility complex (MHC) common extended haplotype (CEH) associated with disease. The data revealed the extent of conservation, exposing a near identity among four DR4 CEHs over conserved regions, detailing rare variation and measuring sequence accuracy. In a second test, we sequenced the complete KIR haplotypes from 8 individuals within a specific timeframe and cost. Single molecule long-read sequencing technology generated contiguous full-length fosmid sequences of 30 to 40 kb in a single read, allowing assembly of resolved haplotypes with very little data processing. All of the sequences produced from these projects were contiguous and phased, with accuracy above 99.99%. The results demonstrated that cost-effective scale-up is possible to generate scores to hundreds of phased chromosomal sequences of extended lengths that can encompass genomic regions associated with disease.
Mitochondrial DNA (mtDNA) is a compact, double-stranded circular genome of 16,569 bp with a cytosine-rich light (L) chain and a guanine-rich heavy (H) chain. mtDNA mutations have been increasingly recognized as important contributors to an array of human diseases such as Parkinson’s disease, Alzheimer’s disease, colorectal cancer and Kearns–Sayre syndrome. mtDNA mutations can affect all of the 1000-10,000 copies of the mitochondrial genome present in a cell (homoplasmic mutation) or only a subset of copies (heteroplasmic mutation). The ratio of normal to mutant mtDNAs within cells is a significant factor in whether mutations will result in disease, as well as the clinical presentation, penetrance, and severity of the phenotype. Over time, heteroplasmic mutations can become homoplasmic due to differential replication and random assortment. Full characterization of the mitochondrial genome would involve detection of not only homoplasmic but also heteroplasmic mutations, as well as complete phasing. Previously, we sequenced human mtDNA on the PacBio RS II System with two partially overlapping amplicons. Here, we present amplification-free, full-length sequencing of linearized mtDNA using the Sequel System. Full-length sequencing allows variant phasing along the entire mitochondrial genome, identification of heteroplasmic variants, and detection of epigenetic modifications that are lost in amplicon-based methods.
High-quality insect genomes are essential resources to understand insect biology and to combat insects as disease vectors and agricultural pests. It is desirable to sequence a single individual for a reference genome to avoid complications from multiple alleles during de novo assembly. However, the small body size of many insects poses a challenge for the use of long-read sequencing technologies, which often have high DNA-input requirements. The previously described PacBio Low DNA Input Protocol starts with ~100 ng of DNA, allows for high-quality assemblies of single mosquitoes among others, and represents a significant step in reducing such requirements. Here, we describe a new library protocol with a further 20-fold reduction in the DNA input quantity. Starting with just 5 ng of high molecular weight DNA, we describe the successful sequencing and de novo genome assembly of a single male sandfly (Phlebotomus papatasi, the main vector of the Old World cutaneous leishmaniasis), using HiFi data generated on the PacBio Sequel II System and assembled with FALCON. The assembly shows a high degree of completeness (>97% of BUSCO genes are complete), contiguity (contig N50 of 1 Mb), and sequence accuracy (>98% of BUSCO genes without frameshift errors). This workflow has general utility for small-bodied insects and other plant and animal species, either for focused research studies or in conjunction with large-scale genome projects.
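The contig N50 quoted above is a standard contiguity statistic: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size. A minimal Python sketch of that calculation follows; the function name `n50` and the example lengths are illustrative assumptions and are not taken from the sandfly assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths in bp, chosen only to illustrate the statistic.
example_contigs = [1_500_000, 1_000_000, 800_000, 400_000, 300_000]
print(n50(example_contigs))
```

In this made-up example the total is 4 Mb, and the two longest contigs together cover 2.5 Mb, so the N50 is the length of the second contig.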
User Group Meeting: High-throughput cell lysis and nucleic acid extraction powered by AFA technology
In this PacBio User Group Meeting presentation, Eugenio Daviso from Covaris talks about the use of adaptive focused acoustics for gentle cell lysis and extraction of high molecular weight DNA.
User Group Meeting: Getting a read on one little worm with PacBio’s low DNA input workflow and the Agilent Femto Pulse System
In this PacBio User Group Meeting presentation, Erin Bernberg from the University of Delaware reports on using the Agilent Femto Pulse System for high-resolution, highly sensitive fragment analysis and on…
PAG Conference: The impact of highly accurate PacBio sequence data on the assembly of a tetraploid rose
In this presentation at PAG 2020, Bart Nijland of Genetwister Technologies explains how his team set out to make a haplotype-aware assembly of the highly complex tetraploid Rosa x hybrida…
In this ASHG 2020 PacBio Workshop, Emily Farrow of Children’s Mercy Kansas City shares how the incorporation of long-read sequencing into the Genomic Answers for Kids research study is increasing…
Biochemical characterization of a novel cold-adapted agarotetraose-producing α-agarase, AgaWS5, from Catenovulum sediminis WS1-A.
Although many β-agarases that hydrolyze the β-1,4 linkages of agarose have been biochemically characterized, only three α-agarases that hydrolyze the α-1,3 linkages have been reported to date. In this study, a new α-agarase, AgaWS5, from Catenovulum sediminis WS1-A, a new agar-degrading marine bacterium, was biochemically characterized. AgaWS5 belongs to the glycoside hydrolase (GH) 96 family. AgaWS5 consists of 1295 amino acids (140 kDa) and has 65% identity to an α-agarase, AgaA33, obtained from the agar-degrading bacterium Thalassomonas agarivorans JAMB-A33. AgaWS5 showed maximum activity at pH 8 and 40 °C. AgaWS5 was cold-tolerant, retaining more than 40% of its maximum enzymatic activity at 10 °C. AgaWS5 is predicted to have several calcium-binding sites. Consistent with this, its activity was slightly enhanced in the presence of Ca2+ and was strongly inhibited by EDTA. The Km and Vmax of AgaWS5 for agarose were 10.6 mg/mL and 714.3 U/mg, respectively. Agarose-liquefaction, thin layer chromatography, and mass and NMR spectroscopic analyses demonstrated that AgaWS5 is an endo-type α-agarase that degrades agarose and mainly produces agarotetraose. Thus, in this study, a novel cold-adapted GH96 agarotetraose-producing α-agarase was identified.
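To make the reported kinetic constants concrete, the following Python sketch evaluates the Michaelis-Menten rate equation, v = Vmax·[S] / (Km + [S]), using the Km and Vmax given above. It assumes that AgaWS5 follows simple Michaelis-Menten kinetics on agarose, which the abstract implies by reporting Km and Vmax but does not state explicitly; the function and constant names are illustrative.

```python
# Kinetic constants for AgaWS5 on agarose, as reported in the abstract.
KM_MG_PER_ML = 10.6    # Km, mg/mL
VMAX_U_PER_MG = 714.3  # Vmax, U/mg


def michaelis_menten_rate(substrate_mg_per_ml,
                          km=KM_MG_PER_ML,
                          vmax=VMAX_U_PER_MG):
    """Return the predicted rate (U/mg) at a given agarose concentration.

    Assumes simple Michaelis-Menten kinetics: v = Vmax * S / (Km + S).
    """
    if substrate_mg_per_ml < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * substrate_mg_per_ml / (km + substrate_mg_per_ml)


# At a substrate concentration equal to Km, the rate is half of Vmax.
half_max = michaelis_menten_rate(KM_MG_PER_ML)
print(f"{half_max:.2f} U/mg")
```

Under this model, the relatively high Km means the enzyme would approach its maximum rate only at agarose concentrations well above 10.6 mg/mL.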
Background: Assemblies of diploid genomes are generally unphased, pseudo-haploid representations that do not correctly reconstruct the two parental haplotypes present in the individual sequenced. Instead, the assembly alternates between parental haplotypes and may contain duplications in regions where the parental haplotypes are sufficiently different. Trio binning is an approach to genome assembly that uses short reads from both parents to classify long reads from the offspring according to maternal or paternal haplotype origin, and is thus helped rather than impeded by heterozygosity. Using this approach, it is possible to derive two assemblies from an individual, accurately representing both parental contributions in their entirety with higher continuity and accuracy than is possible with other methods. Results: We used trio binning to assemble reference genomes for two species from a single individual using an interspecies cross of yak (Bos grunniens) and cattle (Bos taurus). The high heterozygosity inherent to interspecies hybrids allowed us to confidently assign >99% of long reads from the F1 offspring to parental bins using unique k-mers from parental short reads. Both the maternal (yak) and paternal (cattle) assemblies contain over one third of the acrocentric chromosomes, including the two largest chromosomes, in single haplotigs. Conclusions: These haplotigs are the first vertebrate chromosome arms to be assembled gap-free and fully phased, and this is the first time assemblies for two species have been created from a single individual. Both assemblies are the most continuous currently available for non-model vertebrates. Abbreviations: Mb, megabases; kb, kilobases; MYA, millions of years ago; MHC, major histocompatibility complex; SMRT, single molecule real time.
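The read classification step described above can be sketched in a few lines of Python. This is a toy illustration of the idea only, not the authors' pipeline: it ignores reverse complements, sequencing errors, and the normalization by k-mer set size that a production tool would need, and all function names and sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    maternal = set().union(*(kmers(r, k) for r in maternal_reads))
    paternal = set().union(*(kmers(r, k) for r in paternal_reads))
    return maternal - paternal, paternal - maternal


def classify_read(read, maternal_only, paternal_only, k):
    """Assign an offspring long read to a parental bin by counting unique k-mers."""
    read_kmers = kmers(read, k)
    maternal_hits = len(read_kmers & maternal_only)
    paternal_hits = len(read_kmers & paternal_only)
    if maternal_hits > paternal_hits:
        return "maternal"
    if paternal_hits > maternal_hits:
        return "paternal"
    return "unassigned"


# Toy sequences that differ at a single base (hypothetical, for illustration).
K = 4
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], K)
for read in ["GTACGT", "GTTCGT", "ACGT"]:
    print(read, classify_read(read, mat_only, pat_only, K))
```

Reads that carry no parent-specific k-mers, such as those from regions where the two haplotypes are identical, stay unassigned in this sketch, which is why high heterozygosity makes the binning more effective.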
Forest tree species are increasingly subject to severe mortalities from exotic pests, diseases, and invasive organisms, accelerated by climate change. Forest health issues are threatening multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species, or among surviving trees, introgression of resistance genes into threatened tree species in reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being employed as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of a reference genome with chromosome-scale sequences for Chinese chestnut (C. mollissima), the disease-resistance donor for American chestnut restoration. We also demonstrate the value of the genome as a platform for research and species restoration, including new insights into the evolution of blight resistance in Asian chestnut species, the locations in the genome of ecologically important signatures of selection differentiating American chestnut from Chinese chestnut, the identification of candidate genes for disease resistance, and preliminary comparisons of genome organization with related species.
Insights into the bacterial species and communities of a full-scale anaerobic/anoxic/oxic wastewater treatment plant by using third-generation sequencing.
For the first time, a full-length 16S rRNA sequencing method was applied to characterize the bacterial species and communities of a full-scale wastewater treatment plant using an anaerobic/anoxic/oxic (A/A/O) process in Wuhan, China. The compositions of the bacteria at the phylum and class levels in the activated sludge were similar to those revealed by Illumina MiSeq sequencing. At the genus and species levels, third-generation sequencing showed great merit and accuracy. Typical functional taxa classified as ammonia-oxidizing bacteria (AOB), nitrite-oxidizing bacteria (NOB), denitrifying bacteria (DB), anaerobic ammonium oxidation bacteria (ANAMMOXB) and polyphosphate-accumulating organisms (PAOs) were identified: Nitrosomonas (1.11%), Nitrospira (3.56%), Pseudomonas (3.88%), Planctomycetes (13.80%), and Comamonadaceae (1.83%), respectively. Pseudomonas (3.88%) and Nitrospira (3.56%) were the two most predominant genera, mainly comprising Pseudomonas extremaustralis (1.69%) and Nitrospira defluvii (3.13%), respectively. Bacteria involved in nitrogen and phosphorus removal were identified at the species level. The predicted functions proved that the A/A/O process was efficient regarding nitrogen and organics removal. Copyright © 2019 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
Complete genome sequence of Paenisporosarcina antarctica CGMCC 1.6503 T, a marine psychrophilic bacterium isolated from Antarctica
A marine psychrophilic bacterium, _Paenisporosarcina antarctica_ CGMCC 1.6503T (= JCM 14646T), was isolated off King George Island, Antarctica (62°13′31″ S 58°57′08″ W). In this study, we report the complete genome sequence of _Paenisporosarcina antarctica_, which comprises 3,972,524 bp with a mean G + C content of 37.0%. Gene function and metabolic pathway analyses showed that strain CGMCC 1.6503T encodes a series of genes related to cold adaptation, including genes encoding fatty acid desaturases, dioxygenases, antifreeze proteins and cold shock proteins, and possesses several two-component regulatory systems, which could assist this strain in responding to cold stress, oxygen stress and osmotic stress in Antarctica. The complete genome sequence of _P. antarctica_ may provide further insights into the genetic mechanism of cold adaptation for Antarctic marine bacteria.
Ilyonectria mors-panacis causes a serious disease hampering the production of Panax notoginseng, an important Chinese medicinal herb widely used for its anti-inflammatory, anti-fatigue, hepato-protective, and coronary heart disease prevention effects. Here, we report the first Illumina-PacBio hybrid sequenced draft genome assembly of I. mors-panacis strain G3B and its annotation. The availability of this genome sequence not only represents an important tool toward understanding the genetics behind the infection mechanism of I. mors-panacis strain G3B but also will help illuminate the complexities of the taxonomy of this species.
Chemical defense against predators is widespread in natural ecosystems. Occasionally, taxonomically distant organisms share the same defense chemical. Here, we describe an unusual tripartite marine symbiosis, in which an intracellular bacterial symbiont (“Candidatus Endobryopsis kahalalidefaciens”) uses a diverse array of biosynthetic enzymes to convert simple substrates into a library of complex molecules (the kahalalides) for chemical defense of the host, the alga Bryopsis sp., against predation. The kahalalides are subsequently hijacked by a third partner, the herbivorous mollusk Elysia rufescens, and employed similarly for defense. “Ca. E. kahalalidefaciens” has lost many essential traits for free living and acts as a factory for kahalalide production. This interaction between a bacterium, an alga, and an animal highlights the importance of chemical defense in the evolution of complex symbioses. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.