June 1, 2021

Complete HIV-1 genomes from single molecules: Diversity estimates in two linked transmission pairs using clustering and mutual information.

We sequenced complete HIV-1 genomes from single molecules using Single Molecule, Real-Time (SMRT) Sequencing and derived de novo full-length genome sequences. SMRT Sequencing yields long reads from individual DNA molecules with a rapid time-to-result, which makes it a useful tool for continuous monitoring of viral populations. The single-molecule nature of the method allows us to estimate variant subspecies and their relative abundances by counting. We detail the mathematical techniques used to identify viral variant subspecies, including clustering distance metrics and mutual information. Sequencing was performed to better understand the relationships between the sequences of transmitted viruses in linked transmission pairs. Samples representing HIV transmission pairs were selected from the Zambia Emory HIV Research Project (Lusaka, Zambia) and sequenced. We examined Single Genome Amplification (SGA) prepped samples and samples containing complex mixtures of genomes. Whole-genome consensus estimates were made for each sample. Genome reads were clustered using a simple distance metric on aligned reads, and thresholds were chosen to yield distinct clusters of HIV genomes within samples. Mutual information between columns in the genome alignments was used to measure dependence between positions. In silico mixtures of reads from the SGA samples were made to simulate samples containing precisely controlled complex mixtures of genomes, and our clustering methods were applied to these mixtures. SMRT Sequencing data contained multiple full-length (>9 kb) continuous reads for each sample. Simple whole-genome consensus estimates easily identified transmission pairs. Clustering of the genome reads revealed differences in diversity between samples, allowing us to characterize the diversity of the quasi-species comprising each patient's viral population across the full genome. Mutual information identified possible dependencies between positions across the full HIV-1 genome. The SGA consensus genomes agreed with prior Sanger sequencing. For the synthetic SGA mixtures, our clustering methods correctly assigned reads to their originating genomes. The results open up the potential for reference-agnostic, cost-effective full-genome sequencing of HIV-1.
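The abstract does not specify the distance metric or the clustering rule, so the following is only a minimal Python sketch under stated assumptions: reads are already aligned to a common length, the distance between two reads is the fraction of mismatching columns (skipping columns where either read has a gap), reads are grouped by single-linkage clustering at a chosen threshold, and each cluster is summarized by a per-column majority-vote consensus. The function names, the toy reads, and the 0.2 threshold are hypothetical illustrations, not values from the study.

```python
from collections import Counter

GAP = "-"


def distance(a, b):
    """Fraction of mismatching columns between two aligned reads.

    Columns where either read has a gap are skipped (an assumption; the
    study's exact metric is not stated).
    """
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x == GAP or y == GAP:
            continue
        compared += 1
        mismatches += x != y
    return mismatches / compared if compared else 1.0


def cluster(reads, threshold):
    """Single-linkage clustering with union-find: reads within `threshold`
    of each other (directly or through a chain of reads) share a cluster."""
    parent = list(range(len(reads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if distance(reads[i], reads[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(reads)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def consensus(reads):
    """Per-column majority vote; columns whose majority symbol is a gap are dropped."""
    bases = []
    for column in zip(*reads):
        base, _ = Counter(column).most_common(1)[0]
        if base != GAP:
            bases.append(base)
    return "".join(bases)


# Hypothetical in silico mixture of two genomes (10 aligned columns each)
reads = [
    "ACGTACGTAC",
    "ACGTAC-TAC",  # same genome, one gap
    "ACGTACGAAC",  # same genome, one mismatch
    "TTGTACCTAG",
    "TTGTACCTAG",
]

for members in cluster(reads, threshold=0.2):
    print(members, consensus([reads[i] for i in members]))
# [0, 1, 2] ACGTACGTAC
# [3, 4] TTGTACCTAG
```

Single linkage is used here only because it is short to write. It can chain distinct variants together through intermediate reads, so a lower threshold or a different linkage rule may be needed to keep closely related quasi-species apart.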


June 1, 2021

Rapid sequencing of HIV-1 genomes as single molecules from simple and complex samples.

Background: To better understand the relationships among HIV-1 viruses in linked transmission pairs, we sequenced several samples representing HIV transmission pairs from the Zambia Emory HIV Research Project (Lusaka, Zambia) using Single Molecule, Real-Time (SMRT) Sequencing. Methods: Single molecules were sequenced as full-length (9.6 kb) amplicons directly from PCR products without shearing, yielding multiple fully phased, complete HIV-1 genomes for each patient. We examined Single Genome Amplification (SGA) prepped samples as well as samples containing complex mixtures of genomes. We detail the mathematical techniques used to identify viral variant subspecies, including clustering distance metrics and mutual information, which we used to derive multiple de novo full-length genome sequences for each patient. Whole-genome consensus estimates were made for each sample. Genome reads were clustered using a simple distance metric on aligned reads, with thresholds chosen to yield distinct clusters of HIV-1 genomes within samples. Mutual information between columns in the genome alignments was used to measure dependence between positions. In silico mixtures of reads from the SGA samples were made to simulate samples containing precisely controlled complex mixtures of genomes, and our clustering methods were applied to these mixtures. Results: SMRT Sequencing data contained multiple full-length (>9 kb) continuous reads for each sample. Simple whole-genome consensus estimates easily identified transmission pairs. Clustering of genome reads revealed differences in diversity between samples, allowing characterization of the quasi-species diversity of each patient's viral population across the full genome. Mutual information identified possible dependencies between positions across the full HIV-1 genome. The SGA consensus genomes agreed with prior Sanger sequencing. For the synthetic SGA mixtures, our clustering methods correctly assigned reads to their originating genomes. Conclusions: SMRT Sequencing yields long reads from individual DNA molecules with a rapid time-to-result, which makes it a useful tool for continuous monitoring of viral populations. The single-molecule nature of the method allows variant subspecies and their relative abundances to be estimated by counting. The results open up the potential for reference-agnostic, cost-effective full-genome sequencing of HIV-1.
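Mutual information between two alignment columns measures how much knowing the base at one position tells you about the base at the other; it is zero when the columns are independent. The abstract does not state which estimator was used, so this is only a sketch assuming equal-length aligned reads and the simple plug-in estimator (observed frequencies, no small-sample correction): I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x) p(y))]. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Plug-in mutual information (in bits) between two equal-length columns."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        p_a = count_x[a] / n
        p_b = count_y[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def pairwise_column_mi(alignment):
    """Mutual information for every pair of columns in a list of aligned reads."""
    columns = ["".join(col) for col in zip(*alignment)]
    return {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }


# Hypothetical toy alignment: columns 0 and 1 co-vary perfectly,
# column 2 varies independently of column 0, and column 3 is invariant.
alignment = [
    "AGTC",
    "AGCC",
    "GATC",
    "GACC",
]

scores = pairwise_column_mi(alignment)
print(scores[(0, 1)])  # 1.0 (column 0 fully determines column 1)
print(scores[(0, 2)])  # 0.0 (no dependence)
```

On real data, high-scoring column pairs are candidates for linked positions. With few reads or rare variants the plug-in estimate tends to be biased upward, so a correction or a permutation test is usually advisable before interpreting a score.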


June 1, 2021

Genome analysis of a bacterium that causes lameness.

Lameness is a significant problem resulting in millions of dollars in lost revenue annually. In commercial broilers, the most common cause of lameness is bacterial chondronecrosis with osteomyelitis (BCO). We are using a wire-flooring model to induce lameness attributable to BCO. Using 16S ribosomal DNA sequencing, we determined that Staphylococcus species were the main bacteria associated with BCO. Staphylococcus agnetis, which had not previously been isolated from poultry, was the principal species isolated from the majority of bone lesion samples. Administering S. agnetis in the drinking water to broilers reared on wire flooring increased the incidence of BCO three-fold compared with broilers drinking tap water (P = 0.001). We found that the minimum effective dose of S. agnetis for inducing BCO in broilers grown on wire flooring is 10^5 CFU/ml. We used PacBio and Illumina sequencing to assemble a 2.4 Mbp contig representing the genome and a 34 kbp contig representing the largest plasmid of S. agnetis. Annotation of this genome is underway, through comparative genomics with other Staphylococcus genomes and identification of virulence factors. Our goal is to elucidate the genetic diversity, toxins, and pathogenicity determinants of this poorly characterized species. Isolating pathogenic bacterial species, defining their likely route of transmission to broilers, and genomic analyses will contribute substantially to the development of measures for mitigating BCO losses in poultry.


June 1, 2021

Long-read assembly of the Aedes aegypti Aag2 cell line genome resolves ancient endogenous viral elements

Transmission of arboviruses such as dengue virus by Aedes aegypti causes debilitating disease across the globe. Disease in humans can include severe acute symptoms such as hemorrhagic fever and organ failure, but mosquitoes tolerate high titers of virus in a persistent infection. The mechanisms responsible for this viral tolerance are unclear. Recent publications highlighted the integration of genetic material from non-retroviral RNA viruses into the host genome during infection, a process that relies on reverse transcriptase activity from endogenous transposons. These endogenous viral elements (EVEs) are predicted to be ancient, and at least some are under purifying selection, suggesting that they are beneficial to the host. To characterize EVE biogenesis in a tractable system, we sequenced the Ae. aegypti cell line Aag2 to 58-fold coverage and present a de novo assembly of the genome. The assembly contains 1.7 Gb of genomic sequence and 255 Mb of alternative haplotype-specific sequence, with a contig N50 of 1.4 Mb, one to three orders of magnitude greater than that of other Aedes assemblies. The Aag2 genome is highly repetitive (70%), and most of the repeats are classified as transposable elements (60%). We identify EVEs homologous to a range of extant viruses, and many of these EVEs cluster in the repetitive regions. The contiguity of the assembly allows more comprehensive identification of transposable elements and EVEs, the sequences most likely to be lost in assemblies lacking the read length of SMRT Sequencing.
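Contig N50, the figure used above to compare assembly contiguity, is the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that standard definition (the function name and the toy contig lengths are illustrative):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted longest first and their lengths accumulated; the N50 is
    the length of the contig at which the running total first reaches half of
    the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length


# Toy assembly of five contigs (lengths in bp, hypothetical)
print(n50([5_000, 4_000, 3_000, 2_000, 1_000]))  # 4000
```

Because one very long contig can dominate the statistic, N50 is most informative when read alongside the total assembly size.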


June 1, 2021

Long-read assembly of the Aedes aegypti Aag2 cell line genome resolves ancient endogenous viral elements

Transmission of arboviruses such as dengue and Zika viruses by Aedes aegypti causes widespread and debilitating disease across the globe. Disease in humans can include severe acute symptoms such as hemorrhagic fever, organ failure, and encephalitis, yet mosquitoes tolerate high titers of virus in a persistent infection. The mechanisms responsible for tolerance to viral infection in mosquitoes are still unclear. Recent publications have highlighted the integration of genetic material from non-retroviral RNA viruses into the host genome during infection, a process that relies on reverse transcriptase activity from endogenous transposons. These endogenous viral elements (EVEs) are predicted to be ancient, and at least some are under purifying selection, which suggests that they are beneficial to the host. To characterize EVE biogenesis in a tractable system, we sequenced the Ae. aegypti cell line Aag2 to 58X coverage and here present a de novo assembly of the genome. The assembly consists of 1.7 Gb of genomic sequence and 255 Mb of alternative haplotype-specific sequence, with a contig N50 of 1.4 Mb, one to three orders of magnitude greater than that of other assemblies of the Aedes genus. The Aag2 genome is highly repetitive (70%), and most of the repeats are classified as transposable elements (60%). We identify a plethora of EVEs homologous to a diverse range of extant viruses, and many of these EVEs cluster in the highly repetitive regions. The highly contiguous nature of this assembly allows a more comprehensive identification of the transposable elements and EVEs that are most likely to be lost in assemblies lacking the read length of SMRT Sequencing.


June 1, 2021

Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum genome using long-read sequencing

Sequence-based estimation of the genetic diversity of Plasmodium falciparum, the most lethal malarial parasite, has proved challenging due to the lack of a complete genome assembly. The skewed AT-richness of its genome (~80.6% A+T) and the lack of technology to assemble the highly polymorphic sub-telomeric regions that contain clonally variant, multigene virulence families (such as var and rifin) have confounded attempts using short-read NGS technologies. Using single molecule, real-time (SMRT) sequencing, we assembled all 14 nuclear chromosomes of the P. falciparum genome telomere-to-telomere, each as a single contig. Specifically, amplification-free sequencing generated reads with an average length of 12 kb, with ≥50% of the reads between 15.5 and 50 kb in length. The hierarchical genome assembly process (HGAP) was used to assemble the P. falciparum genome de novo. This assembly accurately resolved centromeres (~90-99% A+T) and sub-telomeric regions, and identified large insertions and duplications that added extra genes to the var and rifin virulence families, along with smaller structural variants such as homopolymer tract expansions. These regions can serve as markers of genetic diversity in comparative genome analyses. Moreover, identifying the polymorphic and repetitive sub-telomeric sequences of parasite populations from endemic areas might inform the link between structural variation and phenotypes such as virulence, drug resistance and disease transmission.
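The composition figures in this abstract (~80.6% A+T genome-wide, ~90-99% in centromeres) are simple base-count ratios. The Python sketch below computes the A+T fraction and scans fixed-size windows to flag AT-rich stretches. It is an illustration only: the function names, window size, step, and 0.9 threshold are hypothetical parameters chosen for the toy sequence, and ambiguous bases such as N are excluded from the denominator by assumption.

```python
def at_fraction(seq):
    """Fraction of A/T among unambiguous bases (A, C, G, T), case-insensitive."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return at / (at + gc) if at + gc else 0.0


def at_rich_windows(seq, window, step, threshold):
    """Return (start, A+T fraction) for each window at or above `threshold`."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        frac = at_fraction(seq[start:start + window])
        if frac >= threshold:
            hits.append((start, frac))
    return hits


# Hypothetical toy sequence: a 20 bp AT-only block flanked by GC-only sequence
seq = "GC" * 10 + "AT" * 10 + "GC" * 10
print(round(at_fraction(seq), 3))         # 0.333
print(at_rich_windows(seq, 20, 10, 0.9))  # [(20, 1.0)]
```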


June 1, 2021

High-quality human genomes achieved through HiFi sequence data and FALCON-Unzip assembly

De novo assemblies of human genomes from continuous long reads (CLR), which have a read accuracy of 85-90%, now approach the human reference genome in contiguity, but their base-level accuracy is typically below QV40 (99.99%), an order of magnitude lower than the standard for finished references. These base errors complicate downstream interpretation; in particular, false-positive indels cause frameshifts that lead to false gene loss. PacBio HiFi sequence data, which are both long (>10 kb) and very accurate (>99.9%) at the level of individual reads, enable a new paradigm in human genome assembly. Haploid human assemblies using HiFi data achieve contiguity similar to those using CLR data and are highly accurate at the base level [1]. Furthermore, HiFi assemblies resolve more high-identity sequences such as segmental duplications [2]. To enable HiFi assembly of diploid human samples, we extended the FALCON-Unzip assembler to work directly with HiFi reads. Here we present phased diploid human genome assemblies from HiFi sequencing of HG002, HG005, and the Vertebrate Genomes Project (VGP) mHomSap1 trio on the PacBio Sequel II System. The HiFi assemblies all exceed the VGP’s quality guidelines, approaching QV50 (99.999%) accuracy. For HG002, 60% of the genome was haplotype-resolved, with a phase-block N50 of 143 kbp and a phasing accuracy of 99.6%. The overall mean base accuracy of the assembly was QV49.7. In conclusion, HiFi data show great promise for complete, contiguous, and accurate diploid human assemblies.
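The QV figures above are Phred-scaled: QV = -10 log10(error rate), so QV40 corresponds to 99.99% accuracy (1 error in 10,000 bases) and QV50 to 99.999%. A small Python sketch of the conversion in both directions (the function names are illustrative):

```python
import math


def accuracy_to_qv(accuracy):
    """Phred-scaled quality value for a per-base accuracy in [0, 1)."""
    error_rate = 1.0 - accuracy
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(error_rate)


def qv_to_accuracy(qv):
    """Per-base accuracy implied by a Phred-scaled quality value."""
    return 1.0 - 10.0 ** (-qv / 10.0)


print(round(accuracy_to_qv(0.9999), 1))   # 40.0
print(round(accuracy_to_qv(0.99999), 1))  # 50.0
# Errors expected per megabase at the QV reported for the HG002 assembly
print(round((1.0 - qv_to_accuracy(49.7)) * 1_000_000, 1))  # 10.7
```

Each 10-point step in QV is a tenfold change in error rate, which is why the gap between QV40 and QV50 is described as an order of magnitude.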


February 5, 2021

Video: Overview of SMRT technology

PacBio’s SMRT technology harnesses the natural process of DNA replication, which is highly efficient and accurate. Our SMRT technology enables the observation of DNA synthesis as it occurs…


April 21, 2020

Tracking short-term changes in the genetic diversity and antimicrobial resistance of OXA-232-producing Klebsiella pneumoniae ST14 in clinical settings.

Objectives: To track stepwise changes in genetic diversity and antimicrobial resistance in rapidly evolving OXA-232-producing Klebsiella pneumoniae ST14, an emerging carbapenem-resistant high-risk clone, in clinical settings. Methods: Twenty-six K. pneumoniae ST14 isolates were collected by the Korean Nationwide Surveillance of Antimicrobial Resistance system over the course of 1 year. Isolates were subjected to whole-genome sequencing and MIC determination for 33 antibiotics from 14 classes. Results: Single-nucleotide polymorphism (SNP) typing identified 72 unique SNP sites spanning the chromosomes of the isolates, dividing them into three clusters (I, II and III). The initial isolate carried two plasmids with 18 antibiotic resistance genes, including blaOXA-232, and was resistant to 11 antibiotic classes. Four other plasmids containing 12 different resistance genes, including blaCTX-M-15 and strA/B, were introduced over time, providing additional resistance to aztreonam and streptomycin. Moreover, in four isolates from cluster III, chromosomal integration of ISEcp1-blaCTX-M-15 inactivated mgrB, conferring colistin resistance. To the best of our knowledge, this is the first description of K. pneumoniae ST14 resistant to both carbapenems and colistin in South Korea. Furthermore, although some acquired genes were lost over time, the retention of 12 resistance genes and the inactivation of mgrB provided resistance to 13 antibiotic classes. Conclusions: We describe stepwise in vivo changes in the antimicrobial resistance of OXA-232-producing K. pneumoniae ST14 over time. Our findings contribute to understanding the evolution of emerging high-risk K. pneumoniae clones and provide reference data for future outbreaks.

Copyright © 2019 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
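The abstract does not describe the SNP-typing pipeline. As a simplified illustration only, assuming the isolates' chromosomes have already been aligned position by position (for example, as consensus sequences against a common reference), the Python sketch below lists the positions where isolates carry different unambiguous bases and counts pairwise differences at those sites; ambiguous calls such as N are ignored. The function names and toy sequences are hypothetical, and real pipelines typically add steps such as variant-quality filtering that are omitted here.

```python
from itertools import combinations

BASES = set("ACGT")


def snp_sites(aligned):
    """Positions where the aligned sequences carry more than one unambiguous base."""
    sites = []
    for pos, column in enumerate(zip(*aligned)):
        if len(set(column) & BASES) > 1:
            sites.append(pos)
    return sites


def pairwise_snp_distances(aligned, sites):
    """Number of SNP sites at which each pair of isolates differs (N/gaps skipped)."""
    distances = {}
    for (i, a), (j, b) in combinations(enumerate(aligned), 2):
        distances[(i, j)] = sum(
            1 for p in sites
            if a[p] in BASES and b[p] in BASES and a[p] != b[p]
        )
    return distances


# Hypothetical aligned chromosome fragments from four isolates
isolates = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTNCGT"]
sites = snp_sites(isolates)
print(sites)  # [2, 7]
print(pairwise_snp_distances(isolates, sites))
# {(0, 1): 1, (0, 2): 2, (0, 3): 0, (1, 2): 1, (1, 3): 1, (2, 3): 2}
```

A pairwise distance matrix like this is a common starting point for grouping isolates into clusters, for example by building a tree or applying a distance cutoff.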

