Allele-level resolution data at primary HLA typing is the ideal for most histocompatibility testing laboratories. Many high-throughput molecular HLA typing approaches are unable to determine the phase of observed DNA sequence polymorphisms, leading to ambiguous results. The use of higher resolution methods is often restricted due to cost and time limitations. Here we report on the feasibility of using Pacific Biosciences’ Single Molecule Real-Time (SMRT) DNA sequencing technology for high-resolution and high-throughput HLA typing. Seven DNA samples were typed for HLA-A, -B and -C. The results showed that SMRT DNA sequencing technology was able to generate sequences that spanned entire HLA Class I genes and allowed for accurate allele calling. Eight novel genomic HLA class I sequences were identified: four were novel alleles, three were confirmed as genomic sequence extensions and one corrected an existing genomic reference sequence. This method has the potential to revolutionize the field of HLA typing. The clinical impact of achieving this level of resolution in HLA typing data is likely to be considerable, particularly in applications such as organ and blood stem cell transplantation where matching donors and recipients for their HLA is of utmost importance.
Single-molecule real-time (SMRT) sequencing generates much longer reads than other widely used next-generation (next-gen) sequencing methods, but its application to whole genome/exome analysis has been limited. Here, we describe the use of SMRT sequencing coupled with barcoding to simultaneously analyze one or a small number of genomic targets derived from multiple sources. In the budding yeast system, SMRT sequencing was used to analyze strand-exchange intermediates generated during mitotic recombination and to analyze genetic changes in a forward mutation assay. The general barcoding-SMRT approach was then extended to diffuse large B-cell lymphoma primary tumors and cell lines, where detected changes agreed with prior Illumina exome sequencing. A distinct advantage afforded by SMRT sequencing over other next-gen methods is that it immediately provides the linkage relationships between SNPs in the target segment sequenced. The strength of our approach for mutation/recombination studies (as well as linkage identification) derives from its inherent computational simplicity coupled with a lack of reliance on sophisticated statistical analyses. Copyright © 2015 Guo et al.
The CYP2D6 enzyme metabolizes ~25% of common medications, yet homologous pseudogenes and copy-number variants (CNVs) make interrogating the polymorphic CYP2D6 gene with short-read sequencing challenging. Therefore, we developed a novel long-read, full gene CYP2D6 single-molecule real-time (SMRT) sequencing method using the Pacific Biosciences platform. Long-range PCR and CYP2D6 SMRT sequencing of 10 previously genotyped controls identified expected star (*) alleles, but also enabled suballele resolution, diplotype refinement, and discovery of novel alleles. Coupled with an optimized variant calling pipeline, CYP2D6 SMRT sequencing was highly reproducible as triplicate intra- and inter-run non-reference genotype results were completely concordant. Importantly, targeted SMRT sequencing of upstream and downstream CYP2D6 gene copies characterized the duplicated allele in 15 control samples with CYP2D6 CNVs. The utility of CYP2D6 SMRT sequencing was further underscored by identifying the diplotypes of 14 samples with discordant or unclear CYP2D6 configurations from previous targeted genotyping, which again included suballele resolution, duplicated allele characterization, and discovery of a novel allele and tandem arrangement (CYP2D6*36+*41). Taken together, long-read CYP2D6 SMRT sequencing is an innovative, reproducible, and validated method for full-gene characterization, duplication allele-specific analysis and novel allele discovery, which will likely improve CYP2D6 metabolizer phenotype prediction for both research and clinical testing applications. This article is protected by copyright. All rights reserved.
The power of Single Molecule Real-Time sequencing technology in the de novo assembly of a eukaryotic genome.
Second-generation sequencers (SGS) have been game-changing, achieving cost-effective whole genome sequencing in many non-model organisms. However, a large portion of the genomes still remains unassembled. We reconstructed the azuki bean (Vigna angularis) genome using single molecule real-time (SMRT) sequencing technology and achieved the best contiguity and coverage among currently assembled legume crops. The SMRT-based assembly produced contigs 100 times longer, with 100 times fewer gaps, than the SGS-based assemblies. A detailed comparison between the assemblies revealed that the SMRT-based assembly enabled a more comprehensive gene annotation than the SGS-based assemblies, in which thousands of genes were missing or fragmented. A chromosome-scale assembly was generated based on the high-density genetic map, covering 86% of the azuki bean genome. We demonstrated that SMRT technology, though it still needed the support of SGS data, achieved a near-complete assembly of a eukaryotic genome.
DNA methylation assessed by SMRT Sequencing is linked to mutations in Neisseria meningitidis isolates.
The Gram-negative bacterium Neisseria meningitidis features extensive genetic variability. To date, proposed virulence genotypes are also detected in isolates from asymptomatic carriers, indicating more complex mechanisms underlying variable colonization modes of N. meningitidis. We applied the Single Molecule, Real-Time (SMRT) sequencing method from Pacific Biosciences to assess the genome-wide DNA modification profiles of two genetically related N. meningitidis strains, both of serogroup A. The resulting DNA methylomes revealed clear divergences, represented by the detection of shared and of strain-specific DNA methylation target motifs. The positional distribution of these methylated target sites within the genomic sequences displayed clear biases, which suggest a functional role of DNA methylation related to the regulation of genes. DNA methylation in N. meningitidis has a likely underestimated potential for variability, as evidenced by a careful analysis of the ORF status of a panel of confirmed and predicted DNA methyltransferase genes in an extended collection of N. meningitidis strains of serogroup A. Based on high coverage short sequence reads, we find phase variability as a major contributor to the variability in DNA methylation. Taking into account the phase variable loci, the inferred functional status of DNA methyltransferase genes matched the observed methylation profiles. Towards an elucidation of presently incompletely characterized functional consequences of DNA methylation in N. meningitidis, we reveal a prominent colocalization of methylated bases with Single Nucleotide Polymorphisms (SNPs) detected within our genomic sequence collection. As a novel observation we report increased mutability also at 6mA methylated nucleotides, complementing mutational hotspots previously described at 5mC methylated nucleotides.
These findings suggest a more diverse role of DNA methylation and Restriction-Modification (RM) systems in the evolution of prokaryotic genomes.
Antibiotic resistance is an increasingly serious public health threat. Understanding pathways allowing bacteria to survive antibiotic stress may unveil new therapeutic targets. We explore the role of the bacterial epigenome in antibiotic stress survival using classical genetic tools and single-molecule real-time sequencing to characterize genomic methylation kinetics. We find that Escherichia coli survival under antibiotic pressure is severely compromised without adenine methylation at GATC sites. Although the adenine methylome remains stable during drug stress, without GATC methylation, methyl-dependent mismatch repair (MMR) is deleterious and, fueled by the drug-induced error-prone polymerase Pol IV, overwhelms cells with toxic DNA breaks. In multiple E. coli strains, including pathogenic and drug-resistant clinical isolates, DNA adenine methyltransferase deficiency potentiates antibiotics from the β-lactam and quinolone classes. This work indicates that the GATC methylome provides structural support for bacterial survival during antibiotic stress and suggests targeting bacterial DNA methylation as a viable approach to enhancing antibiotic activity.
It has been widely accepted that 5-methylcytosine is the only form of DNA methylation in mammalian genomes. Here we identify N(6)-methyladenine as another form of DNA modification in mouse embryonic stem cells. Alkbh1 encodes a demethylase for N(6)-methyladenine. An increase of N(6)-methyladenine levels in Alkbh1-deficient cells leads to transcriptional silencing. N(6)-methyladenine deposition is inversely correlated with the evolutionary age of LINE-1 transposons; its deposition is strongly enriched at young (<1.5 million years old) but not old (>6 million years old) L1 elements. The deposition of N(6)-methyladenine correlates with epigenetic silencing of such LINE-1 transposons, together with their neighbouring enhancers and genes, thereby resisting the gene activation signals during embryonic stem cell differentiation. As young full-length LINE-1 transposons are strongly enriched on the X chromosome, genes located on the X chromosome are also silenced. Thus, N(6)-methyladenine developed a new role in epigenetic silencing in mammalian evolution distinct from its role in gene activation in other organisms. Our results demonstrate that N(6)-methyladenine constitutes a crucial component of the epigenetic regulation repertoire in mammalian genomes.
Haplotype variation involves not only SNPs but also insertions and deletions, in particular gene copy number variations. However, comparisons of individual genomes have been difficult because traditional sequencing methods produce reads too short to unambiguously reconstruct chromosomal regions containing repetitive DNA sequences. An example of such a case is the protein gene family in maize that acts as a sink for reduced nitrogen in the seed. Previously, 41-48 gene copies of the alpha zein gene family that spread over six loci spanning between 30- and 500-kb chromosomal regions have been described in two Iowa Stiff Stalk (SS) inbreds. Analyses of those regions were possible because of overlapping BAC clones, generated by an expensive and labor-intensive approach. Here we used single-molecule real-time (Pacific Biosciences) shotgun sequencing to assemble the six chromosomal regions from the Non-Stiff Stalk maize inbred W22 from a single DNA sequence dataset. To validate the reconstructed regions, we developed an optical map (BioNano genome map; BioNano Genomics) of W22 and found agreement between the two datasets. Using the sequences of full-length cDNAs from W22, we found that the error rate of PacBio sequencing seemed to be less than 0.1% after autocorrection and assembly. Expressed genes, some with premature stop codons, are interspersed with nonexpressed genes, giving rise to genotype-specific expression differences. Alignment of these regions with the previously analyzed regions of SS lines reveals, in part, dramatic differences between these two heterotic groups.
De novo assembly of human genomes is now a tractable effort due in part to advances in sequencing and mapping technologies. We use PacBio single-molecule, real-time (SMRT) sequencing and BioNano genomic maps to construct the first de novo assembly of NA19240, a Yoruban individual from Africa. This chromosome-scaffolded assembly of 3.08 Gb with a contig N50 of 7.25 Mb and a scaffold N50 of 78.6 Mb represents one of the most contiguous high-quality human genomes. We utilize a BAC library derived from NA19240 DNA and novel haplotype-resolving sequencing technologies and algorithms to characterize regions of complex genomic architecture that are normally lost due to compression to a linear haploid assembly. Our results demonstrate that multiple technologies are still necessary for complete genomic representation, particularly in regions of highly identical segmental duplications. Additionally, we show that diploid assembly has utility in improving the quality of de novo human genome assemblies.
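The contig and scaffold N50 values quoted above follow the usual assembly-statistics definition: the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal sketch of that calculation is given here for orientation; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up lengths (not data from the assembly above).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

A single very long contig can dominate this statistic, which is why assemblies are usually reported with both contig and scaffold N50, as in the abstract above.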
We recently observed a decrease in deoxyribonucleotide (dNTP) pools in HIV-infected individuals on antiretroviral therapy (ART). Alterations in dNTPs result in mutations in mitochondrial DNA (mtDNA) in cell culture and animal models. Therefore, we investigated whether ART is associated with mitochondrial genome sequence variation in peripheral blood mononuclear cells (PBMCs) of HIV-infected treatment-experienced individuals. In this substudy of a case-control study, 71 participants were included: 22 ‘cases’, who were HIV-infected treatment-experienced patients with mitochondrial toxicity, 25 HIV-infected treatment-experienced patients without mitochondrial toxicity, and 24 HIV-uninfected controls. Total DNA was extracted from PBMCs and purified polymerase chain reaction (PCR) products were subjected to third-generation sequencing using the PacBio Single Molecule Real-Time (SMRT) sequencing technology. The sequences were aligned against the revised Cambridge reference sequence for human mitochondrial DNA (NC_012920.1) for detection of variants. We identified a total of 123 novel variants, 39 of them in the coding region. HIV-infected treatment-experienced patients with and without toxicity had significantly higher average numbers of mitochondrial variants per participant than HIV-uninfected controls. We observed a higher burden of mtDNA large-scale deletions in HIV-infected treatment-experienced patients with toxicity compared with HIV-uninfected controls (P = 0.02). The frequency of mtDNA molecules containing a common deletion (mt.d4977) was higher in HIV-infected treatment-experienced patients with toxicity compared with HIV-uninfected controls (P = 0.06).
There was no statistically significant difference in mtDNA variants between HIV-infected treatment-experienced patients with and without toxicity. The frequency of mtDNA variants (mutations and large-scale deletions) was higher in HIV-infected treatment-experienced patients with or without ART-induced toxicity than in uninfected controls. © 2016 The Authors. HIV Medicine published by John Wiley & Sons Ltd on behalf of British HIV Association.
IncFIIk plasmid harbouring an amplification of 16S rRNA methyltransferase-encoding gene rmtH associated with mobile element ISCR2.
To investigate the resistance mechanisms and genetic support underlying the high resistance level of the Klebsiella pneumoniae strain CMUL78 to aminoglycoside and β-lactam antibiotics. Antibiotic susceptibility was assessed by the disc diffusion method and MICs were determined by the microdilution method. Antibiotic resistance genes and their genetic environment were characterized by PCR and Sanger sequencing. Plasmid contents were analysed in the clinical strain and transconjugants obtained by mating-out assays. Complete plasmid sequencing was performed with PacBio and Illumina technology. Strain CMUL78 co-produced the 16S rRNA methyltransferase (RMTase) RmtH, carbapenemase OXA-48 and ESBL SHV-12. The rmtH- and blaSHV-12-encoding genes were harboured by a novel ~115 kb IncFIIk plasmid designated pRmtH, and blaOXA-48 by a ~62 kb IncL/M plasmid related to pOXA-48a. pRmtH plasmid possessed seven different stability modules, one of which is a novel hybrid toxin-antitoxin system. Interestingly, pRmtH plasmid harboured a 4-fold amplification of an rmtH-ISCR2 unit arranged in tandem and inserted within a novel IS26-based composite transposon designated Tn6329. This is the first known report of the 16S RMTase-encoding gene rmtH in a plasmid. The rmtH-ISCR2 unit was inserted in a composite transposon as a 4-fold tandem repeat, a scarcely reported organization. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved.
CGG repeat-induced FMR1 silencing depends on the expansion size in human iPSCs and neurons carrying unmethylated full mutations.
In fragile X syndrome (FXS), CGG repeat expansion greater than 200 triplets is believed to trigger FMR1 gene silencing and disease etiology. However, FXS siblings have been identified with more than 200 CGGs, termed unmethylated full mutation (UFM) carriers, without gene silencing and disease symptoms. Here, we show that hypomethylation of the FMR1 promoter is maintained in induced pluripotent stem cells (iPSCs) derived from two UFM individuals. However, a subset of iPSC clones with large CGG expansions carries silenced FMR1. Furthermore, we demonstrate de novo silencing upon expansion of the CGG repeat size. FMR1 does not undergo silencing during neuronal differentiation of UFM iPSCs, and expression of large unmethylated CGG repeats has phenotypic consequences resulting in neurodegenerative features. Our data suggest that UFM individuals do not lack the cell-intrinsic ability to silence FMR1 and that inter-individual variability in the CGG repeat size required for silencing exists in the FXS population. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
Characterization of hepatitis C virus (HCV) envelope diversification from acute to chronic infection within a sexually transmitted HCV cluster by using single-molecule, real-time sequencing.
In contrast to other available next-generation sequencing platforms, PacBio single-molecule, real-time (SMRT) sequencing has the advantage of generating long reads albeit with a relatively higher error rate in unprocessed data. Using this platform, we longitudinally sampled and sequenced the hepatitis C virus (HCV) envelope genome region (1,680 nucleotides [nt]) from individuals belonging to a cluster of sexually transmitted cases. All five subjects were coinfected with HIV-1 and a closely related strain of HCV genotype 4d. In total, 50 samples were analyzed by using SMRT sequencing. By using 7 passes of circular consensus sequencing, the error rate was reduced to 0.37%, and the median number of sequences was 612 per sample. A further reduction of insertions was achieved by alignment against a sample-specific reference sequence. However, in vitro recombination during PCR amplification could not be excluded. Phylogenetic analysis supported close relationships among HCV sequences from the four male subjects and subsequent transmission from one subject to his female partner. Transmission was characterized by a strong genetic bottleneck. Viral genetic diversity was low during acute infection and increased upon progression to chronicity but subsequently fluctuated during chronic infection, caused by the alternate detection of distinct coexisting lineages. SMRT sequencing combines long reads with sufficient depth for many phylogenetic analyses and can therefore provide insights into within-host HCV evolutionary dynamics without the need for haplotype reconstruction using statistical algorithms. IMPORTANCE: Next-generation sequencing has revolutionized the study of genetically variable RNA virus populations, but for phylogenetic and evolutionary analyses, longer sequences than those generated by most available platforms, while minimizing the intrinsic error rate, are desired.
Here, we demonstrate for the first time that PacBio SMRT sequencing technology can be used to generate full-length HCV envelope sequences at the single-molecule level, providing a data set with large sequencing depth for the characterization of intrahost viral dynamics. The selection of consensus reads derived from at least 7 full circular consensus sequencing rounds significantly reduced the intrinsic high error rate of this method. We used this method to genetically characterize a unique transmission cluster of sexually transmitted HCV infections, providing insight into the distinct evolutionary pathways in each patient over time and identifying the transmission-associated genetic bottleneck as well as fluctuations in viral genetic diversity over time, accompanied by dynamic shifts in viral subpopulations. Copyright © 2017 American Society for Microbiology.
Comprehensive bioinformatics analysis of Mycoplasma pneumoniae genomes to investigate underlying population structure and type-specific determinants.
Mycoplasma pneumoniae is a significant cause of respiratory illness worldwide. Despite a minimal and highly conserved genome, genetic diversity within the species may impact disease. We performed whole genome sequencing (WGS) analysis of 107 M. pneumoniae isolates, including 67 newly sequenced using the Pacific Biosciences RS II and/or Illumina MiSeq sequencing platforms. Comparative genomic analysis of 107 genomes revealed >3,000 single nucleotide polymorphisms (SNPs) in total, including 520 type-specific SNPs. Population structure analysis supported the existence of six distinct subgroups, three within each type. We developed a predictive model to classify an isolate based on whole genome SNPs called against the reference genome into the identified subtypes, obviating the need for genome assembly. This study is the most comprehensive WGS analysis for M. pneumoniae to date, underscoring the power of combining complementary sequencing technologies to overcome difficult-to-sequence regions and highlighting potential differential genomic signatures in M. pneumoniae.
N6-methyldeoxyadenine (6mA) is a noncanonical DNA base modification present at low levels in plant and animal genomes, but its prevalence and association with genome function in other eukaryotic lineages remain poorly understood. Here we report that abundant 6mA is associated with transcriptionally active genes in early-diverging fungal lineages. Using single-molecule long-read sequencing of 16 diverse fungal genomes, we observed that up to 2.8% of all adenines were methylated in early-diverging fungi, far exceeding levels observed in other eukaryotes and more derived fungi. 6mA occurred symmetrically at ApT dinucleotides and was concentrated in dense methylated adenine clusters surrounding the transcriptional start sites of expressed genes; its distribution was inversely correlated with that of 5-methylcytosine. Our results show a striking contrast in the genomic distributions of 6mA and 5-methylcytosine and reinforce a distinct role for 6mA as a gene-expression-associated epigenomic mark in eukaryotes.