April 21, 2020  |  

Long-read amplicon denoising.

Long-read next-generation amplicon sequencing shows promise for studying complete genes or genomes from complex and diverse populations. Current long-read sequencing technologies have challenging error profiles, hindering data processing and incorporation into downstream analyses. Here we consider the problem of how to reconstruct, free of sequencing error, the true sequence variants and their associated frequencies from PacBio reads. Called ‘amplicon denoising’, this problem has been extensively studied for short-read sequencing technologies, but current solutions do not always successfully generalize to long reads with high indel error rates. We introduce two methods: one that runs nearly instantly and is very accurate for medium length reads and high template coverage, and another, slower method that is more robust when reads are very long or coverage is lower. On two Mock Virus Community datasets with ground truth, each sequenced on a different PacBio instrument, and on a number of simulated datasets, we compare our two approaches to each other and to existing algorithms. We outperform all tested methods in accuracy, with competitive run times even for our slower method, successfully discriminating templates that differ by just a single nucleotide. Julia implementations of Fast Amplicon Denoising (FAD) and Robust Amplicon Denoising (RAD), and a webserver interface, are freely available. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.


April 21, 2020  |  

Construction of full-length Japanese reference panel of class I HLA genes with single-molecule, real-time sequencing.

Human leukocyte antigen (HLA) is a gene complex known for its exceptional diversity across populations, importance in organ and blood stem cell transplantation, and associations of specific alleles with various diseases. We constructed a Japanese reference panel of class I HLA genes (ToMMo HLA panel), comprising a distinct set of HLA-A, HLA-B, HLA-C, and HLA-H alleles, by single-molecule, real-time (SMRT) sequencing of 208 individuals included in the 1070 whole-genome Japanese reference panel (1KJPN). For high-quality allele reconstruction, we developed a novel pipeline, Primer-Separation Assembly and Refinement Pipeline (PSARP), in which the SMRT sequencing and additional short-read data were used. The panel consisted of 139 alleles, which were all extended from known IPD-IMGT/HLA sequences, contained 40 with novel variants, and captured more than 96.5% of allelic diversity in 1KJPN. These newly available sequences would be important resources for research and clinical applications including high-resolution HLA typing, genetic association studies, and analyses of cis-regulatory elements.


September 22, 2019  |  

ABC transporter mis-splicing associated with resistance to Bt toxin Cry2Ab in laboratory- and field-selected pink bollworm.

Evolution of pest resistance threatens the benefits of genetically engineered crops that produce Bacillus thuringiensis (Bt) insecticidal proteins. Strategies intended to delay pest resistance are most effective when implemented proactively. Accordingly, researchers have selected for and analyzed resistance to Bt toxins in many laboratory strains of pests before resistance evolves in the field, but the utility of this approach depends on the largely untested assumption that laboratory- and field-selected resistance to Bt toxins are similar. Here we compared the genetic basis of resistance to Bt toxin Cry2Ab, which is widely deployed in transgenic crops, between laboratory- and field-selected populations of the pink bollworm (Pectinophora gossypiella), a global pest of cotton. We discovered that resistance to Cry2Ab is associated with mutations disrupting the same ATP-binding cassette transporter gene (PgABCA2) in a laboratory-selected strain from Arizona, USA, and in field-selected populations from India. The most common mutation, loss of exon 6 caused by alternative splicing, occurred in resistant larvae from both locations. Together with previous data, the results imply that mutations in the same gene confer Bt resistance in laboratory- and field-selected strains and suggest that focusing on ABCA2 genes may help to accelerate progress in monitoring and managing resistance to Cry2Ab.


September 22, 2019  |  

Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics.

Short read massive parallel sequencing has emerged as a standard diagnostic tool in the medical setting. However, short read technologies have inherent limitations such as GC bias, difficulties mapping to repetitive elements, trouble discriminating paralogous sequences, and difficulties in phasing alleles. Long read single molecule sequencers resolve these obstacles. Moreover, they offer higher consensus accuracies and can detect epigenetic modifications from native DNA. The first commercially available long read single molecule platform was the RS system based on PacBio’s single molecule real-time (SMRT) sequencing technology, which has since evolved into their RSII and Sequel systems. Here we capsulize how SMRT sequencing is revolutionizing constitutional, reproductive, cancer, microbial and viral genetic testing. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.


September 22, 2019  |  

Conventional and single-molecule targeted sequencing method for specific variant detection in IKBKG while bypassing the IKBKGP1 pseudogene.

In addition to Sanger sequencing, next-generation sequencing of gene panels and exomes has emerged as a standard diagnostic tool in many laboratories. However, these captures can miss regions, have poor efficiency, or capture pseudogenes, which hamper proper diagnoses. One such example is the primary immunodeficiency-associated gene IKBKG. Its pseudogene IKBKGP1 makes traditional capture methods aspecific. We therefore developed a long-range PCR method to efficiently target IKBKG, as well as two associated genes (IRAK4 and MYD88), while bypassing the IKBKGP1 pseudogene. Sequencing accuracy was evaluated using both conventional short-read technology and a newer long-read, single-molecule sequencer. Different mapping and variant calling options were evaluated in their capability to bypass the pseudogene using both sequencing platforms. Based on these evaluations, we determined a robust diagnostic application for unambiguous sequencing and variant calling in IKBKG, IRAK4, and MYD88. This method allows rapid identification of selected primary immunodeficiency diseases in patients suffering from life-threatening invasive pyogenic bacterial infections. Copyright © 2018 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.


September 22, 2019  |  

Discovery of gorilla MHC-C expressing C1 ligand for KIR.

In comparison to humans and chimpanzees, gorillas show low diversity at MHC class I genes (Gogo), as reflected by an overall reduced level of allelic variation as well as the absence of a functionally important sequence motif that interacts with killer cell immunoglobulin-like receptors (KIR). Here, we use recently generated large-scale genomic sequence data for a reassessment of allelic diversity at Gogo-C, the gorilla orthologue of HLA-C. Through the combination of long-range amplifications and long-read sequencing technology, we obtained, among the 35 gorillas reanalyzed, three novel full-length genomic sequences including a coding region sequence that has not been previously described. The newly identified Gogo-C*03:01 allele has a divergent recombinant structure that sets it apart from other Gogo-C alleles. Domain-by-domain phylogenetic analysis shows that Gogo-C*03:01 has segments in common with Gogo-B*07, the additional B-like gene that is present on some gorilla MHC haplotypes. Identified in ~50% of the gorillas analyzed, the Gogo-C*03:01 allele exclusively encodes the C1 epitope among Gogo-C allotypes, indicating its important function in controlling natural killer cell (NK cell) responses via KIR. We further explored the hypothesis that gorillas experienced a selective sweep, which may have resulted in a general reduction of the gorilla MHC class I repertoire. Our results provide little support for a selective sweep but rather suggest that the overall low Gogo class I diversity can be best explained by drastic demographic changes gorillas experienced in the ancient and recent past.


September 22, 2019  |  

Diversity of hepatitis E virus genotype 3

Summary: Hepatitis E virus genotype 3 (HEV-3) can lead to chronic infection in immunocompromised patients, and ribavirin is the treatment of choice. Recently, mutations in the polymerase gene have been associated with ribavirin failure but their frequency before treatment according to HEV-3 subtypes has not been studied on a large data set. We used single-molecule real-time sequencing technology to sequence 115 new complete genomes of HEV-3 infecting French patients. We analyzed phylogenetic relationships, the length of the polyproline region, and mutations in the HEV polymerase gene. Eighty-five (74%) were in the clade HEV-3efg, 28 (24%) in HEV-3chi clade, and 2 (2%) in HEV-3ra clade. Using automated partitioning of maximum likelihood phylogenetic trees, complete genomes were classified into subtypes. Polyproline region length differs within HEV-3 clades (from 189 to 315 nt). Investigating mutations in the polymerase gene, distinct polymorphisms between HEV-3 subtypes were found (G1634R in 95% of HEV-3e, G1634K in 56% of HEV-3ra, and V1479I in all HEV-3efg, clade HEV-3ra, and HEV-3k strains). Subtype-specific polymorphisms in the HEV-3 polymerase have been identified. Our study provides new complete genome sequences of HEV-3 that could be useful for comparing strains circulating in humans and the animal reservoir.


September 22, 2019  |  

An outbreak of a rare Shiga-toxin-producing Escherichia coli serotype (O117:H7) among men who have sex with men.

Sexually transmissible enteric infections (STEIs) are commonly associated with transmission among men who have sex with men (MSM). In the past decade, the UK has experienced multiple parallel STEI emergences in MSM caused by a range of bacterial species of the genus Shigella, and an outbreak of an uncommon serotype (O117:H7) of Shiga-toxin-producing Escherichia coli (STEC). Here, we used microbial genomics on 6 outbreak and 30 sporadic STEC O117:H7 isolates to explore the origins and pathogenic drivers of the STEC O117:H7 emergence in MSM. Using genomic epidemiology, we found that the STEC O117:H7 outbreak lineage was potentially imported from Latin America and likely continues to circulate both in the UK MSM population and in Latin America. We found genomic relationships consistent with existing symptomatic evidence for chronic infection with this STEC serotype. Comparative genomic analysis indicated the existence of a novel Shiga toxin 1-encoding prophage in the outbreak isolates, and evidence of horizontal gene exchange among the STEC O117:H7 outbreak lineage and other enteric pathogens. There was no evidence of increased virulence in the outbreak strains relative to contextual isolates, but the outbreak lineage was associated with azithromycin resistance. Comparing these findings with similar genomic investigations of emerging MSM-associated Shigella in the UK highlighted many parallels, the most striking of which was the importance of the azithromycin phenotype for STEI emergence in this patient group.


September 22, 2019  |  

Novel type of pilus associated with a Shiga-toxigenic E. coli hybrid pathovar conveys aggregative adherence and bacterial virulence.

A large German outbreak in 2011 was caused by a locus of enterocyte effacement (LEE)-negative enterohemorrhagic E. coli (EHEC) strain of the serotype O104:H4. This strain harbors markers that are characteristic of both EHEC and enteroaggregative E. coli (EAEC), including aggregative adhesion fimbriae (AAF) genes. Such rare EHEC/EAEC hybrids are highly pathogenic due to their possession of a combination of genes promoting severe toxicity and aggregative adhesion. We previously identified novel EHEC/EAEC hybrids and observed that one strain exhibited aggregative adherence but had no AAF genes. In this study, a genome sequence analysis showed that this strain belongs to the genoserotype O23:H8, MLST ST26, and harbors a 5.2 Mb chromosome and three plasmids. One plasmid carries some EAEC marker genes, such as aatA and genes with limited protein homology (11-61%) to those encoding the bundle-forming pilus (BFP) of enteropathogenic E. coli. Due to significant protein homology distance to known pili, we designated these as aggregate-forming pili (AFP)-encoding genes and the respective plasmid as pAFP. The afp operon was arranged similarly to the operon of BFP genes but contained an additional gene, afpA2, which is homologous to afpA. The deletion of the afp operon, afpA, or a nearby gene (afpR) encoding an AraC-like regulator, but not afpA2, led to a loss of pilin production, piliation, bacterial autoaggregation, and importantly, a >80% reduction in adhesion and cytotoxicity toward epithelial cells. Gene sets similar to the afp operon were identified in a variety of aatA-positive but AAF-negative intestinal pathogenic E. coli. In summary, we characterized widely distributed and novel fimbriae that are essential for aggregative adherence and cytotoxicity in a LEE-negative Shiga-toxigenic hybrid.


July 19, 2019  |  

Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes.

Detection of somatic mutations in human leukocyte antigen (HLA) genes using whole-exome sequencing (WES) is hampered by the high polymorphism of the HLA loci, which prevents alignment of sequencing reads to the human reference genome. We describe a computational pipeline that enables accurate inference of germline alleles of class I HLA-A, B and C genes and subsequent detection of mutations in these genes using the inferred alleles as a reference. Analysis of WES data from 7,930 pairs of tumor and healthy tissue from the same patient revealed 298 nonsilent HLA mutations in tumors from 266 patients. These 298 mutations are enriched for likely functional mutations, including putative loss-of-function events. Recurrence of mutations suggested that these ‘hotspot’ sites were positively selected. Cancers with recurrent somatic HLA mutations were associated with upregulation of signatures of cytolytic activity characteristic of tumor infiltration by effector lymphocytes, supporting immune evasion by altered HLA function as a contributory mechanism in cancer.


July 19, 2019  |  

Gorilla MHC class I gene and sequence variation in a comparative context.

Comparisons of MHC gene content and diversity among closely related species can provide insights into the evolutionary mechanisms shaping immune system variation. After chimpanzees and bonobos, gorillas are humans’ closest living relatives; but in contrast, relatively little is known about the structure and variation of gorilla MHC class I genes (Gogo). Here, we combined long-range amplifications and long-read sequencing technology to analyze full-length MHC class I genes in 35 gorillas. We obtained 50 full-length genomic sequences corresponding to 15 Gogo-A alleles, 4 Gogo-Oko alleles, 21 Gogo-B alleles, and 10 Gogo-C alleles including 19 novel coding region sequences. We identified two previously undetected MHC class I genes related to Gogo-A and Gogo-B, respectively, thereby illustrating the potential of this approach for efficient and highly accurate MHC genotyping. Consistent with their phylogenetic position within the hominid family, individual gorilla MHC haplotypes share characteristics with humans and chimpanzees as well as orangutans, suggesting a complex history of the MHC class I genes in humans and the great apes. However, the overall MHC class I diversity appears to be low, further supporting the hypothesis that gorillas might have experienced a reduction of their MHC repertoire.


July 19, 2019  |  

Detecting PKD1 variants in polycystic kidney disease patients by single-molecule long-read sequencing.

A genetic diagnosis of autosomal-dominant polycystic kidney disease (ADPKD) is challenging due to allelic heterogeneity, high GC content, and homology of the PKD1 gene with six pseudogenes. Short-read next-generation sequencing approaches, such as whole-genome sequencing and whole-exome sequencing, often fail at reliably characterizing complex regions such as PKD1. However, long-read single-molecule sequencing has been shown to be an alternative strategy that could overcome PKD1 complexities and discriminate between homologous regions of PKD1 and its pseudogenes. In this study, we present the increased power of resolution for complex regions using long-read sequencing to characterize a cohort of 19 patients with ADPKD. Our approach provided high sensitivity in identifying PKD1 pathogenic variants, diagnosing 94.7% of the patients. We show that reliable screening of ADPKD patients in a single test without interference of PKD1 homologous sequences, commonly introduced by residual amplification of PKD1 pseudogenes, by direct long-read sequencing is now possible. This strategy can be implemented in diagnostics and is highly suitable to sequence and resolve complex genomic regions that are of clinical relevance. © 2017 The Authors. Human Mutation published by Wiley Periodicals, Inc.

