Menu
September 22, 2019

Defining a personal, allele-specific, and single-molecule long-read transcriptome.

Personal transcriptomes in which all of an individual’s genetic variants (e.g., single nucleotide variants) and transcript isoforms (transcription start sites, splice sites, and polyA sites) are defined and quantified for full-length transcripts are expected to be important for understanding individual biology and disease, but have not been described previously. To obtain such transcriptomes, we sequenced the lymphoblastoid transcriptomes of three family members (GM12878 and the parents GM12891 and GM12892) by using a Pacific Biosciences long-read approach complemented with Illumina 101-bp sequencing and made the following observations. First, we found that reads representing all splice sites of a transcript are evident for most sufficiently expressed genes =3 kb and often for genes longer than that. Second, we added and quantified previously unidentified splicing isoforms to an existing annotation, thus creating the first personalized annotation to our knowledge. Third, we determined SNVs in a de novo manner and connected them to RNA haplotypes, including HLA haplotypes, thereby assigning single full-length RNA molecules to their transcribed allele, and demonstrated Mendelian inheritance of RNA molecules. Fourth, we show how RNA molecules can be linked to personal variants on a one-by-one basis, which allows us to assess differential allelic expression (DAE) and differential allelic isoforms (DAI) from the phased full-length isoform reads. The DAI method is largely independent of the distance between exon and SNV–in contrast to fragmentation-based methods. Overall, in addition to improving eukaryotic transcriptome annotation, these results describe, to our knowledge, the first large-scale and full-length personal transcriptome.


September 22, 2019

Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data.

The use of sequencing technologies to investigate the microbiome of a sample can positively impact patient healthcare by providing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from various sources that complicate the identification of pathogens.Here we present Clinical PathoScope, a pipeline to rapidly and accurately remove host contamination, isolate microbial reads, and identify potential disease-causing pathogens. We have accomplished three essential tasks in the development of Clinical PathoScope. First, we developed an optimized framework for pathogen identification using a computational subtraction methodology in concordance with read trimming and ambiguous read reassignment. Second, we have demonstrated the ability of our approach to identify multiple pathogens in a single clinical sample, accurately identify pathogens at the subspecies level, and determine the nearest phylogenetic neighbor of novel or highly mutated pathogens using real clinical sequencing data. Finally, we have shown that Clinical PathoScope outperforms previously published pathogen identification methods with regard to computational speed, sensitivity, and specificity.Clinical PathoScope is the only pathogen identification method currently available that can identify multiple pathogens from mixed samples and distinguish between very closely related species and strains in samples with very few reads per pathogen. Furthermore, Clinical PathoScope does not rely on genome assembly and thus can more rapidly complete the analysis of a clinical sample when compared with current assembly-based methods. Clinical PathoScope is freely available at: http://sourceforge.net/projects/pathoscope/.


September 22, 2019

The role of MHC-E in T cell immunity is conserved among humans, rhesus macaques, and cynomolgus macaques.

MHC-E is a highly conserved nonclassical MHC class Ib molecule that predominantly binds and presents MHC class Ia leader sequence-derived peptides for NK cell regulation. However, MHC-E also binds pathogen-derived peptide Ags for presentation to CD8+ T cells. Given this role in adaptive immunity and its highly monomorphic nature in the human population, HLA-E is an attractive target for novel vaccine and immunotherapeutic modalities. Development of HLA-E-targeted therapies will require a physiologically relevant animal model that recapitulates HLA-E-restricted T cell biology. In this study, we investigated MHC-E immunobiology in two common nonhuman primate species, Indian-origin rhesus macaques (RM) and Mauritian-origin cynomolgus macaques (MCM). Compared to humans and MCM, RM expressed a greater number of MHC-E alleles at both the population and individual level. Despite this difference, human, RM, and MCM MHC-E molecules were expressed at similar levels across immune cell subsets, equivalently upregulated by viral pathogens, and bound and presented identical peptides to CD8+ T cells. Indeed, SIV-specific, Mamu-E-restricted CD8+ T cells from RM recognized antigenic peptides presented by all MHC-E molecules tested, including cross-species recognition of human and MCM SIV-infected CD4+ T cells. Thus, MHC-E is functionally conserved among humans, RM, and MCM, and both RM and MCM represent physiologically relevant animal models of HLA-E-restricted T cell immunobiology. Copyright © 2017 by The American Association of Immunologists, Inc.


September 22, 2019

Evaluating the mobility potential of antibiotic resistance genes in environmental resistomes without metagenomics.

Antibiotic resistance genes are ubiquitous in the environment. However, only a fraction of them are mobile and able to spread to pathogenic bacteria. Until now, studying the mobility of antibiotic resistance genes in environmental resistomes has been challenging due to inadequate sensitivity and difficulties in contig assembly of metagenome based methods. We developed a new cost and labor efficient method based on Inverse PCR and long read sequencing for studying mobility potential of environmental resistance genes. We applied Inverse PCR on sediment samples and identified 79 different MGE clusters associated with the studied resistance genes, including novel mobile genetic elements, co-selected resistance genes and a new putative antibiotic resistance gene. The results show that the method can be used in antibiotic resistance early warning systems. In comparison to metagenomics, Inverse PCR was markedly more sensitive and provided more data on resistance gene mobility and co-selected resistances.


September 22, 2019

Recurrent structural variation, clustered sites of selection, and disease risk for the complement factor H (CFH) gene family.

Structural variation and single-nucleotide variation of the complement factor H (CFH) gene family underlie several complex genetic diseases, including age-related macular degeneration (AMD) and atypical hemolytic uremic syndrome (AHUS). To understand its diversity and evolution, we performed high-quality sequencing of this ~360-kbp locus in six primate lineages, including multiple human haplotypes. Comparative sequence analyses reveal two distinct periods of gene duplication leading to the emergence of four CFH-related (CFHR) gene paralogs (CFHR2 and CFHR4 ~25-35 Mya and CFHR1 and CFHR3 ~7-13 Mya). Remarkably, all evolutionary breakpoints share a common ~4.8-kbp segment corresponding to an ancestral CFHR gene promoter that has expanded independently throughout primate evolution. This segment is recurrently reused and juxtaposed with a donor duplication containing exons 8 and 9 from ancestral CFH, creating four CFHR fusion genes that include lineage-specific members of the gene family. Combined analysis of >5,000 AMD cases and controls identifies a significant burden of a rare missense mutation that clusters at the N terminus of CFH [P = 5.81 × 10-8, odds ratio (OR) = 9.8 (3.67-Infinity)]. A bipolar clustering pattern of rare nonsynonymous mutations in patients with AMD (P < 10-3) and AHUS (P = 0.0079) maps to functional domains that show evidence of positive selection during primate evolution. Our structural variation analysis in >2,400 individuals reveals five recurrent rearrangement breakpoints that show variable frequency among AMD cases and controls. These data suggest a dynamic and recurrent pattern of mutation critical to the emergence of new CFHR genes but also in the predisposition to complex human genetic disease phenotypes.


September 22, 2019

Revertant mosaicism repairs skin lesions in a patient with keratitis-ichthyosis-deafness syndrome by second-site mutations in connexin 26.

Revertant mosaicism (RM) is a naturally occurring phenomenon where the pathogenic effect of a germline mutation is corrected by a second somatic event. Development of healthy-looking skin due to RM has been observed in patients with various inherited skin disorders, but not in connexin-related disease. We aimed to clarify the underlying molecular mechanisms of suspected RM in the skin of a patient with keratitis-ichthyosis-deafness (KID) syndrome. The patient was diagnosed with KID syndrome due to characteristic skin lesions, hearing deficiency and keratitis. Investigation of GJB2 encoding connexin (Cx) 26 revealed heterozygosity for the recurrent de novo germline mutation, c.148G?>?A, p.Asp50Asn. At age 20, the patient developed spots of healthy-looking skin that grew in size and number within widespread erythrokeratodermic lesions. Ultra-deep sequencing of two healthy-looking skin biopsies identified five somatic nonsynonymous mutations, independently present in cis with the p.Asp50Asn mutation. Functional studies of Cx26 in HeLa cells revealed co-expression of Cx26-Asp50Asn and wild-type Cx26 in gap junction channel plaques. However, Cx26-Asp50Asn with the second-site mutations identified in the patient displayed no formation of gap junction channel plaques. We argue that the second-site mutations independently inhibit Cx26-Asp50Asn expression in gap junction channels, reverting the dominant negative effect of the p.Asp50Asn mutation. To our knowledge, this is the first time RM has been reported to result in the development of healthy-looking skin in a patient with KID syndrome. © The Author 2017. Published by Oxford University Press.


September 22, 2019

The industrial melanism mutation in British peppered moths is a transposable element.

Discovering the mutational events that fuel adaptation to environmental change remains an important challenge for evolutionary biology. The classroom example of a visible evolutionary response is industrial melanism in the peppered moth (Biston betularia): the replacement, during the Industrial Revolution, of the common pale typica form by a previously unknown black (carbonaria) form, driven by the interaction between bird predation and coal pollution. The carbonaria locus has been coarsely localized to a 200-kilobase region, but the specific identity and nature of the sequence difference controlling the carbonaria-typica polymorphism, and the gene it influences, are unknown. Here we show that the mutation event giving rise to industrial melanism in Britain was the insertion of a large, tandemly repeated, transposable element into the first intron of the gene cortex. Statistical inference based on the distribution of recombined carbonaria haplotypes indicates that this transposition event occurred around 1819, consistent with the historical record. We have begun to dissect the mode of action of the carbonaria transposable element by showing that it increases the abundance of a cortex transcript, the protein product of which plays an important role in cell-cycle regulation, during early wing disc development. Our findings fill a substantial knowledge gap in the iconic example of microevolutionary change, adding a further layer of insight into the mechanism of adaptation in response to natural selection. The discovery that the mutation itself is a transposable element will stimulate further debate about the importance of ‘jumping genes’ as a source of major phenotypic novelty.


September 22, 2019

Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics.

Short read massive parallel sequencing has emerged as a standard diagnostic tool in the medical setting. However, short read technologies have inherent limitations such as GC bias, difficulties mapping to repetitive elements, trouble discriminating paralogous sequences, and difficulties in phasing alleles. Long read single molecule sequencers resolve these obstacles. Moreover, they offer higher consensus accuracies and can detect epigenetic modifications from native DNA. The first commercially available long read single molecule platform was the RS system based on PacBio’s single molecule real-time (SMRT) sequencing technology, which has since evolved into their RSII and Sequel systems. Here we capsulize how SMRT sequencing is revolutionizing constitutional, reproductive, cancer, microbial and viral genetic testing.© The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.


September 22, 2019

Next-generation approaches to advancing eco-immunogenomic research in critically endangered primates.

High-throughput sequencing platforms are generating massive amounts of genomic data from nonmodel species, and these data sets are valuable resources that can be mined to advance a number of research areas. An example is the growing amount of transcriptome data that allow for examination of gene expression in nonmodel species. Here, we show how publicly available transcriptome data from nonmodel primates can be used to design novel research focused on immunogenomics. We mined transcriptome data from the world’s most endangered group of primates, the lemurs of Madagascar, for sequences corresponding to immunoglobulins. Our results confirmed homology between strepsirrhine and haplorrhine primate immunoglobulins and allowed for high-throughput sequencing of expressed antibodies (Ig-seq) in Coquerel’s sifaka (Propithecus coquereli). Using both Pacific Biosciences RS and Ion Torrent PGM sequencing, we performed Ig-seq on two individuals of Coquerel’s sifaka. We generated over 150 000 sequences of expressed antibodies, allowing for molecular characterization of the antigen-binding region. Our analyses suggest that similar VDJ expression patterns exist across all primates, with sequences closely related to the human VH 3 immunoglobulin family being heavily represented in sifaka antibodies. Moreover, the antigen-binding region of sifaka antibodies exhibited similar amino acid variation with respect to haplorrhine primates. Our study represents the first attempt to characterize sequence diversity of the expressed antibody repertoire in a species of lemur. We anticipate that methods similar to ours will provide the framework for investigating the adaptive immune response in wild populations of other nonmodel organisms and can be used to advance the burgeoning field of eco-immunology. © 2014 John Wiley & Sons Ltd.


September 22, 2019

Genome and evolution of the shade-requiring medicinal herb Panax ginseng.

Panax ginseng C. A. Meyer, reputed as the king of medicinal herbs, has slow growth, long generation time, low seed production and complicated genome structure that hamper its study. Here, we unveil the genomic architecture of tetraploid P. ginseng by de novo genome assembly, representing 2.98 Gbp with 59 352 annotated genes. Resequencing data indicated that diploid Panax species diverged in association with global warming in Southern Asia, and two North American species evolved via two intercontinental migrations. Two whole genome duplications (WGD) occurred in the family Araliaceae (including Panax) after divergence with the Apiaceae, the more recent one contributing to the ability of P. ginseng to overwinter, enabling it to spread broadly through the Northern Hemisphere. Functional and evolutionary analyses suggest that production of pharmacologically important dammarane-type ginsenosides originated in Panax and are produced largely in shoot tissues and transported to roots; that newly evolved P. ginseng fatty acid desaturases increase freezing tolerance; and that unprecedented retention of chlorophyll a/b binding protein genes enables efficient photosynthesis under low light. A genome-scale metabolic network provides a holistic view of Panax ginsenoside biosynthesis. This study provides valuable resources for improving medicinal values of ginseng either through genomics-assisted breeding or metabolic engineering.© 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.


September 22, 2019

Single-molecule DNA sequencing of acute myeloid leukemia and myelodysplastic syndromes with multiple TP53 alterations.

Although the frequency of TP53 mutations in hemato- logic malignancies is low, these mutations have a high clinical relevance and are usually associated with poor prognosis. Somatic TP53 mutations have been detected in up to 73.3% of cases of acute myeloid leukemia (AML) with complex karyotype and 18.9% of AML with other unfavorable cytogenetic risk factors. AML with TP53 mutations, and/or chromosomal aneuploidy, has been defined as a distinct AML subtype. In low-risk myelodysplastic syndromes (MDS), TP53 mutations occur at an early disease stage and predict disease progression. TP53 mutation diagnosis is now part of the revised European LeukemiaNet (ELN) guidelines.


September 22, 2019

Next generation multilocus sequence typing (NGMLST) and the analytical software program MLSTEZ enable efficient, cost-effective, high-throughput, multilocus sequencing typing.

Multilocus sequence typing (MLST) has become the preferred method for genotyping many biological species, and it is especially useful for analyzing haploid eukaryotes. MLST is rigorous, reproducible, and informative, and MLST genotyping has been shown to identify major phylogenetic clades, molecular groups, or subpopulations of a species, as well as individual strains or clones. MLST molecular types often correlate with important phenotypes. Conventional MLST involves the extraction of genomic DNA and the amplification by PCR of several conserved, unlinked gene sequences from a sample of isolates of the taxon under investigation. In some cases, as few as three loci are sufficient to yield definitive results. The amplicons are sequenced, aligned, and compared by phylogenetic methods to distinguish statistically significant differences among individuals and clades. Although MLST is simpler, faster, and less expensive than whole genome sequencing, it is more costly and time-consuming than less reliable genotyping methods (e.g. amplified fragment length polymorphisms). Here, we describe a new MLST method that uses next-generation sequencing, a multiplexing protocol, and appropriate analytical software to provide accurate, rapid, and economical MLST genotyping of 96 or more isolates in single assay. We demonstrate this methodology by genotyping isolates of the well-characterized, human pathogenic yeast Cryptococcus neoformans. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.


September 22, 2019

Extensive allele-specific translational regulation in hybrid mice.

Translational regulation is mediated through the interaction between diffusible trans-factors and cis-elements residing within mRNA transcripts. In contrast to extensively studied transcriptional regulation, cis-regulation on translation remains underexplored. Using deep sequencing-based transcriptome and polysome profiling, we globally profiled allele-specific translational efficiency for the first time in an F1 hybrid mouse. Out of 7,156 genes with reliable quantification of both alleles, we found 1,008 (14.1%) exhibiting significant allelic divergence in translational efficiency. Systematic analysis of sequence features of the genes with biased allelic translation revealed that local RNA secondary structure surrounding the start codon and proximal out-of-frame upstream AUGs could affect translational efficiency. Finally, we observed that the cis-effect was quantitatively comparable between transcriptional and translational regulation. Such effects in the two regulatory processes were more frequently compensatory, suggesting that the regulation at the two levels could be coordinated in maintaining robustness of protein expression. © 2015 The Authors. Published under the terms of the CC BY 4.0 license.


September 22, 2019

No assembly required: Full-length MHC class I allele discovery by PacBio circular consensus sequencing.

Single-molecule real-time (SMRT) sequencing technology with the Pacific Biosciences (PacBio) RS II platform offers the potential to obtain full-length coding regions (~1100-bp) from MHC class I cDNAs. Despite the relatively high error rate associated with SMRT technology, high quality sequences can be obtained by circular consensus sequencing (CCS) due to the random nature of the error profile. In the present study we first validated the ability of SMRT-CCS to accurately identify class I transcripts in Mauritian-origin cynomolgus macaques (Macaca fascicularis) that have been characterized previously by cloning and Sanger-based sequencing as well as pyrosequencing approaches. We then applied this SMRT-CCS method to characterize 60 novel full-length class I transcript sequences expressed by a cohort of cynomolgus macaques from China. The SMRT-CCS method described here provides a straightforward protocol for characterization of unfragmented single-molecule cDNA transcripts that will potentially revolutionize MHC class I allele discovery in nonhuman primates and other species. Published by Elsevier Inc.


September 22, 2019

A protein-truncating HSD17B13 variant and protection from chronic liver disease.

Elucidation of the genetic factors underlying chronic liver disease may reveal new therapeutic targets.We used exome sequence data and electronic health records from 46,544 participants in the DiscovEHR human genetics study to identify genetic variants associated with serum levels of alanine aminotransferase (ALT) and aspartate aminotransferase (AST). Variants that were replicated in three additional cohorts (12,527 persons) were evaluated for association with clinical diagnoses of chronic liver disease in DiscovEHR study participants and two independent cohorts (total of 37,173 persons) and with histopathological severity of liver disease in 2391 human liver samples.A splice variant (rs72613567:TA) in HSD17B13, encoding the hepatic lipid droplet protein hydroxysteroid 17-beta dehydrogenase 13, was associated with reduced levels of ALT (P=4.2×10-12) and AST (P=6.2×10-10). Among DiscovEHR study participants, this variant was associated with a reduced risk of alcoholic liver disease (by 42% [95% confidence interval CI, 20 to 58] among heterozygotes and by 53% [95% CI, 3 to 77] among homozygotes), nonalcoholic liver disease (by 17% [95% CI, 8 to 25] among heterozygotes and by 30% [95% CI, 13 to 43] among homozygotes), alcoholic cirrhosis (by 42% [95% CI, 14 to 61] among heterozygotes and by 73% [95% CI, 15 to 91] among homozygotes), and nonalcoholic cirrhosis (by 26% [95% CI, 7 to 40] among heterozygotes and by 49% [95% CI, 15 to 69] among homozygotes). Associations were confirmed in two independent cohorts. The rs72613567:TA variant was associated with a reduced risk of nonalcoholic steatohepatitis, but not steatosis, in human liver samples. The rs72613567:TA variant mitigated liver injury associated with the risk-increasing PNPLA3 p.I148M allele and resulted in an unstable and truncated protein with reduced enzymatic activity.A loss-of-function variant in HSD17B13 was associated with a reduced risk of chronic liver disease and of progression from steatosis to steatohepatitis. (Funded by Regeneron Pharmaceuticals and others.).


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.