September 22, 2019  |  

Transcriptional fates of human-specific segmental duplications in brain.

Despite the importance of duplicate genes for evolutionary adaptation, accurate gene annotation is often incomplete, incorrect, or lacking in regions of segmental duplication. We developed an approach combining long-read sequencing and hybridization capture to yield full-length transcript information and confidently distinguish between nearly identical genes/paralogs. We used biotinylated probes to enrich for full-length cDNA from duplicated regions, which were then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology, permitting us to distinguish between highly identical genes by virtue of multiple paralogous sequence variants. We examined 19 gene families as expressed in developing and adult human brain, selected for their high sequence identity (average >99%) and overlap with human-specific segmental duplications (SDs). We characterized the transcriptional differences between related paralogs to better understand the birth-death process of duplicate genes and particularly how the process leads to gene innovation. In 48% of the cases, we find that the expressed duplicates have changed substantially from their ancestral models due to novel sites of transcription initiation, splicing, and polyadenylation, as well as fusion transcripts that connect duplication-derived exons with neighboring genes. We detect unannotated open reading frames in genes currently annotated as pseudogenes, while relegating other duplicates to nonfunctional status. Our method significantly improves gene annotation, specifically defining full-length transcripts, isoforms, and open reading frames for new genes in highly identical SDs. The approach will be more broadly applicable to genes in structurally complex regions of other genomes where the duplication process creates novel genes important for adaptive traits. © 2018 Dougherty et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019  |  

Human and rhesus macaque KIR haplotypes defined by their transcriptomes.

The killer-cell Ig-like receptors (KIRs) play a central role in immune recognition in infection, pregnancy, and transplantation through their interactions with MHC class I molecules. KIR genes display abundant copy number variation as well as high levels of polymorphism. As a result, it is challenging to characterize this structurally dynamic region. KIR haplotypes have been analyzed in different species using conventional characterization methods, such as Sanger sequencing and Roche/454 pyrosequencing. However, these methods are time-consuming and often fail to define complete haplotypes or do not reach allele-level resolution. In addition, most analyses were performed on genomic DNA and thus lacked substantial information about transcription and its corresponding modifications. In this paper, we present a single-molecule real-time sequencing approach, using the Pacific Biosciences Sequel platform, to characterize the KIR transcriptomes in human and rhesus macaque (Macaca mulatta) families. This high-resolution approach allowed the identification of novel Mamu-KIR alleles, the extension of reported allele sequences, and the determination of human and macaque KIR haplotypes. In addition, multiple recombinant KIR genes were discovered, all located on contracted haplotypes, which were likely the result of chromosomal rearrangements. The relatively high number of contracted haplotypes discovered might be indicative of selection on small KIR repertoires and/or novel fusion gene products. This next-generation method provides an improved high-resolution characterization of the KIR cluster in humans and macaques, which eventually may aid in a better understanding and interpretation of KIR allele-associated diseases, as well as the immune response in transplantation and reproduction. Copyright © 2018 by The American Association of Immunologists, Inc.


September 22, 2019  |  

Long non-coding RNA identification: comparing machine learning based tools for long non-coding transcripts discrimination

Long noncoding RNAs (lncRNAs) are noncoding RNAs longer than 200 nucleotides that have attracted considerable interest in recent years. Many studies have confirmed that the human genome contains many thousands of lncRNAs, which exert great influence over some critical regulators of cellular processes. With the advent of high-throughput sequencing technologies, a large quantity of sequences awaits analysis. Consequently, many programs have been developed to distinguish coding from long noncoding transcripts. Different programs are generally designed for different circumstances, so it is sensible and practical to select a method appropriate to the situation at hand. In this review, several popular methods and their advantages, disadvantages, and application scopes are summarised to help readers choose a suitable method and obtain more reliable results.


September 22, 2019  |  

The full transcription map of mouse papillomavirus type 1 (MmuPV1) in mouse wart tissues.

Mouse papillomavirus type 1 (MmuPV1) provides, for the first time, the opportunity to study infection and pathogenesis of papillomaviruses in the context of laboratory mice. In this report, we define the transcriptome of the MmuPV1 genome present in papillomas arising in experimentally infected mice using a combination of RNA-seq, PacBio Iso-seq, 5′ RACE, 3′ RACE, primer-walking RT-PCR, RNase protection, Northern blot and in situ hybridization analyses. We demonstrate that the MmuPV1 genome is transcribed unidirectionally from five major promoters (P) or transcription start sites (TSS) and polyadenylates its transcripts at two major polyadenylation (pA) sites. We designate P7503, P360 and P859 as “early” promoters because they give rise to transcripts mostly utilizing the polyadenylation signal at nt 3844 and therefore can only encode early genes, and P7107 and P533 as “late” promoters because they give rise to transcripts utilizing polyadenylation signals at either nt 3844 or nt 7047, the latter being able to encode late, capsid proteins. The MmuPV1 genome contains five splice donor sites and three acceptor sites that produce thirty-six RNA isoforms deduced to express seven predicted early gene products (E6, E7, E1, E1^M1, E1^M2, E2 and E8^E2) and three predicted late gene products (E1^E4, L2 and L1). The majority of the viral early transcripts are spliced once from nt 757 to 3139, while viral late transcripts, which are predicted to encode L1, are spliced twice, first from nt 7243 to either nt 3139 (P7107) or nt 757 to 3139 (P533) and second from nt 3431 to nt 5372. Thirteen of these viral transcripts were detectable by Northern blot analysis, with the P533-derived late E1^E4 transcripts being the most abundant. The late transcripts could be detected in highly differentiated keratinocytes of MmuPV1-infected tissues as early as ten days after MmuPV1 inoculation and correlated with detection of L1 protein and viral DNA amplification. In mature warts, L1 was also detected in more poorly differentiated cells, as previously reported. Subclinical infections were also observed. The comprehensive transcription map of MmuPV1 generated in this study provides further evidence that MmuPV1 is similar to high-risk cutaneous beta human papillomaviruses. The knowledge revealed will facilitate the use of MmuPV1 as an animal virus model for understanding human papillomavirus gene expression, pathogenesis and immunology.


September 22, 2019  |  

Single-molecule long-read sequencing facilitates shrimp transcriptome research.

Although shrimp are of great economic importance, few full-length shrimp transcriptomes are available. Here, we used Pacific Biosciences single-molecule real-time (SMRT) long-read sequencing technology to generate transcripts from the Pacific white shrimp (Litopenaeus vannamei). We obtained 322,600 full-length non-chimeric reads, from which we generated 51,367 high-quality unique full-length transcripts. We corrected errors in the SMRT sequences by comparison with Illumina-produced short reads. We successfully annotated 81.72% of all unique SMRT transcripts against the NCBI non-redundant database, 58.63% against Swiss-Prot, 45.38% against Gene Ontology, 32.57% against Clusters of Orthologous Groups of proteins (COG), and 47.83% against Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Across all transcripts, we identified 3,958 long non-coding RNAs (lncRNAs) and 80,650 simple sequence repeats (SSRs). Our study provides a rich set of full-length cDNA sequences for L. vannamei, which will greatly facilitate shrimp transcriptome research.


September 22, 2019  |  

Microbiome and infectivity studies reveal complex polyspecies tree disease in Acute Oak Decline.

Decline-diseases are complex and becoming increasingly problematic to tree health globally. Acute Oak Decline (AOD) is characterized by necrotic stem lesions and galleries of the bark-boring beetle, Agrilus biguttatus, and represents a serious threat to oak. Although multiple novel bacterial species and Agrilus galleries are associated with AOD lesions, the causative agent(s) are unknown. The AOD pathosystem therefore provides an ideal model for a systems-based research approach to address our hypothesis that AOD lesions are caused by a polymicrobial complex. Here we show that three bacterial species, Brenneria goodwinii, Gibbsiella quercinecans and Rahnella victoriana, are consistently abundant in the lesion microbiome and possess virulence genes used by canonical phytopathogens that are expressed in AOD lesions. Individual and polyspecies inoculations on oak logs and trees demonstrated that B. goodwinii and G. quercinecans cause tissue necrosis and, in combination with A. biguttatus, produce the diagnostic symptoms of AOD. We have proved a polybacterial cause of AOD lesions, providing new insights into polymicrobial interactions and tree disease. This work presents a novel conceptual and methodological template for adapting Koch’s postulates to address the role of microbial communities in disease.


September 22, 2019  |  

Insights into the evolution of host association through the isolation and characterization of a novel human periodontal pathobiont, Desulfobulbus oralis.

The human oral microbiota encompasses representatives of many bacterial lineages that have not yet been cultured. Here we describe the isolation and characterization of previously uncultured Desulfobulbus oralis, the first human-associated representative of its genus. As mammalian-associated microbes rarely have free-living close relatives, D. oralis provides opportunities to study how bacteria adapt and evolve within a host. This sulfate-reducing deltaproteobacterium has adapted to the human oral subgingival niche by curtailing its physiological repertoire, losing some biosynthetic abilities and metabolic independence, and by dramatically reducing environmental sensing and signaling capabilities. The genes that enable free-living Desulfobulbus to synthesize the potent neurotoxin methylmercury were also lost by D. oralis, a notably positive outcome of host association. However, horizontal gene acquisitions from other members of the microbiota provided novel mechanisms of interaction with the human host, including toxins like leukotoxin and hemolysins. Proteomic and transcriptomic analysis revealed that most of those factors are actively expressed, including in the subgingival environment, and some are secreted. Similar to other known oral pathobionts, D. oralis can trigger a proinflammatory response in oral epithelial cells, suggesting a direct role in the development of periodontal disease.

IMPORTANCE: Animal-associated microbiota likely assembled as a result of numerous independent colonization events by free-living microbes followed by coevolution with their host and other microbes. Through specific adaptation to various body sites and physiological niches, microbes have a wide range of contributions, from beneficial to disease causing. Desulfobulbus oralis provides insights into genomic and physiological transformations associated with transition from an open environment to a host-dependent lifestyle and the emergence of pathogenicity. Through a multifaceted mechanism triggering a proinflammatory response, D. oralis is a novel periodontal pathobiont. Even though culture-independent approaches can provide insights into the potential role of the human microbiome “dark matter,” cultivation and experimental characterization remain important to studying the roles of individual organisms in health and disease.


September 22, 2019  |  

Genomic analysis of oral Campylobacter concisus strains identified a potential bacterial molecular marker associated with active Crohn’s disease.

Campylobacter concisus is an oral bacterium that is associated with inflammatory bowel disease (IBD) including Crohn’s disease (CD) and ulcerative colitis (UC). C. concisus consists of two genomospecies (GS) and diverse strains. This study aimed to identify molecular markers to differentiate commensal and IBD-associated C. concisus strains. The genomes of 63 oral C. concisus strains isolated from patients with IBD and healthy controls were examined, of which 38 genomes were sequenced in this study. We identified a novel secreted enterotoxin B homologue, Csep1. The csep1 gene was found in 56% of GS2 C. concisus strains, present in the plasmid pICON or the chromosome. A six-nucleotide insertion at position 654-659 bp in csep1 (csep1-6bpi) was found. The presence of csep1-6bpi in oral C. concisus strains isolated from patients with active CD (47%, 7/15) was significantly higher than that in strains from healthy controls (0/29, P = 0.0002), and the prevalence of csep1-6bpi positive C. concisus strains was significantly higher in patients with active CD (67%, 4/6) as compared to healthy controls (0/23, P = 0.0006). Proteomics analysis detected the Csep1 protein. A csep1 gene hot spot in the chromosome of different C. concisus strains was found. The pICON plasmid was only found in GS2 strains isolated from the two relapsed CD patients with small bowel complications. This study reports a C. concisus molecular marker (csep1-6bpi) that is associated with active CD.
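
The abstract does not name the statistical test behind its two P values. As a rough check, and assuming a two-sided Fisher's exact test on the 2×2 counts (an assumption, not something stated in the source), the reported values can be approximately reproduced with the Python standard library. The function name `fisher_exact_two_sided` is illustrative only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. active CD, healthy controls); columns are
    marker-positive and marker-negative counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c                      # total marker-positive
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x positives falling in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Strain-level counts from the abstract: 7/15 active CD vs 0/29 controls.
p_strains = fisher_exact_two_sided(7, 8, 0, 29)
# Patient-level counts from the abstract: 4/6 active CD vs 0/23 controls.
p_patients = fisher_exact_two_sided(4, 2, 0, 23)

print(f"{p_strains:.4f}")   # 0.0002
print(f"{p_patients:.4f}")  # 0.0006
```

Under this assumption both results round to the figures quoted in the abstract, which suggests, but does not confirm, that an exact test of this kind was used.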


September 22, 2019  |  

Comparative genomics of Campylobacter concisus: Analysis of clinical strains reveals genome diversity and pathogenic potential.

In recent years, an increasing number of Campylobacter species have been associated with human gastrointestinal (GI) diseases including gastroenteritis, inflammatory bowel disease, and colorectal cancer. Campylobacter concisus, an oral commensal historically linked to gingivitis and periodontitis, has been increasingly detected in the lower GI tract. In the present study, we generated robust genome sequence data from C. concisus strains and undertook a comprehensive pangenome assessment to identify C. concisus virulence properties and to explain potential adaptations acquired while residing in specific ecological niche(s) of the GI tract. Genomes of 53 new C. concisus strains were sequenced, assembled, and annotated including 36 strains from gastroenteritis patients, 13 strains from Crohn’s disease patients and four strains from colitis patients (three collagenous colitis and one lymphocytic colitis). When compared with previously published sequences, strains clustered into two main groups/genomospecies (GS) with phylogenetic clustering explained neither by disease phenotype nor sample location. Paired oral/faecal isolates from the same patient indicated that there are few genetic differences between oral and gut isolates, which suggests that gut isolates most likely reflect oral strain relocation. Type IV and VI secretion system genes, known to be important for pathogenicity in the Campylobacter genus, were present in the genome assemblies, with 82% containing Type VI secretion system genes. Our findings indicate that C. concisus strains are genetically diverse, and the variability in bacterial secretion system content may play an important role in their virulence potential.


September 22, 2019  |  

High-Resolution Full-Length HLA Typing Method Using Third Generation (Pac-Bio SMRT) Sequencing Technology.

The human HLA genes are among the most polymorphic genes in the human genome. Therefore, it is very difficult to find two unrelated individuals with identical HLA molecules. As a result, HLA class I and class II genes are routinely sequenced or serotyped for organ transplantation, autoimmune disease-association studies, drug hypersensitivity research, and other applications. However, these methods yield only two- or four-digit data, which are not sufficient to resolve complete haplotypes of HLA genes. To overcome these limitations, we describe here an end-to-end workflow for sequencing HLA class I and class II genes using third-generation SMRT sequencing technology. This method produces fully phased, unambiguous, allele-level information on the PacBio System.


September 22, 2019  |  

Impact of index hopping and bias towards the reference allele on accuracy of genotype calls from low-coverage sequencing.

Inherent sources of error and bias that affect the quality of sequence data include index hopping and bias towards the reference allele. The impact of these artefacts is likely greater for low-coverage data than for high-coverage data because low-coverage data has scant information and many standard tools for processing sequence data were designed for high-coverage data. With the proliferation of cost-effective low-coverage sequencing, there is a need to understand the impact of these errors and bias on resulting genotype calls from low-coverage sequencing. We used a dataset of 26 pigs sequenced both at 2× with multiplexing and at 30× without multiplexing to show that index hopping and bias towards the reference allele due to alignment had little impact on genotype calls. However, pruning of alternative haplotypes supported by a number of reads below a predefined threshold, which is a default and desired step of some variant callers for removing potential sequencing errors in high-coverage data, introduced an unexpected bias towards the reference allele when applied to low-coverage sequence data. This bias reduced best-guess genotype concordance of low-coverage sequence data by 19.0 absolute percentage points. We propose a simple pipeline to correct the preferential bias towards the reference allele that can occur during variant discovery and we recommend that users of low-coverage sequence data be wary of unexpected biases that may be produced by bioinformatic tools that were designed for high-coverage sequence data.
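
To illustrate why read-count pruning can favour the reference allele at low coverage, the sketch below considers a single heterozygous site in one animal and asks how often the alternative allele would be seen in fewer reads than a pruning threshold. It assumes each read carries the alternative allele with probability 0.5 and uses a threshold of 2 reads; the threshold, the fixed depths, and the function name `prob_alt_pruned` are illustrative assumptions, not parameters or code from the study, and real variant callers apply pruning in more elaborate ways.

```python
from math import comb


def prob_alt_pruned(depth, min_reads, p_alt=0.5):
    """Probability that fewer than `min_reads` of `depth` reads carry the
    alternative allele, under a simple binomial sampling model."""
    return sum(
        comb(depth, k) * p_alt**k * (1 - p_alt) ** (depth - k)
        for k in range(min_reads)
    )


# Hypothetical threshold of 2 supporting reads, at the two depths in the abstract.
for depth in (2, 30):
    print(depth, prob_alt_pruned(depth, min_reads=2))
```

Under these assumptions the alternative allele falls below the threshold in three of four heterozygous sites at a depth of 2, but almost never at a depth of 30, which is consistent with the direction of the bias the abstract describes.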

