April 21, 2020

Investigating the role of exudates in recruiting Streptomyces bacteria to the Arabidopsis thaliana root microbiome

Arabidopsis thaliana has a diverse but consistent root microbiome, recruited in part by the release of fixed carbon in root exudates. Here we focussed on the recruitment of Streptomyces bacteria, which are well-established plant-growth-promoting rhizobacteria and which have been proposed to be recruited to A. thaliana roots by the release of salicylic acid. We generated high-quality genome sequences for eight Streptomyces endophyte strains and showed that although some strains do enhance plant growth, they are not attracted to, and do not feed on, salicylic acid. We used ¹³CO₂ DNA-stable isotope probing to determine which bacteria are fed by the plants in the rhizo- and endosphere and found that streptomycetes did not feed on root exudates in vivo, despite the fact that they can use exudates as sole carbon and nitrogen sources in vitro. We confirmed increased root colonisation by streptomycetes in plants that constitutively produce salicylic acid, but these plants exhibited a pleiotropic phenotype of early senescence and weak growth. We propose that streptomycetes are attracted to the rhizosphere by root exudates but can be outcompeted for this food source by more abundant proteobacteria, and most likely feed on unlabelled complex organic matter.


April 21, 2020

Long-Read RNA Sequencing Identifies Alternative Splice Variants in Hepatocellular Carcinoma and Tumor-Specific Isoforms.

Alternative splicing (AS) allows generation of cell type-specific mRNA transcripts and contributes to hallmarks of cancer. Genome-wide analysis of AS in human hepatocellular carcinoma (HCC), however, is limited. We sought to obtain a comprehensive AS landscape in HCC and define tumor-associated variants. Single-molecule real-time long-read RNA sequencing was performed on patient-derived HCC cells, and the presence of splice junctions was defined by the SpliceMap-LSC-IDP algorithm. We obtained an all-inclusive map of annotated AS variants and further discovered 362 alternatively spliced variants that have not been previously reported in any database (neither RefSeq nor GENCODE). They were mostly derived from intron retention and early termination codons, with an in-frame open reading frame in 81.5%. We corroborated many of these predicted unannotated and annotated variants as tumor specific in an independent cohort of primary HCC tumors and matching nontumoral liver. Using combined Sanger sequencing and TaqMan junction assays, we determined unique and common expression of spliced variants in HCC tumors, including enzyme regulators (ARHGEF2, SERPINH1), chromatin modifiers (DEK, CDK9, RBBP7), RNA-binding proteins (SRSF3, RBM27, MATR3, YBX1), and receptors (ADRM1, CD44v8-10, vitamin D receptor, ROR1). We further focused functional investigations on ARHGEF2 variants (v1 and v3) that arise from chr1q22, a site commonly amplified in HCC. Their biological significance underscores two major cancer hallmarks, namely cancer stemness and epithelial-to-mesenchymal transition-mediated cell invasion and migration, although v3 is consistently more potent than v1. Conclusion: Alternative isoforms and tumor-specific isoforms that arise from aberrant splicing are common during liver tumorigenesis. Our results highlight insights gained from the analysis of AS in HCC. © 2019 The Authors. Hepatology published by Wiley Periodicals, Inc., on behalf of American Association for the Study of Liver Diseases.
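Long-read isoform studies like this one start from spliced alignments of reads to the genome. As a minimal sketch of that first step (not the SpliceMap-LSC-IDP algorithm the authors used), the Python below reads intron coordinates off a SAM-style CIGAR string, where the `N` operation marks a skipped reference region. The function name `splice_junctions` is hypothetical.

```python
import re

# CIGAR operations that advance along the reference (per the SAM specification).
REF_CONSUMING_OPS = set("MDN=X")


def splice_junctions(pos, cigar):
    """Return introns implied by 'N' operations as 1-based inclusive (start, end) pairs.

    pos is the 1-based leftmost mapping position (the SAM POS field);
    cigar is the alignment's CIGAR string, e.g. "50M200N30M".
    """
    junctions = []
    ref = pos  # reference coordinate of the next aligned base
    for length_str, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length_str)
        if op == "N":
            # Skipped reference region: in spliced alignments this is an intron.
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING_OPS:
            ref += length
    return junctions


# Usage: one read spanning two introns, with soft clipping and an insertion.
print(splice_junctions(1000, "5S50M200N30M10I20M300N40M"))  # [(1050, 1249), (1300, 1599)]
```

Soft clips (`S`) and insertions (`I`) are skipped when advancing the reference coordinate, which is why the second intron starts at 1300 rather than 1310.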


April 21, 2020

A chromosome-level genome of black rockfish, Sebastes schlegelii, provides insights into the evolution of live birth.

Black rockfish (Sebastes schlegelii) is a teleost species in which eggs are fertilized internally and retained in the maternal reproductive system, where they develop until live birth (termed viviparity). In the present study, we report a chromosome-level black rockfish genome assembly. High-throughput sequencing analyses (RNA-seq and ATAC-seq), coupled with in situ hybridization (ISH) and immunofluorescence, identified several candidate genes for maternal preparation, sperm storage and release, and hatching. We propose that zona pellucida (ZP) proteins retain sperm at the oocyte envelope, while genes in two distinct astacin metalloproteinase subfamilies serve to release sperm from the ZP and free the embryo from the chorion at the pre-hatching stage. Finally, we present a model of black rockfish reproduction and propose that the rockfish ovarian wall has a function similar to that of the mammalian uterus. Taken together, these genomic data provide unprecedented insights into the evolution of an unusual teleost life history strategy, a sound foundation for studying viviparity in non-mammalian vertebrates, and an invaluable resource for rockfish ecological and evolutionary research. This article is protected by copyright. All rights reserved.


April 21, 2020

Haplotype-phased genome assembly of virulent Phytophthora ramorum isolate ND886 facilitated by long-read sequencing reveals effector polymorphisms and copy number variation.

Phytophthora ramorum is a destructive pathogen that causes Sudden Oak Death. The genome sequence of P. ramorum isolate Pr102 was previously produced using Sanger reads and contained 12 Mb of gaps. However, isolate Pr102 had shown reduced aggressiveness and genome abnormalities. To produce an improved genome assembly for P. ramorum, we performed long-read sequencing of the highly aggressive P. ramorum isolate CDFA1418886 (abbreviated as ND886). We generated a 60.5 Mb assembly of the ND886 genome using the Pacific Biosciences sequencing platform. The assembly includes 302 primary contigs (60.2 Mb) and 9 unplaced contigs (265 kb). Additionally, we found a “Highly repetitive” component in the unassembled, unmapped PacBio reads, containing tandem repeats that are not part of the 60.5 Mb genome. The overall repeat content of the primary assembly was much higher than that of the Pr102 Sanger version (48% vs. 29%), indicating that the long reads captured repetitive regions effectively. The 302 primary contigs were phased into 345 haplotype blocks containing 222,892 phased variants; the longest phased block was 1,513,201 bp, with 7,265 phased variants. The improved phased assembly facilitated the identification of 21 and 25 Crinkler effectors and 393 and 394 RXLR effector genes from the two haplotypes. Of these, 24 and 25 RXLR effectors were newly predicted from Haplotype A and Haplotype B, respectively. In addition, 7 previously unreported paralogs of effector Avh207 were found in contig 54. Comparison of the ND886 assembly with the Pr102 V1 assembly suggests that several repeat-rich smaller scaffolds within the Pr102 V1 assembly were possibly misassembled; these regions are now fully encompassed in ND886 contigs. Our analysis further reveals that Pr102 is a heterokaryon with multiple nuclear types in the sequences corresponding to contig 10 of the ND886 assembly.
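Repeat-content figures such as 48% vs. 29% are often computed as the share of assembled bases that a repeat annotation has soft-masked (written in lowercase). The sketch below assumes that convention, which the abstract does not specify; `repeat_fraction` is a hypothetical helper, and gap characters (`N`) are left out of the denominator.

```python
def repeat_fraction(contigs):
    """Fraction of non-gap bases that are soft-masked (lowercase).

    Assumes repeats were soft-masked upstream (repeat bases written in
    lowercase); gap bases (N or n) are excluded from both counts.
    """
    masked = 0
    total = 0
    for seq in contigs:
        for base in seq:
            if base in "Nn":
                continue  # assembly gap, not sequence
            total += 1
            if base.islower():
                masked += 1
    return masked / total if total else 0.0


# Usage: toy contigs; 6 of the 12 non-gap bases are masked.
print(repeat_fraction(["ACGTacgt", "NNacAC"]))  # 0.5
```

Excluding gaps matters when comparing assemblies: a gappy Sanger assembly would otherwise look less repetitive simply because its unresolved repeats are represented as runs of `N`.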


April 21, 2020

The radish genome database (RadishGD): an integrated information resource for radish genomics.

Radish (Raphanus sativus L.) is an important root vegetable crop in the family Brassicaceae, which provides diverse nutrients for human health and is closely related to the Brassica crop species. Recently, we sequenced and assembled the radish genome into nine chromosome pseudomolecules. In addition, we developed diverse genomic resources, including genetic maps, molecular markers, transcriptome, genome-wide methylation and variome data. In this study, we describe the radish genome database (RadishGD), including details of data sets that we generated and the web interface that allows access to these data. RadishGD comprises six major units that enable researchers and general users to search, browse and analyze the radish genomic data in an integrated manner. The Search unit provides gene structures and sequences for gene models through keyword or BLAST searches. The Genome browser displays graphic representations of gene models, mRNAs, repetitive sequences, genome-wide methylation and variomes among various genotypes. The Functional annotation unit offers gene ontology, plant ontology, pathway and gene family information for gene models. The Genetic map unit provides information about markers and their genetic locations using two types of genetic maps. The Expression unit presents transcriptional characteristics and methylation levels for each gene in 18 tissues. All sequence data incorporated into RadishGD can be downloaded from the Data resources unit. RadishGD will be continually updated to serve as a community resource for radish genomics and breeding research.


April 21, 2020

Lateral transfers of large DNA fragments spread functional genes among grasses.

A fundamental tenet of multicellular eukaryotic evolution is that vertical inheritance is paramount, with natural selection acting on genetic variants transferred from parents to offspring. This lineal process means that an organism’s adaptive potential can be restricted by its evolutionary history, the amount of standing genetic variation, and its mutation rate. Lateral gene transfer (LGT) theoretically provides a mechanism to bypass many of these limitations, but the evolutionary importance and frequency of this process in multicellular eukaryotes, such as plants, remains debated. We address this issue by assembling a chromosome-level genome for the grass Alloteropsis semialata, a species surmised to exhibit two LGTs, and screen it for other grass-to-grass LGTs using genomic data from 146 other grass species. Through stringent phylogenomic analyses, we discovered 57 additional LGTs in the A. semialata nuclear genome, involving at least nine different donor species. The LGTs are clustered in 23 laterally acquired genomic fragments that are up to 170 kb long and have accumulated during the diversification of Alloteropsis. The majority of the 59 LGTs in A. semialata are expressed, and we show that they have added functions to the recipient genome. Functional LGTs were further detected in the genomes of five other grass species, demonstrating that this process is likely widespread in this globally important group of plants. LGT therefore appears to represent a potent evolutionary force capable of spreading functional genes among distantly related grass species. Copyright © 2019 the Author(s). Published by PNAS.


April 21, 2020

High satellite repeat turnover in great apes studied with short- and long-read technologies.

Satellite repeats are a structural component of centromeres and telomeres, and in some instances their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50 bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These genomes were dominated by a handful of abundant repeated motifs, frequently shared among species, which formed two groups: (1) the (AATGG)n repeat (critical for heat shock response) and its derivatives; and (2) subtelomeric 32-mers involved in telomeric metabolism. Using the densities of abundant repeats, individuals could be classified into species. However, clustering did not reproduce the accepted species phylogeny, suggesting rapid repeat evolution. Several abundant repeats were enriched in males vs. females; using Y chromosome assemblies or Fluorescent In Situ Hybridization, we validated their location on the Y. Finally, applying a novel computational tool, we identified many satellite repeats completely embedded within long Oxford Nanopore and Pacific Biosciences reads. Such repeats were up to 59 kb in length and consisted of perfect repeats interspersed with other similar sequences. Our results, based on sequencing reads generated with three different technologies, provide the first detailed characterization of great ape satellite repeats and open new avenues for exploring their functions. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
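A simplified way to see what a "density of satellite repeats" measures: count the bases that sit in perfect tandem runs of a known motif such as AATGG and normalize per megabase of read sequence. This is only an illustrative sketch, not the study's novel computational tool; the helper names and the three-copy minimum are assumptions. It matches the motif only in the phase and orientation given, so the reverse complement (CCATT) would need a separate call and partial units at run edges are not counted.

```python
def tandem_run_bases(seq, motif, min_copies=3):
    """Count bases of seq inside perfect tandem runs of motif with >= min_copies units."""
    if not motif:
        raise ValueError("motif must be non-empty")
    k = len(motif)
    covered = 0
    i = 0
    while i <= len(seq) - k:
        copies = 0
        j = i
        while seq[j:j + k] == motif:  # extend the run one unit at a time
            copies += 1
            j += k
        if copies >= min_copies:
            covered += copies * k
            i = j  # jump past the run so it is not counted twice
        else:
            i += 1
    return covered


def satellite_density(reads, motif, min_copies=3):
    """Bases in perfect (motif)n runs per megabase of read sequence."""
    total = sum(len(r) for r in reads)
    if total == 0:
        return 0.0
    covered = sum(tandem_run_bases(r.upper(), motif.upper(), min_copies) for r in reads)
    return covered * 1_000_000 / total


# Usage: 15 of 20 bases lie in a three-copy (AATGG)n run.
print(satellite_density(["AATGGAATGGAATGGCCCCC"], "AATGG"))  # 750000.0
```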


April 21, 2020

BjuWRR1, a CC-NB-LRR gene identified in Brassica juncea, confers resistance to white rust caused by Albugo candida.

BjuWRR1, a CNL-type R gene, was identified from an east European gene pool line of Brassica juncea and validated for conferring resistance to white rust by genetic transformation. White rust, caused by the oomycete pathogen Albugo candida, is a significant disease of crucifer crops including Brassica juncea (mustard), a major oilseed crop of the Indian subcontinent. Earlier, a resistance-conferring locus named AcB1-A5.1 was mapped in Donskaja-IV, an east European gene pool line of B. juncea. This line was tested along with some other lines of B. juncea (AABB), B. rapa (AA) and B. nigra (BB) for resistance to six isolates of A. candida collected from different mustard-growing regions of India. Donskaja-IV was found to be completely resistant to all the tested isolates. Sequencing of a BAC spanning the locus AcB1-A5.1 showed the presence of a single R gene encoding a CC-NB-LRR protein. The genomic sequence of the putative R gene, with its native promoter and terminator, was used to transform the susceptible Indian gene pool line Varuna and was found to confer complete resistance to all the isolates. This is the first white rust resistance-conferring gene described from Brassica species and has been named BjuWRR1. Allelic variants of the gene in B. juncea germplasm and orthologues in the Brassicaceae genomes were studied to understand the evolutionary dynamics of the BjuWRR1 gene.


April 21, 2020

CRISPR/Cas9-targeted enrichment and long-read sequencing of the Fuchs endothelial corneal dystrophy-associated TCF4 triplet repeat.

This study aimed to demonstrate the utility of an amplification-free long-read sequencing method to characterize the Fuchs endothelial corneal dystrophy (FECD)-associated intronic TCF4 triplet repeat (CTG18.1). We applied an amplification-free method, utilizing the CRISPR/Cas9 system, in combination with PacBio single-molecule real-time (SMRT) long-read sequencing, to study CTG18.1. FECD patient samples displaying a diverse range of CTG18.1 allele lengths and zygosity status (n = 11) were analyzed. A robust data analysis pipeline was developed to effectively filter, align, and interrogate CTG18.1-specific reads. All results were compared with conventional polymerase chain reaction (PCR)-based fragment analysis. CRISPR-guided SMRT sequencing of CTG18.1 provided accurate genotyping information for all samples, and phasing was possible for 18/22 alleles sequenced. Repeat length instability was observed for all expanded (≥50 repeats) phased CTG18.1 alleles analyzed. Furthermore, higher levels of repeat instability were associated with increased CTG18.1 allele length (mode length ≥91 repeats), indicating that expanded alleles behave dynamically. CRISPR-guided SMRT sequencing of CTG18.1 has revealed novel insights into CTG18.1 length instability. Furthermore, this study provides a framework to improve the molecular diagnostic accuracy for CTG18.1-mediated FECD, which we anticipate will become increasingly important as gene-directed therapies are developed for this common, age-related, sight-threatening disease.
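To illustrate what sizing CTG18.1 from a single read involves, the sketch below counts the longest uninterrupted CTG run in a read (or the CAG run, if the read comes from the opposite strand) and applies a 50-repeat expansion threshold. It is a hypothetical simplification, not the authors' pipeline, and all function names are made up for illustration: real SMRT reads contain sequencing errors and interruptions inside the repeat, which a perfect-match regex will split into shorter runs.

```python
import re


def longest_repeat_units(read, unit):
    """Length, in units, of the longest uninterrupted (unit)n run in read."""
    runs = re.findall(f"(?:{unit})+", read.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def ctg18_1_length(read):
    """Repeat length for a read from either strand: CTG forward, CAG on the reverse complement."""
    return max(longest_repeat_units(read, "CTG"), longest_repeat_units(read, "CAG"))


def is_expanded(read, threshold=50):
    """Hypothetical call rule: flag reads whose repeat meets the expansion threshold."""
    return ctg18_1_length(read) >= threshold


# Usage: an interruption (CTA) splits the repeat; only the longer run is reported.
read = "AA" + "CTG" * 12 + "CTA" + "CTG" * 60 + "TT"
print(ctg18_1_length(read), is_expanded(read))  # 60 True
```

Per-read lengths like this, collected across all reads from one phased allele, are what would reveal the length spread the abstract describes as instability.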


April 21, 2020

A comprehensive evaluation of long read error correction methods

Motivation: Third-generation sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long-read error correction tools, which tackle this problem by exploiting sampling redundancy and/or leveraging accurate short reads of the same biological samples. Existing studies that assess these tools use simulated data sets and are not sufficiently comprehensive in the range of software covered or the diversity of evaluation measures used. Results: In this paper, we present a categorization and review of long-read error correction methods, and provide a comprehensive evaluation of the corresponding long-read error correction tools. Leveraging recent real sequencing data, we establish benchmark data sets and set up evaluation criteria for a comparative assessment that includes quality of error correction as well as run-time and memory usage. We study how trimming and long-read sequencing depth affect error correction in terms of length distribution and genome coverage post-correction, and the impact of error correction performance on an important application of long reads, genome assembly. We provide guidelines for practitioners for choosing among the available error correction tools and identify directions for future research.
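Evaluations like this one score a corrected read by how far it remains from the true sequence. A minimal sketch, assuming each read has already been paired with its true sequence (benchmarks on real data usually obtain this by aligning to a reference genome instead), is an edit-distance error rate. The helper names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def error_rate(read, truth):
    """Edits needed to turn read into truth, per base of the truth sequence."""
    if not truth:
        raise ValueError("truth sequence must be non-empty")
    return edit_distance(read, truth) / len(truth)


# Usage: the raw read has one substitution and one extra base; correction fixes both.
truth = "ACGTAGCA"
raw, corrected = "ACGTTGGCA", "ACGTAGCA"
print(error_rate(raw, truth), error_rate(corrected, truth))  # 0.25 0.0
```

Insertions and deletions dominate raw long-read errors, which is why an edit distance rather than a simple mismatch count is the natural measure here.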


April 21, 2020

Fast and accurate long-read assembly with wtdbg2

Existing long-read assemblers require tens of thousands of CPU hours to assemble a human genome and are being outpaced by sequencing technologies in terms of both throughput and cost. We developed a novel long-read assembler wtdbg2 that, for human data, is tens of times faster than published tools while achieving comparable contiguity and accuracy. It represents a significant algorithmic advance and paves the way for population-scale long-read assembly in future.
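Assembly contiguity is conventionally summarized by N50, which is one standard way to put a number on "comparable contiguity" (the abstract does not say which metric the authors used). A short sketch, with a hypothetical function name:

```python
def n50(lengths):
    """Length of the contig at which the cumulative length, summed from
    longest to shortest, first reaches half of the total assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


# Usage: total is 54, and 10 + 9 + 8 = 27 reaches half at the contig of length 8.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```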


April 21, 2020

Strengths and potential pitfalls of hay-transfer for ecological restoration revealed by RAD-seq analysis in floodplain Arabis species

Achieving high intraspecific genetic diversity is a critical goal in ecological restoration, as it increases the adaptive potential and long-term resilience of populations. We therefore investigated genetic diversity within and between pristine sites in a fossil floodplain and compared it with that of sites restored by hay-transfer between 1997 and 2014. RAD-seq genotyping revealed that the stenoecious floodplain species Arabis nemorensis co-occurs with individuals that, based on ploidy, ITS sequencing and morphology, probably belong to the close relative Arabis sagittata, which has a documented preference for dry calcareous grasslands but has not been reported in floodplain meadows. We show that hay-transfer maintains genetic diversity for both species. Additionally, in A. sagittata, transfer from multiple genetically isolated pristine sites resulted in restored sites with increased diversity and admixed local genotypes. In A. nemorensis, transfer did not create novel admixture dynamics because the pristine sites were less genetically differentiated from one another. Thus, the effects of hay-transfer on genetic diversity also depend on the genetic makeup of each species' donor communities, especially when local material is mixed. Our results demonstrate the efficiency of hay-transfer for habitat restoration and emphasize the importance of characterizing micro-geographic patterns of intraspecific diversity across the community before restoration, to ensure that restoration practices reach their goal, i.e. maximizing the adaptive potential of the entire restored plant community. Overlooking these patterns may alter the balance between species in the community. Additionally, our comparison of summary statistics obtained from de novo and reference-based RAD-seq pipelines shows that the genomic impact of restoration can be reliably monitored in species lacking prior genomic knowledge.
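The abstract does not name the summary statistics it compared; one common diversity statistic computed from RAD-seq genotypes is expected heterozygosity, He = 1 - sum(p_i^2) over allele frequencies p_i, averaged over loci. The sketch below is a hypothetical illustration of that statistic in its uncorrected form (Nei's unbiased estimator additionally multiplies by n/(n - 1), where n is the number of sampled allele copies).

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """He = 1 - sum(p_i^2) at one locus, from a list of observed allele calls."""
    n = len(alleles)
    if n == 0:
        raise ValueError("no allele calls at this locus")
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def mean_expected_heterozygosity(loci):
    """Average He across loci (each locus is a list of allele calls)."""
    return sum(expected_heterozygosity(calls) for calls in loci) / len(loci)


# Usage: one biallelic locus at 50:50 and one monomorphic locus.
print(mean_expected_heterozygosity([["A", "A", "T", "T"], ["C", "C", "C", "C"]]))  # 0.25
```

Because it depends only on allele frequencies, a statistic like this can be computed from either de novo or reference-based locus catalogues, which is the kind of comparison the abstract reports.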


April 21, 2020

Fam83F induces p53 stabilisation and promotes its activity.

p53 is one of the most important tumour suppressor proteins currently known. It is activated in response to DNA damage, and this activation leads to proliferation arrest and cell death. The abundance and activity of p53 are tightly controlled, and reductions in p53's activity can contribute to the development of cancer. Here, we show that Fam83F increases p53 protein levels by protein stabilisation. Fam83F interacts with p53 and decreases its ubiquitination and degradation. Fam83F is induced in response to DNA damage, and its overexpression also increases p53 activity in cell culture experiments and in zebrafish embryos. Downregulation of Fam83F decreases transcription of p53 target genes in response to DNA damage and increases cell proliferation, identifying Fam83F as an important regulator of the DNA damage response. Overexpression of Fam83F also enhances migration of cells harbouring mutant p53, demonstrating that it can also activate mutant forms of p53.


April 21, 2020

The landscape of SNCA transcripts across synucleinopathies: New insights from long reads sequencing analysis

Dysregulation of alpha-synuclein expression has been implicated in the pathogenesis of synucleinopathies, in particular Parkinson's disease (PD) and dementia with Lewy bodies (DLB). Previous studies have shown that alternatively spliced isoforms of the SNCA gene are differentially expressed in different parts of the brain in PD and DLB patients. Moreover, SNCA isoforms with skipped exons can have a functional impact on the protein domains. The large intronic region of the SNCA gene has also been shown to harbor structural variants that affect transcriptional levels. Here we present the first study to use long-read sequencing with targeted capture of both the gDNA and cDNA of the SNCA gene in brain tissues from PD, DLB, and control samples, using the PacBio Sequel system. The targeted full-length cDNA (Iso-Seq) data confirmed complex usage of known alternative start sites and variable 3′ UTR lengths, as well as novel 5′ starts and 3′ ends not previously described. The targeted gDNA data allowed phasing of up to 81% of the ~114 kb SNCA region, with the longest phased block exceeding 54 kb. We demonstrate that long gDNA and cDNA reads have the potential to reveal long-range information not previously accessible using traditional sequencing methods. This approach has potential impact in studying disease risk genes such as SNCA, providing new insights into the genetic etiologies, including perturbations to the landscape of the gene's transcripts, of complex human diseases such as synucleinopathies.
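A figure such as "phasing of up to 81% of the ~114 kb region" is the share of the target region covered by phase blocks. The sketch below computes that share and the longest block from a list of block coordinates; the function name and the half-open interval convention are assumptions, not details from the study.

```python
def phasing_summary(blocks, region_start, region_end):
    """Return (fraction of region inside phase blocks, longest block within region).

    blocks are half-open (start, end) coordinates; each is clipped to the
    region and overlapping blocks are merged so no base is counted twice.
    """
    if region_end <= region_start:
        raise ValueError("empty region")
    clipped = sorted(
        (max(s, region_start), min(e, region_end))
        for s, e in blocks
        if min(e, region_end) > max(s, region_start)
    )
    covered = 0
    longest = 0
    cur_start = cur_end = None
    for s, e in clipped:
        longest = max(longest, e - s)
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged interval
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # overlap or abutment: extend
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (region_end - region_start), longest


# Usage: blocks cover 0-60 and 80-100 of a 100 bp region.
print(phasing_summary([(0, 40), (30, 60), (80, 100)], 0, 100))  # (0.8, 40)
```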


April 21, 2020

Schizophrenia risk variants influence multiple classes of transcripts of sorting nexin 19 (SNX19).

Genome-wide association studies (GWAS) have identified many genomic loci associated with risk for schizophrenia, but unambiguous identification of the relationship between disease-associated variants and specific genes, and in particular their effect on risk-conferring transcripts, has proven difficult. To better understand the specific molecular mechanism(s) at the schizophrenia locus in 11q25, we undertook cis expression quantitative trait loci (cis-eQTL) mapping for this 2-megabase genomic region using postmortem human brain samples. To comprehensively assess the effects of genetic risk upon local expression, we evaluated multiple transcript features (genes, exons, and exon-exon junctions) in multiple brain regions: dorsolateral prefrontal cortex (DLPFC), hippocampus, and caudate. Genetic risk variants were strongly associated with expression of SNX19 transcript features that tag multiple rare classes of SNX19 transcripts, whereas they only weakly affected expression of an exon-exon junction that tags the majority of abundant transcripts. The most prominent class of SNX19 risk-associated transcripts, predicted to be overexpressed, is defined by an exon-exon splice junction between exons 8 and 10 (junc8.10) and is predicted to encode proteins that lack the characteristic nexin C-terminal domain. Risk alleles were also associated with either increased or decreased expression of multiple additional classes of transcripts. With RACE, molecular cloning, and long-read sequencing, we found a number of novel SNX19 transcripts that further define the set of potential etiological transcripts. We explored epigenetic regulation of SNX19 expression and found that DNA methylation at CpG sites near the primary transcription start site and within exon 2 partially mediates the effects of risk variants on risk-associated expression. ATAC sequencing revealed that some of the most strongly risk-associated SNPs are located within a region of open chromatin, suggesting that a nearby regulatory element is involved. These findings indicate a potentially complex molecular etiology, in which risk alleles for schizophrenia generate epigenetic alterations and dysregulation of multiple classes of SNX19 transcripts.
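At its core, cis-eQTL mapping tests whether expression of a transcript feature changes with the number of risk alleles a sample carries. A bare-bones sketch of that test's effect size is the least-squares slope of expression on genotype dosage; the function name is hypothetical, and real analyses typically also adjust for covariates such as ancestry and RNA quality and assess significance, neither of which is shown here.

```python
def eqtl_slope(dosages, expression):
    """Ordinary least-squares slope of expression on risk-allele dosage (0, 1, 2)."""
    n = len(dosages)
    if n != len(expression) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype does not vary across samples")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    return sxy / sxx


# Usage: toy data where expression rises by exactly 2 units per risk allele.
print(eqtl_slope([0, 1, 2, 0, 1, 2], [1, 3, 5, 1, 3, 5]))  # 2.0
```

Running the same regression separately for genes, exons, and exon-exon junctions is what lets an analysis like the one above find a variant that strongly affects a rare junction while barely moving the gene-level total.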

