July 7, 2019

Measuring the mappability spectrum of reference genome assemblies

The ability to infer actionable information from genomic variation data in a resequencing experiment relies on accurately aligning the sequenced reads to a reference genome. However, this accuracy is inherently limited by the quality of the reference assembly and the repetitive content of the subject’s genome. As long read sequencing technologies become more widespread, it is crucial to investigate the expected improvements in alignment accuracy and variant analysis over existing short read methods. The ability to quantify the read length and error rate necessary to uniquely map regions of interest in a sequence allows users to make informed decisions regarding experiment design and provides useful metrics for comparing the magnitude of repetition across different reference assemblies. To this end we have developed NEAT-Repeat, a toolkit for exhaustively identifying the minimum read length required to uniquely map each position of a reference sequence given a specified error rate. Using these tools we computed the “mappability spectrum” for ten reference sequences, including human and a range of plants and animals, quantifying the theoretical improvements in alignment accuracy that would result from sequencing with longer reads or reads with fewer base-calling errors. Our inclusion of read length and error rate builds upon existing methods for mappability tracks based on uniqueness or aligner-specific mapping scores, and thus enables more comprehensive analysis. We apply our mappability results to whole-genome variant call data, and demonstrate that variants called with low mapping and genotype quality scores are disproportionately found in reference regions that require long reads to be uniquely covered. We propose that our mappability metrics provide a valuable supplement to established variant filtering and annotation pipelines by supplying users with an additional metric related to read mapping quality. 
NEAT-Repeat can process large and repetitive genomes, such as those of corn and soybean, in a tractable amount of time by leveraging efficient methods for edit distance computation as well as running multiple jobs in parallel. NEAT-Repeat is written in Python 2.7 and C++, and is available at https://github.com/zstephens/neat-repeat.
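To make the "minimum read length to uniquely map a position at a given error rate" idea concrete, here is a minimal brute-force sketch on a toy string. It is not NEAT-Repeat's implementation, and the function names are hypothetical. It rests on several simplifying assumptions: a read is the window starting at each position, only substitutions are counted (Hamming distance rather than the edit distance NEAT-Repeat uses), the number of tolerated mismatches is floor(error_rate × read length), only the forward strand is searched, and every other window is compared directly, so it is only practical for very short sequences.

```python
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def min_unique_length(ref: str, i: int, error_rate: float, max_len: int) -> Optional[int]:
    """Smallest read length L such that the window ref[i:i+L] lies within
    floor(error_rate * L) mismatches of no other window of the same length.
    Hypothetical helper: forward strand only, substitutions only, brute force.
    Returns None if no length up to max_len (or the sequence end) is unique."""
    n = len(ref)
    for length in range(1, min(max_len, n - i) + 1):
        read = ref[i:i + length]
        allowed = int(error_rate * length)  # mismatches tolerated at this length
        ambiguous = any(
            hamming(read, ref[j:j + length]) <= allowed
            for j in range(n - length + 1)
            if j != i  # skip the read's own origin
        )
        if not ambiguous:
            return length
    return None


def mappability_spectrum(ref: str, error_rate: float, max_len: int) -> List[Optional[int]]:
    """Minimum unique read length for every start position of ref."""
    return [min_unique_length(ref, i, error_rate, max_len) for i in range(len(ref))]


if __name__ == "__main__":
    print(mappability_spectrum("ACGTACGTTT", error_rate=0.0, max_len=10))
```

For the toy sequence ACGTACGTTT with no errors allowed, this prints [5, 4, 3, 2, 5, 4, 3, 3, None, None]. Positions inside the duplicated ACGT need longer reads before they become unique, and the last two positions cannot be made unique before the sequence ends. Raising the error rate makes more windows ambiguous, so the required lengths grow, which is the trade-off the mappability spectrum is meant to quantify.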


July 7, 2019

Genome-wide characterization and phylogenetic analysis of GSK gene family in three species of cotton: evidence for a role of some GSKs in fiber development and responses to stress

Background: Glycogen synthase kinase 3/shaggy kinase (GSK3) is a serine/threonine kinase with important roles in animals. Although GSK3 genes have been studied for more than 30 years, plant GSK genes have been studied only within the last decade. Previous research has confirmed that plant GSK genes are involved in diverse processes, including floral development, brassinosteroid signaling, and responses to abiotic stresses. Results: In this study, 20, 15 (including 5 different transcripts) and 10 GSK genes were identified in G. hirsutum, G. raimondii and G. arboreum, respectively. A total of 65 genes from Arabidopsis, rice, and cotton were classified into 4 clades. High similarity was found in GSK3 protein sequences, conserved motifs, and gene structures, and good concordance was observed in pairwise gene comparisons (G. hirsutum vs. G. arboreum, G. hirsutum vs. G. raimondii, and G. arboreum vs. G. raimondii). Whole genome duplication (WGD) within the At and Dt sub-genomes has been central to the expansion of the GSK gene family. Furthermore, GhSK genes showed diverse expression patterns in various tissues. Additionally, the expression profiles of GhSKs under different stress treatments demonstrated that many are stress-responsive genes. However, none were induced by brassinolide treatment. Finally, nine co-expression sub-networks were observed for GhSKs, and the functional annotations of these genes suggested that some GhSKs might be involved in cotton fiber development. Conclusion: In the present work, we identified 45 GSK genes from three cotton species, which were divided into four clades. The gene features, multiple sequence alignment, conserved motifs, and syntenic blocks indicate that they have been highly conserved during evolution. Whole genome duplication was determined to be the dominant factor in GSK gene family expansion. The analysis of co-expressed sub-networks and tissue-specific expression profiles suggested functions of GhSKs during fiber development. 
Moreover, their different responses to various abiotic stresses indicated great functional diversity amongst the GhSKs. In summary, the data presented herein may serve as a basis for future functional studies of GhSKs.


July 7, 2019

Pilot satellitome analysis of the model plant, Physcomitrella patens, revealed a transcribed and high-copy IGS-related tandem repeat.

Satellite DNA (satDNA) constitutes a substantial part of eukaryotic genomes. In the last decade, it has been shown that satDNA is not an inert part of the genome and that its function extends beyond the nuclear membrane. However, the number of model plant species suitable for exploring these new aspects of satDNA function is low. Here, we explored the satellitome of the model “basal” plant Physcomitrella patens (Hedwig, 1801) Bruch & Schimper, 1849 (moss), which has a number of advantages for in-depth functional and evolutionary research. Using a newly developed pyTanFinder pipeline (https://github.com/Kirovez/pyTanFinder) coupled with fluorescence in situ hybridization (FISH), we identified five high-copy-number tandem repeats (TRs) occupying a long DNA array in the moss genome. A study of nuclear organization revealed that two TRs had distinct locations in the moss genome, concentrating in the heterochromatin and in knob-rDNA-like chromatin bodies. Further genomic, epigenetic and transcriptomic analysis showed that one TR, named PpNATR76, was located in the intergenic spacer (IGS) region and transcribed into long non-coding RNAs (lncRNAs). Several specific features of PpNATR76 lncRNAs make them very similar to recently discovered human lncRNAs, raising a number of questions for future studies. This work provides new resources for functional studies of the satellitome in plants using the model organism P. patens, and provides a list of tandem repeats for further analysis.


July 7, 2019

Bridging gaps in transposable element research with single-molecule and single-cell technologies

More than half of the genomic landscape in humans and many other organisms is composed of repetitive DNA, which mostly derives from transposable elements (TEs) and viruses. Recent technological advances permit improved assessment of the repetitive content across genomes, and newly developed molecular assays have revealed important roles of TEs and viruses in host genome evolution and organization. To review our current understanding of TE biology and to promote new interdisciplinary strategies for the TE research community, leading experts gathered for the 2nd Uppsala Transposon Symposium on October 4–5, 2018 in Uppsala, Sweden. With cutting-edge single-molecule and single-cell approaches, research on TEs and other repeats has entered a new era in biological and biomedical research.


July 7, 2019

Hardwood tree genomics: Unlocking woody plant biology.

Woody perennial angiosperms (i.e., hardwood trees) are polyphyletic in origin and occur in most angiosperm orders. Despite their independent origins, hardwoods share physiological, anatomical, and life history traits distinct from their herbaceous relatives. New high-throughput DNA sequencing platforms have provided access to numerous woody plant genomes beyond the early reference genomes of Populus and Eucalyptus, references that now include willow and oak, with pecan and chestnut soon to follow. Genomic studies within these diverse and undomesticated species have successfully linked genes directly to ecological, physiological, and developmental traits. Moreover, comparative genomic approaches are providing insights into speciation events, while large-scale DNA resequencing of native collections is identifying population-level genetic diversity responsible for variation in key woody plant biology across and within species. Current research is focused on developing genomic prediction models for breeding, defining speciation and local adaptation, detecting and characterizing somatic mutations, revealing the mechanisms of gender determination and flowering, and applying systems biology approaches to model complex regulatory networks underlying quantitative traits. Emerging technologies such as single-molecule, long-read sequencing are being employed as additional woody plant species, and genotypes within species, are sequenced, thus enabling a comparative (“evo-devo”) approach to understanding the unique biology of large woody plants. Resource availability, current genomic and genetic applications, new discoveries and predicted future developments are illustrated and discussed for poplar, eucalyptus, willow, oak, chestnut, and pecan.


July 7, 2019

Reference genes for RT-qPCR normalisation in different tissues, developmental stages and stress conditions of Hypericum perforatum

Hypericum perforatum is a widely known medicinal herb, used mostly as a remedy for depression because of its abundant secondary metabolites. Quantitative real-time PCR (qRT-PCR) is a well-established method for efficient and reliable quantification of gene expression. In general, reference genes are chosen for qRT-PCR analysis because of their known or suspected housekeeping roles. However, their expression level cannot be assumed to remain stable under all possible experimental conditions. Thus, the identification of high-quality reference genes is essential for the interpretation of qRT-PCR data. In this study, we investigated the expression of fourteen candidate genes, including nine housekeeping genes and five potential candidate genes. Additionally, the HpHYP1 gene, which belongs to the PR-10 family associated with stress control, was used to validate the candidate reference genes. Three programs were applied to evaluate gene expression stability across four different plant tissues, three developmental stages, and a set of abiotic stress and hormonal treatments. The candidate genes showed a wide range of Ct values across all samples, indicating that they are differentially expressed. Integrating all of the algorithms and evaluations, ACT2 and TUB-β were the most stable combination overall and for samples from different developmental stages. Moreover, ACT2 and EF1-α were considered the two most applicable reference genes for different tissues and for stress samples. Most of the conventional housekeeping genes performed better than the potential reference genes. The obtained results will help improve the credibility of standardization and quantification of transcript levels in future expression research on H. perforatum.
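The abstract does not name the three stability programs. One widely used tool for this kind of ranking is geNorm, and the sketch below illustrates its pairwise-variation idea as an assumption, not as the authors' analysis. For each gene j, the variation V_jk with another gene k is the standard deviation, across samples, of the log2 expression ratio j/k. M_j is the mean of V_jk over all other genes, and lower M means more stable. The sketch also assumes roughly 100% amplification efficiency, so the log2 ratio can be taken directly from Ct differences. The gene names and Ct values in the example are made up purely for illustration.

```python
from statistics import stdev
from typing import Dict, List, Tuple


def genorm_m_values(ct: Dict[str, List[float]]) -> Dict[str, float]:
    """geNorm-style stability value M for each gene (lower = more stable).
    ct maps gene name -> Ct values over the same ordered samples (>= 2 samples).
    Assumes ~100% amplification efficiency, so log2(expr_j / expr_k) = Ct_k - Ct_j."""
    m = {}
    for j in ct:
        # V_jk: spread of the log2 expression ratio j/k across samples
        variations = [
            stdev([ck - cj for cj, ck in zip(ct[j], ct[k])])
            for k in ct
            if k != j
        ]
        m[j] = sum(variations) / len(variations)
    return m


def genorm_ranking(ct: Dict[str, List[float]]) -> Tuple[List[str], List[str]]:
    """Stepwise exclusion: drop the least stable gene (highest M) until two remain.
    Returns (most stable pair, excluded genes ordered from most to least stable)."""
    remaining = dict(ct)
    excluded = []
    while len(remaining) > 2:
        m = genorm_m_values(remaining)
        worst = max(m, key=m.get)
        excluded.append(worst)
        del remaining[worst]
    return list(remaining), excluded[::-1]


if __name__ == "__main__":
    # Made-up Ct values for three hypothetical genes over four samples
    ct = {
        "GENE_A": [20.0, 21.0, 22.0, 23.0],
        "GENE_B": [25.0, 26.0, 27.0, 28.0],  # tracks GENE_A exactly
        "GENE_C": [30.0, 28.0, 33.0, 29.0],  # fluctuates independently
    }
    print(genorm_m_values(ct))
    print(genorm_ranking(ct))
```

Because GENE_A and GENE_B keep a constant Ct difference, their pairwise variation is zero and they end up as the most stable pair, while GENE_C is excluded first. This mirrors how a result such as "ACT2 and TUB-β were the most stable combination" is reported as a pair rather than a single gene.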


July 7, 2019

Whole-Genome and Expression Analyses of Bamboo Aquaporin Genes Reveal Their Functions Involved in Maintaining Diurnal Water Balance in Bamboo Shoots.

Water supply is essential for maintaining normal physiological function during the rapid growth of bamboo. Aquaporins (AQPs) play crucial roles in water transport for plant growth and development. Although 26 PeAQPs in bamboo have been reported, the aquaporin-led mechanism of maintaining diurnal water balance in bamboo shoots remains unclear. In this study, a total of 63 PeAQPs were identified, based on the updated genome of moso bamboo (Phyllostachys edulis), including 22 PePIPs, 20 PeTIPs, 17 PeNIPs, and 4 PeSIPs. All of the PeAQPs were differentially expressed in 26 different tissues of moso bamboo, based on RNA sequencing (RNA-seq) data. The root pressure in shoots showed circadian rhythm changes, with positive values at night and negative values in the daytime. The quantitative real-time PCR (qRT-PCR) results showed that 25 PeAQPs were detected in the base part of the shoots, and most of them demonstrated diurnal rhythm changes. The expression levels of some PeAQPs were significantly correlated with the root pressure. Of the 86 sugar transport genes, 33 had positive co-expression relationships with 27 PeAQPs. Two root pressure-correlated PeAQPs, PeTIP4;1 and PeTIP4;2, were confirmed by in situ hybridization to be highly expressed in the parenchyma and epidermal cells of bamboo culm, and in the epidermis, pith, and primary xylem of bamboo roots. The authors’ findings provide new insights and a possible aquaporin-led mechanism for the fast growth of bamboo.

