July 7, 2019

TriPoly: haplotype estimation for polyploids using sequencing data of related individuals.

Knowledge of haplotypes, i.e. phased and ordered marker alleles on a chromosome, is essential to answer many questions in genetics and genomics. By generating short pieces of DNA sequence, modern high-throughput sequencing technologies make it possible to estimate haplotypes for single individuals. In polyploids, however, haplotype estimation methods usually require deep coverage to achieve sufficient accuracy. This often makes sequencing-based approaches too costly to apply to the large populations needed in studies of Quantitative Trait Loci. We propose a novel haplotype estimation method for polyploids, TriPoly, that combines sequencing data with Mendelian inheritance rules to infer haplotypes in parent-offspring trios. Using realistic simulations of both short- and long-read sequencing data for banana (Musa acuminata) and potato (Solanum tuberosum) trios, we show that TriPoly yields more accurate progeny haplotypes at low coverage than existing methods that work on single individuals. We also apply TriPoly to phase Single Nucleotide Polymorphisms on chromosome 5 for a family of tetraploid potato with 2 parents and 37 offspring sequenced with an RNA capture approach. We show that TriPoly haplotype estimates differ from those of the other methods mainly in regions with imperfect sequencing or mapping difficulties, as it does not rely solely on sequence reads and aims to avoid phasings that are unlikely to have been passed from the parents to the offspring. TriPoly has been implemented in Python 3.5.2 (also compatible with Python 2.7.3 and higher) and can be freely downloaded at https://github.com/EhsanMotazedi/TriPoly. Supplementary data are available at Bioinformatics online.
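The Mendelian constraint that TriPoly exploits can be illustrated with a small consistency check. The sketch below is not TriPoly's algorithm; it assumes an even ploidy (e.g. tetraploid potato) in which each parent transmits exactly half of its homologs, and it ignores recombination and double reduction. Haplotypes are hypothetical 0/1 allele strings, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def transmissible_sets(parent_haps, ploidy):
    """All distinct multisets of ploidy // 2 haplotypes a parent can transmit.

    Assumes no recombination and no double reduction (illustrative simplification).
    """
    half = ploidy // 2
    return {tuple(sorted(combo)) for combo in combinations(parent_haps, half)}


def is_mendelian_consistent(mother, father, child):
    """True if the child's phased haplotypes can be formed from one gamete per parent."""
    ploidy = len(child)
    if ploidy % 2:
        raise ValueError("this sketch assumes an even ploidy")
    target = Counter(child)
    for maternal in transmissible_sets(mother, ploidy):
        for paternal in transmissible_sets(father, ploidy):
            if Counter(maternal + paternal) == target:
                return True
    return False


# Hypothetical tetraploid trio over two SNPs (each string is one homolog).
mother = ["00", "01", "10", "11"]
father = ["00", "00", "11", "11"]
print(is_mendelian_consistent(mother, father, ["00", "01", "00", "11"]))  # True
print(is_mendelian_consistent(mother, father, ["01", "10", "01", "10"]))  # False
```

The second offspring phasing needs two copies each of "01" and "10", but the father carries neither, so no pair of gametes can produce it. In practice such a constraint would be weighed against read evidence rather than applied as a hard filter.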


July 7, 2019

Loss of RXFP2 and INSL3 genes in Afrotheria shows that testicular descent is the ancestral condition in placental mammals.

Descent of testes from a position near the kidneys into the lower abdomen or into the scrotum is an important developmental process that occurs in all placental mammals, with the exception of five afrotherian lineages. Since soft-tissue structures like testes are not preserved in the fossil record and since key parts of the placental mammal phylogeny remain controversial, it has been debated whether testicular descent is the ancestral or derived condition in placental mammals. To resolve this debate, we used genomic data of 71 mammalian species and analyzed the evolution of two key genes (relaxin/insulin-like family peptide receptor 2 [RXFP2] and insulin-like 3 [INSL3]) that induce the development of the gubernaculum, the ligament that is crucial for testicular descent. We show that both RXFP2 and INSL3 are lost or nonfunctional exclusively in four afrotherians (tenrec, cape elephant shrew, cape golden mole, and manatee) that completely lack testicular descent. The presence of remnants of once functional orthologs of both genes in these afrotherian species shows that these gene losses happened after the split from the placental mammal ancestor. These “molecular vestiges” provide strong evidence that testicular descent is the ancestral condition, irrespective of persisting phylogenetic discrepancies. Furthermore, the absence of shared gene-inactivating mutations and our estimates that the loss of RXFP2 happened at different time points strongly suggest that testicular descent was lost independently in Afrotheria. Our results provide a molecular mechanism that explains the loss of testicular descent in afrotherians and, more generally, highlight how molecular vestiges can provide insights into the evolution of soft-tissue characters.


July 7, 2019

Meeting report: mobile genetic elements and genome plasticity 2018

The Mobile Genetic Elements and Genome Plasticity conference was hosted by Keystone Symposia in Santa Fe, NM, USA, February 11–15, 2018. The organizers were Marlene Belfort, Evan Eichler, Henry Levin and Lynn Maquat. The goal of this conference was to bring together scientists from around the world to discuss the function of transposable elements and their impact on host species. Central themes of the meeting included recent innovations in genome analysis and the role of mobile DNA in disease and evolution. The conference was attended by 200 scientists, who participated in poster presentations, short talks selected from abstracts, and invited talks. A total of 58 talks were organized into eight sessions and two workshops. The topics ranged from mechanisms of mobilization to the structure of genomes and the strategies genomes use to defend themselves against transposable elements.


July 7, 2019

Fast-SG: an alignment-free algorithm for hybrid assembly.

Long-read sequencing technologies are the ultimate solution for resolving genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intensive and require considerable coverage, which hinders their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using lightweight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878). Fast-SG opens the door to accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.
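The abstract does not describe Fast-SG's internal data structures, so the following is only a generic sketch of how alignment-free linking can work: unique k-mers from the contigs are indexed in a hash table, each mate of a read pair is assigned to the contig that most of its k-mers hit, and pairs whose mates land on different contigs add weight to an edge of the scaffolding graph. The k-mer size, toy contigs, and function names are hypothetical, and the orientation and insert-size information a real scaffolder needs are omitted.

```python
from collections import Counter


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def canonical(kmer):
    """Strand-independent representative of a k-mer."""
    return min(kmer, revcomp(kmer))


def build_unique_kmer_index(contigs, k):
    """Map each canonical k-mer seen exactly once across all contigs to its contig name."""
    counts, owner = Counter(), {}
    for name, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            kmer = canonical(seq[i:i + k])
            counts[kmer] += 1
            owner[kmer] = name
    return {kmer: owner[kmer] for kmer, c in counts.items() if c == 1}


def contig_hit(read, index, k):
    """Contig hit by the most k-mers of the read, or None if no k-mer hits."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        name = index.get(canonical(read[i:i + k]))
        if name is not None:
            hits[name] += 1
    return hits.most_common(1)[0][0] if hits else None


def scaffolding_edges(read_pairs, index, k):
    """Count read pairs linking two different contigs (undirected edges)."""
    edges = Counter()
    for mate1, mate2 in read_pairs:
        a, b = contig_hit(mate1, index, k), contig_hit(mate2, index, k)
        if a is not None and b is not None and a != b:
            edges[tuple(sorted((a, b)))] += 1
    return edges


contigs = {"c1": "ATGCGTACGTTAGC", "c2": "GGATCCTTGACCAA"}
index = build_unique_kmer_index(contigs, k=5)
pair = (contigs["c1"][0:10], revcomp(contigs["c2"][2:12]))  # mates from opposite strands
print(scaffolding_edges([pair], index, k=5))  # Counter({('c1', 'c2'): 1})
```

Canonical k-mers make the lookup strand-independent, so the second mate, taken from the reverse strand, still resolves to c2 without any alignment step.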


July 7, 2019

Genomes and transcriptomes of duckweeds.

Duckweeds (Lemnaceae family) are the smallest flowering plants and are adapted to the aquatic environment. They are regarded as a promising sustainable feedstock because of their high starch storage, fast propagation, and global distribution. Duckweed genome size varies 13-fold, ranging from 150 Mb in Spirodela polyrhiza to 1,881 Mb in Wolffia arrhiza. With the development of sequencing technology and bioinformatics, five duckweed genomes from the Spirodela and Lemna genera have been sequenced and assembled. Their annotations reveal that they share similar protein orthologs, whereas differences in repeat content largely explain the differences in genome size. The gene families responsible for cell growth and expansion, lignin biosynthesis, and flowering are greatly contracted. However, the glutamate synthase gene family has expanded, indicating its significance in ammonia assimilation and nitrogen transport. Transcriptomes have been comprehensively sequenced for the genera Spirodela, Landoltia, and Lemna under various treatments such as abscisic acid, radiation, heavy metals, and starvation. Analysis of the underlying molecular mechanisms and regulatory networks would accelerate the application of duckweeds in bioenergy and phytoremediation. Comparative genomics has shown that duckweed genomes contain relatively few genes and more contracted gene families, which may parallel their highly reduced morphology, with a simple leaf and primary roots. Advances in long-read sequencing technology are still needed to resolve the complex genomes and transcriptomes of the unsequenced Wolffiella and Wolffia, owing to their large genome sizes and the similarity in their polyploidy.


July 7, 2019

Speeding up DNA sequence alignment by optical correlator

In electronic computers, the extensive computation required to search biological sequences in large databases leads to high energy consumption for electrical processing and cooling. Optical processing, on the other hand, is much faster than its electrical counterpart because of its parallel processing capability, at a fraction of the energy consumption and cost. In this regard, this paper proposes a correlation-based optical algorithm using metamaterials that takes advantage of optical parallel processing to efficiently locate edits as a means of DNA sequence comparison. Specifically, the proposed algorithm partitions the read DNA sequence into multiple overlapping intervals, referred to as windows, and then extracts, in parallel, the peaks resulting from their cross-correlation with the reference sequence. Finally, to locate the edits, a simple algorithm that uses the number and location of the peaks is introduced to analyze the correlation outputs obtained from window-based DNA sequence comparison. As a novel implementation approach, we adopt multiple metamaterial-based optical correlators to optically implement the proposed parallel architecture, named the Window-based Optical Correlator (WOC). This wave-based computing architecture fully controls wave transmission and phase using dielectric and plasmonic materials. Design limitations and challenges of the proposed architecture are also discussed in detail. Simulation results comparing WOC with the well-known BLAST algorithm demonstrate a speed-up of up to 60%, as well as high accuracy even in the presence of a large number of edits. In addition, the WOC method considerably reduces power consumption as a result of its metamaterial-based optical computing structure.
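The window-and-peak logic can be emulated electronically to see what the correlators compute. In this sketch, which is an assumption and not the WOC design, the cross-correlation of a window with the reference at each offset is the number of matching bases (what a correlation of one-hot encodings yields). A window whose highest peak is below the window length has no exact occurrence in the reference and therefore contains an edit; peak positions are returned too, since their locations also carry information. Window size, step, and sequences are hypothetical.

```python
def match_correlation(window, reference):
    """Matching-base count between `window` and `reference` at every offset.

    This equals the cross-correlation of one-hot encodings of the two sequences.
    """
    w = len(window)
    return [sum(a == b for a, b in zip(window, reference[off:off + w]))
            for off in range(len(reference) - w + 1)]


def window_peaks(read, reference, w, step):
    """(window start in read, peak offset in reference, peak height) per overlapping window."""
    peaks = []
    for start in range(0, len(read) - w + 1, step):
        corr = match_correlation(read[start:start + w], reference)
        height = max(corr)
        peaks.append((start, corr.index(height), height))
    return peaks


def windows_with_edits(read, reference, w, step):
    """Start positions of windows with no exact match anywhere in the reference."""
    return [start for start, _, height in window_peaks(read, reference, w, step)
            if height < w]


reference = "ACGTTGCAAGCTTACGGA"
read = "GTTGCAAGCATA"  # reference[2:14] with a T->A substitution at read position 9
print(windows_with_edits(read, reference, w=4, step=2))  # [6, 8]
```

Because the windows overlap, the two flagged windows (covering read positions 6–9 and 8–11) intersect at positions 8–9, which narrows down where the substitution lies.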


July 7, 2019

The regenerative flatworm Macrostomum lignano, a model organism with high experimental potential.

Understanding the process of regeneration has been a longstanding scientific aim, both from a fundamental biological perspective and within the applied context of regenerative medicine. Because regeneration competence varies greatly between organisms, it is essential to investigate different experimental animals. The free-living marine flatworm Macrostomum lignano is an emerging model organism for this type of research, and its power stems from a unique set of biological properties combined with amenability to experimental manipulation. The biological properties of interest include production of single-cell fertilized eggs, a transparent body, small size, short generation time, ease of culture, the presence of a pluripotent stem cell population, and high regeneration competence. These features sparked the development of molecular tools and resources for this animal, including high-quality genome and transcriptome assemblies, gene knockdown, in situ hybridization, and transgenesis. Importantly, M. lignano is currently the only flatworm species for which transgenesis methods are established. This review summarizes biological features of M. lignano and recent technological advances towards experimentation with this animal. In addition, we discuss the experimental potential of this model organism for different research questions related to regeneration and stem cell biology.


July 7, 2019

Genome size estimation of Chinese cultured Artemisia annua L.

Almost all antimalarial artemisinin is extracted from the traditional Chinese medicinal plant Artemisia annua L. However, because genomic information is insufficient and genetic backgrounds are unresolved, the regulatory mechanism of the artemisinin biosynthetic pathway remains unclear. The genome size of genuine A. annua plants is an especially important and fundamental parameter, which is helpful for further insight into genomic studies of artemisinin biosynthesis and improvement. In the current study, the genome sizes of all A. annua samples, collected with barcoding identification, were estimated at 1.38–1.49 Gb by flow cytometry (FCM), with Nipponbare as the benchmark calibration standard and soybean and maize as two internal standards, used individually and simultaneously. The genome size estimate for seven A. annua strains from five Chinese provinces (Shandong, Hunan, Chongqing, Sichuan, and Hainan), with a low coefficient of variation (CV = 2.96%), was relatively accurate and 12.87% (220 Mb) smaller than previous reports on foreign A. annua material measured with a single control. These results facilitate the scheduling of the A. annua whole-genome sequencing project, the optimization of assembly methods, and insight into its subsequent genetics and evolution.
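Flow-cytometric genome size estimates of this kind rest on a simple ratio: the sample's 2C DNA content equals the ratio of the sample's G1 peak position to that of the internal standard, multiplied by the standard's known 2C value, and DNA mass is then converted to base pairs. The sketch assumes the commonly used conversion 1 pg ≈ 978 Mb. The peak positions and the 2C value in the example are hypothetical; they are neither the standards' actual values nor data from this study. The coefficient of variation is computed as the sample standard deviation over the mean.

```python
from statistics import mean, stdev

MB_PER_PG = 978  # common conversion factor: 1 pg of DNA is about 978 Mb


def genome_size_mb(sample_peak, standard_peak, standard_2c_pg):
    """1C genome size (Mb) from G1 peak positions of the sample and an internal standard."""
    sample_2c_pg = sample_peak / standard_peak * standard_2c_pg
    return sample_2c_pg / 2 * MB_PER_PG


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values) * 100


# Hypothetical measurement: sample peak at channel 200, standard at 100, standard 2C = 1.5 pg.
print(genome_size_mb(200, 100, 1.5))  # 1467.0
print(coefficient_of_variation([1.0, 2.0, 3.0]))  # 50.0
```

When several internal standards are used, each standard gives an independent estimate through the same ratio, and the spread of the resulting estimates can be summarized with the same CV function.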


July 7, 2019

Spalter: A meta machine learning approach to distinguish true DNA variants from sequencing artefacts

Being able to distinguish between true DNA variants and technical sequencing artefacts is a fundamental task in whole-genome, exome or targeted gene analysis. Variant calling tools provide diagnostic parameters, such as strand bias or an aggregated overall quality for each called variant, to help users make an informed choice about which variants to accept or discard. Having several such quality indicators poses a problem for users of variant callers because they need to set or adjust thresholds for each indicator. Alternatively, machine learning methods can be used to train a classifier based on these indicators. This approach needs large sets of labeled training data, which are not easily available. The new approach presented here relies on the idea that a true DNA variant exists independently of the technical features of the read in which it appears (e.g. base quality, strand, position in the read). Therefore, the nucleotide separability classification problem – predicting the nucleotide state of each read in a given pileup based on technical features only – should be nearly impossible to solve for true variants. Nucleotide separability, i.e. achievable classification accuracy, can either be used directly to distinguish between true variants and technical artefacts using a thresholding approach, or be used as a meta-feature to train a separability-based classifier. This article explores both possibilities with promising results, showing accuracies of around 90%.
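Nucleotide separability can be sketched with any classifier and a held-out accuracy estimate. The version below is an assumption for illustration, not Spalter's implementation: it uses a leave-one-out 1-nearest-neighbour classifier on hypothetical per-read technical features (base quality, strand as 0/1, position in read) and compares the achieved accuracy with the majority-class baseline. The `margin` threshold is likewise hypothetical. For a true variant, the features should carry no information about the nucleotide, so accuracy should not rise much above the baseline.

```python
from collections import Counter


def loo_1nn_accuracy(features, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    correct = 0
    for i, x in enumerate(features):
        nearest = min(
            (j for j in range(len(features)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, features[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(labels)


def classify_variant(features, labels, margin=0.2):
    """Call an artefact when technical features predict the nucleotide well above chance."""
    baseline = Counter(labels).most_common(1)[0][1] / len(labels)
    gain = loo_1nn_accuracy(features, labels) - baseline
    return "artefact" if gain > margin else "true variant"


# Hypothetical pileup: alternative-allele reads are low quality, reverse strand, near read start.
features = [(38, 0, 50), (37, 0, 48), (39, 0, 52), (38, 0, 45),
            (12, 1, 5), (13, 1, 6), (11, 1, 4)]
labels = ["A"] * 4 + ["T"] * 3
print(classify_variant(features, labels))  # artefact
```

This is the thresholding variant; for the meta-feature route, the accuracy gain could instead be passed as one input feature to a downstream classifier.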


July 7, 2019

STRetch: detecting and discovering pathogenic short tandem repeat expansions.

Short tandem repeat (STR) expansions have been identified as the causal DNA mutation in dozens of Mendelian diseases. Most existing tools for detecting STR variation with short reads can only do so within the read length and are therefore unable to detect the majority of pathogenic expansions. Here we present STRetch, a new genome-wide method to scan for STR expansions at all loci across the human genome. We demonstrate the use of STRetch for detecting STR expansions using short-read whole-genome sequencing data at known pathogenic loci as well as novel STR loci. STRetch is open source software, available from github.com/Oshlack/STRetch.
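The abstract does not detail STRetch's read-level signal. One signal that can reveal expansions longer than the read length, used here purely as an illustrative assumption, is the number of reads that consist almost entirely of the repeat unit, since such a read can only arise from an allele at least as long as the read. The sketch scores each read against perfect tandem arrays of the unit in every phase and in both orientations; the 0.9 cutoff and the example reads are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def repeat_match_fraction(read, unit):
    """Best fraction of read positions matching a perfect tandem array of `unit`.

    Every rotation (phase) of the unit and of its reverse complement is tried.
    """
    best = 0.0
    for motif in (unit, revcomp(unit)):
        for shift in range(len(motif)):
            rotated = motif[shift:] + motif[:shift]
            template = (rotated * (len(read) // len(rotated) + 1))[:len(read)]
            matches = sum(a == b for a, b in zip(read, template))
            best = max(best, matches / len(read))
    return best


def count_in_repeat_reads(reads, unit, min_fraction=0.9):
    """Number of reads that are (almost) pure tandem copies of `unit`."""
    return sum(repeat_match_fraction(read, unit) >= min_fraction for read in reads)


reads = ["CAGCAGCAGCAGCAG", "AGCAGCAGCAGCAGC", "CTGCTGCTGCTG", "ACGTACGTACGTACG"]
print(count_in_repeat_reads(reads, "CAG"))  # 3
```

Trying all rotations matters because a read can start anywhere inside the repeat, and trying the reverse complement catches reads from the opposite strand (CTG for a CAG repeat).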


July 7, 2019

Genome-wide analysis of the invertase gene family from maize.

The recent release of the maize genome (AGPv4) contains annotation errors in invertase genes; therefore, these enzymes are best curated manually at the protein level in a comprehensible fashion. The synthesis, transport and degradation of sucrose are determining factors for biomass allocation and yield of crop plants. Invertase (INV) is a key enzyme of carbon metabolism in both source and sink tissues. Current releases of the maize genome correctly annotate only two vacuolar invertases (ivr1 and ivr2) and four cell wall invertases (incw1, incw2 (mn1), incw3, and incw4). Our comprehensive survey identified 21 INV isogenes, for which we propose a standard nomenclature grouped phylogenetically by amino acid similarity: three vacuolar (INVVR), eight cell wall (INVCW), and ten alkaline/neutral (INVAN) isogenes, which form separate dendrogram branches due to distinct molecular features. The acidic enzymes were curated for the presence of the DPN tripeptide, which is encoded by one of the smallest exons reported in plants. Particular attention was paid to the molecular role of INV in vascular tissues such as the nodes, internodes, leaf sheath, husk leaves and roots. We report the expression profile of most members of the maize INV family in nine tissues at two developmental stages, R1 and R3. INVCW7, INVVR2, INVAN8, INVAN9, INVAN10, and INVAN3 displayed the highest absolute expression in most tissues. INVVR3, INVCW5, INVCW8, and INVAN1 showed low mRNA levels. Expression of most INVs was repressed from stage R1 to R3, except for INVCW7, which increased significantly in all tissues after flowering. The mRNA levels of INVCW7 in the vegetative stem correlated with a higher transport rate of assimilates from leaves to the cob, which led to starch accumulation and growth of the female reproductive organs.


July 7, 2019

Genomics, GPCRs and new targets for the control of insect pests and vectors.

The pressing need for new pest control products with novel modes of action has spawned interest in small molecules and peptides targeting arthropod GPCRs. Genome sequence data and tools for reverse genetics have enabled the prediction and characterization of GPCRs from many invertebrates. We review recent work to identify, characterize and de-orphanize arthropod GPCRs, with a focus on studies that reveal exciting new functional roles for these receptors, including the regulation of metabolic resistance. We explore the potential for insecticides targeting Class A biogenic amine-binding and peptide-binding receptors, and consider the innovation required to generate pest-selective leads for development, within the context of new GPCR-targeting products to control arthropod vectors of disease. Copyright © 2018. Published by Elsevier Inc.


July 7, 2019

Omics in weed science: A perspective from genomics, transcriptomics, and metabolomics approaches

Modern high-throughput molecular and analytical tools offer exciting opportunities to gain a mechanistic understanding of unique traits of weeds. During the past decade, tremendous progress has been made within the weed science discipline using genomic techniques to gain deeper insights into weedy traits such as invasiveness, hybridization, and herbicide resistance. Though the adoption of newer “omics” techniques such as proteomics, metabolomics, and physionomics has been slow, applications of these omics platforms to study plants, especially agriculturally important crops and weeds, have been increasing over the years. In weed science, these platforms are now used more frequently to understand mechanisms of herbicide resistance, weed resistance evolution, and crop–weed interactions. Use of these techniques could help weed scientists further reduce the knowledge gaps in understanding weedy traits. Although these techniques can provide robust insights into the molecular functioning of plants, a single omics platform can rarely elucidate the gene-level regulation and the associated real-time expression of weedy traits, owing to the complex and overlapping nature of biological interactions. Therefore, it is desirable to integrate the different omics technologies to give a better understanding of the molecular functioning of biological systems. This multidimensional integrated approach can therefore offer new avenues for better understanding of questions of interest to weed scientists. This review offers a retrospective and prospective examination of omics platforms employed to investigate weed physiology, and of novel approaches and new technologies that can provide holistic and knowledge-based weed management strategies for the future.


July 7, 2019

Identification of woodland strawberry gene coexpression networks

What we think of as a strawberry is botanically not a berry or even a fruit, but rather multiple fruits (achenes that contain the seeds) on the outside of a swollen receptacle. This technicality aside, strawberries are both economically important and a useful system in which to study seed-fruit communication. While cultivated strawberries have a complex octoploid genome, one of their likely progenitors, the woodland strawberry (Fragaria vesca; Fig. 1), is a rapidly growing model system for the Rosaceae family due to its short generation time and capacity to be transformed. A draft of the woodland strawberry diploid genome sequence was released in 2011 (Shulaev et al., 2011), and the recent publication of a high-quality genome based on PacBio sequencing has added almost 1,500 genes to the annotation (Edger et al., 2018). Genetic and epigenetic resources have also been developed for this species (Xu et al., 2016; Hilmarsson et al., 2017).


July 7, 2019

Measuring the mappability spectrum of reference genome assemblies

The ability to infer actionable information from genomic variation data in a resequencing experiment relies on accurately aligning the sequences to a reference genome. However, this accuracy is inherently limited by the quality of the reference assembly and the repetitive content of the subject’s genome. As long-read sequencing technologies become more widespread, it is crucial to investigate the expected improvements in alignment accuracy and variant analysis over existing short-read methods. The ability to quantify the read length and error rate necessary to uniquely map regions of interest in a sequence allows users to make informed decisions regarding experiment design and provides useful metrics for comparing the magnitude of repetition across different reference assemblies. To this end we have developed NEAT-Repeat, a toolkit for exhaustively identifying the minimum read length required to uniquely map each position of a reference sequence given a specified error rate. Using these tools we computed the “mappability spectrum” for ten reference sequences, including human and a range of plants and animals, quantifying the theoretical improvements in alignment accuracy that would result from sequencing with longer reads or reads with fewer base-calling errors. Our inclusion of read length and error rate builds upon existing methods for mappability tracks based on uniqueness or aligner-specific mapping scores, and thus enables more comprehensive analysis. We apply our mappability results to whole-genome variant call data and demonstrate that variants called with low mapping and genotype quality scores are disproportionately found in reference regions that require long reads to be uniquely covered. We propose that our mappability metrics provide a valuable supplement to established variant filtering and annotation pipelines by supplying users with an additional metric related to read mapping quality.
NEAT-Repeat can process large and repetitive genomes, such as those of corn and soybean, in a tractable amount of time by leveraging efficient methods for edit distance computation as well as running multiple jobs in parallel. NEAT-Repeat is written in Python 2.7 and C++, and is available at https://github.com/zstephens/neat-repeat.
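The quantity NEAT-Repeat computes, the minimum read length needed to map a position uniquely at a given error tolerance, can be made concrete with a brute-force sketch. This is not NEAT-Repeat's method: it counts substitutions only (Hamming distance) rather than edit distance, treats a read as starting at the position of interest, and rescans the whole sequence for every candidate length, so it is only usable on toy sequences. Function names and the example sequence are illustrative.

```python
def within_mismatches(a, b, max_mismatches):
    """True if equal-length strings a and b differ at no more than max_mismatches positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def min_unique_length(ref, pos, max_mismatches=0):
    """Shortest read starting at `pos` with no other window of `ref` within max_mismatches.

    Returns None if no such length fits before the end of the sequence.
    """
    for length in range(1, len(ref) - pos + 1):
        read = ref[pos:pos + length]
        if not any(within_mismatches(ref[q:q + length], read, max_mismatches)
                   for q in range(len(ref) - length + 1) if q != pos):
            return length
    return None


def mappability_spectrum(ref, max_mismatches=0):
    """Minimum unique read length for every start position of `ref`."""
    return [min_unique_length(ref, p, max_mismatches) for p in range(len(ref))]


ref = "ACGTACGTTT"
print(mappability_spectrum(ref))  # [5, 4, 3, 2, 5, 4, 3, 3, None, None]
print(min_unique_length(ref, 0, max_mismatches=1))  # 6
```

The repeated "ACGT" forces positions 0 and 4 to need 5-base reads, and allowing one mismatch raises the requirement at position 0 to 6 bases, the read-length versus error-rate trade-off the spectrum is meant to quantify. The trailing None values are positions too close to the end for any unique read.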

