July 7, 2019

The case for not masking away repetitive DNA

In the course of analyzing whole-genome data, it is common practice to mask or filter out repetitive regions of a genome, such as transposable elements and endogenous retroviruses, in order to focus only on genes and thus simplify the results. This Commentary is a plea from one member of the Mobile DNA community to all gene-centric researchers: please do not ignore the repetitive fraction of the genome. Please stop narrowing your findings by only analyzing a minority of the genome, and instead broaden your analyses to include the rich biology of repetitive and mobile DNA. In this article, I present four arguments supporting a case for retaining repetitive DNA in your genome-wide analysis.


July 7, 2019

An improved approach for reconstructing consensus repeats from short sequence reads

Repeat elements are important components of most eukaryotic genomes. Most existing tools for repeat analysis rely either on high quality reference genomes or existing repeat libraries. Thus, it is still challenging to do repeat analysis for species with highly repetitive or complex genomes, which often do not have good reference genomes or annotated repeat libraries. Recently we developed a computational method called REPdenovo that constructs consensus repeat sequences directly from short sequence reads, which outperforms an existing tool called RepARK. One major issue with REPdenovo is that it does not perform well for repeats with relatively high divergence rates or low copy numbers. In this paper, we present an improved approach for constructing consensus repeats directly from short reads. Compared with the original REPdenovo, the improved approach uses more repeat-related k-mers and improves repeat assembly quality using a consensus-based k-mer processing method.
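The idea of a "repeat-related k-mer" can be illustrated with a minimal Python sketch. This is not REPdenovo's actual algorithm: it only counts k-mers across reads and keeps those that are unusually frequent. The function names and the `min_fold` cutoff are hypothetical, and reverse complements and sequencing errors are ignored.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def repeat_kmers(reads, k, min_fold=2.0):
    """Return k-mers whose count is at least min_fold times the mean count.

    min_fold is a hypothetical cutoff chosen only for illustration.
    """
    counts = count_kmers(reads, k)
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    return {kmer: n for kmer, n in counts.items() if n >= min_fold * mean}


if __name__ == "__main__":
    reads = ["ACGTACGTACGT", "TTACGTACGGA", "GGCATTCAGAC"]
    print(repeat_kmers(reads, k=4))
```

In a real data set the cutoff would more plausibly be tied to sequencing coverage than to the mean k-mer count, since k-mers from single-copy regions occur at roughly the coverage depth.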


July 7, 2019

Loss of RXFP2 and INSL3 genes in Afrotheria shows that testicular descent is the ancestral condition in placental mammals.

Descent of testes from a position near the kidneys into the lower abdomen or into the scrotum is an important developmental process that occurs in all placental mammals, with the exception of five afrotherian lineages. Since soft-tissue structures like testes are not preserved in the fossil record and since key parts of the placental mammal phylogeny remain controversial, it has been debated whether testicular descent is the ancestral or derived condition in placental mammals. To resolve this debate, we used genomic data of 71 mammalian species and analyzed the evolution of two key genes (relaxin/insulin-like family peptide receptor 2 [RXFP2] and insulin-like 3 [INSL3]) that induce the development of the gubernaculum, the ligament that is crucial for testicular descent. We show that both RXFP2 and INSL3 are lost or nonfunctional exclusively in four afrotherians (tenrec, cape elephant shrew, cape golden mole, and manatee) that completely lack testicular descent. The presence of remnants of once functional orthologs of both genes in these afrotherian species shows that these gene losses happened after the split from the placental mammal ancestor. These “molecular vestiges” provide strong evidence that testicular descent is the ancestral condition, irrespective of persisting phylogenetic discrepancies. Furthermore, the absence of shared gene-inactivating mutations and our estimates that the loss of RXFP2 happened at different time points strongly suggest that testicular descent was lost independently in Afrotheria. Our results provide a molecular mechanism that explains the loss of testicular descent in afrotherians and, more generally, highlight how molecular vestiges can provide insights into the evolution of soft-tissue characters.


July 7, 2019

Genomes and transcriptomes of duckweeds.

Duckweeds (Lemnaceae family) are the smallest flowering plants and are adapted to the aquatic environment. They are regarded as a promising sustainable feedstock with the characteristics of high starch storage, fast propagation, and global distribution. The duckweed genome size varies 13-fold, ranging from 150 Mb in Spirodela polyrhiza to 1,881 Mb in Wolffia arrhiza. With the development of sequencing technology and bioinformatics, five duckweed genomes from the Spirodela and Lemna genera have been sequenced and assembled. The genome annotations show that they share similar protein orthologs, whereas the repeat contents could mainly explain the genome size difference. The gene families responsible for cell growth and expansion, lignin biosynthesis, and flowering are greatly contracted. However, the gene family of glutamate synthase has experienced expansion, indicating its significance in ammonia assimilation and nitrogen transport. The transcriptome has been comprehensively sequenced for the genera Spirodela, Landoltia, and Lemna, including various treatments such as abscisic acid, radiation, heavy metal, and starvation. The analysis of the underlying molecular mechanism and the regulatory network would accelerate their applications in the fields of bioenergy and phytoremediation. Comparative genomics has shown that duckweed genomes contain relatively low gene numbers and more contracted gene families, which may be in parallel with their highly reduced morphology with a simple leaf and primary roots. Still, we are waiting for the advancement of long-read sequencing technology to resolve the complex genomes and transcriptomes of the unsequenced Wolffiella and Wolffia, due to the large genome sizes and the similarity in their polyploidy.


July 7, 2019

The regenerative flatworm Macrostomum lignano, a model organism with high experimental potential.

Understanding the process of regeneration has been one of the longstanding scientific aims, from a fundamental biological perspective, as well as within the applied context of regenerative medicine. Because regeneration competence varies greatly between organisms, it is essential to investigate different experimental animals. The free-living marine flatworm Macrostomum lignano is a rising model organism for this type of research, and its power stems from a unique set of biological properties combined with amenability to experimental manipulation. The biological properties of interest include production of single-cell fertilized eggs, a transparent body, small size, short generation time, ease of culture, the presence of a pluripotent stem cell population, and a large regeneration competence. These features sparked the development of molecular tools and resources for this animal, including high-quality genome and transcriptome assemblies, gene knockdown, in situ hybridization, and transgenesis. Importantly, M. lignano is currently the only flatworm species for which transgenesis methods are established. This review summarizes biological features of M. lignano and recent technological advances towards experimentation with this animal. In addition, we discuss the experimental potential of this model organism for different research questions related to regeneration and stem cell biology.


July 7, 2019

Genomics, GPCRs and new targets for the control of insect pests and vectors.

The pressing need for new pest control products with novel modes of action has spawned interest in small molecules and peptides targeting arthropod GPCRs. Genome sequence data and tools for reverse genetics have enabled the prediction and characterization of GPCRs from many invertebrates. We review recent work to identify, characterize and de-orphanize arthropod GPCRs, with a focus on studies that reveal exciting new functional roles for these receptors, including the regulation of metabolic resistance. We explore the potential for insecticides targeting Class A biogenic amine-binding and peptide-binding receptors, and consider the innovation required to generate pest-selective leads for development, within the context of new GPCR-targeting products to control arthropod vectors of disease. Copyright © 2018. Published by Elsevier Inc.


July 7, 2019

Measuring the mappability spectrum of reference genome assemblies

The ability to infer actionable information from genomic variation data in a resequencing experiment relies on accurately aligning the sequences to a reference genome. However, this accuracy is inherently limited by the quality of the reference assembly and the repetitive content of the subject’s genome. As long read sequencing technologies become more widespread, it is crucial to investigate the expected improvements in alignment accuracy and variant analysis over existing short read methods. The ability to quantify the read length and error rate necessary to uniquely map regions of interest in a sequence allows users to make informed decisions regarding experiment design and provides useful metrics for comparing the magnitude of repetition across different reference assemblies. To this end we have developed NEAT-Repeat, a toolkit for exhaustively identifying the minimum read length required to uniquely map each position of a reference sequence given a specified error rate. Using these tools we computed the “mappability spectrum” for ten reference sequences, including human and a range of plants and animals, quantifying the theoretical improvements in alignment accuracy that would result from sequencing with longer reads or reads with fewer base-calling errors. Our inclusion of read length and error rate builds upon existing methods for mappability tracks based on uniqueness or aligner-specific mapping scores, and thus enables more comprehensive analysis. We apply our mappability results to whole-genome variant call data, and demonstrate that variants called with low mapping and genotype quality scores are disproportionately found in reference regions that require long reads to be uniquely covered. We propose that our mappability metrics provide a valuable supplement to established variant filtering and annotation pipelines by supplying users with an additional metric related to read mapping quality.
NEAT-Repeat can process large and repetitive genomes, such as those of corn and soybean, in a tractable amount of time by leveraging efficient methods for edit distance computation as well as running multiple jobs in parallel. NEAT-Repeat is written in Python 2.7 and C++, and is available at https://github.com/zstephens/neat-repeat.
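To make the notion of "minimum read length required to uniquely map each position" concrete, here is a small brute-force Python sketch. It is not NEAT-Repeat's implementation: it assumes an error rate of zero (exact matching only), considers only the forward strand, and is suitable only for toy sequences. The function names are hypothetical.

```python
def count_occurrences(seq, sub):
    """Count occurrences of sub in seq, including overlapping ones."""
    count = 0
    start = seq.find(sub)
    while start != -1:
        count += 1
        start = seq.find(sub, start + 1)
    return count


def min_unique_lengths(seq):
    """For each start position, find the shortest read that occurs exactly once.

    Returns a list with one entry per position. An entry is None when no
    read starting there becomes unique before the sequence ends.
    """
    n = len(seq)
    result = []
    for i in range(n):
        best = None
        for length in range(1, n - i + 1):
            if count_occurrences(seq, seq[i:i + length]) == 1:
                best = length
                break
        result.append(best)
    return result


if __name__ == "__main__":
    # The repeated "ACG" forces longer reads at the positions it covers.
    print(min_unique_lengths("ACGACGT"))
```

Positions inside a repeat need reads long enough to reach past the repeat into unique flanking sequence, which is why the required length grows with repeat size. Allowing mismatches, as the toolkit described above does through edit distance, would increase these lengths further.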


July 7, 2019

An investigation of Y chromosome incorporations in 400 species of Drosophila and related genera.

Y chromosomes are widely believed to evolve from a normal autosome through a process of massive gene loss (with preservation of some male genes), shaped by sex-antagonistic selection and complemented by occasional gains of male-related genes. The net result of these processes is a male-specialized chromosome. This might be expected to be an irreversible process, but it was found in 2005 that the Drosophila pseudoobscura Y chromosome was incorporated into an autosome. Y chromosome incorporations have important consequences: a formerly male-restricted chromosome reverts to autosomal inheritance, and the species may shift from an XY/XX to X0/XX sex-chromosome system. In order to assess the frequency and causes of this phenomenon we searched for Y chromosome incorporations in 400 species from Drosophila and related genera. We found one additional large scale event of Y chromosome incorporation, affecting the whole montium subgroup (40 species in our sample); overall 13% of the sampled species (52/400) have Y incorporations. While previous data indicated that after the Y incorporation the ancestral Y disappeared as a free chromosome, the much larger data set analyzed here indicates that a copy of the Y survived as a free chromosome both in montium and pseudoobscura species, and that the current Y of the pseudoobscura lineage results from a fusion between this free Y and the neoY. The 400 species sample also showed that the previously suggested causal connection between X-autosome fusions and Y incorporations is, at best, weak: the new case of Y incorporation (montium) does not have X-autosome fusion, whereas nine independent cases of X-autosome fusions were not followed by Y incorporations. Y incorporation is an underappreciated mechanism affecting Y chromosome evolution; our results show that at least in Drosophila it plays a relevant role and highlight the need of similar studies in other groups.


July 7, 2019

Overview of the germline and expressed repertoires of the TRB genes in Sus scrofa.

The α/β T cell receptor (TR) is a complex heterodimer that recognizes antigenic peptides and binds to major histocompatibility complex (MH) molecules. Both α and β chains are encoded by different genes localized on two distinct chromosomal loci: TRA and TRB. The present study employed the recent release of the swine genome assembly to define the genomic organization of the TRB locus. According to the sequencing data, the pig TRB locus spans approximately 400 kb of genomic DNA and consists of 38 TRBV genes belonging to 24 subgroups located upstream of three in tandem TRBD-J-C clusters, which are followed by a TRBV gene in an inverted transcriptional orientation. Comparative analysis confirms that the general organization of the TRB locus is similar among mammalian species, but the number of germline TRBV genes varies greatly even between species belonging to the same order, determining the diversity and specificity of the immune response. However, sequence analysis of the TRB locus also suggests the presence of blocks of conserved homology in the genomic region across mammals. Furthermore, by analysing a public cDNA collection, we identified the usage pattern of the TRBV, TRBD, and TRBJ genes in the adult pig TRB repertoire, and we noted that the expressed TRBV repertoire seems to be broader and more diverse than the germline repertoire, in line with the presence of a high level of TRBV gene polymorphisms. Because the nucleotide differences seem to be principally concentrated in the CDR2 region, it is reasonable to presume that most T cell β-chain diversity can be related to polymorphisms in pig MH molecules. Domestic pigs represent a valuable animal model as they are even more anatomically, genetically and physiologically similar to humans than are mice.
Therefore, present knowledge on the genomic organization of the pig TRB locus allows the collection of increased information on the basic aspects of the porcine immune system and contributes to filling the gaps left by rodent models.


July 7, 2019

Signatures of selection and environmental adaptation across the goat genome post-domestication.

Since goat was domesticated 10,000 years ago, many factors have contributed to the differentiation of goat breeds and these are classified mainly into two types: (i) adaptation to different breeding systems and/or purposes and (ii) adaptation to different environments. As a result, approximately 600 goat breeds have developed worldwide; they differ considerably from one another in terms of phenotypic characteristics and are adapted to a wide range of climatic conditions. In this work, we analyzed the AdaptMap goat dataset, which is composed of data from more than 3000 animals collected worldwide and genotyped with the CaprineSNP50 BeadChip. These animals were partitioned into groups based on geographical area, production uses, available records on solid coat color and environmental variables including the sampling geographical coordinates, to investigate the role of natural and/or artificial selection in shaping the genome of goat breeds. Several signatures of selection on different chromosomal regions were detected across the different breeds, sub-geographical clusters, phenotypic and climatic groups. These regions contain genes that are involved in important biological processes, such as milk-, meat- or fiber-related production, coat color, glucose pathway, oxidative stress response, size, and circadian clock differences. Our results confirm previous findings in other species on adaptation to extreme environments and human purposes and provide new genes that could explain some of the differences between goat breeds according to their geographical distribution and adaptation to different environments. These analyses of signatures of selection provide a comprehensive first picture of the global domestication process and adaptation of goat breeds and highlight possible genes that may have contributed to the differentiation of this species worldwide.


July 7, 2019

Synaptogyrin-2 influences replication of Porcine circovirus 2.

Porcine circovirus 2 (PCV2) is a circular single-stranded DNA virus responsible for a group of diseases collectively known as PCV2 Associated Diseases (PCVAD). Variation in the incidence and severity of PCVAD exists between pigs suggesting a host genetic component involved in pathogenesis. A large-scale genome-wide association study of experimentally infected pigs (n = 974), provided evidence of a host genetic role in PCV2 viremia, immune response and growth during challenge. Host genotype explained 64% of the phenotypic variation for overall viral load, with two major Quantitative Trait Loci (QTL) identified on chromosome 7 (SSC7) near the swine leukocyte antigen complex class II locus and on the proximal end of chromosome 12 (SSC12). The SNP having the strongest association, ALGA0110477 (SSC12), explained 9.3% of the genetic and 6.2% of the phenotypic variance for viral load. Dissection of the SSC12 QTL based on gene annotation, genomic and RNA-sequencing, suggested that a missense mutation in the SYNGR2 (SYNGR2 p.Arg63Cys) gene is potentially responsible for the variation in viremia. This polymorphism, located within a protein domain conserved across mammals, results in an amino acid variant SYNGR2 p.63Cys only observed in swine. PCV2 titer in PK15 cells decreased when the expression of SYNGR2 was silenced by specific-siRNA, indicating a role of SYNGR2 in viral replication. Additionally, a PK15 edited clone generated by CRISPR-Cas9, carrying a partial deletion of the second exon that harbors a key domain and the SYNGR2 p.Arg63Cys, was associated with a lower viral titer compared to wildtype PK15 cells (>24 hpi) and supernatant (>48hpi)(P < 0.05). Identification of a non-conservative substitution in this key domain of SYNGR2 suggests that the SYNGR2 p.Arg63Cys variant may underlie the observed genetic effect on viral load.


July 7, 2019

iMGEins: detecting novel mobile genetic elements inserted in individual genomes.

Recent advances in sequencing technology have allowed us to investigate personal genomes to find structural variations, which have been studied extensively to identify their association with the physiology of diseases such as cancer. In particular, mobile genetic elements (MGEs) are one of the major constituents of the human genomes, and cause genome instability by insertion, mutation, and rearrangement. We have developed a new program, iMGEins, to identify such novel MGEs by using sequencing reads of individual genomes, and to explore the breakpoints with the supporting reads and MGEs detected. iMGEins is the first MGE detection program that integrates three algorithmic components: discordant read-pair mapping, split-read mapping, and insertion sequence assembly. Our evaluation results showed its outstanding performance in detecting novel MGEs from simulated genomes, as well as real personal genomes. In detail, the average recall and precision rates of iMGEins are 96.67 and 100%, respectively, which are the highest among the programs compared. In the testing with real human genomes of the NA12878 sample, iMGEins shows the highest accuracy in detecting MGEs within 20 bp proximity of the breakpoints annotated. In order to study the dynamics of MGEs in individual genomes, iMGEins was developed to accurately detect breakpoints and report inserted MGEs. Compared with other programs, iMGEins has valuable features of identifying novel MGEs and assembling the MGEs inserted.
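Recall and precision "within 20 bp proximity" of annotated breakpoints can be computed along the following lines. This Python sketch is an assumption about how such an evaluation might be set up, not the procedure used for iMGEins: it treats breakpoints as integer positions on a single chromosome and pairs each prediction with at most one annotated breakpoint using a greedy sweep. The function names are hypothetical.

```python
def match_breakpoints(predicted, truth, tolerance=20):
    """Count one-to-one pairs of predicted and true positions within tolerance bp."""
    predicted = sorted(predicted)
    truth = sorted(truth)
    i = j = matched = 0
    while i < len(predicted) and j < len(truth):
        diff = predicted[i] - truth[j]
        if abs(diff) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            # Prediction lies too far left of the current true breakpoint.
            i += 1
        else:
            # True breakpoint lies too far left of the current prediction.
            j += 1
    return matched


def precision_recall(predicted, truth, tolerance=20):
    """Return (precision, recall) for predicted breakpoints against truth."""
    matched = match_breakpoints(predicted, truth, tolerance)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    print(precision_recall([100, 510, 900], [105, 500, 700, 1200]))
```

Precision is the fraction of predictions that match an annotated breakpoint, and recall is the fraction of annotated breakpoints that were found. A genome-wide evaluation would apply the same matching separately per chromosome.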


July 7, 2019

Bridging gaps in transposable element research with single-molecule and single-cell technologies

More than half of the genomic landscape in humans and many other organisms is composed of repetitive DNA, which mostly derives from transposable elements (TEs) and viruses. Recent technological advances permit improved assessment of the repetitive content across genomes and newly developed molecular assays have revealed important roles of TEs and viruses in host genome evolution and organization. To update on our current understanding of TE biology and to promote new interdisciplinary strategies for the TE research community, leading experts gathered for the 2nd Uppsala Transposon Symposium on October 4–5, 2018 in Uppsala, Sweden. Using cutting-edge single-molecule and single-cell approaches, research on TEs and other repeats has entered a new era in biological and biomedical research.


July 7, 2019

De novo genome assembly of the olive fruit fly (Bactrocera oleae) developed through a combination of linked-reads and long-read technologies

Long-read sequencing has greatly contributed to the generation of high quality assemblies, albeit at a high cost. It is also not always clear how to combine sequencing platforms. We sequenced the genome of the olive fruit fly (Bactrocera oleae), the most important pest in the olive fruits agribusiness industry, using Illumina short-reads, mate-pairs, 10x Genomics linked-reads, Pacific Biosciences (PacBio), and Oxford Nanopore Technologies (ONT). The 10x linked-reads assembly gave the most contiguous assembly with an N50 of 2.16 Mb. Scaffolding the linked-reads assembly using long-reads from ONT gave a more contiguous assembly with scaffold N50 of 4.59 Mb. We also present the most extensive transcriptome datasets of the olive fly derived from different tissues and stages of development. Finally, we used the Chromosome Quotient method to identify Y-chromosome scaffolds and show that the long-reads based assembly generates very highly contiguous Y-chromosome assembly.
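For readers unfamiliar with the N50 statistic quoted above, it is the length of the shortest contig or scaffold in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal Python sketch follows; the function name is a hypothetical choice.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest, and the N50 is the
    length at which the running total first reaches half of the assembly.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    # Toy scaffold lengths, not data from the assembly described above.
    print(n50([80, 70, 50, 40, 30, 20]))
```

A higher N50 means that more of the assembly sits in long sequences, which is why it is commonly reported as a contiguity measure.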


January 23, 2017

Tutorial: HGAP4 de novo assembly application

This tutorial provides an overview of the Hierarchical Genome Assembly Process (HGAP4) de novo assembly analysis application. HGAP4 generates accurate de novo assemblies using only PacBio data. HGAP4 is suitable…

