October 23, 2019

A high quality assembly of the Nile Tilapia (Oreochromis niloticus) genome reveals the structure of two sex determination regions.

Tilapias are the second most farmed fishes in the world and a sustainable source of food. Like many other fish, tilapias are sexually dimorphic and sex is a commercially important trait in these fish. In this study, we developed a significantly improved assembly of the tilapia genome using the latest genome sequencing methods and show how it improves the characterization of two sex determination regions in two tilapia species. A homozygous clonal XX female Nile tilapia (Oreochromis niloticus) was sequenced to 44X coverage using Pacific Biosciences (PacBio) SMRT sequencing. Dozens of candidate de novo assemblies were generated and an optimal assembly (contig NG50 of 3.3 Mbp) was selected using principal component analysis of likelihood scores calculated from several paired-end sequencing libraries. Comparison of the new assembly to the previous O. niloticus genome assembly reveals that recently duplicated portions of the genome are now well represented. The overall number of genes in the new assembly increased by 27.3%, including a 67% increase in pseudogenes. The new tilapia genome assembly correctly represents two recent vasa gene duplication events that have been verified with BAC sequencing. A total of 146 Mbp of additional transposable element sequence is now assembled, a large proportion of which are recent insertions. Large centromeric satellite repeats are assembled and annotated in cichlid fish for the first time. Finally, the new assembly identifies the long-range structure of both a ~9 Mbp XY sex determination region on LG1 in O. niloticus, and a ~50 Mbp WZ sex determination region on LG3 in the related species O. aureus. This study highlights the use of long read sequencing to correctly assemble recent duplications and to characterize repeat-filled regions of the genome. The study serves as an example of the need for high quality genome assemblies and provides a framework for identifying sex determining genes in tilapia and related fish species.
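
The contig NG50 quoted in the abstract above is a standard contiguity statistic: the contig length at which the contigs, taken from longest to shortest, first cover half of the estimated genome size. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `ng50` and the toy lengths are assumptions made for illustration.

```python
def ng50(contig_lengths, genome_size):
    """Return the contig NG50 for an assembly.

    NG50 is the length L such that contigs of length >= L together
    cover at least half of the *estimated genome size* (N50 uses the
    total assembly length instead).
    """
    half = genome_size / 2
    running = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    # The assembly covers less than half of the genome estimate.
    return 0


# Hypothetical toy assembly: five contigs, estimated genome size of 20 units.
print(ng50([5, 4, 3, 2, 1], genome_size=20))
```

Using the estimated genome size rather than the assembly length keeps the statistic comparable across candidate assemblies of different total size.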


October 23, 2019

Endogenous sequence patterns predispose the repair modes of CRISPR/Cas9-induced DNA double-stranded breaks in Arabidopsis thaliana.

The ability to predict the outcome of targeted DNA double-stranded break (DSB) repair would be desirable for genome editing. Furthermore, the consequences of mis-repair of potentially cell-lethal DSBs and the underlying pathways are not yet fully understood. Here we study the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9-induced mutation spectra at three selected endogenous loci in Arabidopsis thaliana by deep sequencing of long amplicon libraries. Notably, we found sequence-dependent genomic features that affected the DNA repair outcome. Deletions of 1 bp to <1000 bp and/or very short insertions, deletions >1 kbp (all due to NHEJ) and deletions combined with insertions of 5 bp to >100 bp [caused by a synthesis-dependent strand annealing (SDSA)-like mechanism] occurred most frequently at all three loci. The appearance of single-stranded annealing events depends on the presence of, and distance between, repeats flanking the DSB. The frequency and size of insertions are increased if a sequence with high similarity to the target site is available in cis. Most deletions were linked to pre-existing microhomology. Deletion and/or insertion mutations were either blunt-end ligated or joined via de novo generated microhomology. While most mutation types and, to some degree, their predictability are comparable with animal systems, the broad range of deletion mutations seems to be a peculiar feature of the plant A. thaliana. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.


September 22, 2019

PHASIS: A computational suite for de novo discovery and characterization of phased, siRNA-generating loci and their miRNA triggers

Phased, secondary siRNAs (phasiRNAs) are found widely in plants, from protein-coding transcripts and long, non-coding RNAs; animal piRNAs are also phased. Integrated methods characterizing “PHAS” loci are unavailable, and existing methods are quite limited and inefficient in handling large volumes of sequencing data. The PHASIS suite described here provides complete tools for the computational characterization of PHAS loci, with an emphasis on plants, in which these loci are numerous. Benchmarked comparisons demonstrate that PHASIS is sensitive, highly scalable and fast. Importantly, PHASIS eliminates the requirement of a sequenced genome and PARE/degradome data for discovery of phasiRNAs and their miRNA triggers.


September 22, 2019

Assessing the gene content of the megagenome: sugar pine (Pinus lambertiana).

Sugar pine (Pinus lambertiana Douglas) is within the subgenus Strobus with an estimated genome size of 31 Gbp. Transcriptomic resources are of particular interest in conifers due to the challenges presented in their megagenomes for gene identification. In this study, we present the first comprehensive survey of the P. lambertiana transcriptome through deep sequencing of a variety of tissue types to generate more than 2.5 billion short reads. Third-generation long reads generated through PacBio Iso-Seq have been included for the first time in conifers to combat the challenges associated with de novo transcriptome assembly. A technology comparison is provided here to contribute to the otherwise scarce comparisons of 2nd and 3rd generation transcriptome sequencing approaches in plant species. In addition, the transcriptome reference was essential for gene model identification and quality assessment in the parallel project responsible for sequencing and assembly of the entire genome. In this study, the transcriptomic data were also used to address some of the questions surrounding lineage-specific Dicer-like proteins in conifers. These proteins play a role in the control of transposable element proliferation and the related genome expansion in conifers. Copyright © 2016 Author et al.


September 22, 2019

Genome re-annotation of the wild strawberry Fragaria vesca using extensive Illumina- and SMRT-based RNA-seq datasets

The genome of the wild diploid strawberry species Fragaria vesca, an ideal model system of cultivated strawberry (Fragaria × ananassa, octoploid) and other Rosaceae family crops, was first published in 2011 and followed by a new assembly (Fvb). However, the annotation for Fvb mainly relied on ab initio predictions and included only predicted coding sequences; therefore, an improved annotation is highly desirable. Here, a new annotation version named v2.0.a2 was created for the Fvb genome by a pipeline utilizing one PacBio library, 90 Illumina RNA-seq libraries, and 9 small RNA-seq libraries. Altogether, 18,641 genes (55.6% out of 33,538 genes) were augmented with information on the 5′ and/or 3′ UTRs, 13,168 (39.3%) protein-coding genes were modified or newly identified, and 7,370 genes were found to possess alternative isoforms. In addition, 1,938 long non-coding RNAs, 171 miRNAs, and 51,714 small RNA clusters were integrated into the annotation. This new annotation of F. vesca is substantially improved in both accuracy and integrity of gene predictions, which benefits gene functional studies in strawberry and comparative genomic analysis of other horticultural crops in the Rosaceae family.


September 22, 2019

Plant 24-nt reproductive phasiRNAs from intramolecular duplex mRNAs in diverse monocots.

In grasses, two pathways that generate diverse and numerous 21-nt (premeiotic) and 24-nt (meiotic) phased siRNAs are highly enriched in anthers, the male reproductive organs. These “phasiRNAs” are analogous to mammalian piRNAs, yet their functions and evolutionary origins remain largely unknown. The 24-nt meiotic phasiRNAs have only been described in grasses, wherein their biogenesis is dependent on a specialized Dicer (DCL5). To assess how evolution gave rise to this pathway, we examined reproductive phasiRNA pathways in nongrass monocots: garden asparagus, daylily, and lily. The common ancestors of these species diverged approximately 115-117 million years ago (MYA). We found that premeiotic 21-nt and meiotic 24-nt phasiRNAs were abundant in all three species and displayed spatial localization and temporal dynamics similar to grasses. The miR2275-triggered pathway was also present, yielding 24-nt reproductive phasiRNAs, and thus originated more than 117 MYA. In asparagus, unlike in grasses, these siRNAs are largely derived from inverted repeats (IRs); analyses in lily identified thousands of precursor loci, and many were also predicted to form foldback substrates for Dicer processing. Additionally, reproductive phasiRNAs were present in female reproductive organs and thus may function in both male and female germinal development. These data describe several distinct mechanisms of production for 24-nt meiotic phasiRNAs and provide new insights into the evolution of reproductive phasiRNA pathways in monocots. © 2018 Kakrana et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

The expressed portion of the barley genome

In this chapter, we refer to the expressed portion of the barley genome as the relatively small fraction of the total cellular DNA that either contains the genes that ultimately produce proteins, or that directly/indirectly controls the level, location and/or timing of when these genes are expressed and proteins are produced. We start by describing the dynamics of tissue and time-dependent gene expression and how common patterns across multiple samples can provide clues about gene networks involved in common biological processes. We then describe some of the complexities of how a single mRNA template can be differentially processed by alternative splicing to generate multiple different proteins or provide a mechanism to regulate the amount of functional gene product in a cell at a given point in time. We extend our analysis, using a number of biological examples, to address how diverse families of small non-coding microRNAs specifically regulate gene expression, and complete our appraisal by looking at the physical/molecular environment around genes that can result in either the promotion or repression of gene expression. We conclude by assessing some of the issues that remain around our ability to fully exploit the depth and power of current approaches for analysing gene expression and propose improvements that could be made using new but available sequencing and bioinformatics technologies.


September 22, 2019

PacBio sequencing and its applications.

Single-molecule, real-time sequencing developed by Pacific Biosciences offers longer read lengths than the second-generation sequencing (SGS) technologies, making it well-suited for unsolved problems in genome, transcriptome, and epigenetics research. The highly-contiguous de novo assemblies using PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discoveries of novel genes and novel isoforms of annotated genes, due to its ability to sequence full-length transcripts or fragments with significant lengths. Additionally, PacBio’s sequencing technique provides information that is useful for the direct detection of base modifications, such as methylation. In addition to using PacBio sequencing alone, many hybrid sequencing strategies have been developed to make use of more accurate short reads in conjunction with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable, especially for small laboratories, than using PacBio sequencing alone. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.


September 22, 2019

Extreme haplotype variation in the desiccation-tolerant clubmoss Selaginella lepidophylla.

Plant genome size varies by four orders of magnitude, and most of this variation stems from dynamic changes in repetitive DNA content. Here we report the small 109 Mb genome of Selaginella lepidophylla, a clubmoss with extreme desiccation tolerance. Single-molecule sequencing enables accurate haplotype assembly of a single heterozygous S. lepidophylla plant, revealing extensive structural variation. We observe numerous haplotype-specific deletions consisting of largely repetitive and heavily methylated sequences, with enrichment in young Gypsy LTR retrotransposons. Such elements are active but rapidly deleted, suggesting “bloat and purge” to maintain a small genome size. Unlike all other land plant lineages, Selaginella has no evidence of a whole-genome duplication event in its evolutionary history, but instead shows unique tandem gene duplication patterns reflecting adaptation to extreme drying. Gene expression changes during desiccation in S. lepidophylla mirror patterns observed across angiosperm resurrection plants.


September 22, 2019

The genome of the Hi5 germ cell line from Trichoplusia ni, an agricultural pest and novel model for small RNA biology.

We report a draft assembly of the genome of Hi5 cells from the lepidopteran insect pest, Trichoplusia ni, assigning 90.6% of bases to one of 28 chromosomes and predicting 14,037 protein-coding genes. Chemoreception and detoxification gene families reveal T. ni-specific gene expansions that may explain its widespread distribution and rapid adaptation to insecticides. Transcriptome and small RNA data from thorax, ovary, testis, and the germline-derived Hi5 cell line show distinct expression profiles for 295 microRNA- and >393 piRNA-producing loci, as well as 39 genes encoding small RNA pathway proteins. Nearly all of the W chromosome is devoted to piRNA production, and T. ni siRNAs are not 2´-O-methylated. To enable use of Hi5 cells as a model system, we have established genome editing and single-cell cloning protocols. The T. ni genome provides insights into pest control and allows Hi5 cells to become a new tool for studying small RNAs ex vivo. © 2018, Fu et al.


September 22, 2019

Redkmer: An Assembly-Free Pipeline for the Identification of Abundant and Specific X-Chromosome Target Sequences for X-Shredding by CRISPR Endonucleases.

CRISPR-based synthetic sex ratio distorters, which operate by shredding the X-chromosome during male meiosis, are promising tools for the area-wide control of harmful insect pest or disease vector species. X-shredders have been proposed as tools to suppress insect populations by biasing the sex ratio of the wild population toward males, thus reducing its natural reproductive potential. However, to build synthetic X-shredders based on CRISPR, the selection of gRNA targets, in the form of high-copy sequence repeats on the X chromosome of a given species, is difficult, since such repeats are not accurately resolved in genome assemblies and cannot be assigned to chromosomes with confidence. We have therefore developed the redkmer computational pipeline, designed to identify short and highly abundant sequence elements occurring uniquely on the X chromosome. Redkmer was designed to use as input minimally processed whole genome sequence data from males and females. We tested redkmer with short- and long-read whole genome sequence data of Anopheles gambiae, the major vector of human malaria, in which the X-shredding paradigm was originally developed. Redkmer established long reads as chromosomal proxies with excellent correlation to the genome assembly and used them to rank X-candidate kmers for their level of X-specificity and abundance. Among these, a high-confidence set of 25-mers was identified, many belonging to previously known X-chromosome repeats of Anopheles gambiae, including the ribosomal gene array and the selfish elements harbored within it. Data from a control strain, in which these repeats are shared with the Y chromosome, confirmed the elimination of these kmers during filtering. Finally, we show that redkmer output can be linked directly to gRNA selection and off-target prediction. 
In addition, the output of redkmer, including the prediction of chromosomal origin of single-molecule long reads and chromosome specific kmers, could also be used for the characterization of other biologically relevant sex chromosome sequences, a task that is frequently hampered by the repetitiveness of sex chromosome sequence content.


September 22, 2019

Cytogenomic analysis of several repetitive DNA elements in turbot (Scophthalmus maximus).

Repetitive DNA plays a fundamental role in the organization, size and evolution of eukaryotic genomes. The sequencing of the turbot revealed a small and compact genome, as in all flatfish studied to date. The assembly of repetitive regions is still incomplete because it is difficult to correctly identify their position, number and array. The combination of classical cytogenetic techniques along with high quality sequencing is essential to increase the knowledge of the structure and composition of these sequences and, thus, of the structure and function of the whole genome. In this work, the in silico analysis of H1 histone, 5S rDNA, telomeric and Rex repetitive sequences, was compared to their chromosomal mapping by fluorescent in situ hybridization (FISH), providing a more comprehensive picture of these elements in the turbot genome. FISH assays confirmed the location of H1 in LG8; 5S rDNA in LG4 and LG6; telomeric sequences at the end of all chromosomes whereas Rex elements were dispersed along most chromosomes. The discrepancies found between both approaches could be related to the sequencing methodology applied in this species and also to the resolution limitations of the FISH technique. Turbot cytogenomic analyses have proven to add new chromosomal landmarks in the karyotype of this species, representing a powerful tool to investigate targeted genomic sequences or regions in the genetic and physical maps of this species. Copyright © 2017 Elsevier B.V. All rights reserved.


September 22, 2019

LTR_retriever: A highly accurate and sensitive program for identification of long terminal repeat retrotransposons.

Long terminal repeat retrotransposons (LTR-RTs) are prevalent in plant genomes. The identification of LTR-RTs is critical for achieving high-quality gene annotation. Based on the well-conserved structure, multiple programs were developed for the de novo identification of LTR-RTs; however, these programs are associated with low specificity and high false discovery rates. Here, we report LTR_retriever, a multithreading-empowered Perl program that identifies LTR-RTs and generates high-quality LTR libraries from genomic sequences. LTR_retriever demonstrated significant improvements by achieving high levels of sensitivity (91%), specificity (97%), accuracy (96%), and precision (90%) in rice (Oryza sativa). LTR_retriever is also compatible with long sequencing reads. With 40k self-corrected PacBio reads equivalent to 4.5× genome coverage in Arabidopsis (Arabidopsis thaliana), the constructed LTR library showed excellent sensitivity and specificity. In addition to canonical LTR-RTs with 5′-TG…CA-3′ termini, LTR_retriever also identifies noncanonical LTR-RTs (non-TGCA), which have been largely ignored in genome-wide studies. We identified seven types of noncanonical LTRs from 42 out of 50 plant genomes. The majority of noncanonical LTRs are Copia elements, in which the LTR is four times shorter than that of other Copia elements, which may be a result of their target specificity. Strikingly, non-TGCA Copia elements are often located in genic regions and preferentially insert nearby or within genes, indicating their impact on the evolution of genes and their potential as mutagenesis tools. © 2018 American Society of Plant Biologists. All Rights Reserved.
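
The four benchmark metrics reported in the abstract above (sensitivity, specificity, accuracy, precision) follow the usual confusion-matrix definitions. The sketch below shows those textbook formulas only; how true and false positives are counted in the paper (for example per base or per element) is not stated in the abstract, and the counts used here are hypothetical, not the paper's data.

```python
def benchmark_metrics(tp, fp, tn, fn):
    """Compute standard classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives (units are up to the benchmark, e.g. bases or elements).
    """
    return {
        "sensitivity": tp / (tp + fn),               # share of real positives recovered
        "specificity": tn / (tn + fp),               # share of real negatives rejected
        "accuracy": (tp + tn) / (tp + fp + tn + fn), # share of all calls that are right
        "precision": tp / (tp + fp),                 # share of positive calls that are real
    }


# Hypothetical counts for illustration only.
metrics = benchmark_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The false discovery rate mentioned in the abstract is the complement of precision, that is, 1 - precision.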


September 22, 2019

Sequence analysis of European maize inbred line F2 provides new insights into molecular and chromosomal characteristics of presence/absence variants.

Maize is well known for its exceptional structural diversity, including copy number variants (CNVs) and presence/absence variants (PAVs), and there is growing evidence for the role of structural variation in maize adaptation. While PAVs have been described in this important crop species, they have been only scarcely characterized at the sequence level and the extent of presence/absence variation and relative chromosomal landscape of inbred-specific regions remain to be elucidated. De novo genome sequencing of the French F2 maize inbred line revealed 10,044 novel genomic regions larger than 1 kb, making up 88 Mb of DNA, that are present in F2 but not in B73 (PAV). This set of maize PAV sequences allowed us to annotate PAV content and to analyze sequence breakpoints. Using PAV genotyping on a collection of 25 temperate lines, we also analyzed linkage disequilibrium in PAVs and flanking regions, and PAV frequencies within maize genetic groups. We highlight the possible role of MMEJ-type double strand break repair in maize PAV formation and discover 395 new genes with transcriptional support. The pattern of linkage disequilibrium within PAVs strikingly differs from that of flanking regions and is in accordance with the intuition that PAVs may recombine less than other genomic regions. We show that most PAVs are ancient, while some are found only in European Flint material, thus pinpointing structural features that may be at the origin of adaptive traits involved in the success of this material. Characterization of such PAVs will provide useful material for further association genetic studies in European and temperate maize.


September 22, 2019

Comparative heterochromatin profiling reveals conserved and unique epigenome signatures linked to adaptation and development of malaria parasites.

Heterochromatin-dependent gene silencing is central to the adaptation and survival of Plasmodium falciparum malaria parasites, allowing clonally variant gene expression during blood infection in humans. By assessing genome-wide heterochromatin protein 1 (HP1) occupancy, we present a comprehensive analysis of heterochromatin landscapes across different Plasmodium species, strains, and life cycle stages. Common targets of epigenetic silencing include fast-evolving multi-gene families encoding surface antigens and a small set of conserved HP1-associated genes with regulatory potential. Many P. falciparum heterochromatic genes are marked in a strain-specific manner, increasing the parasite’s adaptive capacity. Whereas heterochromatin is strictly maintained during mitotic proliferation of asexual blood stage parasites, substantial heterochromatin reorganization occurs in differentiating gametocytes and appears crucial for the activation of key gametocyte-specific genes and adaptation of erythrocyte remodeling machinery. Collectively, these findings provide a catalog of heterochromatic genes and reveal conserved and specialized features of epigenetic control across the genus Plasmodium. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

