July 7, 2019

ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers.

The long-range sequencing information captured by linked reads, such as those available from 10× Genomics (10xG), helps resolve genome sequence repeats, and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding methodology that uses linked reads to organize genome assemblies further into contiguous drafts. Our approach departs from other read alignment-dependent linked read scaffolders, including our own (ARCS), and uses a kmer-based mapping approach. The kmer mapping strategy has several advantages over read alignment methods, including better usability and faster processing, as it precludes the need for input sequence formatting and draft sequence assembly indexing. The reliance on kmers instead of read alignments for pairing sequences relaxes the workflow requirements, and drastically reduces the run time. Here, we show how linked reads, when used in conjunction with Hi-C data for scaffolding, improve a draft human genome assembly of PacBio long-read data five-fold (baseline vs. ARKS NG50 = 4.6 vs. 23.1 Mbp, respectively). We also demonstrate how the method provides further improvements of a megabase-scale Supernova human genome assembly (NG50 = 14.74 Mbp vs. 25.94 Mbp before and after ARKS), which itself exclusively uses linked read data for assembly, with an execution speed six to nine times faster than competitive linked read scaffolders (~10.5 h compared to 75.7 h, on average). Following ARKS scaffolding of a human genome 10xG Supernova assembly (of cell line NA12878), fewer than 9 scaffolds cover each chromosome, except the largest (chromosome 1, n = 13). ARKS uses a kmer mapping strategy instead of linked read alignments to record and associate the barcode information needed to order and orient draft assembly sequences. The simplified workflow, when compared to that of our initial implementation, ARCS, markedly improves run-time performance on experimental human genome datasets.
Furthermore, the novel distance estimator in ARKS utilizes barcoding information from linked reads to estimate gap sizes. It accomplishes this by modeling the relationship between known distances of a region within contigs and calculating associated Jaccard indices. ARKS has the potential to provide correct, chromosome-scale genome assemblies, promptly. We expect ARKS to have broad utility in helping refine draft genomes.
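
The Jaccard index mentioned above can be illustrated with a minimal sketch. It only shows how the index is computed for two sets of barcodes. It is not the ARKS distance estimator, whose modeling details are not given in this abstract, and the function name and barcode labels are hypothetical.

```python
def jaccard_index(barcodes_a, barcodes_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of barcodes."""
    a, b = set(barcodes_a), set(barcodes_b)
    union = a | b
    if not union:
        # Two empty sets share nothing; define the index as 0.0 here.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical barcodes observed at the ends of two draft sequences.
end_a = {"BX1", "BX2", "BX3", "BX4"}
end_b = {"BX3", "BX4", "BX5"}

# 2 shared barcodes out of 5 distinct barcodes in total.
print(jaccard_index(end_a, end_b))
```

A higher index means the two regions share a larger fraction of their barcodes, which is the kind of signal that could be related to the distance between them.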


July 7, 2019

The challenge of analyzing the sugarcane genome.

Reference genome sequences have become key platforms for genetics and breeding of the major crop species. Sugarcane is probably the largest crop produced in the world (in weight of crop harvested) but lacks a reference genome sequence. Sugarcane has one of the most complex genomes in crop plants due to the extreme level of polyploidy. The genome of modern sugarcane hybrids includes sub-genomes from two progenitors Saccharum officinarum and S. spontaneum with some chromosomes resulting from recombination between these sub-genomes. Advancing DNA sequencing technologies and strategies for genome assembly are making the sugarcane genome more tractable. Advances in long read sequencing have allowed the generation of a more complete set of sugarcane gene transcripts. This is supporting transcript profiling in genetic research. The progenitor genomes are being sequenced. A monoploid coverage of the hybrid genome has been obtained by sequencing BAC clones that cover the gene space of the closely related sorghum genome. The complete polyploid genome is now being sequenced and assembled. The emerging genome will allow comparison of related genomes and increase understanding of the functioning of this polyploidy system. Sugarcane breeding for traditional sugar and new energy and biomaterial uses will be enhanced by the availability of these genomic resources.


July 7, 2019

Tracing the de novo origin of protein-coding genes in yeast.

De novo genes are very important for evolutionary innovation. However, how these genes originate and spread remains largely unknown. To better understand this, we rigorously searched for de novo genes in Saccharomyces cerevisiae S288C and examined their spread and fixation in the population. Here, we identified 84 de novo genes in S. cerevisiae S288C since the divergence with their sister groups. Transcriptome and ribosome profiling data revealed at least 8 (10%) and 28 (33%) de novo genes being expressed and translated only under specific conditions, respectively. DNA microarray data, based on 2-fold change, showed that 87% of the de novo genes are regulated during various biological processes, such as nutrient utilization and sporulation. Our comparative and evolutionary analyses further revealed that some factors, including single nucleotide polymorphism (SNP)/indel mutation, high GC content, and DNA shuffling, contribute to the birth of de novo genes, while domestication and natural selection drive the spread and fixation of these genes. Finally, we also provide evidence suggesting the possible parallel origin of a de novo gene between S. cerevisiae and Saccharomyces paradoxus. Together, our study provides several new insights into the origin and spread of de novo genes. IMPORTANCE: Emergence of de novo genes has occurred in many lineages during evolution, but the birth, spread, and function of these genes remain unresolved. Here we have searched for de novo genes from Saccharomyces cerevisiae S288C using rigorous methods, which reduced the effects of bad annotation and genomic gaps on the identification of de novo genes. Through this analysis, we have found 84 new genes originating de novo from previously noncoding regions, 87% of which are very likely involved in various biological processes.
We noticed that 10% and 33% of de novo genes were only expressed and translated, respectively, under specific conditions; therefore, verification of de novo genes through transcriptome and ribosome profiling, especially from limited expression data, may underestimate the number of bona fide new genes. We further show that SNP/indel mutation, high GC content, and DNA shuffling could be involved in the birth of de novo genes, while domestication and natural selection drive the spread and fixation of these genes. Finally, we provide evidence suggesting the possible parallel origin of a new gene. Copyright © 2018 Wu and Knudson.


July 7, 2019

An improved approach for reconstructing consensus repeats from short sequence reads

Repeat elements are important components of most eukaryotic genomes. Most existing tools for repeat analysis rely either on high quality reference genomes or existing repeat libraries. Thus, it is still challenging to do repeat analysis for species with highly repetitive or complex genomes, which often do not have good reference genomes or annotated repeat libraries. Recently we developed a computational method called REPdenovo that constructs consensus repeat sequences directly from short sequence reads and outperforms an existing tool called RepARK. One major issue with REPdenovo is that it does not perform well for repeats with relatively high divergence rates or low copy numbers. In this paper, we present an improved approach for constructing consensus repeats directly from short reads. Compared with the original REPdenovo, the improved approach uses more repeat-related k-mers and improves repeat assembly quality using a consensus-based k-mer processing method.
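
As a rough illustration of what "repeat-related k-mers" could mean in practice, the sketch below counts k-mers across a set of reads and keeps those seen at least a chosen number of times. It is an assumption-laden toy, not the REPdenovo implementation: the function names, the toy reads, and the fixed `min_count` threshold are all hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequent_kmers(reads, k, min_count):
    """Return the k-mers seen at least min_count times (assumed repeat-related)."""
    return {kmer for kmer, n in count_kmers(reads, k).items() if n >= min_count}


# Toy reads; real input would be millions of short sequence reads.
reads = ["ACGTACGT", "TTACGTAA"]
print(sorted(frequent_kmers(reads, k=4, min_count=2)))
```

Lowering `min_count` admits more k-mers, which is one simple way to picture how an approach might reach repeats with lower copy numbers, at the cost of including more non-repeat sequence.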


July 7, 2019

Meeting report: mobile genetic elements and genome plasticity 2018

The Mobile Genetic Elements and Genome Plasticity conference was hosted by Keystone Symposia in Santa Fe, NM USA, February 11–15, 2018. The organizers were Marlene Belfort, Evan Eichler, Henry Levin and Lynn Maquat. The goal of this conference was to bring together scientists from around the world to discuss the function of transposable elements and their impact on host species. Central themes of the meeting included recent innovations in genome analysis and the role of mobile DNA in disease and evolution. The conference included 200 scientists who participated in poster presentations, short talks selected from abstracts, and invited talks. A total of 58 talks were organized into eight sessions and two workshops. The topics varied from mechanisms of mobilization, to the structure of genomes and their defense strategies to protect against transposable elements.


July 7, 2019

Fast-SG: an alignment-free algorithm for hybrid assembly.

Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms the state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878). Fast-SG opens a door to achieve accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.


July 7, 2019

Genomes and transcriptomes of duckweeds.

Duckweeds (Lemnaceae family) are the smallest flowering plants that adapt to the aquatic environment. They are regarded as the promising sustainable feedstock with the characteristics of high starch storage, fast propagation, and global distribution. The duckweed genome size varies 13-fold ranging from 150 Mb in Spirodela polyrhiza to 1,881 Mb in Wolffia arrhiza. With the development of sequencing technology and bioinformatics, five duckweed genomes from Spirodela and Lemna genera are sequenced and assembled. The genome annotations discover that they share similar protein orthologs, whereas the repeat contents could mainly explain the genome size difference. The gene families responsible for cell growth and expansion, lignin biosynthesis, and flowering are greatly contracted. However, the gene family of glutamate synthase has experienced expansion, indicating their significance in ammonia assimilation and nitrogen transport. The transcriptome is comprehensively sequenced for the genera of Spirodela, Landoltia, and Lemna, including various treatments such as abscisic acid, radiation, heavy metal, and starvation. The analysis of the underlying molecular mechanism and the regulatory network would accelerate their applications in the fields of bioenergy and phytoremediation. The comparative genomics has shown that duckweed genomes contain relatively low gene numbers and more contracted gene families, which may be in parallel with their highly reduced morphology with a simple leaf and primary roots. Still, we are waiting for the advancement of the long read sequencing technology to resolve the complex genomes and transcriptomes for unsequenced Wolffiella and Wolffia due to the large genome sizes and the similarity in their polyploidy.


July 7, 2019

The recombination landscape of Drosophila virilis is robust to transposon activation in hybrid dysgenesis

DNA damage in the germline is a double-edged sword. Induced double-strand breaks establish the foundation for meiotic recombination and proper chromosome segregation but can also pose a significant challenge for genome stability. Within the germline, transposable elements are powerful agents of double-strand break formation. How different types of DNA damage are resolved within the germline is poorly understood. For example, little is known about the relationship between the frequency of double-stranded breaks, both endogenous and exogenous, and the decision to repair DNA through one of the many pathways, including crossing over and gene conversion. Here we use the Drosophila virilis hybrid dysgenesis model to determine how recombination landscapes change under transposable element activation. In this system, a cross between two strains of D. virilis with divergent transposable element profiles results in the hybrid dysgenesis phenotype, which includes the germline activation of diverse transposable elements, reduced fertility, and male recombination. However, only one direction of the cross results in hybrid dysgenesis. This allows the study of recombination in genetically identical F1 females; those with baseline levels of programmed DNA damage and those with an increased level of DNA damage resulting from transposable element proliferation. Using multiplexed shotgun genotyping to map crossover events, we compared the recombination landscapes of hybrid dysgenic and non-hybrid dysgenic individuals. The frequency and distribution of meiotic recombination appears to be robust during hybrid dysgenesis. However, hybrid dysgenesis is also associated with occasional clusters of recombination derived from single dysgenic F1 mothers. The clusters of recombination are hypothesized to be the result of mitotic crossovers during early germline development. Overall, these results show that meiotic recombination in D. virilis is robust to the damage caused by transposable elements during early development.


July 7, 2019

sppIDer: a species identification tool to investigate hybrid genomes with high-throughput sequencing.

The genomics era has expanded our knowledge about the diversity of the living world, yet harnessing high-throughput sequencing data to investigate alternative evolutionary trajectories, such as hybridization, is still challenging. Here we present sppIDer, a pipeline for the characterization of interspecies hybrids and pure species, that illuminates the complete composition of genomes. sppIDer maps short-read sequencing data to a combination genome built from reference genomes of several species of interest and assesses the genomic contribution and relative ploidy of each parental species, producing a series of colorful graphical outputs ready for publication. As a proof-of-concept, we use the genus Saccharomyces to detect and visualize both interspecies hybrids and pure strains, even with missing parental reference genomes. Through simulation, we show that sppIDer is robust to variable reference genome qualities and performs well with low-coverage data. We further demonstrate the power of this approach in plants, animals, and other fungi. sppIDer is robust to many different inputs and provides visually intuitive insight into genome composition that enables the rapid identification of species and their interspecies hybrids. sppIDer exists as a Docker image, which is a reusable, reproducible, transparent, and simple-to-run package that automates the pipeline and installation of the required dependencies (https://github.com/GLBRC/sppIDer; last accessed September 6, 2018).


July 7, 2019

Regulation of neuronal differentiation, function, and plasticity by alternative splicing.

Posttranscriptional mechanisms provide powerful means to expand the coding power of genomes. In nervous systems, alternative splicing has emerged as a fundamental mechanism not only for the diversification of protein isoforms but also for the spatiotemporal control of transcripts. Thus, alternative splicing programs play instructive roles in the development of neuronal cell type-specific properties, neuronal growth, self-recognition, synapse specification, and neuronal network function. Here we discuss the most recent genome-wide efforts on mapping RNA codes and RNA-binding proteins for neuronal alternative splicing regulation. We illustrate how alternative splicing shapes key steps of neuronal development, neuronal maturation, and synaptic properties. Finally, we highlight efforts to dissect the spatiotemporal dynamics of alternative splicing and their potential contribution to neuronal plasticity and the mature nervous system. Expected final online publication date for the Annual Review of Cell and Developmental Biology Volume 34 is October 6, 2018. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


July 7, 2019

Genome size estimation of Chinese cultured Artemisia annua L.

Almost all antimalarial artemisinin is extracted from the traditional Chinese medicinal plant Artemisia annua L. However, because genomic information is insufficient and genetic backgrounds are unresolved, the regulatory mechanism of the artemisinin biosynthetic pathway is not yet clear. The genome size of genuine A. annua plants is an especially important and fundamental parameter, which is helpful for further insight into genomic studies of artemisinin biosynthesis and improvement. In the current study, the genome sizes of A. annua samples collected with barcoding identification were estimated to be 1.38-1.49 Gb by flow cytometry (FCM), with Nipponbare as the benchmark calibration standard and soybean and maize as two internal standards, used individually and simultaneously. The genome size estimate for seven A. annua strains from five Chinese provinces (Shandong, Hunan, Chongqing, Sichuan, and Hainan), with a low coefficient of variation (CV = 2.96%), was relatively accurate and 12.87% (220 Mb) less than previous reports on a foreign A. annua species measured with a single control. It facilitated the scheduling of the A. annua whole genome sequencing project, optimization of assembly methods, and insight into its subsequent genetics and evolution.


July 7, 2019

Pathogenesis of Helicobacter pylori infection

In this review, we highlight progress in the last year in characterizing known virulence factors like flagella and the Cag type IV secretion system with sophisticated structural and biochemical approaches to yield new insight on the assembly and functions of these critical virulence determinants. Several aspects of Helicobacter pylori physiology were newly explored this year and evaluated for their functions during stomach colonization, including a fascinating role for the essential protease HtrA in allowing access of H. pylori to the basolateral side of the gastric epithelium through cleavage of the tight junction protein E-cadherin to facilitate CagA delivery. Molecular biology tools standard in model bacteria, including regulated gene expression during animal infection and fluorescent reporter gene fusions, were newly applied to H. pylori to explore functions for urease beyond initial colonization and establish high salt consumption as a mediator of gene expression changes. New sequencing technologies enabled validation of long postulated roles for DNA methylation in regulating H. pylori gene expression. On the cell biology side, elegant work using lineage tracing in the murine model and organoid primary cell culture systems has provided new insights into how H. pylori manipulates gastric tissue functions, locally and at a distance, to promote its survival in the stomach and induce pathologic changes. Finally, new work has bolstered the case for genomic variation as an important mechanism to generate phenotypic diversity during changing environmental conditions in the context of diet manipulation in animal infection models and during human experimental infection after vaccination.


July 7, 2019

Traditional Norwegian kveik are a genetically distinct group of domesticated Saccharomyces cerevisiae brewing yeasts.

The widespread production of fermented food and beverages has resulted in the domestication of Saccharomyces cerevisiae yeasts specifically adapted to beer production. While there is evidence beer yeast domestication was accelerated by industrialization of beer, there also exists a farmhouse brewing culture in western Norway which has passed down yeasts referred to as kveik for generations. This practice has resulted in ale yeasts which are typically highly flocculant, phenolic off flavor negative (POF-), and exhibit a high rate of fermentation, similar to previously characterized lineages of domesticated yeast. Additionally, kveik yeasts are reportedly high-temperature tolerant, likely due to the traditional practice of pitching yeast into warm (>28°C) wort. Here, we characterize kveik yeasts from 9 different Norwegian sources via PCR fingerprinting, whole genome sequencing of selected strains, phenotypic screens, and lab-scale fermentations. Phylogenetic analysis suggests that kveik yeasts form a distinct group among beer yeasts. Additionally, we identify a novel POF- loss-of-function mutation, as well as SNPs and CNVs potentially relevant to the thermotolerance, high ethanol tolerance, and high fermentation rate phenotypes of kveik strains. We also identify domestication markers related to flocculation in kveik. Taken together, the results suggest that Norwegian kveik yeasts are a genetically distinct group of domesticated beer yeasts with properties highly relevant to the brewing sector.


July 7, 2019

Genomics, GPCRs and new targets for the control of insect pests and vectors.

The pressing need for new pest control products with novel modes of action has spawned interest in small molecules and peptides targeting arthropod GPCRs. Genome sequence data and tools for reverse genetics have enabled the prediction and characterization of GPCRs from many invertebrates. We review recent work to identify, characterize and de-orphanize arthropod GPCRs, with a focus on studies that reveal exciting new functional roles for these receptors, including the regulation of metabolic resistance. We explore the potential for insecticides targeting Class A biogenic amine-binding and peptide-binding receptors, and consider the innovation required to generate pest-selective leads for development, within the context of new GPCR-targeting products to control arthropod vectors of disease. Copyright © 2018. Published by Elsevier Inc.


July 7, 2019

Complete genome sequence of Salmonella enterica subsp. enterica serotype Derby, associated with the pork sector in France.

In the European Union, Salmonella enterica subsp. enterica serovar Derby is the most abundant serotype isolated from pork. Recent studies have shown that this serotype is polyphyletic. However, one main genomic lineage, characterized by sequence type 40 (ST40), the presence of the Salmonella pathogenicity island 23, and showing resistance to streptomycin, sulphonamides, and tetracycline (STR-SSS-TET), is pork associated. Here, we describe the complete genome sequence of a strain from this lineage isolated in France.

