April 21, 2020

Paragraph: A graph-based structural variant genotyper for short-read sequence data

Accurate detection and genotyping of structural variations (SVs) from short-read data is a long-standing area of development in genomics research and clinical sequencing pipelines. We introduce Paragraph, a fast and accurate genotyper that models SVs using sequence graphs and SV annotations produced by a range of methods and technologies. We demonstrate the accuracy of Paragraph on whole-genome sequence data from a control sample for which both short- and long-read sequencing data are available, and then apply it at scale to a cohort of 100 samples of diverse ancestry sequenced with short reads. Comparative analyses indicate that Paragraph is more accurate than other existing genotypers. The Paragraph software is open source and available at https://github.com/Illumina/paragraph


April 21, 2020

The persimmon genome reveals clues to the evolution of a lineage-specific sex determination system in plants

Most angiosperms bear hermaphroditic flowers, but a few species have evolved outcrossing strategies such as dioecy, the presence of separate male and female individuals. We previously investigated the mechanisms underlying dioecy in diploid persimmon (D. lotus) and found that male flowers are specified by repression of the autosomal gene MeGI by its paralog, the Y-encoded pseudogene OGI. This mechanism is thought to be lineage-specific, but its evolutionary path remains unknown. Here, we developed a full draft of the diploid persimmon (D. lotus) genome, which revealed a lineage-specific, genome-wide paleoduplication event. Together with subsequent persimmon-specific duplication(s), these events gave rise to three paralogs from a single ancestral gene: MeGI, OGI and the newly identified Sister of MeGI (SiMeGI). Evolutionary analysis suggested that MeGI underwent adaptive evolution after the paleoduplication event. Transformation of tobacco plants with MeGI and SiMeGI revealed that MeGI specifically acquired a new function as a repressor of male organ development, whereas SiMeGI presumably maintained the original function. Later, a local duplication spawned MeGI’s regulator OGI, completing the path leading to dioecy. These findings exemplify how duplication events can provide flexible genetic material that helps organisms respond to varying environments, and they offer interesting parallels for understanding the mechanisms underlying the transition to dioecy in plants.


April 21, 2020

The genomic diversification of clonally propagated grapevines

Vegetatively propagated clones accumulate somatic mutations. The purpose of this study was to better understand the consequences of clonal propagation by defining the nature of somatic mutations throughout the genome. Fifteen Zinfandel winegrape clone genomes were sequenced and compared with one another using a highly contiguous genome reference produced from one of the clones, Zinfandel 03. Although most heterozygous variants were shared, somatic mutations accumulated in individual clones and in subsets of clones. Overall, heterozygous mutations were most frequent in intergenic space and more frequent in introns than in exons. A significantly larger percentage of CpG, CHG, and CHH sites in repetitive intergenic space experienced transition mutations than in genic and non-repetitive intergenic space, likely because of higher levels of methylation in these regions and the increased tendency of methylated cytosines to deaminate spontaneously. Among the minority of mutations that occurred in exons, a larger proportion were putatively deleterious when they occurred in relatively few clones. These data support three major conclusions. First, repetitive intergenic space is a major driver of clone genome diversification. Second, clonal propagation is associated with the accumulation of putatively deleterious mutations. Third, the data suggest selection against deleterious variants in coding regions, such that mutations are less frequent in coding than in noncoding regions of the genome.
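
The abstract attributes the excess of transitions at CpG, CHG and CHH sites to spontaneous deamination of methylated cytosines, which converts 5-methylcytosine to thymine (a C→T transition). As a minimal illustration of how such sites can be labelled, the Python sketch below classifies the sequence context of a cytosine in a reference string and tests whether a substitution is a transition. The function names `cytosine_context` and `is_transition` are hypothetical, chosen for this example; this is not the pipeline used in the study.

```python
from typing import Optional

# Hypothetical helpers for illustration only; not the study's pipeline.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_transition(ref: str, alt: str) -> bool:
    """True if ref->alt swaps a purine for a purine or a pyrimidine for a pyrimidine."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return False
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def cytosine_context(seq: str, pos: int) -> Optional[str]:
    """Classify the cytosine at 0-based `pos` as 'CG', 'CHG' or 'CHH' (H = A, C or T).

    A 'G' on the given strand is treated as a cytosine on the opposite strand,
    so its context is read from the reverse complement of the upstream bases.
    Returns None for non-C/G bases, ambiguous bases, or too little flanking sequence.
    """
    s = seq.upper()
    base = s[pos]
    if base == "C":
        # Next two bases downstream on the same strand.
        ctx = s[pos + 1 : pos + 3]
    elif base == "G":
        # Next two bases on the opposite strand: reverse complement of the upstream bases.
        ctx = s[max(pos - 2, 0) : pos][::-1].translate(COMPLEMENT)
    else:
        return None
    if not ctx or any(b not in "ACGT" for b in ctx):
        return None
    if ctx[0] == "G":
        return "CG"
    if len(ctx) < 2:
        return None
    return "CHG" if ctx[1] == "G" else "CHH"
```

For example, `cytosine_context("ACAGT", 1)` returns `"CHG"`, and `is_transition("C", "T")` is `True`. Checking both strands matters because a cytosine on the reverse strand appears as a G in the forward reference.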


April 21, 2020

First near complete haplotype phased genome assembly of River buffalo (Bubalus bubalis)

This study reports the first haplotype-phased, reference-quality genome assembly of 'Murrah', an Indian breed of river buffalo. A mother-father-progeny trio was sequenced so that the individual haplotypes could be assembled in the progeny. Parental DNA samples were sequenced on the Illumina platform to generate a total of 274 Gb of paired-end data. The progeny DNA sample was sequenced using PacBio long reads and 10x Genomics linked reads at 166× coverage, along with 802 Gb of optical mapping data. A trio-binning-based FALCON assembly of each haplotype was scaffolded with 10x Genomics reads and super-scaffolded with BioNano maps to build reference-quality assemblies of the sire and dam haplotypes of 2.63 Gb and 2.64 Gb, with just 59 and 64 scaffolds and N50 values of 81.98 Mb and 83.23 Mb, respectively. BUSCO single-copy core gene set coverage was >91.25%, and gVolante-CEGMA completeness was >96.14% for both haplotypes. Finally, RaGOO was used to order the scaffolds and build a chromosome-level assembly with 25 scaffolds and an N50 of 117.48 Mb (sire haplotype) and 118.51 Mb (dam haplotype). The improved haplotype-phased genome assembly of river buffalo may provide a valuable resource for discovering molecular mechanisms related to milk production and reproduction traits.
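
N50, reported above for both the scaffold-level and chromosome-level assemblies, is the length L such that sequences of length L or longer together contain at least half of the total assembly length. A minimal Python sketch of that standard definition follows; the function name `n50` and the toy lengths are illustrative and are not data from this assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths.

    Sort lengths in descending order and accumulate until the running total
    reaches at least half of the summed length; the length at which that
    happens is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one sequence length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid rounding issues with total / 2.
        if 2 * running >= total:
            return length
    return ordered[-1]  # not reached for non-negative lengths
```

For the toy lengths `[2, 3, 4, 5, 6]` the total is 20, and the two longest sequences (6 + 5 = 11) already cover half of it, so `n50([2, 3, 4, 5, 6])` returns 5.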


April 21, 2020

Trochodendron aralioides, the first chromosome-level draft genome in Trochodendrales and a valuable resource for basal eudicot research

Background: The wheel tree (Trochodendron aralioides) is one of only two species in the basal eudicot order Trochodendrales. Together with Tetracentron sinense, it belongs to a family that is unique in having secondary xylem without vessel elements, a trait long considered primitive and also found in Amborella and Winteraceae. Recent studies, however, have shown that Trochodendraceae belong to the basal eudicots, demonstrating that this trait represents an evolutionary reversal in the group. Trochodendron aralioides is widespread in cultivation and popular for use in gardens and parks.
Findings: We assembled the T. aralioides genome using a total of 679.56 Gb of clean reads generated with PacBio long reads and Illumina short reads, in combination with 10x Genomics and Hi-C data. Nineteen scaffolds corresponding to 19 chromosomes were assembled to a final size of 1.614 Gb with a scaffold N50 of 73.37 Mb, in addition to 1,534 contigs. Repeat sequences accounted for 64.226% of the genome, and 35,328 protein-coding genes with an average of 5.09 exons per gene were annotated using de novo, RNA-seq, and homology-based approaches. According to a phylogenetic analysis of protein-coding genes, T. aralioides diverged in a basal position relative to the core eudicots, approximately 121.8-125.8 million years ago.
Conclusions: The T. aralioides genome is the first chromosome-scale genome assembled in the order Trochodendrales. It is the largest genome assembled to date in the basal eudicot grade, and Trochodendrales is the closest order relative to the core eudicots, as the position of Buxales remains unresolved. This genome will support further studies of wood morphology and floral evolution, and will be an essential resource for understanding the rapid changes that took place at the base of the eudicot tree. Finally, it can serve as a valuable resource for accelerating genome-assisted improvement in cultivation and for conservation efforts for the wheel tree.


April 21, 2020

Hi-C guided assemblies reveal conserved regulatory topologies on X and autosomes despite extensive genome shuffling

Genome rearrangements that occur during evolution impose major challenges on regulatory mechanisms that rely on three-dimensional genome architecture. Here, we developed a scaffolding algorithm and generated chromosome-length assemblies from Hi-C data to study genome topology in three distantly related Drosophila species. We observe extensive genome shuffling between these species, with approximately one synteny breakpoint every six genes. A/B compartments, a set of large, gene-dense topologically associating domains (TADs), and spatial contacts between high-affinity sites (HAS) located on the X chromosome are maintained over 40 million years, indicating architectural conservation at various hierarchical levels. Evolutionarily conserved genes cluster in the vicinity of HAS, while HAS locations appear evolutionarily flexible, thus uncoupling the functional requirement of dosage compensation from individual positions on the linear X chromosome. Therefore, 3D architecture is preserved even across thousands of rearrangements, highlighting its relevance for essential processes such as dosage compensation of the X chromosome.
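
The breakpoint density quoted above (roughly one synteny breakpoint every six genes) can be illustrated with a simple adjacency count. The Python sketch below assumes each genome is given as lists of shared one-to-one orthologous gene IDs in chromosome order, and counts adjacent gene pairs in one genome that are not adjacent, in either orientation, in the other. This unsigned-adjacency definition is a simplification chosen for illustration; the study's own breakpoint calls may be defined differently, and the names `adjacencies` and `count_breakpoints` are hypothetical.

```python
from typing import FrozenSet, List, Set


def adjacencies(genome: List[List[str]]) -> Set[FrozenSet[str]]:
    """Unordered pairs of genes that sit next to each other on a chromosome."""
    pairs: Set[FrozenSet[str]] = set()
    for chromosome in genome:
        for left, right in zip(chromosome, chromosome[1:]):
            pairs.add(frozenset((left, right)))
    return pairs


def count_breakpoints(genome_a: List[List[str]], genome_b: List[List[str]]) -> int:
    """Count adjacencies in genome_a that are broken in genome_b.

    Both inputs list shared one-to-one orthologs in chromosome order
    (hypothetical input format). Gene orientation is ignored.
    """
    kept_in_b = adjacencies(genome_b)
    return sum(
        1
        for chromosome in genome_a
        for left, right in zip(chromosome, chromosome[1:])
        if frozenset((left, right)) not in kept_in_b
    )
```

With `genome_a = [["g1", "g2", "g3", "g4"]]` and `genome_b = [["g1", "g2"], ["g4", "g3"]]`, only the g2-g3 adjacency is broken, so `count_breakpoints(genome_a, genome_b)` returns 1. Dividing the number of genes by the breakpoint count gives a genes-per-breakpoint figure of the kind reported in the abstract.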


April 21, 2020

The Chinese chestnut genome: a reference for species restoration

Forest tree species are increasingly subject to severe mortality from exotic pests, diseases, and invasive organisms, a problem accelerated by climate change. Forest health issues threaten multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species or among surviving trees, introgressing resistance genes into threatened tree species within reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being used as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of a reference genome with chromosome-scale sequences for Chinese chestnut (C. mollissima), the disease-resistance donor for American chestnut restoration. We also demonstrate the value of the genome as a platform for research and species restoration, including new insights into the evolution of blight resistance in Asian chestnut species, the genomic locations of ecologically important signatures of selection differentiating American chestnut from Chinese chestnut, the identification of candidate genes for disease resistance, and preliminary comparisons of genome organization with related species.


April 21, 2020

Characterization of LINE-1 transposons in a human genome at allelic resolution

The activity of the retrotransposon LINE-1 has created a substantial portion of the human genome. Most of this sequence comprises fractured and debilitated LINE-1s. An accurate estimate of the number, location, and sequence of the LINE-1 elements present in any single genome has proven elusive because of the difficulty of assembling and phasing the repetitive and polymorphic regions of the human genome. Through an in-depth analysis of publicly available, deep, long-read assemblies of nearly homozygous human genomes, we defined the location and sequence of all intact LINE-1s in these assemblies. We found 148 and 142 intact LINE-1s in two nearly homozygous assemblies. Combining these assemblies suggests that a diploid human genome contains at least 50% more intact LINE-1s than previous estimates: in this case, 290 intact LINE-1s at 194 loci. We think this is the best approximation to date of the number of intact LINE-1s in a single diploid human genome. In addition to counting intact LINE-1 elements, we resolved the sequence of each element, including some LINE-1 elements in unassembled, presumably centromeric, regions of the genome. A comparison of the intact LINE-1s in each assembly shows the specific pattern of variation between these genomes, including LINE-1s that remain intact in only one genome, allelic variation in shared intact LINE-1s, and LINE-1s that are unique (presumably young) insertions in only one genome. We found that many old elements (>6 million years old) remain intact, and comparison of the young and intact LINE-1s across assemblies reinforces the notion that only a small portion of all LINE-1 sequences that may be intact in the genomes of the human population has been uncovered.
This dataset provides the first nearly comprehensive estimate of LINE-1 diversity within an individual, an important dataset in the quest to understand the functional consequences of sequence variation in LINE-1 and the complete set of LINE-1s in the human population.


April 21, 2020

Exceptional subgenome stability and functional divergence in allotetraploid teff, the primary cereal crop in Ethiopia

Teff (Eragrostis tef) is a cornerstone of food security in the Horn of Africa, where it is prized for its stress resilience, grain nutrition, and market value. Despite its overall importance to small-scale farmers and communities in Africa, teff suffers from low productivity compared with other cereals because of limited intensive selection and molecular breeding. Here we report a chromosome-scale genome assembly of allotetraploid teff (variety 'Dabbi') and patterns of subgenome dynamics. The teff genome contains two complete sets of homoeologous chromosomes, with most genes maintained as syntenic gene pairs. By analyzing the history of transposable element activity, we estimate that the teff polyploidy event occurred ~1.1 million years ago (mya) and that the two subgenomes diverged ~5.0 mya. Despite this divergence, we detected no large-scale structural rearrangements, homoeologous exchanges, or biased gene loss, in contrast to most other allopolyploid plant systems. The exceptional subgenome stability observed in teff may enable the ubiquitous and recurrent polyploidy within Chloridoideae, possibly contributing to the increased resilience and diversification of these grasses. The two teff subgenomes have partitioned their ancestral functions, based on divergent expression patterns among homoeologous gene pairs across a diverse expression atlas. The most striking differences in homoeolog expression bias are observed during seed development and under abiotic stress, and thus may be related to agronomic traits. Together, these genomic resources will be useful for accelerating breeding efforts for this underutilized grain crop and for gaining fundamental insights into polyploid genome evolution.


April 21, 2020

Decoding and analysis of organelle genomes of Indian tea (Camellia assamica) for phylogenetic confirmation.

The NCBI database has >15 chloroplast (cp) genome sequences available for different Camellia species, but none for C. assamica. There is no report of any mitochondrial (mt) genome in the genus Camellia or the family Theaceae. Believing that these organelle genomes can serve as powerful tools for taxonomic and phylogenetic analysis, we assembled and analyzed the cp and mt genomes of C. assamica. We assembled the complete mt genome of C. assamica as a single circular contig of 707,441 bp comprising 66 annotated genes, including 35 protein-coding genes, 29 tRNAs and two rRNAs. The first cp genome of C. assamica is a circular contig of 157,353 bp with a typical quadripartite structure. Phylogenetic analysis based on these organelle genomes showed that C. assamica is closely related to C. sinensis and C. leptophylla. The analysis also supports the placement of Caryophyllales within the superasterids. Copyright © 2019. Published by Elsevier Inc.


April 21, 2020

Genome analysis and Hi-C assisted assembly of Elaeagnus angustifolia L., a deciduous tree belonging to Elaeagnaceae

Elaeagnus angustifolia L. is a deciduous tree of the family Elaeagnaceae. Because of its drought resistance, salt tolerance, cold resistance, wind resistance, and other environmental adaptations, it is widely used in the study of abiotic stress tolerance in plants and for the improvement of desertification-affected land. Here, we report complete genome sequencing of E. angustifolia using the Pacific Biosciences (PacBio) platform, together with Hi-C-assisted assembly. A total of 44.27 Gb of PacBio Sequel reads, with an average length of 8.64 kb, was obtained after filtering out low-quality data. Assembly with Canu gave an assembly length of 781.09 Mb, with a contig N50 of 486.92 kb. A total of 39.56 Gb of clean reads was obtained, with a sequencing coverage of 75× and a Q30 ratio >95.46%. A total of 510.71 Mb of genomic sequence was mapped to chromosomes, accounting for 96.94% of the total sequence length; the corresponding 269 sequences accounted for 45.83% of the total number of sequences. The E. angustifolia genome sequence can serve as a valuable resource for comparative genome analysis of Elaeagnaceae family members and can help elucidate the evolutionary mechanisms underlying drought, salt, cold and wind resistance in the Elaeagnaceae, thereby providing effective theoretical support for the improvement of desertification-affected land.


April 21, 2020

Insect genomes: progress and challenges.

In the wake of constant improvements in sequencing technologies, numerous insect genomes have been sequenced. Currently, 1219 insect genome-sequencing projects have been registered with the National Center for Biotechnology Information, including 401 that have genome assemblies and 155 with an official gene set of annotated protein-coding genes. Comparative genomics analysis showed that the expansion or contraction of gene families was associated with well-studied physiological traits such as immune system, metabolic detoxification, parasitism and polyphagy in insects. Here, we summarize the progress of insect genome sequencing, with an emphasis on how this impacts research on pest control. We begin with a brief introduction to the basic concepts of genome assembly, annotation and metrics for evaluating the quality of draft assemblies. We then provide an overview of genome information for numerous insect species, highlighting examples from prominent model organisms, agricultural pests and disease vectors. We also introduce the major insect genome databases. The increasing availability of insect genomic resources is beneficial for developing alternative pest control methods. However, many opportunities remain for developing data-mining tools that make maximal use of the available insect genome resources. Although rapid progress has been achieved, many challenges remain in the field of insect genomics. © 2019 The Royal Entomological Society.


April 21, 2020

Phenomics and genomics of finger millet: current status and future prospects.

A diverse gene pool and advanced plant phenomics and genomics methods have enhanced genetic gain and the understanding of important agronomic, adaptation and nutritional traits in finger millet. Finger millet (Eleusine coracana L. Gaertn) is an important minor millet for food and nutritional security in semi-arid regions of the world. The crop has wide adaptability and can be grown from the high hills of the Himalayan region to coastal plains. It provides food grain as well as palatable straw for cattle, and is fairly climate resilient. The crop has a large gene pool with distinct features of both Indian and African germplasm types. Interspecific hybridization between Indian and African germplasm has resulted in greater yield enhancement and disease resistance. The crop has shown numerous advantages over major cereals in terms of stress adaptation, nutritional quality and health benefits, and it is an indispensable repository of novel genes of benefit to humankind. Although rapid strides have been made in allele mining in model crops and major cereals, progress in finger millet genomics has lagged behind. Comparative genomics has paved the way for marker-assisted selection, in which resistance gene homologues from rice for blast and sequence variants for nutritional traits from other cereals have been used. Transcriptomics studies have provided a preliminary understanding of nutritional variation and of drought and salinity tolerance. However, the genetics of many important traits in finger millet is poorly understood and needs systematic efforts from biologists across disciplines. The recently deciphered finger millet genome will enable the identification of candidate genes for agronomically and nutritionally important traits. Further improvements in genome assembly and the application of genomic selection and genome editing in the near future will provide a wealth of information and opportunities to understand the genetics of complex traits.


April 21, 2020

Extended haplotype phasing of de novo genome assemblies with FALCON-Phase

Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. These assemblies can be created in various ways, such as use of tissues that contain single-haplotype (haploid) genomes, or by co-sequencing of parental genomes, but these approaches can be impractical in many situations. We present FALCON-Phase, which integrates long-read sequencing data and ultra-long-range Hi-C chromatin interaction data of a diploid individual to create high-quality, phased diploid genome assemblies. The method was evaluated by application to three datasets, including human, cattle, and zebra finch, for which high-quality, fully haplotype resolved assemblies were available for benchmarking. Phasing algorithm accuracy was affected by heterozygosity of the individual sequenced, with higher accuracy for cattle and zebra finch (>97%) compared to human (82%). In addition, scaffolding with the same Hi-C chromatin contact data resulted in phased chromosome-scale scaffolds.


April 21, 2020

Divergent selection following speciation in two ectoparasitic honey bee mites

Multispecies host-parasite evolution is common, but how parasites evolve after speciating remains poorly understood. Shared evolutionary history and physiology may propel species along similar evolutionary trajectories whereas pursuing different strategies can reduce competition. We test these scenarios in the economically important association between honey bees and ectoparasitic mites by sequencing the genomes of the sister mite species Varroa destructor and Varroa jacobsoni. These genomes were closely related, with 99.7% sequence identity. Among the 9,628 orthologous genes, 4.8% showed signs of positive selection in at least one species. Divergent selective trajectories were discovered in conserved chemosensory gene families (IGR, SNMP), and Halloween genes (CYP) involved in moulting and reproduction. However, there was little overlap in these gene sets and associated GO terms, indicating different selective regimes operating on each of the parasites. Based on our findings, we suggest that species-specific strategies may be needed to combat evolving parasite communities.

