April 21, 2020  |  

A high-quality genome assembly from a single, field-collected spotted lanternfly (Lycorma delicatula) using the PacBio Sequel II system

Background A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region. Results The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ~20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ~36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. Conclusions We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.


April 21, 2020  |  

Genes of the pig, Sus scrofa, reconstructed with EvidentialGene.

The pig is a well-studied model animal of biomedical and agricultural importance. Genes of this species, Sus scrofa, are known from experiments and predictions, and collected at the NCBI reference sequence database section. Gene reconstruction from transcribed gene evidence of RNA-seq now can accurately and completely reproduce the biological gene sets of animals and plants. Such a gene set for the pig is reported here, including human orthologs missing from current NCBI and Ensembl reference pig gene sets, additional alternate transcripts, and other improvements. Methodology for accurate and complete gene set reconstruction from RNA is used: the automated SRA2Genes pipeline of EvidentialGene project.


April 21, 2020  |  

Genomic sequence and copy number evolution during hybrid crop development in sunflowers.

Hybrid crops, an important part of modern agriculture, rely on the development of male and female heterotic gene pools. In sunflowers, heterotic gene pools were developed through the use of crop-wild relatives to produce cytoplasmic male sterile female and branching, fertility restoring male lines. Here, we use genomic data from a diversity panel of male, female, and open-pollinated lines to explore the genetic changes brought during modern improvement. We find the male lines have diverged most from their open-pollinated progenitors and that genetic differentiation is concentrated in chromosomes, 8, 10 and 13, due to introgressions from wild relatives. Ancestral variation from open-pollinated varieties almost universally evolved in parallel for both male and female lines suggesting little or no selection for heterotic overdominance. Furthermore, we show that gene content differs between the male and female lines and that differentiation in gene content is concentrated in high FST regions. This means that the introgressions that brought branching and fertility restoration to the male lines, brought with them different gene content from the ancestral haplotypes, including the removal of some genes. Although we find no evidence that gene complementation genomewide is responsible for heterosis between male and female lines, several of the genes that are largely absent in either the male or female lines are associated with pathogen defense, suggesting complementation may be functionally relevant for crop breeders.


April 21, 2020  |  

Whole genome sequence of Auricularia heimuer (Basidiomycota, Fungi), the third most important cultivated mushroom worldwide.

Heimuer, Auricularia heimuer, is one of the most famous traditional Chinese foods and medicines, and it is the third most important cultivated mushroom worldwide. The aim of this study is to develop genomic resources for A. heimuer to furnish tools that can be used to study its secondary metabolite production capability, wood degradation ability and biosynthesis of polysaccharides. The genome was obtained from single spore mycelia of the strain Dai 13782 by using combined high-throughput Illumina HiSeq 4000 system with the PacBio RSII long-read sequencing platform. Functional annotation was accomplished by blasting protein sequences with different public available databases to obtain their corresponding annotations. It is 49.76Mb in size with a N50 scaffold size of 1,350,668bp and encodes 16,244 putative predicted genes. This is the first genome-scale assembly and annotation for A. heimuer, which is the third sequenced species in Auricularia. Copyright © 2018 Elsevier Inc. All rights reserved.


April 21, 2020  |  

Newly designed 16S rRNA metabarcoding primers amplify diverse and novel archaeal taxa from the environment.

High-throughput studies of microbial communities suggest that Archaea are a widespread component of microbial diversity in various ecosystems. However, proper quantification of archaeal diversity and community ecology remains limited, as sequence coverage of Archaea is usually low owing to the inability of available prokaryotic primers to efficiently amplify archaeal compared to bacterial rRNA genes. To improve identification and quantification of Archaea, we designed and validated the utility of several primer pairs to efficiently amplify archaeal 16S rRNA genes based on up-to-date reference genes. We demonstrate that several of these primer pairs amplify phylogenetically diverse Archaea with high sequencing coverage, outperforming commonly used primers. Based on comparing the resulting long 16S rRNA gene fragments with public databases from all habitats, we found several novel family- to phylum-level archaeal taxa from topsoil and surface water. Our results suggest that archaeal diversity has been largely overlooked due to the limitations of available primers, and that improved primer pairs enable to estimate archaeal diversity more accurately. © 2018 The Authors. Environmental Microbiology Reports published by Society for Applied Microbiology and John Wiley & Sons Ltd.


April 21, 2020  |  

Genomic variation and strain-specific functional adaptation in the human gut microbiome during early life.

The human gut microbiome matures towards the adult composition during the first years of life and is implicated in early immune development. Here, we investigate the effects of microbial genomic diversity on gut microbiome development using integrated early childhood data sets collected in the DIABIMMUNE study in Finland, Estonia and Russian Karelia. We show that gut microbial diversity is associated with household location and linear growth of children. Single nucleotide polymorphism- and metagenomic assembly-based strain tracking revealed large and highly dynamic microbial pangenomes, especially in the genus Bacteroides, in which we identified evidence of variability deriving from Bacteroides-targeting bacteriophages. Our analyses revealed functional consequences of strain diversity; only 10% of Finnish infants harboured Bifidobacterium longum subsp. infantis, a subspecies specialized in human milk metabolism, whereas Russian infants commonly maintained a probiotic Bifidobacterium bifidum strain in infancy. Groups of bacteria contributing to diverse, characterized metabolic pathways converged to highly subject-specific configurations over the first two years of life. This longitudinal study extends the current view of early gut microbial community assembly based on strain-level genomic variation.


April 21, 2020  |  

Genetic map-guided genome assembly reveals a virulence-governing minichromosome in the lentil anthracnose pathogen Colletotrichum lentis.

Colletotrichum lentis causes anthracnose, which is a serious disease on lentil and can account for up to 70% crop loss. Two pathogenic races, 0 and 1, have been described in the C. lentis population from lentil. To unravel the genetic control of virulence, an isolate of the virulent race 0 was sequenced at 1481-fold genomic coverage. The 56.10-Mb genome assembly consists of 50 scaffolds with N50 scaffold length of 4.89 Mb. A total of 11 436 protein-coding gene models was predicted in the genome with 237 coding candidate effectors, 43 secondary metabolite biosynthetic enzymes and 229 carbohydrate-active enzymes (CAZymes), suggesting a contraction of the virulence gene repertoire in C. lentis. Scaffolds were assigned to 10 core and two minichromosomes using a population (race 0 × race 1, n = 94 progeny isolates) sequencing-based, high-density (14 312 single nucleotide polymorphisms) genetic map. Composite interval mapping revealed a single quantitative trait locus (QTL), qClVIR-11, located on minichromosome 11, explaining 85% of the variability in virulence of the C. lentis population. The QTL covers a physical distance of 0.84 Mb with 98 genes, including seven candidate effector and two secondary metabolite genes. Taken together, the study provides genetic and physical evidence for the existence of a minichromosome controlling the C. lentis virulence on lentil. © 2018 The Authors. New Phytologist © 2018 New Phytologist Trust.


April 21, 2020  |  

In-depth analysis of the genome of Trypanosoma evansi, an etiologic agent of surra.

Trypanosoma evansi is the causative agent of the animal trypanosomiasis surra, a disease with serious economic burden worldwide. The availability of the genome of its closely related parasite Trypanosoma brucei allows us to compare their genetic and evolutionarily shared and distinct biological features. The complete genomic sequence of the T. evansi YNB strain was obtained using a combination of genomic and transcriptomic sequencing, de novo assembly, and bioinformatic analysis. The genome size of the T. evansi YNB strain was 35.2 Mb, showing 96.59% similarity in sequence and 88.97% in scaffold alignment with T. brucei. A total of 8,617 protein-coding genes, accounting for 31% of the genome, were predicted. Approximately 1,641 alternative splicing events of 820 genes were identified, with a majority mediated by intron retention, which represented a major difference in post-transcriptional regulation between T. evansi and T. brucei. Disparities in gene copy number of the variant surface glycoprotein, expression site-associated genes, microRNAs, and RNA-binding protein were clearly observed between the two parasites. The results revealed the genomic determinants of T. evansi, which encoded specific biological characteristics that distinguished them from other related trypanosome species.


April 21, 2020  |  

Liriodendron genome sheds light on angiosperm phylogeny and species-pair differentiation.

The genus Liriodendron belongs to the family Magnoliaceae, which resides within the magnoliids, an early diverging lineage of the Mesangiospermae. However, the phylogenetic relationship of magnoliids with eudicots and monocots has not been conclusively resolved and thus remains to be determined1-6. Liriodendron is a relict lineage from the Tertiary with two distinct species-one East Asian (L. chinense (Hemsley) Sargent) and one eastern North American (L. tulipifera Linn)-identified as a vicariad species pair. However, the genetic divergence and evolutionary trajectories of these species remain to be elucidated at the whole-genome level7. Here, we report the first de novo genome assembly of a plant in the Magnoliaceae, L. chinense. Phylogenetic analyses suggest that magnoliids are sister to the clade consisting of eudicots and monocots, with rapid diversification occurring in the common ancestor of these three lineages. Analyses of population genetic structure indicate that L. chinense has diverged into two lineages-the eastern and western groups-in China. While L. tulipifera in North America is genetically positioned between the two L. chinense groups, it is closer to the eastern group. This result is consistent with phenotypic observations that suggest that the eastern and western groups of China may have diverged long ago, possibly before the intercontinental differentiation between L. chinense and L. tulipifera. Genetic diversity analyses show that L. chinense has tenfold higher genetic diversity than L. tulipifera, suggesting that the complicated regions comprising east-west-orientated mountains and the Yangtze river basin (especially near 30°?N latitude) in East Asia offered more successful refugia than the south-north-orientated mountain valleys in eastern North America during the Quaternary glacial period.


April 21, 2020  |  

Mutation of a bHLH transcription factor allowed almond domestication.

Wild almond species accumulate the bitter and toxic cyanogenic diglucoside amygdalin. Almond domestication was enabled by the selection of genotypes harboring sweet kernels. We report the completion of the almond reference genome. Map-based cloning using an F1 population segregating for kernel taste led to the identification of a 46-kilobase gene cluster encoding five basic helix-loop-helix transcription factors, bHLH1 to bHLH5. Functional characterization demonstrated that bHLH2 controls transcription of the P450 monooxygenase-encoding genes PdCYP79D16 and PdCYP71AN24, which are involved in the amygdalin biosynthetic pathway. A nonsynonymous point mutation (Leu to Phe) in the dimerization domain of bHLH2 prevents transcription of the two cytochrome P450 genes, resulting in the sweet kernel trait. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.


April 21, 2020  |  

Recompleting the Caenorhabditis elegans genome.

Caenorhabditis elegans was the first multicellular eukaryotic genome sequenced to apparent completion. Although this assembly employed a standard C. elegans strain (N2), it used sequence data from several laboratories, with DNA propagated in bacteria and yeast. Thus, the N2 assembly has many differences from any C. elegans available today. To provide a more accurate C. elegans genome, we performed long-read assembly of VC2010, a modern strain derived from N2. Our VC2010 assembly has 99.98% identity to N2 but with an additional 1.8 Mb including tandem repeat expansions and genome duplications. For 116 structural discrepancies between N2 and VC2010, 97 structures matching VC2010 (84%) were also found in two outgroup strains, implying deficiencies in N2. Over 98% of N2 genes encoded unchanged products in VC2010; moreover, we predicted =53 new genes in VC2010. The recompleted genome of C. elegans should be a valuable resource for genetics, genomics, and systems biology. © 2019 Yoshimura et al.; Published by Cold Spring Harbor Laboratory Press.


April 21, 2020  |  

SMRT long reads and Direct Label and Stain optical maps allow the generation of a high-quality genome assembly for the European barn swallow (Hirundo rustica rustica).

The barn swallow (Hirundo rustica) is a migratory bird that has been the focus of a large number of ecological, behavioral, and genetic studies. To facilitate further population genetics and genomic studies, we present a reference genome assembly for the European subspecies (H. r. rustica).As part of the Genome10K effort on generating high-quality vertebrate genomes (Vertebrate Genomes Project), we have assembled a highly contiguous genome assembly using single molecule real-time (SMRT) DNA sequencing and several Bionano optical map technologies. We compared and integrated optical maps derived from both the Nick, Label, Repair, and Stain technology and from the Direct Label and Stain (DLS) technology. As proposed by Bionano, DLS more than doubled the scaffold N50 with respect to the nickase. The dual enzyme hybrid scaffold led to a further marginal increase in scaffold N50 and an overall increase of confidence in the scaffolds. After removal of haplotigs, the final assembly is approximately 1.21 Gbp in size, with a scaffold N50 value of more than 25.95 Mbp.This high-quality genome assembly represents a valuable resource for future studies of population genetics and genomics in the barn swallow and for studies concerning the evolution of avian genomes. It also represents one of the very first genomes assembled by combining SMRT long-read sequencing with the new Bionano DLS technology for scaffolding. The quality of this assembly demonstrates the potential of this methodology to substantially increase the contiguity of genome assemblies.


April 21, 2020  |  

Comprehensive identification of the full-length transcripts and alternative splicing related to the secondary metabolism pathways in the tea plant (Camellia sinensis).

Flavonoids, theanine and caffeine are the main secondary metabolites of the tea plant (Camellia sinensis), which account for the tea’s unique flavor quality and health benefits. The biosynthesis pathways of these metabolites have been extensively studied at the transcriptional level, but the regulatory mechanisms are still unclear. In this study, to explore the transcriptome diversity and complexity of tea plant, PacBio Iso-Seq and RNA-seq analysis were combined to obtain full-length transcripts and to profile the changes in gene expression during the leaf development. A total of 1,388,066 reads of insert (ROI) were generated with an average length of 1,762?bp, and more than 54% (755,716) of the ROIs were full-length non-chimeric (FLNC) reads. The Benchmarking Universal Single-Copy Orthologue (BUSCO) completeness was 92.7%. A total of 93,883 non-redundant transcripts were obtained, and 87,395 (93.1%) were new alternatively spliced isoforms. Meanwhile, 7,650 differential expression transcripts (DETs) were identified. A total of 28,980 alternative splicing (AS) events were predicted, including 1,297 differential AS (DAS) events. The transcript isoforms of the key genes involved in the flavonoid, theanine and caffeine biosynthesis pathways were characterized. Additionally, 5,777 fusion transcripts and 9,052 long non-coding RNAs (lncRNAs) were also predicted. Our results revealed that AS potentially plays a crucial role in the regulation of the secondary metabolism of the tea plant. These findings enhanced our understanding of the complexity of the secondary metabolic regulation of tea plants and provided a basis for the subsequent exploration of the regulatory mechanisms of flavonoid, theanine and caffeine biosynthesis in tea plants.


April 21, 2020  |  

Whole genome sequence and de novo assembly revealed genomic architecture of Indian Mithun (Bos frontalis).

Mithun (Bos frontalis), also called gayal, is an endangered bovine species, under the tribe bovini with 2n?=?58 XX chromosome complements and reared under the tropical rain forests region of India, China, Myanmar, Bhutan and Bangladesh. However, the origin of this species is still disputed and information on its genomic architecture is scanty so far. We trust that availability of its whole genome sequence data and assembly will greatly solve this problem and help to generate many information including phylogenetic status of mithun. Recently, the first genome assembly of gayal, mithun of Chinese origin, was published. However, an improved reference genome assembly would still benefit in understanding genetic variation in mithun populations reared under diverse geographical locations and for building a superior consensus assembly. We, therefore, performed deep sequencing of the genome of an adult female mithun from India, assembled and annotated its genome and performed extensive bioinformatic analyses to produce a superior de novo genome assembly of mithun.We generated ˜300 Gigabyte (Gb) raw reads from whole-genome deep sequencing platforms and assembled the sequence data using a hybrid assembly strategy to create a high quality de novo assembly of mithun with 96% recovered as per BUSCO analysis. The final genome assembly has a total length of 3.0 Gb, contains 5,015 scaffolds with an N50 value of 1?Mb. Repeat sequences constitute around 43.66% of the assembly. The genomic alignments between mithun to cattle showed that their genomes, as expected, are highly conserved. Gene annotation identified 28,044 protein-coding genes presented in mithun genome. The gene orthologous groups of mithun showed a high degree of similarity in comparison with other species, while fewer mithun specific coding sequences were found compared to those in cattle.Here we presented the first de novo draft genome assembly of Indian mithun having better coverage, less fragmented, better annotated, and constitutes a reasonably complete assembly compared to the previously published gayal genome. This comprehensive assembly unravelled the genomic architecture of mithun to a great extent and will provide a reference genome assembly to research community to elucidate the evolutionary history of mithun across its distinct geographical locations.


April 21, 2020  |  

Chromosome-level assembly of the water buffalo genome surpasses human and goat genomes in sequence contiguity.

Rapid innovation in sequencing technologies and improvement in assembly algorithms have enabled the creation of highly contiguous mammalian genomes. Here we report a chromosome-level assembly of the water buffalo (Bubalus bubalis) genome using single-molecule sequencing and chromatin conformation capture data. PacBio Sequel reads, with a mean length of 11.5?kb, helped to resolve repetitive elements and generate sequence contiguity. All five B. bubalis sub-metacentric chromosomes were correctly scaffolded with centromeres spanned. Although the index animal was partly inbred, 58% of the genome was haplotype-phased by FALCON-Unzip. This new reference genome improves the contig N50 of the previous short-read based buffalo assembly more than a thousand-fold and contains only 383 gaps. It surpasses the human and goat references in sequence contiguity and facilitates the annotation of hard to assemble gene clusters such as the major histocompatibility complex (MHC).


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.