Many applications of high-throughput sequencing rely on the availability of an accurate reference genome. Errors in the reference genome assembly increase the number of false positives in downstream analyses. Recently, we have shown that over 33% of the current pig reference genome, Sscrofa10.2, is either misassembled or otherwise unreliable for genomic analyses. Additionally, ~10% of the bases in the assembly are Ns in gaps of arbitrary size. Thousands of highly fragmented contigs remain unplaced and many genes are known to be missing from the assembly. Here we present a new assembly of the pig genome, Sscrofa11, assembled using 65X PacBio sequencing from T.J. Tabasco, the same Duroc sow used in the assembly of Sscrofa10.2. The PacBio reads were assembled using the Falcon assembly pipeline, resulting in 3,206 contigs with an initial contig N50 of 14.5 Mb. We used Sscrofa10.2 as a template to scaffold the PacBio contigs, under the assumption that its gross structure is correct, and used PBJelly to fill gaps. Additional gaps were filled using large, sequenced BACs from the original assembly. Following gap filling, the assembly has substantially improved contiguity and contains more sequence than the Sscrofa10.2 assembly. Arrow and Pilon were used to polish the assembly. The contig N50 is now 58.5 Mb with 103 gaps remaining. By comparing regions of the two assemblies we show that regions with structural abnormalities we identified in Sscrofa10.2 are resolved in the new PacBio assembly.
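The contig N50 quoted above is the standard contiguity statistic: the length L such that contigs of length L or longer contain at least half of all assembled bases. As a minimal sketch, the function below computes it from a list of contig lengths. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the Sscrofa11 data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches half of
    the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Toy contig lengths in bp (made up for illustration only).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # total is 300; 80 + 70 reaches 150, so this prints 70
```

Because the statistic depends only on the sorted lengths, merging contigs during scaffolding and gap filling can raise N50 even when little new sequence is added.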
Andrew Carroll, Director of Science at DNAnexus, presents how to greatly improve the accuracy of SV-calling by using long-read PacBio sequencing and fast and easy-to-run cloud-optimized apps like PBHoney, Parliament,…
PacBio SMRT Sequencing is fast changing the genomics space with its long reads and high consensus sequence accuracy, providing the most comprehensive view of the genome and transcriptome. In this…
Brassica napus (AACC, 2n = 38) is an important oilseed crop grown worldwide. However, little is known about the population evolution of this species, the genomic difference between its major genetic groups, such as European and Asian rapeseed, and the impacts of historical large-scale introgression events on this young tetraploid. In this study, we reported the de novo assembly of the genome sequences of an Asian rapeseed (B. napus), Ningyou 7, and its four progenitors and compared these genomes with other available genomic data from diverse European and Asian cultivars. Our results showed that Asian rapeseed originally derived from European rapeseed but subsequently significantly diverged, with rapid genome differentiation after hybridization and intensive local selective breeding. The first historical introgression of B. rapa dramatically broadened the allelic pool but decreased the deleterious variations of Asian rapeseed. The second historical introgression of the double-low traits of European rapeseed (canola) has reshaped Asian rapeseed into two groups (double-low and double-high), accompanied by an increase in genetic load in the double-low group. This study demonstrates distinctive genomic footprints and deleterious SNP (single nucleotide polymorphism) variants for local adaptation by recent intra- and interspecies introgression events and provides novel insights for understanding the rapid genome evolution of a young allopolyploid crop. © 2019 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
We present the genome of the moon jellyfish Aurelia, a genome from a cnidarian with a medusa life stage. Our analyses suggest that gene gain and loss in Aurelia is comparable to what has been found in its morphologically simpler relatives, the anthozoan corals and sea anemones. RNA sequencing analysis does not support the hypothesis that taxonomically restricted (orphan) genes play an oversized role in the development of the medusa stage. Instead, genes broadly conserved across animals and eukaryotes play comparable roles throughout the life cycle. All life stages of Aurelia are significantly enriched in the expression of genes that are hypothesized to interact in protein networks found in bilaterian animals. Collectively, our results suggest that increased life cycle complexity in Aurelia does not correlate with an increased number of genes. This leads to two possible evolutionary scenarios: either medusozoans evolved their complex medusa life stage (with concomitant shifts into new ecological niches) primarily by re-working genetic pathways already present in the last common ancestor of cnidarians, or the earliest cnidarians had a medusa life stage, which was subsequently lost in the anthozoans. While we favour the former hypothesis, the latter is consistent with growing evidence that many of the earliest animals were more physically complex than previously hypothesized.
Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil Improvement.
Cultivated peanut (Arachis hypogaea) is an allotetraploid crop planted in Asia, Africa, and America for edible oil and protein. To explore the origins and consequences of tetraploidy, we sequenced the allotetraploid A. hypogaea genome and compared it with the related diploid Arachis duranensis and Arachis ipaensis genomes. We annotated 39 888 A-subgenome genes and 41 526 B-subgenome genes in allotetraploid peanut. The A. hypogaea subgenomes have evolved asymmetrically, with the B subgenome resembling the ancestral state and the A subgenome undergoing more gene disruption, loss, conversion, and transposable element proliferation, and having reduced gene expression during seed development despite lacking genome-wide expression dominance. Genomic and transcriptomic analyses identified more than 2 500 oil metabolism-related genes and revealed that most of them show altered expression early in seed development while their expression ceases during desiccation, presenting a comprehensive map of peanut lipid biosynthesis. The availability of these genomic resources will facilitate a better understanding of the complex genome architecture, agronomically and economically important genes, and genetic improvement of peanut. Copyright © 2019 The Authors. Published by Elsevier Inc. All rights reserved.
Populus alba is widely distributed and cultivated in Europe and Asia. This species has been used for diverse studies. In this study, we assembled a de novo genome sequence of P. alba var. pyramidalis (= P. bolleana) and confirmed its high transformation efficiency and short transformation time by experiments. Through a process of hybrid genome assembly, a total of 464 Mb of the genome was assembled. Annotation analyses predicted 37 901 protein-coding genes. This genome is highly collinear to that of P. trichocarpa, with most genes having orthologs in the two species. We found a marked expansion of gene families related to histone and the hormone auxin but loss of disease resistance genes in P. alba compared with the closely related P. trichocarpa. The genome sequence presented here represents a valuable resource for further molecular functional analyses of this species as a new tree model, poplar breeding practices and comparative genomic analyses across different poplars. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Genome assembly and gene expression in the American black bear provides new insights into the renal response to hibernation.
The prevalence of chronic kidney disease (CKD) is rising worldwide and 10-15% of the global population currently suffers from CKD and its complications. Given the increasing prevalence of CKD there is an urgent need to find novel treatment options. The American black bear (Ursus americanus) copes with months of lowered kidney function and metabolism during hibernation without the devastating effects on metabolism and other consequences observed in humans. In a biomimetic approach to better understand kidney adaptations and physiology in hibernating black bears, we established a high-quality genome assembly. Subsequent RNA-Seq analysis of kidneys comparing gene expression profiles in black bears entering (late fall) and emerging (early spring) from hibernation identified 169 protein-coding genes that were differentially expressed. Of these, 101 genes were downregulated and 68 genes were upregulated after hibernation. Fold changes ranged from 1.8-fold downregulation (RTN4RL2) to 2.4-fold upregulation (CISH). Most notable was the upregulation of cytokine suppression genes (SOCS2, CISH, and SERPINC1) and the lack of increased expression of cytokines and genes involved in inflammation. The identification of these differences in gene expression in the black bear kidney may provide new insights in the prevention and treatment of CKD. © The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
The genus Liriodendron belongs to the family Magnoliaceae, which resides within the magnoliids, an early diverging lineage of the Mesangiospermae. However, the phylogenetic relationship of magnoliids with eudicots and monocots has not been conclusively resolved and thus remains to be determined [1-6]. Liriodendron is a relict lineage from the Tertiary with two distinct species, one East Asian (L. chinense (Hemsley) Sargent) and one eastern North American (L. tulipifera Linn), identified as a vicariad species pair. However, the genetic divergence and evolutionary trajectories of these species remain to be elucidated at the whole-genome level [7]. Here, we report the first de novo genome assembly of a plant in the Magnoliaceae, L. chinense. Phylogenetic analyses suggest that magnoliids are sister to the clade consisting of eudicots and monocots, with rapid diversification occurring in the common ancestor of these three lineages. Analyses of population genetic structure indicate that L. chinense has diverged into two lineages, the eastern and western groups, in China. While L. tulipifera in North America is genetically positioned between the two L. chinense groups, it is closer to the eastern group. This result is consistent with phenotypic observations that suggest that the eastern and western groups of China may have diverged long ago, possibly before the intercontinental differentiation between L. chinense and L. tulipifera. Genetic diversity analyses show that L. chinense has tenfold higher genetic diversity than L. tulipifera, suggesting that the complicated regions comprising east-west-orientated mountains and the Yangtze river basin (especially near 30° N latitude) in East Asia offered more successful refugia than the south-north-orientated mountain valleys in eastern North America during the Quaternary glacial period.
Long-read assembly of the Chinese rhesus macaque genome and identification of ape-specific structural variants.
We present a high-quality de novo genome assembly (rheMacS) of the Chinese rhesus macaque (Macaca mulatta) using long-read sequencing and multiplatform scaffolding approaches. Compared to the current Indian rhesus macaque reference genome (rheMac8), rheMacS increases sequence contiguity 75-fold, closing 21,940 of the remaining assembly gaps (60.8 Mbp). We improve gene annotation by generating more than two million full-length transcripts from ten different tissues by long-read RNA sequencing. We sequence resolve 53,916 structural variants (96% novel) and identify 17,000 ape-specific structural variants (ASSVs) based on comparison to ape genomes. Many ASSVs map within ChIP-seq predicted enhancer regions where apes and macaque show diverged enhancer activity and gene expression. We further characterize a subset that may contribute to ape- or great-ape-specific phenotypic traits, including taillessness, brain volume expansion, improved manual dexterity, and large body size. The rheMacS genome assembly serves as an ideal reference for future biomedical and evolutionary studies.
The ability to generate long sequencing reads and access long-range linkage information is revolutionizing the quality and completeness of genome assemblies. Here we use a hybrid approach that combines data from four genome sequencing and mapping technologies to generate a new genome assembly of the honeybee Apis mellifera. We first generated contigs based on PacBio sequencing libraries, which were then merged with linked-read 10x Chromium data followed by scaffolding using a BioNano optical genome map and a Hi-C chromatin interaction map, complemented by a genetic linkage map. Each of the assembly steps reduced the number of gaps and incorporated a substantial amount of additional sequence into scaffolds. The new assembly (Amel_HAv3) is significantly more contiguous and complete than the previous one (Amel_4.5), based mainly on Sanger sequencing reads. N50 of contigs is 120-fold higher (5.381 Mbp compared to 0.053 Mbp) and we anchor > 98% of the sequence to chromosomes. All of the 16 chromosomes are represented as single scaffolds with an average of three sequence gaps per chromosome. The improvements are largely due to the inclusion of repetitive sequence that was unplaced in previous assemblies. In particular, our assembly is highly contiguous across centromeres and telomeres and includes hundreds of AvaI and AluI repeats associated with these features. The improved assembly will be of utility for refining gene models, studying genome function, mapping functional genetic variation, identification of structural variants, and comparative genomics.
Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal.
Restriction site-associated DNA sequencing (RADseq) has revolutionized the study of wild organisms by allowing cost-effective genotyping of thousands of loci. However, for species lacking reference genomes, it can be challenging to select the restriction enzyme that offers the best balance between the number of obtained RAD loci and depth of coverage, which is crucial for a successful outcome. To address this issue, PredRAD was recently developed, which uses probabilistic models to predict restriction site frequencies from a transcriptome assembly or other sequence resource based on either GC content or mono-, di- or trinucleotide composition. This program generates predictions that are broadly consistent with estimates of the true number of restriction sites obtained through in silico digestion of available reference genome assemblies. However, in practice the actual number of loci obtained could potentially differ as incomplete enzymatic digestion or patchy sequence coverage across the genome might lead to some loci not being represented in a RAD dataset, while erroneous assembly could potentially inflate the number of loci. To investigate this, we used genome and transcriptome assemblies together with RADseq data from the Antarctic fur seal (Arctocephalus gazella) to compare PredRAD predictions with empirical estimates of the number of loci obtained via in silico digestion and from de novo assemblies. PredRAD yielded consistently higher predicted numbers of restriction sites for the transcriptome assembly relative to the genome assembly. The trinucleotide and dinucleotide models also predicted higher frequencies than the mononucleotide or GC content models. Overall, the dinucleotide and trinucleotide models applied to the transcriptome and the genome assemblies respectively generated predictions that were closest to the number of restriction sites estimated by in silico digestion. Furthermore, the number of de novo assembled RAD loci mapping to restriction sites was similar to the expectation based on in silico digestion. Our study reveals generally high concordance between PredRAD predictions and empirical estimates of the number of RAD loci. This further supports the utility of PredRAD, while also suggesting that it may be feasible to sequence and assemble the majority of RAD loci present in an organism’s genome.
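The comparison described above, between observed restriction sites from in silico digestion and sites predicted from base composition, can be illustrated with a short sketch. This is not PredRAD's implementation. It assumes the simplest mononucleotide model (independent bases at their observed frequencies), uses the EcoRI recognition motif GAATTC as an example enzyme, and runs on a made-up toy sequence. The function names are hypothetical.

```python
from collections import Counter


def count_sites(sequence, motif):
    """In silico digestion: count occurrences of a recognition motif.

    Overlapping matches are counted. For a palindromic motif such as
    GAATTC, scanning one strand is enough.
    """
    sequence, motif = sequence.upper(), motif.upper()
    k = len(motif)
    return sum(
        1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == motif
    )


def expected_sites_mononucleotide(sequence, motif):
    """Expected motif count if bases were independent draws.

    Illustrative mononucleotide model only: the motif probability is
    the product of the observed frequencies of its bases, multiplied
    by the number of positions where the motif could start.
    """
    sequence, motif = sequence.upper(), motif.upper()
    counts = Counter(sequence)
    total = len(sequence)
    probability = 1.0
    for base in motif:
        probability *= counts[base] / total
    positions = max(len(sequence) - len(motif) + 1, 0)
    return probability * positions


# Toy sequence (made up) containing two EcoRI sites.
toy_sequence = "AAGAATTCGGGAATTCTT"
observed = count_sites(toy_sequence, "GAATTC")
expected = expected_sites_mononucleotide(toy_sequence, "GAATTC")
print(observed, round(expected, 4))
```

On real assemblies the same two numbers would be computed per enzyme. Di- and trinucleotide models replace the product of single-base frequencies with conditional probabilities estimated from longer k-mers.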
Lentinula edodes is a popular, cultivated edible and medicinal mushroom. Lentinula edodes is susceptible to postharvest problems, such as gill browning, fruiting body softening, and lentinan degradation. We constructed a de novo assembly draft genome sequence and performed gene prediction for Lentinula edodes. De novo assembly was carried out using short reads from paired-end and mate-paired libraries and by using long reads by PacBio, resulting in a contig number of 1,951 and an N50 of 1 Mb. Furthermore, we predicted genes by Augustus using transcriptome sequencing (RNA-seq) data from the whole life cycle of Lentinula edodes, resulting in 12,959 predicted genes. This analysis revealed that Lentinula edodes lacks lignin peroxidase. To reveal genes involved in the loss of quality of Lentinula edodes postharvest fruiting bodies, transcriptome analysis was carried out using serial analysis of gene expression (SuperSAGE). This analysis revealed that many cell wall-related enzymes are upregulated after harvest, such as β-1,3-1,6-glucan-degrading enzymes in glycoside hydrolase (GH) families GH5, GH16, GH30, GH55, and GH128, and thaumatin-like proteins. In addition, we found that several chitin-related genes are upregulated, such as putative chitinases in GH family 18, exochitinases in GH20, and a putative chitosanase in GH family 75. The results suggest that cell wall-degrading enzymes synergistically cooperate for rapid fruiting body autolysis. Many putative transcription factor genes were upregulated postharvest, such as genes containing high-mobility-group (HMG) domains and zinc finger domains. Several cell death-related proteins were also upregulated postharvest. IMPORTANCE: Our data collectively suggest that there is a rapid fruiting body autolysis system in Lentinula edodes. The genes for the loss of postharvest quality newly found in this research will be targets for the future breeding of strains that keep fresh longer than present strains. De novo Lentinula edodes genome assembly data will be used for the construction of a complete Lentinula edodes chromosome map for future breeding. Copyright © 2017 American Society for Microbiology.
A high-resolution genetic map of the cereal crown rot pathogen Fusarium pseudograminearum provides a near-complete genome assembly.
Fusarium pseudograminearum is an important pathogen of wheat and barley, particularly in semi-arid environments. Previous genome assemblies for this organism were based entirely on short read data and are highly fragmented. In this work, a genetic map of F. pseudograminearum has been constructed for the first time based on a mapping population of 178 individuals. The genetic map, together with long read scaffolding of a short read-based genome assembly, was used to give a near-complete assembly of the four F. pseudograminearum chromosomes. Large regions of synteny between F. pseudograminearum and F. graminearum, the related pathogen that is the primary causal agent of cereal head blight disease, were previously proposed in the core conserved genome, but the construction of a genetic map to order and orient contigs is critical to the validation of synteny and the placing of species-specific regions. Indeed, our comparative analyses of the genomes of these two related pathogens suggest that rearrangements in the F. pseudograminearum genome have occurred in the chromosome ends. One of these rearrangements includes the transposition of an entire gene cluster involved in the detoxification of the benzoxazolinone (BOA) class of plant phytoalexins. This work provides an important genomic and genetic resource for F. pseudograminearum, which is less well characterized than F. graminearum. In addition, this study provides new insights into a better understanding of the sexual reproduction process in F. pseudograminearum, which informs us of the potential of this pathogen to evolve. © 2016 BSPP AND JOHN WILEY & SONS LTD.
Clownfishes (or anemonefishes) form an iconic group of coral reef fishes, principally known for their mutualistic interaction with sea anemones. They are characterized by particular life history traits, such as a complex social structure and mating system involving sequential hermaphroditism, coupled with an exceptionally long lifespan. Additionally, clownfishes are considered to be one of the rare groups to have experienced an adaptive radiation in the marine environment. Here, we assembled and annotated the first genome of a clownfish species, the tomato clownfish (Amphiprion frenatus). We obtained 17,801 assembled scaffolds, containing a total of 26,917 genes. The completeness of the assembly and annotation was satisfactory, with 96.5% of the Actinopterygii Benchmarking Universal Single-Copy Orthologs (BUSCOs) being retrieved in the A. frenatus assembly. The quality of the resulting assembly is comparable to other bony fish assemblies. This resource is valuable for advancing studies of the particular life history traits of clownfishes, as well as being useful for population genetic studies and the development of new phylogenetic markers. It will also open the way to comparative genomics. Indeed, future genomic comparison among closely related fishes may provide means to identify genes related to the unique adaptations to different sea anemone hosts, as well as better characterize the genomic signatures of an adaptive radiation. © 2018 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.