The pig is a well-studied model animal of biomedical and agricultural importance. Genes of this species, Sus scrofa, are known from experiments and predictions and are collected in the NCBI Reference Sequence database. Gene reconstruction from transcribed-gene evidence (RNA-seq) can now accurately and completely reproduce the biological gene sets of animals and plants. Such a gene set for the pig is reported here, including human orthologs missing from the current NCBI and Ensembl reference pig gene sets, additional alternate transcripts, and other improvements. The gene set was reconstructed from RNA with a methodology for accurate and complete gene set reconstruction: the automated SRA2Genes pipeline of the EvidentialGene project.
Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, few open-source tools are available. Error rates, particularly for inversions and fusions across chromosomes, remain higher than with alternative scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than the state-of-the-art methods across a variety of Hi-C library preparations and input assembly sizes. The Python and C++ code for our method is openly available at https://github.com/machinegun/SALSA.
Construction of chromosome-level assembly is a vital step in achieving the goal of a ‘Platinum’ genome, but it remains a major challenge to assemble and anchor sequences to chromosomes in autopolyploid or highly heterozygous genomes. High-throughput chromosome conformation capture (Hi-C) technology serves as a robust tool to dramatically advance chromosome scaffolding; however, existing approaches are mostly designed for diploid genomes and often with the aim of reconstructing a haploid representation, thereby having limited power to reconstruct chromosomes for autopolyploid genomes. We developed a novel algorithm (ALLHiC) that is capable of building allele-aware, chromosomal-scale assembly for autopolyploid genomes using Hi-C paired-end reads with innovative ‘prune’ and ‘optimize’ steps. Application on simulated data showed that ALLHiC can phase allelic contigs and substantially improve ordering and orientation when compared to other mainstream Hi-C assemblers. We applied ALLHiC on an autotetraploid and an autooctoploid sugar-cane genome and successfully constructed the phased chromosomal-level assemblies, revealing allelic variations present in these two genomes. The ALLHiC pipeline enables de novo chromosome-level assembly of autopolyploid genomes, separating each allele. Haplotype chromosome-level assembly of allopolyploid and heterozygous diploid genomes can be achieved using ALLHiC, overcoming obstacles in assembling complex genomes.
Wolbachia, an alpha-proteobacterium closely related to Rickettsia, is a maternally transmitted, intracellular symbiont of arthropods and nematodes. Aedes albopictus mosquitoes are naturally infected with Wolbachia strains wAlbA and wAlbB. Cell line Aa23 established from Ae. albopictus embryos retains only wAlbB and is a key model to study host-endosymbiont interactions. We have assembled the complete circular genome of wAlbB from the Aa23 cell line using long-read PacBio sequencing at 500× median coverage. The assembled circular chromosome is 1.48 megabases in size, an increase of more than 300 kb over the published draft wAlbB genome. The annotation of the genome identified 1,205 protein coding genes, 34 tRNA, 3 rRNA, 1 tmRNA, and 3 other ncRNA loci. The long reads enabled sequencing over complex repeat regions which are difficult to resolve with short-read sequencing. Thirteen percent of the genome comprised insertion sequence elements distributed throughout the genome, some of which cause pseudogenization. Prophage WO genes encoding some essential components of phage particle assembly are missing, while the remainder are found in five prophage regions/WO-like islands or scattered around the genome. Orthology analysis identified a core proteome of 535 orthogroups across all completed Wolbachia genomes. The majority of proteins could be annotated using Pfam and eggNOG analyses, including ankyrins and components of the Type IV secretion system. KEGG analysis revealed the absence of five genes in wAlbB which are present in other Wolbachia. The availability of a complete circular chromosome from wAlbB will enable further biochemical, molecular, and genetic analyses on this strain and related Wolbachia. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Alternative polyadenylation coordinates embryonic development, sexual dimorphism and longitudinal growth in Xenopus tropicalis.
RNA alternative polyadenylation contributes to the complexity of information transfer from genome to phenome, thus amplifying gene function. Here, we report the first X. tropicalis resource with 127,914 alternative polyadenylation (APA) sites derived from embryos and adults. Overall, APA networks play central roles in coordinating the maternal-zygotic transition (MZT) in embryos, sexual dimorphism in adults and longitudinal growth from embryos to adults. APA sites coordinate reprogramming in embryos before the MZT but, owing to zygotic genome activation, coordinate developmental events after the MZT. The APA transcriptomes of young adults are more variable than those of growing adults, and male frog APA transcriptomes are more divergent than those of females. The APA profiles of young females were similar to those of embryos before the MZT. Enriched pathways in developing embryos were distinct across the MZT and noticeably segregated from those of adults. Briefly, our results suggest that the minimal functional units in genomes are alternative transcripts as opposed to genes.
In insects, rapidly evolving primary sex-determining signals are transduced by a conserved regulatory module controlling sexual differentiation. In the agricultural pest Ceratitis capitata (Mediterranean fruit fly, or Medfly), we identified a Y-linked gene, Maleness-on-the-Y (MoY), encoding a small protein that is necessary and sufficient for male development. Silencing or disruption of MoY in XY embryos causes feminization, whereas overexpression of MoY in XX embryos induces masculinization. Crosses between transformed XY females and XX males give rise to males and females, indicating that a Y chromosome can be transmitted by XY females. MoY is Y-linked and functionally conserved in other species of the Tephritidae family, highlighting its potential to serve as a tool for developing more effective control strategies against these major agricultural insect pests. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes.
Metagenomic samples are snapshots of complex ecosystems at work. They comprise hundreds of known and unknown species, contain multiple strain variants and vary greatly within and across environments. Many microbes found in microbial communities are not easily grown in culture making their DNA sequence our only clue into their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Current methods have made significant strides in reconstructing DNA segments comprising operons, tandem gene arrays and syntenic blocks. Shorter, higher-throughput sequencing technologies have become the de facto standard in the field. Sequencers are now able to generate billions of short reads in only a few days. Multiple metagenomic assembly strategies, pipelines and assemblers have appeared in recent years. Owing to the inherent complexity of metagenome assembly, regardless of the assembly algorithm and sequencing method, metagenome assemblies contain errors. Recent developments in assembly validation tools have played a pivotal role in improving metagenomics assemblers. Here, we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation and demonstrate the insights that can be derived from assemblies through the use of assembly validation strategies. We also discuss the potential for impact of long-read technologies in metagenomics. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation. © The Author 2017. Published by Oxford University Press.
Symbiosis is a major force of evolutionary change, influencing virtually all aspects of biology, from population ecology and evolution to genomics and molecular/biochemical mechanisms of development and reproduction. A remarkable example is Wolbachia endobacteria, present in some parasitic nematodes and many arthropod species. Acquisition of genomic data from diverse Wolbachia clades will aid in the elucidation of the different symbiotic mechanism(s). However, challenges of de novo assembly of Wolbachia genomes include the presence in the sample of host DNA: nematode/vertebrate or insect. We designed biotinylated probes to capture large fragments of Wolbachia DNA for sequencing using PacBio technology (LEFT-SEQ: Large Enriched Fragment Targeted Sequencing). LEFT-SEQ was used to capture and sequence four Wolbachia genomes: wBm from the filarial nematode Brugia malayi (21-fold enrichment), wMau from Drosophila mauritiana flies (2 isolates; 11-fold enrichment), and wAlbB from Aedes albopictus mosquitoes (200-fold enrichment). LEFT-SEQ resulted in complete genomes for wBm and for wMau. For wBm, 18 single-nucleotide polymorphisms (SNPs), relative to the wBm reference, were identified and confirmed by PCR. A limitation of LEFT-SEQ is illustrated by the wAlbB genome, characterized by a very high level of insertion sequence elements (ISs) and DNA repeats, for which only a 20-contig draft assembly was achieved.
A chromosome-level genome assembly of Cydia pomonella provides insights into chemical ecology and insecticide resistance.
The codling moth Cydia pomonella, a major invasive pest of pome fruit, has spread around the globe in the last half century. We generated a chromosome-level scaffold assembly including the Z chromosome and a portion of the W chromosome. This assembly reveals the duplication of an olfactory receptor gene (OR3), which we demonstrate enhances the ability of C. pomonella to exploit kairomones and pheromones in locating both host plants and mates. Genome-wide association studies contrasting insecticide-resistant and susceptible strains identify hundreds of single nucleotide polymorphisms (SNPs) potentially associated with insecticide resistance, including three SNPs found in the promoter of CYP6B2. RNAi knockdown of CYP6B2 increases C. pomonella sensitivity to two insecticides, deltamethrin and azinphos methyl. The high-quality genome assembly of C. pomonella informs the genetic basis of its invasiveness, suggesting the codling moth has distinctive capabilities and adaptive potential that may explain its worldwide expansion.
Metagenomic sequence classification should be fast, accurate and information-rich. Emerging long-read sequencing technologies promise to improve the balance between these factors, but most existing methods were designed for short reads. MetaMaps is a new method, specifically developed for long reads, capable of mapping a long-read metagenome to a comprehensive RefSeq database with >12,000 genomes in <16 GB of RAM on a laptop computer. Integrating approximate mapping with probabilistic scoring and EM-based estimation of sample composition, MetaMaps achieves >94% accuracy for species-level read assignment and r² > 0.97 for the estimation of sample composition on both simulated and real data when the sample genomes or close relatives are present in the classification database. To address novel species and genera, which are comparatively harder to predict, MetaMaps outputs mapping locations and qualities for all classified reads, enabling functional studies (e.g. gene presence/absence) and detection of incongruities between sample and reference genomes.
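As an illustration of what EM-based estimation of sample composition can look like in general, the following is a minimal sketch and not the MetaMaps implementation. It assumes each read comes with a likelihood for every genome it could map to; the function name `em_composition`, the input layout and the fixed iteration count are all hypothetical choices made for this example.

```python
from typing import Dict, List


def em_composition(read_likelihoods: List[Dict[str, float]],
                   n_iter: int = 50) -> Dict[str, float]:
    """Estimate genome proportions from per-read mapping likelihoods.

    read_likelihoods holds one dict per read, mapping a genome name to the
    likelihood of that read under that genome (hypothetical input format).
    """
    genomes = sorted({g for read in read_likelihoods for g in read})
    if not genomes:
        return {}
    # Start from a uniform composition.
    pi = {g: 1.0 / len(genomes) for g in genomes}
    for _ in range(n_iter):
        counts = {g: 0.0 for g in genomes}
        # E-step: split each read across genomes by posterior probability.
        for read in read_likelihoods:
            total = sum(pi[g] * lik for g, lik in read.items())
            if total == 0.0:
                continue  # read carries no information under current pi
            for g, lik in read.items():
                counts[g] += pi[g] * lik / total
        # M-step: new proportions are the normalised expected read counts.
        norm = sum(counts.values())
        if norm == 0.0:
            break
        pi = {g: c / norm for g, c in counts.items()}
    return pi


if __name__ == "__main__":
    reads = [{"A": 1.0}, {"A": 1.0}, {"B": 1.0}, {"A": 0.5, "B": 0.5}]
    print(em_composition(reads))
```

In this toy input, two reads map only to genome A, one only to genome B, and one is ambiguous; the iterations share the ambiguous read between A and B in proportion to the current composition estimate.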
Multispecies host-parasite evolution is common, but how parasites evolve after speciating remains poorly understood. Shared evolutionary history and physiology may propel species along similar evolutionary trajectories whereas pursuing different strategies can reduce competition. We test these scenarios in the economically important association between honey bees and ectoparasitic mites by sequencing the genomes of the sister mite species Varroa destructor and Varroa jacobsoni. These genomes were closely related, with 99.7% sequence identity. Among the 9,628 orthologous genes, 4.8% showed signs of positive selection in at least one species. Divergent selective trajectories were discovered in conserved chemosensory gene families (IGR, SNMP), and Halloween genes (CYP) involved in moulting and reproduction. However, there was little overlap in these gene sets and associated GO terms, indicating different selective regimes operating on each of the parasites. Based on our findings, we suggest that species-specific strategies may be needed to combat evolving parasite communities. © The Author(s) 2019.
A First Study of the Virulence Potential of a Bacillus subtilis Isolate From Deep-Sea Hydrothermal Vent.
Bacillus subtilis is the best studied Gram-positive bacterium, primarily as a model of cell differentiation and industrial exploitation. To date, little is known about the virulence of B. subtilis. In this study, we examined the virulence potential of a B. subtilis strain (G7) isolated from the Iheya North hydrothermal field of Okinawa Trough. G7 is aerobic, motile, endospore-forming, and requires NaCl for growth. The genome of G7 is composed of one circular chromosome of 4,216,133 base pairs with an average GC content of 43.72%. G7 contains 4,416 coding genes, 27.5% of which could not be annotated, and the remaining 72.5% were annotated with known or predicted functions in 25 different COG categories. Ten sets of 23S, 5S, and 16S ribosomal RNA operons, 86 tRNA and 14 sRNA genes, 50 tandem repeats, 41 mini-satellites, one microsatellite, and 42 transposons were identified in G7. Compared with the genome of the B. subtilis wild type strain NCIB 3610T, the G7 genome contains many genomic translocations, inversions, and insertions, and twice as many genomic islands (GIs), with 42.5% of GI genes encoding hypothetical proteins. G7 possesses abundant putative virulence genes associated with adhesion, invasion, dissemination, anti-phagocytosis, and intracellular survival. Experimental studies showed that G7 was able to cause mortality in fish and mice following intramuscular/intraperitoneal injection, resist the killing effect of serum complement, and replicate in mouse macrophages and fish peripheral blood leukocytes. Taken together, our study indicates that G7 is a B. subtilis isolate with unique genetic features that can be lethal to vertebrate animals once introduced into them by artificial means. These results provide the first insight into the potential harmfulness of deep-sea B. subtilis.
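For readers unfamiliar with the statistic, the average GC content quoted above is the percentage of G and C bases among the called bases of a sequence. The sketch below shows one common way to compute it; the function name `gc_content` and the decision to ignore ambiguous bases such as N are assumptions of this example, not details taken from the study.

```python
def gc_content(sequence: str) -> float:
    """Return the GC percentage of a DNA sequence.

    Only A, C, G and T are counted; ambiguous symbols such as N are
    ignored (an assumption of this sketch).
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    called = gc + seq.count("A") + seq.count("T")
    if called == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / called


if __name__ == "__main__":
    print(gc_content("ATGC"))  # two of the four bases are G or C
```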
Birds are a group with immense availability of genomic resources, and hundreds of forthcoming genomes at the doorstep. We review recent developments in whole genome sequencing, phylogenomics, and comparative genomics of birds. Short read based genome assemblies are common, largely due to efforts of the Bird 10K genome project (B10K). Chromosome-level assemblies are expected to increase due to improved long-read sequencing. The available genomic data has enabled the reconstruction of the bird tree of life with increasing confidence and resolution, but challenges remain in the early splits of Neoaves due to their explosive diversification after the Cretaceous-Paleogene (K-Pg) event. Continued genomic sampling of the bird tree of life will not just better reflect their evolutionary history but also shine new light onto the organization of phylogenetic signal and conflict across the genome. The comparatively simple architecture of avian genomes makes them a powerful system to study the molecular foundation of bird specific traits. Birds are on the verge of becoming an extremely resourceful system to study biodiversity from the nucleotide up.
A high-quality reference genome is a fundamental resource for functional genetics, comparative genomics, and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for de novo genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives. However, relatively high DNA input requirements (~5 µg for the standard library protocol) have placed PacBio out of reach for many projects on small organisms that have lower DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality de novo genome assembly from a single Anopheles coluzzii mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating, on average, 25 Gb of sequence per SMRT Cell with 20 h movies, followed by diploid de novo genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes were present and full-length). In addition, this single-insect assembly now places 667 (>90%) of formerly unplaced genes into their appropriate chromosomal contexts in the AgamP4 PEST reference. We were also able to resolve maternal and paternal haplotypes for over 1/3 of the genome. Because the material sequenced and assembled came from a single diploid individual, only two haplotypes were present, simplifying the assembly process compared to samples from multiple pooled individuals. The method presented here can be applied to samples with starting DNA amounts as low as 100 ng per 1 Gb genome size. This new low-input approach puts PacBio-based assemblies in reach for small highly heterozygous organisms that comprise much of the diversity of life.
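The contig N50 reported above is a standard contiguity measure: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of that definition follows; the function name `n50` and the toy contig lengths are illustrative only and are unrelated to the assembly described here.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Comparing `running * 2` against `total` keeps the arithmetic in integers and avoids rounding issues when the total length is odd.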