Zygotic genome activation (ZGA) following fertilization is accomplished through a process termed the maternal-to-zygotic transition, during which the maternal RNAs and proteins are degraded and zygotic genome is transcriptionally activated.1 In mice, minor ZGA occurs from S phase of the zygote to G1 phase of the two-cell (2C) embryo, while major ZGA takes place during the middle-to-late 2C stage with a burst of transcription of totipotent cleavage stage-specific genes and retrotransposons.2Dux has been recently identified and considered as a master inducer that regulates the ZGA process.3–5Dux can directly bind and robustly activate 2C stage-specific ZGA transcripts and convert mouse embryonic stem cells (mESCs) to a 2C-like state with unique features that resembles the 2C embryos.4Intriguingly, ~20% embryos with zygotic depletion of Dux unexpectedly reached morula or blastocyst stage even though defective ZGA program was detected.
N6-methyladenosine (m6A) is a widespread RNA modification that influences nearly every aspect of the messenger RNA lifecycle. Our understanding of m6A has been facilitated by the development of global m6A mapping methods, which use antibodies to immunoprecipitate methylated RNA. However, these methods have several limitations, including high input RNA requirements and cross-reactivity to other RNA modifications. Here, we present DART-seq (deamination adjacent to RNA modification targets), an antibody-free method for detecting m6A sites. In DART-seq, the cytidine deaminase APOBEC1 is fused to the m6A-binding YTH domain. APOBEC1-YTH expression in cells induces C-to-U deamination at sites adjacent to m6A residues, which are detected using standard RNA-seq. DART-seq identifies thousands of m6A sites in cells from as little as 10?ng of total RNA and can detect m6A accumulation in cells over time. Additionally, we use long-read DART-seq to gain insights into m6A distribution along the length of individual transcripts.
Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases.
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
Domestication of clonally propagated crops such as pineapple from South America was hypothesized to be a ‘one-step operation’. We sequenced the genome of Ananas comosus var. bracteatus CB5 and assembled 513?Mb into 25 chromosomes with 29,412 genes. Comparison of the genomes of CB5, F153 and MD2 elucidated the genomic basis of fiber production, color formation, sugar accumulation and fruit maturation. We also resequenced 89 Ananas genomes. Cultivars ‘Smooth Cayenne’ and ‘Queen’ exhibited ancient and recent admixture, while ‘Singapore Spanish’ supported a one-step operation of domestication. We identified 25 selective sweeps, including a strong sweep containing a pair of tandemly duplicated bromelain inhibitors. Four candidate genes for self-incompatibility were linked in F153, but were not functional in self-compatible CB5. Our findings support the coexistence of sexual recombination and a one-step operation in the domestication of clonally propagated crops. This work guides the exploration of sexual and asexual domestication trajectories in other clonally propagated crops.
Translation initiation determines both the quantity and identity of the protein that is encoded in an mRNA by establishing the reading frame for protein synthesis. In eukaryotic cells, numerous translation initiation factors prepare ribosomes for polypeptide synthesis; however, the underlying dynamics of this process remain unclear1,2. A central question is how eukaryotic ribosomes transition from translation initiation to elongation. Here we use in vitro single-molecule fluorescence microscopy approaches in a purified yeast Saccharomyces cerevisiae translation system to monitor directly, in real time, the pathways of late translation initiation and the transition to elongation. This transition was slower in our eukaryotic system than that reported for Escherichia coli3-5. The slow entry to elongation was defined by a long residence time of eukaryotic initiation factor 5B (eIF5B) on the 80S ribosome after the joining of individual ribosomal subunits-a process that is catalysed by this universally conserved initiation factor. Inhibition of the GTPase activity of eIF5B after the joining of ribosomal subunits prevented the dissociation of eIF5B from the 80S complex, thereby preventing elongation. Our findings illustrate how the dissociation of eIF5B serves as a kinetic checkpoint for the transition from initiation to elongation, and how its release may be governed by a change in the conformation of the ribosome complex that triggers GTP hydrolysis.
Horticultural plants play various and critical roles for humans by providing fruits, vegetables, materials for beverages, and herbal medicines and by acting as ornamentals. They have also shaped human art, culture, and environments and thereby have influenced the lifestyles of humans. With the advent of sequencing technologies, there has been a dramatic increase in the number of sequenced genomes of horticultural plant species in the past decade. The genomes of horticultural plants are highly diverse and complex, often with a high degree of heterozygosity and a high ploidy due to their long and complex history of evolution and domestication. Here we summarize the advances in the genome sequencing of horticultural plants, the reconstruction of pan-genomes, and the development of horticultural genome databases. We also discuss past, present, and future studies related to genome sequencing, data storage, data quality, data sharing, and data visualization to provide practical guidance for genomic studies of horticultural plants. Finally, we propose a horticultural plant genome project as well as the roadmap and technical details toward three goals of the project.
Improvement of the Pacific bluefin tuna (Thunnus orientalis) reference genome and development of male-specific DNA markers.
The Pacific bluefin tuna, Thunnus orientalis, is a highly migratory species that is widely distributed in the North Pacific Ocean. Like other marine species, T. orientalis has no external sexual dimorphism; thus, identifying sex-specific variants from whole genome sequence data is a useful approach to develop an effective sex identification method. Here, we report an improved draft genome of T. orientalis and male-specific DNA markers. Combining PacBio long reads and Illumina short reads sufficiently improved genome assembly, with a 38-fold increase in scaffold contiguity (to 444 scaffolds) compared to the first published draft genome. Through analysing re-sequence data of 15 males and 16 females, 250 male-specific SNPs were identified from more than 30 million polymorphisms. All male-specific variants were male-heterozygous, suggesting that T. orientalis has a male heterogametic sex-determination system. The largest linkage disequilibrium block (3,174?bp on scaffold_064) contained 51 male-specific variants. PCR primers and a PCR-based sex identification assay were developed using these male-specific variants. The sex of 115 individuals (56 males and 59 females; sex was diagnosed by visual examination of the gonads) was identified with high accuracy using the assay. This easy, accurate, and practical technique facilitates the control of sex ratios in tuna farms. Furthermore, this method could be used to estimate the sex ratio and/or the sex-specific growth rate of natural populations.
Whole Genome Sequencing of the Mutamouse Model Reveals Strain- and Colony-Level Variation, and Genomic Features of the Transgene Integration Site.
The MutaMouse transgenic rodent model is widely used for assessing in vivo mutagenicity. Here, we report the characterization of MutaMouse’s whole genome sequence and its genetic variants compared to the C57BL/6 reference genome. High coverage (>50X) next-generation sequencing (NGS) of whole genomes from multiple MutaMouse animals from the Health Canada (HC) colony showed ~5 million SNVs per genome, ~20% of which are putatively novel. Sequencing of two animals from a geographically separated colony at Covance indicated that, over the course of 23 years, each colony accumulated 47,847 (HC) and 17,677 (Covance) non-parental homozygous single nucleotide variants. We found no novel nonsense or missense mutations that impair the MutaMouse response to genotoxic agents. Pairing sequencing data with array comparative genomic hybridization (aCGH) improved the accuracy and resolution of copy number variants (CNVs) calls and identified 300 genomic regions with CNVs. We also used long-read sequence technology (PacBio) to show that the transgene integration site involved a large deletion event with multiple inversions and rearrangements near a retrotransposon. The MutaMouse genome gives important genetic context to studies using this model, offers insight on the mechanisms of structural variant formation, and contributes a framework to analyze aCGH results alongside NGS data.
An African Salmonella Typhimurium ST313 sublineage with extensive drug-resistance and signatures of host adaptation.
Bloodstream infections by Salmonella enterica serovar Typhimurium constitute a major health burden in sub-Saharan Africa (SSA). These invasive non-typhoidal (iNTS) infections are dominated by isolates of the antibiotic resistance-associated sequence type (ST) 313. Here, we report emergence of ST313 sublineage II.1 in the Democratic Republic of the Congo. Sublineage II.1 exhibits extensive drug resistance, involving a combination of multidrug resistance, extended spectrum ß-lactamase production and azithromycin resistance. ST313 lineage II.1 isolates harbour an IncHI2 plasmid we name pSTm-ST313-II.1, with one isolate also exhibiting decreased ciprofloxacin susceptibility. Whole genome sequencing reveals that ST313 II.1 isolates have accumulated genetic signatures potentially associated with altered pathogenicity and host adaptation, related to changes observed in biofilm formation and metabolic capacity. Sublineage II.1 emerged at the beginning of the 21st century and is involved in on-going outbreaks. Our data provide evidence of further evolution within the ST313 clade associated with iNTS in SSA.
A chromosome-level genome assembly of Cydia pomonella provides insights into chemical ecology and insecticide resistance.
The codling moth Cydia pomonella, a major invasive pest of pome fruit, has spread around the globe in the last half century. We generated a chromosome-level scaffold assembly including the Z chromosome and a portion of the W chromosome. This assembly reveals the duplication of an olfactory receptor gene (OR3), which we demonstrate enhances the ability of C. pomonella to exploit kairomones and pheromones in locating both host plants and mates. Genome-wide association studies contrasting insecticide-resistant and susceptible strains identify hundreds of single nucleotide polymorphisms (SNPs) potentially associated with insecticide resistance, including three SNPs found in the promoter of CYP6B2. RNAi knockdown of CYP6B2 increases C. pomonella sensitivity to two insecticides, deltamethrin and azinphos methyl. The high-quality genome assembly of C. pomonella informs the genetic basis of its invasiveness, suggesting the codling moth has distinctive capabilities and adaptive potential that may explain its worldwide expansion.
Long-read assembly of the Chinese rhesus macaque genome and identification of ape-specific structural variants.
We present a high-quality de novo genome assembly (rheMacS) of the Chinese rhesus macaque (Macaca mulatta) using long-read sequencing and multiplatform scaffolding approaches. Compared to the current Indian rhesus macaque reference genome (rheMac8), rheMacS increases sequence contiguity 75-fold, closing 21,940 of the remaining assembly gaps (60.8 Mbp). We improve gene annotation by generating more than two million full-length transcripts from ten different tissues by long-read RNA sequencing. We sequence resolve 53,916 structural variants (96% novel) and identify 17,000 ape-specific structural variants (ASSVs) based on comparison to ape genomes. Many ASSVs map within ChIP-seq predicted enhancer regions where apes and macaque show diverged enhancer activity and gene expression. We further characterize a subset that may contribute to ape- or great-ape-specific phenotypic traits, including taillessness, brain volume expansion, improved manual dexterity, and large body size. The rheMacS genome assembly serves as an ideal reference for future biomedical and evolutionary studies.
Long-read sequencing technologies have advantages in genome assembly, structural variant detection and haplotype phasing, but are less suited for single-nucleotide variant (SNV) and insertion/deletion (indel) calling due to the high error rate in comparison with short-read sequencing. Wenger et al., from Pacific Biosciences, optimized the circular consensus sequencing (CCS) protocol to achieve long, high-fidelity reads, in which they selected the SMRTbell library with fractions tightly distributed at 15 kb for high-coverage sequencing.
Multispecies host-parasite evolution is common, but how parasites evolve after speciating remains poorly understood. Shared evolutionary history and physiology may propel species along similar evolutionary trajectories whereas pursuing different strategies can reduce competition. We test these scenarios in the economically important association between honey bees and ectoparasitic mites by sequencing the genomes of the sister mite species Varroa destructor and Varroa jacobsoni. These genomes were closely related, with 99.7% sequence identity. Among the 9,628 orthologous genes, 4.8% showed signs of positive selection in at least one species. Divergent selective trajectories were discovered in conserved chemosensory gene families (IGR, SNMP), and Halloween genes (CYP) involved in moulting and reproduction. However, there was little overlap in these gene sets and associated GO terms, indicating different selective regimes operating on each of the parasites. Based on our findings, we suggest that species-specific strategies may be needed to combat evolving parasite communities. © The Author(s) 2019.
The sequencing and de novo assembly of the Larimichthys crocea genome using PacBio and Hi-C technologies.
Larimichthys crocea is an endemic marine fish in East Asia that belongs to Sciaenidae in Perciformes. L. crocea has now been recognized as an “iconic” marine fish species in China because not only is it a popular food fish in China, it is a representative victim of overfishing and still provides high value fish products supported by the modern large-scale mariculture industry. Here, we report a chromosome-level reference genome of L. crocea generated by employing the PacBio single molecule sequencing technique (SMRT) and high-throughput chromosome conformation capture (Hi-C) technologies. The genome sequences were assembled into 1,591 contigs with a total length of 723.86?Mb and a contig N50 length of 2.83?Mb. After chromosome-level scaffolding, 24 scaffolds were constructed with a total length of 668.67?Mb (92.48% of the total length). Genome annotation identified 23,657 protein-coding genes and 7262 ncRNAs. This highly accurate, chromosome-level reference genome of L. crocea provides an essential genome resource to support the development of genome-scale selective breeding and restocking strategies of L. crocea.
The sequence and de novo assembly of Takifugu bimaculatus genome using PacBio and Hi-C technologies.
Takifugu bimaculatus is a native teleost species of the southeast coast of China where it has been cultivated as an important edible fish in the last decade. Genetic breeding programs, which have been recently initiated for improving the aquaculture performance of T. bimaculatus, urgently require a high-quality reference genome to facilitate genome selection and related genetic studies. To address this need, we produced a chromosome-level reference genome of T. bimaculatus using the PacBio single molecule sequencing technique (SMRT) and High-through chromosome conformation capture (Hi-C) technologies. The genome was assembled into 2,193 contigs with a total length of 404.21?Mb and a contig N50 length of 1.31?Mb. After chromosome-level scaffolding, 22 chromosomes with a total length of 371.68?Mb were constructed. Moreover, a total of 21,117 protein-coding genes and 3,471 ncRNAs were annotated in the reference genome. The highly accurate, chromosome-level reference genome of T. bimaculatus provides an essential genome resource for not only the genome-scale selective breeding of T. bimaculatus but also the exploration of the evolutionary basis of the speciation and local adaptation of the Takifugu genus.