Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
Complete genome sequence and analysis of the industrial Saccharomyces cerevisiae strain N85 used in Chinese rice wine production.
Chinese rice wine is a popular traditional alcoholic beverage in China, while its brewing processes have rarely been explored. We herein report the first gapless, near-finished genome sequence of the yeast strain Saccharomyces cerevisiae N85 for Chinese rice wine production. Several assembly methods were used to integrate Pacific Biosciences (PacBio) and Illumina sequencing data to achieve high-quality genome sequencing of the strain. The genome encodes more than 6,000 predicted proteins and 238 long non-coding RNAs, which are validated by RNA-sequencing data. Moreover, our annotation predicts 171 novel genes that are not present in the reference S288c genome. We also identified 65,902 single nucleotide polymorphisms and small indels, many of which are located within genic regions. Dozens of larger copy-number variations and translocations were detected, mainly enriched in the subtelomeres, suggesting these regions may be related to genomic evolution. This study will serve as a milestone in the study of Chinese rice wine and related beverages in China and in other countries. It will help to develop more scientific and modern fermentation processes for Chinese rice wine, and to explore metabolic pathways of desired and harmful components in Chinese rice wine to improve its taste and nutritional value. © The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Extensive gene amplification as a mechanism for piperacillin-tazobactam resistance in Escherichia coli.
Although the TEM-1 β-lactamase (BlaTEM-1) hydrolyzes penicillins and narrow-spectrum cephalosporins, organisms expressing this enzyme are typically susceptible to β-lactam/β-lactamase inhibitor combinations such as piperacillin-tazobactam (TZP). However, our previous work led to the discovery of 28 clinical isolates of Escherichia coli resistant to TZP that contained only blaTEM-1. One of these isolates, E. coli 907355, was investigated further in this study. E. coli 907355 exhibited significantly higher β-lactamase activity and BlaTEM-1 protein levels when grown in the presence of subinhibitory concentrations of TZP. A corresponding TZP-dependent increase in blaTEM-1 copy number was also observed, with as many as 113 copies of the gene detected per cell. These results suggest that TZP treatment promotes an increase in blaTEM-1 gene dosage, allowing BlaTEM-1 to reach high enough levels to overcome inactivation by the available tazobactam in the culture. To better understand the nature of the blaTEM-1 copy number proliferation, whole-genome sequence (WGS) analysis was performed on E. coli 907355 in the absence and presence of TZP. The WGS data revealed that the blaTEM-1 gene is located in a 10-kb genomic resistance module (GRM) that contains multiple resistance genes and mobile genetic elements. The GRM was found to be tandemly repeated at least 5 times within a p1ESCUM/p1ECUMN-like plasmid when bacteria were grown in the presence of TZP. IMPORTANCE Understanding how bacteria acquire resistance to antibiotics is essential for treating infected patients effectively, as well as preventing the spread of resistant organisms. In this study, a clinical isolate of E. coli was identified that dedicated more than 15% of its genome toward tandem amplification of a ~10-kb resistance module, allowing it to escape antibiotic-mediated killing.
Our research is significant in that it provides one possible explanation for clinical isolates that exhibit discordant behavior when tested for antibiotic resistance by different phenotypic methods. Our research also shows that GRM amplification is difficult to detect by short-read WGS technologies. Analysis of raw long-read sequence data was required to confirm GRM amplification as a mechanism of antibiotic resistance. Copyright © 2018 Schechter et al.
Knockout of rapC improves the bacillomycin D yield based on de novo genome sequencing of Bacillus amyloliquefaciens fmbJ.
Bacillus amyloliquefaciens, a Gram-positive and soil-dwelling bacterium, can produce secondary metabolites that suppress plant pathogens. In this study, we provide the whole genome sequence of B. amyloliquefaciens fmbJ, which has one circular chromosome of 4,193,344 bp with 4249 genes, 87 tRNA genes, and 27 rRNA genes. In addition, fmbJ was found to contain several gene clusters of antimicrobial lipopeptides (bacillomycin D, surfactin, and fengycin), and bacillomycin D homologues were further comprehensively identified. To clarify the influence of rapC, which regulates lipopeptide synthesis, on the yield of bacillomycin D, the rapC gene in fmbJ was successfully deleted by the marker-free method. Finally, it was found that the deletion of the rapC gene in fmbJ significantly improved bacillomycin D production from 240.7 ± 18.9 to 360.8 ± 30.7 mg/L, attributed to the increased expression of bacillomycin D synthesis-related genes through enhancing the transcriptional level of comA, comP, and phrC. These results showed that the production of bacillomycin D in B. amyloliquefaciens fmbJ might be regulated by the RapC-PhrC system. The findings are expected to advance further agricultural application of Bacillus spp. as a promising source of natural bioactive compounds.
High-quality assembly of the reference genome for scarlet sage, Salvia splendens, an economically important ornamental plant.
Salvia splendens Ker-Gawler, scarlet or tropical sage, is a tender herbaceous perennial widely introduced and seen in public gardens all over the world. With few molecular resources, breeding is still restricted to traditional phenotypic selection, and the genetic mechanisms underlying phenotypic variation remain unknown. Hence, a high-quality reference genome will be very valuable for marker-assisted breeding, genome editing, and molecular genetics. We generated 66 Gb and 37 Gb of raw DNA sequences, respectively, from whole-genome sequencing of a largely homozygous scarlet sage inbred line using Pacific Biosciences (PacBio) single-molecule real-time and Illumina HiSeq sequencing platforms. The PacBio de novo assembly yielded a final genome with a scaffold N50 size of 3.12 Mb and a total length of 808 Mb. The repetitive sequences identified accounted for 57.52% of the genome sequence, and 54,008 protein-coding genes were predicted collectively with ab initio and homology-based gene prediction from the masked genome. The divergence time between S. splendens and Salvia miltiorrhiza was estimated at 28.21 million years ago (Mya). Moreover, 3,797 species-specific genes and 1,187 expanded gene families were identified for the scarlet sage genome. We provide the first genome sequence and gene annotation for the scarlet sage. The availability of these resources will be of great importance for further breeding strategies, genome editing, and comparative genomics among related species.
Yeonsan Ogye (YO), an indigenous Korean chicken breed (Gallus gallus domesticus), has entirely black external features and internal organs. In this study, the draft genome of YO was assembled using a hybrid de novo assembly method that takes advantage of high-depth Illumina short reads (376.6X) and low-depth Pacific Biosciences (PacBio) long reads (9.7X). The contig and scaffold NG50s of the hybrid de novo assembly were 362.3 Kbp and 16.8 Mbp, respectively. The completeness (97.6%) of the draft genome (Ogye_1.1) was evaluated with single-copy orthologous genes using Benchmarking Universal Single-Copy Orthologs and found to be comparable to the current chicken reference genome (galGal5; 97.4%; contigs were assembled with high-depth PacBio long reads (50X) and scaffolded with short reads) and superior to other avian genomes (92%-93%; assembled with short read-only or hybrid methods). Compared to galGal4 and galGal5, the draft genome included 551 structural variations including the fibromelanosis (FM) locus duplication, related to hyperpigmentation. To comprehensively reconstruct transcriptome maps, RNA sequencing and reduced representation bisulfite sequencing data were analyzed from 20 tissues, including 4 black tissues (skin, shank, comb, and fascia). The maps included 15,766 protein-coding and 6,900 long noncoding RNA genes, many of which were tissue-specifically expressed and displayed tissue-specific DNA methylation patterns in the promoter regions. We expect that the resulting genome sequence and transcriptome maps will be valuable resources for studying domestic chicken breeds, including black-skinned chickens, as well as for understanding genomic differences between breeds and the evolution of hyperpigmented chickens and functional elements related to hyperpigmentation.
Genome survey of the freshwater mussel Venustaconcha ellipsiformis (Bivalvia: Unionida) using a hybrid de novo assembly approach.
Freshwater mussels (Bivalvia: Unionida) serve an important role as aquatic ecosystem engineers but are one of the most critically imperilled groups of animals. Here, we used a combination of sequencing strategies to assemble and annotate a draft genome of Venustaconcha ellipsiformis, which will serve as a valuable genomic resource given the ecological value and unique “doubly uniparental inheritance” mode of mitochondrial DNA transmission of freshwater mussels. The genome described here was obtained by combining high-coverage short reads (65× genome coverage of Illumina paired-end and 11× genome coverage of mate-pair sequences) with low-coverage Pacific Biosciences long reads (0.3× genome coverage). Briefly, the final scaffold assembly accounted for a total size of 1.54 Gb (366,926 scaffolds, N50 = 6.5 kb, with 2.3% of “N” nucleotides), representing 86% of the predicted genome size of 1.80 Gb, while over one third of the genome (37.5%) consisted of repeated elements and >85% of the core eukaryotic genes were recovered. Given the repeated genetic bottlenecks of V. ellipsiformis populations as a result of glaciation events, heterozygosity was also found to be remarkably low (0.6%), in contrast to most other sequenced bivalve species. Finally, we reassembled the full mitochondrial genome and found six polymorphic sites with respect to the previously published reference. This resource opens the way to comparative genomics studies to identify genes related to the unique adaptations of freshwater mussels and their distinctive mitochondrial inheritance mechanism.
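Several of the abstracts collected here report an N50 as their headline contiguity statistic. As an illustration only, the following minimal Python sketch shows the usual definition: the length L such that sequences of length L or longer contain at least half of the assembled bases. The function name n50 and the toy scaffold lengths are hypothetical and are not taken from any of the studies. The last line checks the "86% of the predicted genome size" figure using the rounded values quoted in the V. ellipsiformis abstract.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical, not data from any study above).
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_lengths)

# Assembled fraction from the rounded figures in the abstract:
# 1.54 Gb assembled out of a predicted 1.80 Gb genome.
assembled_fraction = 1.54 / 1.80
print(toy_n50, f"{assembled_fraction:.1%}")
```

NG50, as reported for the Yeonsan Ogye assembly, uses the same procedure but takes half of the estimated genome size rather than half of the assembly size as the threshold.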
Novel Enterobacter lineage as leading cause of nosocomial outbreak involving carbapenemase-producing strains.
We investigated unusual carbapenemase-producing Enterobacter cloacae complex isolates (n = 8) in the novel sequence type (ST) 873, which caused nosocomial infections in 2 hospitals in France. Whole-genome sequence typing showed the 1-year persistence of the epidemic strain, which harbored a blaVIM-4 ST1-IncHI2 plasmid, in 1 health institution and 2 closely related strains harboring blaCTX-M-15 in the other. These isolates formed a new subgroup in the E. hormaechei metacluster, according to their hsp60 sequences and phylogenomic analysis. The average nucleotide identities, specific biochemical properties, and pangenomic and functional investigations of isolates suggested isolates of a novel species that had acquired genes associated with adhesion and mobility. The emergence of this novel Enterobacter phylogenetic lineage within hospitals should be closely monitored because of its ability to persist and spread.
Background: The Argane tree (Argania spinosa L. Skeels) is an endemic tree of southwestern Morocco that plays an important socioeconomic and ecologic role for a dense human population in an arid zone. Several studies confirmed the importance of this species as a food and feed source and as a resource for both pharmaceutical and cosmetic compounds. Unfortunately, the argane tree ecosystem is facing significant threats from environmental changes (global warming, over-population) and over-exploitation. Limited research has been conducted, however, on argane tree genetics and genomics, which hinders its conservation and genetic improvement. Methods: Here, we present a draft genome assembly of A. spinosa. A reliable reference genome of A. spinosa was created using a hybrid de novo assembly approach combining short and long sequencing reads. Results: In total, 144 Gb Illumina HiSeq reads and 7.2 Gb PacBio reads were produced and assembled. The final draft genome comprises 75 327 scaffolds totaling 671 Mb with an N50 of 49 916 kb. The draft assembly is close to the genome size estimated by k-mer distribution and covers 89% of complete and 4.3% of partial Arabidopsis orthologous groups in BUSCO. Conclusion: The A. spinosa genome will be useful for assessing biodiversity leading to efficient conservation of this endangered endemic tree. Furthermore, the genome may enable genome-assisted cultivar breeding, and provide a better understanding of important metabolic pathways and their underlying genes for both cosmetic and pharmacological purposes.
The chromosome-level genome assemblies of two rattans (Calamus simplicifolius and Daemonorops jenkinsiana).
Calamus simplicifolius and Daemonorops jenkinsiana are two representative rattans, the most significant material sources for the rattan industry. However, the lack of reference genome sequences is a major obstacle for basic and applied biology on rattan. We produced two chromosome-level genome assemblies of C. simplicifolius and D. jenkinsiana using Illumina, Pacific Biosciences, and Hi-C sequencing data. A total of ~730 Gb and ~682 Gb of raw data covered the predicted genome lengths (~1.98 Gb of C. simplicifolius and ~1.61 Gb of D. jenkinsiana) to ~372× and ~426× read depths, respectively. The two de novo genome assemblies, ~1.94 Gb and ~1.58 Gb, were generated with scaffold N50s of ~160 Mb and ~119 Mb in C. simplicifolius and D. jenkinsiana, respectively. The C. simplicifolius and D. jenkinsiana genomes were predicted to harbor 51,235 and 53,342 intact protein-coding gene models, respectively. Benchmarking Universal Single-Copy Orthologs evaluation demonstrated that genome completeness reached 96.4% and 91.3% in the C. simplicifolius and D. jenkinsiana genomes, respectively. Genome evolution showed that four Arecaceae plants clustered together, and the divergence time between the two rattans was ~19.3 million years ago. Additionally, we identified 193 and 172 genes involved in the lignin biosynthesis pathway in the C. simplicifolius and D. jenkinsiana genomes, respectively. We present the first de novo assemblies of two rattan genomes (C. simplicifolius and D. jenkinsiana). These data will not only provide a fundamental resource for functional genomics, particularly in promoting germplasm utilization for breeding, but also serve as reference genomes for comparative studies between and among different species.
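The read depths quoted for the two rattan genomes follow the standard relation: mean depth is total sequenced bases divided by genome size. The short Python sketch below applies it to the rounded values given in the abstract. The function name mean_depth is an illustrative assumption, not something the authors describe. Because the inputs are approximate ("~") figures, the results land within about 1% of the reported ~372× and ~426× rather than matching them exactly.

```python
def mean_depth(total_bases_gb, genome_size_gb):
    """Mean read depth = total sequenced bases / genome size (same units)."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return total_bases_gb / genome_size_gb


# Rounded inputs as quoted in the abstract (Gb of raw data, Gb genome size).
depth_c_simplicifolius = mean_depth(730, 1.98)
depth_d_jenkinsiana = mean_depth(682, 1.61)

print(f"C. simplicifolius: ~{depth_c_simplicifolius:.0f}x")
print(f"D. jenkinsiana:    ~{depth_d_jenkinsiana:.0f}x")
```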
The cane toad (Rhinella marina, formerly Bufo marinus) is a species native to Central and South America that has spread across many regions of the globe. Cane toads are known for their rapid adaptation and deleterious impacts on native fauna in invaded regions. However, despite an iconic status, there are major gaps in our understanding of cane toad genetics. The availability of a genome would help to close these gaps and accelerate cane toad research. We report a draft genome assembly for R. marina, the first of its kind for the Bufonidae family. We used a combination of long-read Pacific Biosciences RS II and short-read Illumina HiSeq X sequencing to generate 359.5 Gb of raw sequence data. The final hybrid assembly of 31,392 scaffolds was 2.55 Gb in length with a scaffold N50 of 168 kb. BUSCO analysis revealed that the assembly included full length or partial fragments of 90.6% of tetrapod universal single-copy orthologs (n = 3950), illustrating that the gene-containing regions have been well assembled. Annotation predicted 25,846 protein coding genes with similarity to known proteins in Swiss-Prot. Repeat sequences were estimated to account for 63.9% of the assembly. The R. marina draft genome assembly will be an invaluable resource that can be used to further probe the biology of this invasive species. Future analysis of the genome will provide insights into cane toad evolution and enrich our understanding of their interplay with the ecosystem at large.
Draft genome sequence of wild Prunus yedoensis reveals massive inter-specific hybridization between sympatric flowering cherries.
Hybridization is an important evolutionary process that results in increased plant diversity. Flowering Prunus includes popular cherry species that are appreciated worldwide for their flowers. The ornamental characteristics were acquired both naturally and through artificially hybridizing species with heterozygous genomes. Therefore, the genome of hybrid flowering Prunus presents important challenges both in plant genomics and evolutionary biology. We use long reads to sequence and analyze the highly heterozygous genome of wild Prunus yedoensis. The genome assembly covers >93% of the gene space; annotation identified 41,294 protein-coding genes. Comparative analysis of the genome with 16 accessions of six related taxa shows that 41% of the genes were assigned into the maternal or paternal state. This indicates that wild P. yedoensis is an F1 hybrid originating from a cross between maternal P. pendula f. ascendens and paternal P. jamasakura, and it can be clearly distinguished from its confusing taxon, Yoshino cherry. A focused analysis of the S-locus haplotypes of closely related taxa distributed in a sympatric natural habitat suggests that reduced restriction of inter-specific hybridization due to strong gametophytic self-incompatibility is likely to promote complex hybridization of wild Prunus species and the development of a hybrid swarm. We report the draft genome assembly of a natural hybrid Prunus species using long-read sequencing and sequence phasing. Based on a comprehensive comparative genome analysis with related taxa, it appears that cross-species hybridization in sympatric habitats is an ongoing process that facilitates the diversification of flowering Prunus.
Understanding how crop plants evolved from their wild relatives and spread around the world can inform about the origins of agriculture. Here, we review how the rapid development of genomic resources and tools has made it possible to conduct genetic mapping and population genetic studies to unravel the molecular underpinnings of domestication and crop evolution in diverse crop species. We propose three future avenues for the study of crop evolution: establishment of high-quality reference genomes for crops and their wild relatives; genomic characterization of germplasm collections; and the adoption of novel methodologies such as archaeogenetics, epigenomics, and genome editing.
Characterization of the antimonite- and arsenite-oxidizing bacterium Bosea sp. AS-1 and its potential application in arsenic removal.
Arsenic (As) and antimony (Sb) usually coexist in natural environments where both of them pollute soils and water. Microorganisms that oxidize arsenite [As(III)] and tolerate Sb have great potential in As and Sb bioremediation. In this study, a Gram-negative bacterial strain, Bosea sp. AS-1, was isolated from a mine slag sample collected in Xikuangshan Sb mine in China. AS-1 could tolerate 120 mM of As(III) and 50 mM of antimonite [Sb(III)]. It could also oxidize 2 mM of As(III) or Sb(III) completely under heterotrophic and aerobic conditions. Interestingly, strain AS-1 preferred to oxidize As(III) with yeast extract as the carbon source, whereas Sb(III) oxidation was favored with lactate in the medium. Genomic analysis of AS-1 confirmed the presence of several gene islands for As resistance and oxidation. Notably, a system of AS-1 and goethite was found to be able to remove 99% of the As with the initial concentration of 500 µg/L As(III) and 500 µg/L Sb(III), which suggests the potential of this approach for As removal in environments especially with the presence of high Sb. Copyright © 2018 Elsevier B.V. All rights reserved.
Hilsa shad (Tenualosa ilisha) is a popular fish of Bangladesh belonging to the Clupeidae family. An anadromous species, like the salmon and many other migratory fish, it is a unique species that lives in the sea and travels to freshwater rivers for spawning. During its entire life, Tenualosa ilisha migrates both from sea to freshwater and vice versa. The genome of Tenualosa ilisha collected from the river Padma of Rajshahi, Bangladesh has been sequenced, and its de novo hybrid assembly and structural annotations are reported here. Illumina and PacBio sequencing platforms were used for high-depth sequencing, and the draft genome assembly was found to be 816 Mb with an N50 size of 188 kb. The MAKER gene annotation tool predicted 31,254 gene models. Benchmarking Universal Single-Copy Orthologs analysis indicated 95% completeness of the assembled genome.