Crop domestication changed the course of human evolution, and the domestication of maize (Zea mays L. subspecies mays), today the world’s most important crop, enabled civilizations to flourish and has played a major role in shaping the world we know today. Archaeological and ethnobotanical research helps us understand the development of the cultures and the movements of the peoples who carried maize to new areas, where it continued to adapt. Ancient remains of maize cobs and kernels have been found in the place of domestication, the Balsas River Valley (~9,000 years before present), and in the cultivation center, the Tehuacan Valley (~5,000 years before present), and have been used to study the process of domestication. Paleogenomic data showed that some of the genes controlling stem and inflorescence architecture were comparable to those of modern maize, while other genes, controlling ear shattering and starch biosynthesis, retained high levels of variability similar to those found in the wild relative teosinte. These results indicate that the domestication process was both gradual and complex, with different genetic loci selected at different points in time, and that the transformation of teosinte into maize was completed within the last 5,000 years. Mesoamerican native cultures domesticated teosinte and developed maize from a 6 cm long, popping-kernel ear into what we now recognize as modern maize, with its wide variation in ear size and in kernel texture, color and size, and its suitability for diverse uses; they also invented nixtamalization, a process key to maximizing its nutritional value. Used directly for human and animal consumption, processed food products, bioenergy, and many cultural applications, maize is now grown on six of the world’s seven continents. The study of its evolution and domestication from the wild grass teosinte helps us understand the nature of the genetic diversity and gene expression of maize and its wild relatives.
Genetic barriers to the direct use of teosinte or Tripsacum in maize breeding have challenged our ability to identify valuable genes and traits, let alone incorporate them into elite, modern varieties. Genomic information and newer genetic technologies will facilitate the use of wild relatives in crop improvement; hence it is more important than ever to ensure their conservation and availability, which are fundamental to future food security. In situ conservation efforts dedicated to preserving remnant populations of wild relatives in Mexico are key to safeguarding the genetic diversity of maize and its gene pool, as well as to enabling these species to continue adapting to dynamic climate and environmental changes. Ex situ genebank efforts are crucial to securely maintain collected wild relative resources and to provide them for gene discovery and other research.
Genome analysis and genetic transformation of a water surface-floating microalga Chlorococcum sp. FFG039.
Microalgal harvesting and dewatering are the main bottlenecks that need to be overcome to tap the potential of microalgae for the production of valuable compounds. Water surface-floating microalgae form robust biofilms, float on the water surface along with gas bubbles entrapped under the biofilms, and have great potential to overcome these bottlenecks. However, little is known about the molecular mechanisms involved in the water surface-floating phenotype. In the present study, we analysed the genome sequence of the water surface-floating microalga Chlorococcum sp. FFG039 using next-generation sequencing to elucidate the underlying mechanisms. A comparative genomic analysis of Chlorococcum sp. FFG039 and other, non-floating green microalgae revealed several gene families unique to this floating microalga that may be involved in biofilm formation. Furthermore, genetic transformation of this microalga was achieved by electroporation. The genome information and transformation techniques presented in this study will be useful for obtaining molecular insights into the water surface-floating phenotype of Chlorococcum sp. FFG039.
Extended insight into the Mycobacterium chelonae-abscessus complex through whole genome sequencing of Mycobacterium salmoniphilum outbreak and Mycobacterium salmoniphilum-like strains.
Members of the Mycobacterium chelonae-abscessus complex (MCAC) are close to the mycobacterial ancestor and include human, animal and fish pathogens. We present the genomes of 14 members of this complex: the complete genomes of the Mycobacterium salmoniphilum and Mycobacterium chelonae type strains, seven M. salmoniphilum isolates, and five M. salmoniphilum-like strains, including strains isolated during an outbreak in an animal facility at Uppsala University. Average nucleotide identity (ANI) analysis and core gene phylogeny revealed that the M. salmoniphilum-like strains are variants of the human pathogen Mycobacterium franklinii and are phylogenetically close to Mycobacterium abscessus. Our data further suggest that M. salmoniphilum separates into three branches, named groups I, II and III, with the M. salmoniphilum type strain belonging to group II. Among the predicted virulence factors is phospholipase C (plcC), a major virulence factor that makes M. abscessus highly cytotoxic to mouse macrophages. Its presence, together with the fact that M. franklinii was originally isolated from infected humans, makes it plausible that the outbreak in the animal facility was caused by an M. salmoniphilum-like strain. Interestingly, an M. salmoniphilum-like strain was isolated from tap water, suggesting that it can be present in the environment. Moreover, we predicted mutational hotspots in the M. salmoniphilum isolates, and 26% of these hotspots overlap with genes categorized as having roles in virulence, disease and defense. We also provide data on key genes involved in transcription and translation, such as sigma factor, ribosomal protein and tRNA genes.
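The 26% figure above is, at its core, the fraction of hotspot intervals that overlap at least one gene in a functional category. A minimal sketch of that calculation follows, assuming hotspots and genes are given as half-open (start, end) coordinates on a single sequence; the function names and example coordinates are hypothetical and not taken from the study, which may have defined overlap differently.

```python
from typing import List, Tuple

# Half-open [start, end) coordinates on one sequence (an assumption of this sketch).
Interval = Tuple[int, int]

def overlaps(a: Interval, b: Interval) -> bool:
    """Two half-open intervals overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]

def fraction_overlapping(hotspots: List[Interval], genes: List[Interval]) -> float:
    """Fraction of hotspots that overlap at least one gene interval."""
    if not hotspots:
        return 0.0
    hits = sum(1 for h in hotspots if any(overlaps(h, g) for g in genes))
    return hits / len(hotspots)

# Hypothetical example: two of four hotspots fall in "virulence" genes.
hotspots = [(100, 150), (400, 420), (900, 950), (1200, 1300)]
virulence_genes = [(0, 120), (1250, 2000)]
print(fraction_overlapping(hotspots, virulence_genes))  # 0.5
```

The pairwise scan is quadratic; for genome-scale annotations, a sorted sweep or an interval tree would be the more usual choice.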
A chromosome-level genome assembly of Cydia pomonella provides insights into chemical ecology and insecticide resistance.
The codling moth Cydia pomonella, a major invasive pest of pome fruit, has spread around the globe in the last half century. We generated a chromosome-level scaffold assembly including the Z chromosome and a portion of the W chromosome. This assembly reveals the duplication of an olfactory receptor gene (OR3), which we demonstrate enhances the ability of C. pomonella to exploit kairomones and pheromones in locating both host plants and mates. Genome-wide association studies contrasting insecticide-resistant and susceptible strains identify hundreds of single nucleotide polymorphisms (SNPs) potentially associated with insecticide resistance, including three SNPs in the promoter of CYP6B2. RNAi knockdown of CYP6B2 increases the sensitivity of C. pomonella to two insecticides, deltamethrin and azinphos-methyl. The high-quality genome assembly of C. pomonella sheds light on the genetic basis of its invasiveness, suggesting that the codling moth has distinctive capabilities and adaptive potential that may explain its worldwide expansion.
Metagenomic sequence classification should be fast, accurate and information-rich. Emerging long-read sequencing technologies promise to improve the balance between these factors, but most existing methods were designed for short reads. MetaMaps is a new method, specifically developed for long reads, capable of mapping a long-read metagenome to a comprehensive RefSeq database with >12,000 genomes in <16 GB of RAM on a laptop computer. Integrating approximate mapping with probabilistic scoring and EM-based estimation of sample composition, MetaMaps achieves >94% accuracy for species-level read assignment and r² > 0.97 for the estimation of sample composition on both simulated and real data when the sample genomes or close relatives are present in the classification database. To address novel species and genera, which are comparatively harder to predict, MetaMaps outputs mapping locations and qualities for all classified reads, enabling functional studies (e.g. gene presence/absence) and the detection of incongruities between sample and reference genomes.
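MetaMaps' own scoring model is more elaborate; the sketch below only illustrates the generic expectation-maximization (EM) idea behind estimating sample composition from ambiguous read assignments. It assumes each read already comes with a per-genome likelihood (a hypothetical input format, not MetaMaps' output), and it alternates an E-step, which splits each read among candidate genomes in proportion to current abundance times likelihood, with an M-step, which renormalizes the expected read counts into a new composition.

```python
from typing import Dict, List

def em_composition(read_likelihoods: List[Dict[str, float]], n_iter: int = 200) -> Dict[str, float]:
    """Estimate genome abundances by expectation-maximization.

    read_likelihoods[i][g] is an (assumed) likelihood P(read i | genome g);
    genomes absent from a read's dict are treated as likelihood 0.
    """
    genomes = sorted({g for read in read_likelihoods for g in read})
    if not genomes:
        return {}
    pi = {g: 1.0 / len(genomes) for g in genomes}  # uniform starting composition
    for _ in range(n_iter):
        expected = {g: 0.0 for g in genomes}
        for read in read_likelihoods:
            # E-step: posterior probability that this read came from each genome.
            weights = {g: pi[g] * lik for g, lik in read.items()}
            total = sum(weights.values())
            if total == 0.0:
                continue  # read is unassignable under the current composition
            for g, w in weights.items():
                expected[g] += w / total
        # M-step: the new composition is the normalized expected read count.
        n = sum(expected.values())
        if n == 0.0:
            break
        pi = {g: c / n for g, c in expected.items()}
    return pi

# Hypothetical data: 6 reads unique to A, 2 unique to B, 2 equally compatible with both.
reads = [{"A": 1.0}] * 6 + [{"B": 1.0}] * 2 + [{"A": 1.0, "B": 1.0}] * 2
est = em_composition(reads)
print(round(est["A"], 3), round(est["B"], 3))  # 0.75 0.25
```

The ambiguous reads end up split 3:1 rather than 1:1, because the uniquely mapped reads pull the abundance estimate toward A; this is the practical advantage of EM over naive equal splitting.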
Efficient crop improvement depends on the application of accurate genetic information contained in diverse germplasm resources. Here we report a reference-grade genome of wild soybean accession W05, with a final assembled genome size of 1013.2 Mb and a contig N50 of 3.3 Mb. The analytical power of the W05 genome is demonstrated by several examples. First, we identify an inversion at the locus determining seed coat color during domestication. Second, we show that a translocation between chromosomes 11 and 13 in some genotypes interferes with the assignment of QTLs. Third, we find a region containing copy number variations of the Kunitz trypsin inhibitor (KTI) genes. Such findings illustrate the power of this assembly in the analysis of large structural variations in soybean germplasm collections. The wild soybean genome assembly has wide applications in comparative genomic and evolutionary studies, as well as in crop breeding and improvement programs.
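The contig N50 quoted above is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch using that common definition follows; the toy contig lengths are hypothetical, and some tools use a strict "more than half" threshold instead.

```python
from typing import Iterable

def n50(lengths: Iterable[int]) -> int:
    """Contig N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    return 0  # empty assembly

print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```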
The human microbiome includes trillions of bacteria, many of which play a vital role in host physiology. Numerous studies have now detected bacterial DNA in first-pass meconium and amniotic fluid samples, suggesting that the human microbiome may commence in utero. However, these data have remained contentious due to underlying contamination issues. Here, we used a previously described method for reducing contamination in microbiome workflows to determine whether there is a fetal bacterial microbiome beyond the level of background contamination. We recruited 50 women undergoing non-emergency cesarean section deliveries with no evidence of intra-uterine infection and collected first-pass meconium and amniotic fluid samples. Full-length 16S rRNA gene sequencing was performed using PacBio SMRT cell technology to allow high-resolution profiling of the fetal gut and amniotic fluid bacterial microbiomes. Levels of inflammatory cytokines were measured in amniotic fluid, and levels of immunomodulatory short-chain fatty acids (SCFAs) were quantified in meconium. All meconium samples and most amniotic fluid samples (36/43) contained bacterial DNA. The meconium microbiome was dominated by reads that mapped to Pelomonas puraquae; aside from this species, it was remarkably heterogeneous between patients. The amniotic fluid microbiome was more diverse and contained mainly reads that mapped to typical skin commensals, including Propionibacterium acnes and Staphylococcus spp. All meconium samples contained acetate and propionate, at ratios similar to those previously reported in infants. P. puraquae reads were inversely correlated with meconium propionate levels. Amniotic fluid cytokine levels were associated with the amniotic fluid microbiome. Our results demonstrate that bacterial DNA and SCFAs are present in utero and have the potential to influence the developing fetal immune system.
RNA-seq of HaHV-1-infected abalones reveals a common transcriptional signature of Malacoherpesviruses.
Haliotid herpesvirus-1 (HaHV-1) is the causative agent of abalone viral ganglioneuritis, a disease that has severely affected gastropod aquaculture. Although limited, the sequence similarity between HaHV-1 and Ostreid herpesvirus-1 supported the assignment of both viruses to the Malacoherpesviridae, a Herpesvirales family distantly related to other viruses. In this study, we report the first transcriptional data for HaHV-1, obtained from an experimental infection of Haliotis diversicolor supertexta. We also sequenced a draft genome of the Chinese HaHV-1 variant isolated in 2003 (HaHV-1-CN2003) using PacBio technology. Analysis of 13 million reads obtained from 3 RNA samples at 60 hours post injection (hpi) allowed the prediction of 51 new ORFs, for a total of 117 viral genes, and the identification of 207 variations from the reference genome, consisting of 135 single nucleotide polymorphisms (SNPs) and 72 insertions or deletions (InDels). The pairing of genomic and transcriptomic data supported the identification of 60 additional SNPs, representing viral transcriptional variability and preferentially grouped in hotspots. The expression analysis of HaHV-1 ORFs revealed one putative secreted protein, two putative capsid proteins and a possible viral capsid protease as the most highly expressed genes, and demonstrated highly synchronized viral expression patterns in the 3 infected animals at 60 hpi. Quantitative reverse transcription data for 37 viral genes supported a burst of viral transcription at 30 and 60 hpi during the 72 hours of the infection experiment, and allowed early and late viral genes to be distinguished.
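The 207 variations above are partitioned into SNPs and InDels. A minimal sketch of that partition follows, assuming simple biallelic calls written as VCF-style REF/ALT alleles that have already been normalized; the extra MNP category (equal-length multi-base substitutions) is an assumption added for completeness and is not part of the study's scheme, and the example calls are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

def classify_variant(ref: str, alt: str) -> str:
    """Classify a biallelic variant from VCF-style REF/ALT alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "InDel"  # bases inserted or deleted relative to the reference
    return "MNP"  # equal-length multi-base substitution

def tally(variants: Iterable[Tuple[str, str]]) -> Counter:
    """Count variants per class."""
    return Counter(classify_variant(ref, alt) for ref, alt in variants)

# Hypothetical calls, not data from the study.
calls = [("A", "G"), ("C", "T"), ("A", "AT"), ("GTC", "G"), ("AC", "GT")]
print(tally(calls))
```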
Systematic identification of intergenic long-noncoding RNAs in mouse retinas using full-length isoform sequencing.
A large number of long noncoding RNAs (lncRNAs) have been identified in the mouse genome, and growing evidence over recent decades has revealed their crucial roles in diverse biological processes. Nevertheless, the biological roles of lncRNAs in the mouse retina remain largely unknown due to the lack of a comprehensive annotation of lncRNAs expressed in the retina. In this study, we applied a long-read sequencing strategy to unravel the transcriptomes of developing mouse retinas and identified a total of 940 intergenic lncRNAs (lincRNAs) in embryonic and neonatal retinas, about 13% of which were transcribed from unannotated gene loci. Subsequent analysis revealed that the functions of lincRNAs expressed in mouse retinas were closely related to the physiological roles of this tissue; these included 90 lincRNAs that were differentially expressed after the functional loss of key regulators of retinal ganglion cell (RGC) differentiation. In situ hybridization demonstrated the enrichment of three lincRNAs adjacent to class IV POU-homeobox genes (linc-3a, linc-3b and linc-3c) in the ganglion cell layer and indicated that they were potentially RGC-specific. In summary, this study systematically annotated the lincRNAs expressed in embryonic and neonatal mouse retinas and suggested their crucial regulatory roles in retinal development, such as in RGC differentiation.
Focal oncogene amplification and rearrangements drive tumor growth and evolution in multiple cancer types. We present AmpliconArchitect (AA), a tool to reconstruct the fine structure of focally amplified regions using whole genome sequencing (WGS), and validate it extensively on multiple simulated and real datasets across a wide range of coverages and copy numbers. Analysis of AA-reconstructed amplicons in a pan-cancer dataset reveals many novel properties of copy number amplifications in cancer. These findings support a model in which focal amplifications arise from the formation and replication of extrachromosomal DNA. Applying AA to 68 viral-mediated cancer samples, we identify a large fraction of amplicons with specific structural signatures suggestive of hybrid human-viral extrachromosomal DNA. AA reconstruction, integrated with metaphase fluorescence in situ hybridization (FISH) and PacBio sequencing of the cell line UPCI:SCC090, confirms the extrachromosomal origin and fine structure of a Forkhead box E1 (FOXE1)-containing hybrid amplicon.
The ability to generate long sequencing reads and access long-range linkage information is revolutionizing the quality and completeness of genome assemblies. Here we use a hybrid approach that combines data from four genome sequencing and mapping technologies to generate a new genome assembly of the honeybee Apis mellifera. We first generated contigs based on PacBio sequencing libraries, which were then merged with linked-read 10x Chromium data, followed by scaffolding using a BioNano optical genome map and a Hi-C chromatin interaction map, complemented by a genetic linkage map. Each of the assembly steps reduced the number of gaps and incorporated a substantial amount of additional sequence into scaffolds. The new assembly (Amel_HAv3) is significantly more contiguous and complete than the previous one (Amel_4.5), which was based mainly on Sanger sequencing reads. The contig N50 is 120-fold higher (5.381 Mbp compared to 0.053 Mbp), and we anchor >98% of the sequence to chromosomes. All 16 chromosomes are represented as single scaffolds, with an average of three sequence gaps per chromosome. The improvements are largely due to the inclusion of repetitive sequence that was unplaced in previous assemblies. In particular, our assembly is highly contiguous across centromeres and telomeres and includes hundreds of AvaI and AluI repeats associated with these features. The improved assembly will be of utility for refining gene models, studying genome function, mapping functional genetic variation, identifying structural variants, and comparative genomics.
A First Study of the Virulence Potential of a Bacillus subtilis Isolate From Deep-Sea Hydrothermal Vent.
Bacillus subtilis is the best-studied Gram-positive bacterium, primarily as a model of cell differentiation and for industrial exploitation. To date, little is known about the virulence of B. subtilis. In this study, we examined the virulence potential of a B. subtilis strain (G7) isolated from the Iheya North hydrothermal field of the Okinawa Trough. G7 is aerobic, motile, endospore-forming, and requires NaCl for growth. The genome of G7 is composed of one circular chromosome of 4,216,133 base pairs with an average GC content of 43.72%. G7 contains 4,416 coding genes; 27.5% of these could not be annotated, and the remaining 72.5% were annotated with known or predicted functions in 25 different COG categories. Ten sets of 23S, 5S, and 16S ribosomal RNA operons, 86 tRNA and 14 sRNA genes, 50 tandem repeats, 41 minisatellites, one microsatellite, and 42 transposons were identified in G7. Compared with the genome of the B. subtilis wild-type strain NCIB 3610T, the G7 genome contains many genomic translocations, inversions, and insertions, and twice as many genomic islands (GIs), with 42.5% of GI genes encoding hypothetical proteins. G7 possesses abundant putative virulence genes associated with adhesion, invasion, dissemination, anti-phagocytosis, and intracellular survival. Experimental studies showed that G7 was able to cause mortality in fish and mice following intramuscular/intraperitoneal injection, resist the killing effect of serum complement, and replicate in mouse macrophages and fish peripheral blood leukocytes. Taken together, our study indicates that G7 is a B. subtilis isolate with unique genetic features that can be lethal to vertebrate animals when introduced into them by artificial means. These results provide the first insight into the potential harmfulness of deep-sea B. subtilis.
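The GC content of 43.72% reported for the G7 chromosome is the share of G and C among its bases. A minimal sketch follows, assuming ambiguous bases such as N are excluded from the denominator; that is one common convention, not necessarily the one used in the study, and the toy sequence is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous A/C/G/T bases (ambiguous bases such as N are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt

print(gc_content("ATGCGCNNAT"))  # 50.0
```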
Comparative genomics and pathogenicity potential of members of the Pseudomonas syringae species complex on Prunus spp.
Diseases on Prunus spp. have been associated with a large number of phylogenetically different pathovars and species within the P. syringae species complex. Despite their economic significance, there is a severe lack of genomic information on these pathogens. The high phylogenetic diversity observed among strains causing disease on Prunus spp. in nature raised the question of whether other strains or species within the P. syringae species complex were potentially pathogenic on Prunus spp. To gain insight into the genomic potential for adaptation and virulence in Prunus spp., twelve strains of P. syringae pathovars and species found in association with diseases on cherry (sweet, sour and ornamental cherry) and peach were de novo whole-genome sequenced. Strains sequenced in this study covered three phylogroups and four clades. These strains were screened in vitro for pathogenicity on Prunus spp. together with additional genome-sequenced strains, thus covering nine of the thirteen currently defined P. syringae phylogroups. Pathogenicity tests revealed that most of the strains caused symptoms in vitro, and comparative genomics found no obvious link between the presence of known virulence factors and the observed pathogenicity pattern. Non-pathogenic strains displayed a two- to three-fold longer generation time when grown in rich medium. In this study, the first set of complete genomes of cherry-associated P. syringae strains as well as the draft genome of the quarantine peach pathogen P. syringae pv. persicae were generated. The obtained genomic data were matched with phenotypic data in order to determine factors related to pathogenicity to Prunus spp. Results of this study suggest that the inability to cause disease on Prunus spp. in vitro is not the result of host specialization but is rather linked to metabolic impairments of individual strains.
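The generation (doubling) time compared above can be estimated from two measurements taken during exponential growth as g = (t2 - t1) * ln 2 / ln(N2/N1). A minimal sketch follows, assuming optical density is proportional to cell number within the exponential phase; the function name and readings are hypothetical, and the study may have derived generation times differently, for example by fitting whole growth curves.

```python
import math

def generation_time(t1: float, n1: float, t2: float, n2: float) -> float:
    """Doubling time from two points in exponential growth.

    n1 and n2 are cell densities (or OD readings assumed proportional to them)
    at times t1 < t2; the result is in the same time unit as t1 and t2.
    """
    if t2 <= t1 or n1 <= 0 or n2 <= n1:
        raise ValueError("need two increasing time points with positive growth")
    doublings = math.log(n2 / n1) / math.log(2)
    return (t2 - t1) / doublings

# Hypothetical readings: OD 0.1 -> 0.8 in 4 h is three doublings.
print(round(generation_time(0.0, 0.1, 4.0, 0.8), 2))  # 1.33
```

On this definition, a strain with a two- to three-fold longer generation time needs two to three times as long for each doubling of the population.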
Metaepigenomic analysis reveals the unexplored diversity of DNA methylation in an environmental prokaryotic community.
DNA methylation plays important roles in prokaryotes, and its genomic landscapes, the prokaryotic epigenomes, have recently begun to be revealed. However, our knowledge of prokaryotic methylation systems is focused on those of culturable microbes, which are rare in nature. Here, we used single-molecule real-time and circular consensus sequencing techniques to reveal the ‘metaepigenomes’ of a microbial community in Lake Biwa, the largest lake in Japan. We reconstructed 19 draft genomes from diverse bacterial and archaeal groups, most of which are yet to be cultured. The analysis of DNA chemical modifications in those genomes revealed 22 methylated motifs, nine of which were novel. We identified methyltransferase genes likely responsible for methylation of the novel motifs, and confirmed the catalytic specificities of four of them via transformation experiments using synthetic genes. Our study highlights metaepigenomics as a powerful approach for identifying the vast unexplored variety of prokaryotic DNA methylation systems in nature.
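Methylated motifs such as those reported here are usually written in IUPAC degenerate nucleotide notation (GATC, the Dam methylation site, is a familiar non-degenerate example). A minimal sketch of locating a motif's occurrences in a genome follows, assuming a plain DNA string and a forward-strand search only; the `IUPAC` table reflects the standard nucleotide codes, while `motif_positions`, the toy sequences and the degenerate example motif GANTTC are hypothetical. Non-palindromic motifs would also need a reverse-complement search.

```python
import re
from typing import List

# Standard IUPAC nucleotide codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_positions(seq: str, motif: str) -> List[int]:
    """0-based start positions of a degenerate motif on the forward strand.

    A zero-width lookahead is used so that overlapping occurrences are all reported.
    """
    pattern = "".join(IUPAC[base] for base in motif.upper())
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]

print(motif_positions("GATCGATC", "GATC"))        # [0, 4]
print(motif_positions("GAATTCGAGTTC", "GANTTC"))  # [0, 6]
```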
Long-read based de novo assembly of low-complexity metagenome samples results in finished genomes and reveals insights into strain diversity and an active phage system.
Complete and contiguous genome assemblies greatly improve the quality of subsequent systems-wide functional profiling studies and the ability to gain novel biological insights. While a de novo genome assembly of an isolated bacterial strain is in most cases straightforward, more informative data about co-existing bacteria, as well as about synergistic and antagonistic effects, can be obtained from a direct analysis of microbial communities. However, the complexity of metagenomic samples represents a major challenge. While third-generation sequencing technologies have been suggested to enable finished metagenome-assembled genomes, to our knowledge the complete genome assembly of all dominant strains in a microbiome sample has not been demonstrated. Natural whey starter cultures (NWCs) are used in cheese production and represent low-complexity microbiomes. Previous studies of Swiss Gruyère and selected Italian hard cheeses, mostly based on amplicon metagenomics, concurred that three species generally predominate: Streptococcus thermophilus, Lactobacillus helveticus and Lactobacillus delbrueckii. Two NWCs from Swiss Gruyère producers were subjected to whole metagenome shotgun sequencing using the Pacific Biosciences Sequel and Illumina MiSeq platforms. In addition, longer Oxford Nanopore Technologies MinION reads had to be generated for one of the two NWCs to resolve repeat regions. In this way, we achieved the complete assembly of all dominant bacterial genomes from these low-complexity NWCs, which was corroborated by a 16S rRNA amplicon survey. Moreover, two distinct L. helveticus strains were successfully co-assembled from the same sample. Besides bacterial chromosomes, we could also assemble several bacterial plasmids and phages and a corresponding prophage.
Biologically relevant insights were uncovered by linking the plasmids and phages to their respective host genomes, using DNA methylation motifs on the plasmids and matching prokaryotic CRISPR spacers with the corresponding protospacers on the phages. These results could only be achieved by employing long-read sequencing data able to span intragenomic as well as intergenomic repeats. Here, we demonstrate the feasibility of complete de novo genome assembly of all dominant strains from low-complexity NWCs based on whole metagenome shotgun sequencing data. This allowed us to gain novel biological insights and is a fundamental basis for subsequent systems-wide omics analyses, functional profiling and phenotype-to-genotype analysis of specific microbial communities.
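Linking a phage to its host via CRISPR spacers amounts to finding stretches of the phage genome (protospacers) that match a host spacer on either strand, usually allowing a few mismatches. A minimal brute-force sketch follows, assuming uppercase A/C/G/T sequences; the function names, mismatch threshold and toy sequences are hypothetical. Practical analyses often rely on an aligner such as BLASTN instead and may additionally check for a protospacer-adjacent motif (PAM), which this sketch ignores.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_protospacers(spacer: str, phage: str, max_mismatches: int = 2) -> List[Tuple[int, str, int]]:
    """Return (position, strand, mismatches) for every phage window matching the
    spacer ('+') or its reverse complement ('-') with at most max_mismatches
    substitutions. Indels and PAM requirements are ignored in this sketch."""
    spacer, phage = spacer.upper(), phage.upper()
    k = len(spacer)
    hits = []
    for strand, query in (("+", spacer), ("-", revcomp(spacer))):
        for i in range(len(phage) - k + 1):
            mismatches = sum(a != b for a, b in zip(query, phage[i:i + k]))
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return sorted(hits)

# Hypothetical sequences: the spacer occurs exactly once on each strand.
phage = "TTTACGTACGGGTTT" + "AA" + "CCGTACGT" + "AA"
print(find_protospacers("ACGTACGG", phage, max_mismatches=0))  # [(3, '+', 0), (17, '-', 0)]
```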