Background Assemblies of diploid genomes are generally unphased, pseudo-haploid representations that do not correctly reconstruct the two parental haplotypes present in the individual sequenced. Instead, the assembly alternates between parental haplotypes and may contain duplications in regions where the parental haplotypes are sufficiently different. Trio binning is an approach to genome assembly that uses short reads from both parents to classify long reads from the offspring according to maternal or paternal haplotype origin, and is thus helped rather than impeded by heterozygosity. Using this approach, it is possible to derive two assemblies from an individual, accurately representing both parental contributions in their entirety with higher continuity and accuracy than is possible with other methods. Results We used trio binning to assemble reference genomes for two species from a single individual using an interspecies cross of yak (Bos grunniens) and cattle (Bos taurus). The high heterozygosity inherent to interspecies hybrids allowed us to confidently assign >99% of long reads from the F1 offspring to parental bins using unique k-mers from parental short reads. Both the maternal (yak) and paternal (cattle) assemblies contain over one third of the acrocentric chromosomes, including the two largest chromosomes, in single haplotigs. Conclusions These haplotigs are the first vertebrate chromosome arms to be assembled gap-free and fully phased, and the first time assemblies for two species have been created from a single individual. Both assemblies are the most continuous currently available for non-model vertebrates. Abbreviations: Mb, megabases; kb, kilobases; MYA, millions of years ago; MHC, major histocompatibility complex; SMRT, single molecule real time.
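The read classification step described above can be illustrated with a minimal sketch. It assumes exact k-mer matching, a simple majority vote, and no reverse-complement handling or normalization by k-mer set size; the function names and the toy sequences are hypothetical and are not taken from the study or from any published trio-binning tool.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    # K-mers shared by both parents carry no haplotype information, so drop them.
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign an offspring long read to a parental bin by counting unique k-mer hits."""
    read_kmers = kmers(read, k)
    mat_hits = len(read_kmers & mat_only)
    pat_hits = len(read_kmers & pat_only)
    if mat_hits > pat_hits:
        return "maternal"
    if pat_hits > mat_hits:
        return "paternal"
    # Ties (including reads with no informative k-mers) stay unassigned.
    return "unassigned"


# Toy example: the two parental reads differ at a single base.
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], k=4)
labels = [bin_read(r, mat_only, pat_only, k=4) for r in ("GTACGT", "GTTCGT", "ACGT")]
```

The more the parental haplotypes differ, the more parent-specific k-mers each long read carries, which is why high heterozygosity helps this kind of binning.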
After nearly two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no one chromosome has been finished end to end, and hundreds of unresolved gaps persist. The remaining gaps include ribosomal rDNA arrays, large near-identical segmental duplications, and satellite DNA arrays. These regions harbor largely unexplored variation of unknown consequence, and their absence from the current reference genome can lead to experimental artifacts and hide true variants when re-sequencing additional human genomes. Here we present a de novo human genome assembly that surpasses the continuity of GRCh38, along with the first gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome, we reconstructed the ~2.8 megabase centromeric satellite DNA array and closed all 29 remaining gaps in the current reference, including new sequence from the human pseudoautosomal regions and cancer-testis ampliconic gene families (CT-X and GAGE). This complete chromosome X, combined with the ultra-long nanopore data, also allowed us to map methylation patterns across complex tandem repeats and satellite arrays for the first time. These results demonstrate that finishing the human genome is now within reach and will enable ongoing efforts to complete the remaining human chromosomes.
Long metabarcoding of the eukaryotic rDNA operon to phylogenetically and taxonomically resolve environmental diversity
High-throughput environmental DNA metabarcoding has revolutionized the analysis of microbial diversity, but this approach is generally restricted to amplicon sizes below 500 base pairs. These short regions contain limited phylogenetic signal, which makes it impractical to use environmental DNA in full phylogenetic inferences. However, new long-read sequencing technologies such as the Pacific Biosciences platform may provide sufficiently large sequence lengths to overcome the poor phylogenetic resolution of short amplicons. To test this idea, we amplified soil DNA and used PacBio Circular Consensus Sequencing (CCS) to obtain a ~4500 bp region of the eukaryotic rDNA operon spanning most of the small (18S) and large subunit (28S) ribosomal RNA genes. The CCS reads were first treated with a novel curation workflow that generated 650 high-quality OTUs containing the physically linked 18S and 28S regions of the long amplicons. In order to assign taxonomy to these OTUs, we developed a phylogeny-aware approach based on the 18S region that showed greater accuracy and sensitivity than similarity-based and phylogenetic placement-based methods using shorter reads. The taxonomically annotated OTUs were then combined with available 18S and 28S reference sequences to infer a well-resolved phylogeny spanning all major groups of eukaryotes, allowing the evolutionary origin of environmental diversity to be derived accurately. A total of 1019 sequences were included, of which a majority (58%) corresponded to the new long environmental CCS reads. Comparisons to the 18S-only region of our amplicons revealed that the combined 18S-28S genes globally increased the phylogenetic resolution, recovering specific groupings otherwise missing. The long reads also allowed direct investigation of the relationships among environmental sequences themselves, which represents a key advantage over the placement of short reads on a reference phylogeny. 
Altogether, our results show that long amplicons can be treated in a full phylogenetic framework to provide greater taxonomic resolution and a robust evolutionary perspective to environmental DNA.
Forest tree species are increasingly subject to severe mortalities from exotic pests, diseases, and invasive organisms, accelerated by climate change. Forest health issues are threatening multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species, or among surviving trees, introgression of resistance genes into threatened tree species in reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being employed as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of a reference genome with chromosome-scale sequences for Chinese chestnut (C. mollissima), the disease-resistance donor for American chestnut restoration. We also demonstrate the value of the genome as a platform for research and species restoration, including new insights into the evolution of blight resistance in Asian chestnut species, the locations in the genome of ecologically important signatures of selection differentiating American chestnut from Chinese chestnut, the identification of candidate genes for disease resistance, and preliminary comparisons of genome organization with related species.
Complete genome sequence provides insights into the quorum sensing-related spoilage potential of Shewanella baltica 128 isolated from spoiled shrimp.
Shewanella baltica 128 is a specific spoilage organism (SSO) isolated from refrigerated shrimp that causes shrimp spoilage. This study reports the complete genome sequence of this strain, in which the primary annotations were associated with amino acid transport and metabolism (8.66%), indicating that S. baltica 128 has good potential for degrading proteins. In vitro experiments revealed that S. baltica 128 could adapt to stress conditions by regulating its growth and biofilm formation. Genes related to spoilage-related metabolic pathways, including trimethylamine metabolism (torT), sulfur metabolism (cysM), putrescine metabolism (speC), biofilm formation (rpoS) and serine protease production (degS), were identified. Genes related to the specific QS system (LuxS, pfs, LuxR and qseC) were also identified. The complete genome sequence of S. baltica 128 provides insights into its QS-related spoilage potential, which might provide novel information for the development of new approaches for spoilage detection and prevention based on QS targets. Copyright © 2019. Published by Elsevier Inc.
The development of high-throughput sequencing techniques has greatly benefited our understanding of microbial ecology; yet the methods producing short reads suffer from poor species-level resolution and uncertainty of identification. Here we optimize PacBio-based metabarcoding protocols covering the Internal Transcribed Spacer (ITS region) and partial Small Subunit (SSU) of the rRNA gene for species-level identification of all eukaryotes, with a specific focus on Fungi (including Glomeromycota) and Stramenopila (particularly Oomycota). Based on tests on composite soil samples and mock communities, we propose the most suitable degenerate primers, ITS9munngs + ITS4ngsUni, for eukaryotes and selected groups therein, and discuss the pros and cons of long read-based identification of eukaryotes.
Chemical defense against predators is widespread in natural ecosystems. Occasionally, taxonomically distant organisms share the same defense chemical. Here, we describe an unusual tripartite marine symbiosis, in which an intracellular bacterial symbiont (“Candidatus Endobryopsis kahalalidefaciens”) uses a diverse array of biosynthetic enzymes to convert simple substrates into a library of complex molecules (the kahalalides) for chemical defense of the host, the alga Bryopsis sp., against predation. The kahalalides are subsequently hijacked by a third partner, the herbivorous mollusk Elysia rufescens, and employed similarly for defense. “Ca. E. kahalalidefaciens” has lost many essential traits for free living and acts as a factory for kahalalide production. This interaction between a bacterium, an alga, and an animal highlights the importance of chemical defense in the evolution of complex symbioses. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Relative Performance of MinION (Oxford Nanopore Technologies) versus Sequel (Pacific Biosciences) Third-Generation Sequencing Instruments in Identification of Agricultural and Forest Fungal Pathogens.
Culture-based molecular identification methods have revolutionized detection of pathogens, yet these methods are slow and may yield inconclusive results from environmental materials. The second-generation sequencing tools have much-improved precision and sensitivity of detection, but these analyses are costly and may take several days to months. Of the third-generation sequencing techniques, the portable MinION device (Oxford Nanopore Technologies) has received much attention because of its small size and possibility of rapid analysis at reasonable cost. Here, we compare the relative performances of two third-generation sequencing instruments, MinION and Sequel (Pacific Biosciences), in identification and diagnostics of fungal and oomycete pathogens from conifer (Pinaceae) needles and potato (Solanum tuberosum) leaves and tubers. We demonstrate that the Sequel instrument is efficient for metabarcoding of complex samples, whereas MinION is not suited for this purpose due to a high error rate and multiple biases. However, we find that MinION can be utilized for rapid and accurate identification of dominant pathogenic organisms and other associated organisms from plant tissues following both amplicon-based and PCR-free metagenomics approaches. Using the metagenomics approach with shortened DNA extraction and incubation times, we performed the entire MinION workflow, from sample preparation through DNA extraction, sequencing, bioinformatics, and interpretation, in 2.5 h. We advocate the use of MinION for rapid diagnostics of pathogens and potentially other organisms, but care needs to be taken to control or account for multiple potential technical biases. IMPORTANCE Microbial pathogens cause enormous losses to agriculture and forestry, but current combined culturing- and molecular identification-based detection methods are too slow for rapid identification and application of countermeasures. 
Here, we develop new and rapid protocols for Oxford Nanopore MinION-based third-generation diagnostics of plant pathogens that greatly improve the speed of diagnostics. However, due to the high error rate and technical biases in MinION, the Pacific Biosciences Sequel platform is more useful for in-depth amplicon-based biodiversity monitoring (metabarcoding) from complex environmental samples. Copyright © 2019 American Society for Microbiology.
Plantibacter flavus, Curtobacterium herbarum, Paenibacillus taichungensis, and Rhizobium selenitireducens Endophytes Provide Host-Specific Growth Promotion of Arabidopsis thaliana, Basil, Lettuce, and Bok Choy Plants.
A collection of bacterial endophytes isolated from stem tissues of plants growing in soils highly contaminated with petroleum hydrocarbons was screened for plant growth-promoting capabilities. Twenty-seven endophytic isolates significantly improved the growth of Arabidopsis thaliana plants in comparison to that of uninoculated control plants. The five most beneficial isolates, one strain each of Curtobacterium herbarum, Paenibacillus taichungensis, and Rhizobium selenitireducens and two strains of Plantibacter flavus, were further examined for growth promotion in Arabidopsis, lettuce, basil, and bok choy plants. Host-specific plant growth promotion was observed when plants were inoculated with the five bacterial strains. P. flavus strain M251 increased the total biomass and total root length of Arabidopsis plants by 4.7 and 5.8 times, respectively, over that of control plants and improved lettuce and basil root growth, while P. flavus strain M259 promoted Arabidopsis shoot and root growth, lettuce and basil root growth, and bok choy shoot growth. A genome comparison between P. flavus strains M251 and M259 showed that both genomes contain up to 70 actinobacterial putative plant-associated genes and genes involved in known plant-beneficial pathways, such as those for auxin and cytokinin biosynthesis and 1-aminocyclopropane-1-carboxylate deaminase production. This study provides evidence of direct plant growth promotion by Plantibacter flavus. IMPORTANCE The discovery of new plant growth-promoting bacteria is necessary for the continued development of biofertilizers, which are environmentally friendly and cost-efficient alternatives to conventional chemical fertilizers. Biofertilizer effects on plant growth can be inconsistent due to the complexity of plant-microbe interactions, as the same bacteria can be beneficial to the growth of some plant species and neutral or detrimental to others. 
We examined a set of bacterial endophytes isolated from plants growing in a unique petroleum-contaminated environment to discover plant growth-promoting bacteria. We show that strains of Plantibacter flavus exhibit strain-specific plant growth-promoting effects on four different plant species. Copyright © 2019 American Society for Microbiology.
Supernumerary B chromosomes (Bs) are extra karyotype units in addition to A chromosomes, and are found in some fungi and thousands of animal and plant species. Bs are characterized by their non-Mendelian inheritance, and represent one of the best examples of genomic conflict. Over the last decades, their genetic composition, function and evolution have remained unresolved, although a few successful attempts have been made to address these questions. A classical concept based on cytogenetics and genetics is that Bs are selfish and rich in DNA repeats and transposons, and in most cases do not carry any function. Recently, however, the rapid development of large-scale multi-omics techniques has shifted B research towards a new-born field that we call “B-omics”. We review the recent literature and add novel perspectives to B research, discussing the role of new technologies in understanding the mechanisms of the molecular evolution and function of Bs. The modern view states that B chromosomes are enriched with genes for many significant biological functions, including but not limited to a set of genes related to cell cycle and chromosome structure. Furthermore, the presence of B chromosomes could favor genomic rearrangements and influence the nuclear environment, affecting the function of other chromatin regions. We hypothesize that B chromosomes might play a key role in driving their own transmission and maintenance inside the cell, as well as offer an extra genomic compartment for evolution.
Chromulinavorax destructans, a pathogen of microzooplankton that provides a window into the enigmatic candidate phylum Dependentiae.
Members of the major candidate phylum Dependentiae (a.k.a. TM6) are widespread across diverse environments from showerheads to peat bogs; yet, with the exception of two isolates infecting amoebae, they are only known from metagenomic data. The limited knowledge of their biology indicates that they have a long evolutionary history of parasitism. Here, we present Chromulinavorax destructans (Strain SeV1), the first isolate of this phylum to infect a representative from a widespread and ecologically significant group of heterotrophic flagellates, the microzooplankter Spumella elongata (Strain CCAP 955/1). Chromulinavorax destructans has a reduced 1.2 Mb genome that is so specialized for infection that it shows no evidence of complete metabolic pathways, but encodes an extensive transporter system for importing nutrients and energy in the form of ATP from the host. Its replication causes extensive reorganization and expansion of the mitochondrion, effectively surrounding the pathogen, consistent with its dependency on the host for energy. Nearly half (44%) of the inferred proteins contain signal sequences for secretion, including many without recognizable similarity to proteins of known function, as well as 98 copies of proteins with an ankyrin-repeat domain; ankyrin-repeats are known effectors of host modulation, suggesting the presence of an extensive host-manipulation apparatus. These observations help to cement members of this phylum as widespread and diverse parasites infecting a broad range of eukaryotic microbes.
A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set.
In addition to the BAC-based reference sequence of the accession Columbia-0 from the year 2000, several short read assemblies of the plant model organism Arabidopsis thaliana have been published in recent years. Also, a SMRT-based assembly of Landsberg erecta has been generated that identified translocation and inversion polymorphisms between two genotypes of the species. Here we provide a chromosome-arm level assembly of the A. thaliana accession Niederzenz-1 (AthNd-1_v2c) based on SMRT sequencing data. The best assembly comprises 69 nucleome sequences and displays a contig length of up to 16 Mbp. Compared to an earlier Illumina short read-based NGS assembly (AthNd-1_v1), a 75-fold increase in contiguity was observed for AthNd-1_v2c. To assign contig locations independently of the Col-0 gold standard reference sequence, we used genetic anchoring to generate a de novo assembly. In addition, we assembled the chondrome and plastome sequences. Detailed analyses of AthNd-1_v2c allowed reliable identification of large genomic rearrangements between A. thaliana accessions contributing to differences in the gene sets that distinguish the genotypes. One of the differences detected identified a gene that is lacking from the Col-0 gold standard sequence. This de novo assembly extends the known proportion of the A. thaliana pan-genome.
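Contiguity comparisons like the one above are often summarized with the N50 statistic. The abstract does not state which metric underlies the 75-fold figure, so the following is only an illustrative sketch of one common choice; the function name and the contig lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths (arbitrary units), not data from the study.
fragmented = [2, 3, 4, 5, 6]
contiguous = [20]
fold_increase = n50(contiguous) / n50(fragmented)
```

A fold increase in contiguity between two assemblies of the same genome can then be expressed as the ratio of their N50 values.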
Streptococcus periodonticum sp. nov., Isolated from Human Subgingival Dental Plaque of Periodontitis Lesion.
A novel facultative anaerobic and Gram-stain-positive coccus, designated strain ChDC F135T, was isolated from human subgingival dental plaque of a periodontitis lesion and was characterized by polyphasic taxonomic analysis. The 16S rRNA gene (16S rDNA) sequence of strain ChDC F135T was closest to that of Streptococcus sinensis HKU4T (98.2%), followed by Streptococcus intermedia SK54T (97.0%), Streptococcus constellatus NCTC11325T (96.0%), and Streptococcus anginosus NCTC 10713T (95.7%). In contrast, phylogenetic analysis based on the superoxide dismutase gene (sodA) and the RNA polymerase beta-subunit gene (rpoB) showed that the nucleotide sequences of strain ChDC F135T were most similar to the corresponding genes of S. anginosus NCTC 10713T (99.2% and 97.6%, respectively), S. constellatus NCTC11325T (87.8% and 91.4%, respectively), and S. intermedia SK54T (85.8% and 91.2%, respectively) rather than those of S. sinensis HKU4T (80.5% and 82.6%). The complete genome of strain ChDC F135T consisted of 1,901,251 bp and the G+C content was 38.9 mol%. Average nucleotide identity values between strain ChDC F135T and S. sinensis HKU4T or S. anginosus NCTC 10713T were 75.7% and 95.6%, respectively. The C14:0 composition of the cellular fatty acids of strain ChDC F135T (32.8%) was different from that of S. intermedia (6-8%), S. constellatus (6-13%), and S. anginosus (13-20%). Based on the results of phylogenetic and phenotypic analysis, strain ChDC F135T (= KCOM 2412T = JCM 33300T) was classified as a type strain of a novel species of the genus Streptococcus, for which we proposed the name Streptococcus periodonticum sp. nov.
A novel facultative anaerobic, Gram-stain-negative coccus, designated strain ChDC B345T, was isolated from a human pericoronitis lesion and was characterized by polyphasic taxonomic analysis. The 16S ribosomal RNA gene (16S rDNA) sequence revealed that the strain belonged to the genus Streptococcus. The 16S rDNA sequence of strain ChDC B345T was most closely related to those of Streptococcus mitis NCTC 12261T (99.5%) and Streptococcus pseudopneumoniae ATCC BAA-960T (99.5%). The complete genome of strain ChDC B345T was 1,972,471 bp in length and the G+C content was 40.2 mol%. Average nucleotide identity values between strain ChDC B345T and S. pseudopneumoniae ATCC BAA-960T or S. mitis NCTC 12261T were 92.17% and 93.63%, respectively. Genome-to-genome distance values between strain ChDC B345T and S. pseudopneumoniae ATCC BAA-960T or S. mitis NCTC 12261T were 47.8% (45.2-50.4%) and 53.0% (51.0-56.4%), respectively. Based on these results, strain ChDC B345T (= KCOM 1679T = JCM 33299T) should be classified as a novel species of genus Streptococcus, for which we propose the name Streptococcus gwangjuense sp. nov.
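The G+C content reported for a genome is the percentage of bases that are guanine or cytosine. A minimal sketch of that calculation follows, assuming ambiguous bases such as N are excluded from the denominator (one common convention, not necessarily the one used in the studies above); the function name and example sequences are hypothetical.

```python
def gc_content(seq):
    """Return the G+C content of a DNA sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted; anything else,
    such as N, is ignored in both numerator and denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt


# Toy sequences, not taken from either genome.
balanced = gc_content("GGCCAATT")
with_gaps = gc_content("acgtNNNN")
```

For a whole genome, the same function would be applied to the concatenated sequence of all replicons.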
Wolbachia, an alpha-proteobacterium closely related to Rickettsia, is a maternally transmitted, intracellular symbiont of arthropods and nematodes. Aedes albopictus mosquitoes are naturally infected with Wolbachia strains wAlbA and wAlbB. Cell line Aa23 established from Ae. albopictus embryos retains only wAlbB and is a key model to study host-endosymbiont interactions. We have assembled the complete circular genome of wAlbB from the Aa23 cell line using long-read PacBio sequencing at 500× median coverage. The assembled circular chromosome is 1.48 megabases in size, an increase of more than 300 kb over the published draft wAlbB genome. The annotation of the genome identified 1,205 protein coding genes, 34 tRNA, 3 rRNA, 1 tmRNA, and 3 other ncRNA loci. The long reads enabled sequencing over complex repeat regions which are difficult to resolve with short-read sequencing. Thirteen percent of the genome comprised insertion sequence elements distributed throughout the genome, some of which cause pseudogenization. Prophage WO genes encoding some essential components of phage particle assembly are missing, while the remainder are found in five prophage regions/WO-like islands or scattered around the genome. Orthology analysis identified a core proteome of 535 orthogroups across all completed Wolbachia genomes. The majority of proteins could be annotated using Pfam and eggNOG analyses, including ankyrins and components of the Type IV secretion system. KEGG analysis revealed the absence of five genes in wAlbB which are present in other Wolbachia. The availability of a complete circular chromosome from wAlbB will enable further biochemical, molecular, and genetic analyses on this strain and related Wolbachia. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.