Background: Assemblies of diploid genomes are generally unphased, pseudo-haploid representations that do not correctly reconstruct the two parental haplotypes present in the individual sequenced. Instead, the assembly alternates between parental haplotypes and may contain duplications in regions where the parental haplotypes are sufficiently different. Trio binning is an approach to genome assembly that uses short reads from both parents to classify long reads from the offspring according to maternal or paternal haplotype origin, and is thus helped rather than impeded by heterozygosity. Using this approach, it is possible to derive two assemblies from an individual, accurately representing both parental contributions in their entirety with higher continuity and accuracy than is possible with other methods. Results: We used trio binning to assemble reference genomes for two species from a single individual using an interspecies cross of yak (Bos grunniens) and cattle (Bos taurus). The high heterozygosity inherent to interspecies hybrids allowed us to confidently assign >99% of long reads from the F1 offspring to parental bins using unique k-mers from parental short reads. Both the maternal (yak) and paternal (cattle) assemblies contain over one third of the acrocentric chromosomes, including the two largest chromosomes, in single haplotigs. Conclusions: These haplotigs are the first vertebrate chromosome arms to be assembled gap-free and fully phased, and this is the first time assemblies for two species have been created from a single individual. Both assemblies are the most continuous currently available for non-model vertebrates. Abbreviations: Mb, megabases; kb, kilobases; MYA, millions of years ago; MHC, major histocompatibility complex; SMRT, single molecule real time.
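The read-classification step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sequences, and the simple majority rule are assumptions made for illustration, and the sketch ignores reverse complements, sequencing errors, and differences in the sizes of the parental k-mer sets, all of which a real pipeline would need to handle.

```python
def kmers(seq, k):
    """Return the set of all k-mers contained in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    # k-mers shared by both parents carry no haplotype information, so drop them.
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign an offspring long read to a parental bin by counting marker k-mers.

    Hypothetical majority rule: the parent with more matching
    parent-specific k-mers wins; ties are left unassigned.
    """
    read_kmers = kmers(read, k)
    m = len(read_kmers & mat_only)
    p = len(read_kmers & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"


# Toy data (invented): the two parental reads differ at a single base.
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], k=4)
print(bin_read("GTACGT", mat_only, pat_only, k=4))
print(bin_read("CGTTCG", mat_only, pat_only, k=4))
```

The sketch also shows why heterozygosity helps: the more the parental genomes differ, the more parent-specific k-mers exist, and the fewer reads end up unassigned.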
Forest tree species are increasingly subject to severe mortalities from exotic pests, diseases, and invasive organisms, accelerated by climate change. Forest health issues are threatening multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species, or among surviving trees, introgression of resistance genes into threatened tree species in reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being employed as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of a reference genome with chromosome-scale sequences for Chinese chestnut (C. mollissima), the disease-resistance donor for American chestnut restoration. We also demonstrate the value of the genome as a platform for research and species restoration, including new insights into the evolution of blight resistance in Asian chestnut species, the locations in the genome of ecologically important signatures of selection differentiating American chestnut from Chinese chestnut, the identification of candidate genes for disease resistance, and preliminary comparisons of genome organization with related species.
In the wake of constant improvements in sequencing technologies, numerous insect genomes have been sequenced. Currently, 1219 insect genome-sequencing projects have been registered with the National Center for Biotechnology Information, including 401 that have genome assemblies and 155 with an official gene set of annotated protein-coding genes. Comparative genomics analysis showed that the expansion or contraction of gene families was associated with well-studied physiological traits such as immune system, metabolic detoxification, parasitism and polyphagy in insects. Here, we summarize the progress of insect genome sequencing, with an emphasis on how this impacts research on pest control. We begin with a brief introduction to the basic concepts of genome assembly, annotation and metrics for evaluating the quality of draft assemblies. We then provide an overview of genome information for numerous insect species, highlighting examples from prominent model organisms, agricultural pests and disease vectors. We also introduce the major insect genome databases. The increasing availability of insect genomic resources is beneficial for developing alternative pest control methods. However, many opportunities remain for developing data-mining tools that make maximal use of the available insect genome resources. Although rapid progress has been achieved, many challenges remain in the field of insect genomics. © 2019 The Royal Entomological Society.
Microbial diversity in the tick Argas japonicus (Acari: Argasidae) with a focus on Rickettsia pathogens.
The soft tick Argas japonicus mainly infests birds and can cause human dermatitis; however, no pathogen has been identified from this tick species in China. In the present study, the microbiota in A. japonicus collected from an epidemic community was explored, and some putative Rickettsia pathogens were further characterized. The results obtained indicated that bacteria in A. japonicus were mainly ascribed to the phyla Proteobacteria, Firmicutes and Actinobacteria. At the genus level, the male A. japonicus harboured more diverse bacteria than the females and nymphs. The bacteria Alcaligenes, Pseudomonas, Rickettsia and Staphylococcus were common in nymphs and adults. The abundance of bacteria belonging to the Rickettsia genus in females and males was 7.27% and 10.42%, respectively. Furthermore, the 16S rRNA gene of Rickettsia was amplified and sequenced, and phylogenetic analysis revealed that 13 sequences were clustered with the spotted fever group rickettsiae (Rickettsia heilongjiangensis and Rickettsia japonica) and three were clustered with Rickettsia limoniae, which suggested that the characterized Rickettsia in A. japonicus were novel putative pathogens and also that the residents were at considerable risk for infection by tick-borne pathogens. © 2019 The Royal Entomological Society.
Salmonella Genomic Island 3 Is an Integrative and Conjugative Element and Contributes to Copper and Arsenic Tolerance of Salmonella enterica.
Salmonella genomic island 3 (SGI3) was first described as a chromosomal island in Salmonella 4,[5],12:i:-, a monophasic variant of Salmonella enterica subsp. enterica serovar Typhimurium. The SGI3 DNA sequence detected from Salmonella 4,[5],12:i:- isolated in Japan was identical to a previously reported sequence across its entire length of 81 kb. SGI3 consists of 86 open reading frames, including a copper homeostasis and silver resistance island (CHASRI) and an arsenic tolerance operon, in addition to genes related to conjugative transfer and DNA replication or partitioning, suggesting that the island is a mobile genetic element. We successfully selected transconjugants that acquired SGI3 after filter-mating experiments using the S. enterica serovars Typhimurium, Heidelberg, Hadar, Newport, Cerro, and Thompson as recipients. Southern blot analysis using I-CeuI-digested genomic DNA demonstrated that SGI3 was integrated into a chromosomal fragment of the transconjugants. PCR and sequencing analysis demonstrated that SGI3 was inserted into the 3′ end of the tRNA genes pheV or pheR. The length of the target site was 52 or 55 bp, and a 55-bp attI sequence indicating generation of the circular form of SGI3 was also detected. The transconjugants had a higher MIC against CuSO4 compared to the recipient strains under anaerobic conditions. Tolerance was defined by the cus gene cluster in the CHASRI. The transconjugants also had distinctly higher MICs against Na2HAsO4 compared to recipient strains under aerobic conditions. These findings clearly demonstrate that SGI3 is an integrative and conjugative element and contributes to the copper and arsenic tolerance of S. enterica. Copyright © 2019 American Society for Microbiology.
Chromosomal-level assembly of the blood clam, Scapharca (Anadara) broughtonii, using long sequence reads and Hi-C.
The blood clam, Scapharca (Anadara) broughtonii, is an economically and ecologically important marine bivalve of the family Arcidae. Efforts to study their population genetics, breeding, cultivation, and stock enrichment have been somewhat hindered by the lack of a reference genome. Herein, we report the complete genome sequence of S. broughtonii, the first reference genome of the family Arcidae. A total of 75.79 Gb clean data were generated with the Pacific Biosciences and Oxford Nanopore platforms, which represented approximately 86× coverage of the S. broughtonii genome. De novo assembly of these long reads resulted in an 884.5-Mb genome, with a contig N50 of 1.80 Mb and scaffold N50 of 45.00 Mb. Genome Hi-C scaffolding resulted in 19 chromosomes containing 99.35% of bases in the assembled genome. Genome annotation revealed that nearly half of the genome (46.1%) is composed of repeated sequences, while 24,045 protein-coding genes were predicted and 84.7% of them were annotated. We report here a chromosomal-level assembly of the S. broughtonii genome based on long-read sequencing and Hi-C scaffolding. The genomic data can serve as a reference for the family Arcidae and will provide a valuable resource for the scientific community and aquaculture sector. © The Author(s) 2019. Published by Oxford University Press.
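The N50 and coverage figures quoted in this abstract follow from standard definitions, sketched below. The function names and the toy contig lengths are illustrative assumptions. The coverage check divides by the assembled genome size, which may differ slightly from the genome size estimate the authors used.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


def coverage(total_bases, genome_size):
    """Return average sequencing depth: sequenced bases per genome base."""
    return total_bases / genome_size


# Toy contig lengths (invented for illustration).
print(n50([2, 3, 4, 5, 6]))

# Figures from the abstract: 75.79 Gb of clean data, 884.5-Mb assembly.
print(round(coverage(75.79e9, 884.5e6), 1))
```

With the abstract's numbers the depth comes out just under 86, consistent with the reported "approximately 86×".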
Identification of Initial Colonizing Bacteria in Dental Plaques from Young Adults Using Full-Length 16S rRNA Gene Sequencing.
Development of dental plaque begins with the adhesion of salivary bacteria to the acquired pellicle covering the tooth surface. In this study, we collected in vivo dental plaque formed on hydroxyapatite disks for 6 h from 74 young adults and identified initial colonizing taxa based on full-length 16S rRNA gene sequences. A long-read, single-molecule sequencer, PacBio Sequel, provided 100,109 high-quality full-length 16S rRNA gene sequence reads from the early plaque microbiota, which were assigned to 90 oral bacterial taxa. The microbiota obtained from every individual mostly comprised the 21 predominant taxa with the maximum relative abundance of over 10% (95.8 ± 6.2%, mean ± SD), which included Streptococcus species as well as nonstreptococcal species. A hierarchical cluster analysis of their relative abundance distribution suggested three major patterns of microbiota compositions: a Streptococcus mitis/Streptococcus sp. HMT-423-dominant profile, a Neisseria sicca/Neisseria flava/Neisseria mucosa-dominant profile, and a complex profile with high diversity. No notable variations in the community structures were associated with the dental caries status, although the total bacterial amounts were larger in the subjects with a high number of caries-experienced teeth (≥8) than in those with no or a low number of caries-experienced teeth. Our results revealed the bacterial taxa primarily involved in early plaque formation on hydroxyapatite disks in young adults. IMPORTANCE: Selective attachment of salivary bacteria to the tooth surface is an initial and repetitive phase in dental plaque development. We employed full-length 16S rRNA gene sequence analysis with a high taxonomic resolution using a third-generation sequencer, PacBio Sequel, to determine the bacterial composition during early plaque formation in 74 young adults accurately and in detail. 
The results revealed 21 bacterial taxa primarily involved in early plaque formation on hydroxyapatite disks in young adults, which include several streptococcal species as well as nonstreptococcal species, such as Neisseria sicca/N. flava/N. mucosa and Rothia dentocariosa. Given that no notable variations in the microbiota composition were associated with the dental caries status, the maturation process, rather than the specific bacterial species that are the initial colonizers, is likely to play an important role in the development of dysbiotic microbiota associated with dental caries. Copyright © 2019 Ihara et al.
African cichlid fishes are well known for their rapid radiations and are a model system for studying evolutionary processes. Here we compare multiple, high-quality, chromosome-scale genome assemblies to elucidate the genetic mechanisms underlying cichlid diversification and study how genome structure evolves in rapidly radiating lineages. We re-anchored our recent assembly of the Nile tilapia (Oreochromis niloticus) genome using a new high-density genetic map. We also developed a new de novo genome assembly of the Lake Malawi cichlid, Metriaclima zebra, using high-coverage Pacific Biosciences sequencing, and anchored contigs to linkage groups (LGs) using 4 different genetic maps. These new anchored assemblies allow the first chromosome-scale comparisons of African cichlid genomes. Large intra-chromosomal structural differences (~2-28 megabase pairs) among species are common, while inter-chromosomal differences are rare (<10 megabase pairs total). Placement of the centromeres within the chromosome-scale assemblies identifies large structural differences that explain many of the karyotype differences among species. Structural differences are also associated with unique patterns of recombination on sex chromosomes. Structural differences on LG9, LG11, and LG20 are associated with reduced recombination, indicative of inversions between the rock- and sand-dwelling clades of Lake Malawi cichlids. M. zebra has a larger number of recent transposable element insertions compared with O. niloticus, suggesting that several transposable element families have a higher rate of insertion in the haplochromine cichlid lineage. This study identifies novel structural variation among East African cichlid genomes and provides a new set of genomic resources to support research on the mechanisms driving cichlid adaptation and speciation. © The Author(s) 2019. Published by Oxford University Press.
Supernumerary B chromosomes (Bs) are extra karyotype units in addition to A chromosomes, and are found in some fungi and thousands of animal and plant species. Bs are characterized by their non-Mendelian inheritance and represent one of the best examples of genomic conflict. Over the last decades, their genetic composition, function and evolution have remained unresolved questions, although a few successful attempts have been made to address them. The classical concept, based on cytogenetics and genetics, is that Bs are selfish, rich in DNA repeats and transposons, and in most cases carry no function. Recently, however, rapid advances in high-throughput multi-omics techniques have shifted B research towards a newborn field that we call “B-omics”. We review the recent literature and add novel perspectives to B research, discussing the role of new technologies in understanding the mechanisms of the molecular evolution and function of Bs. The modern view states that B chromosomes are enriched with genes for many significant biological functions, including but not limited to a set of genes related to cell cycle and chromosome structure. Furthermore, the presence of B chromosomes could favor genomic rearrangements and influence the nuclear environment, affecting the function of other chromatin regions. We hypothesize that B chromosomes might play a key function in driving their transmission and maintenance inside the cell, as well as offer an extra genomic compartment for evolution.
Human metapneumovirus (HMPV) has been a notable etiological agent of acute respiratory infection in humans, but it was not discovered until 2001, because HMPV replicates only in a limited number of cell lines and the cytopathic effect (CPE) is often mild. To promote the study of HMPV, several groups have generated green fluorescent protein (GFP)-expressing recombinant HMPV strains (HMPVGFP). However, the growing evidence has complicated the understanding of cell line specificity of HMPV, because it seems to vary notably among HMPV strains. In addition, unique A2b clade HMPV strains with a 180-nucleotide duplication in the G gene (HMPV A2b180nt-dup strains) have recently been detected. In this study, we re-evaluated and compared the cell line specificity of clinical isolates of HMPV strains, including the novel HMPV A2b180nt-dup strains, and six recombinant HMPVGFP strains, including the newly generated recombinant HMPV A2b180nt-dup strain, MG0256-EGFP. Our data demonstrate that VeroE6 and LLC-MK2 cells generally showed the highest infectivity with any clinical isolates and recombinant HMPVGFP strains. Other human-derived cell lines (BEAS-2B, A549, HEK293, MNT-1, and HeLa cells) showed certain levels of infectivity with HMPV, but these were significantly lower than those of VeroE6 and LLC-MK2 cells. Also, the infectivity in these suboptimal cell lines varied greatly among HMPV strains. The variations were not directly related to HMPV genotypes, cell lines used for isolation and propagation, specific genome mutations, or nucleotide duplications in the G gene. Thus, these variations in suboptimal cell lines are likely intrinsic to particular HMPV strains.
Members of the genus Nocardia are widespread in diverse environments; a wide range of Nocardia species are known to cause nocardiosis in several animals, including cat, dog, fish, and humans. Of the pathogenic Nocardia species, N. seriolae is known to cause disease in cultured fish, resulting in major economic loss. We isolated two N. seriolae strains, CK-14008 and EM15050, from diseased fish and sequenced their genomes using the PacBio sequencing platform. To identify their genomic features, we compared their genomes with those of other Nocardia species. Phylogenetic analysis showed that N. seriolae shares a common ancestor with a putative human pathogenic Nocardia species. Moreover, N. seriolae strains were phylogenetically divided into four clusters according to host fish families. Through genome comparison, we observed that the putative pathogenic Nocardia strains had additional genes for iron acquisition. Dozens of antibiotic resistance genes were detected in the genomes of N. seriolae strains; most of the antibiotics were involved in the inhibition of the biosynthesis of proteins or cell walls. Our results demonstrated the virulence features and antibiotic resistance of fish pathogenic N. seriolae strains at the genomic level. These results may be useful to develop strategies for the prevention of fish nocardiosis. © 2018 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.
Genome mining identifies cepacin as a plant-protective metabolite of the biopesticidal bacterium Burkholderia ambifaria.
Beneficial microorganisms are widely used in agriculture for control of plant pathogens, but a lack of efficacy and safety information has limited the exploitation of multiple promising biopesticides. We applied phylogeny-led genome mining, metabolite analyses and biological control assays to define the efficacy of Burkholderia ambifaria, a naturally beneficial bacterium with proven biocontrol properties but potential pathogenic risk. A panel of 64 B. ambifaria strains demonstrated significant antimicrobial activity against priority plant pathogens. Genome sequencing, specialized metabolite biosynthetic gene cluster mining and metabolite analysis revealed an armoury of known and unknown pathways within B. ambifaria. The biosynthetic gene cluster responsible for the production of the metabolite cepacin was identified and directly shown to mediate protection of germinating crops against Pythium damping-off disease. B. ambifaria maintained biopesticidal protection and overall fitness in the soil after deletion of its third replicon, a non-essential plasmid associated with virulence in Burkholderia cepacia complex bacteria. Removal of the third replicon reduced B. ambifaria persistence in a murine respiratory infection model. Here, we show that by using interdisciplinary phylogenomic, metabolomic and functional approaches, the mode of action of natural biological control agents related to pathogens can be systematically established to facilitate their future exploitation.
Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data.
Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms. © The Author 2017. Published by Oxford University Press.
Conventional culture methods with commercially available media unveil the presence of novel culturable bacteria.
Recent metagenomic analysis has revealed that our gut microbiota plays an important role in not only the maintenance of our health but also various diseases such as obesity, diabetes, inflammatory bowel disease, and allergy. However, most intestinal bacteria are considered ‘unculturable’ bacteria, and their functions remain unknown. Although culture-independent genomic approaches have enabled us to gain insight into their potential roles, culture-based approaches are still required to understand their characteristic features and phenotypes. To date, various culturing methods have been attempted to obtain these ‘unculturable’ bacteria, but most such methods require advanced techniques. Here, we have tried to isolate possible unculturable bacteria from a healthy Japanese individual by using commercially available media. A 16S rRNA (ribosomal RNA) gene metagenomic analysis revealed that each culture medium showed bacterial growth depending on its selective features and a possibility of the presence of novel bacterial species. Whole genome sequencing of these candidate strains suggested the isolation of 8 novel bacterial species classified in the Actinobacteria and Firmicutes phyla. Our approach indicates that a number of intestinal bacteria hitherto considered unculturable are potentially culturable and can be cultured on commercially available media. We have obtained novel gut bacteria from a healthy Japanese individual using a combination of comprehensive genomics and conventional culturing methods. We would expect that the discovery of such novel bacteria could illuminate pivotal roles for the gut microbiota in association with human health.
Polysaccharide utilization loci of North Sea Flavobacteriia as basis for using SusC/D-protein expression for predicting major phytoplankton glycans.
Marine algae convert a substantial fraction of fixed carbon dioxide into various polysaccharides. Flavobacteriia that are specialized on algal polysaccharide degradation feature genomic clusters termed polysaccharide utilization loci (PULs). As knowledge on extant PUL diversity is sparse, we sequenced the genomes of 53 North Sea Flavobacteriia and obtained 400 PULs. Bioinformatic PUL annotations suggest usage of a large array of polysaccharides, including laminarin, α-glucans, and alginate as well as mannose-, fucose-, and xylose-rich substrates. Many of the PULs exhibit new genetic architectures and suggest substrates rarely described for marine environments. The isolates’ PUL repertoires often differed considerably within genera, corroborating ecological niche-associated glycan partitioning. Polysaccharide uptake in Flavobacteriia is mediated by SusCD-like transporter complexes. Respective protein trees revealed clustering according to polysaccharide specificities predicted by PUL annotations. Using the trees, we analyzed expression of SusC/D homologs in multiyear phytoplankton bloom-associated metaproteomes and found indications for profound changes in microbial utilization of laminarin, α-glucans, β-mannan, and sulfated xylan. We hence suggest the suitability of SusC/D-like transporter protein expression within heterotrophic bacteria as a proxy for the temporal utilization of discrete polysaccharides.