A manually annotated Actinidia chinensis var. chinensis (kiwifruit) genome highlights the challenges associated with draft genomes and gene prediction in plants.

Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) ‘Hongyang’ draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second A. chinensis genome (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed that a considerable number of translocation events have occurred following a whole genome duplication (WGD) event, some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these, 3114 (9.4%) were identical to a protein within the ‘Hongyang’ annotation in The Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation, this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned ‘Hort16A’ cDNAs and comparing them with the predicted protein models for Red5 and with both the original ‘Hongyang’ assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised ‘Hongyang’ annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes.
Our use of the manual annotation tool WebApollo facilitated manual checking and correction of gene models, enabling improvement of computational prediction. This utility was especially relevant for certain types of gene families such as the EXPANSIN-like genes. Finally, this high-quality gene set will supply the kiwifruit and general plant community with a new tool for genomics and other comparative analysis.

Draft genome of the Peruvian scallop Argopecten purpuratus.

The Peruvian scallop, Argopecten purpuratus, is mainly cultured in southern Chile and Peru and was introduced into China in the last century. Unlike other Argopecten scallops, the Peruvian scallop normally has a long life span of up to 7 to 10 years. Therefore, researchers have been using it to develop hybrid vigor. Here, we performed whole genome sequencing, assembly, and gene annotation of the Peruvian scallop, with the aim of developing genomic resources for genetic breeding in scallops. A total of 463.19 Gb of raw DNA reads were sequenced. A draft genome assembly of 724.78 Mb was generated (accounting for 81.87% of the estimated genome size of 885.29 Mb), with a contig N50 size of 80.11 kb and a scaffold N50 size of 1.02 Mb. Repeat sequences were calculated to make up 33.74% of the whole genome, and 26,256 protein-coding genes and 3,057 noncoding RNAs were predicted from the assembly. We generated a high-quality draft genome assembly of the Peruvian scallop, which will provide a solid resource for further genetic breeding and for the analysis of the evolutionary history of this economically important scallop.
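
The contig and scaffold N50 values quoted above follow a standard definition: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The sketch below illustrates that definition and re-derives the reported 81.87% assembly fraction. The function names and the toy contig lengths are illustrative assumptions, not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are taken from longest to shortest until their summed
    length reaches at least half of the total; the length of the last
    sequence taken is the N50.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def assembled_percent(assembly_mb, estimated_genome_mb):
    """Percentage of the estimated genome size covered by the assembly."""
    return round(100 * assembly_mb / estimated_genome_mb, 2)


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))

# Figures taken from the abstract: 724.78 Mb assembled, 885.29 Mb estimated.
print(assembled_percent(724.78, 885.29))
```

Because N50 is relative to the assembled length rather than the true genome size, it is best read alongside the assembled fraction, as the abstract does.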

Draft genome sequence of Camellia sinensis var. sinensis provides insights into the evolution of the tea genome and tea quality.

Tea, one of the world’s most important beverage crops, provides numerous secondary metabolites that account for its rich taste and health benefits. Here we present a high-quality sequence of the genome of tea, Camellia sinensis var. sinensis (CSS), using both Illumina and PacBio sequencing technologies. At least 64% of the 3.1-Gb genome assembly consists of repetitive sequences, and the rest yields 33,932 high-confidence predictions of encoded proteins. Divergence between two major lineages, CSS and Camellia sinensis var. assamica (CSA), is estimated at ~0.38 to 1.54 million years ago (Mya). Analysis of genic collinearity reveals that the tea genome is the product of two rounds of whole-genome duplications (WGDs) that occurred ~30 to 40 and ~90 to 100 Mya. We provide evidence that these WGD events, and subsequent paralogous duplications, had major impacts on the copy numbers of secondary metabolite genes, particularly genes critical to producing three key quality compounds: catechins, theanine, and caffeine. Analyses of transcriptome and phytochemistry data show that amplification and transcriptional divergence of genes encoding a large acyltransferase family and leucoanthocyanidin reductases are associated with the characteristic young leaf accumulation of monomeric galloylated catechins in tea, while functional divergence of a single member of the glutamine synthetase gene family yielded theanine synthetase. This genome sequence will facilitate understanding of tea genome evolution and tea metabolite pathways, and will promote germplasm utilization for breeding improved tea varieties. Copyright © 2018 the Author(s). Published by PNAS.

A whole genome assembly of the horn fly, Haematobia irritans, and prediction of genes with roles in metabolism and sex determination.

Haematobia irritans, commonly known as the horn fly, is a globally distributed blood-feeding pest of cattle that is responsible for significant economic losses to cattle producers. Chemical insecticides are the primary means for controlling this pest, but problems with insecticide resistance have become common in the horn fly. To provide a foundation for identification of genomic loci for insecticide resistance and for discovery of new control technology, we report the sequencing, assembly, and annotation of the horn fly genome. The assembled genome is 1.14 Gb, comprising 76,616 scaffolds with an N50 scaffold length of 23 Kb. Using RNA-Seq data, we have predicted 34,413 gene models, of which 19,185 have been assigned functional annotations. Comparative genomics analysis with the Dipteran flies Musca domestica L., Drosophila melanogaster, and Lucilia cuprina shows that the horn fly is most closely related to M. domestica, sharing 8,748 orthologous clusters, followed by D. melanogaster and L. cuprina, sharing 7,582 and 7,490 orthologous clusters, respectively. We also identified a gene locus for the sodium channel protein in which mutations have previously been reported that confer target site resistance to the most common class of pesticides used in fly control. Additionally, we identified 276 genomic loci encoding members of metabolic enzyme gene families such as cytochrome P450s, esterases and glutathione S-transferases, and several genes orthologous to sex determination pathway genes in other Dipteran species. Copyright © 2018 Konganti et al.

Identification of a leucine-rich repeat receptor-like serine/threonine-protein kinase as a candidate gene for Rvi12 (Vb)-based apple scab resistance

Apple scab, caused by Venturia inaequalis, is the most important fungal disease of apples (Malus × domestica). Currently, the disease is controlled by up to 15 fungicide applications to the crop per year. Resistant apple cultivars will help promote the sustainable control of scab in commercial orchards. The breakdown of the Rvi6 (Vf) major-gene based resistance, the most used resistance gene in apple breeding, prompted the identification and characterization of new scab resistance genes. By using a large segregating population, the Rvi12 scab resistance gene was previously mapped to a genetic location flanked by molecular markers SNP_23.599 and SNP_24.482. Starting from these markers and utilizing chromosome walking of a Hansen’s baccata #2 (HB2) BAC library, a single BAC clone spanning the Rvi12 interval was identified. Following Pacific Biosciences (PacBio) RS II sequencing and assembly of the BAC clone sequence with the hierarchical genome assembly process (HGAP), the Rvi12 resistance locus was localized to a 62.3-kb genomic region. Gene prediction and in silico characterization identified a single candidate resistance gene. The gene, named here as Rvi12_Cd5, belongs to the LRR receptor-like serine/threonine-protein kinase family. In silico comparison of the resistance allele from HB2 and the susceptible allele from Golden Delicious (GD) identified the presence of an additional intron in the HB2 allele. Conserved domain analysis identified the presence of four additional LRR motifs in the susceptible allele compared to the resistance allele. The constitutive expression of Rvi12_Cd5 in HB2, together with its structural similarity to known resistance genes, makes it the most likely candidate for Rvi12 scab resistance in apple.

Parallels between experimental and natural evolution of legume symbionts.

The emergence of symbiotic interactions has been studied using population genomics in nature and experimental evolution in the laboratory, but the parallels between these processes remain unknown. Here we compare the emergence of rhizobia after the horizontal transfer of a symbiotic plasmid in natural populations of Cupriavidus taiwanensis, over 10 MY ago, with the experimental evolution of symbiotic Ralstonia solanacearum for a few hundred generations. In spite of major differences in terms of time span, environment, genetic background, and phenotypic achievement, both processes resulted in rapid genetic diversification dominated by purifying selection. We observe no adaptation in the plasmid carrying the genes responsible for the ecological transition. Instead, adaptation was associated with positive selection in a set of genes that led to the co-option of the same quorum-sensing system in both processes. Our results provide evidence for similarities in experimental and natural evolutionary transitions and highlight the potential of comparisons between both processes to understand symbiogenesis.

Genome of an allotetraploid wild peanut Arachis monticola: a de novo assembly.

Arachis monticola (2n = 4x = 40) is the only allotetraploid wild peanut within the Arachis genus and section, with an AABB-type genome of ~2.7 Gb in size. The AA-type subgenome is derived from diploid wild peanut Arachis duranensis, and the BB-type subgenome is derived from diploid wild peanut Arachis ipaensis. A. monticola is regarded either as the direct progenitor of the cultivated peanut or as an introgressive derivative between the cultivated peanut and wild species. The large polyploid genome structure and enormous nearly identical regions of the genome make the assembly of chromosomal pseudomolecules very challenging. Here we report the first reference-quality assembly of the A. monticola genome, using a series of advanced technologies. The final whole genome of A. monticola is ~2.62 Gb and has a contig N50 and scaffold N50 of 106.66 Kb and 124.92 Mb, respectively. The vast majority (91.83%) of the assembled sequence was anchored onto the 20 pseudo-chromosomes, and 96.07% of assemblies were accurately separated into AA- and BB-subgenomes. We demonstrated the efficiency of a current strategy for de novo assembly of a highly complex allotetraploid species, the wild peanut A. monticola, based on whole-genome shotgun sequencing, single molecule real-time sequencing, high-throughput chromosome conformation capture technology, and BioNano optical genome maps. These combined technologies produced a reference-quality genome of the allotetraploid wild peanut, which is valuable for understanding peanut domestication and evolution within the Arachis genus and among legume crops.

Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads

Due to the large number of repetitive sequences in complex eukaryotic genomes, fragmented and incompletely assembled genomes lose value as reference sequences, often because short contigs cannot be anchored or are mispositioned on chromosomes. Here we report a novel method, Highly Efficient Repeat Assembly (HERA), which includes a new concept called a connection graph as well as algorithms for constructing the graph. HERA resolves repeats at high efficiency with single-molecule sequencing data, and enables the assembly of chromosome-scale contigs by further integrating genome maps and Hi-C data. We tested HERA with the genomes of rice R498, maize B73, human HX1 and Tartary buckwheat Pinku1. HERA can correctly assemble most of the tandemly repetitive sequences in rice using single-molecule sequencing data only. Using the same maize and human sequencing data published by Jiao et al. (2017) and Shi et al. (2016), respectively, we dramatically improved the sequence contiguity compared with the published assemblies, increasing the contig N50 from 1.3 Mb to 61.2 Mb in the maize B73 assembly and from 8.3 Mb to 54.4 Mb in the human HX1 assembly with HERA. We provided a high-quality maize reference genome with 96.9% of the gaps filled (only 76 gaps left) and several incorrectly positioned sequences fixed compared with the B73 RefGen_v4 assembly. Comparisons between the HERA assembly of HX1 and the human GRCh38 reference genome showed that many gaps in GRCh38 could be filled, and that GRCh38 contained some potential errors that could be fixed. We assembled the Pinku1 genome into 12 scaffolds with a contig N50 size of 27.85 Mb. HERA serves as a new genome assembly/phasing method to generate high-quality sequences for complex genomes and as a curation tool to improve the contiguity and completeness of existing reference genomes, including the correction of assembly errors in repetitive regions.

Computational Modeling of Multidrug-Resistant Bacteria

Understanding how complex phenotypes arise from individual molecules and their interactions is a primary challenge in biology, and computational approaches have been increasingly employed to tackle this task. In this chapter, we describe current efforts by FIOCRUZ and partners to develop integrated computational models of multidrug-resistant bacteria. The bacterium chosen as the main focus of this effort is Pseudomonas aeruginosa, an opportunistic pathogen associated with a broad spectrum of infections in humans. Nowadays, P. aeruginosa is one of the main problems in healthcare-associated infections (HAI) worldwide, because of its great capacity for survival in hospital environments and its intrinsic resistance to many antibiotics. Our overall research objective is to use integrated computational models to accurately predict a wide range of observable cellular behaviors of multidrug-resistant P. aeruginosa CCBH4851, a strain belonging to the clone ST277, which is endemic in Brazil. In this chapter, after a brief introduction to P. aeruginosa biology, we discuss the construction of metabolic and gene regulatory networks of P. aeruginosa CCBH4851 from its genome. We also illustrate how these networks can be integrated into a single model, and we discuss methods for identifying potential therapeutic targets through integrated models.

RAD sequencing and a hybrid Antarctic fur seal genome assembly reveal rapidly decaying linkage disequilibrium, global population structure and evidence for inbreeding.

Recent advances in high throughput sequencing have transformed the study of wild organisms by facilitating the generation of high quality genome assemblies and dense genetic marker datasets. These resources have the potential to significantly advance our understanding of diverse phenomena at the level of species, populations and individuals, ranging from patterns of synteny through rates of linkage disequilibrium (LD) decay and population structure to individual inbreeding. Consequently, we used PacBio sequencing to refine an existing Antarctic fur seal (Arctocephalus gazella) genome assembly and genotyped 83 individuals from six populations using restriction site associated DNA (RAD) sequencing. The resulting hybrid genome comprised 6,169 scaffolds with an N50 of 6.21 Mb and provided clear evidence for the conservation of large chromosomal segments between the fur seal and dog (Canis lupus familiaris). Focusing on the most extensively sampled population of South Georgia, we found that LD decayed rapidly, reaching the background level by around 400 kb, consistent with other vertebrates but at odds with the notion that fur seals experienced a strong historical bottleneck. We also found evidence for population structuring, with four main Antarctic island groups being resolved. Finally, appreciable variance in individual inbreeding could be detected, reflecting the strong polygyny and site fidelity of the species. Overall, our study contributes important resources for future genomic studies of fur seals and other pinnipeds while also providing a clear example of how high throughput sequencing can generate diverse biological insights at multiple levels of organization. Copyright © 2018 Humble et al.

High-quality assembly of the reference genome for scarlet sage, Salvia splendens, an economically important ornamental plant.

Salvia splendens Ker-Gawler, scarlet or tropical sage, is a tender herbaceous perennial widely introduced and seen in public gardens all over the world. With few molecular resources, breeding is still restricted to traditional phenotypic selection, and the genetic mechanisms underlying phenotypic variation remain unknown. Hence, a high-quality reference genome will be very valuable for marker-assisted breeding, genome editing, and molecular genetics. We generated 66 Gb and 37 Gb of raw DNA sequences, respectively, from whole-genome sequencing of a largely homozygous scarlet sage inbred line using Pacific Biosciences (PacBio) single-molecule real-time and Illumina HiSeq sequencing platforms. The PacBio de novo assembly yielded a final genome with a scaffold N50 size of 3.12 Mb and a total length of 808 Mb. The repetitive sequences identified accounted for 57.52% of the genome sequence, and 54,008 protein-coding genes were predicted collectively with ab initio and homology-based gene prediction from the masked genome. The divergence time between S. splendens and Salvia miltiorrhiza was estimated at 28.21 million years ago (Mya). Moreover, 3,797 species-specific genes and 1,187 expanded gene families were identified for the scarlet sage genome. We provide the first genome sequence and gene annotation for the scarlet sage. The availability of these resources will be of great importance for further breeding strategies, genome editing, and comparative genomics among related species.

Whole genome and transcriptome maps of the entirely black native Korean chicken breed Yeonsan Ogye.

Yeonsan Ogye (YO), an indigenous Korean chicken breed (Gallus gallus domesticus), has entirely black external features and internal organs. In this study, the draft genome of YO was assembled using a hybrid de novo assembly method that takes advantage of high-depth Illumina short reads (376.6X) and low-depth Pacific Biosciences (PacBio) long reads (9.7X). The contig and scaffold NG50s of the hybrid de novo assembly were 362.3 Kbp and 16.8 Mbp, respectively. The completeness (97.6%) of the draft genome (Ogye_1.1) was evaluated with single-copy orthologous genes using Benchmarking Universal Single-Copy Orthologs and found to be comparable to the current chicken reference genome (galGal5; 97.4%; contigs were assembled with high-depth PacBio long reads (50X) and scaffolded with short reads) and superior to other avian genomes (92%-93%; assembled with short read-only or hybrid methods). Compared to galGal4 and galGal5, the draft genome included 551 structural variations including the fibromelanosis (FM) locus duplication, related to hyperpigmentation. To comprehensively reconstruct transcriptome maps, RNA sequencing and reduced representation bisulfite sequencing data were analyzed from 20 tissues, including 4 black tissues (skin, shank, comb, and fascia). The maps included 15,766 protein-coding and 6,900 long noncoding RNA genes, many of which were tissue-specifically expressed and displayed tissue-specific DNA methylation patterns in the promoter regions. We expect that the resulting genome sequence and transcriptome maps will be valuable resources for studying domestic chicken breeds, including black-skinned chickens, as well as for understanding genomic differences between breeds and the evolution of hyperpigmented chickens and functional elements related to hyperpigmentation.
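
The abstract above reports NG50 rather than N50. In the usual definition, NG50 is computed against half of the estimated genome size instead of half of the assembled length, which makes values comparable between assemblies of the same genome. A minimal sketch of that definition follows; the function name, the toy lengths, and the genome sizes are hypothetical and are not taken from the study.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of sequence lengths for a given genome size.

    Sequences are taken from longest to shortest until their summed
    length reaches half of the estimated genome size. Returns None if
    the whole assembly covers less than half of that size.
    """
    half_genome = genome_size / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_genome:
            return length
    return None


# Hypothetical scaffold lengths and genome sizes (arbitrary units).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(ng50(toy_scaffolds, genome_size=400))
print(ng50(toy_scaffolds, genome_size=1000))
```

When the assembly is shorter than the estimated genome, NG50 is never larger than N50 and may be undefined, which the sketch signals by returning None.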

GC content elevates mutation and recombination rates in the yeast Saccharomyces cerevisiae.

The chromosomes of many eukaryotes have regions of high GC content interspersed with regions of low GC content. In the yeast Saccharomyces cerevisiae, high-GC regions are often associated with high levels of meiotic recombination. In this study, we constructed URA3 genes that differ substantially in their base composition [URA3-AT (31% GC), URA3-WT (43% GC), and URA3-GC (63% GC)] but encode proteins with the same amino acid sequence. The strain with URA3-GC had an approximately sevenfold elevated rate of ura3 mutations compared with the strains with URA3-WT or URA3-AT. About half of these mutations were single-base substitutions and were dependent on the error-prone DNA polymerase ζ. About 30% were deletions or duplications between short (5-10 base) direct repeats resulting from DNA polymerase slippage. The URA3-GC gene also had elevated rates of meiotic and mitotic recombination relative to the URA3-AT or URA3-WT genes. Thus, base composition has a substantial effect on the basic parameters of genome stability and evolution. Copyright © 2018 the Author(s). Published by PNAS.
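
The GC percentages quoted for the URA3 variants are simple base-composition fractions, and synonymous codon choice is what lets them vary without changing the encoded protein. The sketch below illustrates both points on toy sequences. The function name and the example sequences are illustrative assumptions, not the constructs used in the study.

```python
def gc_fraction(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc_count = sum(1 for b in bases if b in "GC")
    return gc_count / len(bases)


# GCA and GCC both encode alanine, so these two toy sequences encode the
# same three-residue peptide (Ala-Ala-Ala) at different GC content.
low_gc = "GCAGCAGCA"
high_gc = "GCCGCCGCC"
print(round(gc_fraction(low_gc), 3))
print(round(gc_fraction(high_gc), 3))
```

Ambiguous symbols such as N are ignored rather than counted as non-GC, which is one common convention; other tools may treat them differently.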

An outbreak of a rare Shiga-toxin-producing Escherichia coli serotype (O117:H7) among men who have sex with men.

Sexually transmissible enteric infections (STEIs) are commonly associated with transmission among men who have sex with men (MSM). In the past decade, the UK has experienced multiple parallel STEI emergences in MSM caused by a range of bacterial species of the genus Shigella, and an outbreak of an uncommon serotype (O117:H7) of Shiga-toxin-producing Escherichia coli (STEC). Here, we used microbial genomics on 6 outbreak and 30 sporadic STEC O117:H7 isolates to explore the origins and pathogenic drivers of the STEC O117:H7 emergence in MSM. Using genomic epidemiology, we found that the STEC O117:H7 outbreak lineage was potentially imported from Latin America and likely continues to circulate both in the UK MSM population and in Latin America. We found genomic relationships consistent with existing symptomatic evidence for chronic infection with this STEC serotype. Comparative genomic analysis indicated the existence of a novel Shiga toxin 1-encoding prophage in the outbreak isolates, and evidence of horizontal gene exchange among the STEC O117:H7 outbreak lineage and other enteric pathogens. There was no evidence of increased virulence in the outbreak strains relative to contextual isolates, but the outbreak lineage was associated with azithromycin resistance. Comparing these findings with similar genomic investigations of emerging MSM-associated Shigella in the UK highlighted many parallels, the most striking of which was the importance of the azithromycin phenotype for STEI emergence in this patient group.

Pol V-mediated translesion synthesis elicits localized untargeted mutagenesis during post-replicative gap repair.

In vivo, replication forks proceed beyond replication-blocking lesions by way of downstream repriming, generating daughter strand gaps that are subsequently processed by post-replicative repair pathways such as homologous recombination and translesion synthesis (TLS). The way these gaps are filled during TLS is presently unknown. The structure of gap repair synthesis was assessed by sequencing large collections of single DNA molecules that underwent specific TLS events in vivo. The higher error frequency of specialized relative to replicative polymerases allowed us to visualize gap-filling events at high resolution. Unexpectedly, the data reveal that a specialized polymerase, Pol V, synthesizes stretches of DNA both upstream and downstream of a site-specific DNA lesion. Pol V-mediated untargeted mutations are thus spread over several hundred nucleotides, strongly eliciting genetic instability on either side of a given lesion. Consequently, post-replicative gap repair may be a source of untargeted mutations critical for gene diversification in adaptation and evolution. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

