Combining high-throughput sequencing with targeted sequence capture has become an attractive tool to study specific genomic regions of interest. Most studies have so far focused on the exome using short-read technology. These approaches are not designed to capture intergenic regions needed to reconstruct genomic organization, including regulatory regions and gene synteny. Here, we demonstrate the power of combining targeted sequence capture with long-read sequencing technology for comparative genomic analyses of the haemoglobin (Hb) gene clusters across eight species separated by up to 70 million years. Guided by the reference genome assembly of the Atlantic cod (Gadus morhua) together with genome information from draft assemblies of selected codfishes, we designed probes covering the two Hb gene clusters. Use of custom-made barcodes combined with PacBio RSII sequencing led to highly continuous assemblies of the LA (~100 kb) and MN (~200 kb) clusters, which include syntenic regions of coding and intergenic sequences. Our results revealed an overall conserved genomic organization of the Hb genes within this lineage, yet with several lineage-specific gene duplications. Moreover, for some of the species examined, we identified amino acid substitutions at two sites in the Hbb1 gene as well as length polymorphisms in its regulatory region, which have previously been linked to temperature adaptation in Atlantic cod populations. This study highlights the use of targeted long-read capture as a versatile approach for comparative genomic studies by generation of a cross-species genomic resource elucidating the evolutionary history of the Hb gene family across the highly divergent group of codfishes. © 2018 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
De novo assembly of the Pasteuria penetrans genome reveals high plasticity, host dependency, and BclA-like collagens.
Pasteuria penetrans is a gram-positive, endospore-forming bacterial parasite of Meloidogyne spp., the most economically damaging genus of plant-parasitic nematodes globally. The obligate antagonistic nature of P. penetrans makes it an attractive candidate biological control agent. However, deployment of P. penetrans for this purpose is inhibited by a lack of understanding of its metabolism and the molecular mechanics underpinning parasitism of the host, in particular the initial attachment of the endospore to the nematode cuticle. Several attempts to assemble the genomes of species within this genus have been unsuccessful. This is primarily due to the obligate parasitic nature of the bacterium, which makes it challenging to obtain contamination-free genomic DNA of sufficient quantity and quality. Taking advantage of recent developments in whole genome amplification, long-read sequencing platforms, and assembly algorithms, we have developed a protocol to generate large quantities of high molecular weight genomic DNA from a small number of purified endospores. We demonstrate this method via genomic assembly of P. penetrans. This assembly reveals a reduced genome of 2.64 Mbp, estimated to represent 86% of the complete sequence; its reduced metabolism reflects widespread reliance on the host and possibly associated organisms. Additionally, apparent expansion of transposases and prediction of partial competence pathways suggest a high degree of genomic plasticity. Phylogenetic analysis places our sequence within the Bacilli, and most closely related to Thermoactinomyces species. Seventeen predicted BclA-like proteins are identified which may be involved in the determination of attachment specificity. This resource may be used to develop in vitro culture methods and to investigate the genetic and molecular basis of attachment specificity.
TCR sequencing of single cells reactive to DQ2.5-glia-α2 and DQ2.5-glia-ω2 reveals clonal expansion and epitope-specific V-gene usage.
CD4+ T cells recognizing dietary gluten epitopes in the context of disease-associated human leukocyte antigen (HLA)-DQ2 or HLA-DQ8 molecules are the key players in celiac disease pathogenesis. Here, we conducted a large-scale single-cell paired T-cell receptor (TCR) sequencing study to characterize the TCR repertoire for two homologous immunodominant gluten epitopes, DQ2.5-glia-α2 and DQ2.5-glia-ω2, in blood of celiac disease patients after oral gluten challenge. Despite sequence similarity of the epitopes, the TCR repertoires are unique but share several overall features. We demonstrate that clonally expanded T cells dominate the T-cell responses to both epitopes. Moreover, we find V-gene bias of TRAV26, TRAV4, and TRBV7 in DQ2.5-glia-α2 reactive TCRs, while DQ2.5-glia-ω2 TCRs displayed significant bias toward TRAV4 and TRBV4. The knowledge that the antigen-specific TCR repertoire in chronic inflammatory diseases tends to be dominated by a few expanded clones that use the same TCR V-gene segments across patients is important information for HLA-associated diseases where the antigen is unknown.
Complete genome sequence of Tessaracoccus sp. strain T2.5-30 isolated from 139.5 meters deep on the subsurface of the Iberian Pyritic Belt.
Here, we report the complete genome sequence of Tessaracoccus sp. strain T2.5-30, which consists of a chromosome with 3.2 Mbp, 70.4% G+C content, and 3,005 coding DNA sequences. The strain was isolated from a rock core retrieved at a depth of 139.5 m in the subsurface of the Iberian Pyritic Belt (Spain). Copyright © 2017 Leandro et al.
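The G+C content quoted in genome announcements such as the one above is the share of G and C bases among all unambiguous bases. A minimal Python sketch of that calculation follows; the function name `gc_content` and the toy sequence are illustrative assumptions and are not taken from the paper.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    # Count only unambiguous bases so that N's do not dilute the estimate.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


if __name__ == "__main__":
    toy_sequence = "GGCCGGCATN"  # hypothetical 10-base example, not real data
    print(f"G+C content: {gc_content(toy_sequence):.1f}%")
```

For a real chromosome the same function would be applied to the concatenated sequence read from a FASTA file.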
The flowering plant Primula veris is a common spring blooming perennial that is widely cultivated throughout Europe. This species is an established model system in the study of the genetics, evolution, and ecology of heterostylous floral polymorphisms. Despite the long history of research focused on this and related species, the continued development of this system has been restricted due to the absence of genomic and transcriptomic resources. We present here a de novo draft genome assembly of P. veris covering 301.8 Mb, or approximately 63% of the estimated 479.22 Mb genome, with an N50 contig size of 9.5 Kb, an N50 scaffold size of 164 Kb, and containing an estimated 19,507 genes. The results of a RADseq bulk segregant analysis allow for the confident identification of four genome scaffolds that are linked to the P. veris S-locus. RNAseq data from both P. veris and the closely related species P. vulgaris allow for the characterization of 113 candidate heterostyly genes that show significant floral morph-specific differential expression. One candidate gene of particular interest is a duplicated GLOBOSA homolog that may be unique to Primula (PveGLO2), and is completely silenced in L-morph flowers. The P. veris genome represents the first genome assembled from a heterostylous species, and thus provides an immensely important resource for future studies focused on the evolution and genetic dissection of heterostyly. As the first genome assembled from the Primulaceae, the P. veris genome will also facilitate the expanded application of phylogenomic methods in this diverse family and the eudicots as a whole.
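The N50 values and the 63% figure in the abstract above are standard assembly summary statistics. As a minimal sketch, assuming the usual definition of N50 (the length L such that sequences of length L or longer contain at least half of the assembled bases), they could be computed as follows. The function name `n50` and the toy contig lengths are illustrative assumptions; only the 301.8 Mb and 479.22 Mb figures come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # hypothetical lengths
    print("N50 of toy contigs:", n50(toy_contigs))

    # Fraction of the estimated genome covered, using the abstract's figures.
    assembled_mb, estimated_mb = 301.8, 479.22
    print(f"Assembled fraction: {100 * assembled_mb / estimated_mb:.0f}%")
```

Because N50 depends only on the length distribution, it describes contiguity and says nothing about the correctness of the assembled sequence.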
Convergent losses of decay mechanisms and rapid turnover of symbiosis genes in mycorrhizal mutualists.
To elucidate the genetic bases of mycorrhizal lifestyle evolution, we sequenced new fungal genomes, including 13 ectomycorrhizal (ECM), orchid (ORM) and ericoid (ERM) species, and five saprotrophs, which we analyzed along with other fungal genomes. Ectomycorrhizal fungi have a reduced complement of genes encoding plant cell wall-degrading enzymes (PCWDEs), as compared to their ancestral wood decayers. Nevertheless, they have retained a unique array of PCWDEs, thus suggesting that they possess diverse abilities to decompose lignocellulose. Similar functional categories of nonorthologous genes are induced in symbiosis. Of induced genes, 7-38% are orphan genes, including genes that encode secreted effector-like proteins. Convergent evolution of the mycorrhizal habit in fungi occurred via the repeated evolution of a ‘symbiosis toolkit’, with reduced numbers of PCWDEs and lineage-specific suites of mycorrhiza-induced genes.
Genome sequences of Corynebacterium pseudotuberculosis strains 48252 (human, pneumonia), CS_10 (lab strain), Ft_2193/67 (goat, pus), and CCUG 27541.
Here we report the genome sequences of four Corynebacterium pseudotuberculosis strains. These include a strain isolated from a patient with C. pseudotuberculosis pneumonia (48252), a strain isolated from pus in a goat (Ft_2193/67), a laboratory strain originating from strain Ft_2193/67 (CS_10), and the draft genome of an equine reference strain, CCUG 27541. Copyright © 2014 Håvelsrud et al.
Porphyromonas gingivalis is considered a major etiologic agent in adult periodontitis. Gingipains are among its most important virulence factors, but their release is unique in strain HG66. We present the genome sequence of HG66 with a single contig of 2,441,680 bp and a G+C content of 48.1%. Copyright © 2014 Siddiqui et al.
Seeking the source of Pseudomonas aeruginosa infections in a recently opened hospital: an observational study using whole-genome sequencing.
Pseudomonas aeruginosa is a common nosocomial pathogen responsible for significant morbidity and mortality internationally. Patients may become colonised or infected with P. aeruginosa after exposure to contaminated sources within the hospital environment. The aim of this study was to determine whether whole-genome sequencing (WGS) can be used to determine the source in a cohort of burns patients at high risk of P. aeruginosa acquisition. An observational prospective cohort study. Burns care ward and critical care ward in the UK. Patients with >7% total burns by surface area were recruited into the study. All patients were screened for P. aeruginosa on admission and samples taken from their immediate environment, including water. Screening patients who subsequently developed a positive P. aeruginosa microbiology result were subject to enhanced environmental surveillance. All isolates of P. aeruginosa were genome sequenced. Sequence analysis looked at similarity and relatedness between isolates. WGS for 141 P. aeruginosa isolates were obtained from patients, hospital water and the ward environment. Phylogenetic analysis revealed eight distinct clades, with a single clade representing the majority of environmental isolates in the burns unit. Isolates from three patients had identical genotypes compared with water isolates from the same room. There was clear clustering of water isolates by room and outlet, allowing the source of acquisitions to be unambiguously identified. Whole-genome shotgun sequencing of biofilm DNA extracted from a thermostatic mixer valve revealed this was the source of a P. aeruginosa subpopulation previously detected in water. In the remaining two cases there was no clear link to the hospital environment. This study reveals that WGS can be used for source tracking of P. aeruginosa in a hospital setting, and that acquisitions can be traced to a specific source within a hospital ward. Published by the BMJ Publishing Group Limited.
Dissemination of cephalosporin resistance genes between Escherichia coli strains from farm animals and humans by specific plasmid lineages.
Third-generation cephalosporins are a class of β-lactam antibiotics that are often used for the treatment of human infections caused by Gram-negative bacteria, especially Escherichia coli. Worryingly, the incidence of human infections caused by third-generation cephalosporin-resistant E. coli is increasing worldwide. Recent studies have suggested that these E. coli strains, and their antibiotic resistance genes, can spread from food-producing animals, via the food-chain, to humans. However, these studies used traditional typing methods, which may not have provided sufficient resolution to reliably assess the relatedness of these strains. We therefore used whole-genome sequencing (WGS) to study the relatedness of cephalosporin-resistant E. coli from humans, chicken meat, poultry and pigs. One strain collection included pairs of human and poultry-associated strains that had previously been considered to be identical based on Multi-Locus Sequence Typing, plasmid typing and antibiotic resistance gene sequencing. The second collection included isolates from farmers and their pigs. WGS analysis revealed considerable heterogeneity between human and poultry-associated isolates. The most closely related pairs of strains from both sources carried 1263 Single-Nucleotide Polymorphisms (SNPs) per Mbp core genome. In contrast, epidemiologically linked strains from humans and pigs differed by only 1.8 SNPs per Mbp core genome. WGS-based plasmid reconstructions revealed three distinct plasmid lineages (IncI1- and IncK-type) that carried cephalosporin resistance genes of the Extended-Spectrum Beta-Lactamase (ESBL)- and AmpC-types. The plasmid backbones within each lineage were virtually identical and were shared by genetically unrelated human and animal isolates. Plasmid reconstructions from short-read sequencing data were validated by long-read DNA sequencing for two strains. Our findings failed to demonstrate evidence for recent clonal transmission of cephalosporin-resistant E. coli strains from poultry to humans, as has been suggested based on traditional, low-resolution typing methods. Instead, our data suggest that cephalosporin resistance genes are mainly disseminated in animals and humans via distinct plasmids.
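The relatedness measure used in the abstract above, SNPs per Mbp of core genome, is a simple normalisation of a pairwise SNP count by the length of the shared (core) genome. A minimal Python sketch follows; the function name `snps_per_mbp` and the input numbers are hypothetical and chosen only to illustrate the arithmetic, since the abstract reports the resulting densities but not the raw counts.

```python
def snps_per_mbp(snp_count: int, core_genome_bp: int) -> float:
    """Normalise a pairwise SNP count by core-genome length in Mbp."""
    if core_genome_bp <= 0:
        raise ValueError("core genome length must be positive")
    return snp_count / (core_genome_bp / 1_000_000)


if __name__ == "__main__":
    # Hypothetical pair: 9 SNPs across a 5 Mbp core genome.
    density = snps_per_mbp(9, 5_000_000)
    print(f"{density:.1f} SNPs per Mbp core genome")
```

Normalising by core-genome length makes pairs comparable even when the size of the shared genome differs between comparisons.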
Aquaculture is the fastest-growing food production sector in agriculture, with great potential to meet projected protein needs of human beings. Aquaculture is facing several challenges, including lack of a sufficient number of genetically improved species, lack of species-specific feeds, high mortality due to diseases and pollution of ecosystems. The rapid development of sequencing technologies has revolutionized biological sciences, and supplied necessary tools to tackle these challenges in aquaculture and thus ensure its sustainability and profitability. So far, draft genomes have been published for over 24 aquaculture species, and used to address important issues related to aquaculture. We briefly review the advances of next generation sequencing technologies, and summarize the status of whole genome sequencing and its general applications (i.e. establishing reference genomes and discovering DNA markers) and specific applications in tackling some important issues (e.g. breeding, diseases, sex determination & maturation) related to aquaculture. For sequencing a new genome, we recommend the use of 100–200× short reads using Illumina and 50–60× long reads using PacBio sequencing technologies. For identification of a large number of SNPs, resequencing pooled DNA samples from different populations is the most cost-effective way. We also discuss the challenges and future directions of whole genome sequencing in aquaculture.
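The coverage recommendation above can be turned into a sequencing budget with the usual relation coverage = (number of reads × read length) / genome size. The sketch below is a rough planning aid under stated assumptions: the function name `reads_for_coverage`, the 1 Gbp genome size, and the 150 bp and 10 kbp read lengths are all hypothetical, and only the 100× and 50× targets come from the abstract.

```python
def reads_for_coverage(genome_size_bp: int, coverage: int, read_length_bp: int) -> int:
    """Return the minimum number of reads needed to reach a target coverage."""
    if genome_size_bp <= 0 or read_length_bp <= 0 or coverage < 0:
        raise ValueError("sizes must be positive and coverage non-negative")
    total_bases = genome_size_bp * coverage
    # Integer ceiling division: round up to a whole number of reads.
    return -(-total_bases // read_length_bp)


if __name__ == "__main__":
    genome = 1_000_000_000  # hypothetical 1 Gbp genome
    print("Short reads for 100x:", reads_for_coverage(genome, 100, 150))
    print("Long reads for 50x:", reads_for_coverage(genome, 50, 10_000))
```

The result is a lower bound, because it ignores reads lost to quality filtering, duplicates, and contamination.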
The first Atlantic cod (Gadus morhua) genome assembly published in 2011 was one of the early genome assemblies exclusively based on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are of a fragmented nature with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enables the generation of more contiguous genome assemblies. By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap-bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual. The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes in the genome indicates considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.
Completed genome sequences of Borrelia burgdorferi sensu stricto B31(NRZ) and closely related patient isolates from Europe.
Borrelia burgdorferi sensu stricto is a causative agent of human Lyme borreliosis in the United States and Europe. We report here the completed genome sequences of strain B31 isolated from a tick in the United States and two closely related strains from Europe, PAli and PAbe, which were isolated from patients with erythema migrans and neuroborreliosis, respectively. Copyright © 2017 Margos et al.
Draft genome sequences of two unclassified bacteria, Hydrogenophaga sp. strains IBVHS1 and IBVHS2, isolated from environmental samples.
We report here the draft genome sequences of Hydrogenophaga sp. strains IBVHS1 and IBVHS2, two bacteria assembled from the metagenomes of surface samples from freshwater lakes. The genomes are >95% complete and may represent new species within the Hydrogenophaga genus, indicating a larger diversity than currently identified. Copyright © 2017 Orr et al.
The whole-genome duplication 80 million years ago of the common ancestor of salmonids (salmonid-specific fourth vertebrate whole-genome duplication, Ss4R) provides unique opportunities to learn about the evolutionary fate of a duplicated vertebrate genome in 70 extant lineages. Here we present a high-quality genome assembly for Atlantic salmon (Salmo salar), and show that large genomic reorganizations, coinciding with bursts of transposon-mediated repeat expansions, were crucial for the post-Ss4R rediploidization process. Comparisons of duplicate gene expression patterns across a wide range of tissues with orthologous genes from a pre-Ss4R outgroup unexpectedly demonstrate far more instances of neofunctionalization than subfunctionalization. Surprisingly, we find that genes that were retained as duplicates after the teleost-specific whole-genome duplication 320 million years ago were not more likely to be retained after the Ss4R, and that the duplicate retention was not influenced to a great extent by the nature of the predicted protein interactions of the gene products. Finally, we demonstrate that the Atlantic salmon assembly can serve as a reference sequence for the study of other salmonids for a range of purposes.