Menu
July 19, 2019

Extreme sensitivity to ultraviolet light in the fungal pathogen causing white-nose syndrome of bats.

Bat white-nose syndrome (WNS), caused by the fungal pathogen Pseudogymnoascus destructans, has decimated North American hibernating bats since its emergence in 2006. Here, we utilize comparative genomics to examine the evolutionary history of this pathogen in comparison to six closely related nonpathogenic species. P. destructans displays a large reduction in carbohydrate-utilizing enzymes (CAZymes) and in the predicted secretome (~50%), and an increase in lineage-specific genes. The pathogen has lost a key enzyme, UVE1, in the alternate excision repair (AER) pathway, which is known to contribute to repair of DNA lesions induced by ultraviolet (UV) light. Consistent with a nonfunctional AER pathway, P. destructans is extremely sensitive to UV light, as well as the DNA alkylating agent methyl methanesulfonate (MMS). The differential susceptibility of P. destructans to UV light in comparison to other hibernacula-inhabiting fungi represents a potential “Achilles’ heel” of P. destructans that might be exploited for treatment of bats with WNS.


July 19, 2019

Long-read sequence assembly of the firefly Pyrocoelia pectoralis genome.

Fireflies are a family of insects within the beetle order Coleoptera, or winged beetles, and they are one of the most well-known and loved insect species because of their bioluminescence. However, the firefly is in danger of extinction because of the massive destruction of its living environment. In order to improve the understanding of fireflies and protect them effectively, we sequenced the whole genome of the terrestrial firefly Pyrocoelia pectoralis.Here, we developed a highly reliable genome resource for the terrestrial firefly Pyrocoelia pectoralis (E. Oliv., 1883; Coleoptera: Lampyridae) using single molecule real time (SMRT) sequencing on the PacBio Sequel platform. In total, 57.8 Gb of long reads were generated and assembled into a 760.4-Mb genome, which is close to the estimated genome size and covered 98.7% complete and 0.7% partial insect Benchmarking Universal Single-Copy Orthologs. The k-mer analysis showed that this genome is highly heterozygous. However, our long-read assembly demonstrates continuousness with a contig N50 length of 3.04 Mb and the longest contig length of 13.69 Mb. Furthermore, 135 589 SSRs and 341 Mb of repeat sequences were detected. A total of 23 092 genes were predicted; 88.44% of genes were annotated with one or more related functions.We assembled a high-quality firefly genome, which will not only provide insights into the conservation and biodiversity of fireflies, but also provide a wealth of information to study the mechanisms of their sexual communication, bio-luminescence, and evolution.© The Authors 2017. Published by Oxford University Press.


July 19, 2019

Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity.

Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology.Here we utilized a robust, cost-effective approach to produce high-quality reference genomes. We report a near-complete genome of diploid woodland strawberry (Fragaria vesca) using single-molecule real-time sequencing from Pacific Biosciences (PacBio). This assembly has a contig N50 length of ~7.9 million base pairs (Mb), representing a ~300-fold improvement of the previous version. The vast majority (>99.8%) of the assembly was anchored to 7 pseudomolecules using 2 sets of optical maps from Bionano Genomics. We obtained ~24.96 Mb of sequence not present in the previous version of the F. vesca genome and produced an improved annotation that includes 1496 new genes. Comparative syntenic analyses uncovered numerous, large-scale scaffolding errors present in each chromosome in the previously published version of the F. vesca genome.Our results highlight the need to improve existing short-read based reference genomes. Furthermore, we demonstrate how genome quality impacts commonly used analyses for addressing both fundamental and applied biological questions.© The Authors 2017. Published by Oxford University Press.


July 19, 2019

Full-Length Envelope Analyzer (FLEA): A tool for longitudinal analysis of viral amplicons.

Next generation sequencing of viral populations has advanced our understanding of viral population dynamics, the development of drug resistance, and escape from host immune responses. Many applications require complete gene sequences, which can be impossible to reconstruct from short reads. HIV env, the protein of interest for HIV vaccine studies, is exceptionally challenging for long-read sequencing and analysis due to its length, high substitution rate, and extensive indel variation. While long-read sequencing is attractive in this setting, the analysis of such data is not well handled by existing methods. To address this, we introduce FLEA (Full-Length Envelope Analyzer), which performs end-to-end analysis and visualization of long-read sequencing data. FLEA consists of both a pipeline (optionally run on a high-performance cluster), and a client-side web application that provides interactive results. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser. We demonstrate how FLEA may be used to process Pacific Biosciences HIV env data and describe recent examples of its use. Simulations show how FLEA dramatically reduces the error rate of this sequencing platform, providing an accurate portrait of complex and variable HIV env populations. A public instance of FLEA is hosted at http://flea.datamonkey.org. The Python source code for the FLEA pipeline can be found at https://github.com/veg/flea-pipeline. The client-side application is available at https://github.com/veg/flea-web-app. A live demo of the P018 results can be found at http://flea.murrell.group/view/P018.


July 19, 2019

Expanding an expanded genome: long-read sequencing of Trypanosoma cruzi.

Although the genome of Trypanosoma cruzi, the causative agent of Chagas disease, was first made available in 2005, with additional strains reported later, the intrinsic genome complexity of this parasite (the abundance of repetitive sequences and genes organized in tandem) has traditionally hindered high-quality genome assembly and annotation. This also limits diverse types of analyses that require high degrees of precision. Long reads generated by third-generation sequencing technologies are particularly suitable to address the challenges associated with T. cruzi’s genome since they permit direct determination of the full sequence of large clusters of repetitive sequences without collapsing them. This, in turn, not only allows accurate estimation of gene copy numbers but also circumvents assembly fragmentation. Here, we present the analysis of the genome sequences of two T. cruzi clones: the hybrid TCC (TcVI) and the non-hybrid Dm28c (TcI), determined by PacBio Single Molecular Real-Time (SMRT) technology. The improved assemblies herein obtained permitted us to accurately estimate gene copy numbers, abundance and distribution of repetitive sequences (including satellites and retroelements). We found that the genome of T. cruzi is composed of a ‘core compartment’ and a ‘disruptive compartment’ which exhibit opposite GC content and gene composition. Novel tandem and dispersed repetitive sequences were identified, including some located inside coding sequences. Additionally, homologous chromosomes were separately assembled, allowing us to retrieve haplotypes as separate contigs instead of a unique mosaic sequence. Finally, manual annotation of surface multigene families, mucins and trans-sialidases allows now a better overview of these complex groups of genes.


July 19, 2019

Advances in Sequencing and Resequencing in Crop Plants.

DNA sequencing technologies have changed the face of biological research over the last 20 years. From reference genomes to population level resequencing studies, these technologies have made significant contributions to our understanding of plant biology and evolution. As the technologies have increased in power, the breadth and complexity of the questions that can be asked has increased. Along with this, the challenges of managing unprecedented quantities of sequence data are mounting. This chapter describes a few aspects of the journey so far and looks forward to what may lie ahead.


July 19, 2019

A near-complete haplotype-phased genome of the dikaryotic wheat stripe rust fungus Puccinia striiformis f. sp. tritici reveals high interhaplotype diversity.

A long-standing biological question is how evolution has shaped the genomic architecture of dikaryotic fungi. To answer this, high-quality genomic resources that enable haplotype comparisons are essential. Short-read genome assemblies for dikaryotic fungi are highly fragmented and lack haplotype-specific information due to the high heterozygosity and repeat content of these genomes. Here, we present a diploid-aware assembly of the wheat stripe rust fungus Puccinia striiformis f. sp. tritici based on long reads using the FALCON-Unzip assembler. Transcriptome sequencing data sets were used to infer high-quality gene models and identify virulence genes involved in plant infection referred to as effectors. This represents the most complete Puccinia striiformis f. sp. tritici genome assembly to date (83 Mb, 156 contigs, N50 of 1.5 Mb) and provides phased haplotype information for over 92% of the genome. Comparisons of the phase blocks revealed high interhaplotype diversity of over 6%. More than 25% of all genes lack a clear allelic counterpart. When we investigated genome features that potentially promote the rapid evolution of virulence, we found that candidate effector genes are spatially associated with conserved genes commonly found in basidiomycetes. Yet, candidate effectors that lack an allelic counterpart are more distant from conserved genes than allelic candidate effectors and are less likely to be evolutionarily conserved within the P. striiformis species complex and Pucciniales In summary, this haplotype-phased assembly enabled us to discover novel genome features of a dikaryotic plant-pathogenic fungus previously hidden in collapsed and fragmented genome assemblies.IMPORTANCE Current representations of eukaryotic microbial genomes are haploid, hiding the genomic diversity intrinsic to diploid and polyploid life forms. This hidden diversity contributes to the organism’s evolutionary potential and ability to adapt to stress conditions. Yet, it is challenging to provide haplotype-specific information at a whole-genome level. Here, we take advantage of long-read DNA sequencing technology and a tailored-assembly algorithm to disentangle the two haploid genomes of a dikaryotic pathogenic wheat rust fungus. The two genomes display high levels of nucleotide and structural variations, which lead to allelic variation and the presence of genes lacking allelic counterparts. Nonallelic candidate effector genes, which likely encode important pathogenicity factors, display distinct genome localization patterns and are less likely to be evolutionary conserved than those which are present as allelic pairs. This genomic diversity may promote rapid host adaptation and/or be related to the age of the sequenced isolate since last meiosis. Copyright © 2018 Schwessinger et al.


July 19, 2019

The Florida manatee (Trichechus manatus latirostris) T cell receptor loci exhibit V subgroup synteny and chain-specific evolution.

The Florida manatee (Trichechus manatus latirostris) has limited diversity in the immunoglobulin heavy chain. We therefore investigated the antigen receptor loci of the other arm of the adaptive immune system: the T cell receptor. Manatees are the first species from Afrotheria, a basal eutherian superorder, to have an in-depth characterization of all T cell receptor loci. By annotating the genome and expressed transcripts, we found that each chain has distinct features that correlates to their individual functions. The genomic organization also plays a role in modulating sequence conservation between species. There were extensive V subgroup synteny blocks in the TRA and TRB loci between T. m. latirostris and human. Increased genomic locus complexity correlated to increased locus synteny. We also identified evidence for a VHD pseudogene for the first time in a eutherian mammal. These findings emphasize the value of including species within this basal eutherian radiation in comparative studies. Copyright © 2018. Published by Elsevier Ltd.


July 19, 2019

Genome sequence of the progenitor of wheat A subgenome Triticum urartu.

Triticum urartu (diploid, AA) is the progenitor of the A subgenome of tetraploid (Triticum turgidum, AABB) and hexaploid (Triticum aestivum, AABBDD) wheat1,2. Genomic studies of T. urartu have been useful for investigating the structure, function and evolution of polyploid wheat genomes. Here we report the generation of a high-quality genome sequence of T. urartu by combining bacterial artificial chromosome (BAC)-by-BAC sequencing, single molecule real-time whole-genome shotgun sequencing 3 , linked reads and optical mapping4,5. We assembled seven chromosome-scale pseudomolecules and identified protein-coding genes, and we suggest a model for the evolution of T. urartu chromosomes. Comparative analyses with genomes of other grasses showed gene loss and amplification in the numbers of transposable elements in the T. urartu genome. Population genomics analysis of 147 T. urartu accessions from across the Fertile Crescent showed clustering of three groups, with differences in altitude and biostress, such as powdery mildew disease. The T. urartu genome assembly provides a valuable resource for studying genetic variation in wheat and related grasses, and promises to facilitate the discovery of genes that could be useful for wheat improvement.


July 19, 2019

Resequencing of 243 diploid cotton accessions based on an updated A genome identifies the genetic basis of key agronomic traits.

The ancestors of Gossypium arboreum and Gossypium herbaceum provided the A subgenome for the modern cultivated allotetraploid cotton. Here, we upgraded the G. arboreum genome assembly by integrating different technologies. We resequenced 243?G. arboreum and G. herbaceum accessions to generate a map of genome variations and found that they are equally diverged from Gossypium raimondii. Independent analysis suggested that Chinese G. arboreum originated in South China and was subsequently introduced to the Yangtze and Yellow River regions. Most accessions with domestication-related traits experienced geographic isolation. Genome-wide association study (GWAS) identified 98 significant peak associations for 11 agronomically important traits in G. arboreum. A nonsynonymous substitution (cysteine-to-arginine substitution) of GaKASIII seems to confer substantial fatty acid composition (C16:0 and C16:1) changes in cotton seeds. Resistance to fusarium wilt disease is associated with activation of GaGSTF9 expression. Our work represents a major step toward understanding the evolution of the A genome of cotton.


July 19, 2019

Male-killing toxin in a bacterial symbiont of Drosophila.

Several lineages of symbiotic bacteria in insects selfishly manipulate host reproduction to spread in a population 1 , often by distorting host sex ratios. Spiroplasma poulsonii2,3 is a helical and motile, Gram-positive symbiotic bacterium that resides in a wide range of Drosophila species 4 . A notable feature of S. poulsonii is male killing, whereby the sons of infected female hosts are selectively killed during development1,2. Although male killing caused by S. poulsonii has been studied since the 1950s, its underlying mechanism is unknown. Here we identify an S. poulsonii protein, designated Spaid, whose expression induces male killing. Overexpression of Spaid in D. melanogaster kills males but not females, and induces massive apoptosis and neural defects, recapitulating the pathology observed in S. poulsonii-infected male embryos5-11. Our data suggest that Spaid targets the dosage compensation machinery on the male X chromosome to mediate its effects. Spaid contains ankyrin repeats and a deubiquitinase domain, which are required for its subcellular localization and activity. Moreover, we found a laboratory mutant strain of S. poulsonii with reduced male-killing ability and a large deletion in the spaid locus. Our study has uncovered a bacterial protein that affects host cellular machinery in a sex-specific way, which is likely to be the long-searched-for factor responsible for S. poulsonii-induced male killing.


July 19, 2019

The Rosa genome provides new insights into the domestication of modern roses.

Roses have high cultural and economic importance as ornamental plants and in the perfume industry. We report the rose whole-genome sequencing and assembly and resequencing of major genotypes that contributed to rose domestication. We generated a homozygous genotype from a heterozygous diploid modern rose progenitor, Rosa chinensis ‘Old Blush’. Using single-molecule real-time sequencing and a meta-assembly approach, we obtained one of the most comprehensive plant genomes to date. Diversity analyses highlighted the mosaic origin of ‘La France’, one of the first hybrids combining the growth vigor of European species and the recurrent blooming of Chinese species. Genomic segments of Chinese ancestry identified new candidate genes for recurrent blooming. Reconstructing regulatory and secondary metabolism pathways allowed us to propose a model of interconnected regulation of scent and flower color. This genome provides a foundation for understanding the mechanisms governing rose traits and should accelerate improvement in roses, Rosaceae and ornamentals.


July 19, 2019

Genomic variation in 3,010 diverse accessions of Asian cultivated rice.

Here we analyse genetic variation, population structure and diversity among 3,010 diverse Asian cultivated rice (Oryza sativa L.) genomes from the 3,000 Rice Genomes Project. Our results are consistent with the five major groups previously recognized, but also suggest several unreported subpopulations that correlate with geographic location. We identified 29 million single nucleotide polymorphisms, 2.4 million small indels and over 90,000 structural variations that contribute to within- and between-population variation. Using pan-genome analyses, we identified more than 10,000 novel full-length protein-coding genes and a high number of presence-absence variations. The complex patterns of introgression observed in domestication genes are consistent with multiple independent rice domestication events. The public availability of data from the 3,000 Rice Genomes Project provides a resource for rice genomics research and breeding.


July 19, 2019

Unexpected diversity in the mobilome of a Pseudomonas aeruginosa strain isolated from a dental unit waterline revealed by SMRT Sequencing.

The Gram-negative bacterium Pseudomonas aeruginosa is found in several habitats, both natural and human-made, and is particularly known for its recurrent presence as a pathogen in the lungs of patients suffering from cystic fibrosis, a genetic disease. Given its clinical importance, several major studies have investigated the genomic adaptation of P. aeruginosa in lungs and its transition as acute infections become chronic. However, our knowledge about the diversity and adaptation of the P. aeruginosa genome to non-clinical environments is still fragmentary, in part due to the lack of accurate reference genomes of strains from the numerous environments colonized by the bacterium. Here, we used PacBio long-read technology to sequence the genome of PPF-1, a strain of P. aeruginosa isolated from a dental unit waterline. Generating this closed genome was an opportunity to investigate genomic features that are difficult to accurately study in a draft genome (contigs state). It was possible to shed light on putative genomic islands, some shared with other reference genomes, new prophages, and the complete content of insertion sequences. In addition, four different group II introns were also found, including two characterized here and not listed in the specialized group II intron database.


July 19, 2019

Long read assemblies of geographically dispersed Plasmodium falciparum isolates reveal highly structured subtelomeres.

Background: Although thousands of clinical isolates of Plasmodium falciparum are being sequenced and analysed by short read technology, the data do not resolve the highly variable subtelomeric regions of the genomes that contain polymorphic gene families involved in immune evasion and pathogenesis. There is also no current standard definition of the boundaries of these variable subtelomeric regions. Methods: Using long-read sequence data (Pacific Biosciences SMRT technology), we assembled and annotated the genomes of 15 P. falciparum isolates, ten of which are newly cultured clinical isolates. We performed comparative analysis of the entire genome with particular emphasis on the subtelomeric regions and the internal var genes clusters.   Results: The nearly complete sequence of these 15 isolates has enabled us to define a highly conserved core genome, to delineate the boundaries of the subtelomeric regions, and to compare these across isolates. We found highly structured variable regions in the genome. Some exported gene families purportedly involved in release of merozoites show copy number variation. As an example of ongoing genome evolution, we found a novel CLAG gene in six isolates.  We also found a novel gene that was relatively enriched in the South East Asian isolates compared to those from Africa. Conclusions: These 15 manually curated new reference genome sequences with their nearly complete subtelomeric regions and fully assembled genes are an important new resource for the malaria research community. We report the overall conserved structure and pattern of important gene families and the more clearly defined subtelomeric regions.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.