July 19, 2019

Genomic repeats, misassembly and reannotation: a case study with long-read resequencing of Porphyromonas gingivalis reference strains.

Without knowledge of their genomic sequences, it is impossible to make functional models of the bacteria that make up human and animal microbiota. Unfortunately, the vast majority of publicly available genomes are only working drafts, an incompleteness that causes numerous problems and constitutes a major obstacle to genotypic and phenotypic interpretation. In this work, we began with an example from the class Bacteroidia in the phylum Bacteroidetes, which is preponderant among human orodigestive microbiota. We successfully identify the genetic loci responsible for assembly breaks and misassemblies and demonstrate the importance and usefulness of long-read sequencing and curated reannotation. We showed that the fragmentation in Bacteroidia draft genomes assembled from massively parallel sequencing linearly correlates with genomic repeats of the same or greater size than the reads. We also demonstrated that some of these repeats, especially the long ones, correspond to misassembled loci in three reference Porphyromonas gingivalis genomes marked as circularized (thus complete or finished). We prove that even at modest coverage (30X), long-read resequencing together with PCR contiguity verification (rrn operons and an integrative and conjugative element or ICE) can be used to identify and correct the wrongly combined or assembled regions. Finally, although time-consuming and labor-intensive, consistent manual biocuration of three P. gingivalis strains allowed us to compare and correct the existing genomic annotations, resulting in a more accurate interpretation of the genomic differences among these strains. In this study, we demonstrate the usefulness and importance of long-read sequencing in verifying published genomes (even when complete) and generating assemblies for new bacterial strains/species with high genomic plasticity. 
We also show that when combined with biological validation processes and diligent biocurated annotation, this strategy helps reduce the propagation of errors in shared databases, thus limiting false conclusions based on incomplete or misleading information.
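
The link the abstract draws between assembly breaks and repeats at least as long as the reads can be illustrated with a small sketch. This is not the authors' pipeline. It is a minimal example that assumes a toy sequence, a hypothetical read length of 7 bases, and a hypothetical helper named `repeated_kmers`. It counts every substring of read length and reports those that occur more than once, since any exact repeat at least as long as a read must contain such a substring. Only the forward strand is checked.

```python
from collections import Counter


def repeated_kmers(genome, k):
    """Return the set of k-mers occurring more than once in `genome`.

    Hypothetical helper for illustration; forward strand only.
    """
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return {kmer for kmer, n in counts.items() if n > 1}


# Toy sequence (an assumption, not real data): "GATTACA" is planted twice.
toy_genome = "ACGTTGCA" + "GATTACA" + "CCGGTTAA" + "GATTACA" + "TTGGCCAA"
read_length = 7  # hypothetical read length

repeats = repeated_kmers(toy_genome, read_length)
print(sorted(repeats))
```

A read that lies entirely inside such a repeat cannot be placed uniquely, which is why an assembler tends to break or collapse the contig at these loci.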


July 19, 2019

The complete and fully assembled genome sequence of Aeromonas salmonicida subsp. pectinolytica and its comparative analysis with other Aeromonas species: investigation of the mobilome in environmental and pathogenic strains.

Due to the predominant usage of short-read sequencing to date, most bacterial genome sequences reported in recent years remain at the draft level. This precludes certain types of analyses, such as the in-depth analysis of genome plasticity. Here we report the finalized genome sequence of the environmental strain Aeromonas salmonicida subsp. pectinolytica 34mel, for which only a draft genome with 253 contigs is currently available. Successful completion of the transposon-rich genome critically depended on the PacBio long-read sequencing technology. Using finalized genome sequences of A. salmonicida subsp. pectinolytica and other Aeromonads, we report the detailed analysis of the transposon composition of these bacterial species. Mobilome evolution is exemplified by a complex transposon, which has shifted from pathogenicity-related to environmental-related gene content in A. salmonicida subsp. pectinolytica 34mel. Obtaining the complete, circular genome of A. salmonicida subsp. pectinolytica allowed us to perform an in-depth analysis of its mobilome. We demonstrate the mobilome-dependent evolution of this strain’s genetic profile from pathogenic to environmental.


July 19, 2019

Extreme sensitivity to ultraviolet light in the fungal pathogen causing white-nose syndrome of bats.

Bat white-nose syndrome (WNS), caused by the fungal pathogen Pseudogymnoascus destructans, has decimated North American hibernating bats since its emergence in 2006. Here, we utilize comparative genomics to examine the evolutionary history of this pathogen in comparison to six closely related nonpathogenic species. P. destructans displays a large reduction in carbohydrate-utilizing enzymes (CAZymes) and in the predicted secretome (~50%), and an increase in lineage-specific genes. The pathogen has lost a key enzyme, UVE1, in the alternate excision repair (AER) pathway, which is known to contribute to repair of DNA lesions induced by ultraviolet (UV) light. Consistent with a nonfunctional AER pathway, P. destructans is extremely sensitive to UV light, as well as the DNA alkylating agent methyl methanesulfonate (MMS). The differential susceptibility of P. destructans to UV light in comparison to other hibernacula-inhabiting fungi represents a potential “Achilles’ heel” of P. destructans that might be exploited for treatment of bats with WNS.


July 19, 2019

Long-read sequence assembly of the firefly Pyrocoelia pectoralis genome.

Fireflies are a family of insects within the beetle order Coleoptera, or winged beetles, and they are among the most well-known and loved insects because of their bioluminescence. However, the firefly is in danger of extinction because of the massive destruction of its living environment. In order to improve the understanding of fireflies and protect them effectively, we sequenced the whole genome of the terrestrial firefly Pyrocoelia pectoralis. Here, we developed a highly reliable genome resource for the terrestrial firefly Pyrocoelia pectoralis (E. Oliv., 1883; Coleoptera: Lampyridae) using single molecule real time (SMRT) sequencing on the PacBio Sequel platform. In total, 57.8 Gb of long reads were generated and assembled into a 760.4-Mb genome, which is close to the estimated genome size and covered 98.7% complete and 0.7% partial insect Benchmarking Universal Single-Copy Orthologs. The k-mer analysis showed that this genome is highly heterozygous. However, our long-read assembly is highly contiguous, with a contig N50 length of 3.04 Mb and a longest contig of 13.69 Mb. Furthermore, 135 589 SSRs and 341 Mb of repeat sequences were detected. A total of 23 092 genes were predicted; 88.44% of genes were annotated with one or more related functions. We assembled a high-quality firefly genome, which will not only provide insights into the conservation and biodiversity of fireflies, but also provide a wealth of information to study the mechanisms of their sexual communication, bioluminescence, and evolution. © The Authors 2017. Published by Oxford University Press.
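
The contig N50 quoted above is a standard contiguity metric: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembly length. The sketch below shows one common way to compute it. The function name `n50` and the toy contig lengths are assumptions for illustration, not values or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted longest first; N50 is the length of the contig at
    which the cumulative sum first reaches half of the total length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), not data from the paper.
toy_contigs = [50, 40, 30, 20, 10]
print(n50(toy_contigs))  # total is 150; 50 + 40 = 90 >= 75, so this prints 40
```

A higher N50 means that more of the assembly sits in long contigs, which is why the metric is often used to compare long-read and short-read assemblies of the same genome.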


July 19, 2019

Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity.

Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology. Here we utilized a robust, cost-effective approach to produce high-quality reference genomes. We report a near-complete genome of diploid woodland strawberry (Fragaria vesca) using single-molecule real-time sequencing from Pacific Biosciences (PacBio). This assembly has a contig N50 length of ~7.9 million base pairs (Mb), representing a ~300-fold improvement of the previous version. The vast majority (>99.8%) of the assembly was anchored to 7 pseudomolecules using 2 sets of optical maps from Bionano Genomics. We obtained ~24.96 Mb of sequence not present in the previous version of the F. vesca genome and produced an improved annotation that includes 1496 new genes. Comparative syntenic analyses uncovered numerous, large-scale scaffolding errors present in each chromosome in the previously published version of the F. vesca genome. Our results highlight the need to improve existing short-read based reference genomes. Furthermore, we demonstrate how genome quality impacts commonly used analyses for addressing both fundamental and applied biological questions. © The Authors 2017. Published by Oxford University Press.


July 19, 2019

Expanding an expanded genome: long-read sequencing of Trypanosoma cruzi.

Although the genome of Trypanosoma cruzi, the causative agent of Chagas disease, was first made available in 2005, with additional strains reported later, the intrinsic genome complexity of this parasite (the abundance of repetitive sequences and genes organized in tandem) has traditionally hindered high-quality genome assembly and annotation. This also limits diverse types of analyses that require high degrees of precision. Long reads generated by third-generation sequencing technologies are particularly suitable to address the challenges associated with T. cruzi’s genome since they permit direct determination of the full sequence of large clusters of repetitive sequences without collapsing them. This, in turn, not only allows accurate estimation of gene copy numbers but also circumvents assembly fragmentation. Here, we present the analysis of the genome sequences of two T. cruzi clones: the hybrid TCC (TcVI) and the non-hybrid Dm28c (TcI), determined by PacBio Single Molecule Real-Time (SMRT) technology. The improved assemblies obtained here permitted us to accurately estimate gene copy numbers and the abundance and distribution of repetitive sequences (including satellites and retroelements). We found that the genome of T. cruzi is composed of a ‘core compartment’ and a ‘disruptive compartment’ which exhibit opposite GC content and gene composition. Novel tandem and dispersed repetitive sequences were identified, including some located inside coding sequences. Additionally, homologous chromosomes were separately assembled, allowing us to retrieve haplotypes as separate contigs instead of a unique mosaic sequence. Finally, manual annotation of surface multigene families, mucins and trans-sialidases now allows a better overview of these complex groups of genes.


July 19, 2019

Accurate detection of complex structural variations using single-molecule sequencing.

Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings.


July 19, 2019

Advances in Sequencing and Resequencing in Crop Plants.

DNA sequencing technologies have changed the face of biological research over the last 20 years. From reference genomes to population level resequencing studies, these technologies have made significant contributions to our understanding of plant biology and evolution. As the technologies have increased in power, the breadth and complexity of the questions that can be asked have increased. Along with this, the challenges of managing unprecedented quantities of sequence data are mounting. This chapter describes a few aspects of the journey so far and looks forward to what may lie ahead.


July 19, 2019

A near-complete haplotype-phased genome of the dikaryotic wheat stripe rust fungus Puccinia striiformis f. sp. tritici reveals high interhaplotype diversity.

A long-standing biological question is how evolution has shaped the genomic architecture of dikaryotic fungi. To answer this, high-quality genomic resources that enable haplotype comparisons are essential. Short-read genome assemblies for dikaryotic fungi are highly fragmented and lack haplotype-specific information due to the high heterozygosity and repeat content of these genomes. Here, we present a diploid-aware assembly of the wheat stripe rust fungus Puccinia striiformis f. sp. tritici based on long reads using the FALCON-Unzip assembler. Transcriptome sequencing data sets were used to infer high-quality gene models and identify virulence genes involved in plant infection referred to as effectors. This represents the most complete Puccinia striiformis f. sp. tritici genome assembly to date (83 Mb, 156 contigs, N50 of 1.5 Mb) and provides phased haplotype information for over 92% of the genome. Comparisons of the phase blocks revealed high interhaplotype diversity of over 6%. More than 25% of all genes lack a clear allelic counterpart. When we investigated genome features that potentially promote the rapid evolution of virulence, we found that candidate effector genes are spatially associated with conserved genes commonly found in basidiomycetes. Yet, candidate effectors that lack an allelic counterpart are more distant from conserved genes than allelic candidate effectors and are less likely to be evolutionarily conserved within the P. striiformis species complex and Pucciniales. In summary, this haplotype-phased assembly enabled us to discover novel genome features of a dikaryotic plant-pathogenic fungus previously hidden in collapsed and fragmented genome assemblies. IMPORTANCE: Current representations of eukaryotic microbial genomes are haploid, hiding the genomic diversity intrinsic to diploid and polyploid life forms. This hidden diversity contributes to the organism’s evolutionary potential and ability to adapt to stress conditions. 
Yet, it is challenging to provide haplotype-specific information at a whole-genome level. Here, we take advantage of long-read DNA sequencing technology and a tailored-assembly algorithm to disentangle the two haploid genomes of a dikaryotic pathogenic wheat rust fungus. The two genomes display high levels of nucleotide and structural variations, which lead to allelic variation and the presence of genes lacking allelic counterparts. Nonallelic candidate effector genes, which likely encode important pathogenicity factors, display distinct genome localization patterns and are less likely to be evolutionarily conserved than those which are present as allelic pairs. This genomic diversity may promote rapid host adaptation and/or be related to the age of the sequenced isolate since last meiosis. Copyright © 2018 Schwessinger et al.


July 19, 2019

RNAi is a critical determinant of centromere evolution in closely related fungi.

The centromere DNA locus on a eukaryotic chromosome facilitates faithful chromosome segregation. Despite performing such a conserved function, centromere DNA sequence as well as the organization of sequence elements is rapidly evolving in all forms of eukaryotes. The driving force that facilitates centromere evolution remains an enigma. Here, we studied the evolution of centromeres in closely related species in the fungal phylum of Basidiomycota. Using ChIP-seq analysis of conserved inner kinetochore proteins, we identified centromeres in three closely related Cryptococcus species: two of which are RNAi-proficient, while the other lost functional RNAi. We find that the centromeres in the RNAi-deficient species are significantly shorter than those of the two RNAi-proficient species. While centromeres are LTR retrotransposon-rich in all cases, the RNAi-deficient species lost all full-length retroelements from its centromeres. In addition, centromeres in RNAi-proficient species are associated with a significantly higher level of cytosine DNA modifications compared with those of RNAi-deficient species. Furthermore, when an RNAi-proficient Cryptococcus species and its RNAi-deficient mutants were passaged under similar conditions, the centromere length was found to be occasionally shortened in RNAi mutants. In silico analysis showed that predicted centromeres in a group of closely related Ustilago species, also belonging to the Basidiomycota, have undergone a similar transition in centromere length in an RNAi-dependent fashion. Based on the correlation found in two independent basidiomycetous species complexes, we present evidence suggesting that the loss of RNAi and cytosine DNA methylation triggered transposon attrition, which resulted in shortening of centromere length during evolution. Copyright © 2018 the Author(s). Published by PNAS.


July 19, 2019

Utility of DNA, RNA, protein, and functional approaches to solve cryptic immunodeficiencies.

We report a female infant identified by newborn screening for severe combined immunodeficiencies (NBS SCID) with T cell lymphopenia (TCL). The patient had persistently elevated alpha-fetoprotein (AFP) with IgA deficiency, and elevated IgM. Gene sequencing for a SCID panel was uninformative. We sought to determine the cause of the immunodeficiency in this infant. We performed whole-exome sequencing (WES) on the patient and parents to identify a genetic diagnosis. Based on the WES result, we developed a novel flow cytometric panel for rapid assessment of DNA repair defects using blood samples. We also performed whole transcriptome sequencing (WTS) on fibroblast RNA from the patient and father for abnormal transcript analysis. WES revealed a pathogenic paternally inherited indel in ATM. We used the flow panel to assess several proteins in the DNA repair pathway in lymphocyte subsets. The patient had absent phosphorylation of ATM, resulting in absent or aberrant phosphorylation of downstream proteins, including γH2AX. However, ataxia-telangiectasia (AT) is an autosomal recessive condition, and the abnormal functional data did not correspond with a single ATM variant. WTS revealed in-frame reciprocal fusion transcripts involving ATM and SLC35F2 indicating a chromosome 11 inversion within 11q22.3, of maternal origin. Inversion breakpoints were identified within ATM intron 16 and SLC35F2 intron 7. We identified a novel ATM-breaking chromosome 11 inversion in trans with a pathogenic indel (compound heterozygote) resulting in non-functional ATM protein, consistent with a diagnosis of AT. Utilization of several molecular and functional assays allowed successful resolution of this case.


July 19, 2019

Genome sequence of the progenitor of wheat A subgenome Triticum urartu.

Triticum urartu (diploid, AA) is the progenitor of the A subgenome of tetraploid (Triticum turgidum, AABB) and hexaploid (Triticum aestivum, AABBDD) wheat [1,2]. Genomic studies of T. urartu have been useful for investigating the structure, function and evolution of polyploid wheat genomes. Here we report the generation of a high-quality genome sequence of T. urartu by combining bacterial artificial chromosome (BAC)-by-BAC sequencing, single molecule real-time whole-genome shotgun sequencing [3], linked reads and optical mapping [4,5]. We assembled seven chromosome-scale pseudomolecules and identified protein-coding genes, and we suggest a model for the evolution of T. urartu chromosomes. Comparative analyses with genomes of other grasses showed gene loss and amplification in the numbers of transposable elements in the T. urartu genome. Population genomics analysis of 147 T. urartu accessions from across the Fertile Crescent showed clustering of three groups, with differences in altitude and biostress, such as powdery mildew disease. The T. urartu genome assembly provides a valuable resource for studying genetic variation in wheat and related grasses, and promises to facilitate the discovery of genes that could be useful for wheat improvement.


July 19, 2019

Discordant inheritance of chromosomal and extrachromosomal DNA elements contributes to dynamic disease evolution in glioblastoma.

To understand how genomic heterogeneity of glioblastoma (GBM) contributes to poor therapy response, we performed DNA and RNA sequencing on GBM samples and the neurospheres and orthotopic xenograft models derived from them. We used the resulting dataset to show that somatic driver alterations, including single-nucleotide variants, focal DNA alterations and oncogene amplification on extrachromosomal DNA (ecDNA) elements, were for the most part propagated from tumor to model systems. In several instances, ecDNAs and chromosomal alterations demonstrated divergent inheritance patterns and clonal selection dynamics during cell culture and xenografting. We infer that ecDNA was unevenly inherited by offspring cells, a characteristic that affects the oncogenic potential of cells with more or fewer ecDNAs. Longitudinal patient tumor profiling found that oncogenic ecDNAs are frequently retained throughout the course of disease. Our analysis shows that extrachromosomal elements allow rapid increase of genomic heterogeneity during GBM evolution, independently of chromosomal DNA alterations.


July 19, 2019

Resequencing of 243 diploid cotton accessions based on an updated A genome identifies the genetic basis of key agronomic traits.

The ancestors of Gossypium arboreum and Gossypium herbaceum provided the A subgenome for the modern cultivated allotetraploid cotton. Here, we upgraded the G. arboreum genome assembly by integrating different technologies. We resequenced 243 G. arboreum and G. herbaceum accessions to generate a map of genome variations and found that they are equally diverged from Gossypium raimondii. Independent analysis suggested that Chinese G. arboreum originated in South China and was subsequently introduced to the Yangtze and Yellow River regions. Most accessions with domestication-related traits experienced geographic isolation. Genome-wide association study (GWAS) identified 98 significant peak associations for 11 agronomically important traits in G. arboreum. A nonsynonymous substitution (cysteine-to-arginine substitution) of GaKASIII seems to confer substantial fatty acid composition (C16:0 and C16:1) changes in cotton seeds. Resistance to fusarium wilt disease is associated with activation of GaGSTF9 expression. Our work represents a major step toward understanding the evolution of the A genome of cotton.


July 19, 2019

Male-killing toxin in a bacterial symbiont of Drosophila.

Several lineages of symbiotic bacteria in insects selfishly manipulate host reproduction to spread in a population [1], often by distorting host sex ratios. Spiroplasma poulsonii [2,3] is a helical and motile, Gram-positive symbiotic bacterium that resides in a wide range of Drosophila species [4]. A notable feature of S. poulsonii is male killing, whereby the sons of infected female hosts are selectively killed during development [1,2]. Although male killing caused by S. poulsonii has been studied since the 1950s, its underlying mechanism is unknown. Here we identify an S. poulsonii protein, designated Spaid, whose expression induces male killing. Overexpression of Spaid in D. melanogaster kills males but not females, and induces massive apoptosis and neural defects, recapitulating the pathology observed in S. poulsonii-infected male embryos [5-11]. Our data suggest that Spaid targets the dosage compensation machinery on the male X chromosome to mediate its effects. Spaid contains ankyrin repeats and a deubiquitinase domain, which are required for its subcellular localization and activity. Moreover, we found a laboratory mutant strain of S. poulsonii with reduced male-killing ability and a large deletion in the spaid locus. Our study has uncovered a bacterial protein that affects host cellular machinery in a sex-specific way, which is likely to be the long-searched-for factor responsible for S. poulsonii-induced male killing.

