July 19, 2019  |  

AnnoTALE: bioinformatics tools for identification, annotation, and nomenclature of TALEs from Xanthomonas genomic sequences.

Transcription activator-like effectors (TALEs) are virulence factors, produced by the bacterial plant-pathogen Xanthomonas, that function as gene activators inside plant cells. Although the contribution of individual TALEs to infectivity has been shown, the specific roles of most TALEs and the overall TALE diversity in Xanthomonas spp. are not known. TALEs possess a highly repetitive DNA-binding domain, which is notoriously difficult to sequence. Here, we describe an improved method for characterizing TALE genes by the use of PacBio sequencing. We present ‘AnnoTALE’, a suite of applications for the analysis and annotation of TALE genes from Xanthomonas genomes, and for grouping similar TALEs into classes. Based on these classes, we propose a unified nomenclature for Xanthomonas TALEs that reveals similarities pointing to related functionalities. This new classification enables us to compare related TALEs and to identify base substitutions responsible for the evolution of TALE specificities.


July 19, 2019  |  

Genome analysis of the fruiting body forming myxobacterium Chondromyces crocatus reveals high potential for natural product biosynthesis.

Here we report the first complete genome sequence of the type strain of the myxobacterial genus Chondromyces – Chondromyces crocatus Cm c5. It presents one of the largest prokaryotic genomes featuring a single circular chromosome and no plasmids. Analysis revealed an enlarged set of tRNA genes, along with reduced pressure on preferred codon usage compared to other bacterial genomes. The large coding capacity and the plethora of encoded secondary metabolite biosynthetic gene clusters are in line with the capability of Cm c5 to produce an arsenal of anti-bacterial, anti-fungal and cytotoxic compounds. Known pathways of the ajudazol, chondramide, chondrochloren, crocacin, crocapeptin and thuggacin compound families are complemented by many more natural compound biosynthetic gene clusters in the chromosome. Whole-genome comparison of the fruiting-body forming type-strain (Cm c5 = DSM 14714) to an accustomed laboratory strain which has lost this ability (Cm c5 fr-) revealed genetic changes in three loci. In addition to the low synteny found with the closest sequenced representative of the same family, Sorangium cellulosum, extensive genetic information duplication and broad application of eukaryotic-type signal transduction systems are hallmarks of this 11.3 Mbp prokaryotic genome. Copyright © 2016, American Society for Microbiology. All Rights Reserved.


July 19, 2019  |  

Radical remodeling of the Y chromosome in a recent radiation of malaria mosquitoes.

Y chromosomes control essential male functions in many species, including sex determination and fertility. However, because of obstacles posed by repeat-rich heterochromatin, knowledge of Y chromosome sequences is limited to a handful of model organisms, constraining our understanding of Y biology across the tree of life. Here, we leverage long single-molecule sequencing to determine the content and structure of the nonrecombining Y chromosome of the primary African malaria mosquito, Anopheles gambiae. We find that the An. gambiae Y consists almost entirely of a few massively amplified, tandemly arrayed repeats, some of which can recombine with similar repeats on the X chromosome. Sex-specific genome resequencing in a recent species radiation, the An. gambiae complex, revealed rapid sequence turnover within An. gambiae and among species. Exploiting 52 sex-specific An. gambiae RNA-Seq datasets representing all developmental stages, we identified a small repertoire of Y-linked genes that lack X gametologs and are not Y-linked in any other species except An. gambiae, with the notable exception of YG2, a candidate male-determining gene. YG2 is the only gene conserved and exclusive to the Y in all species examined, yet sequence similarity to YG2 is not detectable in the genome of a more distant mosquito relative, suggesting rapid evolution of Y chromosome genes in this highly dynamic genus of malaria vectors. The extensive characterization of the An. gambiae Y provides a long-awaited foundation for studying male mosquito biology, and will inform novel mosquito control strategies based on the manipulation of Y chromosomes.


July 19, 2019  |  

Nested Russian doll-like genetic mobility drives rapid dissemination of the Carbapenem resistance gene blaKPC

The recent widespread emergence of carbapenem resistance in Enterobacteriaceae is a major public health concern, as carbapenems are a therapy of last resort against this family of common bacterial pathogens. Resistance genes can mobilize via various mechanisms, including conjugation and transposition; however, the importance of this mobility in short-term evolution, such as within nosocomial outbreaks, is unknown. Using a combination of short- and long-read whole-genome sequencing of 281 blaKPC-positive Enterobacteriaceae isolates from a single hospital over 5 years, we demonstrate rapid dissemination of this carbapenem resistance gene to multiple species, strains, and plasmids. Mobility of blaKPC occurs at multiple nested genetic levels, with transmission of blaKPC strains between individuals, frequent transfer of blaKPC plasmids between strains/species, and frequent transposition of blaKPC transposon Tn4401 between plasmids. We also identify a common insertion site for Tn4401 within various Tn2-like elements, suggesting that homologous recombination between Tn2-like elements has enhanced the spread of Tn4401 between different plasmid vectors. Furthermore, while short-read sequencing has known limitations for plasmid assembly, various studies have attempted to overcome this by the use of reference-based methods. We also demonstrate that, as a consequence of the genetic mobility observed in this study, plasmid structures can be extremely dynamic, and therefore these reference-based methods, as well as traditional partial typing methods, can produce very misleading conclusions. Overall, our findings demonstrate that nonclonal resistance gene dissemination can be extremely rapid, presenting significant challenges for public health surveillance and achieving effective control of antibiotic resistance. Copyright © 2016 Sheppard et al.


July 19, 2019  |  

Initial assessment of the molecular epidemiology of blaNDM-1 in Colombia.

We report complete genome sequences of four blaNDM-1-harboring Gram-negative multidrug-resistant (MDR) isolates from Colombia. The blaNDM-1 genes were located on 193-kb Inc FIA, 178-kb Inc A/C2, and 47-kb (unknown Inc type) plasmids. MLST revealed that the isolates belong to ST10 (Escherichia coli), ST392 (Klebsiella pneumoniae), and ST322 and ST464 (Acinetobacter baumannii and A. nosocomialis, respectively). Our analysis identified that the Inc A/C2 plasmid in E. coli contained a novel complex transposon (Tn125 and Tn5393 with 3 copies of blaNDM-1) and a recombination “hotspot” for the acquisition of new resistance determinants. Copyright © 2016, American Society for Microbiology. All Rights Reserved.


July 19, 2019  |  

AgIn: Measuring the landscape of CpG methylation of individual repetitive elements.

Determining the methylation state of regions with high copy numbers is challenging for second-generation sequencing, because the read length is insufficient to map reads uniquely, especially when repetitive regions are long and nearly identical to each other. Single-molecule real-time (SMRT) sequencing is a promising method for observing such regions, because it is not vulnerable to GC bias, it produces long read lengths, and its kinetic information is sensitive to DNA modifications. We propose a novel linear-time algorithm that combines the kinetic information for neighboring CpG sites and increases the confidence in identifying the methylation states of those sites. Using a practical read coverage of ~30-fold from an inbred strain of medaka (Oryzias latipes), we observed that both the sensitivity and precision of our method on individual CpG sites were ~93.7%. We also observed a high correlation coefficient (R = 0.884) between our method and bisulfite sequencing, and for 92.0% of CpG sites, methylation levels ranging over [0, 1] were in concordance within an acceptable difference of 0.25. Using this method, we characterized the landscape of the methylation status of repetitive elements, such as LINEs, in the human genome, thereby revealing the strong correlation between CpG density and hypomethylation and detecting hypomethylation hot spots of LTRs and LINEs. We uncovered the methylation states for nearly identical active transposons, two novel LINE insertions of identity ~99% and length 6050 base pairs (bp) in the human genome, and 16 Tol2 elements of identity >99.8% and length 4682 bp in the medaka genome. AgIn (Aggregate on Intervals) is available at: https://github.com/hacone/AgIn CONTACT: ysuzuki@cb.k.u-tokyo.ac.jp, moris@cb.k.u-tokyo.ac.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. © The Author(s) 2016. Published by Oxford University Press.
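
As a rough illustration of the two agreement measures quoted in the AgIn abstract above (a correlation coefficient, and the share of CpG sites whose methylation levels agree within 0.25), the following Python sketch computes both from two lists of per-site methylation levels. It is not taken from AgIn: the function names and sample values are hypothetical, and Pearson's coefficient is an assumption, since the abstract does not state which correlation was used.

```python
from math import sqrt


def concordance_fraction(levels_a, levels_b, tol=0.25):
    """Fraction of sites whose two methylation levels differ by at most tol.

    Both inputs are equal-length sequences of values in [0, 1], one per CpG site.
    The default tol of 0.25 mirrors the threshold quoted in the abstract.
    """
    if len(levels_a) != len(levels_b) or not levels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, b in zip(levels_a, levels_b) if abs(a - b) <= tol)
    return hits / len(levels_a)


def pearson_r(levels_a, levels_b):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(levels_a) != len(levels_b) or len(levels_a) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(levels_a)
    mean_a = sum(levels_a) / n
    mean_b = sum(levels_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(levels_a, levels_b))
    var_a = sum((a - mean_a) ** 2 for a in levels_a)
    var_b = sum((b - mean_b) ** 2 for b in levels_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_a * var_b)


if __name__ == "__main__":
    # Hypothetical per-site methylation levels from two methods.
    method_one = [0.0, 0.5, 1.0, 0.9]
    method_two = [0.1, 0.9, 0.8, 0.6]
    print(concordance_fraction(method_one, method_two))
    print(pearson_r(method_one, method_two))
```

Both functions take plain Python sequences, so real per-site calls would first need to be matched by genomic position before being passed in.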


July 19, 2019  |  

Analysis of tandem gene copies in maize chromosomal regions reconstructed from long sequence reads.

Haplotype variation not only involves SNPs but also insertions and deletions, in particular gene copy number variations. However, comparisons of individual genomes have been difficult because traditional sequencing methods give too short reads to unambiguously reconstruct chromosomal regions containing repetitive DNA sequences. An example of such a case is the protein gene family in maize that acts as a sink for reduced nitrogen in the seed. Previously, 41-48 gene copies of the alpha zein gene family that spread over six loci spanning between 30- and 500-kb chromosomal regions have been described in two Iowa Stiff Stalk (SS) inbreds. Analyses of those regions were possible because of overlapping BAC clones, generated by an expensive and labor-intensive approach. Here we used single-molecule real-time (Pacific Biosciences) shotgun sequencing to assemble the six chromosomal regions from the Non-Stiff Stalk maize inbred W22 from a single DNA sequence dataset. To validate the reconstructed regions, we developed an optical map (BioNano genome map; BioNano Genomics) of W22 and found agreement between the two datasets. Using the sequences of full-length cDNAs from W22, we found that the error rate of PacBio sequencing seemed to be less than 0.1% after autocorrection and assembly. Expressed genes, some with premature stop codons, are interspersed with nonexpressed genes, giving rise to genotype-specific expression differences. Alignment of these regions with those from the previous analyzed regions of SS lines exhibits in part dramatic differences between these two heterotic groups.


July 19, 2019  |  

Extensive sequence divergence between the reference genomes of two elite indica rice varieties Zhenshan 97 and Minghui 63.

Asian cultivated rice consists of two subspecies: Oryza sativa subsp. indica and O. sativa subsp. japonica. Despite the fact that indica rice accounts for over 70% of total rice production worldwide and is genetically much more diverse, a high-quality reference genome for indica rice has yet to be published. We conducted map-based sequencing of two indica rice lines, Zhenshan 97 (ZS97) and Minghui 63 (MH63), which represent the two major varietal groups of the indica subspecies and are the parents of an elite Chinese hybrid. The genome sequences were assembled into 237 (ZS97) and 181 (MH63) contigs, with an accuracy >99.99%, and covered 90.6% and 93.2% of their estimated genome sizes. Comparative analyses of these two indica genomes uncovered surprising structural differences, especially with respect to inversions, translocations, presence/absence variations, and segmental duplications. Approximately 42% of nontransposable element related genes were identical between the two genomes. Transcriptome analysis of three tissues showed that 1,059-2,217 more genes were expressed in the hybrid than in the parents and that the expressed genes in the hybrid were much more diverse due to their divergence between the parental genomes. The public availability of two high-quality reference genomes for the indica subspecies of rice will have large-ranging implications for plant biology and crop genetic improvement.


July 19, 2019  |  

Living apart together: crosstalk between the core and supernumerary genomes in a fungal plant pathogen.

Eukaryotes display remarkable genome plasticity, which can include supernumerary chromosomes that differ markedly from the core chromosomes. Despite the widespread occurrence of supernumerary chromosomes in fungi, their origin, relation to the core genome and the reason for their divergent characteristics are still largely unknown. The complexity of genome assembly due to the presence of repetitive DNA partially accounts for this. Here we use single-molecule real-time (SMRT) sequencing to assemble the genome of a prominent fungal wheat pathogen, Fusarium poae, including at least one supernumerary chromosome. The core genome contains limited transposable elements (TEs) and no gene duplications, while the supernumerary genome holds up to 25% TEs and multiple gene duplications. The core genome shows all hallmarks of repeat-induced point mutation (RIP), a defense mechanism against TEs, specific for fungi. The absence of RIP on the supernumerary genome accounts for the differences between the two (sub)genomes, and results in a functional crosstalk between them. The supernumerary genome is a reservoir for TEs that migrate to the core genome, and even large blocks of supernumerary sequence (>200 kb) have recently translocated to the core. Vice versa, the supernumerary genome acts as a refuge for genes that are duplicated from the core genome. For the first time, a mechanism was determined that explains the differences that exist between the core and supernumerary genome in fungi. Different biology rather than origin was shown to be responsible. A “living apart together” crosstalk exists between the core and supernumerary genome, accelerating chromosomal and organismal evolution.


July 19, 2019  |  

The deep origin and recent loss of venom toxin genes in rattlesnakes.

The genetic origin of novel traits is a central but challenging puzzle in evolutionary biology. Among snakes, phospholipase A2 (PLA2)-related toxins have evolved in different lineages to function as potent neurotoxins, myotoxins, or hemotoxins. Here, we traced the genomic origin and evolution of PLA2 toxins by examining PLA2 gene number, organization, and expression in both neurotoxic and non-neurotoxic rattlesnakes. We found that even though most North American rattlesnakes do not produce neurotoxins, the genes of a specialized heterodimeric neurotoxin predate the origin of rattlesnakes and were present in their last common ancestor (~22 mya). The neurotoxin genes were then deleted independently in the lineages leading to the Western Diamondback (Crotalus atrox) and Eastern Diamondback (C. adamanteus) rattlesnakes (~6 mya), while a PLA2 myotoxin gene retained in C. atrox was deleted from the neurotoxic Mojave rattlesnake (C. scutulatus; ~4 mya). The rapid evolution of PLA2 gene number appears to be due to transposon invasion that provided a template for non-allelic homologous recombination. Copyright © 2016 Elsevier Ltd. All rights reserved.


July 19, 2019  |  

Rapid functional and sequence differentiation of a tandemly repeated species-specific multigene family in Drosophila.

Gene clusters of recently duplicated genes are hotbeds for evolutionary change. However, our understanding of how mutational mechanisms and evolutionary forces shape the structural and functional evolution of these clusters is hindered by the high sequence identity among the copies, which typically results in their inaccurate representation in genome assemblies. The presumed testis-specific, chimeric gene Sdic originated, and tandemly expanded in Drosophila melanogaster, contributing to increased male-male competition. Using various types of massively parallel sequencing data, we studied the organization, sequence evolution, and functional attributes of the different Sdic copies. By leveraging long-read sequencing data, we uncovered both copy number and order differences from the currently accepted annotation for the Sdic region. Despite evidence for pervasive gene conversion affecting the Sdic copies, we also detected signatures of two episodes of diversifying selection, which have contributed to the evolution of a variety of C-termini and miRNA binding site compositions. Expression analyses involving RNA-seq datasets from 59 different biological conditions revealed distinctive expression breadths among the copies, with three copies being transcribed in females, opening the possibility to a sexually antagonistic effect. Phenotypic assays using Sdic knock-out strains indicated that should this antagonistic effect exist, it does not compromise female fertility. Our results strongly suggest that the genome consolidation of the Sdic gene cluster is more the result of a quick exploration of different paths of molecular tinkering by different copies than a mere dosage increase, which could be a recurrent evolutionary outcome in the presence of persistent sexual selection. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.


July 19, 2019  |  

Mechanisms of evolution in high-consequence drug resistance plasmids.

The dissemination of resistance among bacteria has been facilitated by the fact that resistance genes are usually located on a diverse and evolving set of transmissible plasmids. However, the mechanisms generating diversity and enabling adaptation within highly successful resistance plasmids have remained obscure, despite their profound clinical significance. To understand these mechanisms, we have performed a detailed analysis of the mobilome (the entire mobile genetic element content) of a set of previously sequenced carbapenemase-producing Enterobacteriaceae (CPE) from the National Institutes of Health Clinical Center. This analysis revealed that plasmid reorganizations occurring in the natural context of colonization of human hosts were overwhelmingly driven by genetic rearrangements carried out by replicative transposons working in concert with the process of homologous recombination. A more complete understanding of the molecular mechanisms and evolutionary forces driving rearrangements in resistance plasmids may lead to fundamentally new strategies to address the problem of antibiotic resistance. The spread of antibiotic resistance among Gram-negative bacteria is a serious public health threat, as it can critically limit the types of drugs that can be used to treat infected patients. In particular, carbapenem-resistant members of the Enterobacteriaceae family are responsible for a significant and growing burden of morbidity and mortality. Here, we report on the mechanisms underlying the evolution of several plasmids carried by previously sequenced clinical Enterobacteriaceae isolates from the National Institutes of Health Clinical Center (NIH CC). Our ability to track genetic rearrangements that occurred within resistance plasmids was dependent on accurate annotation of the mobile genetic elements within the plasmids, which was greatly aided by access to long-read DNA sequencing data and knowledge of their mechanisms. Mobile genetic elements such as transposons and integrons have been strongly associated with the rapid spread of genes responsible for antibiotic resistance. Understanding the consequences of their actions allowed us to establish unambiguous evolutionary relationships between plasmids in the analysis set. Copyright © 2016 He et al.


July 19, 2019  |  

Single-molecule sequencing resolves the detailed structure of complex satellite DNA loci in Drosophila melanogaster.

Highly repetitive satellite DNA (satDNA) repeats are found in most eukaryotic genomes. SatDNAs are rapidly evolving and have roles in genome stability and chromosome segregation. Their repetitive nature poses a challenge for genome assembly and makes progress on the detailed study of satDNA structure difficult. Here, we use single-molecule sequencing long reads from Pacific Biosciences (PacBio) to determine the detailed structure of all major autosomal complex satDNA loci in Drosophila melanogaster, with a particular focus on the 260-bp and Responder satellites. We determine the optimal de novo assembly methods and parameter combinations required to produce a high-quality assembly of these previously unassembled satDNA loci and validate this assembly using molecular and computational approaches. We determined that the computationally intensive PBcR-BLASR assembly pipeline yielded better assemblies than the faster and more efficient pipelines based on the MHAP hashing algorithm, and it is essential to validate assemblies of repetitive loci. The assemblies reveal that satDNA repeats are organized into large arrays interrupted by transposable elements. The repeats in the center of the array tend to be homogenized in sequence, suggesting that gene conversion and unequal crossovers lead to repeat homogenization through concerted evolution, although the degree of unequal crossing over may differ among complex satellite loci. We find evidence for higher-order structure within satDNA arrays that suggest recent structural rearrangements. These assemblies provide a platform for the evolutionary and functional genomics of satDNAs in pericentric heterochromatin. © 2017 Khost et al.; Published by Cold Spring Harbor Laboratory Press.


July 19, 2019  |  

Long-read sequencing uncovers the adaptive topography of a carnivorous plant genome.

Utricularia gibba, the humped bladderwort, is a carnivorous plant that retains a tiny nuclear genome despite at least two rounds of whole genome duplication (WGD) since common ancestry with grapevine and other species. We used a third-generation genome assembly with several complete chromosomes to reconstruct the two most recent lineage-specific ancestral genomes that led to the modern U. gibba genome structure. Patterns of subgenome dominance in the most recent WGD, both architectural and transcriptional, are suggestive of allopolyploidization, which may have generated genomic novelty and led to instantaneous speciation. Syntenic duplicates retained in polyploid blocks are enriched for transcription factor functions, whereas gene copies derived from ongoing tandem duplication events are enriched in metabolic functions potentially important for a carnivorous plant. Among these are tandem arrays of cysteine protease genes with trap-specific expression that evolved within a protein family known to be useful in the digestion of animal prey. Further enriched functions among tandem duplicates (also with trap-enhanced expression) include peptide transport (intercellular movement of broken-down prey proteins), ATPase activities (bladder-trap acidification and transmembrane nutrient transport), hydrolase and chitinase activities (breakdown of prey polysaccharides), and cell-wall dynamic components possibly associated with active bladder movements. Whereas independently polyploid Arabidopsis syntenic gene duplicates are similarly enriched for transcriptional regulatory activities, Arabidopsis tandems are distinct from those of U. gibba, while still metabolic and likely reflecting unique adaptations of that species. Taken together, these findings highlight the special importance of tandem duplications in the adaptive landscapes of a carnivorous plant genome.


July 19, 2019  |  

Improved maize reference genome with single-molecule technologies.

Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.

