July 7, 2019

High-coverage sequencing and annotated assemblies of the budgerigar genome.

Parrots belong to a group of behaviorally advanced vertebrates and have an advanced ability of vocal learning relative to other vocal-learning birds. They can imitate human speech, synchronize their body movements to a rhythmic beat, and understand complex concepts of referential meaning to sounds. However, little is known about the genetics of these traits. Elucidating the genetic bases would require whole genome sequencing and a robust assembly of a parrot genome. We present a genomic resource for the budgerigar, an Australian Parakeet (Melopsittacus undulatus), the most widely studied parrot species in neuroscience and behavior. We present genomic sequence data that includes over 300× raw read coverage from multiple sequencing technologies and chromosome optical maps from a single male animal. The reads and optical maps were used to create three hybrid assemblies representing some of the largest genomic scaffolds to date for a bird; two of which were annotated based on similarities to reference sets of non-redundant human, zebra finch and chicken proteins, and budgerigar transcriptome sequence assemblies. The sequence reads for this project were in part generated and used for both the Assemblathon 2 competition and the first de novo assembly of a giga-scale vertebrate genome utilizing PacBio single-molecule sequencing. Across several quality metrics, these budgerigar assemblies are comparable to or better than the chicken and zebra finch genome assemblies built from traditional Sanger sequencing reads, and are sufficient to analyze regions that are difficult to sequence and assemble, including those not yet assembled in prior bird genomes, and promoter regions of genes differentially regulated in vocal learning brain regions. This work provides valuable data and material for genome technology development and for investigating the genomics of complex behavioral traits.


July 7, 2019

The functions of DNA methylation by CcrM in Caulobacter crescentus: a global approach.

DNA methylation is involved in a diversity of processes in bacteria, including maintenance of genome integrity and regulation of gene expression. Here, using Caulobacter crescentus as a model, we exploit genome-wide experimental methods to uncover the functions of CcrM, a DNA methyltransferase conserved in most Alphaproteobacteria. Using single molecule sequencing, we provide evidence that most CcrM target motifs (GANTC) switch from a fully methylated to a hemi-methylated state when they are replicated, and back to a fully methylated state at the onset of cell division. We show that DNA methylation by CcrM is not required for the control of the initiation of chromosome replication or for DNA mismatch repair. By contrast, our transcriptome analysis shows that >10% of the genes are misexpressed in cells lacking or constitutively over-expressing CcrM. Strikingly, GANTC methylation is needed for the efficient transcription of dozens of genes that are essential for cell cycle progression, in particular for DNA metabolism and cell division. Many of them are controlled by promoters methylated by CcrM and co-regulated by other global cell cycle regulators, demonstrating an extensive cross talk between DNA methylation and the complex regulatory network that controls the cell cycle of C. crescentus and, presumably, of many other Alphaproteobacteria.
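
As a small illustration of the GANTC target motif named in the abstract above, the following Python sketch lists candidate CcrM sites in a sequence. It is not part of the study's pipeline, and the function name `find_gantc_sites` is a hypothetical helper introduced here. Because GANTC reads the same on both strands (its reverse complement is again GANTC), scanning one strand is enough.

```python
import re

# GANTC, where N is any of the four bases. The motif cannot overlap itself,
# so a plain non-overlapping scan finds every site.
GANTC = re.compile(r"GA[ACGT]TC")

def find_gantc_sites(seq):
    """Return 0-based start positions of GANTC motifs in seq (hypothetical helper)."""
    return [m.start() for m in GANTC.finditer(seq.upper())]

if __name__ == "__main__":
    print(find_gantc_sites("ttGAATCccGACTCgagtc"))
```

Methylation state itself cannot be read from the sequence; in the study it was inferred from single molecule sequencing data.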


July 7, 2019

Stenotrophomonas comparative genomics reveals genes and functions that differentiate beneficial and pathogenic bacteria.

In recent years, the number of human infections caused by opportunistic pathogens has increased dramatically. Plant rhizospheres are one of the most typical natural reservoirs for these pathogens but they also represent a great source for beneficial microbes with potential for biotechnological applications. However, understanding the natural variation and possible differences between pathogens and beneficials is the main challenge in furthering these possibilities. The genus Stenotrophomonas contains representatives found to be associated with human and plant hosts. We used comparative genomics as well as transcriptomic and physiological approaches to detect significant borders between the Stenotrophomonas strains: the multi-drug resistant pathogenic S. maltophilia and the plant-associated strains S. maltophilia R551-3 and S. rhizophila DSM14405T (both are biocontrol agents). We found an overall high degree of sequence similarity between the genomes of all three strains. Despite the notable similarity in potential factors responsible for host invasion and antibiotic resistance, other factors including several crucial virulence factors and heat shock proteins were absent in the plant-associated DSM14405T. Instead, S. rhizophila DSM14405T possessed unique genes for the synthesis and transport of the plant-protective spermidine, plant cell-wall degrading enzymes, and high salinity tolerance. Moreover, the presence or absence of bacterial growth at 37°C was identified as a very simple method for differentiating between pathogenic and non-pathogenic isolates. DSM14405T is not able to grow at this human-relevant temperature, most likely in great part due to the absence of heat shock genes and perhaps also because of the up-regulation at increased temperatures of several genes involved in a suicide mechanism. While this study is important for understanding the mechanisms behind the emerging pattern of infectious diseases, it is, to our knowledge, the first of its kind to assess the risk of beneficial strains for biotechnological applications. We identified certain traits typical of pathogens such as growth at the human body temperature together with the production of heat shock proteins as opposed to a temperature-regulated suicide system that is harnessed by beneficials.


July 7, 2019

FASTQSim: platform-independent data characterization and in silico read generation for NGS datasets.

High-throughput next generation sequencing technologies have enabled rapid characterization of clinical and environmental samples. Consequently, the largest bottleneck to actionable data has become sample processing and bioinformatics analysis, creating a need for accurate and rapid algorithms to process genetic data. Perfectly characterized in silico datasets are a useful tool for evaluating the performance of such algorithms. Background contaminating organisms are observed in sequenced mixtures of organisms. In silico samples provide exact truth. To create the best value for evaluating algorithms, in silico data should mimic actual sequencer data as closely as possible. FASTQSim is a tool that provides the dual functionality of NGS dataset characterization and metagenomic data generation. FASTQSim is sequencing platform-independent, and computes distributions of read length, quality scores, indel rates, single point mutation rates, indel size, and similar statistics for any sequencing platform. To create training or testing datasets, FASTQSim has the ability to convert target sequences into in silico reads with specific error profiles obtained in the characterization step. FASTQSim enables users to assess the quality of NGS datasets. The tool provides information about read length, read quality, repetitive and non-repetitive indel profiles, and single base pair substitutions. FASTQSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software. In this regard, in silico datasets generated with the FASTQSim tool hold several advantages over natural datasets: they are sequencing platform independent, extremely well characterized, and less expensive to generate. Such datasets are valuable in a number of applications, including the training of assemblers for multiple platforms, benchmarking bioinformatics algorithm performance, and creating challenge datasets for detecting genetic engineering toolmarks, etc.
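
The idea of turning a target sequence into in silico reads with a known error profile can be sketched in a few lines of Python. This is a toy illustration only, not FASTQSim's implementation or interface: the function `simulate_reads`, its parameters, the substitution-only error model, and the constant quality string are all assumptions made for the example.

```python
import random

def simulate_reads(target, n_reads, read_len, sub_rate, seed=0):
    """Sample fixed-length reads from target and add random substitutions.

    Hypothetical helper: a real simulator would also model indels and draw
    read lengths and qualities from measured distributions.
    """
    rng = random.Random(seed)  # seeded so the "truth" is reproducible
    records = []
    for i in range(n_reads):
        start = rng.randrange(len(target) - read_len + 1)
        bases = list(target[start:start + read_len])
        for j, base in enumerate(bases):
            if rng.random() < sub_rate:
                # Replace with one of the other three bases.
                bases[j] = rng.choice([b for b in "ACGT" if b != base])
        seq = "".join(bases)
        qual = "I" * read_len  # placeholder: Phred 40 in Phred+33 encoding
        records.append(f"@sim_{i}\n{seq}\n+\n{qual}")
    return records

if __name__ == "__main__":
    for record in simulate_reads("ACGTACGTTAGCCGATAGCT", 3, 8, 0.05):
        print(record)
```

Because the origin and every introduced error are known, such reads give an exact reference against which an analysis tool can be scored.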


July 7, 2019

vanG element insertions within a conserved chromosomal site conferring vancomycin resistance to Streptococcus agalactiae and Streptococcus anginosus.

Three vancomycin-resistant streptococcal strains carrying vanG elements (two invasive Streptococcus agalactiae isolates [GBS-NY and GBS-NM, both serotype II and multilocus sequence type 22] and one Streptococcus anginosus [Sa]) were examined. The 45,585-bp elements found within Sa and GBS-NY were nearly identical (together designated vanG-1) and shared near-identity over an ~15-kb overlap with a previously described vanG element from Enterococcus faecalis. Unexpectedly, vanG-1 shared much less homology with the 49,321-bp vanG-2 element from GBS-NM, with widely different levels (50% to 99%) of sequence identity shared among 44 related open reading frames. Immediately adjacent to both vanG-1 and vanG-2 were 44,670-bp and 44,680-bp integrative conjugative element (ICE)-like sequences, designated ICE-r, that were nearly identical in the two group B streptococcal (GBS) strains. The dual vanG and ICE-r elements from both GBS strains were inserted at the same position, between bases 1328 and 1329, within the identical RNA methyltransferase (rumA) genes. A GenBank search revealed that although most GBS strains contained insertions within this specific site, only sequence type 22 (ST22) GBS strains contained highly related ICE-r derivatives. The vanG-1 element in Sa was also inserted within this position corresponding to its rumA homolog adjacent to an ICE-r derivative. vanG-1 insertions were previously reported within the same relative position in the E. faecalis rumA homolog. An ICE-r sequence perfectly conserved with respect to its counterpart in GBS-NY was apparent within the same site of the rumA homolog of a Streptococcus dysgalactiae subsp. equisimilis strain. Additionally, homologous vanG-like elements within the conserved rumA target site were evident in Roseburia intestinalis. Importance: These three streptococcal strains represent the first known vancomycin-resistant strains of their species. 
The collective observations made from these strains reveal a specific hot spot for insertional elements that is conserved between streptococci and different Gram-positive species. The two GBS strains potentially represent a GBS lineage that is predisposed to insertion of vanG elements. Copyright © 2014 Srinivasan et al.


July 7, 2019

Genomic insights into the taxonomic status of the three subspecies of Bacillus subtilis.

Bacillus subtilis contains three subspecies, i.e., subspecies subtilis, spizizenii, and inaquosorum. As these subspecies are phenotypically indistinguishable, their differentiation has relied on phylogenetic analysis of multiple protein-coding gene sequences. B. subtilis subsp. inaquosorum is a recently proposed taxon that encompasses strain KCTC 13429(T) and related strains, which were previously classified as members of subspecies spizizenii. However, DNA-DNA hybridization (DDH) values among the three subspecies raised a question as to their independence. Thus, we evaluated the taxonomic status of subspecies inaquosorum using genome-based comparative analysis. In contrast to the previous experimental values of DDH, the inter-genomic relatedness inferred by average nucleotide identity (ANI) values indicated that subspecies inaquosorum and spizizenii were sufficiently different from subspecies subtilis and hence raised the possibility that the former two could be classified as separate species from B. subtilis. The genome-based tree also supported the separation of the two subspecies from B. subtilis. The exclusive presence of a subtilin synthesis system in subspecies spizizenii was a remarkable genetic characteristic that could even distinguish subspecies spizizenii from subspecies inaquosorum in addition to the low ANI values (<95%). Conclusively, the genome-based data obtained in this study demonstrated that subspecies inaquosorum and spizizenii are clearly distinguished from subspecies subtilis, and raises the possibility that these two subspecies could be classified as separate species from B. subtilis. In addition, the low ANI values between subspecies inaquosorum and spizizenii and the exclusive presence of subtilin synthesis genes in subspecies spizizenii also suggest circumscription of these two subspecies at the species level. Copyright © 2013 Elsevier GmbH. All rights reserved.


July 7, 2019

The effects of read length, quality and quantity on microsatellite discovery and primer development: from Illumina to PacBio.

The advent of next-generation sequencing (NGS) technologies has transformed the way microsatellites are isolated for ecological and evolutionary investigations. Recent attempts to employ NGS for microsatellite discovery have used the 454, Illumina, and Ion Torrent platforms, but other methods including single-molecule real-time DNA sequencing (Pacific Biosciences or PacBio) remain viable alternatives. We outline a workflow from sequence quality control to microsatellite marker validation in three plant species using PacBio circular consensus sequencing (CCS). We then evaluate the performance of PacBio CCS in comparison with other NGS platforms for microsatellite isolation, through simulations that focus on variations in read length, read quantity and sequencing error rate. Although quality control of CCS reads reduced microsatellite yield by around 50%, hundreds of microsatellite loci that are expected to have improved conversion efficiency to functional markers were retrieved for each species. The simulations quantitatively validate the advantages of long reads and emphasize the detrimental effects of sequencing errors on NGS-enabled microsatellite development. In view of the continuing improvement in read length on NGS platforms, sequence quality and the corresponding strategies of quality control will become the primary factors to consider for effective microsatellite isolation. Among current options, PacBio CCS may be optimal for rapid, small-scale microsatellite development due to its flexibility in scaling sequencing effort, while platforms such as Illumina MiSeq will provide cost-efficient solutions for multispecies microsatellite projects. © 2014 John Wiley & Sons Ltd.
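
To make the microsatellite discovery step concrete, here is a minimal Python sketch that finds perfect tandem repeats in a read with a regular expression. It is an illustration under stated assumptions, not the workflow used in the study: the function `find_microsatellites` and its default thresholds (unit length 2 to 6, at least 4 repeats) are hypothetical choices.

```python
import re

def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    Hypothetical helper. The lazy quantifier makes the regex prefer the
    shortest repeating unit at each position.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits

if __name__ == "__main__":
    print(find_microsatellites("TTTACACACACACGGG"))
```

A single substitution or indel inside a repeat breaks a perfect match, which is one way to see why sequencing errors reduce the number of loci recovered.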


July 7, 2019

A fault-tolerant method for HLA typing with PacBio data.

Human leukocyte antigen (HLA) genes are critical genes involved in important biomedical aspects, including organ transplantation, autoimmune diseases and infectious diseases. The gene family contains the most polymorphic genes in humans and the difference between two alleles is only a single base pair substitution in many cases. The next generation sequencing (NGS) technologies could be used for high throughput HLA typing but in silico methods are still needed to correctly assign the alleles of a sample. Computer scientists have developed such methods for various NGS platforms, such as Illumina, Roche 454 and Ion Torrent, based on the characteristics of the reads they generate. However, the method for PacBio reads was less addressed, probably owing to its high error rates. The PacBio system has the longest read length among available NGS platforms, and therefore is the only platform capable of having exon 2 and exon 3 of HLA genes on the same read to unequivocally solve the ambiguity problem caused by the “phasing” issue. We proposed a new method BayesTyping1 to assign HLA alleles for PacBio circular consensus sequencing reads using Bayes’ theorem. The method was applied to simulated data of the three loci HLA-A, HLA-B and HLA-DRB1. The experimental results showed its capability to tolerate the disturbance of sequencing errors and external noise reads. The BayesTyping1 method could overcome the problems of HLA typing using PacBio reads, which mostly arise from sequencing errors of PacBio reads and the divergence of HLA genes, to some extent.
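
The general idea of assigning an allele with Bayes' theorem can be sketched as follows. This is a generic textbook illustration, not the BayesTyping1 algorithm, whose details are not given in the abstract. It assumes a uniform prior over candidate alleles, reads already aligned to the start of each candidate, and a substitution-only error model; the name `allele_posteriors` and the toy sequences are hypothetical.

```python
import math

def allele_posteriors(reads, alleles, error_rate=0.1):
    """Posterior probability of each candidate allele given the reads.

    Hypothetical helper: each base matches the true allele with probability
    1 - error_rate and is one of the three other bases otherwise.
    """
    log_like = {}
    for name, ref in alleles.items():
        total = 0.0
        for read in reads:
            for r, a in zip(read, ref):
                total += math.log(1 - error_rate if r == a else error_rate / 3)
        log_like[name] = total
    # With a uniform prior the posterior is the normalized likelihood;
    # subtracting the maximum keeps the exponentials numerically stable.
    peak = max(log_like.values())
    weights = {n: math.exp(v - peak) for n, v in log_like.items()}
    norm = sum(weights.values())
    return {n: w / norm for n, w in weights.items()}

if __name__ == "__main__":
    candidates = {"allele_1": "ACGTAC", "allele_2": "ACGTTC"}  # toy sequences
    print(allele_posteriors(["ACGTAC", "ACGTAC", "ACGTTC"], candidates))
```

Two candidates that differ by a single base can still be separated because every read covering that position shifts the posterior, so occasional erroneous reads are outvoted rather than decisive.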


July 7, 2019

High-throughput platform for real-time monitoring of biological processes by multicolor single-molecule fluorescence.

Zero-mode waveguides provide a powerful technology for studying single-molecule real-time dynamics of biological systems at physiological ligand concentrations. We customized a commercial zero-mode waveguide-based DNA sequencer for use as a versatile instrument for single-molecule fluorescence detection and showed that the system provides long fluorophore lifetimes with good signal to noise and low spectral cross-talk. We then used a ribosomal translation assay to show real-time fluidic delivery during data acquisition, showing it is possible to follow the conformation and composition of thousands of single biomolecules simultaneously through four spectral channels. This instrument allows high-throughput multiplexed dynamics of single-molecule biological processes over long timescales. The instrumentation presented here has broad applications to single-molecule studies of biological systems and is easily accessible to the biophysical community.


July 7, 2019

Compact genome of the Antarctic midge is likely an adaptation to an extreme environment.

The midge, Belgica antarctica, is the only insect endemic to Antarctica, and thus it offers a powerful model for probing responses to extreme temperatures, freeze tolerance, dehydration, osmotic stress, ultraviolet radiation and other forms of environmental stress. Here we present the first genome assembly of an extremophile, the first dipteran in the family Chironomidae, and the first Antarctic eukaryote to be sequenced. At 99 megabases, B. antarctica has the smallest insect genome sequenced thus far. Although it has a similar number of genes as other Diptera, the midge genome has very low repeat density and a reduction in intron length. Environmental extremes appear to constrain genome architecture, not gene content. The few transposable elements present are mainly ancient, inactive retroelements. An abundance of genes associated with development, regulation of metabolism and responses to external stimuli may reflect adaptations for surviving in this harsh environment.


July 7, 2019

First genome sequences of Achromobacter phages reveal new members of the N4 family.

Multi-resistant Achromobacter xylosoxidans has been recognized as an emerging pathogen causing nosocomially acquired infections in recent years. Phages as natural opponents could be an alternative to fight such infections. Bacteriophages against this opportunistic pathogen were isolated in a recent study. This study shows a molecular analysis of two podoviruses and reveals first insights into the genomic structure of Achromobacter phages so far. Growth curve experiments and adsorption kinetics were performed for both phages. Adsorption and propagation in cells were visualized by electron microscopy. Both phage genomes were sequenced with the PacBio RS II system based on single molecule, real-time (SMRT) technology and annotated with several bioinformatic tools. To further elucidate the evolutionary relationships between the phage genomes, a phylogenomic analysis was conducted using the genome Blast Distance Phylogeny approach (GBDP). In this study, we present the first detailed analysis of genome sequences of two Achromobacter phages so far. Phages JWAlpha and JWDelta were isolated from two different waste water treatment plants in Germany. Both phages belong to the Podoviridae and contain linear, double-stranded DNA with a length of 72329 bp and 73659 bp, respectively. 92 and 89 putative open reading frames were identified for JWAlpha and JWDelta, respectively, by bioinformatic analysis with several tools. The genomes have nearly the same organization and could be divided into different clusters for transcription, replication, host interaction, head and tail structure and lysis. Detailed annotation via protein comparisons with BLASTP revealed strong similarities to N4-like phages. Analysis of the genomes of Achromobacter phages JWAlpha and JWDelta and comparisons of different gene clusters with other phages revealed that they might be strongly related to other N4-like phages, especially of the Escherichia group. Although all these phages show a highly conserved genomic structure and partially strong similarities at the amino acid level, some differences could be identified. Those differences, e.g. the existence of specific genes for replication or host interaction in some N4-like phages, seem to be interesting targets for further examination of function and specific mechanisms, which might enlighten the mechanism of phage establishment in the host cell after infection.


July 7, 2019

proovread: large-scale high-accuracy PacBio correction through iterative short read consensus.

Today, the base code of DNA is mostly determined through sequencing by synthesis as provided by the Illumina sequencers. Although highly accurate, resulting reads are short, making their analyses challenging. Recently, a new technology, single molecule real-time (SMRT) sequencing, was developed that could address these challenges, as it generates reads of several thousand bases. But, their broad application has been hampered by a high error rate. Therefore, hybrid approaches that use high-quality short reads to correct erroneous SMRT long reads have been developed. Still, current implementations have great demands on hardware, work only in well-defined computing infrastructures and reject a substantial amount of reads. This limits their usability considerably, especially in the case of large sequencing projects. Here we present proovread, a hybrid correction pipeline for SMRT reads, which can be flexibly adapted on existing hardware and infrastructure from a laptop to a high-performance computing cluster. On genomic and transcriptomic test cases covering Escherichia coli, Arabidopsis thaliana and human, proovread achieved accuracies up to 99.9% and outperformed the existing hybrid correction programs. Furthermore, proovread-corrected sequences were longer and the throughput was higher. Thus, proovread combines the most accurate correction results with an excellent adaptability to the available hardware. It will therefore increase the applicability and value of SMRT sequencing. proovread is available at the following URL: http://proovread.bioapps.biozentrum.uni-wuerzburg.de. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.


July 7, 2019

Genome sequence of the chromate-resistant bacterium Leucobacter salsicius type strain M1-8(T).

Leucobacter salsicius M1-8(T) is a member of the Microbacteriaceae family within the class Actinomycetales. This strain is a Gram-positive, rod-shaped bacterium and was previously isolated from a Korean fermented food. Most members of the genus Leucobacter are chromate-resistant and this feature could be exploited in biotechnological applications. However, the genus Leucobacter is poorly characterized at the genome level, despite its potential importance. Thus, the present study determined the features of Leucobacter salsicius M1-8(T), as well as its genome sequence and annotation. The genome comprised 3,185,418 bp with a G+C content of 64.5%, which included 2,865 protein-coding genes and 68 RNA genes. This strain possessed two predicted genes associated with chromate resistance, which might facilitate its growth in heavy metal-rich environments.
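
The G+C content reported above is a simple summary statistic: the percentage of unambiguous bases that are G or C. A minimal Python sketch, with `gc_content` as a hypothetical helper name, could look like this:

```python
def gc_content(seq):
    """Percent G+C among unambiguous bases (A, C, G, T) in seq.

    Hypothetical helper; ambiguous symbols such as N are ignored, and an
    input with no unambiguous bases returns 0.0.
    """
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for b in counted if b in "GC")
    return 100.0 * gc / len(counted)

if __name__ == "__main__":
    print(round(gc_content("GGCCAT"), 1))
```

Applied to a full genome assembly, the same calculation would be run over the concatenated contigs.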


July 7, 2019

LUMPY: a probabilistic framework for structural variant discovery.

Comprehensive discovery of structural variation (SV) from whole genome sequencing data requires multiple detection signals including read-pair, split-read, read-depth and prior knowledge. Owing to technical challenges, extant SV discovery algorithms either use one signal in isolation, or at best use two sequentially. We present LUMPY, a novel SV discovery framework that naturally integrates multiple SV signals jointly across multiple samples. We show that LUMPY yields improved sensitivity, especially when SV signal is reduced owing to either low coverage data or low intra-sample variant allele frequency. We also report a set of 4,564 validated breakpoints from the NA12878 human genome. https://github.com/arq5x/lumpy-sv.


July 7, 2019

Association mapping, patterns of linkage disequilibrium and selection in the vicinity of the PHYTOCHROME C gene in pearl millet.

Linkage analysis confirmed the association in the region of PHYC in pearl millet. The comparison of genes found in this region suggests that PHYC is the best candidate. Major efforts are currently underway to dissect the phenotype-genotype relationship in plants and animals using existing populations. This method exploits historical recombinations accumulated in these populations. However, linkage disequilibrium sometimes extends over a relatively long distance, particularly in genomic regions containing polymorphisms that have been targets for selection. In this case, many genes in the region could be statistically associated with the trait shaped by the selected polymorphism. Statistical analyses could help in identifying the best candidate genes into such a region where an association is found. In a previous study, we proposed that a fragment of the PHYTOCHROME C gene (PHYC) is associated with flowering time and morphological variations in pearl millet. In the present study, we first performed linkage analyses using three pearl millet F2 families to confirm the presence of a QTL in the vicinity of PHYC. We then analyzed a wider genomic region of ~100 kb around PHYC to pinpoint the gene that best explains the association with the trait in this region. A panel of 90 pearl millet inbred lines was used to assess the association. We used a Markov chain Monte Carlo approach to compare 75 markers distributed along this 100-kb region. We found the best candidate markers on the PHYC gene. Signatures of selection in this region were assessed in an independent data set and pointed to the same gene. These results foster confidence in the likely role of PHYC in phenotypic variation and encourage the development of functional studies.

