July 7, 2019

High-coverage sequencing and annotated assemblies of the budgerigar genome.

Parrots belong to a group of behaviorally advanced vertebrates and have an advanced ability of vocal learning relative to other vocal-learning birds. They can imitate human speech, synchronize their body movements to a rhythmic beat, and understand complex concepts of referential meaning to sounds. However, little is known about the genetics of these traits. Elucidating the genetic bases would require whole genome sequencing and a robust assembly of a parrot genome. We present a genomic resource for the budgerigar, an Australian parakeet (Melopsittacus undulatus) and the most widely studied parrot species in neuroscience and behavior. We present genomic sequence data that includes over 300× raw read coverage from multiple sequencing technologies and chromosome optical maps from a single male animal. The reads and optical maps were used to create three hybrid assemblies representing some of the largest genomic scaffolds to date for a bird; two of these were annotated based on similarities to reference sets of non-redundant human, zebra finch and chicken proteins, and budgerigar transcriptome sequence assemblies. The sequence reads for this project were in part generated and used for both the Assemblathon 2 competition and the first de novo assembly of a giga-scale vertebrate genome utilizing PacBio single-molecule sequencing. Across several quality metrics, these budgerigar assemblies are comparable to or better than the chicken and zebra finch genome assemblies built from traditional Sanger sequencing reads, and are sufficient to analyze regions that are difficult to sequence and assemble, including those not yet assembled in prior bird genomes, and promoter regions of genes differentially regulated in vocal learning brain regions. This work provides valuable data and material for genome technology development and for investigating the genomics of complex behavioral traits.


July 7, 2019

Quorum sensing activity of Aeromonas caviae strain YL12, a bacterium isolated from compost.

Quorum sensing is a well-studied cell-to-cell communication method that involves cell-density-dependent regulation of gene expression mediated by signalling molecules. In this study, a bacterium isolated from a plant material compost pile was found to possess quorum sensing activity based on bioassay screening. Isolate YL12 was identified using matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry and molecular typing using the rpoD gene, which identified the isolate as Aeromonas caviae. High resolution tandem mass spectrometry was subsequently employed to identify the N-acyl homoserine lactone (AHL) profile of Aeromonas caviae YL12 and confirmed that this isolate produced two short chain N-acyl homoserine lactones, namely C4-HSL and C6-HSL, and the production was observed to be cell density-dependent. Using the thin layer chromatography (TLC) bioassay, both AHLs were found to activate C. violaceum CV026, whereas only C6-HSL was revealed to induce bioluminescence expression of E. coli [pSB401]. The data presented in this study are a first step toward understanding the role of quorum sensing in Aeromonas caviae strain YL12.


July 7, 2019

Stenotrophomonas comparative genomics reveals genes and functions that differentiate beneficial and pathogenic bacteria.

In recent years, the number of human infections caused by opportunistic pathogens has increased dramatically. Plant rhizospheres are one of the most typical natural reservoirs for these pathogens but they also represent a great source for beneficial microbes with potential for biotechnological applications. However, understanding the natural variation and possible differences between pathogens and beneficials is the main challenge in furthering these possibilities. The genus Stenotrophomonas contains representatives found to be associated with human and plant hosts. We used comparative genomics as well as transcriptomic and physiological approaches to detect significant borders between the Stenotrophomonas strains: the multi-drug resistant pathogenic S. maltophilia and the plant-associated strains S. maltophilia R551-3 and S. rhizophila DSM14405T (both are biocontrol agents). We found an overall high degree of sequence similarity between the genomes of all three strains. Despite the notable similarity in potential factors responsible for host invasion and antibiotic resistance, other factors including several crucial virulence factors and heat shock proteins were absent in the plant-associated DSM14405T. Instead, S. rhizophila DSM14405T possessed unique genes for the synthesis and transport of the plant-protective spermidine, plant cell-wall degrading enzymes, and high salinity tolerance. Moreover, the presence or absence of bacterial growth at 37°C was identified as a very simple method for differentiating between pathogenic and non-pathogenic isolates. DSM14405T is not able to grow at this human-relevant temperature, most likely in great part due to the absence of heat shock genes and perhaps also because of the up-regulation at increased temperatures of several genes involved in a suicide mechanism. While this study is important for understanding the mechanisms behind the emerging pattern of infectious diseases, it is, to our knowledge, the first of its kind to assess the risk of beneficial strains for biotechnological applications. We identified certain traits typical of pathogens, such as growth at human body temperature together with the production of heat shock proteins, as opposed to a temperature-regulated suicide system that is harnessed by beneficials.


July 7, 2019

FASTQSim: platform-independent data characterization and in silico read generation for NGS datasets.

High-throughput next generation sequencing technologies have enabled rapid characterization of clinical and environmental samples. Consequently, the largest bottleneck to actionable data has become sample processing and bioinformatics analysis, creating a need for accurate and rapid algorithms to process genetic data. Perfectly characterized in silico datasets are a useful tool for evaluating the performance of such algorithms. Background contaminating organisms are observed in sequenced mixtures of organisms. In silico samples provide exact truth. To create the best value for evaluating algorithms, in silico data should mimic actual sequencer data as closely as possible. FASTQSim is a tool that provides the dual functionality of NGS dataset characterization and metagenomic data generation. FASTQSim is sequencing platform-independent, and computes distributions of read length, quality scores, indel rates, single point mutation rates, indel size, and similar statistics for any sequencing platform. To create training or testing datasets, FASTQSim has the ability to convert target sequences into in silico reads with specific error profiles obtained in the characterization step. FASTQSim enables users to assess the quality of NGS datasets. The tool provides information about read length, read quality, repetitive and non-repetitive indel profiles, and single base pair substitutions. FASTQSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software. In this regard, in silico datasets generated with the FASTQSim tool hold several advantages over natural datasets: they are sequencing platform independent, extremely well characterized, and less expensive to generate. Such datasets are valuable in a number of applications, including the training of assemblers for multiple platforms, benchmarking bioinformatics algorithm performance, and creating challenge datasets for detecting genetic engineering toolmarks.
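The characterization step described above can be illustrated with a minimal sketch. It is not FASTQSim's own code or interface: the function name `characterize_fastq`, the assumption of four-line FASTQ records, and the default Phred+33 offset are all choices made for this example. It computes only two of the statistics mentioned, the read-length histogram and the mean quality score per read.

```python
from collections import Counter
from io import StringIO


def characterize_fastq(handle, phred_offset=33):
    """Return (read-length histogram, list of mean Phred quality per read).

    Hypothetical helper: assumes simple four-line FASTQ records.
    """
    lengths = Counter()
    mean_quals = []
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of input (or a blank line)
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        lengths[len(seq)] += 1
        if qual:
            scores = [ord(c) - phred_offset for c in qual]
            mean_quals.append(sum(scores) / len(scores))
    return lengths, mean_quals


# Two toy reads: one with quality 'I' (Phred 40), one with '!' (Phred 0).
example = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\n!!!!!!\n")
lengths, mean_quals = characterize_fastq(example)
print(dict(lengths), mean_quals)
```

Distributions gathered this way could then parameterize a read simulator; indel and substitution profiles would additionally require aligning reads to a reference, which this sketch does not attempt.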


July 7, 2019

Site-specific genetic engineering of the Anopheles gambiae Y chromosome.

Despite its function in sex determination and its role in driving genome evolution, the Y chromosome remains poorly understood in most species. Y chromosomes are gene-poor, repeat-rich and largely heterochromatic and therefore represent a difficult target for genetic engineering. The Y chromosome of the human malaria vector Anopheles gambiae appears to be involved in sex determination although very little is known about both its structure and function. Here, we characterize a transgenic strain of this mosquito species, obtained by transposon-mediated integration of a transgene construct onto the Y chromosome. Using meganuclease-induced homologous repair we introduce a site-specific recombination signal onto the Y chromosome and show that the resulting docking line can be used for secondary integration. To demonstrate its utility, we study the activity of a germ-line-specific promoter when located on the Y chromosome. We also show that Y-linked fluorescent transgenes allow automated sex separation of this important vector species, providing the means to generate large single-sex populations. Our findings will aid studies of sex chromosome function and enable the development of male-exclusive genetic traits for vector control.


July 7, 2019

Molecular and biological characterization of a new isolate of guinea pig cytomegalovirus.

Development of a vaccine against congenital infection with human cytomegalovirus is complicated by the issue of re-infection, with subsequent vertical transmission, in women with pre-conception immunity to the virus. The study of experimental therapeutic prevention of re-infection would ideally be undertaken in a small animal model, such as the guinea pig cytomegalovirus (GPCMV) model, prior to human clinical trials. However, the ability to model re-infection in the GPCMV model has been limited by availability of only one strain of virus, the 22122 strain, isolated in 1957. In this report, we describe the isolation of a new GPCMV strain, the CIDMTR strain. This strain demonstrated morphological characteristics of a typical Herpesvirinae by electron microscopy. Illumina and PacBio sequencing demonstrated a genome of 232,778 nt. Novel open reading frames (ORFs) not found in reference strain 22122 included an additional MHC Class I homolog near the right genome terminus. The CIDMTR strain was capable of dissemination in immunocompromised guinea pigs, and was found to be capable of congenital transmission in GPCMV-immune dams previously infected with salivary gland-adapted strain 22122 virus. The availability of a new GPCMV strain should facilitate study of re-infection in this small animal model.


July 7, 2019

The effects of read length, quality and quantity on microsatellite discovery and primer development: from Illumina to PacBio.

The advent of next-generation sequencing (NGS) technologies has transformed the way microsatellites are isolated for ecological and evolutionary investigations. Recent attempts to employ NGS for microsatellite discovery have used the 454, Illumina, and Ion Torrent platforms, but other methods including single-molecule real-time DNA sequencing (Pacific Biosciences or PacBio) remain viable alternatives. We outline a workflow from sequence quality control to microsatellite marker validation in three plant species using PacBio circular consensus sequencing (CCS). We then evaluate the performance of PacBio CCS in comparison with other NGS platforms for microsatellite isolation, through simulations that focus on variations in read length, read quantity and sequencing error rate. Although quality control of CCS reads reduced microsatellite yield by around 50%, hundreds of microsatellite loci that are expected to have improved conversion efficiency to functional markers were retrieved for each species. The simulations quantitatively validate the advantages of long reads and emphasize the detrimental effects of sequencing errors on NGS-enabled microsatellite development. In view of the continuing improvement in read length on NGS platforms, sequence quality and the corresponding strategies of quality control will become the primary factors to consider for effective microsatellite isolation. Among current options, PacBio CCS may be optimal for rapid, small-scale microsatellite development due to its flexibility in scaling sequencing effort, while platforms such as Illumina MiSeq will provide cost-efficient solutions for multispecies microsatellite projects. © 2014 John Wiley & Sons Ltd.
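As a rough illustration of the microsatellite discovery step in such a workflow, the sketch below scans a read for perfect tandem repeats with a regular expression. It is not the tool used in the study; the function name `find_microsatellites`, the motif length range of 2 to 6 bases, and the minimum of five repeats are assumptions chosen for this example.

```python
import re


def find_microsatellites(seq, min_repeats=5, min_motif=2, max_motif=6):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    Hypothetical helper: thresholds are illustrative, not from the study.
    """
    # Shortest motif first (lazy quantifier), followed by at least
    # min_repeats - 1 further copies of the same motif.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs reported as longer motifs
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


hits = find_microsatellites("TTACACACACACACGG")
print(hits)
```

Because only perfect repeats are matched, a single sequencing error inside a repeat tract splits or hides the locus, which is one simple way to see why error rate matters for marker discovery.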


July 7, 2019

A fault-tolerant method for HLA typing with PacBio data.

Human leukocyte antigen (HLA) genes are critical genes involved in important biomedical aspects, including organ transplantation, autoimmune diseases and infectious diseases. The gene family contains the most polymorphic genes in humans and the difference between two alleles is only a single base pair substitution in many cases. Next generation sequencing (NGS) technologies could be used for high throughput HLA typing, but in silico methods are still needed to correctly assign the alleles of a sample. Computer scientists have developed such methods for various NGS platforms, such as Illumina, Roche 454 and Ion Torrent, based on the characteristics of the reads they generate. However, methods for PacBio reads have received less attention, probably owing to the platform's high error rates. The PacBio system has the longest read length among available NGS platforms, and therefore is the only platform capable of having exon 2 and exon 3 of HLA genes on the same read to unequivocally solve the ambiguity problem caused by the “phasing” issue. We proposed a new method, BayesTyping1, to assign HLA alleles for PacBio circular consensus sequencing reads using Bayes’ theorem. The method was applied to simulated data of the three loci HLA-A, HLA-B and HLA-DRB1. The experimental results showed its capability to tolerate the disturbance of sequencing errors and external noise reads. The BayesTyping1 method could overcome, to some extent, the problems of HLA typing using PacBio reads, which mostly arise from sequencing errors of PacBio reads and the divergence of HLA genes.
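To make the idea of assigning alleles with Bayes’ theorem concrete, here is a deliberately simplified sketch. It is not the BayesTyping1 algorithm: it assumes a single allele per sample, reads already aligned end to end with each candidate, a uniform prior, and a substitution-only error model with a fixed per-base error rate. The names `allele_posteriors`, `allele_1` and `allele_2` are hypothetical.

```python
import math


def allele_posteriors(reads, alleles, error_rate=0.01):
    """Posterior probability of each candidate allele given aligned reads.

    Hypothetical toy model: uniform prior, independent substitution errors.
    """
    log_like = {}
    for name, ref in alleles.items():
        total = 0.0
        for read in reads:
            for read_base, ref_base in zip(read, ref):
                if read_base == ref_base:
                    total += math.log(1.0 - error_rate)
                else:
                    # An error turns the true base into one of 3 others.
                    total += math.log(error_rate / 3.0)
        log_like[name] = total
    # With a uniform prior, the posterior is the normalized likelihood.
    top = max(log_like.values())
    weights = {n: math.exp(v - top) for n, v in log_like.items()}
    norm = sum(weights.values())
    return {n: w / norm for n, w in weights.items()}


# Two candidate alleles differing by one base; one of three reads has an error.
alleles = {"allele_1": "ACGTACGT", "allele_2": "ACGTACGA"}
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
post = allele_posteriors(reads, alleles)
print(post)
```

Even in this toy setting, a single erroneous read does not flip the call, because the evidence from the other reads outweighs it; a realistic method would also have to handle diploid genotypes, indel errors and noise reads from other loci.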


July 7, 2019

Dubowitz syndrome is a complex comprised of multiple, genetically distinct and phenotypically overlapping disorders.

Dubowitz syndrome is a rare disorder characterized by multiple congenital anomalies, cognitive delay, growth failure, an immune defect, and an increased risk of blood dyscrasia and malignancy. There is considerable phenotypic variability, suggesting genetic heterogeneity. We clinically characterized and performed exome sequencing and high-density array SNP genotyping on three individuals with Dubowitz syndrome, including a pair of previously described siblings (Patients 1 and 2, brother and sister) and an unpublished patient (Patient 3). Given the siblings’ history of bone marrow abnormalities, we also evaluated telomere length and performed radiosensitivity assays. In the siblings, exome sequencing identified compound heterozygosity for a known rare nonsense substitution in the nuclear ligase gene LIG4 (rs104894419, NM_002312.3:c.2440C>T) that predicts p.Arg814X (MAF: 0.0002) and an NM_002312.3:c.613delT variant that predicts a p.Ser205Leufs*29 frameshift. The frameshift mutation has not been reported in 1000 Genomes, ESP, or ClinSeq. These LIG4 mutations were previously reported in the sibling sister; her brother had not been previously tested. Western blotting showed an absence of a ligase IV band in both siblings. In the third patient, array SNP genotyping revealed a de novo ~3.89 Mb interstitial deletion at chromosome 17q24.2 (chr 17:62,068,463-65,963,102, hg18), which spanned the known Carney complex gene PRKAR1A. In all three patients, a median lymphocyte telomere length of ≤ 1st centile was observed and radiosensitivity assays showed increased sensitivity to ionizing radiation. Our work suggests that, in addition to dyskeratosis congenita, LIG4 and 17q24.2 syndromes also feature shortened telomeres; to confirm this, telomere length testing should be considered in both disorders. Taken together, our work and other reports on Dubowitz syndrome, as currently recognized, suggest that it is not a unitary entity but instead a collection of phenotypically similar disorders. As a clinical entity, Dubowitz syndrome will need continual re-evaluation and re-definition as its constituent phenotypes are determined.


July 7, 2019

High-throughput platform for real-time monitoring of biological processes by multicolor single-molecule fluorescence.

Zero-mode waveguides provide a powerful technology for studying single-molecule real-time dynamics of biological systems at physiological ligand concentrations. We customized a commercial zero-mode waveguide-based DNA sequencer for use as a versatile instrument for single-molecule fluorescence detection and showed that the system provides long fluorophore lifetimes with good signal to noise and low spectral cross-talk. We then used a ribosomal translation assay to show real-time fluidic delivery during data acquisition, showing it is possible to follow the conformation and composition of thousands of single biomolecules simultaneously through four spectral channels. This instrument allows high-throughput multiplexed dynamics of single-molecule biological processes over long timescales. The instrumentation presented here has broad applications to single-molecule studies of biological systems and is easily accessible to the biophysical community.


July 7, 2019

The oxygen-independent metabolism of cyclic monoterpenes in Castellaniella defragrans 65Phen.

The facultatively anaerobic betaproteobacterium Castellaniella defragrans 65Phen utilizes acyclic, monocyclic and bicyclic monoterpenes as sole carbon source under oxic as well as anoxic conditions. A biotransformation pathway of the acyclic β-myrcene required linalool dehydratase-isomerase as initial enzyme acting on the hydrocarbon. An in-frame deletion mutant did not use myrcene, but was able to grow on monocyclic monoterpenes. The genome sequence and a comparative proteome analysis together with a random transposon mutagenesis were conducted to identify genes involved in the monocyclic monoterpene metabolism. Metabolites accumulating in cultures of transposon and in-frame deletion mutants disclosed the degradation pathway. Castellaniella defragrans 65Phen oxidizes the monocyclic monoterpene limonene at the primary methyl group forming perillyl alcohol. The genome of 3.95 Mb contained a 70 kb genome island coding for over 50 proteins involved in the monoterpene metabolism. This island showed higher homology to genes of another monoterpene-mineralizing betaproteobacterium, Thauera terpenica 58EuT, than to genomes of the family Alcaligenaceae, which harbors the genus Castellaniella. A collection of 72 transposon mutants unable to grow on limonene contained 17 inactivated genes, with 46 mutants located in the two genes ctmAB (cyclic terpene metabolism). The genes ctmA and ctmB were annotated as FAD-dependent oxidoreductases and clustered together with ctmE, a 2Fe-2S ferredoxin gene, and ctmF, coding for a NADH:ferredoxin oxidoreductase. Transposon mutants of ctmA, B or E did not grow aerobically or anaerobically on limonene, but did grow on perillyl alcohol. The next steps in the pathway are catalyzed by the geraniol dehydrogenase GeoA and the geranial dehydrogenase GeoB, yielding perillic acid. Two transposon mutants had inactivated genes of the monoterpene ring cleavage (mrc) pathway. 2-Methylcitrate synthase and 2-methylcitrate dehydratase were also essential for the monoterpene metabolism but not for growth on acetate. The genome of Castellaniella defragrans 65Phen is related to other genomes of Alcaligenaceae, but contains a genomic island with genes of the monoterpene metabolism. Castellaniella defragrans 65Phen degrades limonene via a limonene dehydrogenase and the oxidation of perillyl alcohol. The initial oxidation at the primary methyl group is independent of molecular oxygen.


July 7, 2019

Automated ensemble assembly and validation of microbial genomes.

The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs.


July 7, 2019

Complete sequences of organelle genomes from the medicinal plant Rhazya stricta (Apocynaceae) and contrasting patterns of mitochondrial genome evolution across asterids.

Rhazya stricta is native to arid regions in South Asia and the Middle East and is used extensively in folk medicine to treat a wide range of diseases. In addition to generating genomic resources for this medicinally important plant, analyses of the complete plastid and mitochondrial genomes and a nuclear transcriptome from Rhazya provide insights into inter-compartmental transfers between genomes and the patterns of evolution among eight asterid mitochondrial genomes. The 154,841 bp plastid genome is highly conserved with gene content and order identical to the ancestral organization of angiosperms. The 548,608 bp mitochondrial genome exhibits a number of phenomena including the presence of recombinogenic repeats that generate a multipartite organization, transferred DNA from the plastid and nuclear genomes, and bidirectional DNA transfers between the mitochondrion and the nucleus. The mitochondrial genes sdh3 and rps14 have been transferred to the nucleus and have acquired targeting presequences. In the case of rps14, two copies are present in the nucleus; only one has a mitochondrial targeting presequence and may be functional. Phylogenetic analyses of both nuclear and mitochondrial copies of rps14 across angiosperms suggest Rhazya has experienced a single transfer of this gene to the nucleus, followed by a duplication event. Furthermore, the phylogenetic distribution of gene losses and the high level of sequence divergence in targeting presequences suggest multiple, independent transfers of both sdh3 and rps14 across asterids. Comparative analyses of mitochondrial genomes of eight sequenced asterids indicate a complicated evolutionary history in this large angiosperm clade with considerable diversity in genome organization and size, repeat, gene and intron content, and amount of foreign DNA from the plastid and nuclear genomes. Organelle genomes of Rhazya stricta provide valuable information for improving the understanding of mitochondrial genome evolution among angiosperms. The genomic data have enabled a rigorous examination of the gene transfer events. Rhazya is unique among the eight sequenced asterids in the types of events that have shaped the evolution of its mitochondrial genome. Furthermore, the organelle genomes of R. stricta provide valuable genomic resources for utilizing this important medicinal plant in biotechnology applications.


July 7, 2019

Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate.

DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives. For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads as well as suggest possible improvements. In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof of concept experiment we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired end strategy. Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples. Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on false discovery rate statistics, which allow a proper trade-off between sensitivity and precision to be chosen.
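The trade-off between a distance threshold and the false discovery rate can be sketched with a simple empirical estimate. This is not the adapted FDR method of the paper: it substitutes a basic tail-area estimate in which distances measured against unbarcoded (null) sequences approximate how often a match arises by chance, and it conservatively treats every read as potentially unbarcoded. The function names, the use of Hamming distance, and the toy numbers are assumptions for this example.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def min_barcode_distance(read_prefix, barcodes):
    """Smallest Hamming distance between a read prefix and any barcode."""
    return min(hamming(read_prefix, bc) for bc in barcodes)


def empirical_fdr(observed, null, threshold):
    """Estimate the FDR among reads called barcoded at distance <= threshold.

    observed: minimal distances for real reads.
    null: minimal distances for sequences known to carry no barcode.
    Hypothetical estimator; conservative (assumes all reads could be null).
    """
    n_called = sum(d <= threshold for d in observed)
    if n_called == 0:
        return 0.0
    null_tail = sum(d <= threshold for d in null) / len(null)
    expected_false = null_tail * len(observed)
    return min(1.0, expected_false / n_called)


# Toy distances, invented for illustration only.
observed = [0, 0, 1, 0, 4, 5, 1, 0, 5, 4]
null = [4, 5, 3, 5, 4, 5, 4, 5, 2, 5]
for t in (1, 3):
    print(t, empirical_fdr(observed, null, t))
```

Scanning thresholds from strict to permissive and keeping the largest one whose estimated FDR stays below a chosen level mirrors the idea of finding the highest acceptable minimal distance; shorter barcodes or larger barcode sets shift the null distances downward and so force a stricter threshold.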

