July 7, 2019

Complete genome sequences of 11 Bordetella pertussis strains representing the pandemic ptxP3 lineage.

Pathogen adaptation has contributed to the resurgence of pertussis. To facilitate our understanding of this adaptation we report here 11 completely closed and annotated Bordetella pertussis genomes representing the pandemic ptxP3 lineage. Our analyses included six strains which do not produce the vaccine components pertactin and/or filamentous hemagglutinin. Copyright © 2015 Bart et al.


July 7, 2019

Bovine NK-lysin: Copy number variation and functional diversification.

NK-lysin is an antimicrobial peptide and effector protein in the host innate immune system. It is encoded by a single gene in humans and most other mammalian species. In this study, we provide evidence for the existence of four NK-lysin genes in a repetitive region on cattle chromosome 11. The NK2A, NK2B, and NK2C genes are tandemly arrayed as three copies in ~30-35-kb segments, located 41.8 kb upstream of NK1. All four genes are functional, albeit with differential tissue expression. NK1, NK2A, and NK2B exhibited the highest expression in intestine Peyer’s patch, whereas NK2C was expressed almost exclusively in lung. The four peptide products were synthesized ex vivo, and their antimicrobial effects against both Gram-positive and Gram-negative bacteria were confirmed with a bacteria-killing assay. Transmission electron microscopy indicated that bovine NK-lysins exhibited their antimicrobial activities by lytic action in the cell membranes. In summary, the single NK-lysin gene in other mammals has expanded to a four-member gene family by tandem duplications in cattle; all four genes are transcribed, and the synthetic peptides corresponding to the core regions are biologically active and likely contribute to innate immunity in ruminants.


July 7, 2019

Single molecule sequencing of THCA synthase reveals copy number variation in modern drug-type Cannabis sativa L.

Cannabinoid expression is an important genetically determined feature of cannabis that presents clinical and legal implications for patients seeking cannabinoid specific therapies like Cannabidiol (CBD). Cannabinoid, terpenoid, and flavonoid marker assisted selection can accelerate breeding efforts by offering genetic tools to select for desired traits at an early stage in growth. To this end, multiple models for chemotype inheritance have been described suggesting a complex picture for chemical phenotype determination. Here we explore the potential role of copy number variation of THCA Synthase using phased single molecule sequencing and demonstrate that copy number and sequence variation of this gene is common and suggests a more nuanced view of chemotype prediction.


July 7, 2019

Lesions from patients with sporadic cerebral cavernous malformations harbor somatic mutations in the CCM genes: evidence for a common biochemical pathway for CCM pathogenesis.

Cerebral cavernous malformations (CCMs) are vascular lesions affecting the central nervous system. CCM occurs either sporadically or in an inherited, autosomal dominant manner. Constitutional (germline) mutations in any of three genes, KRIT1, CCM2 and PDCD10, can cause the inherited form. Analysis of CCM lesions from inherited cases revealed biallelic somatic mutations, indicating that CCM follows a Knudsonian two-hit mutation mechanism. It is still unknown, however, if the sporadic cases of CCM also follow this genetic mechanism. We extracted DNA from 11 surgically excised lesions from sporadic CCM patients, and sequenced the three CCM genes in each specimen using a next-generation sequencing approach. Four sporadic CCM lesion samples (36%) were found to contain novel somatic mutations. Three of the lesions contained a single somatic mutation, and one lesion contained two biallelic somatic mutations. Herein, we also describe evidence of somatic mosaicism in a patient presenting with over 130 CCM lesions localized to one hemisphere of the brain. Finally, in a lesion regrowth sample, we found that the regrown CCM lesion contained the same somatic mutation as the original lesion. Together, these data bolster the idea that all forms of CCM have a genetic underpinning of the two-hit mutation mechanism in the known CCM genes. Recent studies have found aberrant Rho kinase activation in inherited CCM pathogenesis, and we present evidence that this pathway is activated in sporadic CCM patients. These results suggest that all CCM patients, including those with the more common sporadic form, are potentially amenable to the same therapy. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.


July 7, 2019

Methylome diversification through changes in DNA methyltransferase sequence specificity.

Epigenetic modifications such as DNA methylation have large effects on gene expression and genome maintenance. Helicobacter pylori, a human gastric pathogen, has a large number of DNA methyltransferase genes, with different strains having unique repertoires. Previous genome comparisons suggested that these methyltransferases often change DNA sequence specificity through domain movement–the movement between and within genes of coding sequences of target recognition domains. Using single-molecule real-time sequencing technology, which detects N6-methyladenines and N4-methylcytosines with single-base resolution, we studied methylated DNA sites throughout the H. pylori genome for several closely related strains. Overall, the methylome was highly variable among closely related strains. Hypermethylated regions were found, for example, in rpoB gene for RNA polymerase. We identified DNA sequence motifs for methylation and then assigned each of them to a specific homology group of the target recognition domains in the specificity-determining genes for Type I and other restriction-modification systems. These results supported proposed mechanisms for sequence-specificity changes in DNA methyltransferases. Knocking out one of the Type I specificity genes led to transcriptome changes, which suggested its role in gene expression. These results are consistent with the concept of evolution driven by DNA methylation, in which changes in the methylome lead to changes in the transcriptome and potentially to changes in phenotype, providing targets for natural or artificial selection.


July 7, 2019

The Glanville fritillary genome retains an ancient karyotype and reveals selective chromosomal fusions in Lepidoptera.

Previous studies have reported that chromosome synteny in Lepidoptera has been well conserved, yet the number of haploid chromosomes varies widely from 5 to 223. Here we report the genome (393 Mb) of the Glanville fritillary butterfly (Melitaea cinxia; Nymphalidae), a widely recognized model species in metapopulation biology and eco-evolutionary research, which has the putative ancestral karyotype of n=31. Using phylogenetic analyses of Nymphalidae and of other Lepidoptera, combined with orthologue-level comparisons of chromosomes, we conclude that the ancestral lepidopteran karyotype has been n=31 for at least 140 My. We show that fusion chromosomes have retained the ancestral chromosome segments and very few rearrangements have occurred across the fusion sites. The same, shortest ancestral chromosomes have independently participated in fusion events in species with smaller karyotypes. The short chromosomes have higher rearrangement rate than long ones. These characteristics highlight distinctive features of the evolutionary dynamics of butterflies and moths.


July 7, 2019

Draft genome sequence of marine actinomycete Streptomyces sp. strain NTK 937, producer of the benzoxazole antibiotic caboxamycin.

Streptomyces sp. strain NTK 937 is the producer of the benzoxazole antibiotic caboxamycin, which has been shown to exert inhibitory activity against Gram-positive bacteria, cytotoxic activity against several human tumor cell lines, and inhibition of the enzyme phosphodiesterase. In this genome announcement, we present a draft genome sequence of Streptomyces sp. NTK 937 in which we identified at least 35 putative secondary metabolite biosynthetic gene clusters. Copyright © 2014 Olano et al.


July 7, 2019

A fault-tolerant method for HLA typing with PacBio data.

Human leukocyte antigen (HLA) genes are critical genes involved in important biomedical aspects, including organ transplantation, autoimmune diseases and infectious diseases. The gene family contains the most polymorphic genes in humans and the difference between two alleles is only a single base pair substitution in many cases. The next generation sequencing (NGS) technologies could be used for high throughput HLA typing but in silico methods are still needed to correctly assign the alleles of a sample. Computer scientists have developed such methods for various NGS platforms, such as Illumina, Roche 454 and Ion Torrent, based on the characteristics of the reads they generate. However, the method for PacBio reads was less addressed, probably owing to its high error rates. The PacBio system has the longest read length among available NGS platforms, and therefore is the only platform capable of having exon 2 and exon 3 of HLA genes on the same read to unequivocally solve the ambiguity problem caused by the “phasing” issue. We proposed a new method, BayesTyping1, to assign HLA alleles for PacBio circular consensus sequencing reads using Bayes’ theorem. The method was applied to simulated data of the three loci HLA-A, HLA-B and HLA-DRB1. The experimental results showed its capability to tolerate the disturbance of sequencing errors and external noise reads. The BayesTyping1 method could overcome the problems of HLA typing using PacBio reads, which mostly arise from sequencing errors of PacBio reads and the divergence of HLA genes, to some extent.
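As a rough illustration of how Bayes' theorem can assign an allele from error-prone reads, the sketch below scores each candidate allele by the likelihood of the observed reads and normalizes to a posterior. It is not the BayesTyping1 algorithm, whose details the abstract does not give. The uniform prior, the independent per-base substitution error model, the 2% error rate, the pre-aligned equal-length reads, the single-allele (homozygous) setting, and all names and sequences are assumptions made for the example.

```python
import math


def log_likelihood(read, allele, error_rate):
    """Log P(read | allele) under an independent per-base substitution model (assumed)."""
    if len(read) != len(allele):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    mismatches = sum(1 for r, a in zip(read, allele) if r != a)
    matches = len(read) - mismatches
    # A miscalled base is taken to be any of the three other bases with equal chance.
    return matches * math.log(1 - error_rate) + mismatches * math.log(error_rate / 3)


def allele_posteriors(reads, alleles, error_rate=0.02):
    """Posterior P(allele | reads) with a uniform prior over the candidate alleles."""
    log_post = {
        name: sum(log_likelihood(read, seq, error_rate) for read in reads)
        for name, seq in alleles.items()
    }
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    peak = max(log_post.values())
    weights = {name: math.exp(lp - peak) for name, lp in log_post.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical alleles that differ by a single base, as is common for HLA.
alleles = {"allele_1": "ACGTACGTAC", "allele_2": "ACGTACGTAT"}
# Hypothetical reads drawn from allele_2; the third carries one sequencing error.
reads = ["ACGTACGTAT", "ACGTACGTAT", "ACGTTCGTAT"]

posteriors = allele_posteriors(reads, alleles)
best = max(posteriors, key=posteriors.get)
print(best, posteriors)
```

Because the evidence from several reads multiplies, one erroneous read does not overturn the call, which is the sense in which such an approach tolerates sequencing errors.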


July 7, 2019

Dubowitz syndrome is a complex comprised of multiple, genetically distinct and phenotypically overlapping disorders.

Dubowitz syndrome is a rare disorder characterized by multiple congenital anomalies, cognitive delay, growth failure, an immune defect, and an increased risk of blood dyscrasia and malignancy. There is considerable phenotypic variability, suggesting genetic heterogeneity. We clinically characterized and performed exome sequencing and high-density array SNP genotyping on three individuals with Dubowitz syndrome, including a pair of previously-described siblings (Patients 1 and 2, brother and sister) and an unpublished patient (Patient 3). Given the siblings’ history of bone marrow abnormalities, we also evaluated telomere length and performed radiosensitivity assays. In the siblings, exome sequencing identified compound heterozygosity for a known rare nonsense substitution in the nuclear ligase gene LIG4 (rs104894419, NM_002312.3:c.2440C>T) that predicts p.Arg814X (MAF:0.0002) and an NM_002312.3:c.613delT variant that predicts a p.Ser205Leufs*29 frameshift. The frameshift mutation has not been reported in 1000 Genomes, ESP, or ClinSeq. These LIG4 mutations were previously reported in the sibling sister; her brother had not been previously tested. Western blotting showed an absence of a ligase IV band in both siblings. In the third patient, array SNP genotyping revealed a de novo ~3.89 Mb interstitial deletion at chromosome 17q24.2 (chr 17:62,068,463-65,963,102, hg18), which spanned the known Carney complex gene PRKAR1A. In all three patients, a median lymphocyte telomere length of ≤ 1st centile was observed and radiosensitivity assays showed increased sensitivity to ionizing radiation. Our work suggests that, in addition to dyskeratosis congenita, LIG4 and 17q24.2 syndromes also feature shortened telomeres; to confirm this, telomere length testing should be considered in both disorders.
Taken together, our work and other reports on Dubowitz syndrome, as currently recognized, suggest that it is not a unitary entity but instead a collection of phenotypically similar disorders. As a clinical entity, Dubowitz syndrome will need continual re-evaluation and re-definition as its constituent phenotypes are determined.


July 7, 2019

Automated ensemble assembly and validation of microbial genomes.

The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome.
Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs.
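The idea of scoring several candidate assemblies and selecting one can be sketched in a few lines. The example below ranks assemblies by N50 (the contig length at which the longest contigs cover at least half of the assembled bases), breaking ties by contig count. This is a simplified stand-in, not the iMetAMOS scoring scheme, which the abstract describes only as using multiple validation metrics. The assembler names and contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Length of the contig at which the sorted cumulative sum reaches half the total."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def rank_assemblies(assemblies):
    """Order assembly names by N50 (higher first), then by contig count (fewer first)."""
    return sorted(
        assemblies,
        key=lambda name: (-n50(assemblies[name]), len(assemblies[name])),
    )


# Hypothetical contig lengths (in kb) from three assemblers run on the same reads.
assemblies = {
    "assembler_a": [500, 300, 200],
    "assembler_b": [900, 100],
    "assembler_c": [250, 250, 250, 250],
}

ranking = rank_assemblies(assemblies)
print(ranking)
```

A contiguity metric alone can reward misassemblies that join unrelated sequence, which is one reason an ensemble pipeline would combine it with other validation signals before choosing an assembly.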


July 7, 2019

Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate.

DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives. For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads as well as suggest possible improvements. In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof of concept experiment we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired end strategy. 
Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples. Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on the false discovery rate statistics that allows a proper trade-off between sensitivity and precision to be chosen.
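A tail-area FDR cutoff of the kind described above can be sketched as follows. For a distance threshold t, the estimate is the fraction of null (unbarcoded) sequences within distance t of some barcode, divided by the fraction of observed reads within distance t. The largest t whose estimate stays below a chosen level is taken as the highest acceptable minimal distance. This is a generic empirical estimator, not the adapted method from the paper. The conservative choice pi0 = 1 and all distance values are assumptions for illustration.

```python
def tail_area_fdr(observed, null, threshold, pi0=1.0):
    """Estimate FDR among reads whose barcode distance is <= threshold.

    observed: minimal read-to-barcode distances for the sequenced reads.
    null:     the same distances for sequences known to carry no barcode
              (for example, windows sampled from the reference genome).
    pi0:      assumed proportion of unbarcoded reads; 1.0 is the conservative bound.
    """
    obs_hits = sum(d <= threshold for d in observed)
    if obs_hits == 0:
        return 0.0  # no reads accepted, so no false discoveries
    obs_frac = obs_hits / len(observed)
    null_frac = sum(d <= threshold for d in null) / len(null)
    return min(1.0, pi0 * null_frac / obs_frac)


def max_distance_cutoff(observed, null, alpha=0.05):
    """Largest observed distance whose estimated tail-area FDR is <= alpha, or None."""
    best = None
    for t in sorted(set(observed)):
        if tail_area_fdr(observed, null, t) <= alpha:
            best = t
    return best


# Hypothetical distances: most reads match a barcode closely, a minority do not.
observed = [0] * 60 + [1] * 20 + [2] * 5 + [4] * 5 + [5] * 10
# Hypothetical null: unbarcoded sequence rarely comes close to any barcode.
null = [2] * 2 + [3] * 8 + [4] * 30 + [5] * 60

cutoff = max_distance_cutoff(observed, null)
print(cutoff, tail_area_fdr(observed, null, cutoff))
```

Shorter barcodes or larger barcode sets shift the null distances toward zero, which raises the estimate at every threshold and forces a stricter cutoff, consistent with the trend reported in the abstract.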


July 7, 2019

LUMPY: a probabilistic framework for structural variant discovery.

Comprehensive discovery of structural variation (SV) from whole genome sequencing data requires multiple detection signals including read-pair, split-read, read-depth and prior knowledge. Owing to technical challenges, extant SV discovery algorithms either use one signal in isolation, or at best use two sequentially. We present LUMPY, a novel SV discovery framework that naturally integrates multiple SV signals jointly across multiple samples. We show that LUMPY yields improved sensitivity, especially when SV signal is reduced owing to either low coverage data or low intra-sample variant allele frequency. We also report a set of 4,564 validated breakpoints from the NA12878 human genome. https://github.com/arq5x/lumpy-sv.


July 7, 2019

Dissecting a hidden gene duplication: the Arabidopsis thaliana SEC10 locus.

Repetitive sequences present a challenge for genome sequence assembly, and highly similar segmental duplications may disappear from assembled genome sequences. Having found a surprising lack of observable phenotypic deviations and non-Mendelian segregation in Arabidopsis thaliana mutants in SEC10, a gene encoding a core subunit of the exocyst tethering complex, we examined whether this could be explained by a hidden gene duplication. Re-sequencing and manual assembly of the Arabidopsis thaliana SEC10 (At5g12370) locus revealed that this locus, comprising a single gene in the reference genome assembly, indeed contains two paralogous genes in tandem, SEC10a and SEC10b, and that a sequence segment of 7 kb in length is missing from the reference genome sequence. Differences between the two paralogs are concentrated in non-coding regions, while the predicted protein sequences exhibit 99% identity, differing only by substitution of five amino acid residues and an indel of four residues. Both SEC10 genes are expressed, although varying transcript levels suggest differential regulation. Homozygous T-DNA insertion mutants in either paralog exhibit a wild-type phenotype, consistent with proposed extensive functional redundancy of the two genes. By these observations we demonstrate that recently duplicated genes may remain hidden even in well-characterized genomes, such as that of A. thaliana. Moreover, we show that the use of the existing A. thaliana reference genome sequence as a guide for sequence assembly of new Arabidopsis accessions or related species has at least in some cases led to error propagation.


July 7, 2019

Integrative analysis of Salmonellosis in Israel reveals association of Salmonella enterica serovar 9,12:l,v:- with extraintestinal infections, dissemination of endemic S. enterica serovar Typhimurium DT104 biotypes, and severe underreporting of outbreaks.

Salmonella enterica is the leading etiologic agent of bacterial food-borne outbreaks worldwide. This ubiquitous species contains more than 2,600 serovars that may differ in their host specificity, clinical manifestations, and epidemiology. To characterize salmonellosis epidemiology in Israel and to study the association of nontyphoidal Salmonella (NTS) serovars with invasive infections, 48,345 Salmonella cases reported and serotyped at the National Salmonella Reference Center between 1995 and 2012 were analyzed. A quasi-Poisson regression was used to identify irregular clusters of illness, and pulsed-field gel electrophoresis in conjunction with whole-genome sequencing was applied to molecularly characterize strains of interest. Three hundred twenty-nine human salmonellosis clusters were identified, representing an annual average of 23 (95% confidence interval [CI], 20 to 26) potential outbreaks. We show that the previously unsequenced S. enterica serovar 9,12:l,v:- belongs to the B clade of Salmonella enterica subspecies enterica, and we show its frequent association with extraintestinal infections, compared to other NTS serovars. Furthermore, we identified the dissemination of two prevalent Salmonella enterica serovar Typhimurium DT104 clones in Israel, which are genetically distinct from other global DT104 isolates. Accumulatively, these findings indicate a severe underreporting of Salmonella outbreaks in Israel and provide insights into the epidemiology and genomics of prevalent serovars, responsible for recurring illness. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
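To make the cluster-detection step more concrete, the sketch below fits the simplest possible quasi-Poisson model, an intercept-only fit in which the expected count is the sample mean and the variance is the mean multiplied by a dispersion factor estimated from Pearson residuals. Periods whose count exceeds an upper bound derived from that variance are flagged. The study's actual regression presumably included covariates such as trend or seasonality that the abstract does not specify. The normal-approximation threshold, the z value, and the weekly counts are all assumptions.

```python
import math


def quasi_poisson_flags(counts, z=1.96):
    """Flag periods whose count exceeds mean + z * sqrt(dispersion * mean).

    Intercept-only quasi-Poisson fit: the mean is the sample mean, and the
    dispersion is the Pearson chi-square statistic divided by (n - 1).
    """
    n = len(counts)
    mu = sum(counts) / n
    dispersion = sum((y - mu) ** 2 / mu for y in counts) / (n - 1)
    upper = mu + z * math.sqrt(dispersion * mu)
    flagged = [i for i, y in enumerate(counts) if y > upper]
    return flagged, dispersion, upper


# Hypothetical weekly case counts for one serovar; week 7 holds a spike.
weekly_counts = [4, 5, 3, 6, 4, 5, 4, 30, 5, 4]

flagged, dispersion, upper = quasi_poisson_flags(weekly_counts)
print(flagged, round(dispersion, 2), round(upper, 2))
```

In this toy fit the spike itself inflates both the mean and the dispersion, so a practical analysis would estimate the baseline from historical periods that exclude the weeks being tested.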


July 7, 2019

Genome analysis of a major urban malaria vector mosquito, Anopheles stephensi.

Background: Anopheles stephensi is the key vector of malaria throughout the Indian subcontinent and Middle East and an emerging model for molecular and genetic studies of mosquito-parasite interactions. The type form of the species is responsible for the majority of urban malaria transmission across its range. Results: Here, we report the genome sequence and annotation of the Indian strain of the type form of An. stephensi. The 221 Mb genome assembly represents more than 92% of the entire genome and was produced using a combination of 454, Illumina, and PacBio sequencing. Physical mapping assigned 62% of the genome onto chromosomes, enabling chromosome-based analysis. Comparisons between An. stephensi and An. gambiae reveal that the rate of gene order reshuffling on the X chromosome was three times higher than that on the autosomes. An. stephensi has more heterochromatin in pericentric regions but less repetitive DNA in chromosome arms than An. gambiae. We also identify a number of Y-chromosome contigs and BACs. Interspersed repeats constitute 7.1% of the assembled genome while LTR retrotransposons alone comprise more than 49% of the Y contigs. RNA-seq analyses provide new insights into mosquito innate immunity, development, and sexual dimorphism. Conclusions: The genome analysis described in this manuscript provides a resource and platform for fundamental and translational research into a major urban malaria vector. Chromosome-based investigations provide unique perspectives on Anopheles chromosome evolution. RNA-seq analysis and studies of immunity genes offer new insights into mosquito biology and mosquito-parasite interactions.

