July 7, 2019

HapCol: accurate and memory-efficient haplotype assembly from long reads.

Haplotype assembly is the computational problem of reconstructing haplotypes in diploid organisms and is of fundamental importance for characterizing the effects of single-nucleotide polymorphisms on the expression of phenotypic traits. Haplotype assembly benefits greatly from the advent of ‘future-generation’ sequencing technologies and their capability to produce long reads at increasing coverage. Existing methods are not able to deal with such data in a fully satisfactory way, either because accuracy or performance degrades as read length and sequencing coverage increase or because they are based on restrictive assumptions. By exploiting a feature of future-generation technologies (the uniform distribution of sequencing errors), we designed an exact algorithm, called HapCol, that is exponential in the maximum number of corrections for each single-nucleotide polymorphism position and that minimizes the overall error-correction score. We performed an experimental analysis, comparing HapCol with the current state-of-the-art combinatorial methods both on real and simulated data. On a standard benchmark of real data, we show that HapCol is competitive with state-of-the-art methods, improving the accuracy and the number of phased positions. Furthermore, experiments on realistically simulated datasets revealed that HapCol requires significantly less computing resources, especially memory. Thanks to its computational efficiency, HapCol can overcome the limits of previous approaches, making it possible to phase datasets with higher coverage and without the traditional all-heterozygous assumption. Our source code is available under the terms of the GNU General Public License at http://hapcol.algolab.eu/. Contact: bonizzoni@disco.unimib.it. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
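
The "error-correction score" that the abstract refers to can be illustrated on a toy example. The sketch below is not HapCol's algorithm: it is a brute-force enumeration, exponential in the number of reads, written only to show what the objective measures. It assumes reads are encoded as equal-length strings over `0`, `1` and `-` (position not covered), and the function names `column_cost` and `mec_brute_force` are hypothetical. Each haplotype is free to take its own majority allele at every position, so the sketch does not impose the all-heterozygous assumption.

```python
from itertools import product


def column_cost(alleles):
    """Fewest corrections needed so that all covered entries in one
    group of reads agree at a single SNP position (gaps are ignored)."""
    zeros = alleles.count("0")
    ones = alleles.count("1")
    return min(zeros, ones)


def mec_brute_force(reads):
    """Try every split of the reads into two haplotype groups and
    return (lowest total number of corrections, group of each read).

    Illustrative only: the search space doubles with every read.
    """
    n = len(reads)
    m = len(reads[0])
    best = None
    # Read 0 is pinned to group 0 because swapping the two group
    # labels gives an equivalent solution.
    for rest in product((0, 1), repeat=n - 1):
        assignment = (0,) + rest
        cost = 0
        for j in range(m):
            for group in (0, 1):
                column = [reads[i][j] for i in range(n) if assignment[i] == group]
                cost += column_cost(column)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


if __name__ == "__main__":
    toy_reads = ["0011", "0010", "1100", "1100", "00-1"]
    cost, groups = mec_brute_force(toy_reads)
    print(cost, groups)
```

In the toy input, the first two reads disagree at the last position and neither is compatible with the third read, so at least one correction is unavoidable. An exact method such as the one described above reaches the same optimum without enumerating every split.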


July 7, 2019

Identification and resolution of microdiversity through metagenomic sequencing of parallel consortia.

To gain a predictive understanding of the interspecies interactions within microbial communities that govern community function, the genomic complement of every member population must be determined. Although metagenomic sequencing has enabled the de novo reconstruction of some microbial genomes from environmental communities, microdiversity confounds current genome reconstruction techniques. To overcome this issue, we performed short-read metagenomic sequencing on parallel consortia, defined as consortia cultivated under the same conditions from the same natural community with overlapping species composition. The differences in species abundance between the two consortia allowed reconstruction of near-complete (at an estimated >85% of gene complement) genome sequences for 17 of the 20 detected member species. Two Halomonas spp. indistinguishable by amplicon analysis were found to be present within the community. In addition, comparison of metagenomic reads against the consensus scaffolds revealed within-species variation for one of the Halomonas populations, one of the Rhodobacteraceae populations, and the Rhizobiales population. Genomic comparison of these representative instances of inter- and intraspecies microdiversity suggests differences in functional potential that may result in the expression of distinct roles in the community. In addition, isolation and complete genome sequence determination of six member species allowed an investigation into the sensitivity and specificity of genome reconstruction processes, demonstrating robustness across a wide range of sequence coverage (9× to 2,700×) within the metagenomic data set. Copyright © 2015, American Society for Microbiology. All Rights Reserved.


July 7, 2019

Genomic resources and their influence on the detection of the signal of positive selection in genome scans.

Genome scans represent powerful approaches to investigate the action of natural selection on the genetic variation of natural populations and to better understand local adaptation. This is very useful, for example, in the field of conservation biology and evolutionary biology. Thanks to Next Generation Sequencing, genomic resources are growing exponentially, improving genome scan analyses in non-model species. Thousands of SNPs called using Reduced Representation Sequencing are increasingly used in genome scans. Besides, genome sequences are also becoming increasingly available, allowing better processing of short-read data, offering physical localization of variants, and improving haplotype reconstruction and data imputation. Ultimately, genome sequences are also becoming the raw material for selection inferences. Here, we discuss how the increasing availability of such genomic resources, notably genome sequences, influences the detection of signals of selection. Mainly, increasing data density and having the information of physical linkage data expand genome scans by (i) improving the overall quality of the data, (ii) helping the reconstruction of demographic history for the population studied to decrease false-positive rates and (iii) improving the statistical power of methods to detect the signal of selection. Of particular importance, the availability of a high-quality reference genome can improve the detection of the signal of selection by (i) allowing matching the potential candidate loci to linked coding regions under selection, (ii) rapidly moving the investigation to the gene and function and (iii) ensuring that the highly variable regions of the genomes that include functional genes are also investigated. For all those reasons, using reference genomes in genome scan analyses is highly recommended. © 2015 John Wiley & Sons Ltd.


July 7, 2019

Timing, rates and spectra of human germline mutation.

Germline mutations are a driving force behind genome evolution and genetic disease. We investigated genome-wide mutation rates and spectra in multi-sibling families. The mutation rate increased with paternal age in all families, but the number of additional mutations per year differed by more than twofold between families. Meta-analysis of 6,570 mutations showed that germline methylation influences mutation rates. In contrast to somatic mutations, we found remarkable consistency in germline mutation spectra between the sexes and at different paternal ages. In parental germ line, 3.8% of mutations were mosaic, resulting in 1.3% of mutations being shared by siblings. The number of these shared mutations varied significantly between families. Our data suggest that the mutation rate per cell division is higher during both early embryogenesis and differentiation of primordial germ cells but is reduced substantially during post-pubertal spermatogenesis. These findings have important consequences for the recurrence risks of disorders caused by de novo mutations.


July 7, 2019

OxyR-dependent formation of DNA methylation patterns in OpvAB(OFF) and OpvAB(ON) cell lineages of Salmonella enterica.

Phase variation of the Salmonella enterica opvAB operon generates a bacterial lineage with standard lipopolysaccharide structure (OpvAB(OFF)) and a lineage with shorter O-antigen chains (OpvAB(ON)). Regulation of OpvAB lineage formation is transcriptional, and is controlled by the LysR-type factor OxyR and by DNA adenine methylation. The opvAB regulatory region contains four sites for OxyR binding (OBSA-D), and four methylatable GATC motifs (GATC1-4). OpvAB(OFF) and OpvAB(ON) cell lineages display opposite DNA methylation patterns in the opvAB regulatory region: (i) in the OpvAB(OFF) state, GATC1 and GATC3 are non-methylated, whereas GATC2 and GATC4 are methylated; (ii) in the OpvAB(ON) state, GATC2 and GATC4 are non-methylated, whereas GATC1 and GATC3 are methylated. We provide evidence that such DNA methylation patterns are generated by OxyR binding. The higher stability of the OpvAB(OFF) lineage may be caused by binding of OxyR to sites that are identical to the consensus (OBSA and OBSC), while the sites bound by OxyR in OpvAB(ON) cells (OBSB and OBSD) are not. In support of this view, amelioration of either OBSB or OBSD locks the system in the ON state. We also show that the GATC-binding protein SeqA and the nucleoid protein HU are ancillary factors in opvAB control. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.


July 7, 2019

Long read and single molecule DNA sequencing simplifies genome assembly and TAL effector gene analysis of Xanthomonas translucens.

The species Xanthomonas translucens encompasses a complex of bacterial strains that cause diseases and yield loss on grass species including important cereal crops. Three pathovars, X. translucens pv. undulosa, X. translucens pv. translucens and X. translucens pv. cerealis, have been described as pathogens of wheat, barley, and oats. However, no complete genome sequence for a strain of this complex is currently available. A complete genome sequence of X. translucens pv. undulosa strain XT4699 was obtained by using PacBio long read, single molecule, real time (SMRT) DNA sequences and Illumina sequences. Draft genome sequences of nineteen additional X. translucens strains, which were collected from wheat or barley in different regions and at different times, were generated by Illumina sequencing. Phylogenetic relationships among different Xanthomonas strains indicate that X. translucens strains are members of a clade distinct from the so-called group 2 xanthomonads and that three pathovars of this species, undulosa, translucens and cerealis, represent distinct subclades in the group 1 clade. Knockout mutation of the type III secretion system of XT4699 eliminated the ability to cause water-soaking symptoms on wheat and barley and resulted in a reduction in populations on wheat in comparison to the wild type strain. Sequence comparison of X. translucens strains revealed genetic variation in type III effector repertoires among different pathovars or within one pathovar. The full genome sequence of XT4699 reveals the presence of eight members of the Transcription-Activator Like (TAL) effector genes, which are phylogenetically distant from previously known TAL effector genes of group 2 xanthomonads. Microarray and qRT-PCR analyses revealed TAL effector-specific wheat gene expression modulation. PacBio long read sequencing facilitates the assembly of Xanthomonas genomes and the multiple TAL effector genes, which are difficult to assemble from short read platforms.
The complete genome sequence of X. translucens pv. undulosa strain XT4699 and draft genome sequences of nineteen additional X. translucens strains provide a resource for further genetic analyses of pathogenic diversity and host range of the X. translucens species complex. TAL effectors of the XT4699 strain play roles in modulating wheat host gene expression.


July 7, 2019

Complete sequences of multidrug resistance plasmids bearing rmtD1 and rmtD2 16S ribosomal RNA methyltransferase genes.

Complete nucleotide sequences were determined for two plasmids bearing rmtD group 16S rRNA methyltransferase genes. pKp64/11 was 78 kb in size, belonged to the IncL/M group, and harbored blaTEM-1b, sul1, qacEΔ1, dfrA22, and rmtD1 across two multidrug resistance regions (MRRs). pKp368/10 was 170 kb in size, belonged to the IncA/C group, and harbored acrB, sul1, qacEΔ1, ant(3″)-Ia, aac(6′)-Ib, cat, rmtD2, and blaCTX-M-8 across three MRRs. The rmtD-containing regions shared a conserved motif, suggesting a common origin for the two rmtD alleles. Copyright © 2016, American Society for Microbiology. All Rights Reserved.


July 7, 2019

In vitro selection of miltefosine resistance in promastigotes of Leishmania donovani from Nepal: genomic and metabolomic characterization.

In this study, we followed the genomic, lipidomic and metabolomic changes associated with the selection of miltefosine (MIL) resistance in two clinically derived Leishmania donovani strains with different inherent resistance to antimonial drugs (antimony sensitive strain Sb-S; and antimony resistant Sb-R). MIL-R was easily induced in both strains using the promastigote-stage, but a significant increase in MIL-R in the intracellular amastigote compared to the corresponding wild-type did not occur until promastigotes had adapted to 12.2 µM MIL. A variety of common and strain-specific genetic changes were discovered in MIL-adapted parasites, including deletions at the LdMT transporter gene, single-base mutations and changes in somy. The most obvious lipid changes in MIL-R promastigotes occurred to phosphatidylcholines and lysophosphatidylcholines and results indicate that the Kennedy pathway is involved in MIL resistance. The inherent Sb resistance of the parasite had an impact on the changes that occurred in MIL-R parasites, with more genetic changes occurring in Sb-R compared with Sb-S parasites. Initial interpretation of the changes identified in this study does not support synergies with Sb-R in the mechanisms of MIL resistance, though this requires an enhanced understanding of the parasite’s biochemical pathways and how they are genetically regulated to be verified fully. © 2015 The Authors. Molecular Microbiology published by John Wiley & Sons Ltd.


July 7, 2019

The Vigna Genome Server, ‘VigGS’: A genomic knowledge base of the genus Vigna based on high-quality, annotated genome sequence of the azuki bean, Vigna angularis (Willd.) Ohwi & Ohashi.

The genus Vigna includes legume crops such as cowpea, mungbean and azuki bean, as well as >100 wild species. A number of the wild species are highly tolerant to severe environmental conditions including high-salinity, acid or alkaline soil; drought; flooding; and pests and diseases. These features of the genus Vigna make it a good target for investigation of genetic diversity in adaptation to stressful environments; however, a lack of genomic information has hindered such research in this genus. Here, we present a genome database of the genus Vigna, Vigna Genome Server (‘VigGS’, http://viggs.dna.affrc.go.jp), based on the recently sequenced azuki bean genome, which incorporates annotated exon-intron structures, along with evidence for transcripts and proteins, visualized in GBrowse. VigGS also facilitates user construction of multiple alignments between azuki bean genes and those of six related dicot species. In addition, the database displays sequence polymorphisms between azuki bean and its wild relatives and enables users to design primer sequences targeting any variant site. VigGS offers a simple keyword search in addition to sequence similarity searches using BLAST and BLAT. To incorporate up-to-date genomic information, VigGS automatically receives newly deposited mRNA sequences of pre-set species from the public database once a week. Users can consult not only the gene structures mapped on the azuki bean genome in GBrowse but also the relevant literature on those genes. VigGS will contribute to genomic research into plant biotic and abiotic stresses and to the future development of new stress-tolerant crops. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.


July 7, 2019

Refinement of the canine CD1 locus topology and investigation of antibody binding to recombinant canine CD1 isoforms.

CD1 molecules are antigen-presenting glycoproteins primarily found on dendritic cells (DCs) responsible for lipid antigen presentation to CD1-restricted T cells. Despite their pivotal role in immunity, little is known about CD1 protein expression in dogs, notably due to lack of isoform-specific antibodies. The canine (Canis familiaris) CD1 locus was previously found to contain three functional CD1A genes: canCD1A2, canCD1A6, and canCD1A8, where two variants of canCD1A8, canCD1A8.1 and canCD1A8.2, were assumed to be allelic variants. However, we hypothesized that these rather represented two separate genes. Sequencing of three overlapping bacterial artificial chromosomes (BACs) spanning the entire canine CD1 locus revealed canCD1A8.2 and canCD1A8.1 to be located in tandem between canCD1A7 and canCD1C, and canCD1A8.1 was consequently renamed canCD1A9. Green fluorescent protein (GFP)-fused canine CD1 transcripts were recombinantly expressed in 293T cells. All proteins showed a highly positive GFP expression except for canine CD1d and a splice variant of canine CD1a8 lacking exon 3. Probing with a panel of anti-CD1 monoclonal antibodies (mAbs) showed that Ca13.9H11 and Ca9.AG5 only recognized canine CD1a8 and CD1a9 isoforms, and Fe1.5F4 mAb solely recognized canine CD1a6. Anti-CD1b mAbs recognized the canine CD1b protein, but also bound CD1a2, CD1a8, and CD1a9. Interestingly, Ca9.AG5 showed allele specificity based on a single nucleotide polymorphism (SNP) located at position 321. Our findings have refined the structure of the canine CD1 locus and available antibody specificity against canine CD1 proteins. These are important fundamentals for future investigation of the role of canine CD1 in lipid immunity.


July 7, 2019

Clonal Complex 17 group B Streptococcus strains causing invasive disease in neonates and adults originate from the same genetic pool.

A significant proportion of group B Streptococcus (GBS) neonatal disease, particularly late-onset disease, is associated with strains of serotype III, clonal complex (CC) 17. CC17 strains also cause invasive infections in adults. Little is known about the phylogenetic relationships of isolates recovered from neonatal and adult CC17 invasive infections. We performed whole-genome-based phylogenetic analysis of 93 temporally and geographically matched CC17 strains isolated from both neonatal and adult invasive infections in the metropolitan region of Toronto/Peel, Canada. We also mined the whole-genome data to reveal mobile genetic elements carrying antimicrobial resistance genes. We discovered that CC17 GBS strains causing neonatal and adult invasive disease are interspersed and cluster tightly in a phylogenetic tree, signifying that they are derived from the same genetic pool. We identified limited variation due to recombination in the core CC17 genome. We describe that loss of Pilus Island 1 and acquisition of different mobile genetic elements carrying determinants of antimicrobial resistance contribute to CC17 genetic diversity. Acquisition of some of these mobile genetic elements appears to correlate with clonal expansion of the strains that possess them. Our results provide a genome-wide portrait of the population structure and evolution of a major disease-causing clone of an opportunistic pathogen.


July 7, 2019

Read-based phasing of related individuals.

Read-based phasing deduces the haplotypes of an individual from sequencing reads that cover multiple variants, while genetic phasing takes only genotypes as input and applies the rules of Mendelian inheritance to infer haplotypes within a pedigree of individuals. Combining both into an approach that uses these two independent sources of information (reads and pedigree) has the potential to deliver results better than each individually. We provide a theoretical framework combining read-based phasing with genetic haplotyping, and describe a fixed-parameter algorithm and its implementation for finding an optimal solution. We show that leveraging reads of related individuals jointly in this way yields more phased variants and at a higher accuracy than when phased separately, both in simulated and real data. Coverages as low as 2× for each member of a trio yield haplotypes that are as accurate as when analyzed separately at 15× coverage per individual. Availability: https://bitbucket.org/whatshap/whatshap. Contact: t.marschall@mpi-inf.mpg.de. © The Author 2016. Published by Oxford University Press.
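
To make the "rules of Mendelian inheritance" side of this combination concrete, here is a minimal sketch of genetic phasing at a single biallelic site in a mother-father-child trio. It is not the fixed-parameter algorithm described above, and the names `transmittable` and `phase_child_by_pedigree` are hypothetical. Genotypes are assumed to be encoded as the count of alternative alleles (0, 1 or 2).

```python
def transmittable(genotype):
    """Alleles (0 = reference, 1 = alternative) that a parent with the
    given alternative-allele count could pass on to a child."""
    return {0: (0,), 1: (0, 1), 2: (1,)}[genotype]


def phase_child_by_pedigree(mother, father, child):
    """Return the child's (maternal allele, paternal allele) when the
    trio genotypes allow exactly one assignment, otherwise None.

    None covers two cases: the site is ambiguous under Mendelian rules,
    or the genotypes are inconsistent with Mendelian inheritance.
    """
    options = {
        (m, f)
        for m in transmittable(mother)
        for f in transmittable(father)
        if m + f == child
    }
    return options.pop() if len(options) == 1 else None


if __name__ == "__main__":
    print(phase_child_by_pedigree(0, 1, 1))  # alternative allele must be paternal
    print(phase_child_by_pedigree(1, 1, 1))  # all heterozygous: not resolvable
```

Sites where all three individuals are heterozygous cannot be resolved from genotypes alone, which is one situation where reads spanning neighbouring variants add independent information.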


July 7, 2019

Multiple and diverse vsp and vlp sequences in Borrelia miyamotoi, a hard tick-borne zoonotic pathogen.

Based on chromosome sequences, the human pathogen Borrelia miyamotoi phylogenetically clusters with species that cause relapsing fever. But atypically for relapsing fever agents, B. miyamotoi is transmitted not by soft ticks but by hard ticks, which also are vectors of Lyme disease Borrelia species. To further assess the relationships of B. miyamotoi to species that cause relapsing fever, I investigated extrachromosomal sequences of a North American strain with specific attention to plasmid-borne vsp and vlp genes, which are the underpinnings of antigenic variation during relapsing fever. For a hybrid approach to achieve assemblies that spanned more than one of the paralogous vsp and vlp genes, a database of short-reads from next-generation sequencing was supplemented with long-reads obtained with real-time DNA sequencing from single polymerase molecules. This yielded three contigs of 31, 16, and 11 kb, which each contained multiple and diverse sequences that were homologous to vsp and vlp genes of the relapsing fever agent B. hermsii. Two plasmid fragments had coding sequences for plasmid partition proteins that differed from each other and from paralogous proteins for the megaplasmid and a small plasmid of B. miyamotoi. One of 4 vsp genes, vsp1, was present at two loci, one of which was downstream of a candidate prokaryotic promoter. A limited RNA-seq analysis of a population growing in the blood of mice indicated that of the 4 different vsp genes vsp1 was the one that was expressed. The findings indicate that B. miyamotoi has at least four types of plasmids, two or more of which bear vsp and vlp gene sequences that are as numerous and diverse as those of relapsing fever Borrelia. The database and insights from these findings provide a foundation for further investigations of the immune responses to this pathogen and of the capability of B. miyamotoi for antigenic variation.

