Phasing Archives

July 7, 2019

Genomic variation and evolution of Vibrio parahaemolyticus ST36 over the course of a transcontinental epidemic expansion.

Vibrio parahaemolyticus is the leading cause of seafood-related infections, and its illnesses are undergoing a geographic expansion. In this process of expansion, the most fundamental change has been the transition from infections caused by local strains to the surge of pandemic clonal types. Pandemic clone sequence type 3 (ST3) was the only example of transcontinental spread until 2012, when ST36 was detected outside its endemic region in the U.S. Pacific Northwest, causing infections along the U.S. northeast coast and in Spain. Here, we used genome-wide analyses to reconstruct the evolutionary history of the V. parahaemolyticus ST36 clone over the course of its geographic expansion during the previous 25 years. The origin of this lineage was estimated to be around 1985. By 1995, a new variant had emerged in the region and quickly replaced the old clone, which has not been detected since 2000. The new Pacific Northwest (PNW) lineage was responsible for the first cases associated with this clone outside the Pacific Northwest region. After several introductions into the northeast coast, the new PNW clone differentiated into a highly dynamic group that continues to cause illness on the northeast coast of the United States. Surprisingly, the strains detected in Europe in 2012 diverged from this ancestral group around 2000 and have conserved genetic features present only in the old PNW lineage. Recombination was identified as the major driver of diversification, with some preliminary observations suggesting a trend toward a more specialized lifestyle, which may represent a critical element in the expansion of epidemics under scenarios of coastal warming.

IMPORTANCE Vibrio parahaemolyticus and Vibrio cholerae represent the only two instances of pandemic expansions of human pathogens originating in the marine environment. However, while the current pandemic of V. cholerae emerged more than 50 years ago, the global expansion of V. parahaemolyticus is a recent phenomenon. These modern expansions provide an exceptional opportunity to study the evolution of these pathogens firsthand and to understand the mechanisms shaping the epidemic dynamics of these diseases, in particular the emergence, dispersal, and successful introduction into new regions that facilitate the global spread of infections. In this study, we used genomic analysis to examine the evolutionary divergence that has occurred over the course of the most recent transcontinental expansion of a pathogenic Vibrio: the spread of the V. parahaemolyticus sequence type 36 clone from its endemic region on the Pacific coast of North America to the east coast of the United States and finally to the west coast of Europe.

July 7, 2019

Detection of complex structural variation from paired-end sequencing data

Detecting structural variants (SVs) from sequencing data is a key problem in genome analysis, but the full diversity of SVs is not captured by most methods. We introduce the Automated Reconstruction of Complex Structural Variants (ARC-SV) method, which detects a broad class of structural variants from paired-end whole genome sequencing (WGS) data. Analysis of samples from NA12878 and HuRef suggests that complex SVs are often misclassified by traditional methods. We validated our results both experimentally and by comparison to whole genome assembly and PacBio data; ARC-SV compares favorably to existing algorithms in general and gives state-of-the-art results on complex SV detection. By expanding the range of detectable SVs compared to commonly used algorithms, ARC-SV allows additional information to be extracted from existing WGS data.
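The abstract does not describe ARC-SV's internals. For readers new to paired-end SV calling, the sketch below shows the generic discordant read-pair signals that read-pair callers build on. It is not ARC-SV's algorithm, which goes further by reconstructing complex rearrangements. It assumes a forward-reverse (Illumina-style) library with a roughly normal insert-size distribution; the `ReadPair` type, the 3-SD cutoff and the label names are hypothetical, and positions are treated as leftmost alignment coordinates, so the insert estimate ignores read length.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal record for one aligned read pair."""
    chrom1: str
    pos1: int      # leftmost alignment position of read 1
    strand1: str   # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair, mean_insert, sd_insert, k=3.0):
    """Label a read pair by the SV signal it carries (forward-reverse library)."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"  # translocation-like signal
    # Order the mates by position so 'left' is the upstream read.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "same_strand"  # inversion-like signal
    if left_strand == "-" and right_strand == "+":
        return "everted"  # tandem-duplication-like signal
    span = right_pos - left_pos  # crude insert-size estimate
    if span > mean_insert + k * sd_insert:
        return "long_insert"  # deletion-like signal
    if span < mean_insert - k * sd_insert:
        return "short_insert"  # insertion-like signal
    return "concordant"


# Hypothetical pair spanning 3 kb in a library with ~400 bp inserts.
print(classify_pair(ReadPair("chr1", 10_000, "+", "chr1", 13_000, "-"),
                    mean_insert=400, sd_insert=50))
```

A single discordant pair is weak evidence; callers cluster many pairs with consistent signals, and complex SVs can produce combinations of these labels that simple callers misclassify, which is the gap the abstract describes.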

July 7, 2019

Methylomic and phenotypic analysis of the ModH5 phasevarion of Helicobacter pylori.

The Helicobacter pylori phase-variable gene modH, typified by gene HP1522 in strain 26695, encodes an N6-adenosine type III DNA methyltransferase. Our previous studies identified multiple strain-specific modH variants (modH1 – modH19) and showed that phase variation of modH5 in H. pylori P12 influenced expression of motility-associated genes and the outer membrane protein gene hopG. However, the ModH5 DNA recognition motif and the mechanism by which ModH5 controls gene expression were unknown. Here, using comparative single molecule real-time sequencing, we identify the DNA site methylated by ModH5 as 5′-Gm6ACC-3′. This motif is vastly underrepresented in H. pylori genomes but overrepresented in a number of virulence genes, including motility-associated genes and outer membrane protein genes. Motility and the number of flagella of H. pylori P12 wild type were significantly higher than those of isogenic modH5 OFF or ΔmodH5 mutants, indicating that phase-variable switching of modH5 expression plays a role in regulating H. pylori motility phenotypes. Using the flagellin A (flaA) gene as a model, we show that ModH5 modulates flaA promoter activity in a GACC methylation-dependent manner. These findings provide novel insights into the role of ModH5 in gene regulation and how it mediates epigenetic regulation of H. pylori motility.
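Statements such as "vastly underrepresented" are usually backed by an observed/expected count ratio. The abstract does not say which background model the authors used; the sketch below is one simple version, assuming independent bases at the sequence's own composition (a zero-order model). Because GACC is not palindromic, it also counts its reverse complement GGTC so both strands are covered. The helper names are hypothetical.

```python
from collections import Counter


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_motif(seq, motif):
    """Count (possibly overlapping) exact occurrences of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def motif_obs_exp(seq, motif="GACC"):
    """Observed vs. expected motif count on both strands.

    Expectation assumes independent bases at the sequence's own
    base composition (a zero-order background model).
    Returns (observed, expected, observed / expected).
    """
    seq = seq.upper()
    counts = Counter(seq)
    n = sum(counts[b] for b in "ACGT")
    freq = {b: counts[b] / n for b in "ACGT"}
    windows = len(seq) - len(motif) + 1
    observed, expected = 0, 0.0
    # Searching the motif and its reverse complement on one strand
    # covers both strands; the set avoids double counting palindromes.
    for m in {motif, revcomp(motif)}:
        observed += count_motif(seq, m)
        p = 1.0
        for base in m:
            p *= freq[base]
        expected += windows * p
    ratio = observed / expected if expected else float("nan")
    return observed, expected, ratio
```

A ratio well below 1 over a whole genome, and above 1 in particular genes, is the kind of contrast the abstract reports. Higher-order Markov backgrounds, which account for dinucleotide and trinucleotide biases, give more conservative expectations than this zero-order model.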

July 7, 2019

Ultraaccurate genome sequencing and haplotyping of single human cells.

Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10⁻⁸ and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with an N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs. Copyright © 2017 the Author(s). Published by PNAS.
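The error suppression described above comes from requiring independent copies of the same locus to agree. A minimal sketch of that idea, assuming base calls from each compartment have already been aligned and complemented onto one strand, is a strict-agreement caller that emits a base only when at least `min_calls` independent calls agree and none disagree. The function names and threshold are hypothetical, and SISSOR's actual pipeline (including haplotype-based error removal) is more involved. The intuition: if each independent copy carries an error with probability e, spread evenly over the three wrong bases, two copies agree on the same wrong base with probability e²/3.

```python
def strand_consensus(calls, min_calls=2):
    """Strict-agreement consensus over independent base calls.

    calls: base calls at one position from independent compartments,
           already complemented onto the same strand; 'N' = no call.
    Returns the agreed base, or None if support is too low or calls conflict.
    """
    valid = [c.upper() for c in calls if c.upper() in {"A", "C", "G", "T"}]
    if len(valid) >= min_calls and len(set(valid)) == 1:
        return valid[0]
    return None


def concordant_error(e, copies=2):
    """Chance that `copies` independent calls all show the same wrong base,
    assuming per-copy error e spread evenly over the 3 wrong bases."""
    return 3 * (e / 3) ** copies
```

Under these independence assumptions, a per-copy error of 1e-4 drops to about 3.3e-9 for two agreeing copies. Real amplification errors are not fully independent, which is one reason the abstract's measured error rate is the meaningful figure.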

July 7, 2019

Genome sequence-based marker development and genotyping in potato

Potato (Solanum tuberosum L.) is one of the world’s most economically important food crops and holds major significance for future food security. Despite its importance, the study of potato genetics and breeding has lagged behind, mainly due to its polyploid genome and high levels of heterozygosity. Conventional marker and genotyping approaches have been helpful in progressing potato genetic research but have also had limitations in exploiting the outcome from these studies for gene discovery and applied research. The sequencing of the potato genome, followed by advancements in marker and genotyping technologies, has brought a step change in the way potato genetic studies are conducted. Potato is now amenable to modern sequence-based marker and genotyping methods, with their increased ability to put thousands of markers on any population of interest without a priori knowledge. This has increased the precision and resolution of genetic studies to levels previously not feasible in potato. A diverse range of fixed and flexible genotyping platforms, for a wide variety of research and breeding applications, is now available. Concerted research efforts are now needed to screen the available genetic diversity of this important crop and identify novel and beneficial trait alleles, in order to enable efficient and precise introgression breeding of climate-smart, resilient potato cultivars. This chapter provides an overview of sequence-based marker development and genotyping methods along with their implications for potato research and breeding in the post-genomics era.

July 7, 2019

Institutional profile: translational pharmacogenomics at the Icahn School of Medicine at Mount Sinai.

For almost 50 years, the Icahn School of Medicine at Mount Sinai has continually invested in genetics and genomics, facilitating a healthy ecosystem that provides widespread support for the ongoing programs in translational pharmacogenomics. These programs can be broadly cataloged into discovery, education, clinical implementation and testing, which are collaboratively accomplished by multiple departments, institutes, laboratories, companies and colleagues. Focus areas have included drug response association studies and allele discovery, multiethnic pharmacogenomics, personalized genotyping and survey-based education programs, pre-emptive clinical testing implementation and novel assay development. This overview summarizes the current state of translational pharmacogenomics at Mount Sinai, including a future outlook on the forthcoming expansions in overall support, research and clinical programs, genomic technology infrastructure and the participating faculty.

July 7, 2019

Estimating fitness of viral quasispecies from next-generation sequencing data.

The quasispecies model is ubiquitous in the study of viruses. While it has led to a number of insights that have stood the test of time, the quasispecies model has mostly been discussed in a theoretical fashion with little support from data. With next-generation sequencing (NGS), this situation is changing, and a wealth of data can now be produced in a time- and cost-efficient manner. After removal of technical errors, NGS can yield an exceedingly detailed picture of the viral population structure. The widespread availability of cross-sectional data can be used to study the fitness landscapes of viral populations in the quasispecies model. This chapter highlights methods that estimate the strength of selection in selective sweeps, assess the marginal fitness effects of quasispecies, and finally infer the fitness landscape of a viral quasispecies, all on the basis of NGS data.

July 7, 2019

HapCol: accurate and memory-efficient haplotype assembly from long reads.

Haplotype assembly is the computational problem of reconstructing haplotypes in diploid organisms and is of fundamental importance for characterizing the effects of single-nucleotide polymorphisms on the expression of phenotypic traits. Haplotype assembly benefits greatly from the advent of ‘future-generation’ sequencing technologies and their capability to produce long reads at increasing coverage. Existing methods are not able to deal with such data in a fully satisfactory way, either because accuracy or performance degrades as read length and sequencing coverage increase or because they are based on restrictive assumptions. By exploiting a feature of future-generation technologies, the uniform distribution of sequencing errors, we designed an exact algorithm, called HapCol, that is exponential in the maximum number of corrections for each single-nucleotide polymorphism position and that minimizes the overall error-correction score. We performed an experimental analysis comparing HapCol with the current state-of-the-art combinatorial methods on both real and simulated data. On a standard benchmark of real data, we show that HapCol is competitive with state-of-the-art methods, improving the accuracy and the number of phased positions. Furthermore, experiments on realistically simulated datasets revealed that HapCol requires significantly fewer computing resources, especially memory. Thanks to its computational efficiency, HapCol can overcome the limits of previous approaches, allowing datasets with higher coverage to be phased without the traditional all-heterozygous assumption. Our source code is available under the terms of the GNU General Public License at http://hapcol.algolab.eu/. Contact: bonizzoni@disco.unimib.it. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
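HapCol's exact algorithm is not reproduced here. To make the objective concrete, the sketch below scores a candidate haplotype under minimum error correction (MEC): each read (fragment) is assigned to whichever of the two complementary haplotypes it matches better, and its mismatches count as corrections. It then minimizes the score by brute force over all haplotypes, which is exponential in the number of SNPs; HapCol instead bounds the number of corrections per SNP position. For simplicity this sketch makes the traditional all-heterozygous assumption (the second haplotype is the complement of the first). Fragments are hypothetical dicts mapping SNP index to allele 0/1, and the function names are illustrative.

```python
from itertools import product


def mec_score(fragments, hap):
    """Minimum-error-correction score of a haplotype for a set of reads.

    fragments: list of dicts {snp_index: allele (0/1)}, one per read.
    hap: sequence of 0/1 alleles; the second haplotype is its complement
         (all-heterozygous assumption).
    Each read is charged the mismatches to whichever haplotype fits it better.
    """
    comp = [1 - a for a in hap]
    total = 0
    for frag in fragments:
        to_hap = sum(1 for pos, allele in frag.items() if hap[pos] != allele)
        to_comp = sum(1 for pos, allele in frag.items() if comp[pos] != allele)
        total += min(to_hap, to_comp)
    return total


def brute_force_mec(fragments, n_sites):
    """Exhaustive MEC minimization (exponential in n_sites; tiny inputs only).

    Returns (best_score, best_haplotype). The first allele is fixed to 0
    because a haplotype and its complement give the same score.
    """
    best = None
    for hap in product((0, 1), repeat=n_sites):
        if hap[0] == 1:
            continue
        score = mec_score(fragments, hap)
        if best is None or score < best[0]:
            best = (score, list(hap))
    return best


# Hypothetical reads over 4 SNPs; no haplotype pair explains every read perfectly.
fragments = [{0: 0, 1: 0, 2: 1}, {1: 0, 2: 1, 3: 1}, {0: 1, 1: 1, 2: 0}, {2: 1, 3: 0}]
print(brute_force_mec(fragments, 4))
```

Several haplotypes can tie on MEC score, which is one reason practical tools report phased blocks with confidence rather than a single guaranteed answer.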

July 7, 2019

Genomic resources and their influence on the detection of the signal of positive selection in genome scans.

Genome scans are powerful approaches for investigating the action of natural selection on the genetic variation of natural populations and for better understanding local adaptation. This is very useful, for example, in conservation biology and evolutionary biology. Thanks to next-generation sequencing, genomic resources are growing exponentially, improving genome scan analyses in non-model species. Thousands of SNPs called using reduced representation sequencing are increasingly used in genome scans. In addition, genome sequences are becoming increasingly available, allowing better processing of short-read data, offering physical localization of variants, and improving haplotype reconstruction and data imputation. Ultimately, genome sequences are also becoming the raw material for selection inferences. Here, we discuss how the increasing availability of such genomic resources, notably genome sequences, influences the detection of signals of selection. Broadly, greater data density and information on physical linkage strengthen genome scans by (i) improving the overall quality of the data, (ii) helping to reconstruct the demographic history of the study population, which decreases false-positive rates, and (iii) improving the statistical power of methods to detect the signal of selection. Of particular importance, a high-quality reference genome can improve the detection of the signal of selection by (i) allowing candidate loci to be matched to linked coding regions under selection, (ii) rapidly moving the investigation to the gene and its function, and (iii) ensuring that the highly variable regions of the genome that include functional genes are also investigated. For all these reasons, using reference genomes in genome scan analyses is highly recommended. © 2015 John Wiley & Sons Ltd.

July 7, 2019

Timing, rates and spectra of human germline mutation.

Germline mutations are a driving force behind genome evolution and genetic disease. We investigated genome-wide mutation rates and spectra in multi-sibling families. The mutation rate increased with paternal age in all families, but the number of additional mutations per year differed by more than twofold between families. Meta-analysis of 6,570 mutations showed that germline methylation influences mutation rates. In contrast to somatic mutations, we found remarkable consistency in germline mutation spectra between the sexes and at different paternal ages. In the parental germ line, 3.8% of mutations were mosaic, resulting in 1.3% of mutations being shared by siblings. The number of these shared mutations varied significantly between families. Our data suggest that the mutation rate per cell division is higher during both early embryogenesis and differentiation of primordial germ cells but is reduced substantially during post-pubertal spermatogenesis. These findings have important consequences for the recurrence risks of disorders caused by de novo mutations.

July 7, 2019

OxyR-dependent formation of DNA methylation patterns in OpvAB(OFF) and OpvAB(ON) cell lineages of Salmonella enterica.

Phase variation of the Salmonella enterica opvAB operon generates a bacterial lineage with standard lipopolysaccharide structure (OpvAB(OFF)) and a lineage with shorter O-antigen chains (OpvAB(ON)). Regulation of OpvAB lineage formation is transcriptional and is controlled by the LysR-type factor OxyR and by DNA adenine methylation. The opvAB regulatory region contains four sites for OxyR binding (OBSA-D) and four methylatable GATC motifs (GATC1-4). OpvAB(OFF) and OpvAB(ON) cell lineages display opposite DNA methylation patterns in the opvAB regulatory region: (i) in the OpvAB(OFF) state, GATC1 and GATC3 are non-methylated, whereas GATC2 and GATC4 are methylated; (ii) in the OpvAB(ON) state, GATC2 and GATC4 are non-methylated, whereas GATC1 and GATC3 are methylated. We provide evidence that such DNA methylation patterns are generated by OxyR binding. The higher stability of the OpvAB(OFF) lineage may be caused by binding of OxyR to sites that are identical to the consensus (OBSA and OBSC), whereas the sites bound by OxyR in OpvAB(ON) cells (OBSB and OBSD) are not. In support of this view, amelioration (bringing the site closer to the consensus) of either OBSB or OBSD locks the system in the ON state. We also show that the GATC-binding protein SeqA and the nucleoid protein HU are ancillary factors in opvAB control. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

July 7, 2019

Genetics: profiling DNA methylation and beyond.

Both tried-and-true and new assays are helping labs to assess methylation at particular loci and from single cells.

July 7, 2019

Read-based phasing of related individuals.

Read-based phasing deduces the haplotypes of an individual from sequencing reads that cover multiple variants, while genetic phasing takes only genotypes as input and applies the rules of Mendelian inheritance to infer haplotypes within a pedigree of individuals. Combining both into an approach that uses these two independent sources of information (reads and pedigree) has the potential to deliver better results than either does individually. We provide a theoretical framework combining read-based phasing with genetic haplotyping, and describe a fixed-parameter algorithm and its implementation for finding an optimal solution. We show that leveraging the reads of related individuals jointly in this way yields more phased variants, at higher accuracy, than phasing each individual separately, in both simulated and real data. Coverages as low as 2× for each member of a trio yield haplotypes that are as accurate as those obtained by analyzing each individual separately at 15× coverage. Availability: https://bitbucket.org/whatshap/whatshap. Contact: t.marschall@mpi-inf.mpg.de. © The Author 2016. Published by Oxford University Press.
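The genetic-phasing half of this combination can be illustrated on its own: in a parent-child trio, Mendelian rules fix which allele the child inherited from each parent at a biallelic site, unless the child and both parents are all heterozygous. The minimal sketch below does only that, one site at a time. It is not WhatsHap's algorithm, which jointly optimizes read evidence and transmission across the pedigree with a fixed-parameter dynamic program. The genotype coding (count of ALT alleles) and the function name are hypothetical.

```python
# Alleles each parent can transmit, given its unphased genotype
# coded as the number of ALT alleles (0, 1 or 2).
GAMETES = {0: (0,), 1: (0, 1), 2: (1,)}


def phase_child_site(child, mother, father):
    """Genetic (Mendelian) phasing of one biallelic site in a trio.

    Returns (maternal_allele, paternal_allele) if exactly one transmission
    is consistent with the genotypes, otherwise None: either the site is
    ambiguous (all three heterozygous) or it violates Mendelian inheritance.
    """
    options = {
        (m, p)
        for m in GAMETES[mother]
        for p in GAMETES[father]
        if m + p == child
    }
    if len(options) == 1:
        return options.pop()
    return None
```

The sites this returns None for (triple heterozygotes) are exactly where read evidence becomes essential, which is why combining the two sources can outperform either one alone.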

July 7, 2019

Single-locus enrichment without amplification for sequencing and direct detection of epigenetic modifications.

A gene-level targeted enrichment method for direct detection of epigenetic modifications is described. The approach is demonstrated on the CGG-repeat region of the FMR1 gene, for which large repeat expansions, hitherto refractory to sequencing, are known to cause fragile X syndrome. In addition to achieving a single-locus enrichment of nearly 700,000-fold, the elimination of all amplification steps removes PCR-induced bias in the repeat count and preserves the native epigenetic modifications of the DNA. In conjunction with the single-molecule real-time sequencing approach, this enrichment method enables direct readout of the methylation status and the CGG repeat number of the FMR1 allele(s) for a clonally derived cell line. The current method avoids potential biases introduced through chemical modification and/or amplification methods for indirect detection of CpG methylation events.
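Once a single-molecule read spans the whole repeat, the repeat number can be read directly off the sequence. The sketch below, a hypothetical helper, reports the longest uninterrupted run of CGG triplets in a read. It assumes an accurate read or consensus sequence, since sequencing errors would split a run. Because FMR1 alleles commonly carry AGG interruptions, the longest pure run can be shorter than the total repeat tract; which count to report is a design choice.

```python
import re


def longest_cgg_run(read):
    """Length, in repeat units, of the longest uninterrupted CGG run in a read."""
    runs = re.findall(r"(?:CGG)+", read.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical read: 3 CGG units, an AGG interruption, then 2 more CGG units.
print(longest_cgg_run("ATCGGCGGCGGAGGCGGCGGT"))
```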

July 7, 2019

Genome sequence and analysis of Escherichia coli MRE600, a colicinogenic, nonmotile strain that lacks RNase I and the type I methyltransferase, EcoKI.

Escherichia coli strain MRE600 was originally identified for its low RNase I activity and has therefore been widely adopted by the biomedical research community as a preferred source for the expression and purification of transfer RNAs and ribosomes. Despite its widespread use, surprisingly little information about its genome or genetic content exists. Here, we present the first de novo assembly and description of the MRE600 genome and epigenome. To provide context to these studies of MRE600, we include comparative analyses with E. coli K-12 MG1655 (K12). Pacific Biosciences Single Molecule, Real-Time sequencing reads were assembled into one large chromosome (4.83 Mb) and three smaller plasmids (89.1, 56.9, and 7.1 kb). Interestingly, the 7.1-kb plasmid possesses genes encoding a colicin E1 protein and its associated immunity protein. The MRE600 genome has a G + C content of 50.8% and contains a total of 5,181 genes, including 4,913 protein-encoding genes and 268 RNA genes. We identified 41,469 modified DNA bases (0.83% of total) and found that MRE600 lacks the gene for type I methyltransferase, EcoKI. Phylogenetic, taxonomic, and genetic analyses demonstrate that MRE600 is a divergent E. coli strain that displays features of the closely related genus Shigella. Nevertheless, comparative analyses between MRE600 and E. coli K12 show that these two strains exhibit nearly identical ribosomal proteins, ribosomal RNAs, and highly homologous tRNA species. Substantiating prior suggestions that MRE600 lacks RNase I activity, the RNase I-encoding gene, rna, contains a single premature stop codon early in its open reading frame. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
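Two of the checks described above are easy to sketch: the G + C content of a sequence, and locating the first in-frame stop codon in a coding sequence to see whether it is premature. Both helpers are hypothetical illustrations, not the authors' pipeline; they assume the coding sequence begins at its start codon and use the standard genetic code's stop codons (TAA, TAG, TGA).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def gc_content(seq):
    """Fraction of G + C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def first_in_frame_stop(cds):
    """0-based codon index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_premature_stop(cds):
    """True if an in-frame stop codon occurs before the final codon."""
    stop = first_in_frame_stop(cds)
    return stop is not None and stop < len(cds) // 3 - 1
```

As a consistency check on the reported figures, 41,469 modified bases divided by the total assembly length (4.83 + 0.0891 + 0.0569 + 0.0071 Mb ≈ 4.983 Mb) gives about 0.83%, matching the abstract if that is the denominator used.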
