Menu
July 19, 2019

De novo assembly of haplotype-resolved genomes with trio binning.

Complex allelic variation hampers the assembly of haplotype-resolved sequences from diploid genomes. We developed trio binning, an approach that simplifies haplotype assembly by resolving allelic variation before assembly. In contrast with prior approaches, the effectiveness of our method improved with increasing heterozygosity. Trio binning uses short reads from two parental genomes to first partition long reads from an offspring into haplotype-specific sets. Each haplotype is then assembled independently, resulting in a complete diploid reconstruction. We used trio binning to recover both haplotypes of a diploid human genome and identified complex structural variants missed by alternative approaches. We sequenced an F1 cross between the cattle subspecies Bos taurus taurus and Bos taurus indicus and completely assembled both parental haplotypes with NG50 haplotig sizes of >20 Mb and 99.998% accuracy, surpassing the quality of current cattle reference genomes. We suggest that trio binning improves diploid genome assembly and will facilitate new studies of haplotype variation and inheritance.


July 19, 2019

Global genetic diversity of var2csa in Plasmodium falciparum with implications for malaria in pregnancy and vaccine development.

Malaria infection during pregnancy, caused by the sequestering of Plasmodium falciparum parasites in the placenta, leads to high infant mortality and maternal morbidity. The parasite-placenta adherence mechanism is mediated by the VAR2CSA protein, a target for natural occurring immunity. Currently, vaccine development is based on its ID1-DBL2Xb domain however little is known about the global genetic diversity of the encoding var2csa gene, which could influence vaccine efficacy. In a comprehensive analysis of the var2csa gene in >2,000?P. falciparum field isolates across 23 countries, we found that var2csa is duplicated in high prevalence (>25%), African and Oceanian populations harbour a much higher diversity than other regions, and that insertions/deletions are abundant leading to an underestimation of the diversity of the locus. Further, ID1-DBL2Xb haplotypes associated with adverse birth outcomes are present globally, and African-specific haplotypes exist, which should be incorporated into vaccine design.


July 19, 2019

Allele-defined genome of the autopolyploid sugarcane Saccharum spontaneum L.

Modern sugarcanes are polyploid interspecific hybrids, combining high sugar content from Saccharum officinarum with hardiness, disease resistance and ratooning of Saccharum spontaneum. Sequencing of a haploid S. spontaneum, AP85-441, facilitated the assembly of 32 pseudo-chromosomes comprising 8 homologous groups of 4 members each, bearing 35,525 genes with alleles defined. The reduction of basic chromosome number from 10 to 8 in S. spontaneum was caused by fissions of 2 ancestral chromosomes followed by translocations to 4 chromosomes. Surprisingly, 80% of nucleotide binding site-encoding genes associated with disease resistance are located in 4 rearranged chromosomes and 51% of those in rearranged regions. Resequencing of 64 S. spontaneum genomes identified balancing selection in rearranged regions, maintaining their diversity. Introgressed S. spontaneum chromosomes in modern sugarcanes are randomly distributed in AP85-441 genome, indicating random recombination among homologs in different S. spontaneum accessions. The allele-defined Saccharum genome offers new knowledge and resources to accelerate sugarcane improvement.


July 19, 2019

Prediction of smoking by multiplex bisulfite PCR with long amplicons considering allele-specific effects on DNA methylation.

Methylation of DNA is associated with a variety of biological processes. With whole-genome studies of DNA methylation, it became possible to determine a set of genomic sites where DNA methylation is associated with a specific phenotype. A method is needed that allows detailed follow-up studies of the sites, including taking into account genetic information. Bisulfite PCR is a natural choice for this kind of task, but multiplexing is one of the most important problems impeding its implementation. To address this task, we took advantage of a recently published method based on Pacbio sequencing of long bisulfite PCR products (single-molecule real-time bisulfite sequencing, SMRT-BS) and tested the validity of the improved methodology with a smoking phenotype.Herein, we describe the “panhandle” modification of the method, which permits a more robust PCR with multiple targets. We applied this technique to determine smoking by DNA methylation in 71 healthy people and 83 schizophrenia patients (n?=?50 smokers and n?=?104 non-smokers, Russians of the Moscow region). We used five targets known to be influenced by smoking (regions of genes AHRR, ALPPL2, IER3, GNG12, and GFI1). We discovered significant allele-specific methylation effects in the AHRR and IER3 regions and assessed how this information could be exploited to improve the prediction of smoking based on the collected DNA methylation data. We found no significant difference in the methylation profiles of selected targets in relation to schizophrenia suggesting that smoking affects methylation at the studied genomic sites in healthy people and schizophrenia patients in a similar way.We determined that SMRT-BS with “panhandle” modification performs well in the described setting. Additional information regarding methylation and allele-specific effects could improve the predictive accuracy of DNA methylation-based models, which could be valuable for both basic research and clinical applications.


July 19, 2019

Mapping the landscape of tandem repeat variability by targeted long read single molecule sequencing in familial X-linked intellectual disability.

The etiology of more than half of all patients with X-linked intellectual disability remains elusive, despite array-based comparative genomic hybridization, whole exome or genome sequencing. Since short read massive parallel sequencing approaches do not allow the detection of larger tandem repeat expansions, we hypothesized that such expansions could be a hidden cause of X-linked intellectual disability.We selectively captured over 1800 tandem repeats on the X chromosome and characterized them by long read single molecule sequencing in 3 families with idiopathic X-linked intellectual disability. In male DNA samples, full tandem repeat length sequences were obtained for 88-93% of the targets and up to 99.6% of the repeats with a moderate guanine-cytosine content. Read length and analysis pipeline allow to detect cases of >?900?bp tandem repeat expansion. In one family, one repeat expansion co-occurs with down-regulation of the neighboring MIR222 gene. This gene has previously been implicated in intellectual disability and is apparently linked to FMR1 and NEFH overexpression associated with neurological disorders.This study demonstrates the power of single molecule sequencing to measure tandem repeat lengths and detect expansions, and suggests that tandem repeat mutations may be a hidden cause of X-linked intellectual disability.


July 19, 2019

A forward genetic screen reveals a primary role for Plasmodium falciparum Reticulocyte Binding Protein Homologue 2a and 2b in determining alternative erythrocyte invasion pathways.

Invasion of human erythrocytes is essential for Plasmodium falciparum parasite survival and pathogenesis, and is also a complex phenotype. While some later steps in invasion appear to be invariant and essential, the earlier steps of recognition are controlled by a series of redundant, and only partially understood, receptor-ligand interactions. Reverse genetic analysis of laboratory adapted strains has identified multiple genes that when deleted can alter invasion, but how the relative contributions of each gene translate to the phenotypes of clinical isolates is far from clear. We used a forward genetic approach to identify genes responsible for variable erythrocyte invasion by phenotyping the parents and progeny of previously generated experimental genetic crosses. Linkage analysis using whole genome sequencing data revealed a single major locus was responsible for the majority of phenotypic variation in two invasion pathways. This locus contained the PfRh2a and PfRh2b genes, members of one of the major invasion ligand gene families, but not widely thought to play such a prominent role in specifying invasion phenotypes. Variation in invasion pathways was linked to significant differences in PfRh2a and PfRh2b expression between parasite lines, and their role in specifying alternative invasion was confirmed by CRISPR-Cas9-mediated genome editing. Expansion of the analysis to a large set of clinical P. falciparum isolates revealed common deletions, suggesting that variation at this locus is a major cause of invasion phenotypic variation in the endemic setting. This work has implications for blood-stage vaccine development and will help inform the design and location of future large-scale studies of invasion in clinical isolates.


July 19, 2019

The Dominant and Poorly Penetrant Phenotypes of Maize Unstable factor for orange1 Are Caused by DNA Methylation Changes at a Linked Transposon.

The maize (Zea mays) mutant Unstable factor for orange1 (Ufo1) has been implicated in the epigenetic modifications of pericarp color1 (p1), which regulates the production of the flavonoid pigments phlobaphenes. Here, we show that the ufo1 gene maps to a genetically recalcitrant region near the centromere of chromosome 10. Transcriptome analysis of Ufo1-1 mutant and wild-type plants identified a candidate gene in the mapping region using a comparative sequence-based approach. The candidate gene, GRMZM2G053177, is overexpressed by >45-fold in multiple tissues of Ufo1-1, explaining the dominance of Ufo1-1 and its phenotypes. In the mutant stock, GRMZM2G053177 has a unique transcript originating within a CACTA transposon inserted in its first intron, and it is missing the first four codons of the wild-type transcript. GRMZM2G053177 expression is regulated by the DNA methylation status of the CACTA transposon, explaining the incomplete penetrance and poor expressivity of Ufo1-1 Transgenic overexpression lines of GRMZM2G053177 (Ufo1-1) phenocopy the p1-induced pigmentation in coleoptiles, tassels, leaf sheaths, husks, pericarps, and cob glumes. Transcriptome analysis of Ufo1 versus wild-type tissues revealed changes in several pathways related to abiotic and biotic stress. Thus, this study addresses the enigma of Ufo1 identity in maize, which had gone unsolved for more than 50 years.© 2018 American Society of Plant Biologists. All rights reserved.


July 7, 2019

Prevalence of subtilase cytotoxin-encoding subAB variants among Shiga toxin-producing Escherichia coli strains isolated from wild ruminants and sheep differs from that of cattle and pigs and is predominated by the new allelic variant subAB2-2.

Subtilase cytotoxin (SubAB) is an AB5 toxin produced by Shiga toxin (Stx)-producing Escherichia coli (STEC) strains usually lacking the eae gene product intimin. Three allelic variants of SubAB encoding genes have been described: subAB1, located on a plasmid, subAB2-1, located on the pathogenicity island SE-PAI and subAB2-2 located in an outer membrane efflux protein (OEP) region. SubAB is becoming increasingly recognized as a toxin potentially involved in human pathogenesis. Ruminants and cattle have been identified as reservoirs of subAB-positive STEC. The presence of the three subAB allelic variants was investigated by PCR for 152 STEC strains originating from chamois, ibex, red deer, roe deer, cattle, sheep and pigs. Overall, subAB genes were detected in 45.5% of the strains. Prevalence was highest for STEC originating from ibex (100%), chamois (92%) and sheep (65%). None of the STEC of bovine or of porcine origin tested positive for subAB. None of the strains tested positive for subAB1. The allelic variant subAB2-2 was detected the most commonly, with 51.4% possessing subAb2-1 together with subAB2-2. STEC of ovine origin, serotypes O91:H- and O128:H2, the saa gene, which encodes for the autoagglutinating adhesin and stx2b were significantly associated with subAB-positive STEC. Our results suggest that subAB2-1 and subAB2-2 is widespread among STEC from wild ruminants and sheep and may be important as virulence markers in STEC pathogenic to humans. Copyright © 2014 Elsevier GmbH. All rights reserved.


July 7, 2019

Broad CTL response is required to clear latent HIV-1 due to dominance of escape mutations.

Despite antiretroviral therapy (ART), human immunodeficiency virus (HIV)-1 persists in a stable latent reservoir, primarily in resting memory CD4(+) T cells. This reservoir presents a major barrier to the cure of HIV-1 infection. To purge the reservoir, pharmacological reactivation of latent HIV-1 has been proposed and tested both in vitro and in vivo. A key remaining question is whether virus-specific immune mechanisms, including cytotoxic T lymphocytes (CTLs), can clear infected cells in ART-treated patients after latency is reversed. Here we show that there is a striking all or none pattern for CTL escape mutations in HIV-1 Gag epitopes. Unless ART is started early, the vast majority (>98%) of latent viruses carry CTL escape mutations that render infected cells insensitive to CTLs directed at common epitopes. To solve this problem, we identified CTLs that could recognize epitopes from latent HIV-1 that were unmutated in every chronically infected patient tested. Upon stimulation, these CTLs eliminated target cells infected with autologous virus derived from the latent reservoir, both in vitro and in patient-derived humanized mice. The predominance of CTL-resistant viruses in the latent reservoir poses a major challenge to viral eradication. Our results demonstrate that chronically infected patients retain a broad-spectrum viral-specific CTL response and that appropriate boosting of this response may be required for the elimination of the latent reservoir.


July 7, 2019

The draft genome of Primula veris yields insights into the molecular basis of heterostyly.

The flowering plant Primula veris is a common spring blooming perennial that is widely cultivated throughout Europe. This species is an established model system in the study of the genetics, evolution, and ecology of heterostylous floral polymorphisms. Despite the long history of research focused on this and related species, the continued development of this system has been restricted due the absence of genomic and transcriptomic resources.We present here a de novo draft genome assembly of P. veris covering 301.8 Mb, or approximately 63% of the estimated 479.22 Mb genome, with an N50 contig size of 9.5 Kb, an N50 scaffold size of 164 Kb, and containing an estimated 19,507 genes. The results of a RADseq bulk segregant analysis allow for the confident identification of four genome scaffolds that are linked to the P. veris S-locus. RNAseq data from both P. veris and the closely related species P. vulgaris allow for the characterization of 113 candidate heterostyly genes that show significant floral morph-specific differential expression. One candidate gene of particular interest is a duplicated GLOBOSA homolog that may be unique to Primula (PveGLO2), and is completely silenced in L-morph flowers.The P. veris genome represents the first genome assembled from a heterostylous species, and thus provides an immensely important resource for future studies focused on the evolution and genetic dissection of heterostyly. As the first genome assembled from the Primulaceae, the P. veris genome will also facilitate the expanded application of phylogenomic methods in this diverse family and the eudicots as a whole.


July 7, 2019

Exploring possible DNA structures in real-time polymerase kinetics using Pacific Biosciences sequencer data.

BackgroundPausing of DNA polymerase can indicate the presence of a DNA structure that differs from the canonical double-helix. Here we detail a method to investigate how polymerase pausing in the Pacific Biosciences sequencer reads can be related to DNA sequences. The Pacific Biosciences sequencer uses optics to view a polymerase and its interaction with a single DNA molecule in real-time, offering a unique way to detect potential alternative DNA structures.ResultsWe have developed a new way to examine polymerase kinetics data and relate it to the DNA sequence by using a wavelet transform of read information from the sequencer. We use this method to examine how polymerase kinetics are related to nucleotide base composition. We then examine tandem repeat sequences known for their ability to form different DNA structures: (CGG)n and (CG)n repeats which can, respectively, form G-quadruplex DNA and Z-DNA. We find pausing around the (CGG)n repeat that may indicate the presence of G-quadruplexes in some of the sequencer reads. The (CG)n repeat does not appear to cause polymerase pausing, but its kinetics signature nevertheless suggests the possibility that alternative nucleotide conformations may sometimes be present.ConclusionWe discuss the implications of using our method to discover DNA sequences capable of forming alternative structures. The analyses presented here can be reproduced on any Pacific Biosciences kinetics data for any DNA pattern of interest using an R package that we have made publicly available.


July 7, 2019

Do echinoderm genomes measure up?

Echinoderm genome sequences are a corpus of useful information about a clade of animals that serve as research models in fields ranging from marine ecology to cell and developmental biology. Genomic information from echinoids has contributed to insights into the gene interactions that drive the developmental process at the molecular level. Such insights often rely heavily on genomic information and the kinds of questions that can be asked thus depend on the quality of the sequence information. Here we describe the history of echinoderm genomic sequence assembly and present details about the quality of the data obtained. All of the sequence information discussed here is posted on the echinoderm information web system, Echinobase.org. Copyright © 2015 Elsevier B.V. All rights reserved.


July 7, 2019

Saccharina genomes provide novel insight into kelp biology.

Seaweeds are essential for marine ecosystems and have immense economic value. Here we present a comprehensive analysis of the draft genome of Saccharina japonica, one of the most economically important seaweeds. The 537-Mb assembled genomic sequence covered 98.5% of the estimated genome, and 18,733 protein-coding genes are predicted and annotated. Gene families related to cell wall synthesis, halogen concentration, development and defence systems were expanded. Functional diversification of the mannuronan C-5-epimerase and haloperoxidase gene families provides insight into the evolutionary adaptation of polysaccharide biosynthesis and iodine antioxidation. Additional sequencing of seven cultivars and nine wild individuals reveal that the genetic diversity within wild populations is greater than among cultivars. All of the cultivars are descendants of a wild S. japonica accession showing limited admixture with S. longissima. This study represents an important advance toward improving yields and economic traits in Saccharina and provides an invaluable resource for plant genome studies.


July 7, 2019

Phylogeographical analysis of the dominant multidrug-resistant H58 clade of Salmonella Typhi identifies inter- and intracontinental transmission events.

The emergence of multidrug-resistant (MDR) typhoid is a major global health threat affecting many countries where the disease is endemic. Here whole-genome sequence analysis of 1,832 Salmonella enterica serovar Typhi (S. Typhi) identifies a single dominant MDR lineage, H58, that has emerged and spread throughout Asia and Africa over the last 30 years. Our analysis identifies numerous transmissions of H58, including multiple transfers from Asia to Africa and an ongoing, unrecognized MDR epidemic within Africa itself. Notably, our analysis indicates that H58 lineages are displacing antibiotic-sensitive isolates, transforming the global population structure of this pathogen. H58 isolates can harbor a complex MDR element residing either on transmissible IncHI1 plasmids or within multiple chromosomal integration sites. We also identify new mutations that define the H58 lineage. This phylogeographical analysis provides a framework to facilitate global management of MDR typhoid and is applicable to similar MDR lineages emerging in other bacterial species.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.