September 22, 2019

Discovery of new genes involved in curli production by a uropathogenic Escherichia coli strain from the highly virulent O45:K1:H7 lineage.

Curli are bacterial surface-associated amyloid fibers that bind to the dye Congo red (CR) and facilitate uropathogenic Escherichia coli (UPEC) biofilm formation and protection against host innate defenses. Here we sequenced the genome of the curli-producing UPEC pyelonephritis strain MS7163 and showed it belongs to the highly virulent O45:K1:H7 neonatal meningitis-associated clone. MS7163 produced curli at human physiological temperature, and this correlated with biofilm growth, resistance of sessile cells to the human cationic peptide cathelicidin, and enhanced colonization of the mouse bladder. We devised a forward genetic screen using CR staining as a proxy for curli production and identified 41 genes that were required for optimal CR binding, of which 19 genes were essential for curli synthesis. Ten of these genes were novel or poorly characterized with respect to curli synthesis and included genes involved in purine de novo biosynthesis, a regulator that controls the Rcs phosphorelay system, and a novel repressor of curli production (referred to as rcpA). The involvement of these genes in curli production was confirmed by the construction of defined mutants and their complementation. The mutants did not express the curli major subunit CsgA and failed to produce curli based on CR binding. Mutation of purF (the first gene in the purine biosynthesis pathway) and rcpA also led to attenuated colonization of the mouse bladder. Overall, this work has provided new insight into the regulation of curli and the role of these amyloid fibers in UPEC biofilm formation and pathogenesis.

IMPORTANCE Uropathogenic Escherichia coli (UPEC) strains are the most common cause of urinary tract infection, a disease increasingly associated with escalating antibiotic resistance. UPEC strains possess multiple surface-associated factors that enable their colonization of the urinary tract, including fimbriae, curli, and autotransporters. Curli are extracellular amyloid fibers that enhance UPEC virulence and promote biofilm formation. Here we examined the function and regulation of curli in a UPEC pyelonephritis strain belonging to the highly virulent O45:K1:H7 neonatal meningitis-associated clone. Curli expression at human physiological temperature led to increased biofilm formation, resistance of sessile cells to the human cationic peptide LL-37, and enhanced bladder colonization. Using a comprehensive genetic screen, we identified multiple genes involved in curli production, including several that were novel or poorly characterized with respect to curli synthesis. In total, this study demonstrates an important role for curli as a UPEC virulence factor that promotes biofilm formation, resistance, and pathogenesis. Copyright © 2018 Nhu et al.


September 22, 2019

Whole-genome sequencing of Chinese yellow catfish provides a valuable genetic resource for high-throughput identification of toxin genes.

Naturally derived toxins from animals are good raw materials for drug development. As a representative venomous teleost, Chinese yellow catfish (Pelteobagrus fulvidraco) can provide valuable resources for studies on toxin genes. Its venom glands are located in the pectoral and dorsal fins. Despite these interesting biological traits and its considerable economic value, Chinese yellow catfish still lacks a sequenced genome. Here, we report a high-quality genome assembly of Chinese yellow catfish using a combination of next-generation Illumina and third-generation PacBio sequencing platforms. The final assembly reached 714 Mb, with a contig N50 of 970 kb and a scaffold N50 of 3.65 Mb. We also annotated 21,562 protein-coding genes, of which 97.59% were assigned at least one functional annotation. Based on the genome sequence, we analyzed toxin genes in Chinese yellow catfish. Finally, we identified 207 toxin genes and classified them into three major groups. Interestingly, we also expanded a previously reported sex-related region (to ~6 Mb) in the achieved genome assembly, and localized two important toxin genes within this region. In summary, we assembled a high-quality genome of Chinese yellow catfish and performed high-throughput identification of toxin genes from a genomic view. Therefore, the limited number of toxin sequences in public databases will be remarkably improved once we integrate multi-omics data from more and more sequenced species.
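For readers unfamiliar with the N50 statistic quoted in this abstract, the sketch below uses the common definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the published pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk from the longest sequence downwards, accumulating length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to half of
        # the assembly size defines the N50.
        if running >= half_total:
            return length


if __name__ == "__main__":
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # hypothetical lengths, total 32
    print(n50(toy_contigs))  # prints 8
```

A larger N50 means that more of the assembly sits in long sequences, which is why contig and scaffold N50 are usually reported separately.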


September 22, 2019

Real-time assembly of ribonucleoprotein complexes on nascent RNA transcripts.

Cellular protein-RNA complexes assemble on nascent transcripts, but methods to observe transcription and protein binding in real time and at physiological concentrations are not available. Here, we report a single-molecule approach based on zero-mode waveguides that simultaneously tracks transcription progress and the binding of ribosomal protein S15 to nascent RNA transcripts during early ribosome biogenesis. We observe stable binding of S15 to single RNAs immediately after transcription for the majority of the transcripts at 35 °C but for less than half at 20 °C. The remaining transcripts exhibit either rapid and transient binding or are unable to bind S15, likely due to RNA misfolding. Our work establishes the foundation for studying transcription and its coupled co-transcriptional processes, including RNA folding, ligand binding, and enzymatic activity such as in coupling of transcription to splicing, ribosome assembly or translation.


September 22, 2019

Loss of Rap1 supports recombination-based telomere maintenance independent of RNA-DNA hybrids in fission yeast

To investigate the molecular changes needed for cells to maintain their telomeres by recombination, we monitored telomere appearance during serial culture of fission yeast cells lacking the telomerase recruitment factor Ccq1. Rad52 is loaded onto critically short telomeres shortly after germination despite continued telomere erosion, suggesting that recruitment of recombination factors is not sufficient to maintain telomeres in the absence of telomerase function. Instead, survivor formation coincides with the derepression of telomeric repeat-containing RNA (TERRA). Degradation of telomere-associated TERRA in this context drives a severe growth crisis, ultimately leading to a distinct type of linear survivor with altered cytological telomere characteristics and the eviction of the shelterin component Rap1 (but not the TRF1/TRF2 orthologue, Taz1) from the telomere. We demonstrate that deletion of Rap1 is protective, preventing the growth crisis that is otherwise triggered by degradation of telomere-engaged TERRA in survivors with linear chromosomes. Thus, modulating the stoichiometry of shelterin components appears to support recombination-dependent survivors to persist in the absence of telomere-engaged TERRA.


September 22, 2019

Extensive and deep sequencing of the Venter/HuRef genome for developing and benchmarking genome analysis tools.

We produced an extensive collection of deep re-sequencing datasets for the Venter/HuRef genome using the Illumina massively-parallel DNA sequencing platform. The original Venter genome sequence is a very-high quality phased assembly based on Sanger sequencing. Therefore, researchers developing novel computational tools for the analysis of human genome sequence variation for the dominant Illumina sequencing technology can test and hone their algorithms by making variant calls from these Venter/HuRef datasets and then immediately confirm the detected variants in the Sanger assembly, freeing them of the need for further experimental validation. This process also applies to implementing and benchmarking existing genome analysis pipelines. We prepared and sequenced 200 bp and 350 bp short-insert whole-genome sequencing libraries (sequenced to 100x and 40x genomic coverages respectively) as well as 2 kb, 5 kb, and 12 kb mate-pair libraries (49x, 122x, and 145x physical coverages respectively). Lastly, we produced a linked-read library (128x physical coverage) from which we also performed haplotype phasing.


September 22, 2019

Mosaicism diminishes the value of pre-implantation embryo biopsies for detecting CRISPR/Cas9 induced mutations in sheep.

The production of knock-out (KO) livestock models is both expensive and time consuming due to their long gestational interval and low number of offspring. One alternative to increase efficiency is performing a genetic screening to select pre-implantation embryos that have incorporated the desired mutation. Here we report the use of sheep embryo biopsies for detecting CRISPR/Cas9-induced mutations targeting the gene PDX1 prior to embryo transfer. PDX1 is a critical gene for pancreas development and the target gene required for the creation of pancreatogenesis-disabled sheep. We evaluated the viability of biopsied embryos in vitro and in vivo, and we determined the mutation efficiency using PCR combined with gel electrophoresis and digital droplet PCR (ddPCR). Next, we determined the presence of mosaicism in ~50% of the recovered fetuses employing a clonal sequencing methodology. While the use of biopsies did not compromise embryo viability, the presence of mosaicism diminished the diagnostic value of the technique. If mosaicism could be overcome, pre-implantation embryo biopsies for mutation screening would represent a powerful approach to streamline the creation of KO animals.


September 22, 2019

Report from the Killer-cell Immunoglobulin-like Receptors (KIR) component of the 17th International HLA and Immunogenetics Workshop.

The goals of the KIR component of the 17th International HLA and Immunogenetics Workshop (IHIW) were to encourage and educate researchers to begin analyzing KIR at allelic resolution, and to survey the nature and extent of KIR allelic diversity across human populations. To represent worldwide diversity, we analyzed 1269 individuals from ten populations, focusing on the most polymorphic KIR genes, which express receptors having three immunoglobulin (Ig)-like domains (KIR3DL1/S1, KIR3DL2 and KIR3DL3). We identified 13 novel alleles of KIR3DL1/S1, 13 of KIR3DL2 and 18 of KIR3DL3. Previously identified alleles, corresponding to 33 alleles of KIR3DL1/S1, 38 of KIR3DL2, and 43 of KIR3DL3, represented over 90% of the observed allele frequencies for these genes. In total we observed 37 KIR3DL1/S1 allotypes, 40 for KIR3DL2 and 44 for KIR3DL3. As KIR allotype diversity can affect NK cell function, this demonstrates potential for high functional diversity worldwide. Allelic variation further diversifies KIR haplotypes. We determined KIR3DL3~KIR3DL1/S1~KIR3DL2 haplotypes from five of the studied populations, and observed multiple population-specific haplotypes in each. This included 234 distinct haplotypes in European Americans, 191 in Ugandans, 35 in Papuans, 95 in Egyptians and 86 in Spanish populations. For another 35 populations, encompassing 642,105 individuals, we focused on KIR3DL2 and identified another 375 novel alleles, with approximately half of them observed in more than one individual. The KIR allelic level data gathered from this project represents the most comprehensive summary of global KIR allelic diversity to date, and continued analysis will improve understanding of KIR allelic polymorphism in global populations. Further, the wealth of new data gathered in the course of this workshop component highlights the value of collaborative, community-based efforts in immunogenetics research, exemplified by the IHIW. Copyright © 2018. Published by Elsevier Inc.


September 21, 2019

Long-read genome sequencing identifies causal structural variation in a Mendelian disease.

Purpose: Current clinical genomics assays primarily utilize short-read sequencing (SRS), but SRS has limited ability to evaluate repetitive regions and structural variants. Long-read sequencing (LRS) has complementary strengths, and we aimed to determine whether LRS could offer a means to identify overlooked genetic variation in patients undiagnosed by SRS. Methods: We performed low-coverage genome LRS to identify structural variants in a patient who presented with multiple neoplasia and cardiac myxomata, in whom the results of targeted clinical testing and genome SRS were negative. Results: This LRS approach yielded 6,971 deletions and 6,821 insertions >50 bp. Filtering for variants that are absent in an unrelated control and overlap a disease gene coding exon identified three deletions and three insertions. One of these, a heterozygous 2,184 bp deletion, overlaps the first coding exon of PRKAR1A, which is implicated in autosomal dominant Carney complex. RNA sequencing demonstrated decreased PRKAR1A expression. The deletion was classified as pathogenic based on guidelines for interpretation of sequence variants. Conclusion: This first successful application of genome LRS to identify a pathogenic variant in a patient suggests that LRS has significant potential for the identification of disease-causing structural variation. Larger studies will ultimately be required to evaluate the potential clinical utility of LRS.
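The filtering step described in this abstract (keep variants that are absent in an unrelated control and that overlap a disease-gene coding exon) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Interval` type, the function names, the half-open coordinate convention, and the exact-match test for "absent in control" (real pipelines typically use a more tolerant criterion such as reciprocal overlap). The coordinates are invented toy values, not data from the study.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def candidate_variants(patient_svs, control_svs, coding_exons):
    """Keep patient variants not seen in the control that hit a coding exon.

    Assumption: a variant counts as present in the control only when its
    coordinates match exactly.
    """
    control = set(control_svs)
    return [
        sv
        for sv in patient_svs
        if sv not in control and any(overlaps(sv, exon) for exon in coding_exons)
    ]


if __name__ == "__main__":
    # Hypothetical toy coordinates.
    patient = [
        Interval("chr1", 100, 2300),
        Interval("chr1", 5000, 5100),
        Interval("chr2", 40, 90),
    ]
    control = [Interval("chr1", 5000, 5100)]
    exons = [Interval("chr1", 2000, 2500), Interval("chr1", 5050, 5200)]
    print(candidate_variants(patient, control, exons))
    # prints [Interval(chrom='chr1', start=100, end=2300)]
```

In this toy run the second variant is dropped because the control carries it, and the third because it overlaps no coding exon.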


July 19, 2019

Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing.

Single-molecule real-time (SMRT) DNA sequencing allows the systematic detection of chemical modifications such as methylation but has not previously been applied on a genome-wide scale. We used this approach to detect 49,311 putative 6-methyladenine (m6A) residues and 1,407 putative 5-methylcytosine (m5C) residues in the genome of a pathogenic Escherichia coli strain. We obtained strand-specific information for methylation sites and a quantitative assessment of the frequency of methylation at each modified position. We deduced the sequence motifs recognized by the methyltransferase enzymes present in this strain without prior knowledge of their specificity. Furthermore, we found that deletion of a phage-encoded methyltransferase-endonuclease (restriction-modification; RM) system induced global transcriptional changes and led to gene amplification, suggesting that the role of RM systems extends beyond protecting host genomes from foreign DNA.


July 19, 2019

Modeling kinetic rate variation in third generation DNA sequencing data to detect putative modifications to DNA bases.

Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types.


July 19, 2019

Detecting DNA modifications from SMRT sequencing data by modeling sequence context dependence of polymerase kinetic.

DNA modifications such as methylation and DNA damage can play critical regulatory roles in biological systems. Single molecule, real time (SMRT) sequencing technology generates DNA sequences as well as DNA polymerase kinetic information that can be used for the direct detection of DNA modifications. We demonstrate that local sequence context has a strong impact on DNA polymerase kinetics in the neighborhood of the incorporation site during the DNA synthesis reaction, allowing for the possibility of estimating the expected kinetic rate of the enzyme at the incorporation site using kinetic rate information collected from existing SMRT sequencing data (historical data) covering the same local sequence contexts of interest. We develop an Empirical Bayesian hierarchical model for incorporating historical data. Our results show that the model could greatly increase DNA modification detection accuracy and reduce the required coverage of control data. For some DNA modifications that have a strong signal, a control sample is not even needed when historical data are used as an alternative to a control. Thus, sequencing costs can be greatly reduced by using the model. We implemented the model in an R package named seqPatch, which is available at https://github.com/zhixingfeng/seqPatch.


July 19, 2019

Polymorphic microsatellite markers for a wind-dispersed tropical tree species, Triplaris cumingiana (Polygonaceae).

Novel microsatellite markers were characterized in the wind-dispersed and dioecious neotropical tree Triplaris cumingiana (Polygonaceae) for use in understanding the ecological processes and genetic impacts of pollen- and seed-mediated gene flow in tropical forests. Sixty-two microsatellite primer pairs were screened, from which 12 markers showing five or more alleles per locus (range 5-17) were tested on 47 individuals. Observed and expected heterozygosities averaged 0.692 and 0.731, respectively. Polymorphism information content was between 0.417 and 0.874. Linkage disequilibrium was observed in one of the 66 pairwise comparisons between loci. Two loci showed deviation from Hardy-Weinberg equilibrium. An additional 14 markers exhibiting lower polymorphism were characterized on a smaller number of individuals. These microsatellite markers have high levels of polymorphism and reproducibility and will be useful in studying gene flow and population structure in T. cumingiana.
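As a rough illustration of the summary statistics named in this abstract, the sketch below computes expected heterozygosity and polymorphism information content (PIC) from allele frequencies at a single locus, and counts the pairwise comparisons among 12 loci. The formulas are the standard textbook ones (He = 1 - Σp², and the PIC of Botstein et al.); whether the authors applied a sample-size correction is not stated, so the uncorrected form is an assumption. Function names and the example frequencies are illustrative.

```python
import math


def _check(freqs):
    # Allele frequencies at a locus should sum to 1.
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without small-sample correction."""
    _check(freqs)
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    _check(freqs)
    squares = [p * p for p in freqs]
    cross = sum(
        2.0 * squares[i] * squares[j]
        for i in range(len(squares))
        for j in range(i + 1, len(squares))
    )
    return 1.0 - sum(squares) - cross


if __name__ == "__main__":
    two_equal_alleles = [0.5, 0.5]  # hypothetical locus
    print(expected_heterozygosity(two_equal_alleles))  # prints 0.5
    print(pic(two_equal_alleles))                      # prints 0.375
    # 12 loci give C(12, 2) pairwise tests, matching the 66 in the text.
    print(math.comb(12, 2))                            # prints 66
```

PIC is never larger than expected heterozygosity for the same frequencies, because it subtracts an extra non-negative term.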


July 19, 2019

Quantifying genome-editing outcomes at endogenous loci with SMRT sequencing.

Targeted genome editing with engineered nucleases has transformed the ability to introduce precise sequence modifications at almost any site within the genome. A major obstacle to probing the efficiency and consequences of genome editing is that no existing method enables the frequency of different editing events to be simultaneously measured across a cell population at any endogenous genomic locus. We have developed a novel method for quantifying individual genome editing outcomes at any site of interest using single molecule real time (SMRT) DNA sequencing. We show that this approach can be applied at various loci, using multiple engineered nuclease platforms including TALENs, RNA guided endonucleases (CRISPR/Cas9), and ZFNs, and in different cell lines to identify conditions and strategies in which the desired engineering outcome has occurred. This approach facilitates the evaluation of new gene editing technologies and permits sensitive quantification of editing outcomes in almost every experimental system used.


July 19, 2019

Global methylation state at base-pair resolution of the Caulobacter genome throughout the cell cycle.

The Caulobacter DNA methyltransferase CcrM is one of five master cell-cycle regulators. CcrM is transiently present near the end of DNA replication when it rapidly methylates the adenine in hemimethylated GANTC sequences. The timing of transcription of two master regulator genes and two cell division genes is controlled by the methylation state of GANTC sites in their promoters. To explore the global extent of this regulatory mechanism, we determined the methylation state of the entire chromosome at every base pair at five time points in the cell cycle using single-molecule, real-time sequencing. The methylation state of 4,515 GANTC sites, preferentially positioned in intergenic regions, changed progressively from full to hemimethylation as the replication forks advanced. However, 27 GANTC sites remained unmethylated throughout the cell cycle, suggesting that these protected sites could participate in epigenetic regulatory functions. An analysis of the time of activation of every cell-cycle regulatory transcription start site, coupled to both the position of a GANTC site in their promoter regions and the time in the cell cycle when the GANTC site transitions from full to hemimethylation, allowed the identification of 59 genes as candidates for epigenetic regulation. In addition, we identified two previously unidentified N(6)-methyladenine motifs and showed that they maintained a constant methylation state throughout the cell cycle. The cognate methyltransferase was identified for one of these motifs as well as for one of two 5-methylcytosine motifs.
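To make the GANTC terminology in this abstract concrete, the following minimal sketch locates GANTC motifs in a DNA string and labels a site as fully methylated, hemimethylated, or unmethylated from per-strand calls. The function names, the boolean per-strand input, and the example sequence are assumptions for illustration; this is not the analysis used in the study.

```python
import re

# GANTC with N = any base. The motif is its own reverse complement,
# so a site found on one strand is also a site on the other.
GANTC = re.compile(r"GA[ACGT]TC")


def find_gantc_sites(sequence):
    """Return 0-based start positions of GANTC motifs in the sequence."""
    return [match.start() for match in GANTC.finditer(sequence.upper())]


def methylation_state(top_methylated, bottom_methylated):
    """Classify a site from the methylation call on each strand."""
    if top_methylated and bottom_methylated:
        return "full"
    if top_methylated or bottom_methylated:
        return "hemi"
    return "unmethylated"


if __name__ == "__main__":
    toy_sequence = "ttGAATCccGACTCgaGATTC"  # hypothetical sequence
    print(find_gantc_sites(toy_sequence))   # prints [2, 9, 16]
    print(methylation_state(True, False))   # prints hemi
```

After a replication fork passes, the newly made strand is unmethylated, which is why a site moves from "full" to "hemi" until CcrM acts near the end of replication.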


July 19, 2019

Resolving complex tandem repeats with long reads.

Resolving tandemly repeated genomic sequences is a necessary step in improving our understanding of the human genome. Short tandem repeats (TRs), or microsatellites, are often used as molecular markers in genetics, and clinically, variation in microsatellites can lead to genetic disorders like Huntington’s disease. Accurately resolving repeats, and in particular TRs, remains a challenging task in genome alignment, assembly and variation calling. Though tools have been developed for detecting microsatellites in short-read sequencing data, these are limited in the size and types of events they can resolve. Single-molecule sequencing technologies may potentially resolve a broader spectrum of TRs given their increased length, but require new approaches given their significantly higher raw error profiles. However, due to inherent error profiles of the single-molecule technologies, these reads present a unique challenge in terms of accurately identifying and estimating the TRs. Here we present PacmonSTR, a reference-based probabilistic approach, to identify the TR region and estimate the number of these TR elements in long DNA reads. We present a multistep approach that requires as input a reference region and the reference TR element. Initially, the TR region is identified from the long DNA reads via a 3-stage modified Smith-Waterman approach, and then the expected number of TR elements is calculated using a pair hidden Markov model-based method. Finally, TR-based genotype selection (or clustering: homozygous/heterozygous) is performed with Gaussian mixture models, using the Akaike information criterion and coverage expectations. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

