July 19, 2019

Integrating DNA methylation and gene expression data in the development of the soybean-Bradyrhizobium N2-fixing symbiosis.

Very little is known about the role of epigenetics in the differentiation of a bacterium from the free-living to the symbiotic state. Here, genome-wide analysis of DNA methylation changes between these states is described using the model of symbiosis between soybean and its root nodule-forming, nitrogen-fixing symbiont, Bradyrhizobium diazoefficiens. PacBio resequencing of the B. diazoefficiens genome from both states revealed 43,061 sites recognized by five motifs with the potential to be methylated genome-wide. Of those sites, 3276 changed methylation states in 2921 genes, or 35.5% of all genes in the genome. Over 10% of the methylation changes occurred within the symbiosis island, which comprises 7.4% of the genome. The CCTTGAG motif was methylated only during symbiosis, with 1361 adenosines methylated among the 1700 possible sites. Another 89 genes within the symbiotic island and 768 genes throughout the genome were found to have methylation and significant expression changes during symbiotic development. Of those, nine are known symbiosis genes involved in all phases of symbiotic development, including early infection events, nodule development, and nitrogenase production. These associations between methylation and expression changes in many B. diazoefficiens genes suggest an important role of the epigenome in bacterial differentiation to the symbiotic state.


July 19, 2019

Genome structural diversity among 31 Bordetella pertussis isolates from two recent U.S. whooping cough statewide epidemics

During 2010 and 2012, California and Vermont, respectively, experienced statewide epidemics of pertussis with differences seen in the demographic affected, case clinical presentation, and molecular epidemiology of the circulating strains. To overcome limitations of the current molecular typing methods for pertussis, we utilized whole-genome sequencing to gain a broader understanding of how current circulating strains are causing large epidemics. Through the use of combined next-generation sequencing technologies, this study compared de novo, single-contig genome assemblies from 31 out of 33 Bordetella pertussis isolates collected during two separate pertussis statewide epidemics and 2 resequenced vaccine strains. Final genome architecture assemblies were verified with whole-genome optical mapping. Sixteen distinct genome rearrangement profiles were observed in epidemic isolate genomes, all of which were distinct from the genome structures of the two resequenced vaccine strains. These rearrangements appear to be mediated by repetitive sequence elements, such as high-copy-number mobile genetic elements and rRNA operons. Additionally, novel and previously identified single nucleotide polymorphisms were detected in 10 virulence-related genes in the epidemic isolates. Whole-genome variation analysis identified state-specific variants, and coding regions bearing nonsynonymous mutations were classified into functional annotated orthologous groups. Comprehensive studies on whole genomes are needed to understand the resurgence of pertussis and develop novel tools to better characterize the molecular epidemiology of evolving B. pertussis populations.
IMPORTANCE Pertussis, or whooping cough, is the most poorly controlled vaccine-preventable bacterial disease in the United States, which has experienced a resurgence for more than a decade. Once viewed as a monomorphic pathogen, B. pertussis strains circulating during epidemics exhibit diversity visible on a genome structural level, previously undetectable by traditional sequence analysis using short-read technologies. For the first time, we combine short- and long-read sequencing platforms with restriction optical mapping for single-contig, de novo assembly of 31 isolates to investigate two geographically and temporally independent U.S. pertussis epidemics. These complete genomes reshape our understanding of B. pertussis evolution and strengthen molecular epidemiology toward one day understanding the resurgence of pertussis.


July 19, 2019

Genomic changes following the reversal of a Y chromosome to an autosome in Drosophila pseudoobscura

Robertsonian translocations resulting in fusions between sex chromosomes and autosomes shape karyotype evolution by creating new sex chromosomes from autosomes. These translocations can also reverse sex chromosomes back into autosomes, which is especially intriguing given the dramatic differences between autosomes and sex chromosomes. To study the genomic events following a Y chromosome reversal, we investigated an autosome-Y translocation in Drosophila pseudoobscura. The ancestral Y chromosome fused to a small autosome (the dot chromosome) approximately 10–15 Mya. We used single molecule real-time sequencing reads to assemble the D. pseudoobscura dot chromosome, including this Y-to-dot translocation. We find that the intervening sequence between the ancestral Y and the rest of the dot chromosome is only ~78 Kb and is not repeat-dense, suggesting that the centromere now falls outside, rather than between, the fused chromosomes. The Y-to-dot region is 100 times smaller than the D. melanogaster Y chromosome, owing to changes in repeat landscape. However, we do not find a consistent reduction in intron sizes across the Y-to-dot region. Instead, deletions in intergenic regions and possibly a small ancestral Y chromosome size may explain the compact size of the Y-to-dot translocation.


July 19, 2019

AgIn: Measuring the landscape of CpG methylation of individual repetitive elements.

Determining the methylation state of regions with high copy numbers is challenging for second-generation sequencing, because the read length is insufficient to map reads uniquely, especially when repetitive regions are long and nearly identical to each other. Single-molecule real-time (SMRT) sequencing is a promising method for observing such regions, because it is not vulnerable to GC bias, it produces long read lengths, and its kinetic information is sensitive to DNA modifications. We propose a novel linear-time algorithm that combines the kinetic information for neighboring CpG sites and increases the confidence in identifying the methylation states of those sites. Using a practical read coverage of ~30-fold from an inbred strain medaka (Oryzias latipes), we observed that both the sensitivity and precision of our method on individual CpG sites were ~93.7%. We also observed a high correlation coefficient (R = 0.884) between our method and bisulfite sequencing, and for 92.0% of CpG sites, methylation levels ranging over [0, 1] were in concordance within an acceptable difference of 0.25. Using this method, we characterized the landscape of the methylation status of repetitive elements, such as LINEs, in the human genome, thereby revealing the strong correlation between CpG density and hypomethylation and detecting hypomethylation hot spots of LTRs and LINEs. We uncovered the methylation states for nearly identical active transposons, two novel LINE insertions of identity ~99% and length 6050 base pairs (bp) in the human genome, and 16 Tol2 elements of identity >99.8% and length 4682 bp in the medaka genome. AgIn (Aggregate on Intervals) is available at: https://github.com/hacone/AgIn CONTACT: ysuzuki@cb.k.u-tokyo.ac.jp, moris@cb.k.u-tokyo.ac.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. © The Author(s) 2016. Published by Oxford University Press.
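The agreement figures quoted above (a correlation coefficient and the share of CpG sites whose methylation levels agree within 0.25) can be illustrated with a minimal Python sketch. This is not the AgIn algorithm and not the authors' evaluation code. The function names `concordance` and `pearson`, the inclusive `<=` comparison, and the toy values in the usage note are assumptions made purely for illustration.

```python
import math


def concordance(predicted, reference, tolerance=0.25):
    """Fraction of sites whose two methylation levels differ by at most `tolerance`.

    Assumption: the comparison is inclusive (<=); the abstract does not say.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    agreeing = sum(
        1 for p, r in zip(predicted, reference) if abs(p - r) <= tolerance
    )
    return agreeing / len(predicted)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

For example, with made-up per-site levels `[0.0, 0.5, 1.0, 0.9]` from one method and `[0.1, 0.9, 1.0, 0.5]` from the other, two of the four sites agree within 0.25, so `concordance` returns 0.5.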


July 19, 2019

Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum genome through long-read (>11 kb), single molecule, real-time sequencing.

The application of next-generation sequencing to estimate genetic diversity of Plasmodium falciparum, the most lethal malaria parasite, has proved challenging due to the skewed AT-richness [~80.6% (A + T)] of its genome and the lack of technology to assemble highly polymorphic subtelomeric regions that contain clonally variant, multigene virulence families (e.g., var and rifin). To address this, we performed amplification-free, single molecule, real-time sequencing of P. falciparum genomic DNA and generated reads of average length 12 kb, with 50% of the reads between 15.5 and 50 kb in length. Next, using the Hierarchical Genome Assembly Process, we assembled the P. falciparum genome de novo and successfully compiled all 14 nuclear chromosomes telomere-to-telomere. We also accurately resolved centromeres [~90-99% (A + T)] and subtelomeric regions and identified large insertions and duplications that add extra var and rifin genes to the genome, along with smaller structural variants such as homopolymer tract expansions. Overall, we show that amplification-free, long-read sequencing combined with de novo assembly overcomes major challenges inherent to studying the P. falciparum genome. Indeed, this technology may not only identify the polymorphic and repetitive subtelomeric sequences of parasite populations from endemic areas but may also evaluate structural variation linked to virulence, drug resistance and disease transmission. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.


July 19, 2019

Analysis of tandem gene copies in maize chromosomal regions reconstructed from long sequence reads.

Haplotype variation involves not only SNPs but also insertions and deletions, in particular gene copy number variations. However, comparisons of individual genomes have been difficult because traditional sequencing methods produce reads too short to unambiguously reconstruct chromosomal regions containing repetitive DNA sequences. An example of such a case is the protein gene family in maize that acts as a sink for reduced nitrogen in the seed. Previously, 41-48 gene copies of the alpha zein gene family that spread over six loci spanning between 30- and 500-kb chromosomal regions have been described in two Iowa Stiff Stalk (SS) inbreds. Analyses of those regions were possible because of overlapping BAC clones, generated by an expensive and labor-intensive approach. Here we used single-molecule real-time (Pacific Biosciences) shotgun sequencing to assemble the six chromosomal regions from the Non-Stiff Stalk maize inbred W22 from a single DNA sequence dataset. To validate the reconstructed regions, we developed an optical map (BioNano genome map; BioNano Genomics) of W22 and found agreement between the two datasets. Using the sequences of full-length cDNAs from W22, we found that the error rate of PacBio sequencing seemed to be less than 0.1% after autocorrection and assembly. Expressed genes, some with premature stop codons, are interspersed with nonexpressed genes, giving rise to genotype-specific expression differences. Alignment of these regions with the previously analyzed regions of SS lines reveals, in part, dramatic differences between these two heterotic groups.


July 19, 2019

Chromosome-level assembly of Arabidopsis thaliana Ler reveals the extent of translocation and inversion polymorphisms.

Resequencing or reference-based assemblies reveal large parts of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the complement of large-scale rearrangements, including transpositions and inversions. Despite the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ~3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions are not different in local divergence from colinear regions, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes, which were only present in the reference sequence or the Ler assembly, and 334 single-copy orthologs, which showed an additional copy in only one of the genomes. To our knowledge, this work gives first insights into the degree and type of variation that will be revealed once complete assemblies replace resequencing or other reference-dependent methods.


July 19, 2019

Separate F-type plasmids have shaped the evolution of the H30 subclone of Escherichia coli sequence type 131.

The extraintestinal pathogenic Escherichia coli (ExPEC) H30 subclone of sequence type 131 (ST131-H30) has emerged abruptly as a dominant lineage of ExPEC responsible for human disease. The ST131-H30 lineage has been well described phylogenetically, yet its plasmid complement is not fully understood. Here, single-molecule, real-time sequencing was used to generate the complete plasmid sequences of ST131-H30 isolates and those belonging to other ST131 clades. Comparative analyses revealed separate F-type plasmids that have shaped the evolution of the main fluoroquinolone-resistant ST131-H30 clades. Specifically, an F1:A2:B20 plasmid is strongly associated with the H30R/C1 clade, whereas an F2:A1:B- plasmid is associated with the H30Rx/C2 clade. A series of plasmid gene losses, gains, and rearrangements involving IS26 likely led to the current plasmid complements within each ST131-H30 sublineage, which contain several overlapping gene clusters with putative functions in virulence and fitness, suggesting plasmid-mediated convergent evolution. Evidence suggests that the H30Rx/C2-associated F2:A1:B- plasmid type was present in strains ancestral to the acquisition of fluoroquinolone resistance and prior to the introduction of a multidrug resistance-encoding gene cassette harboring blaCTX-M-15. In vitro experiments indicated a host strain-independent low frequency of plasmid transfer, differential levels of plasmid stability even between closely related ST131-H30 strains, and possible epistasis for carriage of these plasmids within the H30R/Rx lineages.
IMPORTANCE A clonal lineage of Escherichia coli known as ST131 has emerged as a dominating strain type causing extraintestinal infections in humans. The evolutionary history of ST131 E. coli is now well understood. However, the role of plasmids in ST131’s evolutionary history is poorly defined. This study utilized real-time, single-molecule sequencing to compare plasmids from various current and historical lineages of ST131. From this work, it was determined that a series of plasmid gains, losses, and recombinational events has led to the currently circulating plasmids of ST131 strains. These plasmids appear to have evolved to acquire similar gene clusters on multiple occasions, suggesting possible plasmid-mediated convergent evolution leading to evolutionary success. These plasmids also appear to be better suited to exist in specific strains of ST131 due to coadaptive mutations. Overall, a series of events has enabled the evolution of ST131 plasmids, possibly contributing to the lineage’s success.


July 19, 2019

High-quality assembly of an individual of Yoruban descent

De novo assembly of human genomes is now a tractable effort due in part to advances in sequencing and mapping technologies. We use PacBio single-molecule, real-time (SMRT) sequencing and BioNano genomic maps to construct the first de novo assembly of NA19240, a Yoruban individual from Africa. This chromosome-scaffolded assembly of 3.08 Gb with a contig N50 of 7.25 Mb and a scaffold N50 of 78.6 Mb represents one of the most contiguous high-quality human genomes. We utilize a BAC library derived from NA19240 DNA and novel haplotype-resolving sequencing technologies and algorithms to characterize regions of complex genomic architecture that are normally lost due to compression to a linear haploid assembly. Our results demonstrate that multiple technologies are still necessary for complete genomic representation, particularly in regions of highly identical segmental duplications. Additionally, we show that diploid assembly has utility in improving the quality of de novo human genome assemblies.
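The contig and scaffold N50 values quoted above follow a standard definition: the length L such that sequences of length L or longer together cover at least half of the total assembly. A minimal Python sketch of that definition follows. The function name `n50` and the toy lengths in the usage note are illustrative assumptions, not taken from the study's pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must be non-empty")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Only reachable if the input contains negative lengths.
    raise ValueError("lengths must be non-negative")
```

For instance, for toy lengths `[10, 8, 6, 4, 2]` the total is 30; the two longest sequences already sum to 18, which passes the halfway mark of 15, so the N50 is 8.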


July 19, 2019

Towards precision medicine.

There is great potential for genome sequencing to enhance patient care through improved diagnostic sensitivity and more precise therapeutic targeting. To maximize this potential, genomics strategies that have been developed for genetic discovery – including DNA-sequencing technologies and analysis algorithms – need to be adapted to fit clinical needs. This will require the optimization of alignment algorithms, attention to quality-coverage metrics, tailored solutions for paralogous or low-complexity areas of the genome, and the adoption of consensus standards for variant calling and interpretation. Global sharing of this more accurate genotypic and phenotypic data will accelerate the determination of causality for novel genes or variants. Thus, a deeper understanding of disease will be realized that will allow its targeting with much greater therapeutic precision.


July 19, 2019

Extensive sequence divergence between the reference genomes of two elite indica rice varieties Zhenshan 97 and Minghui 63.

Asian cultivated rice consists of two subspecies: Oryza sativa subsp. indica and O. sativa subsp. japonica. Despite the fact that indica rice accounts for over 70% of total rice production worldwide and is genetically much more diverse, a high-quality reference genome for indica rice has yet to be published. We conducted map-based sequencing of two indica rice lines, Zhenshan 97 (ZS97) and Minghui 63 (MH63), which represent the two major varietal groups of the indica subspecies and are the parents of an elite Chinese hybrid. The genome sequences were assembled into 237 (ZS97) and 181 (MH63) contigs, with an accuracy >99.99%, and covered 90.6% and 93.2% of their estimated genome sizes. Comparative analyses of these two indica genomes uncovered surprising structural differences, especially with respect to inversions, translocations, presence/absence variations, and segmental duplications. Approximately 42% of nontransposable element related genes were identical between the two genomes. Transcriptome analysis of three tissues showed that 1,059-2,217 more genes were expressed in the hybrid than in the parents and that the expressed genes in the hybrid were much more diverse due to their divergence between the parental genomes. The public availability of two high-quality reference genomes for the indica subspecies of rice will have large-ranging implications for plant biology and crop genetic improvement.


July 19, 2019

Living apart together: crosstalk between the core and supernumerary genomes in a fungal plant pathogen.

Eukaryotes display remarkable genome plasticity, which can include supernumerary chromosomes that differ markedly from the core chromosomes. Despite the widespread occurrence of supernumerary chromosomes in fungi, their origin, relation to the core genome and the reason for their divergent characteristics are still largely unknown. The complexity of genome assembly due to the presence of repetitive DNA partially accounts for this. Here we use single-molecule real-time (SMRT) sequencing to assemble the genome of a prominent fungal wheat pathogen, Fusarium poae, including at least one supernumerary chromosome. The core genome contains limited transposable elements (TEs) and no gene duplications, while the supernumerary genome holds up to 25% TEs and multiple gene duplications. The core genome shows all hallmarks of repeat-induced point mutation (RIP), a defense mechanism against TEs, specific for fungi. The absence of RIP on the supernumerary genome accounts for the differences between the two (sub)genomes, and results in a functional crosstalk between them. The supernumerary genome is a reservoir for TEs that migrate to the core genome, and even large blocks of supernumerary sequence (>200 kb) have recently translocated to the core. Vice versa, the supernumerary genome acts as a refuge for genes that are duplicated from the core genome. For the first time, a mechanism was determined that explains the differences that exist between the core and supernumerary genome in fungi. Different biology rather than origin was shown to be responsible. A “living apart together” crosstalk exists between the core and supernumerary genome, accelerating chromosomal and organismal evolution.


July 19, 2019

Standardization and quality management in next-generation sequencing

DNA sequencing continues to evolve quickly even after more than 30 years. Many new platforms have appeared suddenly, and formerly established systems have vanished in almost the same manner. Since the establishment of next-generation sequencing devices, this progress has gained momentum due to the continually growing demand for higher throughput, lower costs and better quality of data. As a consequence of this rapid development, standardized procedures and data formats as well as comprehensive quality management considerations are still scarce. Here, we list and summarize current standardization efforts and quality management initiatives from companies, organizations and societies in the form of published studies and ongoing projects. These comprise, on the one hand, quality documentation issues like technical notes, accreditation checklists and guidelines for validation of sequencing workflows. On the other hand, general standard proposals and quality metrics are developed and applied to the sequencing workflow steps with the main focus on upstream processes. Finally, certain standard developments for downstream pipeline data handling, processing and storage are discussed in brief. These standardization approaches represent a first basis for continuing work in order to prospectively implement next-generation sequencing in important areas such as clinical diagnostics, where reliable results and fast processing are crucial. Additionally, these efforts will exert a decisive influence on traceability and reproducibility of sequence data.


July 19, 2019

A distinct class of chromoanagenesis events characterized by focal copy number gains.

Chromoanagenesis is the process by which a single catastrophic event creates complex rearrangements confined to a single or a few chromosomes. It is usually characterized by the presence of multiple deletions and/or duplications, as well as by copy neutral rearrangements. In contrast, an array CGH screen of patients with developmental anomalies revealed three patients in which a single chromosome carries from 8 to 11 large copy number gains confined to a single chromosome or chromosomal arm, but the absence of deletions. Subsequent fluorescence in situ hybridization and massive parallel sequencing revealed the duplicons to be clustered together in distinct locations across the altered chromosomes. Breakpoint junction sequences showed both microhomology and non-templated insertions of up to 40 bp. Hence, these patients each demonstrate a single altered chromosome of clustered insertional duplications, no deletions, and breakpoint junction sequences showing microhomology and/or non-templated insertions. These observations are difficult to reconcile with current mechanistic descriptions of chromothripsis and chromoanasynthesis. Therefore, we hypothesize those rearrangements to be of a mechanistically different origin. In addition, we suggest that large untemplated insertional sequences observed at breakpoints are driven by a non-canonical non-homologous end joining mechanism. © 2016 WILEY PERIODICALS, INC.


July 19, 2019

Condition-dependent co-regulation of genomic clusters of virulence factors in the grapevine trunk pathogen Neofusicoccum parvum.

The ascomycete Neofusicoccum parvum, one of the causal agents of Botryosphaeria dieback, is a destructive wood-infecting fungus and a serious threat to grape production worldwide. The capability to colonize woody tissue, combined with the secretion of phytotoxic compounds, is thought to underlie its pathogenicity and virulence. Here, we describe the repertoire of virulence factors and their transcriptional dynamics as the fungus feeds on different substrates and colonizes the woody stem. We assembled and annotated a highly contiguous genome using single-molecule real-time DNA sequencing. Transcriptome profiling by RNA sequencing determined the genome-wide patterns of expression of virulence factors both in vitro (potato dextrose agar or medium amended with grape wood as substrate) and in planta. Pairwise statistical testing of differential expression, followed by co-expression network analysis, revealed that physically clustered genes coding for putative virulence functions were induced depending on the substrate or stage of plant infection. Co-expressed gene clusters were significantly enriched not only in genes associated with secondary metabolism, but also in those associated with cell wall degradation, suggesting that dynamic co-regulation of transcriptional networks contributes to multiple aspects of N. parvum virulence. In most of the co-expressed clusters, all genes shared at least a common motif in their promoter region, indicative of co-regulation by the same transcription factor. Co-expression analysis also identified chromatin regulators with correlated expression with inducible clusters of virulence factors, suggesting a complex, multi-layered regulation of the virulence repertoire of N. parvum. © 2016 BSPP AND JOHN WILEY & SONS LTD.

