Menu
July 7, 2019

Combination of short-read, long-read and optical mapping assemblies reveals large-scale tandem repeat arrays with population genetic implications.

Accurate and contiguous genome assembly is key to a comprehensive understanding of the processes shaping genomic diversity and evolution. Yet, it is frequently constrained by constitutive heterochromatin, usually characterized by highly repetitive DNA. As a key feature of genome architecture associated with centromeric and telomeric regions it influences meiotic recombination. In this study, we assess the impact of large tandem repeat arrays on the recombination rate landscape in an avian speciation model, the Eurasian crow. We assembled two high-quality genome references using single-molecule real-time sequencing (long-read assembly, LR) and single-molecule restriction maps (optical map assembly, OM). A three-way comparison including the published short-read assembly (SR) constructed for the same individual allowed assessing assembly properties and pinpointing mis-assemblies. Combining information from all three assemblies, we characterized 36 previously unidentified large repetitive regions in the proximity of sequence assembly breakpoints, the majority of which contained complex arrays of a 14-kb satellite repeat or its 1.2-kb subunit. Using genome-wide population re-sequencing data, we estimated the population-scaled recombination rate (?) and found it to be significantly reduced in these regions. These findings are consistent with an effect of low recombination in regions adjacent to centromeric or subtelomeric heterochromatin, and add to our understanding of the processes generating widespread heterogeneity in genetic diversity and differentiation along the genome. By combining three independent technologies, our results highlight the importance of adding a layer of information on genome structure inaccessible to each approach independently. Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Complete genome sequence and comparative genomics of the probiotic yeast Saccharomyces boulardii.

The probiotic yeast, Saccharomyces boulardii (Sb) is known to be effective against many gastrointestinal disorders and antibiotic-associated diarrhea. To understand molecular basis of probiotic-properties ascribed to Sb we determined the complete genomes of two strains of Sb i.e. Biocodex and unique28 and the draft genomes for three other Sb strains that are marketed as probiotics in India. We compared these genomes with 145 strains of S. cerevisiae (Sc) to understand genome-level similarities and differences between these yeasts. A distinctive feature of Sb from other Sc is absence of Ty elements Ty1, Ty3, Ty4 and associated LTR. However, we could identify complete Ty2 and Ty5 elements in Sb. The genes for hexose transporters HXT11 and HXT9, and asparagine-utilization are absent in all Sb strains. We find differences in repeat periods and copy numbers of repeats in flocculin genes that are likely related to the differential adhesion of Sb as compared to Sc. Core-proteome based taxonomy places Sb strains along with wine strains of Sc. We find the introgression of five genes from Z. bailii into the chromosome IV of Sb and wine strains of Sc. Intriguingly, genes involved in conferring known probiotic properties to Sb are conserved in most Sc strains.


July 7, 2019

Phenotypic diversity and genotypic flexibility of Burkholderia cenocepacia during long-term chronic infection of cystic fibrosis lungs.

Chronic bacterial infections of the lung are the leading cause of morbidity and mortality in cystic fibrosis patients. Tracking bacterial evolution during chronic infections can provide insights into how host selection pressures-including immune responses and therapeutic interventions-shape bacterial genomes. We carried out genomic and phenotypic analyses of 215 serially collected Burkholderia cenocepacia isolates from 16 cystic fibrosis patients, spanning a period of 2-20 yr and a broad range of epidemic lineages. Systematic phenotypic tests identified longitudinal bacterial series that manifested progressive changes in liquid media growth, motility, biofilm formation, and acute insect virulence, but not in mucoidy. The results suggest that distinct lineages follow distinct evolutionary trajectories during lung infection. Pan-genome analysis identified 10,110 homologous gene clusters present only in a subset of strains, including genes restricted to different molecular types. Our phylogenetic analysis based on 2148 orthologous gene clusters from all isolates is consistent with patient-specific clades. This suggests that initial colonization of patients was likely by individual strains, followed by subsequent diversification. Evidence of clonal lineages shared by some patients was observed, suggesting inter-patient transmission. We observed recurrent gene losses in multiple independent longitudinal series, including complete loss of Chromosome III and deletions on other chromosomes. Recurrently observed loss-of-function mutations were associated with decreases in motility and biofilm formation. Together, our study provides the first comprehensive genome-phenome analyses of B. cenocepacia infection in cystic fibrosis lungs and serves as a valuable resource for understanding the genomic and phenotypic underpinnings of bacterial evolution.© 2017 Lee et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Centromere location in Arabidopsis is unaltered by extreme divergence in CENH3 protein sequence.

During cell division, spindle fibers attach to chromosomes at centromeres. The DNA sequence at regional centromeres is fast evolving with no conserved genetic signature for centromere identity. Instead CENH3, a centromere-specific histone H3 variant, is the epigenetic signature that specifies centromere location across both plant and animal kingdoms. Paradoxically, CENH3 is also adaptively evolving. An ongoing question is whether CENH3 evolution is driven by a functional relationship with the underlying DNA sequence. Here, we demonstrate that despite extensive protein sequence divergence, CENH3 histones from distant species assemble centromeres on the same underlying DNA sequence. We first characterized the organization and diversity of centromere repeats in wild-type Arabidopsis thaliana We show that A. thaliana CENH3-containing nucleosomes exhibit a strong preference for a unique subset of centromeric repeats. These sequences are largely missing from the genome assemblies and represent the youngest and most homogeneous class of repeats. Next, we tested the evolutionary specificity of this interaction in a background in which the native A. thaliana CENH3 is replaced with CENH3s from distant species. Strikingly, we find that CENH3 from Lepidium oleraceum and Zea mays, although specifying epigenetically weaker centromeres that result in genome elimination upon outcrossing, show a binding pattern on A. thaliana centromere repeats that is indistinguishable from the native CENH3. Our results demonstrate positional stability of a highly diverged CENH3 on independently evolved repeats, suggesting that the sequence specificity of centromeres is determined by a mechanism independent of CENH3.© 2017 Maheshwari et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

The human reference genome assembly plays a central role in nearly all aspects of today’s basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health. © 2017 Schneider et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm.

Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use this data alone, particularly with highly repetitive plant genomes. Errors in the raw data can lead to insertion or deletion errors (indels) in the consensus genome sequence, which in turn create significant problems for downstream analysis; for example, a single indel may shift the reading frame and incorrectly truncate a protein sequence. Here, we describe an algorithm that solves the high error rate problem by combining long, high-error reads with shorter but much more accurate Illumina sequencing reads, whose error rates average <1%. Our hybrid assembly algorithm combines these two types of reads to construct mega-reads, which are both long and accurate, and then assembles the mega-reads using the CABOG assembler, which was designed for long reads. We apply this technique to a large data set of Illumina and PacBio sequences from the species Aegilops tauschii, a large and extremely repetitive plant genome that has resisted previous attempts at assembly. We show that the resulting assembled contigs are far larger than in any previous assembly, with an N50 contig size of 486,807 nucleotides. We compare the contigs to independently produced optical maps to evaluate their large-scale accuracy, and to a set of high-quality bacterial artificial chromosome (BAC)-based assemblies to evaluate base-level accuracy. © 2017 Zimin et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Genome sequence of Plasmopara viticola and insight into the pathogenic mechanism.

Plasmopara viticola causes downy mildew disease of grapevine which is one of the most devastating diseases of viticulture worldwide. Here we report a 101.3?Mb whole genome sequence of P. viticola isolate ‘JL-7-2’ obtained by a combination of Illumina and PacBio sequencing technologies. The P. viticola genome contains 17,014 putative protein-coding genes and has ~26% repetitive sequences. A total of 1,301 putative secreted proteins, including 100 putative RXLR effectors and 90 CRN effectors were identified in this genome. In the secretome, 261 potential pathogenicity genes and 95 carbohydrate-active enzymes were predicted. Transcriptional analysis revealed that most of the RXLR effectors, pathogenicity genes and carbohydrate-active enzymes were significantly up-regulated during infection. Comparative genomic analysis revealed that P. viticola evolved independently from the Arabidopsis downy mildew pathogen Hyaloperonospora arabidopsidis. The availability of the P. viticola genome provides a valuable resource not only for comparative genomic analysis and evolutionary studies among oomycetes, but also enhance our knowledge on the mechanism of interactions between this biotrophic pathogen and its host.


July 7, 2019

The recent emergence in hospitals of multidrug-resistant community-associated sequence type 1 and spa type t127 methicillin-resistant Staphylococcus aureus investigated by whole-genome sequencing: Implications for screening.

Community-associated spa type t127/t922 methicillin-resistant Staphylococcus aureus (MRSA) prevalence increased from 1%-7% in Ireland between 2010-2015. This study tracked the spread of 89 such isolates from June 2013-June 2016. These included 78 healthcare-associated and 11 community associated-MRSA isolates from a prolonged hospital outbreak (H1) (n = 46), 16 other hospitals (n = 28), four other healthcare facilities (n = 4) and community-associated sources (n = 11). Isolates underwent antimicrobial susceptibility testing, DNA microarray profiling and whole-genome sequencing. Minimum spanning trees were generated following core-genome multilocus sequence typing and pairwise single nucleotide variation (SNV) analysis was performed. All isolates were sequence type 1 MRSA staphylococcal cassette chromosome mec type IV (ST1-MRSA-IV) and 76/89 were multidrug-resistant. Fifty isolates, including 40/46 from H1, were high-level mupirocin-resistant, carrying a conjugative 39 kb iles2-encoding plasmid. Two closely related ST1-MRSA-IV strains (I and II) and multiple sporadic strains were identified. Strain I isolates (57/89), including 43/46 H1 and all high-level mupirocin-resistant isolates, exhibited =80 SNVs. Two strain I isolates from separate H1 healthcare workers differed from other H1/strain I isolates by 7-47 and 12-53 SNVs, respectively, indicating healthcare worker involvement in this outbreak. Strain II isolates (19/89), including the remaining H1 isolates, exhibited =127 SNVs. For each strain, the pairwise SNVs exhibited by healthcare-associated and community-associated isolates indicated recent transmission of ST1-MRSA-IV within and between multiple hospitals, healthcare facilities and communities in Ireland. Given the interchange between healthcare-associated and community-associated isolates in hospitals, the risk factors that inform screening for MRSA require revision.


July 7, 2019

Terpene synthases from Cannabis sativa.

Cannabis (Cannabis sativa) plants produce and accumulate a terpene-rich resin in glandular trichomes, which are abundant on the surface of the female inflorescence. Bouquets of different monoterpenes and sesquiterpenes are important components of cannabis resin as they define some of the unique organoleptic properties and may also influence medicinal qualities of different cannabis strains and varieties. Transcriptome analysis of trichomes of the cannabis hemp variety ‘Finola’ revealed sequences of all stages of terpene biosynthesis. Nine cannabis terpene synthases (CsTPS) were identified in subfamilies TPS-a and TPS-b. Functional characterization identified mono- and sesqui-TPS, whose products collectively comprise most of the terpenes of ‘Finola’ resin, including major compounds such as ß-myrcene, (E)-ß-ocimene, (-)-limonene, (+)-a-pinene, ß-caryophyllene, and a-humulene. Transcripts associated with terpene biosynthesis are highly expressed in trichomes compared to non-resin producing tissues. Knowledge of the CsTPS gene family may offer opportunities for selection and improvement of terpene profiles of interest in different cannabis strains and varieties.


July 7, 2019

Genetic and genomic tools for Cannabis sativa

The Cannabis industry is currently one of the fastest growing industries in the United States. Given the changing legal status of the plant, and the rapidly advancing research, updated information on the advancement of Cannabis genomics is needed. This versatile plant is used as medicine and for food, fiber, and bioremediation. Insights from modern, high-throughput genomic technology are revolutionizing our understanding of the plant and are providing new tools to further improve our knowledge and utilization of this unique species. This review quantifies and evaluates the currently available genomic resources for Cannabis research, including six whole-genome assemblies, two transcriptomes, and 393 other substantial genomic resources, as well as other smaller publicly available genetic and genomic resources. The open-source approaches followed by many leading scientists in the field promote collaboration and facilitate these rapid advances.


July 7, 2019

An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing.

The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25?361, more than three times as large as achieved in the original assembly, and an N50 scaffold size of 107?821, 61% larger than the previous assembly. © The Author 2017. Published by Oxford University Press.


July 7, 2019

Extremely low genomic diversity of Rickettsia japonica distributed in Japan.

Rickettsiae are obligate intracellular bacteria that have small genomes as a result of reductive evolution. Many Rickettsia species of the spotted fever group (SFG) cause tick-borne diseases known as “spotted fevers”. The life cycle of SFG rickettsiae is closely associated with that of the tick, which is generally thought to act as a bacterial vector and reservoir that maintains the bacterium through transstadial and transovarial transmission. Each SFG member is thought to have adapted to a specific tick species, thus restricting the bacterial distribution to a relatively limited geographic region. These unique features of SFG rickettsiae allow investigation of how the genomes of such biologically and ecologically specialized bacteria evolve after genome reduction and the types of population structures that are generated. Here, we performed a nationwide, high-resolution phylogenetic analysis of Rickettsia japonica, an etiological agent of Japanese spotted fever that is distributed in Japan and Korea. The comparison of complete or nearly complete sequences obtained from 31 R. japonica strains isolated from various sources in Japan over the past 30 years demonstrated an extremely low level of genomic diversity. In particular, only 34 single nucleotide polymorphisms were identified among the 27 strains of the major lineage containing all clinical isolates and tick isolates from the three tick species. Our data provide novel insights into the biology and genome evolution of R. japonica, including the possibilities of recent clonal expansion and a long generation time in nature due to the long dormant phase associated with tick life cycles.© The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 7, 2019

Characterization of Class IIa bacteriocin resistance in Enterococcus faecium.

Vancomycin-resistant enterococci, particularly resistant Enterococcus faecium, pose an escalating threat in nosocomial environments because of their innate resistance to many antibiotics, including vancomycin, a treatment of last resort. Many class IIa bacteriocins strongly target these enterococci and may offer a potential alternative for the management of this pathogen. However, E. faecium’s resistance to these peptides remains relatively uncharacterized. Here, we explored the development of resistance of E. faecium to a cocktail of three class IIa bacteriocins: enterocin A, enterocin P, and hiracin JM79. We started by quantifying the frequency of resistance to these peptides in four clinical isolates of E. faecium We then investigated the levels of resistance of E. faecium 6E6 mutants as well as their fitness in different carbon sources. In order to elucidate the mechanism of resistance of E. faecium to class IIa bacteriocins, we completed whole-genome sequencing of resistant mutants and performed reverse transcription-quantitative PCR (qRT-PCR) of a suspected target mannose phosphotransferase (ManPTS). We then verified this ManPTS’s role in bacteriocin susceptibility by showing that expression of the ManPTS in Lactococcus lactis results in susceptibility to the peptide cocktail. Based on the evidence found from these studies, we conclude that, in accord with other studies in E. faecalis and Listeria monocytogenes, resistance to class IIa bacteriocins in E. faecium 6E6 is likely caused by the disruption of a particular ManPTS, which we believe we have identified. Copyright © 2017 American Society for Microbiology.


July 7, 2019

The Nephila clavipes genome highlights the diversity of spider silk genes and their complex expression.

Spider silks are the toughest known biological materials, yet are lightweight and virtually invisible to the human immune system, and they thus have revolutionary potential for medicine and industry. Spider silks are largely composed of spidroins, a unique family of structural proteins. To investigate spidroin genes systematically, we constructed the first genome of an orb-weaving spider: the golden orb-weaver (Nephila clavipes), which builds large webs using an extensive repertoire of silks with diverse physical properties. We cataloged 28 Nephila spidroins, representing all known orb-weaver spidroin types, and identified 394 repeated coding motif variants and higher-order repetitive cassette structures unique to specific spidroins. Characterization of spidroin expression in distinct silk gland types indicates that glands can express multiple spidroin types. We find evidence of an alternatively spliced spidroin, a spidroin expressed only in venom glands, evolutionary mechanisms for spidroin diversification, and non-spidroin genes with expression patterns that suggest roles in silk production.


July 7, 2019

Sequencing and de novo assembly of a near complete indica rice genome.

A high-quality reference genome is critical for understanding genome structure, genetic variation and evolution of an organism. Here we report the de novo assembly of an indica rice genome Shuhui498 (R498) through the integration of single-molecule sequencing and mapping data, genetic map and fosmid sequence tags. The 390.3?Mb assembly is estimated to cover more than 99% of the R498 genome and is more continuous than the current reference genomes of japonica rice Nipponbare (MSU7) and Arabidopsis thaliana (TAIR10). We annotate high-quality protein-coding genes in R498 and identify genetic variations between R498 and Nipponbare and presence/absence variations by comparing them to 17 draft genomes in cultivated rice and its closest wild relatives. Our results demonstrate how to de novo assemble a highly contiguous and near-complete plant genome through an integrative strategy. The R498 genome will serve as a reference for the discovery of genes and structural variations in rice.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.