July 7, 2019

Genome scaffolding and annotation for the pathogen vector Ixodes ricinus by ultra-long single molecule sequencing.

Global warming and other ecological changes have facilitated the expansion of Ixodes ricinus tick populations. Ixodes ricinus is the most important carrier of vector-borne pathogens in Europe, transmitting viruses, protozoa and bacteria, in particular Borrelia burgdorferi (sensu lato), the causative agent of Lyme borreliosis, the most prevalent vector-borne disease in humans in the Northern hemisphere. To control this disease vector more quickly, a better understanding of the I. ricinus tick is necessary. To facilitate such studies, we recently published the first reference genome of this highly prevalent pathogen vector. Here, we further extend these studies by scaffolding and annotating the first reference genome using ultra-long sequencing reads from third-generation single-molecule sequencing. In addition, we present the first genome size estimation for I. ricinus ticks and the embryo-derived cell line IRE/CTVM19. In total, 235,953 contigs were integrated into 204,904 scaffolds, extending the currently known genome length by more than 30% from 393 to 516 Mb and the N50 contig value by 87% from 1643 bp to an N50 scaffold value of 3067 bp. In addition, 25,263 sequences were annotated by comparison to the tick’s North American relative Ixodes scapularis. After (conserved) hypothetical proteins, zinc finger proteins, secreted proteins and P450 coding proteins were the most prevalent protein categories annotated. Interestingly, more than 50% of the amino acid sequences matching the homology threshold had 95-100% identity to the corresponding I. scapularis gene models. The sequence information was complemented by the first genome size estimation for this species. Flow cytometry-based genome size analysis revealed a haploid genome size of 2.65 Gb for I. ricinus ticks and 3.80 Gb for the cell line. We present a first draft sequence map of the I. ricinus genome based on a PacBio-Illumina assembly. The I. ricinus genome was shown to be 26% (500 Mb) larger than the genome of its American relative I. scapularis. Based on the genome size of 2.65 Gb we estimated that we covered about 67% of the non-repetitive sequences. Genome annotation will facilitate screening for specific molecular pathways in I. ricinus cells and provides an overview of characteristics and functions.
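
The N50 and percentage figures quoted in this abstract can be illustrated with a short sketch. It assumes the common definition of N50 (the largest length L such that sequences of length L or longer together hold at least half of the assembled bases); the abstract does not say which definition or tool the authors used. The function names `n50` and `percent_increase` and the toy length list are hypothetical and serve only as illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: walk the lengths from longest to shortest and
    return the length at which the running sum first reaches half of
    the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return 100.0 * (new - old) / old


# Toy lengths (hypothetical, not data from the paper).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))  # 8

# Figures reported in the abstract.
print(round(percent_increase(1643, 3067)))  # 87  (N50, bp)
print(round(percent_increase(393, 516)))    # 31  (assembly length, Mb)
```

The two percentage checks reproduce the reported "87%" and "more than 30%" improvements from the before and after values given in the text.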


July 7, 2019

Draft genome sequences of semiconstitutive red, dry, and rough biofilm-forming commensal and uropathogenic Escherichia coli isolates.

Strains of Escherichia coli exhibit diverse biofilm formation capabilities. E. coli K-12 expresses the red, dry, and rough (rdar) morphotype below 30°C, whereas clinical isolates frequently display the rdar morphotype semiconstitutively. We sequenced the genomes of eight E. coli strains to subsequently investigate the molecular basis of semiconstitutive rdar morphotype expression. Copyright © 2017 Cimdins et al.


July 7, 2019

An antimicrobial peptide-resistant minor subpopulation of Photorhabdus luminescens is responsible for virulence.

Some of the bacterial cells in isogenic populations behave differently from others. We describe here how a new type of phenotypic heterogeneity relating to resistance to cationic antimicrobial peptides (CAMPs) is a determinant of the pathogenic infection process of the entomopathogenic bacterium Photorhabdus luminescens. We demonstrate that the resistant subpopulation, which accounts for only 0.5% of the wild-type population, causes septicemia in insects. Bacterial heterogeneity is driven by the PhoPQ two-component regulatory system and expression of pbgPE, an operon encoding proteins involved in lipopolysaccharide (LPS) modifications. We also report the characterization of a core regulon controlled by the DNA-binding PhoP protein, which governs virulence in P. luminescens. Comparative RNAseq analysis revealed an upregulation of marker genes for resistance, virulence and bacterial antagonism in the pre-existing resistant subpopulation, suggesting a greater ability to infect insect prey and to survive in cadavers. Finally, we suggest that the infection process of P. luminescens is based on a bet-hedging strategy to cope with the diverse environmental conditions experienced during the lifecycle.


July 7, 2019

An improved genome assembly uncovers prolific tandem repeats in Atlantic cod.

The first Atlantic cod (Gadus morhua) genome assembly published in 2011 was one of the early genome assemblies exclusively based on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are of a fragmented nature with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enable the generation of more contiguous genome assemblies. By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap-bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual. The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes in the genome indicates a considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.


July 7, 2019

Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm.

Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use this data alone, particularly with highly repetitive plant genomes. Errors in the raw data can lead to insertion or deletion errors (indels) in the consensus genome sequence, which in turn create significant problems for downstream analysis; for example, a single indel may shift the reading frame and incorrectly truncate a protein sequence. Here, we describe an algorithm that solves the high error rate problem by combining long, high-error reads with shorter but much more accurate Illumina sequencing reads, whose error rates average <1%. Our hybrid assembly algorithm combines these two types of reads to construct mega-reads, which are both long and accurate, and then assembles the mega-reads using the CABOG assembler, which was designed for long reads. We apply this technique to a large data set of Illumina and PacBio sequences from the species Aegilops tauschii, a large and extremely repetitive plant genome that has resisted previous attempts at assembly. We show that the resulting assembled contigs are far larger than in any previous assembly, with an N50 contig size of 486,807 nucleotides. We compare the contigs to independently produced optical maps to evaluate their large-scale accuracy, and to a set of high-quality bacterial artificial chromosome (BAC)-based assemblies to evaluate base-level accuracy. © 2017 Zimin et al.; Published by Cold Spring Harbor Laboratory Press.
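
The frameshift problem described in this abstract, where a single indel shifts the reading frame and truncates a protein, can be shown with a toy example. This is a minimal sketch and not part of MaSuRCA: it assumes the standard genetic code, and the `translate` function and both DNA strings are hypothetical and chosen only to make the effect visible.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in frame 0, stopping at the first stop codon.

    A trailing incomplete codon is ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical consensus sequence: ATG CTA AAG GAA GTT TAA
reference = "ATGCTAAAGGAAGTTTAA"
# The same sequence with a single-base deletion at index 3.
with_deletion = reference[:3] + reference[4:]

print(translate(reference))      # MLKEV
print(translate(with_deletion))  # M
```

Deleting one base turns the second codon into the stop codon TAA, so the predicted protein shrinks from five residues to one. This is the kind of downstream error that accurate consensus sequences are meant to avoid.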


July 7, 2019

HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies.

Many tools have been developed for haplotype assembly, the reconstruction of individual haplotypes using reads mapped to a reference genome sequence. Due to increasing interest in obtaining haplotype-resolved human genomes, a range of new sequencing protocols and technologies have been developed to enable the reconstruction of whole-genome haplotypes. However, existing computational methods designed to handle specific technologies do not scale well on data from different protocols. We describe a new algorithm, HapCUT2, that extends our previous method (HapCUT) to handle multiple sequencing technologies. Using simulations and whole-genome sequencing (WGS) data from multiple different data types (dilution pool sequencing, linked-read sequencing, single molecule real-time (SMRT) sequencing, and proximity ligation (Hi-C) sequencing), we show that HapCUT2 rapidly assembles haplotypes with best-in-class accuracy for all data types. In particular, HapCUT2 scales well for high sequencing coverage and rapidly assembled haplotypes for two long-read WGS data sets on which other methods struggled. Further, HapCUT2 directly models Hi-C specific error modalities, resulting in significant improvements in error rates compared to HapCUT, the only other method that could assemble haplotypes from Hi-C data. Using HapCUT2, haplotype assembly from a 90× coverage whole-genome Hi-C data set yielded high-resolution haplotypes (78.6% of variants phased in a single block) with high pairwise phasing accuracy (~98% across chromosomes). Our results demonstrate that HapCUT2 is a robust tool for haplotype assembly applicable to data from diverse sequencing technologies. © 2017 Edge et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Whole-genome sequences of Burkholderia pseudomallei isolates exhibiting decreased meropenem susceptibility.

We report here paired isogenic Burkholderia pseudomallei genomes obtained from three patients receiving intravenous meropenem for melioidosis treatment, with post-meropenem isolates developing decreased susceptibility. Two genomes were finished, and four were drafted to improved high-quality standard. These genomes will be used to identify meropenem resistance mechanisms in B. pseudomallei. Copyright © 2017 Price et al.


July 7, 2019

High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development.

Using the latest sequencing and optical mapping technologies, we have produced a high-quality de novo assembly of the apple (Malus domestica Borkh.) genome. Repeat sequences, which represented over half of the assembly, provided an unprecedented opportunity to investigate the uncharacterized regions of a tree genome; we identified a new hyper-repetitive retrotransposon sequence that was over-represented in heterochromatic regions and estimated that a major burst of different transposable elements (TEs) occurred 21 million years ago. Notably, the timing of this TE burst coincided with the uplift of the Tian Shan mountains, which is thought to be the center of the location where the apple originated, suggesting that TEs and associated processes may have contributed to the diversification of the apple ancestor and possibly to its divergence from pear. Finally, genome-wide DNA methylation data suggest that epigenetic marks may contribute to agronomically relevant aspects, such as apple fruit development.


July 7, 2019

Complete genome sequence of a phthalic acid esters degrading Mycobacterium sp. YC-RL4

Mycobacterium sp. YC-RL4 is capable of utilizing a broad range of phthalic acid esters (PAEs) as the sole source of carbon and energy for growth. Preliminary studies demonstrated its high degrading efficiency and good performance during the bioprocess with environmental samples. Here, we present the complete genome of Mycobacterium sp. YC-RL4, which consists of one circular chromosome (5,801,417 bp) and one plasmid (252,568 bp). Genomic analysis and gene annotation were performed, and many potential genes responsible for the biodegradation of PAEs were identified in the genome. These results may advance the investigation of bioremediation of PAE-contaminated environments by strain YC-RL4.


July 7, 2019

De novo yeast genome assemblies from MinION, PacBio and MiSeq platforms.

Long-read sequencing technologies such as Pacific Biosciences and Oxford Nanopore MinION are capable of producing long sequencing reads with average fragment lengths of over 10,000 base pairs and maximum lengths reaching 100,000 base pairs. Compared with short reads, the assemblies obtained from long-read sequencing platforms have much higher contig continuity and genome completeness, as long fragments are able to extend paths into problematic or repetitive regions. Many successful assembly applications of the Pacific Biosciences technology have been reported, ranging from small bacterial genomes to large plant and animal genomes. Recently, genome assemblies using Oxford Nanopore MinION data have attracted much attention due to the portability and low cost of this novel sequencing instrument. In this paper, we re-sequenced a well-characterized genome, the Saccharomyces cerevisiae S288C strain, using three different platforms: MinION, PacBio and MiSeq. We present a comprehensive metric comparison of assemblies generated by various pipelines and discuss how the platform-associated data characteristics affect the assembly quality. With a given read depth of 31X, the assemblies from both Pacific Biosciences and Oxford Nanopore MinION show excellent continuity and completeness for the 16 nuclear chromosomes, but not for the mitochondrial genome, whose reconstruction still represents a significant challenge.


July 7, 2019

Higher-order organisation of extremely amplified, potentially functional and massively methylated 5S rDNA in European pikes (Esox sp.).

Pikes represent an important genus (Esox) harbouring a pre-duplication karyotype (2n = 2x = 50) of economically important salmonid pseudopolyploids. Here, we have characterized the 5S ribosomal RNA genes (rDNA) in Esox lucius and its closely related E. cisalpinus using cytogenetic, molecular and genomic approaches. Intragenomic homogeneity and copy number estimation was carried out using Illumina reads. The higher-order structure of rDNA arrays was investigated by the analysis of long PacBio reads. Position of loci on chromosomes was determined by FISH. DNA methylation was analysed by methylation-sensitive restriction enzymes. The 5S rDNA loci occupy exclusively (peri)centromeric regions on 30-38 acrocentric chromosomes in both E. lucius and E. cisalpinus. The large number of loci is accompanied by extreme amplification of genes (>20,000 copies), which is to the best of our knowledge one of the highest copy numbers of rRNA genes in animals ever reported. Conserved secondary structures of predicted 5S rRNAs indicate that most of the amplified genes are potentially functional. Only a few SNPs were found in genic regions, indicating their high homogeneity, while intergenic spacers were more heterogeneous and several families were identified. Analysis of 10-30 kb-long molecules sequenced by the PacBio technology (containing about 40% of total 5S rDNA) revealed that the vast majority (96%) of genes are organised in large several kilobase-long blocks. Dispersed genes or short tandems were less common (4%). The adjacent 5S blocks were directly linked, separated by intervening DNA and even inverted. The 5S units differing in the intergenic spacers formed both homogeneous and heterogeneous (mixed) blocks, indicating a variable degree of homogenisation between the loci. Both E. lucius and E. cisalpinus 5S rDNA was heavily methylated at CG dinucleotides. Extreme amplification of 5S rRNA genes in the Esox genome occurred in the absence of significant pseudogenisation, suggesting its recent origin and/or intensive homogenisation processes. The dense methylation of units indicates that powerful epigenetic mechanisms have evolved in this group of fish to silence amplified genes. We discuss how the higher-order repeat structures impact on homogenisation of 5S rDNA in the genome.


July 7, 2019

Detection of an Escherichia coli sequence type 167 strain with two tandem copies of blaNDM-1 in the chromosome.

New Delhi metallo-β-lactamase-1 (NDM-1)-producing Enterobacteriaceae have disseminated rapidly throughout the world and pose an urgent threat to public health. Previous studies confirmed that the blaNDM-1 gene is typically carried in plasmids but rarely in the chromosome. We discovered a multidrug-resistant Escherichia coli strain Y5, originating from a urine sample and containing the blaNDM-1 gene, which did not transfer by either conjugation or electrotransformation. We confirmed the possibility of its chromosome location by S1-pulsed-field gel electrophoresis (PFGE) and XbaI-PFGE, followed by Southern blotting. To determine the genomic background of blaNDM-1, the genome of Y5 was completely sequenced and compared to other reference genomes. The results of our study revealed that this isolate consists of a 4.8-Mbp chromosome and three plasmids; it is an epidemic clone of sequence type (ST) 167 and shows 99% identity with Escherichia coli 6409 (GenBank accession no. CP010371), which lacks the same blaNDM-1 gene-surrounding structure as Y5. The blaNDM-1 gene is embedded in the chromosome along with two tandem copies of an insertion sequence common region 1 (ISCR1) element (sul1-ARR-3-cat-blaNDM-1-bleo-ISCR1), which appears intact in the plasmid from Proteus mirabilis (GenBank accession no. KP662515). The genomic context indicates that the ISCR1 element mediated the blaNDM-1 transposition from a single source plasmid to the chromosome. Our study is the first report of an Enterobacteriaceae strain harboring a chromosomally integrated blaNDM-1, which directly reveals the vertical spreading pattern of the gene. Close surveillance is urgently needed to monitor the emergence and potential spread of ST167 strains that harbor blaNDM-1. Copyright © 2016 American Society for Microbiology.

