Pacbio reads Archives - Page 30 of 53

July 7, 2019

A vast genomic deletion in the C56BL/6 genome affects different genes within the Ifi200 cluster on chromosome 1 and mediates obesity and insulin resistance.

Obesity, the excessive accumulation of body fat, is a highly heritable and genetically heterogeneous disorder. The complex, polygenic basis for the disease, consisting of a network of different gene variants, is still not completely known. In the current study we generated a BAC library of the obesity-prone NZO strain to clarify the genomic alteration within the gene cluster Ifi200 on chr.1 including Ifi202b, an obesity gene that, in contrast to NZO, is not expressed in the lean B6 mouse. With the PacBio sequencing data of NZO BAC clones we identified a deletion spanning approximately 261.8 kb in the B6 reference genome. The deletion affects different members of the Ifi200 gene family and also includes the original first exon and 5′-regulatory parts of the Ifi202b gene, and is suggested to be the relevant cause of its expression deficiency in B6. In addition, the generation and characterization of congenic mice carrying the critical fragment on the B6 background demonstrate its crucial role for obesity and insulin resistance. Our data reveal the reconstruction of a complex genomic region on mouse chr.1 resulting from deletions and duplications of Ifi200 genes, which is suggested to be relevant for the development of obesity. The results further demonstrate the complexity of the disease and highlight the importance of studying rare genetic variants, as they can be causal for large effects.

July 7, 2019

The genome sequence of Barbarea vulgaris facilitates the study of ecological biochemistry.

The genus Barbarea has emerged as a model for evolution and ecology of plant defense compounds, due to its unusual glucosinolate profile and production of saponins, unique to the Brassicaceae. One species, B. vulgaris, includes two ‘types’, G-type and P-type, that differ in trichome density and in their glucosinolate and saponin profiles. A key difference is the stereochemistry of hydroxylation of their common phenethylglucosinolate backbone, leading to epimeric glucobarbarins. Here we report a draft genome sequence of the G-type, and re-sequencing of the P-type for comparison. This enables us to identify candidate genes underlying glucosinolate diversity and trichome density, and to study the genetics of biochemical variation for glucosinolates and saponins. B. vulgaris is resistant to the diamondback moth, and may be exploited for “dead-end” trap cropping where glucosinolates stimulate oviposition and saponins deter larvae to the extent that they die. The B. vulgaris genome will promote the study of mechanisms in ecological biochemistry to benefit crop resistance breeding.

July 7, 2019

Fallacy of the unique genome: sequence diversity within single Helicobacter pylori strains.

Many bacterial genomes are highly variable but nonetheless are typically published as a single assembled genome. Experiments tracking bacterial genome evolution have not looked at the variation present at a given point in time. Here, we analyzed the mouse-passaged Helicobacter pylori strain SS1 and its parent PMSS1 to assess intra- and intergenomic variability. Using high sequence coverage depth and experimental validation, we detected extensive genome plasticity within these H. pylori isolates, including movement of the transposable element IS607, large and small inversions, multiple single nucleotide polymorphisms, and variation in cagA copy number. The cagA gene was found as 1 to 4 tandem copies located off the cag island in both SS1 and PMSS1; this copy number variation correlated with protein expression. To gain insight into the changes that occurred during mouse adaptation, we also compared SS1 and PMSS1 and observed 46 differences that were distinct from the within-genome variation. The most substantial was an insertion in cagY, which encodes a protein required for a type IV secretion system function. We detected modifications in genes coding for two proteins known to affect mouse colonization, the HpaA neuraminyllactose-binding protein and the FutB α-1,3 lipopolysaccharide (LPS) fucosyltransferase, as well as genes predicted to modulate diverse properties. In sum, our work suggests that data from consensus genome assemblies from single colonies may be misleading by failing to represent the variability present. Furthermore, we show that high-depth genomic sequencing data of a population can be analyzed to gain insight into the normal variation within bacterial strains. IMPORTANCE: Although it is well known that many bacterial genomes are highly variable, it is nonetheless traditional to refer to, analyze, and publish “the genome” of a bacterial strain. Variability is usually reduced (“only sequence from a single colony”), ignored (“just publish the consensus”), or placed in the “too-hard” basket (“analysis of raw read data is more robust”). Now that whole-genome sequences are regularly used to assess virulence and track outbreaks, a better understanding of the baseline genomic variation present within single strains is needed. Here, we describe the variability seen in typical working stocks and colonies of pathogen Helicobacter pylori model strains SS1 and PMSS1 as revealed by use of high-coverage mate pair next-generation sequencing (NGS) and confirmed by traditional laboratory techniques. This work demonstrates that reliance on a consensus assembly as “the genome” of a bacterial strain may be misleading. Copyright © 2017 Draper et al.

July 7, 2019

Genomic sequence of ‘Candidatus Liberibacter solanacearum’ haplotype C and its comparison with haplotype A and B genomes.

Haplotypes A and B of ‘Candidatus Liberibacter solanacearum’ (CLso) are associated with diseases of solanaceous plants, especially Zebra chip disease of potato, and haplotypes C, D and E are associated with symptoms on apiaceous plants. To date, one complete genome of haplotype B and two high quality draft genomes of haplotype A have been obtained for these unculturable bacteria using metagenomics from the psyllid vector Bactericera cockerelli. Here, we present the first genomic sequences obtained for the carrot-associated CLso. These two genomic sequences of haplotype C, FIN114 (1.24 Mbp) and FIN111 (1.20 Mbp), were obtained from carrot psyllids (Trioza apicalis) harboring CLso. Genomic comparisons between the haplotypes A, B and C revealed that the genome organization differs between these haplotypes, due to large inversions and other recombinations. Comparison of protein-coding genes indicated that the core genome of CLso consists of 885 ortholog groups, with the pan-genome consisting of 1327 ortholog groups. Twenty-seven ortholog groups are unique to CLso haplotype C, whilst 11 ortholog groups shared by haplotypes A and B are not found in haplotype C. Some of these ortholog groups that are not part of the core genome may encode functions related to interactions with the different host plant and psyllid species.
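The core genome, pan-genome and haplotype-specific counts reported above are set operations over ortholog groups. The sketch below illustrates that bookkeeping only; the group identifiers (og1 to og6) and the function name summarize_ortholog_groups are made up for illustration and are not taken from the study.

```python
def summarize_ortholog_groups(groups_by_haplotype):
    """Summarize core, pan and haplotype-specific ortholog groups.

    groups_by_haplotype maps a haplotype name to the set of ortholog
    group identifiers found in that haplotype's genome.
    """
    all_sets = list(groups_by_haplotype.values())
    core = set.intersection(*all_sets)  # groups present in every haplotype
    pan = set().union(*all_sets)        # groups present in at least one
    unique = {}
    for name, groups in groups_by_haplotype.items():
        others = [g for other, g in groups_by_haplotype.items() if other != name]
        unique[name] = groups - set().union(*others)
    return core, pan, unique


# Hypothetical toy data: identifiers are placeholders, not real ortholog groups.
toy = {
    "A": {"og1", "og2", "og3", "og5"},
    "B": {"og1", "og2", "og3", "og6"},
    "C": {"og1", "og2", "og4"},
}
core, pan, unique = summarize_ortholog_groups(toy)
shared_ab_not_c = (toy["A"] & toy["B"]) - toy["C"]
print(len(core), len(pan), sorted(unique["C"]), sorted(shared_ab_not_c))
```

In this toy example the core genome is {og1, og2}, the pan-genome has six groups, og4 is unique to haplotype C, and og3 is shared by A and B but missing from C.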

July 7, 2019

Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data.

Long-read sequencing can overcome the weaknesses of short reads in the assembly of eukaryotic genomes; however, at present additional scaffolding is needed to achieve chromosome-level assemblies. We generated PacBio long-read data of the genomes of three relatives of the model plant Arabidopsis thaliana and assembled all three genomes into only a few hundred contigs. To improve the contiguities of these assemblies, we generated BioNano Genomics optical mapping and Dovetail Genomics chromosome conformation capture data for genome scaffolding. Despite their technical differences, optical mapping and chromosome conformation capture performed similarly and doubled N50 values. After improving both integration methods, assembly contiguity reached chromosome-arm levels. We rigorously assessed the quality of contigs and scaffolds using Illumina mate-pair libraries and genetic map information. This showed that PacBio assemblies have high sequence accuracy but can contain several misassemblies, which join unlinked regions of the genome. Most, but not all, of these mis-joints were removed during the integration of the optical mapping and chromosome conformation capture data. Even though none of the centromeres was fully assembled, the scaffolds revealed large parts of some centromeric regions, even including some of the heterochromatic regions, which are not present in gold standard reference sequences. Published by Cold Spring Harbor Laboratory Press.
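N50, the contiguity statistic quoted above, is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of the usual definition follows; the lengths are invented for illustration and do not come from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum past half is the N50.
        if running >= half_total:
            return length


# Made-up lengths: six contigs, then the same bases after three are joined.
contigs = [100, 80, 50, 30, 20, 10]
scaffolds = [180, 80, 30]
print(n50(contigs), n50(scaffolds))
```

Joining contigs into scaffolds leaves the total length unchanged here but raises N50 from 80 to 180, which is the kind of improvement scaffolding is meant to deliver.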

July 7, 2019

Variant tolerant read mapping using min-hashing

DNA read mapping is a ubiquitous task in bioinformatics, and many tools have been developed to solve the read mapping problem. However, two trends are changing the landscape of read mapping: First, new sequencing technologies provide very long reads with high error rates (up to 15%). Second, many genetic variants in the population are known, so the reference genome is not considered as a single string over ACGT, but as a complex object containing these variants. Most existing read mappers do not handle these new circumstances appropriately.
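The abstract does not describe how min-hashing is applied, so the following is only a generic sketch of the technique: each sequence is reduced to a short signature of minimum k-mer hash values, and the fraction of matching signature positions estimates the Jaccard similarity of the two k-mer sets. The function names, the choice of SHA-1 as hash family, and the parameters k and num_hashes are assumptions made for illustration, not the tool's actual design.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer; the seed selects the hash function."""
    digest = hashlib.sha1(f"{seed}:{kmer}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(seq, k=4, num_hashes=64):
    """One minimum hash value per seeded hash function."""
    kmer_set = kmers(seq, k)
    if not kmer_set:
        raise ValueError("sequence is shorter than k")
    return [min(seeded_hash(km, seed) for km in kmer_set)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two sequences agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Hypothetical sequences: a reference window and a read with one substitution.
window = "ACGTACGGTCAGTCAGGATC"
read = "ACGTACGGTCTGTCAGGATC"
print(estimate_jaccard(minhash_signature(window), minhash_signature(read)))
```

Because a single substitution changes only the k-mers that overlap it, the signatures of a noisy read and its true origin still share many minima, which is what makes the approach attractive for long, error-prone reads.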

July 7, 2019

Genome scaffolding and annotation for the pathogen vector Ixodes ricinus by ultra-long single molecule sequencing.

Global warming and other ecological changes have facilitated the expansion of Ixodes ricinus tick populations. Ixodes ricinus is the most important carrier of vector-borne pathogens in Europe, transmitting viruses, protozoa and bacteria, in particular Borrelia burgdorferi (sensu lato), the causative agent of Lyme borreliosis, the most prevalent vector-borne disease in humans in the Northern hemisphere. To control this disease vector more rapidly, a better understanding of the I. ricinus tick is necessary. To facilitate such studies, we recently published the first reference genome of this highly prevalent pathogen vector. Here, we further extend these studies by scaffolding and annotating the first reference genome by using ultra-long sequencing reads from third generation single molecule sequencing. In addition, we present the first genome size estimation for I. ricinus ticks and the embryo-derived cell line IRE/CTVM19. A total of 235,953 contigs were integrated into 204,904 scaffolds, extending the currently known genome lengths by more than 30% from 393 to 516 Mb and the N50 contig value by 87% from 1643 bp to an N50 scaffold value of 3067 bp. In addition, 25,263 sequences were annotated by comparison to the tick’s North American relative Ixodes scapularis. After (conserved) hypothetical proteins, zinc finger proteins, secreted proteins and P450 coding proteins were the most prevalent protein categories annotated. Interestingly, more than 50% of the amino acid sequences matching the homology threshold had 95-100% identity to the corresponding I. scapularis gene models. The sequence information was complemented by the first genome size estimation for this species. Flow cytometry-based genome size analysis revealed a haploid genome size of 2.65 Gb for I. ricinus ticks and 3.80 Gb for the cell line. We present a first draft sequence map of the I. ricinus genome based on a PacBio-Illumina assembly. The I. ricinus genome was shown to be 26% (500 Mb) larger than the genome of its American relative I. scapularis. Based on the genome size of 2.65 Gb we estimated that we covered about 67% of the non-repetitive sequences. Genome annotation will facilitate screening for specific molecular pathways in I. ricinus cells and provides an overview of characteristics and functions.

July 7, 2019

Draft genome sequence of Halolamina pelagica CDK2 isolated from natural salterns from Rann of Kutch, Gujarat, India.

Halolamina pelagica strain CDK2, a halophilic archaeon (growth range 1.36 to 5.12 M NaCl), was isolated from the rhizosphere of wild grasses in hypersaline soil of the Rann of Kutch, Gujarat, India. Its draft genome contains 2,972,542 bp and 3,485 coding sequences, including genes for halophilic serine proteases and trehalose synthesis. Copyright © 2017 Gaba et al.

July 7, 2019

Complete genome and plasmid sequences of Staphylococcus aureus EDCC 5055 (DSM 28763), used to study implant-associated infections.

Staphylococcus aureus EDCC 5055 (DSM 28763) is a human clinical wound isolate intensively used to study implant-associated infections in rabbit and rat infection models. Here, we report its complete genome sequence (2,794,437 bp) along with that of one plasmid (27,437 bp). This strain belongs to sequence type 8 and contains a mecA gene. Copyright © 2017 Mannala et al.

July 7, 2019

The genome sequence of an oxytetracycline-resistant isolate of the fish pathogen Piscirickettsia salmonis harbors a multidrug resistance plasmid.

The amount of antibiotics needed to counteract frequent piscirickettsiosis outbreaks is a major concern for the Chilean salmon industry. Resistance to antibiotics may contribute to this issue. To understand the genetics underlying Piscirickettsia salmonis-resistant phenotypes, the genome of AY3800B, an oxytetracycline-resistant isolate bearing a multidrug resistance plasmid, is presented here. Copyright © 2017 Bohle et al.

July 7, 2019

Non hybrid long read consensus using local De Bruijn graph assembly

While second generation sequencing led to a vast increase in sequenced data, the shorter reads which came with it made assembly a much harder task and, for some regions, impossible with only short read data. This changed again with the advent of third generation long read sequencers. The length of the long reads allows a much better resolution of repetitive regions; their high error rate, however, is a major challenge. Using the data successfully requires removing most of the sequencing errors. The first hybrid correction methods used low noise second generation data to correct third generation data, but this approach has issues when it is unclear where to place the short reads due to repeats, and also because second generation sequencers fail to sequence some regions which third generation sequencers work on. Later, non hybrid methods appeared. We present a new method for non hybrid long read error correction based on De Bruijn graph assembly of short windows of long reads, with subsequent combination of these corrected windows into corrected long reads. Our experiments show that this method yields a better correction than other state of the art non hybrid correction approaches.
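To make the idea of a local De Bruijn graph consensus concrete, here is a toy sketch and not the authors' implementation: the overlapping copies of one short window are broken into k-mers, edges seen fewer than min_count times are treated as sequencing errors, and a greedy walk along the best-supported edges spells out a consensus. The function names, the greedy traversal, and the parameters k and min_count are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def build_edges(window_reads, k):
    """Count De Bruijn edges: each k-mer links its (k-1)-prefix to its (k-1)-suffix."""
    edges = Counter()
    for read in window_reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def window_consensus(window_reads, k=4, min_count=2):
    """Greedy consensus of one window from its noisy read copies."""
    prefixes = Counter(r[:k - 1] for r in window_reads if len(r) >= k)
    if not prefixes:
        raise ValueError("no read is at least k bases long")
    graph = defaultdict(list)
    for (src, dst), count in build_edges(window_reads, k).items():
        if count >= min_count:  # drop weakly supported (likely erroneous) edges
            graph[src].append((count, dst))
    node = prefixes.most_common(1)[0][0]  # start at the most common read prefix
    consensus = node
    visited = {node}
    while graph.get(node):
        _, nxt = max(graph[node])  # follow the best-supported edge
        if nxt in visited:         # stop rather than loop through a repeat
            break
        visited.add(nxt)
        consensus += nxt[-1]
        node = nxt
    return consensus


# Hypothetical window: three clean copies and one copy with a substitution.
copies = ["ACGTTGCA", "ACGTTGCA", "ACGATGCA", "ACGTTGCA"]
print(window_consensus(copies))
```

With these four copies the k-mers introduced by the single substitution each occur once and are filtered out, so the walk recovers ACGTTGCA. A real corrector would also need to handle insertions and deletions, repeats inside a window, and the stitching of neighbouring windows, none of which this sketch attempts.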

July 7, 2019

Complete genome sequence of Edwardsiella hoshinae ATCC 35051.

Edwardsiella hoshinae is a Gram-negative facultative anaerobe that has primarily been isolated from avians and reptiles. We report here the complete and annotated genome sequence of an isolate from a monitor lizard (Varanus sp.), which contains a chromosome of 3,811,650 bp and no plasmids. Copyright © 2017 Reichley et al.

July 7, 2019

An antimicrobial peptide-resistant minor subpopulation of Photorhabdus luminescens is responsible for virulence.

Some of the bacterial cells in isogenic populations behave differently from others. We describe here how a new type of phenotypic heterogeneity relating to resistance to cationic antimicrobial peptides (CAMPs) is determinant for the pathogenic infection process of the entomopathogenic bacterium Photorhabdus luminescens. We demonstrate that the resistant subpopulation, which accounts for only 0.5% of the wild-type population, causes septicemia in insects. Bacterial heterogeneity is driven by the PhoPQ two-component regulatory system and expression of pbgPE, an operon encoding proteins involved in lipopolysaccharide (LPS) modifications. We also report the characterization of a core regulon controlled by the DNA-binding PhoP protein, which governs virulence in P. luminescens. Comparative RNAseq analysis revealed an upregulation of marker genes for resistance, virulence and bacterial antagonism in the pre-existing resistant subpopulation, suggesting a greater ability to infect insect prey and to survive in cadavers. Finally, we suggest that the infection process of P. luminescens is based on a bet-hedging strategy to cope with the diverse environmental conditions experienced during the lifecycle.

July 7, 2019

Hybrid sequencing and map finding (HySeMaFi): optional strategies for extensively deciphering gene splicing and expression in organisms without reference genome.

Using second-generation sequencing (SGS) RNA-Seq strategies, extensive alternative splicing prediction is impractical and high variability of isoform expression quantification is inevitable in organisms without a true reference dataset. We report the development of a novel analysis method, termed hybrid sequencing and map finding (HySeMaFi), which combines the specific strengths of third-generation sequencing (TGS) (PacBio SMRT sequencing) and SGS (Illumina Hi-Seq/MiSeq sequencing) to effectively decipher gene splicing and to reliably estimate isoform abundance. Error-corrected long reads from TGS are capable of capturing full length transcripts or large partial transcript fragments. Both true and false isoforms from a particular gene, as well as those containing all possible exons, could be generated by employing different assembly methods in SGS. We first develop an effective method which can establish the mapping relationship between the error-corrected long reads and the longest assembled contig in every corresponding gene. According to the mapping data, the true splicing pattern of the genes was reliably detected, and quantification of the isoforms was also effectively determined. HySeMaFi is also the optimal strategy by which to decipher the full exon expression of a specific gene when the longest mapped contigs were chosen as the reference set.

July 7, 2019

Simultaneous emergence of multidrug-resistant Candida auris on 3 continents confirmed by whole-genome sequencing and epidemiological analyses.

Candida auris, a multidrug-resistant yeast that causes invasive infections, was first described in 2009 in Japan and has since been reported from several countries. To understand the global emergence and epidemiology of C. auris, we obtained isolates from 54 patients with C. auris infection from Pakistan, India, South Africa, and Venezuela during 2012-2015 and the type specimen from Japan. Patient information was available for 41 of the isolates. We conducted antifungal susceptibility testing and whole-genome sequencing (WGS). Available clinical information revealed that 41% of patients had diabetes mellitus, 51% had undergone recent surgery, 73% had a central venous catheter, and 41% were receiving systemic antifungal therapy when C. auris was isolated. The median time from admission to infection was 19 days (interquartile range, 9-36 days), 61% of patients had bloodstream infection, and 59% died. Using stringent break points, 93% of isolates were resistant to fluconazole, 35% to amphotericin B, and 7% to echinocandins; 41% were resistant to 2 antifungal classes and 4% were resistant to 3 classes. WGS demonstrated that isolates were grouped into unique clades by geographic region. Clades were separated by thousands of single-nucleotide polymorphisms, but within each clade isolates were clonal. Different mutations in ERG11 were associated with azole resistance in each geographic clade. C. auris is an emerging healthcare-associated pathogen associated with high mortality. Treatment options are limited, due to antifungal resistance. WGS analysis suggests nearly simultaneous, and recent, independent emergence of different clonal populations on 3 continents. Risk factors and transmission mechanisms need to be elucidated to guide control measures. Published by Oxford University Press for the Infectious Diseases Society of America 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.
