Genome assembly Archives

July 7, 2019

The value of new genome references.

Genomic information has become a ubiquitous and almost essential aspect of biological research. Over the last 10-15 years, the cost of generating sequence data from DNA or RNA samples has declined dramatically, and our ability to interpret those data has increased just as remarkably. Although it is still possible for biologists to conduct interesting and valuable research on species for which genomic data are not available, the impact of having access to a high-quality whole genome reference assembly for a given species is nothing short of transformational. Research on a species for which we have no DNA or RNA sequence data is restricted in fundamental ways. In contrast, even access to an initial draft-quality genome (see below for definitions) opens a wide range of opportunities that are simply not available without that reference genome assembly. Although a complete discussion of the impact of genome sequencing and assembly is beyond the scope of this short paper, the goal of this review is to summarize the most common and highest-impact contributions that whole genome sequencing and assembly have made to comparative and evolutionary biology. Copyright © 2016. Published by Elsevier Inc.

July 7, 2019

The genome sequence of Barbarea vulgaris facilitates the study of ecological biochemistry.

The genus Barbarea has emerged as a model for evolution and ecology of plant defense compounds, due to its unusual glucosinolate profile and production of saponins, unique to the Brassicaceae. One species, B. vulgaris, includes two ‘types’, G-type and P-type, that differ in trichome density and in their glucosinolate and saponin profiles. A key difference is the stereochemistry of hydroxylation of their common phenethylglucosinolate backbone, leading to epimeric glucobarbarins. Here we report a draft genome sequence of the G-type, and re-sequencing of the P-type for comparison. This enables us to identify candidate genes underlying glucosinolate diversity and trichome density, and to study the genetics of biochemical variation for glucosinolates and saponins. B. vulgaris is resistant to the diamondback moth, and may be exploited for “dead-end” trap cropping where glucosinolates stimulate oviposition and saponins deter larvae to the extent that they die. The B. vulgaris genome will promote the study of mechanisms in ecological biochemistry to benefit crop resistance breeding.

July 7, 2019

Genome sequence of a unique Magnaporthe oryzae RMg-Dl isolate from India that causes blast disease in diverse cereal crops, obtained using PacBio single-molecule and Illumina HiSeq2500 sequencing.

The whole-genome assembly of a unique rice isolate from India, Magnaporthe oryzae RMg-Dl, that causes blast disease in diverse cereal crops is presented. Analysis of the 34.82 Mb genome sequence will aid in better understanding the genetic determinants of host range, host jump, survival, pathogenicity, and virulence factors of M. oryzae. Copyright © 2017 Kumar et al.

July 7, 2019

Cytosine methylation at CpCpG sites triggers accumulation of non-CpG methylation in gene bodies.

Methylation of cytosine is an epigenetic mark involved in the regulation of transcription, usually associated with transcriptional repression. In mammals, methylated cytosines are found predominantly in CpGs but in plants non-CpG methylation (in the CpHpG or CpHpH contexts, where H is A, C or T) is also present and is associated with the transcriptional silencing of transposable elements. In addition, CpG methylation is found in coding regions of active genes. In the absence of the demethylase of lysine 9 of histone 3 (IBM1), a subset of body-methylated genes acquires non-CpG methylation. This was shown to alter their expression and affect plant development. It is not clear why only certain body-methylated genes gain non-CpG methylation in the absence of IBM1 and others do not. Here we describe a link between CpG methylation and the establishment of methylation in the CpHpG context that explains the two classes of body-methylated genes. We provide evidence that external cytosines of CpCpG sites can only be methylated when internal cytosines are methylated. CpCpG sites methylated in both cytosines promote spreading of methylation in the CpHpG context in genes protected by IBM1. In contrast, CpCpG sites remain unmethylated in IBM1-independent genes and do not promote spread of CpHpG methylation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
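The three sequence contexts named in this abstract can be made concrete with a small classifier. The following is only an illustrative sketch, not the authors' pipeline: the function names `cytosine_context` and `ccg_sites` and the example sequence are assumptions made here, and only the forward strand is considered.

```python
def cytosine_context(seq, i):
    """Classify the cytosine at index i of seq (forward strand only).

    Returns "CpG", "CpHpG" or "CpHpH" (H = A, C or T), or None when
    position i is not a C or the downstream bases are missing/ambiguous.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    downstream = seq[i + 1 : i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CpG"
    if len(downstream) < 2 or any(b not in "ACGT" for b in downstream):
        return None
    # At this point the first downstream base is A, C or T (that is, H).
    return "CpHpG" if downstream[1] == "G" else "CpHpH"


def ccg_sites(seq):
    """Return (external, internal) cytosine index pairs for every CCG site."""
    seq = seq.upper()
    sites = []
    start = seq.find("CCG")
    while start != -1:
        sites.append((start, start + 1))
        start = seq.find("CCG", start + 1)
    return sites


# Hypothetical toy sequence, chosen only to show each context once.
example = "ACCGTCAGCTT"
contexts = {i: cytosine_context(example, i) for i, b in enumerate(example) if b == "C"}
```

In a CpCpG site the external cytosine is itself in a CpHpG context (with H = C) while the internal cytosine is in a CpG context, which is why the abstract can link CpG methylation to the establishment of CpHpG methylation at the same site.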

July 7, 2019

RelA mutant Enterococcus faecium with multiantibiotic tolerance arising in an immunocompromised host.

Serious bacterial infections in immunocompromised patients require highly effective antibacterial therapy for cure, and thus, this setting may reveal novel mechanisms by which bacteria circumvent antibiotics in the absence of immune pressure. Here, an infant with leukemia developed vancomycin-resistant Enterococcus faecium (VRE) bacteremia that persisted for 26 days despite appropriate antibiotic therapy. Sequencing of 22 consecutive VRE isolates identified the emergence of a single missense mutation (L152F) in relA, which constitutively activated the stringent response, resulting in elevated baseline levels of the alarmone guanosine tetraphosphate (ppGpp). Although the mutant remained susceptible to both linezolid and daptomycin in clinical MIC testing and during planktonic growth, it demonstrated tolerance to high doses of both antibiotics when growing in a biofilm. This biofilm-specific gain in resistance was reflected in the broad shift in transcript levels caused by the mutation. Only an experimental biofilm-targeting ClpP-activating antibiotic was able to kill the mutant strain in an established biofilm. The relA mutation was associated with a fitness trade-off, forming smaller and less-well-populated biofilms on biological surfaces. We conclude that clinically relevant relA mutations can emerge during prolonged VRE infection, causing baseline activation of the stringent response, subsequent antibiotic tolerance, and delayed eradication in an immunocompromised state.

The increasing prevalence of antibiotic-resistant bacterial pathogens is a major challenge currently facing the medical community. Such pathogens are of particular importance in immunocompromised patients as these individuals may favor emergence of novel resistance determinants due to lack of innate immune defenses and intensive antibiotic exposure. During the course of chemotherapy, a patient developed prolonged bacteremia with vancomycin-resistant Enterococcus faecium that failed to clear despite multiple front-line antibiotics. The consecutive bloodstream isolates were sequenced, and a single missense mutation was identified in the relA gene, the mediator of the stringent response. Strains harboring the mutation had elevated baseline levels of the alarmone and displayed heightened resistance to the bactericidal activity of multiple antibiotics, particularly in a biofilm. Using a new class of compounds that modulate ClpP activity, the biofilms were successfully eradicated. These data represent the first clinical emergence of mutations in the stringent response in vancomycin-resistant enterococci. Copyright © 2017 Honsa et al.

July 7, 2019

Comparative genomics and transcriptome analysis of Aspergillus niger and metabolic engineering for citrate production.

Despite a long and successful history of citrate production in Aspergillus niger, the molecular mechanism of citrate accumulation is only partially understood. In this study, we used comparative genomics and transcriptome analysis of citrate-producing strains, namely A. niger H915-1 (citrate titer: 157 g L⁻¹), A1 (117 g L⁻¹), and L2 (76 g L⁻¹), to gain a genome-wide view of the mechanism of citrate accumulation. Compared with A. niger A1 and L2, A. niger H915-1 contained 92 mutated genes, including a succinate-semialdehyde dehydrogenase in the γ-aminobutyric acid shunt pathway and an aconitase family protein involved in citrate synthesis. Furthermore, transcriptome analysis of A. niger H915-1 revealed that the transcription levels of 479 genes changed between the cell growth stage (6 h) and the citrate synthesis stage (12 h, 24 h, 36 h, and 48 h). In the glycolysis pathway, triosephosphate isomerase was up-regulated, whereas pyruvate kinase was down-regulated. Two cytosol ATP-citrate lyases, which take part in the cycle of citrate synthesis, were up-regulated, and may coordinate with the alternative oxidases in the alternative respiratory pathway for energy balance. Finally, deletion of the oxaloacetate acetylhydrolase gene in H915-1 eliminated oxalate formation, but no effect on the pH decrease and no difference in citrate production were observed.

July 7, 2019

Complete genome sequences of three multidrug-resistant clinical isolates of Streptococcus pneumoniae serotype 19A with different susceptibilities to the myxobacterial metabolite carolacton.

The full-genome sequences of three drug- and multidrug-resistant Streptococcus pneumoniae clinical isolates of serotype 19A were determined by PacBio single-molecule real-time sequencing, in combination with Illumina MiSeq sequencing. A comparison to the genomes of other pneumococci indicates a high nucleotide sequence identity to strains Hungary19A-6 and TCH8431/19A. Copyright © 2017 Donner et al.

July 7, 2019

The mitochondrial genome sequences of the round goby and the sand goby reveal patterns of recent evolution in gobiid fish.

Vertebrate mitochondrial genomes are optimized for fast replication and low cost of RNA expression. Accordingly, they are devoid of introns, are transcribed as polycistrons and contain very little intergenic sequence. Usually, vertebrate mitochondrial genomes measure between 16.5 and 17 kilobases (kb). During genome sequencing projects for two novel vertebrate models, the invasive round goby and the sand goby, we found that the sand goby genome is exceptionally small (16.4 kb), while the mitochondrial genome of the round goby is much larger than expected for a vertebrate. It is 19 kb in size and is thus one of the largest fish and even vertebrate mitochondrial genomes known to date. The expansion is attributable to a sequence insertion downstream of the putative transcriptional start site. This insertion carries traces of repeats from the control region, but is mostly novel. To get more information about this phenomenon, we gathered all available mitochondrial genomes of Gobiidae and of nine gobioid species, performed phylogenetic analyses, analysed gene arrangements, and compared gobiid mitochondrial genome sizes, ecological information and other species characteristics with respect to the mitochondrial phylogeny. This allowed us, among other things, to identify a unique arrangement of tRNAs among Ponto-Caspian gobies. Our results indicate that the round goby mitochondrial genome may contain novel features. Since mitochondrial genome organisation is tightly linked to energy metabolism, these features may be linked to its invasion success. Also, the unique tRNA arrangement among Ponto-Caspian gobies may be helpful in studying the evolution of this highly adaptive and invasive species group. Finally, we find that the phylogeny of gobiids can be further refined by the use of longer stretches of linked DNA sequence.

July 7, 2019

Fallacy of the unique genome: sequence diversity within single Helicobacter pylori strains.

Many bacterial genomes are highly variable but nonetheless are typically published as a single assembled genome. Experiments tracking bacterial genome evolution have not looked at the variation present at a given point in time. Here, we analyzed the mouse-passaged Helicobacter pylori strain SS1 and its parent PMSS1 to assess intra- and intergenomic variability. Using high sequence coverage depth and experimental validation, we detected extensive genome plasticity within these H. pylori isolates, including movement of the transposable element IS607, large and small inversions, multiple single nucleotide polymorphisms, and variation in cagA copy number. The cagA gene was found as 1 to 4 tandem copies located off the cag island in both SS1 and PMSS1; this copy number variation correlated with protein expression. To gain insight into the changes that occurred during mouse adaptation, we also compared SS1 and PMSS1 and observed 46 differences that were distinct from the within-genome variation. The most substantial was an insertion in cagY, which encodes a protein required for a type IV secretion system function. We detected modifications in genes coding for two proteins known to affect mouse colonization, the HpaA neuraminyllactose-binding protein and the FutB α-1,3 lipopolysaccharide (LPS) fucosyltransferase, as well as genes predicted to modulate diverse properties. In sum, our work suggests that data from consensus genome assemblies from single colonies may be misleading by failing to represent the variability present. Furthermore, we show that high-depth genomic sequencing data of a population can be analyzed to gain insight into the normal variation within bacterial strains.

IMPORTANCE Although it is well known that many bacterial genomes are highly variable, it is nonetheless traditional to refer to, analyze, and publish “the genome” of a bacterial strain. Variability is usually reduced (“only sequence from a single colony”), ignored (“just publish the consensus”), or placed in the “too-hard” basket (“analysis of raw read data is more robust”). Now that whole-genome sequences are regularly used to assess virulence and track outbreaks, a better understanding of the baseline genomic variation present within single strains is needed. Here, we describe the variability seen in typical working stocks and colonies of pathogen Helicobacter pylori model strains SS1 and PMSS1 as revealed by use of high-coverage mate pair next-generation sequencing (NGS) and confirmed by traditional laboratory techniques. This work demonstrates that reliance on a consensus assembly as “the genome” of a bacterial strain may be misleading. Copyright © 2017 Draper et al.

July 7, 2019

Genomic sequence of ‘Candidatus Liberibacter solanacearum’ haplotype C and its comparison with haplotype A and B genomes.

Haplotypes A and B of ‘Candidatus Liberibacter solanacearum’ (CLso) are associated with diseases of solanaceous plants, especially Zebra chip disease of potato, and haplotypes C, D and E are associated with symptoms on apiaceous plants. To date, one complete genome of haplotype B and two high-quality draft genomes of haplotype A have been obtained for these unculturable bacteria using metagenomics from the psyllid vector Bactericera cockerelli. Here, we present the first genomic sequences obtained for the carrot-associated CLso. These two genomic sequences of haplotype C, FIN114 (1.24 Mbp) and FIN111 (1.20 Mbp), were obtained from carrot psyllids (Trioza apicalis) harboring CLso. Genomic comparisons between haplotypes A, B and C revealed that the genome organization differs between these haplotypes, due to large inversions and other recombinations. Comparison of protein-coding genes indicated that the core genome of CLso consists of 885 ortholog groups, with the pan-genome consisting of 1327 ortholog groups. Twenty-seven ortholog groups are unique to CLso haplotype C, whilst 11 ortholog groups shared by haplotypes A and B are not found in haplotype C. Some of the ortholog groups that are not part of the core genome may encode functions related to interactions with the different host plant and psyllid species.
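The core and pan-genome counts quoted above follow from simple set operations on ortholog group membership. The sketch below illustrates the idea under stated assumptions: the group identifiers (`OG1`, `OG2`, ...) and the toy membership table are invented for illustration and are not data from the study, and the helper names `core_and_pan` and `unique_to` are hypothetical.

```python
def core_and_pan(groups_by_genome):
    """Core genome = groups present in every genome; pan-genome = groups in any."""
    sets = [set(groups) for groups in groups_by_genome.values()]
    return set.intersection(*sets), set.union(*sets)


def unique_to(name, groups_by_genome):
    """Ortholog groups found only in the named genome."""
    others = set().union(
        *(set(g) for n, g in groups_by_genome.items() if n != name)
    )
    return set(groups_by_genome[name]) - others


# Hypothetical toy membership table (not the CLso data).
toy = {
    "haplotype A": {"OG1", "OG2", "OG3", "OG4"},
    "haplotype B": {"OG1", "OG2", "OG3", "OG5"},
    "haplotype C": {"OG1", "OG2", "OG6"},
}

core, pan = core_and_pan(toy)
only_c = unique_to("haplotype C", toy)
# Groups shared by A and B but missing from C.
ab_not_c = (toy["haplotype A"] & toy["haplotype B"]) - toy["haplotype C"]
```

With real data the same operations would be applied to the ortholog clustering output, where the hard part is the clustering itself rather than the set arithmetic.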

July 7, 2019

Genome sequence of Streptomyces sp. H-KF8, a marine actinobacterium isolated from a northern Chilean Patagonian fjord.

Streptomyces sp. H-KF8 is a fjord-derived marine actinobacterium that exhibits antimicrobial activity. Streptomyces sp. H-KF8 was isolated from sediments of the Comau fjord, located in northern Chilean Patagonia. Here, we report the 7.7-Mb genome assembly, which represents the first genome of a Chilean marine actinobacterium. Copyright © 2017 Undabarrena et al.

July 7, 2019

Complete genome sequences of three Cupriavidus strains isolated from various Malaysian environments.

Cupriavidus sp. USMAA1020, USMAA2-4, and USMAHM13 are capable of producing polyhydroxyalkanoate (PHA). This biopolymer is an alternative solution to synthetic plastics, whereby polyhydroxyalkanoate synthase is the key enzyme involved in PHA biosynthesis. Here, we report the complete genomes of three Cupriavidus sp. strains: USMAA1020, USMAA2-4, and USMAHM13. Copyright © 2017 Shafie et al.

July 7, 2019

Complete genome sequence of Thermus brockianus GE-1 reveals key enzymes of xylan/xylose metabolism.

Thermus brockianus strain GE-1 is a thermophilic, Gram-negative, rod-shaped and non-motile bacterium that was isolated from the Geysir geothermal area, Iceland. Like other thermophiles, Thermus species are often used as model organisms to understand the mechanism of action of extremozymes, especially focusing on their heat-activity and thermostability. Genome-specific features of T. brockianus GE-1 and their properties further help to explain processes of the adaptation of extremophiles to elevated temperatures. Here we analyze the first whole genome sequence of T. brockianus strain GE-1. Insights into the genome sequence and the methodologies that were applied during de novo assembly and annotation are given in detail. The finished genome shows a phred quality value of QV50. The complete genome size is 2.38 Mb, comprising the chromosome (2,035,182 bp), the megaplasmid pTB1 (342,792 bp) and the smaller plasmid pTB2 (10,299 bp). Gene prediction revealed 2,511 genes in total, including 2,458 protein-encoding genes, 53 RNA and 66 pseudo genes. A unique genomic region on megaplasmid pTB1 was identified encoding key enzymes for xylan depolymerization and xylose metabolism. This is in agreement with the growth experiments in which xylan is utilized as sole source of carbon. Accordingly, we identified sequences encoding the xylanase Xyn10, an endoglucanase, the membrane ABC sugar transporter XylH, the xylose-binding protein XylF, the xylose isomerase XylA catalyzing the first step of xylose metabolism and the xylulokinase XylB, responsible for the second step of xylose metabolism. Our data indicate that an ancestor of T. brockianus obtained the ability to use xylose as an alternative carbon source by horizontal gene transfer.

July 7, 2019

Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data.

Long-read sequencing can overcome the weaknesses of short reads in the assembly of eukaryotic genomes; however, at present additional scaffolding is needed to achieve chromosome-level assemblies. We generated PacBio long-read data of the genomes of three relatives of the model plant Arabidopsis thaliana and assembled all three genomes into only a few hundred contigs. To improve the contiguities of these assemblies, we generated BioNano Genomics optical mapping and Dovetail Genomics chromosome conformation capture data for genome scaffolding. Despite their technical differences, optical mapping and chromosome conformation capture performed similarly and doubled N50 values. After improving both integration methods, assembly contiguity reached chromosome-arm level. We rigorously assessed the quality of contigs and scaffolds using Illumina mate-pair libraries and genetic map information. This showed that PacBio assemblies have high sequence accuracy but can contain several misassemblies, which join unlinked regions of the genome. Most, but not all, of these misjoins were removed during the integration of the optical mapping and chromosome conformation capture data. Even though none of the centromeres was fully assembled, the scaffolds revealed large parts of some centromeric regions, even including some of the heterochromatic regions, which are not present in gold standard reference sequences. Published by Cold Spring Harbor Laboratory Press.
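The N50 statistic mentioned in this abstract is the length of the contig (or scaffold) at which the cumulative length, counted from the longest sequence downward, first reaches half of the total assembly length. A minimal sketch of that common definition follows; the function name `n50` and the toy length lists are assumptions for illustration, not values from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly: five contigs totalling 300 units.
contigs = [100, 80, 60, 40, 20]
# The same sequence content after scaffolding joins the two longest contigs.
scaffolds = [180, 60, 40, 20]

contig_n50 = n50(contigs)
scaffold_n50 = n50(scaffolds)
```

Because N50 rewards joins regardless of whether they are correct, a higher value alone does not show a better assembly, which is why the abstract pairs it with checks against mate-pair libraries and genetic maps.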

July 7, 2019

Variant tolerant read mapping using min-hashing

DNA read mapping is a ubiquitous task in bioinformatics, and many tools have been developed to solve the read mapping problem. However, there are two trends that are changing the landscape of read mapping: First, new sequencing technologies provide very long reads with high error rates (up to 15%). Second, many genetic variants in the population are known, so the reference genome is not considered as a single string over ACGT, but as a complex object containing these variants. Most existing read mappers do not handle these new circumstances appropriately.
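The title refers to min-hashing, which the excerpt above does not describe. As general background, and not as a reconstruction of this paper's mapper, the sketch below shows the basic idea: each sequence is reduced to a short signature of minimum hash values over its k-mers, and the fraction of matching signature positions estimates the Jaccard similarity of the two k-mer sets. The function names, the choice of `k`, the number of hash functions and the use of salted BLAKE2b are all assumptions made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer; the seed selects the hash function."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(seq, k=5, num_hashes=64):
    """Signature = minimum hash value over all k-mers, for each hash function."""
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, seed) for km in ks) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical toy sequences: a reference window and a read with one mismatch.
reference = "ACGTTGCATGCCGATAGCTAGGCTA"
read = "ACGTTGCATGCCGTTAGCTAGGCTA"

sig_ref = minhash_signature(reference)
sig_read = minhash_signature(read)
similarity = estimate_jaccard(sig_ref, sig_read)
```

Comparing fixed-length signatures instead of full k-mer sets is what makes this kind of scheme attractive for long, error-prone reads: a few substitutions change only some k-mers, so the estimated similarity degrades gradually rather than dropping to zero.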
