Pacbio reads Archives

July 7, 2019

The draft genome of MD-2 pineapple using hybrid error correction of long reads.

The introduction of the elite pineapple variety MD-2 has caused a significant market shift in the pineapple industry. Better productivity, an overall increase in fruit quality and taste, resilience to chilled storage and resistance to internal browning are among the key advantages of MD-2 compared with its predecessor, Smooth Cayenne. Here, we present the genome sequence of the MD-2 pineapple (Ananas comosus (L.) Merr.), obtained with a hybrid sequencing strategy that combines PacBio long reads with accurate Illumina short reads. Our draft genome achieved 99.6% genome coverage with 27,017 predicted protein-coding genes, and 45.21% of the genome was identified as repetitive elements. Furthermore, differential expression analysis of an RNA-Seq library from ripening pineapple fruit revealed ethylene-related transcripts that are believed to be involved in regulating non-climacteric ripening of pineapple fruit. The MD-2 pineapple draft genome shows how a complex heterozygous genome can be sequenced with a hybrid approach that is both economical and accurate. The genome will make genomic applications more feasible as a means of understanding complex biological processes specific to pineapple. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
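To make the idea of hybrid error correction concrete, the toy sketch below corrects a long read by a majority vote over short reads stacked on it. It is a simplified illustration and not the method used for the MD-2 genome: it assumes the short reads have already been placed on the long read without insertions or deletions (the hypothetical `placements` list of offset/sequence pairs), whereas real PacBio errors are dominated by indels and real correctors depend on gapped alignment.

```python
from collections import Counter

def correct_long_read(long_read, placements, min_depth=2):
    """Toy hybrid correction: replace each long-read base with the majority
    base from the short reads covering it.

    ASSUMPTION: `placements` is a hypothetical list of (offset, short_read)
    pairs giving gapless placements of accurate short reads on `long_read`.
    Positions supported by fewer than `min_depth` short-read bases keep
    their original long-read base.
    """
    pileup = [Counter() for _ in long_read]
    for offset, seq in placements:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(long_read):
                pileup[pos][base] += 1
    corrected = []
    for original, counts in zip(long_read, pileup):
        if sum(counts.values()) >= min_depth:
            corrected.append(counts.most_common(1)[0][0])  # majority short-read base
        else:
            corrected.append(original)  # too little evidence: leave unchanged
    return "".join(corrected)

# Toy data: the long read carries one substitution error at index 4 (T instead of A).
long_read = "ACGTTAGCCA"
placements = [(0, "ACGTAAG"), (2, "GTAAGCC"), (3, "TAAGCCA")]
print(correct_long_read(long_read, placements))  # ACGTAAGCCA
```

Keeping low-depth positions unchanged, rather than guessing, mirrors the general principle that short reads can only fix regions they actually cover.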

July 7, 2019

Draft genome sequence of an inbred line of Chenopodium quinoa, an allotetraploid crop with great environmental adaptability and outstanding nutritional properties.

Chenopodium quinoa Willd. (quinoa) originated in the Andean region of South America and is a pseudocereal crop of the Amaranthaceae family. Quinoa is emerging as an important crop with the potential to contribute to food security worldwide and is considered to be an optimal food source for astronauts, due to its outstanding nutritional profile and ability to tolerate stressful environments. Furthermore, plant pathologists use quinoa as a representative diagnostic host to identify virus species. However, molecular analysis of quinoa is limited by its genetic heterogeneity due to outcrossing and by its genome complexity derived from allotetraploidy. To overcome these obstacles, we established Kd, an inbred and standard quinoa accession that enables rigorous molecular analysis, and present the draft genome sequence of Kd, obtained using an optimized combination of high-throughput next-generation sequencing on the Illumina HiSeq 2500 and PacBio RS II sequencers. The de novo genome assembly contained 25k scaffolds totaling 1 Gbp, with an N50 length of 86 kbp. Based on these data, we constructed the free-access Quinoa Genome DataBase (QGDB). These findings provide insights into the mechanisms underlying agronomically important traits of quinoa and the effect of allotetraploidy on genome evolution. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
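The quinoa assembly is summarized by its scaffold N50 (86 kbp). As a reminder of what that statistic measures, here is a minimal sketch of an N50 calculation from a list of sequence lengths; the lengths in the example are made up and are not quinoa data.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of length >= L
    together contain at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no sequences supplied")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembly
            return length

# Hypothetical scaffold lengths (total 300): 100 + 80 = 180 >= 150, so N50 = 80.
print(n50([100, 80, 50, 40, 30]))  # 80
```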

July 7, 2019

Complete genome sequence of the hyperthermophilic and piezophilic archaeon Thermococcus piezophilus CDGS(T), able to grow under extreme hydrostatic pressures

We report the genome sequence of Thermococcus superprofundus strain CDGS(T), a new piezophilic and hyperthermophilic member of the order Thermococcales isolated from the world’s deepest hydrothermal vents, at the Mid-Cayman Rise. The genome is consistent with a heterotrophic, anaerobic, and piezophilic lifestyle. Copyright © 2016 Dalmasso et al.

July 7, 2019

Cloche is a bHLH-PAS transcription factor that drives haemato-vascular specification.

Vascular and haematopoietic cells organize into specialized tissues during early embryogenesis to supply essential nutrients to all organs and thus play critical roles in development and disease. At the top of the haemato-vascular specification cascade lies cloche, a gene that when mutated in zebrafish leads to the striking phenotype of loss of most endothelial and haematopoietic cells and a significant increase in cardiomyocyte numbers. Although this mutant has been analysed extensively to investigate mesoderm diversification and differentiation and continues to be broadly used as a unique avascular model, the isolation of the cloche gene has been challenging due to its telomeric location. Here we used a deletion allele of cloche to identify several new cloche candidate genes within this genomic region, and systematically genome-edited each candidate. Through this comprehensive interrogation, we succeeded in isolating the cloche gene and discovered that it encodes a PAS-domain-containing bHLH transcription factor, and that it is expressed in a highly specific spatiotemporal pattern starting during late gastrulation. Gain-of-function experiments show that it can potently induce endothelial gene expression. Epistasis experiments reveal that it functions upstream of etv2 and tal1, the earliest expressed endothelial and haematopoietic transcription factor genes identified to date. A mammalian cloche orthologue can also rescue blood vessel formation in zebrafish cloche mutants, indicating a highly conserved role in vertebrate vasculogenesis and haematopoiesis. The identification of this master regulator of endothelial and haematopoietic fate enhances our understanding of early mesoderm diversification and may lead to improved protocols for the generation of endothelial and haematopoietic cells in vivo and in vitro.

July 7, 2019

Genomic characterization of the Atlantic cod sex-locus.

A variety of sex determination mechanisms can be observed in evolutionarily divergent teleosts. Sex determination is genetic in Atlantic cod (Gadus morhua); however, the genomic location and size of its sex-locus are unknown. Here, we characterize the sex-locus of Atlantic cod using whole genome sequence (WGS) data from 227 wild-caught specimens. Analyzing more than 55 million polymorphic loci, we identify 166 loci that are associated with sex. These loci are located in six distinct regions on five different linkage groups (LG) in the genome. The largest of these regions, an approximately 55 kb region on LG11, contains the majority of genotypes that segregate closely according to an XX-XY system. Genotypes in this region can be used to genetically determine sex, whereas those in the other regions are inconsistently sex-linked. The identified region on LG11 and its surrounding genes have no clear sequence homology with genes or regulatory elements associated with sex determination or differentiation in other species. The functionality of this sex-locus therefore remains unknown. The WGS strategy used here proved adequate for detecting the small regions associated with sex in this species. Our results highlight the evolutionary flexibility of the genomic architecture underlying teleost sex determination and allow practical applications to genetically sex Atlantic cod.
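Under an XX-XY system, a variant carried only on the Y haplotype appears heterozygous in every male and homozygous in every female, which is what makes such genotypes usable for genetic sexing. The sketch below expresses that logic in its simplest form. It is a simplification, not the study's association analysis: the genotype coding (count of alternate alleles), the function names and the 0.5 threshold are all hypothetical.

```python
from typing import Sequence

# ASSUMPTION: genotypes are coded as the number of alternate alleles (0, 1 or 2).

def is_xy_consistent(genotypes: Sequence[int], sexes: Sequence[str]) -> bool:
    """True if every female ("F") is homozygous and every male ("M") is
    heterozygous, the pattern expected for a Y-linked variant under XX-XY."""
    if len(genotypes) != len(sexes):
        raise ValueError("one genotype per sample is required")
    for g, sex in zip(genotypes, sexes):
        if sex == "F" and g == 1:
            return False
        if sex == "M" and g != 1:
            return False
    return True

def predict_sex(sample_genotypes: Sequence[int], threshold: float = 0.5) -> str:
    """Hypothetical sexing rule: call a sample male if more than `threshold`
    of its genotypes at the sex-linked loci are heterozygous."""
    if not sample_genotypes:
        raise ValueError("no sex-linked genotypes supplied")
    het = sum(1 for g in sample_genotypes if g == 1)
    return "M" if het / len(sample_genotypes) > threshold else "F"

# Toy panel of four sexed samples; locus2 fails because one female is heterozygous.
sexes = ["F", "F", "M", "M"]
loci = {"locus1": [0, 0, 1, 1], "locus2": [0, 1, 1, 1], "locus3": [2, 2, 1, 1]}
sex_linked = [name for name, g in loci.items() if is_xy_consistent(g, sexes)]
print(sex_linked)           # ['locus1', 'locus3']
print(predict_sex([1, 1]))  # M
```

Requiring a perfect pattern is deliberately strict; with real data, genotyping errors would call for a tolerance or a statistical test instead.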

July 7, 2019

The two chromosomes of the mitochondrial genome of a sugarcane cultivar: assembly and recombination analysis using long PacBio reads.

Sugarcane accounts for a large portion of the world's sugar production. Modern commercial cultivars are complex hybrids of S. officinarum and several other Saccharum species. Historical records identify New Guinea as the origin of S. officinarum and indicate that a small number of plants originating there were used to generate all modern commercial cultivars. The mitochondrial genome can be a useful way to identify the maternal origin of commercial cultivars. We used the PacBio RS II to sequence and assemble the mitochondrial genome of a South East Asian commercial cultivar known as Khon Kaen 3. The long read length of this sequencing technology allowed the mitochondrial genome to be assembled into two distinct circular chromosomes, with all repeat sequences spanned by individual reads. Comparison of five commercial hybrids, two S. officinarum accessions and one S. spontaneum accession to our assembly reveals no structural rearrangements between our assembly, the commercial hybrids and an S. officinarum from New Guinea. The S. spontaneum, from India, and one sample of S. officinarum (unknown origin) are substantially rearranged and have a large number of homozygous variants. This supports the record that S. officinarum plants from New Guinea are the maternal source of all modern commercial hybrids.
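The claim that every repeat is "spanned by individual reads" can be stated as a simple interval check: some single read alignment must cover the whole repeat plus some unique flanking sequence on both sides. The sketch below assumes half-open alignment coordinates on a linearized assembly, uses a hypothetical 500 bp flank requirement, and ignores reads that wrap around the origin of a circular chromosome.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on the assembly

def spanned_repeats(repeats: List[Interval], reads: List[Interval],
                    flank: int = 500) -> List[bool]:
    """For each repeat, report whether at least one read alignment covers the
    entire repeat plus `flank` bases of sequence on each side.

    ASSUMPTIONS: linear coordinates (no wrap-around on circular molecules)
    and a hypothetical default flank of 500 bp.
    """
    result = []
    for r_start, r_end in repeats:
        spanned = any(a_start <= r_start - flank and a_end >= r_end + flank
                      for a_start, a_end in reads)
        result.append(spanned)
    return result

# Toy coordinates: the first repeat is spanned, the second (20 kb long) is not.
repeats = [(10_000, 15_000), (40_000, 60_000)]
reads = [(9_000, 16_000), (35_000, 50_000)]
print(spanned_repeats(repeats, reads))  # [True, False]
```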

July 7, 2019

The complete chloroplast genome sequence of the medicinal plant Swertia mussotii using the PacBio RS II platform.

Swertia mussotii is an important medicinal plant of great economic and medicinal value that is found on the Qinghai-Tibetan Plateau. The complete chloroplast (cp) genome of S. mussotii is 153,431 bp in size, with a pair of inverted repeat (IR) regions of 25,761 bp each that separate a large single-copy (LSC) region of 83,567 bp and a small single-copy (SSC) region of 18,342 bp. The S. mussotii cp genome encodes 84 protein-coding genes, 37 transfer RNA (tRNA) genes, and eight ribosomal RNA (rRNA) genes. The identity, number, and GC content of S. mussotii cp genes were similar to those in the genomes of other Gentianales species. Analysis of the repeat structure detected 11 forward repeats, eight palindromic repeats, and one reverse repeat in the S. mussotii cp genome. There are 45 SSRs in the S. mussotii cp genome, the majority of which are mononucleotide repeats, as in all other Gentianales species. A whole cp genome comparison of S. mussotii and two other species in Gentianaceae was also conducted. The complete cp genome sequence provides intragenic information for cp genetic engineering of this medicinal plant.
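The reported region sizes are internally consistent. A quadripartite chloroplast genome consists of the LSC, the SSC and two copies of the IR, so its total length is LSC + SSC + 2 × IR. A short check using only the figures from the abstract:

```python
# Region lengths (bp) as reported for the S. mussotii chloroplast genome.
LSC = 83_567  # large single-copy region
SSC = 18_342  # small single-copy region
IR = 25_761   # length of each of the two inverted repeats

def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir

# 83,567 + 18,342 + 2 * 25,761 = 101,909 + 51,522 = 153,431
print(quadripartite_length(LSC, SSC, IR))  # 153431, matching the reported size
```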

July 7, 2019

Genome sequence and annotation of Colletotrichum higginsianum, a causal agent of crucifer anthracnose disease.

Colletotrichum higginsianum is an ascomycete fungus causing anthracnose disease on numerous cultivated plants in the family Brassicaceae, as well as on the model plant Arabidopsis thaliana. We report an assembly of the nuclear genome and gene annotation of this pathogen, which was obtained using a combination of PacBio long-read sequencing and optical mapping. Copyright © 2016 Zampounis et al.

July 7, 2019

DBG2OLC: Efficient assembly of large genomes using long erroneous reads of the third generation sequencing technologies.

The highly anticipated transition from next generation sequencing (NGS) to third generation sequencing (3GS) has been difficult primarily due to high error rates and excessive sequencing cost. The high error rates make the assembly of long erroneous reads of large genomes challenging because existing software solutions are often overwhelmed by error correction tasks. Here we report a hybrid assembly approach that simultaneously utilizes NGS and 3GS data to address both issues. Our approach rests on three basic design principles: (i) Compact representation of the long reads leads to efficient alignments. (ii) Base-level errors can be skipped; structural errors need to be detected and corrected. (iii) Structurally correct 3GS reads are assembled and polished. In our implementation, preassembled NGS contigs are used to derive the compact representation of the long reads, motivating an algorithmic conversion from a de Bruijn graph to an overlap graph, the two major assembly paradigms. Moreover, since NGS and 3GS data can compensate for each other, our hybrid assembly approach reduces the sequencing requirements of both. Experiments show that our software is able to assemble mammalian-sized genomes orders of magnitude more quickly than existing methods, with modest memory usage, while saving about half of the sequencing cost.
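Principle (i) can be illustrated in a few lines: if each long read is rewritten as the ordered list of NGS contigs it contains, overlaps between reads become comparisons of short ID lists rather than base-level alignments, and base errors between anchors are simply skipped. The sketch below is a heavily simplified illustration of that idea, not DBG2OLC's implementation: the unique k-mer anchoring, exact suffix-prefix overlap and all function names are hypothetical choices, and it ignores reverse complements, contig orientation and error-tolerant overlap scoring.

```python
from typing import Dict, List

def build_anchor_index(contigs: Dict[str, str], k: int) -> Dict[str, str]:
    """Map every k-mer that occurs in exactly one contig to that contig's ID."""
    seen: Dict[str, str] = {}
    ambiguous = set()
    for cid, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in seen and seen[kmer] != cid:
                ambiguous.add(kmer)  # shared between contigs: not a usable anchor
            seen[kmer] = cid
    return {kmer: cid for kmer, cid in seen.items() if kmer not in ambiguous}

def compact_read(read: str, index: Dict[str, str], k: int) -> List[str]:
    """Represent a long read as the ordered contig IDs its k-mers hit,
    collapsing consecutive hits to the same contig. Read k-mers that match
    no contig (for example, because of sequencing errors) are skipped."""
    ids: List[str] = []
    for i in range(len(read) - k + 1):
        cid = index.get(read[i:i + k])
        if cid is not None and (not ids or ids[-1] != cid):
            ids.append(cid)
    return ids

def overlap_len(a: List[str], b: List[str]) -> int:
    """Length, in contig IDs, of the longest suffix of `a` equal to a prefix of `b`."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

# Toy data: three contigs with disjoint 4-mers and two reads bridging them.
K = 4
contigs = {"c1": "AAAACCCC", "c2": "GGGGTTTT", "c3": "TGCATGCA"}
index = build_anchor_index(contigs, K)
read1 = compact_read("AAAACCCCTGGGGTTTT", index, K)
read2 = compact_read("GGGGTTTTATGCATGCA", index, K)
print(read1, read2)               # ['c1', 'c2'] ['c2', 'c3']
print(overlap_len(read1, read2))  # 1
```

In this toy, the two reads overlap by one shared contig, which is the kind of edge an overlap graph built on compact reads would record.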

July 7, 2019

Genomic insight into the host-endosymbiont relationship of Endozoicomonas montiporae CL-33(T) with its coral host.

The bacterial genus Endozoicomonas has been commonly detected in healthy corals in many studies of coral-associated bacteria over the past decade. Although it is likely to be a core member of the coral microbiota, little is known about its ecological roles. To decipher potential interactions between bacteria and their coral hosts, we sequenced and investigated the first culturable endozoicomonal bacterium from coral, E. montiporae CL-33(T). Its genome showed potential signs of ongoing genome erosion and gene exchange with its host. Testosterone degradation and a type III secretion system are commonly present in Endozoicomonas and may have roles in recognizing and delivering effectors to their hosts. Moreover, genes for eukaryotic ephrin ligand B2 are present in its genome; presumably, this bacterium could enter coral cells via endocytosis after binding to the coral's Eph receptors. In addition, 7,8-dihydro-8-oxoguanine triphosphatase and isocitrate lyase are possible type III secretion effectors that might help the coral prevent mitochondrial dysfunction and promote gluconeogenesis, especially under stress conditions. Based on all these findings, we inferred that E. montiporae is a facultative endosymbiont that can recognize, translocate, communicate with and modulate its coral host.

July 7, 2019

Complete genome sequence of thermophilic Bacillus smithii type strain DSM 4216(T).

Bacillus smithii is a facultatively anaerobic, thermophilic bacterium able to use a variety of sugars that can be derived from lignocellulosic feedstocks. Being genetically accessible, it is a potential new host for the biotechnological production of green chemicals from renewable resources. We determined the complete genomic sequence of the B. smithii type strain DSM 4216(T), which consists of a 3,368,778 bp chromosome (GenBank accession number CP012024.1) and a 12,514 bp plasmid (GenBank accession number CP012025.1), together encoding 3880 genes. Genome annotation via RAST was complemented by a protein domain analysis. Unique features of B. smithii central metabolism, in comparison to related organisms, included the lack of a standard acetate production pathway, with no apparent pyruvate formate lyase, phosphotransacetylase, or acetate kinase genes, even though acetate was the second fermentation product.

July 7, 2019

Variation of 45S rDNA intergenic spacers in Arabidopsis thaliana.

Approximately seven hundred 45S rRNA genes (rDNA) in the Arabidopsis thaliana genome are organised in two 4 Mbp-long arrays of tandem repeats, arranged head-to-tail and separated by intergenic spacers (IGS). These arrays make up 5% of the A. thaliana genome. IGS are rapidly evolving sequences, and frequent rearrangements inside the rDNA loci have generated considerable interspecific and even intra-individual variability, which makes it possible to distinguish among otherwise highly conserved rRNA genes. The IGS has not been comprehensively described despite its potential importance in the regulation of rDNA transcription and replication. Here we describe the detailed sequence variation in the complete IGS of A. thaliana WT plants and provide the reference/consensus IGS sequence, together with an analysis of genomic DNA. We further investigate mutants dysfunctional in chromatin assembly factor-1 (CAF-1) (fas1 and fas2 mutants), which are known to have a reduced number of rDNA copies, and plant lines with restored CAF-1 function (segregated from a fas1xfas2 genetic background) that show major rDNA rearrangements. The systematic rDNA loss in CAF-1 mutants leads to decreased IGS variability and to the occurrence of distinct IGS variants. We present for the first time a comprehensive and representative set of complete IGS sequences, obtained by conventional cloning and by Pacific Biosciences sequencing. Our data expand the knowledge of A. thaliana IGS sequence arrangement and variability, which has not been available in full and in detail until now. This is also the first study combining IGS sequencing data with RFLP analysis of genomic DNA.
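Combining IGS sequences with RFLP analysis relies on the fact that a known sequence predicts the fragment sizes a restriction enzyme will produce. The sketch below performs such an in silico digest. It is an illustration only: the toy sequence is not an IGS sequence, the function name is hypothetical, and it assumes a linear molecule and a recognition site without degenerate bases. EcoRI (G^AATTC, cutting after the first base) is used as the example enzyme.

```python
import re
from typing import List

def digest(seq: str, site: str, cut_offset: int) -> List[int]:
    """Return fragment lengths after cutting `seq` at every occurrence of
    `site`, with the cut placed `cut_offset` bases into the site.

    ASSUMPTIONS: linear molecule, exact (non-degenerate) recognition site.
    A zero-distance lookahead is used so that overlapping sites are found.
    """
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Toy 20 bp sequence with EcoRI sites at positions 2 and 12 (cuts at 3 and 13).
print(digest("AAGAATTCTTTTGAATTCGG", "GAATTC", 1))  # [3, 10, 7]
```

Differences in these predicted fragment patterns between IGS variants are what make them distinguishable on an RFLP blot.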

July 7, 2019

Comparative genomics and transcriptomics of Pichia pastoris.

Pichia pastoris has emerged as an important alternative host for producing recombinant biopharmaceuticals, owing to its high cultivation density, low host cell protein burden, and the development of strains with humanized glycosylation. Despite its demonstrated utility, relatively little strain engineering has been performed to improve Pichia, due in part to the limited number and inconsistent frameworks of reported genomes and transcriptomes. Furthermore, the co-mingling of genomic, transcriptomic and fermentation data collected about Komagataella pastoris and Komagataella phaffii, the two strains co-branded as Pichia, has generated confusion about host performance for these genetically distinct species. Generation of comparative high-quality genomes and transcriptomes will enable meaningful comparisons between the organisms and potentially inform distinct biotechnological utilities for each species. Here, we present a comprehensive and standardized comparative analysis of the genomic features of the three most commonly used strains comprising the tradename Pichia: K. pastoris wild-type, K. phaffii wild-type, and K. phaffii GS115. We used a combination of long-read (PacBio) and short-read (Illumina) sequencing technologies to achieve over 1000X coverage of each genome. Each genome was then constructed from as few as seven individual contigs, creating gap-free assemblies. We found substantial syntenic rearrangements between the species and characterized a linear plasmid present in K. phaffii. Comparative analyses between K. phaffii genomes enabled the characterization of the mutational landscape of the GS115 strain. We identified and examined 35 non-synonymous coding mutations present in GS115, many of which are likely to impact strain performance. Additionally, we investigated transcriptomic profiles of gene expression for both species during cultivation on various carbon sources.
We observed that the most highly transcribed genes in both organisms were consistently highly expressed on all three carbon sources examined. We also observed selective expression of certain genes on each carbon source, including many sequences not previously reported as promoters for the expression of heterologous proteins in yeasts. Our studies establish a foundation for understanding critical relationships between genome structure, cultivation conditions and gene expression. The resources we report here will inform and facilitate rational, organism-wide strain engineering for improved utility as a host for protein production.
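Calling a coding mutation "non-synonymous" means that the single-base change alters the encoded amino acid. The sketch below shows how such a classification can be made for one substitution in a coding sequence. It assumes a forward-strand, in-frame, uppercase DNA coding sequence and the standard genetic code; the function name and the separate "nonsense" label are illustrative choices, not the study's pipeline.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position `pos` of an in-frame coding
    sequence as 'synonymous', 'nonsynonymous' or 'nonsense' (new stop codon).

    ASSUMPTIONS: forward strand, uppercase DNA, standard genetic code.
    """
    frame = pos % 3
    start = pos - frame
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:frame] + alt + ref_codon[frame + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "nonsynonymous"

cds = "ATGGCTTGGTAA"  # toy CDS: Met-Ala-Trp-Stop
print(classify_snv(cds, 5, "C"))  # synonymous (GCT -> GCC, Ala -> Ala)
print(classify_snv(cds, 3, "A"))  # nonsynonymous (GCT -> ACT, Ala -> Thr)
print(classify_snv(cds, 8, "A"))  # nonsense (TGG -> TGA, Trp -> Stop)
```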

July 7, 2019

Assemblytics: a web analytics tool for the detection of variants from an assembly.

Assemblytics is a web app for detecting and analyzing variants from a de novo genome assembly aligned to a reference genome. It incorporates a unique anchor filtering approach to increase robustness to repetitive elements, and identifies six classes of variants based on their distinct alignment signatures. Assemblytics can be applied both to compare aberrant genomes, such as human cancers, to a reference and to identify differences between related species. Multiple interactive visualizations enable in-depth exploration of the genomic distributions of variants. Assemblytics is available at http://assemblytics.com and https://github.com/marianattestad/assemblytics. Contact: mnattest@cshl.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
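The idea of reading variants from "alignment signatures" can be illustrated with two consecutive alignments of the same contig: the difference between the gap on the query and the gap on the reference gives the size of the implied insertion or deletion, and overlapping alignments suggest a repeat copy gained or lost. The sketch below is a loose illustration of that idea only. It does not reproduce Assemblytics' actual six-class rules or its unique anchor filtering, assumes forward-strand alignments ordered along the query, and uses hypothetical class and function names.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Alignment:
    """One alignment block between a contig (query) and the reference."""
    ref_start: int
    ref_end: int
    query_start: int
    query_end: int

def between_alignment_variant(left: Alignment, right: Alignment) -> Tuple[str, int]:
    """Classify the variant implied by the gap between two consecutive
    forward-strand alignments of one contig, ordered along the query.

    ASSUMPTION: a simplified rule set, not the one used by Assemblytics.
    Returns (kind, size_in_bp).
    """
    ref_gap = right.ref_start - left.ref_end
    query_gap = right.query_start - left.query_end
    size = query_gap - ref_gap  # > 0: extra query sequence; < 0: missing sequence
    if size == 0:
        return "none", 0
    overlapping = ref_gap < 0 or query_gap < 0  # a stretch is aligned twice
    if size > 0:
        return ("repeat expansion" if overlapping else "insertion"), size
    return ("repeat contraction" if overlapping else "deletion"), -size

a = Alignment(0, 1_000, 0, 1_000)
print(between_alignment_variant(a, Alignment(1_000, 2_000, 1_300, 2_300)))  # ('insertion', 300)
print(between_alignment_variant(a, Alignment(1_500, 2_500, 1_000, 2_000)))  # ('deletion', 500)
print(between_alignment_variant(a, Alignment(900, 2_000, 1_100, 2_200)))    # ('repeat expansion', 200)
```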

July 7, 2019

Whole-genome sequencing recommendations

Recent technological developments have revolutionized the way we perform genetic analyses. In particular, whole-genome sequencing provides access to the entire genetic makeup of an individual, and it is now an affordable approach for many research groups. As a consequence, genome sequencing is pervading many fields of biological research. Sequencing technologies are evolving rapidly, and so are their applications. Here we provide a first primer on whole-genome sequencing, focusing on two of the most popular applications: (1) de novo genome sequencing, in which the objective is to obtain a high-quality genome assembly that can serve as a reference for a species or variety, and (2) genome resequencing, in which a reference genome is available and the objective is to map sequence variation of an individual or a set of individuals. It is not our intention to provide a comprehensive overview of current methodologies that will likely soon become obsolete, but rather to focus on general principles with broader applicability.
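One general principle that applies to both de novo sequencing and resequencing is planning sequencing depth. Under the idealized Lander-Waterman model, mean coverage is C = N × L / G (N reads of length L on a genome of size G), and the expected fraction of bases with no reads is e^(-C). The sketch below encodes those formulas. It assumes uniformly random read placement, which real data with GC and repeat biases violate, and the example genome size and read length are hypothetical.

```python
import math

def mean_coverage(num_reads: int, read_length: float, genome_size: float) -> float:
    """Expected sequencing depth C = N * L / G."""
    return num_reads * read_length / genome_size

def fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman expectation of the fraction of bases with zero reads,
    e^-C, assuming reads start uniformly at random along the genome."""
    return math.exp(-coverage)

def reads_needed(target_coverage: float, read_length: float, genome_size: float) -> int:
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)

# Hypothetical plan: 30x coverage of a 500 Mbp genome with 150 bp reads.
n = reads_needed(30, 150, 500e6)
print(n)                                 # 100000000
print(mean_coverage(n, 150, 500e6))      # 30.0
print(f"{fraction_uncovered(30):.1e}")   # expected uncovered fraction under the model
```

Real projects usually budget well above the idealized figure, because biased coverage leaves far more bases uncovered than e^(-C) predicts.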

