Gap filling Archives - Page 15 of 19

July 7, 2019

Single-molecule sequencing and Hi-C-based proximity-guided assembly of amaranth (Amaranthus hypochondriacus) chromosomes provide insights into genome evolution.

Amaranth (Amaranthus hypochondriacus) was a food staple among the ancient civilizations of Central and South America that has recently received increased attention due to the high nutritional value of its seeds, which have the potential to help alleviate malnutrition and food security concerns, particularly in arid and semiarid regions of the developing world. Here, we present a reference-quality assembly of the amaranth genome that will assist the agronomic development of the species. Utilizing single-molecule, real-time sequencing (Pacific Biosciences) and chromatin interaction mapping (Hi-C) to close assembly gaps and scaffold contigs, respectively, we improved our previously reported Illumina-based assembly to produce a chromosome-scale assembly with a scaffold N50 of 24.4 Mb. The 16 largest scaffolds contain 98% of the assembly and likely represent the haploid chromosomes (n = 16). To demonstrate the accuracy and utility of this approach, we produced physical and genetic maps and identified candidate genes for the betalain pigmentation pathway. The chromosome-scale assembly facilitated a genome-wide syntenic comparison of amaranth with other Amaranthaceae species, revealing chromosome loss and fusion events in amaranth that explain the reduction from the ancestral haploid chromosome number (n = 18) for a tetraploid member of the Amaranthaceae. The assembly method reported here minimizes cost by relying primarily on short-read technology and is one of the first reported uses of in vivo Hi-C for assembly of a plant genome. Our analyses implicate chromosome loss and fusion as major evolutionary events in the 2n = 32 amaranths and clearly establish the homoeologous relationship among most of the subgenome chromosomes, which will facilitate future investigations of intragenomic changes that occurred post-polyploidization.

July 7, 2019

The cacao Criollo genome v2.0: an improved version of the genome for genetic and functional genomic studies.

Theobroma cacao L., native to the Amazonian basin of South America, is an economically important fruit tree crop for tropical countries as a source of chocolate. The first draft genome of the species, from a Criollo cultivar, was published in 2011. Although a useful resource, it could be improved in several ways, including identifying misassemblies, reducing the number of scaffolds and gaps, and anchoring un-anchored sequences to the 10 chromosomes. We used an NGS-based approach to significantly improve the assembly of the Belizean Criollo B97-61/B2 genome. We combined four Illumina large-insert mate-pair libraries with 52x coverage of Pacific Biosciences long reads to correct misassembled regions and reduce the number of scaffolds. We then used genotyping-by-sequencing (GBS) methods to increase the proportion of the assembly anchored to chromosomes. The scaffold number decreased from 4,792 in assembly V1 to 554 in V2, while the scaffold N50 increased from 0.47 Mb in V1 to 6.5 Mb in V2. A total of 96.7% of the assembly was anchored to the 10 chromosomes, compared with 66.8% in the previous version. Unknown sites (Ns) were reduced from 10.8% to 5.7%. In addition, we updated the functional annotations and performed a new RefSeq structural annotation based on RNA-seq evidence. Theobroma cacao Criollo genome version 2 will be a valuable resource for the investigation of complex traits at the genomic level and for future comparative genomics and genetics studies in the cacao tree. New functional tools and annotations are available on the Cocoa Genome Hub ( http://cocoa-genome-hub.southgreen.fr ).
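Several abstracts on this page report contiguity as a contig or scaffold N50 (for example, the cacao scaffold N50 rising from 0.47 Mb to 6.5 Mb). As a rough illustration only, the sketch below computes N50 under its common definition: the length L such that sequences of length L or longer together contain at least half of the assembled bases. The function name and the toy lengths are hypothetical and are not taken from any of the studies above.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the total to >= 50%.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


# Hypothetical scaffold lengths in Mb (illustrative, not from any study above).
scaffolds_mb = [8.0, 6.5, 4.0, 2.0, 1.0, 0.5]
print(n50(scaffolds_mb))  # prints 6.5
```

Here the total is 22.0 Mb; the two longest scaffolds already sum to 14.5 Mb, which is past the 11 Mb halfway point, so the N50 is the length of the second scaffold. Merging scaffolds (as gap filling and Hi-C scaffolding do) shifts more bases into longer sequences and raises this value.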

July 7, 2019

Genomic and functional analysis of Romboutsia ilealis CRIBT reveals adaptation to the small intestine.

The microbiota in the small intestine relies on its capacity to rapidly import and ferment available carbohydrates to survive in a complex and highly competitive ecosystem. Understanding how these communities function requires elucidating the role of their key players, the interactions among them, and their interactions with the environment/host. The genome of the gut bacterium Romboutsia ilealis CRIBT was sequenced with multiple technologies (Illumina paired-end, mate-pair and PacBio). The transcriptome was sequenced (Illumina HiSeq) after growth on three different carbohydrate sources, and short-chain fatty acids were measured via HPLC. We present the complete genome of Romboutsia ilealis CRIBT, a natural inhabitant and key player of the small intestine of rats. R. ilealis CRIBT possesses a circular chromosome of 2,581,778 bp and a plasmid of 6,145 bp, carrying 2,351 and eight predicted protein-coding sequences, respectively. Analysis of the genome revealed limited capacity to synthesize amino acids and vitamins, whereas multiple and partially redundant pathways for the utilization of different, relatively simple carbohydrates are present. Transcriptome analysis allowed identification of the key components in the degradation of glucose, L-fucose and fructo-oligosaccharides. This revealed that R. ilealis CRIBT is adapted to a nutrient-rich environment where carbohydrates, amino acids and vitamins are abundantly available.

July 7, 2019

Genome sequencing and comparative genomics reveal the potential pathogenic mechanism of Cercospora sojina Hara on soybean.

Frogeye leaf spot, caused by Cercospora sojina Hara, is a common disease of soybean in most soybean-growing countries of the world. In this study, we report a high-quality genome sequence of C. sojina obtained by Single Molecule Real-Time sequencing. The 40.8-Mb genome encodes 11,655 predicted genes, 8,474 of which are supported by RNA sequencing. The Cercospora sojina genome contains large numbers of gene clusters that are involved in the synthesis of secondary metabolites, including mycotoxins and pigments. However, far fewer genes encoding carbohydrate-binding module proteins are identified in the C. sojina genome than in other phytopathogenic fungi. Bioinformatics analysis reveals that C. sojina harbours about 752 secreted proteins, 233 of which are effectors. During early infection, the genes for metabolite biosynthesis and effectors are significantly enriched, suggesting that they may play essential roles in pathogenicity. We further identify 13 effectors that can inhibit BAX-induced cell death. Taken together, our results provide insights into the infection mechanisms of C. sojina on soybean. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

July 7, 2019

A high-quality genome assembly of quinoa provides insights into the molecular basis of salt bladder-based salinity tolerance and the exceptional nutritional value.

Chenopodium quinoa is a halophytic pseudocereal crop that is being cultivated in an ever-growing number of countries. Because quinoa is highly resistant to multiple abiotic stresses and its seed has a better nutritional value than any of the major cereals, it is regarded as a future crop to ensure global food security. We generated a high-quality genome draft using an inbred line of the quinoa cultivar Real. The quinoa genome experienced one recent genome duplication about 4.3 million years ago, likely reflecting the genome fusion of two Chenopodium parents, in addition to the γ paleohexaploidization reported for most eudicots. The genome is highly repetitive (64.5% repeat content) and contains 54 438 protein-coding genes and 192 microRNA genes, with more than 99.3% having orthologous genes in glycophytic species. Stress tolerance in quinoa is associated with the expansion of genes involved in ion and nutrient transport, ABA homeostasis and signaling, and enhanced basal-level ABA responses. Epidermal salt bladder cells exhibit characteristics similar to those of trichomes, with significantly higher expression of genes related to energy import and ABA biosynthesis compared with the leaf lamina. The quinoa genome sequence provides insights into its exceptional nutritional value and the evolution of halophytes, enabling the identification of genes involved in salinity tolerance and providing the basis for molecular breeding in quinoa.

July 7, 2019

The Tartary buckwheat genome provides insights into rutin biosynthesis and abiotic stress tolerance.

Tartary buckwheat (Fagopyrum tataricum) is an important pseudocereal crop that is strongly adapted to growth in adverse environments. Its gluten-free grain contains complete proteins with a well-balanced composition of essential amino acids and is a rich source of beneficial phytochemicals that provide significant health benefits. Here, we report a high-quality, chromosome-scale Tartary buckwheat genome sequence of 489.3 Mb, assembled by combining whole-genome shotgun sequencing of both Illumina short reads and single-molecule real-time long reads, sequence tags of a large-insert fosmid library, Hi-C sequencing data, and BioNano genome maps. We annotated 33 366 high-confidence protein-coding genes based on expression evidence. Intragenomic comparisons and comparison with the sugar beet genome revealed an independent whole-genome duplication that occurred in the buckwheat lineage after it diverged from the common ancestor, which was not shared with rosids or asterids. The reference genome facilitated the identification of many new genes predicted to be involved in rutin biosynthesis and regulation, aluminum stress resistance, and drought and cold stress responses. Our data suggest that Tartary buckwheat's ability to tolerate high levels of abiotic stress can be attributed to the expansion of several gene families involved in signal transduction, gene regulation, and membrane transport. The availability of these genomic resources will facilitate the discovery of agronomically and nutritionally important genes and the genetic improvement of Tartary buckwheat. Copyright © 2017 The Author. Published by Elsevier Inc. All rights reserved.

July 7, 2019

New insights into structural organization and gene duplication in a 1.75-Mb genomic region harboring the α-gliadin gene family in Aegilops tauschii, the source of the wheat D genome.

Among the wheat prolamins important for its end-use traits, α-gliadins are the most abundant, and are also a major cause of food-related allergies and intolerances. Previous studies of various wheat species estimated that between 25 and 150 α-gliadin genes reside in the Gli-2 locus regions. To better understand the evolution of this complex gene family, the DNA sequence of a 1.75-Mb genomic region spanning the Gli-2 locus was analyzed in the diploid grass Aegilops tauschii, the ancestral source of the D genome in hexaploid bread wheat. Comparison with orthologous regions from rice, sorghum, and Brachypodium revealed rapid and dynamic changes occurring only in the Ae. tauschii Gli-2 region, including insertions of high numbers of non-syntenic genes and a high rate of tandem gene duplications, the latter of which have given rise to 12 copies of α-gliadin genes clustered within a 550-kb region. Among them, five copies have undergone pseudogenization by various mutation events. Insights into the evolutionary relationship of the duplicated α-gliadin genes were obtained from their genomic organization, transcription patterns, transposable element insertions and phylogenetic analyses. An ancestral glutamate-like receptor (GLR) gene encoding a putative amino acid sensor in all four grass species has duplicated only in Ae. tauschii, generating three more copies that are interspersed with the α-gliadin genes. Phylogenetic inference and different gene expression patterns support functional divergence of the Ae. tauschii GLR copies after duplication. Our results suggest that the duplicates of α-gliadin and GLR genes have likely taken different evolutionary paths: conservation for the former and neofunctionalization for the latter. © 2017 The Authors. The Plant Journal © 2017 John Wiley & Sons Ltd.

July 7, 2019

The sea cucumber genome provides insights into morphological evolution and visceral regeneration.

Apart from sharing common ancestry with chordates, sea cucumbers exhibit a unique morphology and exceptional regenerative capacity. Here we present the complete genome sequence of an economically important sea cucumber, Apostichopus japonicus, generated using Illumina and PacBio platforms, to achieve an assembly of approximately 805 Mb (contig N50 of 190 kb and scaffold N50 of 486 kb), with 30,350 protein-coding genes and high continuity. We used this resource to explore key genetic mechanisms behind the unique biological characteristics of sea cucumbers. Phylogenetic and comparative genomic analyses revealed the presence of marker genes associated with notochord and gill slits, suggesting that these chordate features were present in ancestral echinoderms. The unique shape and weak mineralization of the sea cucumber adult body were also preliminarily explained by the contraction of biomineralization genes. Genome, transcriptome, and proteome analyses of organ regrowth after induced evisceration provided insight into the molecular underpinnings of visceral regeneration, including a specific tandem-duplicated prostatic secretory protein of 94 amino acids (PSP94)-like gene family and a significantly expanded fibrinogen-related protein (FREP) gene family. This high-quality genome resource will provide a useful framework for future research into biological processes and evolution in deuterostomes, including remarkable regenerative abilities that could have medical applications. Moreover, the multiomics data will be of prime value for commercial sea cucumber breeding programs.

July 7, 2019

Large-scale suppression of recombination predates genomic rearrangements in Neurospora tetrasperma.

A common feature of eukaryote genomes is large chromosomal regions where recombination is absent or strongly reduced, but the factors that cause this reduction are not well understood. Genomic rearrangements have often been implicated, but they may also be a consequence of recombination suppression rather than a cause. In this study, we generate eight high-quality genomic data sets of the filamentous ascomycete Neurospora tetrasperma, a fungus that lacks recombination over most of its largest chromosome. The genomes surprisingly reveal collinearity of the non-recombining regions, and although large inversions are enriched in these regions, we conclude that these inversions are derived and not the cause of the suppression. To our knowledge, this is the first time that non-recombining, genic regions as large as 86% of a full chromosome (or 8 Mbp) have been shown to be collinear. These findings are of significant interest for our understanding of the evolution of sex chromosomes and other supergene complexes.

July 7, 2019

The asparagus genome sheds light on the origin and evolution of a young Y chromosome.

Sex chromosomes evolved from autosomes many times across the eukaryote phylogeny. Several models have been proposed to explain this transition, some involving male and female sterility mutations linked in a region of suppressed recombination between X and Y (or Z/W, U/V) chromosomes. Comparative and experimental analysis of a reference genome assembly for a doubled haploid YY male garden asparagus (Asparagus officinalis L.) individual implicates separate but linked genes as responsible for sex determination. Dioecy has evolved recently within Asparagus, and the sex chromosomes are cytogenetically identical, with the Y harboring a megabase segment that is missing from the X. We show that deletion of this entire region results in a male-to-female conversion, whereas loss of a single suppressor of female development drives male-to-hermaphrodite conversion. A single-copy anther-specific gene with a male-sterile Arabidopsis knockout phenotype is also in the Y-specific region, supporting a two-gene model for sex chromosome evolution.

July 7, 2019

Hybrid de novo genome assembly and centromere characterization of the gray mouse lemur (Microcebus murinus).

The de novo assembly of repeat-rich mammalian genomes using only high-throughput short-read sequencing data typically results in highly fragmented genome assemblies that limit downstream applications. Here, we present an iterative approach to hybrid de novo genome assembly that incorporates datasets stemming from multiple genomic technologies and methods. We used this approach to improve the gray mouse lemur (Microcebus murinus) genome from early draft status to a near chromosome-scale assembly. We used a combination of advanced genomic technologies to iteratively resolve conflicts and super-scaffold the M. murinus genome. We improved the M. murinus genome assembly to a scaffold N50 of 93.32 Mb. Whole-genome alignments between our primary super-scaffolds and 23 human chromosomes revealed patterns that are congruent with historical comparative cytogenetic data, thus demonstrating the accuracy of our de novo scaffolding approach and allowing assignment of scaffolds to M. murinus chromosomes. Moreover, we utilized our independent datasets to discover and characterize sequences associated with centromeres across the mouse lemur genome. Quality assessment of the final assembly found 96% of mouse lemur canonical transcripts nearly complete, comparable to other published high-quality reference genome assemblies. We describe a new assembly of the gray mouse lemur (Microcebus murinus) genome with chromosome-scale scaffolds produced using a hybrid bioinformatic and sequencing approach. The approach is cost-effective and produces superior results based on metrics of contiguity and completeness. Our results show that emerging genomic technologies can be used in combination to characterize centromeres of non-model species and to produce accurate de novo chromosome-scale genome assemblies of complex mammalian genomes.

July 7, 2019

Filling reference gaps via assembling DNA barcodes using high-throughput sequencing: moving toward barcoding the world.

Over the past decade, biodiversity researchers have dedicated tremendous efforts to constructing DNA reference barcodes for rapid species registration and identification. Although the analytical cost of standard DNA barcoding has been significantly reduced since the early 2000s, further dramatic reduction in barcoding costs is unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints in barcoding cost have not only led to unbalanced barcoding efforts around the globe, but have also prevented high-throughput sequencing (HTS)-based taxonomic identification from applying binomial species names, which provide crucial linkages to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length Cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons generated from individual specimens. The new pipeline generated accurate barcode sequences that were comparable to Sanger standards, even for different haplotypes of the same species that differed from each other by only a few nucleotides. Additionally, the new pipeline was much more sensitive in recovering amplicons at low quantity. The HIFI-Barcode pipeline successfully recovered barcodes from more than 78% of the polymerase chain reactions that did not show clear bands on the electrophoresis gel. Moreover, sequencing results from the single-molecule sequencing platform PacBio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline can provide an improved solution to produce full-length reference barcodes at about one-tenth of the current cost, enabling construction of comprehensive barcode libraries for local fauna and offering a feasible path toward DNA barcoding of global biomes. © The Authors 2017. Published by Oxford University Press.

July 7, 2019

Genomics of parallel adaptation at two timescales in Drosophila.

Two interesting unanswered questions are the extent to which both the broad patterns and genetic details of adaptive divergence are repeatable across species, and the timescales over which parallel adaptation may be observed. Drosophila melanogaster is a key model system for population and evolutionary genomics. Findings from genetics and genomics suggest that recent adaptation to latitudinal environmental variation (on the timescale of hundreds or thousands of years) associated with Out-of-Africa colonization plays an important role in maintaining biological variation in the species. Additionally, studies of interspecific differences between D. melanogaster and its sister species D. simulans have revealed that a substantial proportion of proteins and amino acid residues exhibit adaptive divergence on a timescale of a few million years. Here we use population genomic approaches to attack the problem of parallelism between D. melanogaster and a highly diverged congener, D. hydei, on two timescales. D. hydei, a member of the repleta group of Drosophila, is similar to D. melanogaster in that it too appears to be a recently cosmopolitan species and a recent colonizer of high-latitude environments. We observed parallelism both for genes exhibiting latitudinal allele frequency differentiation within species and for genes exhibiting recurrent adaptive protein divergence between species. Greater parallelism was observed for long-term adaptive protein evolution, and this parallelism includes not only the specific genes/proteins that exhibit adaptive evolution, but extends even to the magnitudes of the selective effects on interspecific protein differences. Thus, despite the roughly 50 million years separating D. melanogaster and D. hydei, and despite their considerably divergent biology, they exhibit substantial parallelism, suggesting the existence of a fundamental predictability of adaptive evolution in the genus.

July 7, 2019

Genome sequence of the small brown planthopper, Laodelphax striatellus.

Laodelphax striatellus Fallén (Hemiptera: Delphacidae) is one of the most destructive rice pests. L. striatellus differs from the 2 other rice planthoppers with a released genome sequence, Sogatella furcifera and Nilaparvata lugens, in many biological characteristics, such as host range, dispersal capacity, and vectoring of plant viruses. Deciphering the genome of L. striatellus will further the understanding of the genetic basis of the biological differences among the 3 rice planthoppers. A total of 190 Gb of Illumina data and 32.4 Gb of PacBio data were generated and used to assemble a high-quality L. striatellus genome sequence, which is 541 Mb in length and has a contig N50 of 118 kb and a scaffold N50 of 1.08 Mb. Annotated repetitive elements account for 25.7% of the genome. A total of 17 736 protein-coding genes were annotated, capturing 97.6% and 98% of the BUSCO eukaryota and arthropoda genes, respectively. Compared with N. lugens and S. furcifera, L. striatellus has the smallest genome and the lowest gene number. Gene family expansion and transcriptomic analyses provided hints to the genomic basis of the differences in important traits such as host range, migratory habit, and plant virus transmission between L. striatellus and the other 2 planthoppers. We report a high-quality genome assembly of L. striatellus, which is an important genomic resource not only for the study of the biology of L. striatellus and its interactions with plant hosts and plant viruses, but also for comparison with other planthoppers. © The Authors 2017. Published by Oxford University Press.

July 7, 2019

Comparative genomic analysis of two clonally related multidrug resistant Mycobacterium tuberculosis by Single Molecule Real Time Sequencing.

Background: Multidrug-resistant tuberculosis (MDR-TB) poses a major threat to global TB control. In this study, we focused on two consecutive MDR-TB isolates obtained from the same patient before and after the initiation of anti-TB treatment. To better understand the genomic characteristics of MDR-TB, Single Molecule Real-Time (SMRT) Sequencing and comparative genomic analyses were performed to identify mutations that contributed to the stepwise development of drug resistance and growth fitness in MDR-TB under in vivo challenge with anti-TB drugs.

Result: The pre-treatment and post-treatment strains demonstrated concordant phenotypic and genotypic susceptibility profiles toward rifampicin, pyrazinamide, streptomycin, fluoroquinolones, aminoglycosides, cycloserine, ethionamide, and para-aminosalicylic acid. However, although both strains carried identical missense mutations at rpoB S531L, inhA C-15T, and embB M306V, the MYCOTB Sensititre assay showed that the post-treatment strain had 16-, 8-, and 4-fold elevations in the minimum inhibitory concentrations (MICs) toward rifabutin, isoniazid, and ethambutol, respectively. These results indicated the presence of additional resistance-related mutations governing the stepwise development of MDR-TB. Further comparative genomic analyses identified three additional polymorphisms between the clinical isolates. These include a single nucleotide deletion at nucleotide position 360 of rv0888 in the pre-treatment strain, and a missense mutation at rv3303c (lpdA) V44I and a 6-bp in-frame deletion at codons 67-68 in rv2071c (cobM) in the post-treatment strain. Multiple sequence alignment showed that these mutations occurred at highly conserved regions among pathogenic mycobacteria. Using structure-based and sequence-based algorithms, we further predicted that the mutations potentially have a deleterious effect on protein function.

Conclusion: This is the first study that compared the full genomes of two clonally related MDR-TB clinical isolates during the course of anti-TB treatment. Our work has demonstrated the robustness of SMRT Sequencing in identifying mutations among MDR-TB clinical isolates. Comparative genome analysis also suggested novel mutations at rv0888, lpdA, and cobM that might explain the difference in antibiotic resistance and growth pattern between the two MDR-TB strains.
