Large genome Archives - Page 62 of 69

July 7, 2019

Analysis of the genome sequence of the medicinal plant Salvia miltiorrhiza.

Salvia miltiorrhiza Bunge (Danshen) is a medicinal plant of the Lamiaceae family, and its dried roots have long been used in traditional Chinese medicine with hydrophilic phenolic acids and tanshinones as pharmaceutically active components (Zhang et al., 2014; Xu et al., 2016). The first step of tanshinone biosynthesis is bicyclization of the general diterpene precursor (E,E,E)-geranylgeranyl diphosphate (GGPP) to copalyl diphosphate (CPP) by CPP synthases (CPSs), which is followed by a cyclization or rearrangement reaction catalyzed by kaurene synthase-like enzymes (KSL).

July 7, 2019

An improved genome assembly of Azadirachta indica A. Juss.

Neem (Azadirachta indica A. Juss.), an evergreen tree of the Meliaceae family, is known for its medicinal, cosmetic, pesticidal and insecticidal properties. We had previously sequenced and published the draft genome of the plant, using mainly short read sequencing data. In this report, we present an improved genome assembly generated using additional short reads from Illumina and long reads from Pacific Biosciences SMRT sequencer. We assembled short reads and error corrected long reads using Platanus, an assembler designed to perform well for heterozygous genomes. The updated genome assembly (v2.0) yielded 3- and 3.5-fold increase in N50 and N75, respectively; 2.6-fold decrease in the total number of scaffolds; 1.25-fold increase in the number of valid transcriptome alignments; 13.4-fold less mis-assembly and 1.85-fold increase in the percentage repeat, over the earlier assembly (v1.0). The current assembly also maps better to the genes known to be involved in the terpenoid biosynthesis pathway. Together, the data represents an improved assembly of the A. indica genome. The raw data described in this manuscript are submitted to the NCBI Short Read Archive under the accession numbers SRX1074131, SRX1074132, SRX1074133, and SRX1074134 (SRP013453). Copyright © 2016 Author et al.

July 7, 2019

Complete chloroplast genome sequences of Eucommia ulmoides: genome structure and evolution.

Eucommia ulmoides is an important traditional medicinal plant that is used for the production of locative Eucommia rubber. In this study, the complete chloroplast (cp) genome sequence of E. ulmoides was obtained by total DNA sequencing; this is the first cp genome sequence of the order Garryales. The cp genome of E. ulmoides was 163,341 bp long and included a pair of inverted repeat (IR) regions (31,300 bp), one large single copy (LSC) region (86,592 bp), and one small single copy (SSC) region (14,149 bp). The genome structure and GC content were similar to those of typical angiosperm cp genomes and contained 115 unique genes, including 80 protein-coding genes, 31 transfer RNA (tRNAs), and four ribosomal RNA (rRNAs). Compared with the entire cp genome sequence, three unique genome rearrangements were observed in the LSC region. Moreover, compared with the Sesamum and Nicotiana cp genomes, E. ulmoides contained no indels in the IR regions, and variable regions were identified in noncoding regions. The E. ulmoides cp genome showed extreme expansion at the IR/SSC boundary owing to the integration of an additional complete gene, ycf1. Twenty-nine simple sequence repeats (SSRs) were identified in the E. ulmoides cp genome. In addition, 36 protein-coding genes were used for phylogenetic inference, supporting a sister relationship between E. ulmoides and Aucuba, which belongs to Euasterids I. In summary, we described the complete cp genome sequence of E. ulmoides; this information will be useful for phylogenetic and evolutionary studies.

July 7, 2019

Alpha-CENTAURI: assessing novel centromeric repeat sequence variation with long read sequencing.

Long arrays of near-identical tandem repeats are a common feature of centromeric and subtelomeric regions in complex genomes. These sequences present a source of repeat structure diversity that is commonly ignored by standard genomic tools. Unlike reads shorter than the underlying repeat structure that rely on indirect inference methods, e.g. assembly, long reads allow direct inference of satellite higher order repeat structure. To automate characterization of local centromeric tandem repeat sequence variation we have designed Alpha-CENTAURI (ALPHA satellite CENTromeric AUtomated Repeat Identification), that takes advantage of Pacific Bioscience long-reads from whole-genome sequencing datasets. By operating on reads prior to assembly, our approach provides a more comprehensive set of repeat-structure variants and is not impacted by rearrangements or sequence underrepresentation due to misassembly.We demonstrate the utility of Alpha-CENTAURI in characterizing repeat structure for alpha satellite containing reads in the hydatidiform mole (CHM1, haploid-like) genome. The pipeline is designed to report local repeat organization summaries for each read, thereby monitoring rearrangements in repeat units, shifts in repeat orientation and sites of array transition into non-satellite DNA, typically defined by transposable element insertion. We validate the method by showing consistency with existing centromere high order repeat references. Alpha-CENTAURI can, in principle, run on any sequence data, offering a method to generate a sequence repeat resolution that could be readily performed using consensus sequences available for other satellite families in genomes without high-quality reference assemblies.Documentation and source code for Alpha-CENTAURI are freely available at http://github.com/volkansevim/alpha-CENTAURI CONTACT: ali.bashir@mssm.eduSupplementary information: Supplementary data are available at Bioinformatics online.© The Author 2016. Published by Oxford University Press.

July 7, 2019

The kiwifruit genome

The whole-genome sequence of Actinidia chinensis var. chinensis ‘Hongyang’ was published in 2013 and was represented as the first publicly available Ericales genome sequence. Publication in 2015 of an improved linkage map for A. chinensis and interspecific comparison analyses coupled with the availability of a second whole-genome sequence of a genotype closely related to ‘Hongyang’ have enabled the kiwifruit research community to improve the existing whole-genome sequence. This chapter describes the original genome sequence and steps towards its improvement.

July 7, 2019

Elucidating the triplicated ancestral genome structure of radish based on chromosome-level comparison with the Brassica genomes.

This study presents a chromosome-scale draft genome sequence of radish that is assembled into nine chromosomal pseudomolecules. A comprehensive comparative genome analysis with the Brassica genomes provides genomic evidences on the evolution of the mesohexaploid radish genome. Radish (Raphanus sativus L.) is an agronomically important root vegetable crop and its origin and phylogenetic position in the tribe Brassiceae is controversial. Here we present a comprehensive analysis of the radish genome based on the chromosome sequences of R. sativus cv. WK10039. The radish genome was sequenced and assembled into 426.2 Mb spanning >98 % of the gene space, of which 344.0 Mb were integrated into nine chromosome pseudomolecules. Approximately 36 % of the genome was repetitive sequences and 46,514 protein-coding genes were predicted and annotated. Comparative mapping of the tPCK-like ancestral genome revealed that the radish genome has intermediate characteristics between the Brassica A/C and B genomes in the triplicated segments, suggesting an internal origin from the genus Brassica. The evolutionary characteristics shared between radish and other Brassica species provided genomic evidences that the current form of nine chromosomes in radish was rearranged from the chromosomes of hexaploid progenitor. Overall, this study provides a chromosome-scale draft genome sequence of radish as well as novel insight into evolution of the mesohexaploid genomes in the tribe Brassiceae.

July 7, 2019

OPERA-LG: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees.

The assembly of large, repeat-rich eukaryotic genomes represents a significant challenge in genomics. While long-read technologies have made the high-quality assembly of small, microbial genomes increasingly feasible, data generation can be expensive for larger genomes. OPERA-LG is a scalable, exact algorithm for the scaffold assembly of large, repeat-rich genomes, out-performing state-of-the-art programs for scaffold correctness and contiguity. It provides a rigorous framework for scaffolding of repetitive sequences and a systematic approach for combining data from different second-generation and third-generation sequencing technologies. OPERA-LG provides an avenue for systematic augmentation and improvement of thousands of existing draft eukaryotic genome assemblies.

July 7, 2019

Next-generation sequencing-based detection of germline L1-mediated transductions.

While active LINE-1 (L1) elements possess the ability to mobilize flanking sequences to different genomic loci through a process termed transduction influencing genomic content and structure, an approach for detecting polymorphic germline non-reference transductions in massively-parallel sequencing data has been lacking.Here we present the computational approach TIGER (Transduction Inference in GERmline genomes), enabling the discovery of non-reference L1-mediated transductions by combining L1 discovery with detection of unique insertion sequences and detailed characterization of insertion sites. We employed TIGER to characterize polymorphic transductions in fifteen genomes from non-human primate species (chimpanzee, orangutan and rhesus macaque), as well as in a human genome. We achieved high accuracy as confirmed by PCR and two single molecule DNA sequencing techniques, and uncovered differences in relative rates of transduction between primate species.By enabling detection of polymorphic transductions, TIGER makes this form of relevant structural variation amenable for population and personal genome analysis.

July 7, 2019

Insight into the evolution of the Solanaceae from the parental genomes of Petunia hybrida.

Petunia hybrida is a popular bedding plant that has a long history as a genetic model system. We report the whole-genome sequencing and assembly of inbred derivatives of its two wild parents, P. axillaris N and P. inflata S6. The assemblies include 91.3% and 90.2% coverage of their diploid genomes (1.4 Gb; 2n?=?14) containing 32,928 and 36,697 protein-coding genes, respectively. The genomes reveal that the Petunia lineage has experienced at least two rounds of hexaploidization: the older gamma event, which is shared with most Eudicots, and a more recent Solanaceae event that is shared with tomato and other solanaceous species. Transcription factors involved in the shift from bee to moth pollination reside in particularly dynamic regions of the genome, which may have been key to the remarkable diversity of floral colour patterns and pollination systems. The high-quality genome sequences will enhance the value of Petunia as a model system for research on unique biological phenomena such as small RNAs, symbiosis, self-incompatibility and circadian rhythms.

July 7, 2019

Selecting reads for haplotype assembly

Haplotype assembly or read-based phasing is the problem of reconstructing both haplotypes of a diploid genome from next-generation sequencing data. This problem is formalized as the Minimum Error Correction (MEC) problem and can be solved using algorithms such as WhatsHap. The runtime of WhatsHap is exponential in the maximum coverage, which is hence controlled in a pre-processing step that selects reads to be used for phasing. Here, we report on a heuristic algorithm designed to choose beneficial reads for phasing, in particular to increase the connectivity of the phased blocks and the number of correctly phased variants compared to the random selection previously employed in by WhatsHap. The algorithm we describe has been integrated into the WhatsHap software, which is available under MIT licence from https://bitbucket.org/whatshap/whatshap.

July 7, 2019

The channel catfish genome sequence provides insights into the evolution of scale formation in teleosts.

Catfish represent 12% of teleost or 6.3% of all vertebrate species, and are of enormous economic value. Here we report a high-quality reference genome sequence of channel catfish (Ictalurus punctatus), the major aquaculture species in the US. The reference genome sequence was validated by genetic mapping of 54,000 SNPs, and annotated with 26,661 predicted protein-coding genes. Through comparative analysis of genomes and transcriptomes of scaled and scaleless fish and scale regeneration experiments, we address the genomic basis for the most striking physical characteristic of catfish, the evolutionary loss of scales and provide evidence that lack of secretory calcium-binding phosphoproteins accounts for the evolutionary loss of scales in catfish. The channel catfish reference genome sequence, along with two additional genome sequences and transcriptomes of scaled catfishes, provide crucial resources for evolutionary and biological studies. This work also demonstrates the power of comparative subtraction of candidate genes for traits of structural significance.

July 7, 2019

Extensive sequencing of seven human genomes to characterize benchmark reference materials.

The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST) is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.

July 7, 2019

Genomic dark matter illuminated: Anopheles Y chromosomes.

Hall et al. have strategically used long-read sequencing technology to characterize the structure and highly repetitive content of the Y chromosome in Anopheles malaria mosquitoes. Their work confirms that this important but elusive heterochromatic sex chromosome is evolving extremely rapidly and harbors a remarkably small number of genes. Copyright © 2016 Elsevier Ltd. All rights reserved.

July 7, 2019

deBWT: parallel construction of Burrows-Wheeler Transform for large collection of genomes with de Bruijn-branch encoding.

With the development of high-throughput sequencing, the number of assembled genomes continues to rise. It is critical to well organize and index many assembled genomes to promote future genomics studies. Burrows-Wheeler Transform (BWT) is an important data structure of genome indexing, which has many fundamental applications; however, it is still non-trivial to construct BWT for large collection of genomes, especially for highly similar or repetitive genomes. Moreover, the state-of-the-art approaches cannot well support scalable parallel computing owing to their incremental nature, which is a bottleneck to use modern computers to accelerate BWT construction.We propose de Bruijn branch-based BWT constructor (deBWT), a novel parallel BWT construction approach. DeBWT innovatively represents and organizes the suffixes of input sequence with a novel data structure, de Bruijn branch encoding. This data structure takes the advantage of de Bruijn graph to facilitate the comparison between the suffixes with long common prefix, which breaks the bottleneck of the BWT construction of repetitive genomic sequences. Meanwhile, deBWT also uses the structure of de Bruijn graph for reducing unnecessary comparisons between suffixes. The benchmarking suggests that, deBWT is efficient and scalable to construct BWT for large dataset by parallel computing. It is well-suited to index many genomes, such as a collection of individual human genomes, with multiple-core servers or clusters.deBWT is implemented in C language, the source code is available at https://github.com/hitbc/deBWT or https://github.com/DixianZhu/deBWTContact: ydwang@hit.edu.cnSupplementary data are available at Bioinformatics online.© The Author 2016. Published by Oxford University Press.

July 7, 2019

1,135 genomes reveal the global pattern of polymorphism in Arabidopsis thaliana.

Arabidopsis thaliana serves as a model organism for the study of fundamental physiological, cellular, and molecular processes. It has also greatly advanced our understanding of intraspecific genome variation. We present a detailed map of variation in 1,135 high-quality re-sequenced natural inbred lines representing the native Eurasian and North African range and recently colonized North America. We identify relict populations that continue to inhabit ancestral habitats, primarily in the Iberian Peninsula. They have mixed with a lineage that has spread to northern latitudes from an unknown glacial refugium and is now found in a much broader spectrum of habitats. Insights into the history of the species and the fine-scale distribution of genetic diversity provide the basis for full exploitation of A. thaliana natural variation through integration of genomes and epigenomes with molecular and non-molecular phenotypes. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.

Asset Tag: Large genome

Analysis of the genome sequence of the medicinal plant Salvia miltiorrhiza.

An improved genome assembly of Azadirachta indica A. Juss.

Complete chloroplast genome sequences of Eucommia ulmoides: genome structure and evolution.

Alpha-CENTAURI: assessing novel centromeric repeat sequence variation with long read sequencing.

The kiwifruit genome

Elucidating the triplicated ancestral genome structure of radish based on chromosome-level comparison with the Brassica genomes.

OPERA-LG: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees.

Next-generation sequencing-based detection of germline L1-mediated transductions.

Insight into the evolution of the Solanaceae from the parental genomes of Petunia hybrida.

Selecting reads for haplotype assembly

The channel catfish genome sequence provides insights into the evolution of scale formation in teleosts.

Extensive sequencing of seven human genomes to characterize benchmark reference materials.

Genomic dark matter illuminated: Anopheles Y chromosomes.

deBWT: parallel construction of Burrows-Wheeler Transform for large collection of genomes with de Bruijn-branch encoding.

1,135 genomes reveal the global pattern of polymorphism in Arabidopsis thaliana.

Subscribe for blog updates:

Filter by topic

Talk with an expert

Antimicrobial resistance research

Subscribe for blog updates:

Filter by topic

Talk with an expert