Large genome Archives - Page 54 of 69

July 7, 2019

Draft nuclear genome sequence of the liquid hydrocarbon–accumulating green microalga Botryococcus braunii race B (Showa).

Botryococcus braunii has long been known as a prodigious producer of liquid hydrocarbon oils that can be converted into combustion engine fuels. This draft genome for the B race of B. braunii will allow researchers to unravel important hydrocarbon biosynthetic pathways and identify possible regulatory networks controlling this unusual metabolism. Copyright © 2017 Browne et al.

July 7, 2019

Genome-wide analysis of WOX genes in upland cotton and their expression pattern under different stresses.

WUSCHEL-related homeobox (WOX) family members play significant roles in plant growth and development, such as in embryo patterning, stem-cell maintenance, and lateral organ formation. The recently published cotton genome sequences allow us to perform comprehensive genome-wide analysis and characterization of WOX genes in cotton. In this study, we identified 21, 20, and 38 WOX genes in Gossypium arboreum (2n = 26, A2), G. raimondii (2n = 26, D5), and G. hirsutum (2n = 4x = 52, (AD)t), respectively. Sequence logos showed that homeobox domains were significantly conserved among the WOX genes in cotton, Arabidopsis, and rice. A total of 168 genes from three typical monocots and six dicots were naturally divided into three clades, which were further classified into nine sub-clades. Good collinearity was observed in the synteny analysis of the orthologs from the At and Dt (t represents tetraploid) sub-genomes. Whole genome duplication (WGD) and segmental duplication within the At and Dt sub-genomes played significant roles in the expansion of WOX genes, and segmental duplication mainly generated the WUS clade. Copia and Gypsy were the two major types of transposable elements distributed upstream or downstream of WOX genes. Furthermore, through comparison, we found that the exon/intron pattern was highly conserved between Arabidopsis and cotton, and the homeobox domain loci were also conserved between them. In addition, the expression pattern in different tissues indicated that the duplicated genes in cotton might have acquired new functions as a result of sub-functionalization or neo-functionalization. The expression pattern of WOX genes under different stress treatments showed that different genes were induced by different stresses. In the present work, WOX genes, classified into three clades, were identified in the upland cotton genome. Whole genome and segmental duplication were determined to be the two major drivers of the expansion of gene numbers during evolution. Moreover, the expression patterns suggested that the duplicated genes might have experienced functional divergence. Together, these results shed light on the evolution of the WOX gene family and should be helpful in future research.

July 7, 2019

Hybrid assembly with long and short reads improves discovery of gene family expansions.

Long-read and short-read sequencing technologies offer competing advantages for eukaryotic genome sequencing projects. Combinations of both may be appropriate for surveys of within-species genomic variation. We developed a hybrid assembly pipeline called “Alpaca” that can operate on 20X long-read coverage plus about 50X short-insert and 50X long-insert short-read coverage. To preclude collapse of tandem repeats, Alpaca relies on base-call-corrected long reads for contig formation. Compared to two other assembly protocols, Alpaca demonstrated the most reference agreement and repeat capture on the rice genome. On three accessions of the model legume Medicago truncatula, Alpaca generated the most agreement to a conspecific reference and predicted tandemly repeated genes absent from the other assemblies. Our results suggest Alpaca is a useful tool for investigating structural and copy number variation within de novo assemblies of sampled populations.

July 7, 2019

Repeated divergent selection on pigmentation genes in a rapid finch radiation.

Instances of recent and rapid speciation are suitable for associating phenotypes with their causal genotypes, especially if gene flow homogenizes areas of the genome that are not under divergent selection. We study a rapid radiation of nine sympatric bird species known as capuchino seedeaters, which are differentiated in sexually selected characters of male plumage and song. We sequenced the genomes of a phenotypically diverse set of species to search for differentiated genomic regions. Capuchinos show differences in a small proportion of their genomes, yet selection has acted independently on the same targets in different members of this radiation. Many divergent regions contain genes involved in the melanogenesis pathway, with the strongest signal originating from putative regulatory regions. Selection has acted on these same genomic regions in different lineages, likely shaping the evolution of cis-regulatory elements, which control how more conserved genes are expressed and thereby generate diversity in classically sexually selected traits.

July 7, 2019

CLOVE: classification of genomic fusions into structural variation events.

A precise understanding of structural variants (SVs) in DNA is important in the study of cancer and population diversity. Many methods have been designed to identify SVs from DNA sequencing data. However, the problem remains challenging because existing approaches suffer from low sensitivity, precision, and positional accuracy. Furthermore, many existing tools only identify breakpoints, and so do not collect related breakpoints and classify them as a particular type of SV. Due to the rapidly increasing usage of high throughput sequencing technologies in this area, there is an urgent need for algorithms that can accurately classify complex genomic rearrangements (involving more than one breakpoint or fusion). We present CLOVE, an algorithm for integrating the results of multiple breakpoint or SV callers and classifying the results as a particular SV. CLOVE is based on a graph data structure that is created from the breakpoint information. The algorithm looks for patterns in the graph that are characteristic of more complex rearrangement types. CLOVE is able to integrate the results of multiple callers, producing a consensus call. We demonstrate using simulated and real data that re-classified SV calls produced by CLOVE improve on the raw call set of existing SV algorithms, particularly in terms of accuracy. CLOVE is freely available from http://www.github.com/PapenfussLab.

July 7, 2019

A high-coverage draft genome of the mycalesine butterfly Bicyclus anynana.

The mycalesine butterfly Bicyclus anynana, the ‘Squinting bush brown’, is a model organism in the study of lepidopteran ecology, development and evolution. Here, we present a draft genome sequence for B. anynana to serve as a genomics resource for current and future studies of this important model species. Seven libraries with insert sizes ranging from 350 bp to 20 kb were constructed using DNA from an inbred female and sequenced using both Illumina and PacBio technology. 128 Gb raw Illumina data were filtered to 124 Gb and assembled to a final size of 475 Mb (~260X assembly coverage). Contigs were scaffolded using mate-pair, transcriptome and PacBio data into 10,800 sequences with an N50 of 638 kb (longest scaffold 5 Mb). The genome comprises 26% repetitive elements and encodes a total of 22,642 predicted protein-coding genes. Recovery of a BUSCO set of core metazoan genes was almost complete (98%). Overall, these metrics compare well with other recently published lepidopteran genomes. We report a high-quality draft genome sequence for Bicyclus anynana. The genome assembly and annotated gene models are available at LepBase (http://ensembl.lepbase.org/index.html).
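For readers less familiar with the assembly metrics quoted above, the following is a minimal sketch of how N50 is conventionally computed, together with the coverage arithmetic implied by the reported figures (124 Gb of filtered reads over a 475 Mb assembly). The function name `n50` and the toy scaffold lengths are illustrative assumptions; they are not taken from the B. anynana assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("cannot compute N50 of an empty assembly")


# Toy scaffold lengths (hypothetical, for illustration only).
toy_scaffolds = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_scaffolds)

# Coverage = sequenced bases / assembly size, using the figures quoted above.
filtered_bases = 124e9   # 124 Gb of filtered Illumina data
assembly_size = 475e6    # 475 Mb final assembly
coverage = filtered_bases / assembly_size

print(f"toy N50 = {toy_n50}")
print(f"coverage = {coverage:.0f}X")
```

With the toy lengths the total is 300, so the cumulative sum first reaches half (150) at the second-largest scaffold. The coverage ratio comes out just above 260, consistent with the "~260X" stated in the abstract.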

July 7, 2019

Genome graphs

There is increasing recognition that a single, monoploid reference genome is a poor universal reference structure for human genetics, because it represents only a tiny fraction of human variation. Adding this missing variation results in a structure that can be described as a mathematical graph: a genome graph. We demonstrate that, in comparison to the existing reference genome (GRCh38), genome graphs can substantially improve the fractions of reads that map uniquely and perfectly. Furthermore, we show that this fundamental simplification of read mapping transforms the variant calling problem from one in which many non-reference variants must be discovered de-novo to one in which the vast majority of variants are simply re-identified within the graph. Using standard benchmarks as well as a novel reference-free evaluation, we show that a simplistic variant calling procedure on a genome graph can already call variants at least as well as, and in many cases better than, a state-of-the-art method on the linear human reference genome. We anticipate that graph-based references will supplant linear references in humans and in other applications where cohorts of sequenced individuals are available.
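The idea that adding variation to a linear reference yields a graph can be illustrated with a toy example. The sketch below assumes a tiny, made-up sequence with a single SNP; node labels, sequences, and the helper `spell_paths` are hypothetical and are not drawn from the paper's software. It shows only why a read carrying a non-reference allele can match the graph perfectly while failing to match the linear reference exactly.

```python
# Nodes hold sequence segments; edges say which segment may follow which.
# Nodes 2 and 3 form a "bubble": the reference allele A and the alternate G.
nodes = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}
edges = {1: [2, 3], 2: [4], 3: [4], 4: []}


def spell_paths(node):
    """Yield every sequence spelled by a path from `node` to a sink node."""
    if not edges[node]:
        yield nodes[node]
        return
    for successor in edges[node]:
        for suffix in spell_paths(successor):
            yield nodes[node] + suffix


haplotypes = sorted(spell_paths(1))

# The linear reference is just one path through the graph (via node 2).
linear_reference = "ACGTATTCA"

# A read sampled from an individual carrying the alternate allele.
read = "GTGTT"

maps_to_linear = read in linear_reference
maps_to_graph = any(read in haplotype for haplotype in haplotypes)

print(haplotypes)
print("perfect match on linear reference:", maps_to_linear)
print("perfect match on graph:", maps_to_graph)
```

Enumerating all paths is only feasible for toy graphs; practical graph mappers index the graph rather than expanding it, which this sketch does not attempt.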

July 7, 2019

The origin, diversification and adaptation of a major mangrove clade (Rhizophoreae) revealed by whole-genome sequencing

Mangroves invade some very marginal habitats for woody plants—at the interface between land and sea. Since mangroves anchor tropical coastal communities globally, their origin, diversification and adaptation are of scientific significance, particularly at a time of global climate change. In this study, a combination of single-molecule long reads and the more conventional short reads are generated from Rhizophora apiculata for the de novo assembly of its genome to a near chromosome level. The longest scaffold, N50 and N90 for the R. apiculata genome, are 13.3 Mb, 5.4 Mb and 1.0 Mb, respectively. Short reads for the genomes and transcriptomes of eight related species are also generated. We find that the ancestor of Rhizophoreae experienced a whole-genome duplication ~70 Myrs ago, which is followed rather quickly by colonization and species diversification. Mangroves exhibit pan-exome modifications of amino acid (AA) usage as well as unusual AA substitutions among closely related species. The usage and substitution of AAs, unique among plants surveyed, is correlated with the rapid evolution of proteins in mangroves. A small subset of these substitutions is associated with mangroves’ highly specialized traits (vivipary and red bark) thought to be adaptive in the intertidal habitats. Despite the many adaptive features, mangroves are among the least genetically diverse plants, likely the result of continual habitat turnovers caused by repeated rises and falls of sea level in the geologically recent past. Mangrove genomes thus inform about their past evolutionary success as well as portend a possibly difficult future.

July 7, 2019

Tandem duplications lead to novel expression patterns through exon shuffling in Drosophila yakuba.

One common hypothesis to explain the impacts of tandem duplications is that whole gene duplications commonly produce additive changes in gene expression due to copy number changes. Here, we use genome wide RNA-seq data from a population sample of Drosophila yakuba to test this ‘gene dosage’ hypothesis. We observe little evidence of expression changes in response to whole transcript duplication capturing 5′ and 3′ UTRs. Among whole gene duplications, we observe evidence that dosage sharing across copies is likely to be common. The lack of expression changes after whole gene duplication suggests that the majority of genes are subject to tight regulatory control and therefore not sensitive to changes in gene copy number. Rather, we observe changes in expression level due to both shuffling of regulatory elements and the creation of chimeric structures via tandem duplication. Additionally, we observe 30 de novo gene structures arising from tandem duplications, 23 of which form with expression in the testes. Thus, the value of tandem duplications is likely to be more intricate than simple changes in gene dosage. The common regulatory effects from chimeric gene formation after tandem duplication may explain their contribution to genome evolution.

July 7, 2019

Whole genome sequencing predicts novel human disease models in rhesus macaques.

Rhesus macaques are an important pre-clinical model of human disease. To advance our understanding of genomic variation that may influence disease, we surveyed genome-wide variation in 21 rhesus macaques. We employed best-practice variant calling, validated with Mendelian inheritance. Next, we used alignment data from our cohort to detect genomic regions likely to produce inaccurate genotypes, potentially due to either gene duplication or structural variation between individuals. We generated a final dataset of >16 million high confidence variants, including 13 million in Chinese-origin rhesus macaques, an increasingly important disease model. We detected an average of 131 mutations predicted to severely alter protein coding per animal, and identified 45 such variants that coincide with known pathogenic human variants. These data suggest that expanded screening of existing breeding colonies will identify novel models of human disease, and that increased genomic characterization can help inform research studies in macaques. Copyright © 2017 Elsevier Inc. All rights reserved.
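Validation against Mendelian inheritance, as mentioned above, rests on a simple rule: at an autosomal site, a child's genotype should be obtainable by taking one allele from each parent. The sketch below is a simplified illustration of that rule, not the authors' pipeline; the function name and genotype encoding (pairs of allele strings) are assumptions, and it ignores de novo mutations, sex chromosomes, and missing calls.

```python
from itertools import product


def mendelian_consistent(child, mother, father):
    """Return True if the child's diploid genotype can be formed by
    inheriting one allele from the mother and one from the father.

    Each genotype is a pair of alleles, e.g. ("A", "G").
    """
    return any(
        sorted((maternal, paternal)) == sorted(child)
        for maternal, paternal in product(mother, father)
    )


# Consistent trio: child A/G from an A/A mother and a G/G father.
ok = mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G"))

# Inconsistent trio: child G/G cannot arise if the mother is A/A.
violation = mendelian_consistent(("G", "G"), ("A", "A"), ("A", "G"))

print(ok, violation)
```

In a cohort with known pedigrees, the fraction of sites failing such a check gives a rough indication of genotyping error, since true de novo mutations are rare by comparison.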

July 7, 2019

Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production.

Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. To advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ~58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniform gene density over chromosomes, low repetitive sequence content (~6%), and a high fraction of protein-coding sequence (~39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (~73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. The high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production.

July 7, 2019

Genome-wide identification of the mutation underlying fleece variation and discriminating ancestral hairy species from modern woolly sheep.

The composition and structure of fleece variation observed in mammals is a consequence of a strong selective pressure for fiber production after domestication. In sheep, fleece variation discriminates ancestral species carrying a long and hairy fleece from modern domestic sheep (Ovis aries) having a short and woolly fleece. Here, we report that the “woolly” allele results from the insertion of an antisense EIF2S2 retrogene (called asEIF2S2) into the 3′ UTR of the IRF2BP2 gene leading to an abnormal IRF2BP2 transcript. We provide evidence that this chimeric IRF2BP2/asEIF2S2 messenger 1) targets the genuine sense EIF2S2 RNA and 2) creates a long endogenous double-stranded RNA which alters the expression of both EIF2S2 and IRF2BP2 mRNA. This represents a unique example of a phenotype arising via an RNA-RNA hybrid, itself generated through a retroposition mechanism. Our results bring new insights into sheep population history thanks to the identification of the molecular origin of an evolutionary phenotypic variation. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

July 7, 2019

A $4,000 workstation for mammalian genome assembly with long reads

Long-read sequencing has enabled the de novo assembly of several mammalian genomes, but at a high computing cost. Here, we demonstrate de novo assembly of a mammalian genome from long reads on an efficient and inexpensive workstation.

July 7, 2019

N-glycan maturation mutants in Lotus japonicus for basic and applied glycoprotein research.

Studies of protein N-glycosylation are important for answering fundamental questions on the diverse functions of glycoproteins in plant growth and development. Here we generated and characterised a comprehensive collection of Lotus japonicus LORE1 insertion mutants, each lacking the activity of one of the 12 enzymes required for normal N-glycan maturation in the glycosylation machinery. The inactivation of the individual genes resulted in altered N-glycan patterns as documented using mass spectrometry and glycan-recognising antibodies, indicating successful identification of null mutations in the target glyco-genes. For example, both mass spectrometry and immunoblotting experiments suggest that proteins derived from the α1,3-fucosyltransferase (Lj3fuct) mutant completely lacked α1,3-core fucosylation. Mass spectrometry also suggested that the Lotus japonicus convicilin 2 was one of the main glycoproteins undergoing differential expression/N-glycosylation in the mutants. Demonstrating the functional importance of glycosylation, reduced growth and seed production phenotypes were observed for the mutant plants lacking functional mannosidase I, N-acetylglucosaminyltransferase I, and α1,3-fucosyltransferase, even though the relative protein composition and abundance appeared unaffected. The strength of our N-glycosylation mutant platform is the broad spectrum of resulting glycoprotein profiles and altered physiological phenotypes that can be produced from single, double, triple and quadruple mutants. This platform will serve as a valuable tool for elucidating the functional role of protein N-glycosylation in plants. Furthermore, this technology can be used to generate stable plant mutant lines for biopharmaceutical production of glycoproteins displaying relatively homogeneous and mammalian-like N-glycosylation features. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.

July 7, 2019

Automated structural variant verification in human genomes using single-molecule electronic DNA mapping.

The importance of structural variation in human disease and the difficulty of detecting structural variants larger than 50 base pairs have led to the development of several long-read sequencing technologies and optical mapping platforms. Frequently, multiple technologies and ad hoc methods are required to obtain a consensus regarding the location, size and nature of a structural variant, with no approach able to reliably bridge the gap of variant sizes between the domain of short-read approaches and the largest rearrangements observed with optical mapping. To address this unmet need, we have developed a new software package, SV-Verify™, which utilizes data collected with the Nabsys High Definition Mapping (HD-Mapping™) system to perform hypothesis-based verification of putative deletions. We demonstrate that whole genome maps, constructed from electronic detection of tagged DNA, hundreds of kilobases in length, can be used effectively to facilitate calling of structural variants ranging in size from 300 base pairs to hundreds of kilobase pairs. SV-Verify implements hypothesis-based verification of putative structural variants using a set of support vector machines and is capable of concurrently testing several thousand independent hypotheses. We describe support vector machine training, utilizing a well-characterized human genome, and application of the resulting classifiers to another human genome, demonstrating high sensitivity and specificity for deletions >= 300 base pairs.
