Large genome Archives

September 22, 2019

Repeated inversions within a pannier intron drive diversification of intraspecific colour patterns of ladybird beetles.

How genetic information is modified to generate phenotypic variation within a species is one of the central questions in evolutionary biology. Here we focus on the striking intraspecific diversity of >200 aposematic elytral (forewing) colour patterns of the multicoloured Asian ladybird beetle, Harmonia axyridis, which is regulated by a tightly linked genetic locus h. Our loss-of-function analyses, genetic association studies, de novo genome assemblies, and gene expression data reveal that the GATA transcription factor gene pannier is the major regulatory gene located at the h locus, and suggest that repeated inversions and cis-regulatory modifications at pannier led to the expansion of colour pattern variation in H. axyridis. Moreover, we show that the colour-patterning function of pannier is conserved in the seven-spotted ladybird beetle, Coccinella septempunctata, suggesting that H. axyridis’ extraordinary intraspecific variation may have arisen from ancient modifications in conserved elytral colour-patterning mechanisms in ladybird beetles.

September 22, 2019

Exploring the limits and causes of plastid genome expansion in volvocine green algae.

Plastid genomes are not normally celebrated for being large. But researchers are steadily uncovering algal lineages with big and, in rare cases, enormous plastid DNAs (ptDNAs), such as volvocine green algae. Plastome sequencing of five different volvocine species has revealed some of the largest, most repeat-dense plastomes on record, including that of Volvox carteri (~525 kb). Volvocine algae have also been used as models for testing leading hypotheses on organelle genome evolution (e.g., the mutational hazard hypothesis), and it has been suggested that ptDNA inflation within this group might be a consequence of low mutation rates and/or the transition from a unicellular to multicellular existence. Here, we further our understanding of plastome size variation in the volvocine line by examining the ptDNA sequences of the colonial species Yamagishiella unicocca and Eudorina sp. NIES-3984 and the multicellular Volvox africanus, which are phylogenetically situated between species with known ptDNA sizes. Although V. africanus is closely related and similar in multicellular organization to V. carteri, its ptDNA was much less inflated than that of V. carteri. Synonymous- and noncoding-site nucleotide substitution rate analyses of these two Volvox ptDNAs suggest that there are drastically different plastid mutation rates operating in the coding versus intergenic regions, supporting the idea that error-prone DNA repair in repeat-rich intergenic spacers is contributing to genome expansion. Our results reinforce the idea that the volvocine line harbors extremes in plastome size but ultimately shed doubt on some of the previously proposed hypotheses for ptDNA inflation within the lineage.

September 22, 2019

A draft genome assembly of the Chinese sillago (Sillago sinica), the first reference genome for Sillaginidae fishes.

Sillaginidae, also known as smelt-whitings, is a family of benthic coastal marine fishes in the Indo-West Pacific that have high ecological and economic importance. Many Sillaginidae species, including the Chinese sillago (Sillago sinica), have been recently described in China, providing valuable material to analyze genetic diversification of the family Sillaginidae. Here, we constructed a reference genome for the Chinese sillago, with the aim to set up a platform for comparative analysis of all species in this family. Using the single-molecule real-time DNA sequencing platform Pacific Biosciences (PacBio) Sequel, we generated ~27.3 Gb genomic DNA sequences for the Chinese sillago. We reconstructed a genome assembly of 534 Mb using a strategy that takes advantage of complementary strengths of two genome assembly programs, Canu and FALCON. The genome size was consistent with the estimated genome size based on k-mer analysis. The assembled genome consisted of 802 contigs with a contig N50 length of 2.6 Mb. We annotated 22,122 protein-coding genes in the Chinese sillago genome using a de novo method as well as RNA sequencing data and homologies to other teleosts. According to the phylogenetic analysis using protein-coding genes, the Chinese sillago is closely related to Larimichthys crocea and Dicentrarchus labrax and diverged from their ancestor around 69.5-82.6 million years ago. Using long reads generated with PacBio sequencing technology, we have built a draft genome assembly for the Chinese sillago, which is the first reference genome for Sillaginidae species. This genome assembly sets the stage for comparative analysis of the diversification and adaptation of fishes in Sillaginidae.
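
The contig N50 quoted in the abstract above is a standard assembly statistic: the length of the contig at which, going from longest to shortest, the running total first reaches half of the assembly size. A minimal Python sketch of that calculation follows. The function name `n50` and the toy contig lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical toy assembly (lengths in arbitrary units), not the sillago data.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

For a real assembly the lengths would typically be read from the FASTA file of contigs; that parsing step is left out here to keep the sketch self-contained.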

September 22, 2019

A statistical method for observing personal diploid methylomes and transcriptomes with Single-Molecule Real-Time sequencing.

We address the problem of observing personal diploid methylomes, CpG methylome pairs of homologous chromosomes that are distinguishable with respect to phased heterozygous variants (PHVs), which is challenging due to the scarcity of PHVs in personal genomes. Single molecule real-time (SMRT) sequencing is promising as it outputs long reads with CpG methylation information, but a serious concern is whether reliable PHVs are available in erroneous SMRT reads with an error rate of ~15%. To overcome the issue, we propose a statistical model that reduces the error rate of phasing CpG sites to 1%, thereby calling CpG hypomethylation in each haplotype with >90% precision and sensitivity. Using our statistical model, we examined the GNAS complex locus, known for a combination of maternally, paternally, or biallelically expressed isoforms, and observed allele-specific methylation patterns almost perfectly reflecting their respective allele-specific expression status, demonstrating the merit of elucidating comprehensive personal diploid methylomes and transcriptomes.

September 22, 2019

Insights into the evolution of multicellularity from the sea lettuce genome.

We report here the 98.5 Mbp haploid genome (12,924 protein coding genes) of Ulva mutabilis, a ubiquitous and iconic representative of the Ulvophyceae or green seaweeds. Ulva’s rapid and abundant growth makes it a key contributor to coastal biogeochemical cycles; its role in marine sulfur cycles is particularly important because it produces high levels of dimethylsulfoniopropionate (DMSP), the main precursor of volatile dimethyl sulfide (DMS). Rapid growth makes Ulva attractive biomass feedstock but also increasingly a driver of nuisance “green tides.” Ulvophytes are key to understanding the evolution of multicellularity in the green lineage, and Ulva morphogenesis is dependent on bacterial signals, making it an important species with which to study cross-kingdom communication. Our sequenced genome informs these aspects of ulvophyte cell biology, physiology, and ecology. Gene family expansions associated with multicellularity are distinct from those of freshwater algae. Candidate genes, including some that arose following horizontal gene transfer from chromalveolates, are present for the transport and metabolism of DMSP. The Ulva genome offers, therefore, new opportunities to understand coastal and marine ecosystems and the fundamental evolution of the green lineage. Copyright © 2018 Elsevier Ltd. All rights reserved.

September 22, 2019

B chromosomes of the Asian seabass (Lates calcarifer) contribute to genome variations at the level of individuals and populations.

The Asian seabass (Lates calcarifer) is a bony fish from the Latidae family, which is widely distributed in the tropical Indo-West Pacific region. The karyotype of the Asian seabass contains 24 pairs of A chromosomes and a variable number of AT- and GC-rich B chromosomes (Bchrs or Bs). Dot-like shaped and nucleolus-associated AT-rich Bs were microdissected and sequenced earlier. Here we analyzed DNA fragments from Bs to determine their repeat and gene contents using the Asian seabass genome as a reference. Fragments of 75 genes, including an 18S rRNA gene, were found in the Bs; repeats represented 2% of the Bchr assembly. The 18S rDNA of the standard genome and Bs were similar and enriched with fragments of transposable elements. A higher nuclear DNA content in the male gonad and somatic tissue, compared to the female gonad, was demonstrated by flow cytometry. This variation in DNA content could be associated with the intra-individual variation in the number of Bs. A comparison of copy number variation among the B-related fragments from whole genome resequencing data of Asian seabass individuals identified similar profiles between those from the South-East Asian/Philippines and Indian regions but not the Australian ones. Our results suggest that Bs might cause variations in the genome among the individuals and populations of Asian seabass. A personalized copy number approach for segmental duplication detection offers a suitable tool for population-level analysis across specimens with low coverage genome sequencing.

September 22, 2019

Comparison of the mitochondrial genome sequences of six Annulohypoxylon stygium isolates suggests short fragment insertions as a potential factor leading to larger genomic size.

Mitochondrial DNA (mtDNA) is a core non-nuclear genetic material found in all eukaryotic organisms, the size of which varies extensively in the eumycota, even within species. In this study, mitochondrial genomes of six isolates of Annulohypoxylon stygium (Lév.) were assembled from raw reads from PacBio and Illumina sequencing. The diversity of genomic structures, conserved genes, intergenic regions and introns were analyzed and compared. Genome sizes ranged from 132 to 147 kb and contained the same sets of conserved protein-coding, tRNA and rRNA genes and shared the same gene arrangements and orientation. In addition, most intergenic regions were homogeneous and had similar sizes except for the region between cytochrome b (cob) and cytochrome c oxidase I (cox1) genes which ranged from 2,998 to 8,039 bp among the six isolates. Sixty-five intron insertion sites and 99 different introns were detected in these genomes. Each genome contained 45 or more introns, which varied in distribution and content. Introns from homologous insertion sites also showed high diversity in size, type and content. Comparison of introns at the same loci showed some complex introns, such as twintrons and ORF-less introns. There were 44 short fragment insertions detected within introns, intergenic regions, or as introns, some of them located at conserved domain regions of homing endonuclease genes. Insertions of short fragments such as small inverted repeats might affect or hinder the movement of introns, and these allowed for intron accumulation in the mitochondrial genomes analyzed, and enlarged their size. This study showed that the evolution of fungal mitochondrial introns is complex, and the results suggest short fragment insertions as a potential factor leading to larger mitochondrial genomes in A. stygium.

September 22, 2019

The Arctic charr (Salvelinus alpinus) genome and transcriptome assembly.

Arctic charr have a circumpolar distribution, persevere under extreme environmental conditions, and reach ages unknown to most other salmonids. The Salvelinus genus is primarily composed of species with genomes that are structured more like the ancestral salmonid genome than most Oncorhynchus and Salmo species of sister genera. It is thought that this aspect of the genome may be important for local adaptation (due to increased recombination) and anadromy (the migration of fish from saltwater to freshwater). In this study, we describe the generation of a new genetic map, the sequencing and assembly of the Arctic charr genome (GenBank accession: GCF_002910315.2) using the newly created genetic map and a previous genetic map, and present several analyses of the Arctic charr genes and genome assembly. The newly generated genetic map consists of 8,574 unique genetic markers and is similar to previous genetic maps with the exception of three major structural differences. The N50, identified BUSCOs, repetitive DNA content, and total size of the Arctic charr assembled genome are all comparable to other assembled salmonid genomes. An analysis to identify orthologous genes revealed that a large number of orthologs could be identified between salmonids and many appear to have highly conserved gene expression profiles between species. Comparing orthologous gene expression profiles may give us a better insight into which genes are more likely to influence species specific phenotypes.

September 22, 2019

Structural variants exhibit allelic heterogeneity and shape variation in complex traits

Despite extensive effort to reveal the genetic basis of complex phenotypic variation, studies typically explain only a fraction of trait heritability. It has been hypothesized that individually rare hidden structural variants (SVs) could account for a significant fraction of variation in complex traits. To investigate this hypothesis, we assembled 14 Drosophila melanogaster genomes and systematically identified more than 20,000 euchromatic SVs, of which ~40% are invisible to high specificity short read genotyping approaches. SVs are common in Drosophila genes, with almost one third of diploid individuals harboring an SV in genes larger than 5kb, and nearly a quarter harboring multiple SVs in genes larger than 10kb. We show that SV alleles are rarer than amino acid polymorphisms, implying that they are more strongly deleterious. A number of functionally important genes harbor previously hidden structural variants that likely affect complex phenotypes (e.g., Cyp6g1, Drsl5, Cyp28d1&2, InR, and Gss1&2). Furthermore, SVs are overrepresented in quantitative trait locus candidate genes from eight Drosophila Synthetic Population Resource (DSPR) mapping experiments. We conclude that SVs are pervasive in genomes, are frequently present as heterogeneous allelic series, and can act as rare alleles of large effect.

September 22, 2019

Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.

Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable for a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (single instruction multiple data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we (a) distribute many independent alignments on multiple threads and (b) inherently parallelize a single alignment computation using a work stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization on different processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors, and use cases. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2, AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms. Supplementary data are available at Bioinformatics online.
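
For readers unfamiliar with the "standard dynamic programming kernel" mentioned above, the following Python sketch computes a global alignment score with a linear gap penalty, one row of the matrix at a time. It is a plain scalar illustration only: it is not SeqAn code, it does not reproduce the SIMD or wavefront parallelization described in the abstract, and the function name and scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of strings a and b.

    Classic dynamic programming recurrence with a linear gap penalty.
    Only the previous row is kept, so memory use is O(len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a prefix of a against the empty prefix of b.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))
```

Each cell depends only on its left, upper, and upper-left neighbours, which is why cells along one minor diagonal are independent of each other and why many separate alignments can be packed into the lanes of a SIMD register.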

September 22, 2019

Development and validation of 58K SNP-array and high-density linkage map in Nile tilapia (O. niloticus).

Despite being the second most important aquaculture species in the world accounting for 7.4% of global production in 2015, tilapia aquaculture has lacked genomic tools like SNP-arrays and high-density linkage maps to improve selection accuracy and accelerate genetic progress. In this paper, we describe the development of a genotyping array containing more than 58,000 SNPs for Nile tilapia (Oreochromis niloticus). SNPs were identified from whole genome resequencing of 32 individuals from the commercial population of the Genomar strain, and were selected for the SNP-array based on polymorphic information content and physical distribution across the genome using the Orenil1.1 genome assembly as reference sequence. SNP-performance was evaluated by genotyping 4991 individuals, including 689 offspring belonging to 41 full-sib families, which revealed high-quality genotype data for 43,588 SNPs. A preliminary genetic linkage map was constructed using Lepmap2 which in turn was integrated with information from the O_niloticus_UMD1 genome assembly to produce an integrated physical and genetic linkage map comprising 40,186 SNPs distributed across 22 linkage groups (LGs). Around one-third of the LGs showed a different recombination rate between sexes, with the female being greater than the male map by a factor of 1.2 (1632.9 to 1359.6 cM, respectively), with most LGs displaying a sigmoid recombination profile. Finally, the sex-determining locus was mapped to position 40.53 cM on LG23, in the vicinity of the anti-Müllerian hormone (amh) gene. These new resources have the potential to greatly influence and improve the genetic gain when applying genomic selection and surpass the difficulties of efficient selection for invasively measured traits in Nile tilapia.

September 22, 2019

Variation graph toolkit improves read mapping by representing genetic variation in the reference.

Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual’s genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large-scale structural variation such as inversions and duplications. Previous graph genome software implementations have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and using these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalized compressed suffix arrays, with improved accuracy over alignment to a linear reference, and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at a gigabase scale, or at the topological complexity of de novo assemblies.
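
To make the idea of a bidirected sequence graph more concrete, here is a small Python sketch of a toy graph with one SNP-like bubble, in which a walk spells a haplotype and the same walk taken backwards spells its reverse complement. This is an illustration only and does not use vg or its file formats; the node sequences, the `edges` set, and the helper names (`flip`, `has_edge`, `spell`) are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical toy graph: node id -> sequence. Nodes 2 and 3 are two
# alternative alleles between the shared flanks 1 and 4.
nodes = {1: "ACG", 2: "T", 3: "C", 4: "GGA"}

# Each edge joins two oriented node visits (node id, is_reverse).
edges = {
    ((1, False), (2, False)),
    ((1, False), (3, False)),
    ((2, False), (4, False)),
    ((3, False), (4, False)),
}


def flip(step):
    """Same node, opposite orientation."""
    node_id, is_reverse = step
    return (node_id, not is_reverse)


def has_edge(a, b):
    """An edge a -> b can also be walked as flip(b) -> flip(a)."""
    return (a, b) in edges or (flip(b), flip(a)) in edges


def spell(path):
    """Concatenate node sequences along a walk, checking every step."""
    for a, b in zip(path, path[1:]):
        if not has_edge(a, b):
            raise ValueError(f"no edge between {a} and {b}")
    return "".join(
        reverse_complement(nodes[node_id]) if is_reverse else nodes[node_id]
        for node_id, is_reverse in path
    )


forward = [(1, False), (2, False), (4, False)]
backward = [flip(step) for step in reversed(forward)]
print(spell(forward), spell(backward))
```

Representing orientation on each step is what lets a graph of this kind describe inversions: an inverted segment is simply a walk that enters a node in the reverse orientation.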

September 22, 2019

Bias in resistance gene prediction due to repeat masking

Several recently published Brassicaceae genome annotations show strong differences in resistance (R)-gene content. We believe that this is caused by different approaches to repeat masking. Here we show that some of the repeats stored in public databases used for repeat masking carry pieces of predicted R-gene-related domains, and demonstrate that at least some of the variance in R-gene content in recent genome annotations is caused by using these repeats for repeat masking. We also show that other classes of genes are less affected by this phenomenon, and estimate a false positive rate of R genes (0 to 4.6%) that are in reality transposons carrying the R-gene domains. These results may partially explain why there has been a decrease in published novel R genes in recent years, which has implications for plant breeding, especially in the face of pathogens changing as a response to climate change.

September 22, 2019

A homeobox gene, BarH-1, underlies a female alternative life-history strategy

Colias butterflies (the “clouded sulphurs”) often occur in mixed populations where females exhibit two color morphs, yellow/orange or white. White females, known as the Alba morph, reallocate resources from the synthesis of costly colored pigments to reproductive and somatic development [1]. Due to this tradeoff, Alba females develop faster and have higher fecundity than orange females [2]. However, orange females, which have instead invested in pigments, are preferred by males, who in turn provide a nutrient-rich spermatophore during mating [2,3,4]. Thus the wing color morphs represent alternative life history strategies (ALHS) that are female-limited, wherein tradeoffs, due to divergent resource investment, result in distinct phenotypes with associated fitness consequences. Here we map the genetic basis of Alba in Colias crocea to a transposable element insertion downstream of the Colias homolog of BarH-1. To investigate the phenotypic effects of this insertion we use CRISPR/Cas9 to validate BarH-1’s functional role in the wing color switch and antibody staining to confirm expression differences in the scale building cells of pupal wings. We then use scanning electron microscopy to determine that BarH-1 expression in the wings causes a reduction in pigment granules within wing scales, and thereby gives rise to the white color. Finally, lipid and transcriptome analyses reveal additional physiological differences that arise due to Alba, suggesting pleiotropic effects beyond wing color. Together these findings provide the first well documented mechanism for a female ALHS and support an alternative view of color polymorphism as indicative of pleiotropic effects with life history consequences.

September 22, 2019

Antiviral adaptive immunity and tolerance in the mosquito Aedes aegypti

Mosquitoes spread pathogenic arboviruses while themselves tolerating infection. We here characterize an immunity pathway providing long-term antiviral protection and define how this pathway discriminates between self and non-self. Mosquitoes use viral RNAs to create viral derived cDNAs (vDNAs) central to the antiviral response. vDNA molecules are acquired through a process of reverse-transcription and recombination directed by endogenous retrotransposons. These vDNAs are thought to integrate in the host genome as endogenous viral elements (EVEs). Sequencing of pre-integrated vDNA revealed that the acquisition process exquisitely distinguishes viral from host RNA, providing one layer of self-non-self discrimination. Importantly, we show EVE-derived piRNAs have antiviral activity and are loaded onto Piwi4 to inhibit virus replication. In a second layer of self-non-self discrimination, Piwi4 preferentially loads EVE-derived piRNAs, discriminating against transposon-targeting piRNAs. Our findings define a fundamental virus-specific immunity pathway in mosquitoes that uses EVEs as a potent and specific antiviral transgenerational mechanism.
