Large genome Archives

September 22, 2019

Whole genome sequencing of greater amberjack (Seriola dumerili) for SNP identification on aligned scaffolds and genome structural variation analysis using parallel resequencing

Greater amberjack (Seriola dumerili) is distributed in tropical and temperate waters worldwide and is an important aquaculture fish. We carried out de novo sequencing of the greater amberjack genome to construct a reference genome sequence, both to identify single nucleotide polymorphisms (SNPs) for breeding amberjack by marker-assisted or gene-assisted selection and to identify functional genes for biological traits. We obtained 200-fold coverage and constructed a high-quality genome assembly using next-generation sequencing technology. The assembled sequences were aligned onto a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map by sequence homology. A total of 215 of the longest amberjack sequences, with a total length of 622.8 Mbp (92% of the total length of the genome scaffolds), were ordered on the yellowtail RH map. We resequenced the whole genomes of 20 greater amberjacks and mapped the resulting sequences onto the reference genome sequence. About 186,000 nonredundant SNPs were successfully ordered on the reference genome. Further, using BreakDancer, we found differences in genome structural variation between two greater amberjack populations. We also analyzed the greater amberjack transcriptome and mapped the annotated sequences onto the reference genome sequence.

September 22, 2019

Reproducible integration of multiple sequencing datasets to form high-confidence SNP, indel, and reference calls for five human genome reference materials

Benchmark small variant calls from the Genome in a Bottle Consortium (GIAB) for the CEPH/HapMap genome NA12878 (HG001) have been used extensively for developing, optimizing, and demonstrating the performance of sequencing and bioinformatics methods. Here, we develop a reproducible, cloud-based pipeline to integrate multiple sequencing datasets and form benchmark calls, enabling application to arbitrary human genomes. We use these reproducible methods to form high-confidence calls with respect to GRCh37 and GRCh38 for HG001 and four additional broadly consented genomes from the Personal Genome Project that are available as NIST Reference Materials. The broad, open consent of these new genomes, with few restrictions on the availability of samples and data, is enabling a uniquely diverse array of applications. Our new methods produce 17% more high-confidence SNPs, 176% more indels, and 12% larger regions than our previously published calls. To demonstrate that these calls can be used for accurate benchmarking, we compare other high-quality callsets to ours (e.g., Illumina Platinum Genomes) and demonstrate that the majority of discordant calls are errors in the other callsets. We also highlight challenges in interpreting performance metrics when benchmarking against imperfect high-confidence calls. We show that benchmarking tools from the Global Alliance for Genomics and Health can be used with our calls to stratify performance metrics by variant type and genome context and to elucidate the strengths and weaknesses of a method.
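Stratified performance metrics of the kind described above ultimately come from counting true positives (TP), false positives (FP) and false negatives (FN) per stratum. The following is a minimal Python sketch, assuming a hypothetical per-variant record format; it is not the interface or output format of the GA4GH benchmarking tools, and it treats the benchmark as ground truth, which the abstract cautions is not always justified.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MatchRecord:
    """One benchmark comparison outcome (hypothetical record format)."""
    variant_type: str  # e.g. "SNP" or "INDEL"
    outcome: str       # "TP", "FP" or "FN"


def stratified_metrics(records):
    """Return {variant_type: (precision, recall, f1)} computed from TP/FP/FN counts."""
    counts = defaultdict(lambda: {"TP": 0, "FP": 0, "FN": 0})
    for rec in records:
        counts[rec.variant_type][rec.outcome] += 1

    metrics = {}
    for vtype, c in counts.items():
        tp, fp, fn = c["TP"], c["FP"], c["FN"]
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[vtype] = (precision, recall, f1)
    return metrics


# Toy comparison outcomes, invented purely for illustration.
records = (
    [MatchRecord("SNP", "TP")] * 8
    + [MatchRecord("SNP", "FP")] * 2
    + [MatchRecord("INDEL", "TP")] * 3
    + [MatchRecord("INDEL", "FN")] * 1
)
for vtype, (p, r, f) in sorted(stratified_metrics(records).items()):
    print(f"{vtype}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Stratifying by genome context works the same way: the grouping key simply becomes a region label (for example, a low-complexity or segmental-duplication stratum) instead of, or in addition to, the variant type.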

September 22, 2019

Long-read genome sequence and assembly of Leptopilina boulardi: a specialist Drosophila parasitoid

Background: Leptopilina boulardi is a specialist parasitoid belonging to the order Hymenoptera that attacks the larval stages of Drosophila. The genus Leptopilina has enormous value in the biological control of pests as well as in understanding several aspects of host-parasitoid biology. However, no member of the family Figitidae has had its genome sequenced. To improve the understanding of parasitoid wasps by generating genomic resources, we sequenced the whole genome of L. boulardi. Findings: Here, we report a high-quality genome of L. boulardi, assembled from 70 Gb of Illumina reads and 10.5 Gb of PacBio reads, for a total coverage of 230X. The 375 Mb draft genome has an N50 of 275 kb with 6,315 scaffolds >500 bp, and encompasses >95% complete BUSCOs. The GC content of the genome is 28.26%, and RepeatMasker identified 868,105 repeat elements covering 43.9% of the assembly. A total of 25,259 protein-coding genes were predicted using a combination of ab initio and RNA-Seq-based methods, with an average gene size of 3.9 kb. Of the predicted genes, 78.11% could be annotated with at least one function. Conclusion: Our study provides a highly reliable assembly of this parasitoid wasp, which will be a valuable resource for researchers studying parasitoids. In particular, it can help delineate the host-parasitoid mechanisms that are part of the Drosophila-Leptopilina model system.
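The N50 reported above summarizes assembly contiguity: it is the length L such that sequences of length L or longer together contain at least half of all assembled bases. Below is a minimal Python sketch of that standard definition. The function name and the toy lengths are illustrative only and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    Sequences are sorted from longest to shortest and their lengths are
    accumulated; the N50 is the length at which the running total first
    reaches at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid floating-point division.
        if 2 * running >= total:
            return length


# Toy scaffold lengths: total 300, and 100 + 80 = 180 >= 150, so N50 = 80.
print(n50([100, 80, 60, 40, 20]))
```

Applying a minimum length cutoff (such as counting only scaffolds >500 bp) before calling the function changes the result, so reported N50 values are only comparable when computed over the same set of sequences.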

September 22, 2019

Targeted sequencing by gene synteny, a new strategy for polyploid species: sequencing and physical structure of a complex sugarcane region.

Sugarcane exhibits a complex genome mainly due to its aneuploid nature and high ploidy level, and sequencing of its genome poses a great challenge. Closely related species with well-assembled and annotated genomes can be used to help assemble complex genomes. Here, a stable quantitative trait locus (QTL) related to sugar accumulation in sorghum was successfully transferred to the sugarcane genome. Gene sequences related to this QTL were identified in silico from sugarcane transcriptome data, and molecular markers based on these sequences were developed to select bacterial artificial chromosome (BAC) clones from the sugarcane variety SP80-3280. Sixty-eight BAC clones containing at least two gene sequences associated with the sorghum QTL were sequenced using Pacific Biosciences (PacBio) technology. Twenty BAC sequences were found to be related to the syntenic region, of which nine were sufficient to represent this region. The strategy we propose is called “targeted sequencing by gene synteny,” which is a simpler approach to understanding the genome structure of complex genomic regions associated with traits of interest.

September 22, 2019

The sequence of the salamander.

The genome of the aquatic axolotl salamander, a native of Mexico’s lakes, has yielded some surprises, and the technique used could point the way to analysis of other organisms that have complex genomes with large numbers of sequence repeats, such as the lungfish and many species of plants.

September 22, 2019

Genomics of habitat choice and adaptive evolution in a deep-sea fish.

Intraspecific diversity promotes evolutionary change and, when partitioned among geographic regions or habitats, can form the basis for speciation. Marine species live in an environment that can provide as much scope for diversification in the vertical as in the horizontal dimension. Understanding the relevant mechanisms will contribute significantly to our understanding of eco-evolutionary processes and effective biodiversity conservation. Here, we provide an annotated genome assembly for the deep-sea fish Coryphaenoides rupestris and re-sequencing data to show that differentiation at non-synonymous sites in functional loci distinguishes individuals living at different depths, independent of horizontal spatial distance. Our data indicate disruptive selection at these loci; however, we find no clear evidence for differentiation at neutral loci that may indicate assortative mating. We propose that individuals with distinct genotypes at relevant loci segregate by depth as they mature (supported by survey data), which may be associated with ecotype differentiation linked to distinct phenotypic requirements at different depths.

September 22, 2019

Transcriptional profiling, molecular cloning, and functional analysis of C1 inhibitor, the main regulator of the complement system in black rockfish, Sebastes schlegelii.

C1-inhibitor (C1inh) plays a crucial role in maintaining homeostasis and is the central regulator of complement activation, which is involved in immunity and inflammation. A C1-inhibitor gene from Sebastes schlegelii was identified and designated SsC1inh. The identified genomic DNA and cDNA sequences were 6837 bp and 2161 bp long, respectively. The genomic DNA comprised 11 exons interrupted by 10 introns. The amino acid sequence contained two immunoglobulin-like domains and a serpin domain. Multiple sequence alignment revealed that the serpin domain of SsC1inh was highly conserved among the analyzed species, whereas the two immunoglobulin-like domains showed divergence. The distinctiveness of teleost C1inh from other homologs was indicated by phylogenetic analysis, genomic DNA organization, and their extended N-terminal amino acid sequences. Under normal physiological conditions, SsC1inh mRNA was most highly expressed in the liver, followed by the gills. The involvement of SsC1inh in homeostasis was demonstrated by modulated transcription profiles in the liver and spleen upon pathogenic stress by different immune stimulants. The protease inhibitory potential of recombinant SsC1inh (rSsC1inh) and the potentiating effect of heparin on rSsC1inh were demonstrated against C1 esterase and thrombin. For the first time, the anti-protease activity of teleost C1inh against its natural substrates C1r and C1s was demonstrated in this study. The protease assay conducted with recombinant black rockfish C1r and C1s proteins in the presence or absence of rSsC1inh showed that the activities of both proteases were significantly diminished by rSsC1inh. Taken together, the results of the present study indicate that SsC1inh plays a significant role in maintaining homeostasis in the immune system of black rockfish. Copyright © 2018. Published by Elsevier Ltd.

September 22, 2019

Pathogen-specific binding soluble Down syndrome cell adhesion molecule (Dscam) regulates phagocytosis via membrane-bound Dscam in crab.

The Down syndrome cell adhesion molecule (Dscam) gene is an extraordinary example of diversity that can produce thousands of isoforms and has so far been found only in insects and crustaceans. Cumulative evidence indicates that Dscam may contribute to the mechanistic foundations of specific immune responses in insects. However, the mechanism and functions of Dscam in relation to pathogens and immunity remain largely unknown. In this study, we identified the genome organization and alternative Dscam exons of the Chinese mitten crab, Eriocheir sinensis. These variants, designated EsDscam, can potentially produce 30,600 isoforms owing to three alternatively spliced immunoglobulin (Ig) domains and a transmembrane domain. EsDscam was significantly upregulated after bacterial challenge at both the mRNA and protein levels. Moreover, bacteria-specific EsDscam isoforms were found to bind specifically to the original bacteria to facilitate efficient clearance. Furthermore, bacteria-specific binding of soluble EsDscam via the complete Ig1-Ig4 domain significantly enhanced elimination of the original bacteria via phagocytosis by hemocytes; this function was abolished by partial Ig1-Ig4 domain truncation. Further studies showed that knockdown of membrane-bound EsDscam inhibited the ability of EsDscam with the same extracellular region to promote bacterial phagocytosis. Immunocytochemistry indicated colocalization of the soluble and membrane-bound forms of EsDscam at the hemocyte surface. Far-Western and coimmunoprecipitation assays demonstrated homotypic interactions between EsDscam isoforms. This study provides insights into a mechanism by which soluble Dscam regulates hemocyte phagocytosis via bacteria-specific binding and specific interactions with membrane-bound Dscam as a phagocytic receptor.

September 22, 2019

Genome sequence, assembly and characterization of two Metschnikowia fructicola strains used as biocontrol agents of postharvest diseases.

The yeast Metschnikowia fructicola has been reported as an efficient biological control agent of postharvest diseases of fruits and vegetables, and it is the basis of the commercial formulated product “Shemer.” Several mechanisms by which M. fructicola inhibits postharvest pathogens have been suggested, including iron-binding compounds, induction of defense signaling genes, production of fungal cell wall-degrading enzymes, and relatively high amounts of superoxide anions. We assembled the whole genome sequences of two strains of M. fructicola using PacBio and Illumina shotgun sequencing technologies. Using PacBio sequencing, we obtained a high-quality draft genome consisting of 93 contigs, with an estimated genome size of approximately 26 Mb. Comparative analysis of M. fructicola proteins with those of the three other available closely related genomes revealed a shared core of homologous proteins encoded by 5,776 genes. Comparing the genomes of the two M. fructicola strains using a SNP calling approach identified 564,302 homologous SNPs with 2,004 predicted high-impact mutations. The genome is exceptionally large compared with those of available closely related organisms, and the high rate of homology among M. fructicola genes points toward a recent whole-genome duplication event as the cause of this large genome. Based on the assembled genome, sequences were annotated with gene descriptions and gene ontology (GO) terms and clustered into functional groups. Analysis of CAZyme family genes revealed 1,145 putative genes, and transcriptomic analysis of CAZyme expression levels in M. fructicola during its interaction with either grapefruit peel tissue or Penicillium digitatum revealed a high level of CAZyme gene expression when the yeast was placed in wounded fruit tissue.

September 22, 2019

A manually annotated Actinidia chinensis var. chinensis (kiwifruit) genome highlights the challenges associated with draft genomes and gene prediction in plants.

Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) ‘Hongyang’ draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second A. chinensis genome (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed that a considerable number of translocation events have occurred following a whole genome duplication (WGD) event, some of them consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these, 3114 (9.4%) were identical to a protein in the ‘Hongyang’ annotation of the Kiwifruit Information Resource (KIR v2). A proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation, this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned ‘Hort16A’ cDNAs and comparing them with the predicted protein models for Red5 and with both the original ‘Hongyang’ assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised ‘Hongyang’ annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes. Our use of the manual annotation tool WebApollo facilitated manual checking and correction of gene models, enabling improvement of computational prediction. This utility was especially relevant for certain types of gene families, such as the EXPANSIN-like genes. Finally, this high-quality gene set will supply the kiwifruit and general plant community with a new tool for genomics and other comparative analyses.

September 22, 2019

Comparative genomics of the wheat fungal pathogen Pyrenophora tritici-repentis reveals chromosomal variations and genome plasticity.

Pyrenophora tritici-repentis (Ptr) is a necrotrophic fungal pathogen that causes tan spot, a major wheat disease. We set out to provide essential genomics-based resources to better understand the pathogenicity mechanisms of this important pathogen. Here, we present eight new assembled and annotated Ptr isolate genomes, representing races 1, 2 and 5 and a new race. We report a high-quality Ptr reference genome, sequenced by PacBio technology with Illumina paired-end data support and optical mapping. An estimated 98% of the genome coverage was mapped to 10 chromosomal groups using a two-enzyme hybrid approach. The final reference genome was 40.9 Mb and contained a total of 13,797 annotated genes, supported by transcriptomic and proteogenomic data sets. Whole genome comparative analysis revealed major chromosomal segmental rearrangements and fusions, highlighting intraspecific genome plasticity in this species. Furthermore, the Ptr race classification was not supported at the whole genome level, as phylogenetic analysis did not group the ToxA-producing isolates together. This expansion of available Ptr genomics resources will directly facilitate research aimed at controlling tan spot disease.

September 22, 2019

Comparison of phasing strategies for whole human genomes.

Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact: they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes that individuals possess (i.e., they do not ‘phase’ the genome), often because of the costs and complexities of doing so. We compared 11 different widely used approaches to phasing human genomes, using the publicly available ‘Genome-In-A-Bottle’ (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing, as well as purely computational approaches that seek to reconstruct phase information from general sequencing reads and constructs or from population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, and phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages 1. population-based phasing using the SHAPEIT software suite, 2. either genome-wide sequencing read data or parental genotypes, and 3. a large reference panel of variant and haplotype frequencies provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced by the addition of genome-wide read data, e.g., whole genome shotgun and/or RNA sequencing reads. Further, we found that the inclusion of parental genotype data within a population-based phasing strategy can provide as much as a tenfold reduction in phasing errors. We also considered a majority voting scheme for constructing a consensus haplotype that combines multiple predictions for enhanced performance and site coverage. Finally, we identified DNA sequence signatures associated with the genomic regions harboring phasing switch errors, which included regions of low polymorphism or SNV density.
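The switch error rate and the majority-voting consensus mentioned above can be illustrated with a small Python sketch. It assumes a hypothetical encoding in which each heterozygous SNV is recorded as 0 or 1 (the allele carried on haplotype 1), or None when unphased, and it treats the whole input as a single phase block. The orient-then-vote strategy is an illustrative choice and not the scheme used in the study.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical encoding: per heterozygous site, 0/1 = allele on haplotype 1, None = unphased.
Phase = List[Optional[int]]


def switch_error_rate(predicted: Phase, truth: Phase) -> float:
    """Fraction of adjacent compared site pairs whose relative phase differs from truth.

    Simplification: all sites form one phase block; sites unphased in either
    input are skipped. A global flip of the prediction causes no switch errors.
    """
    agreements = [p == t for p, t in zip(predicted, truth)
                  if p is not None and t is not None]
    pairs = len(agreements) - 1
    if pairs <= 0:
        return 0.0
    switches = sum(a != b for a, b in zip(agreements, agreements[1:]))
    return switches / pairs


def orient(prediction: Phase, reference: Phase) -> Phase:
    """Flip a prediction (0 <-> 1) if it disagrees with the reference at most shared sites."""
    shared = [(p, r) for p, r in zip(prediction, reference)
              if p is not None and r is not None]
    disagreements = sum(p != r for p, r in shared)
    if 2 * disagreements > len(shared):
        return [None if p is None else 1 - p for p in prediction]
    return list(prediction)


def majority_vote(predictions: List[Phase]) -> Phase:
    """Per-site majority vote after orienting every prediction to the first; ties stay unphased."""
    reference = predictions[0]
    oriented = [orient(p, reference) for p in predictions]
    consensus: Phase = []
    for site_calls in zip(*oriented):
        votes = Counter(c for c in site_calls if c is not None)
        if not votes:
            consensus.append(None)
            continue
        (top, top_n), *rest = votes.most_common()
        consensus.append(None if rest and rest[0][1] == top_n else top)
    return consensus


truth = [0, 1, 1, 0, 1]
print(switch_error_rate([0, 1, 0, 1, 0], truth))  # one switch over four adjacent pairs
print(majority_vote([[0, 1, 1, 0, None], [1, 0, 0, 1, 0], [0, 1, 0, 0, None]]))
```

Orienting each prediction before voting matters because phase is only defined up to swapping the two haplotypes. Without that step, a prediction that is correct but globally flipped would vote against the others at every site. Sites that a prediction leaves unphased are simply filled in by the remaining predictions, which illustrates how a consensus can improve site coverage.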

September 22, 2019

Ginseng Genome Database: an open-access platform for genomics of Panax ginseng.

Ginseng (Panax ginseng C.A. Meyer) is a perennial herbaceous plant that has been used in traditional oriental medicine for thousands of years. Ginsenosides, which have significant pharmacological effects on human health, are the foremost bioactive constituents of this plant. Given the importance of this plant to humans, an integrated omics resource is indispensable for facilitating genomic research, molecular breeding and pharmacological study of this herb. The first draft genome sequences of P. ginseng cultivar “Chunpoong” were reported recently. Here, using the draft genome, transcriptome, and functional annotation datasets of P. ginseng, we have constructed the Ginseng Genome Database (http://ginsengdb.snu.ac.kr/), the first open-access platform to provide comprehensive genomic resources for P. ginseng. The current version of this database provides the most up-to-date draft genome sequence (approximately 3000 Mbp of scaffold sequences), along with structural and functional annotations for 59,352 genes and digital expression of genes based on transcriptome data from different tissues, growth stages and treatments. In addition, visualization tools and genomic data from various analyses are provided. All data in the database were manually curated and integrated within a user-friendly query page. This database provides valuable resources for a range of research fields related to P. ginseng and other species belonging to the order Apiales, as well as for plant research communities in general. The Ginseng Genome Database can be accessed at http://ginsengdb.snu.ac.kr/.

September 22, 2019

Draft genome of the Peruvian scallop Argopecten purpuratus.

The Peruvian scallop, Argopecten purpuratus, is mainly cultured in southern Chile and Peru and was introduced into China in the last century. Unlike other Argopecten scallops, the Peruvian scallop normally has a long life span of up to 7 to 10 years. Therefore, researchers have been using it in crosses to exploit hybrid vigor. Here, we performed whole genome sequencing, assembly, and gene annotation of the Peruvian scallop, with the important aim of developing genomic resources for genetic breeding in scallops. A total of 463.19 Gb of raw DNA reads were sequenced. A draft genome assembly of 724.78 Mb was generated (accounting for 81.87% of the estimated genome size of 885.29 Mb), with a contig N50 size of 80.11 kb and a scaffold N50 size of 1.02 Mb. Repeat sequences were estimated to account for 33.74% of the whole genome, and 26,256 protein-coding genes and 3,057 noncoding RNAs were predicted from the assembly. We generated a high-quality draft genome assembly of the Peruvian scallop, which will provide a solid resource for further genetic breeding and for the analysis of the evolutionary history of this economically important scallop.

September 22, 2019

Epigenetic landscape influences the liver cancer genome architecture.

The accumulation of different types of genetic alterations, such as nucleotide substitutions, structural rearrangements and viral genome integrations, together with epigenetic alterations, contributes to carcinogenesis. Here, we report correlations between the occurrence of epigenetic features and genetic aberrations, based on whole-genome bisulfite, whole-genome shotgun, long-read, and virus capture sequencing of 373 liver cancers. Somatic substitutions and rearrangement breakpoints are enriched in tumor-specific hypomethylated regions with inactive chromatin marks and in actively transcribed, highly methylated regions of the cancer genome. Individual mutation signatures depend on chromatin status; in particular, signatures with a higher transcriptional strand bias occur within active chromatin areas. Hepatitis B virus (HBV) integration sites are frequently detected within inactive chromatin regions in cancer cells, as a consequence of negative selection against integrations in active chromatin regions. Ultra-high structural instability and preserved unmethylation of integrated HBV genomes are observed. We conclude that both precancerous and somatic epigenetic features contribute to the cancer genome architecture.
