Large genome Archives - Page 25 of 69

September 22, 2019

Jointly aligning a group of DNA reads improves accuracy of identifying large deletions.

Performing sequence alignment to identify structural variants, such as large deletions, from genome sequencing data is a fundamental task, but current methods are far from perfect. The current practice is to independently align each DNA read to a reference genome. We show that the propensity of genomic rearrangements to accumulate in repeat-rich regions imposes severe ambiguities in these alignments, and consequently in the variant calls; with current read lengths, this affects more than one third of known large deletions in the C. Venter genome. We present a method to jointly align reads to a genome, whereby the alignment ambiguity of one read can be resolved by other reads. We show this leads to a significant improvement in the accuracy of identifying large deletions (≥20 bases), while imposing minimal computational overhead and maintaining an overall running time on par with current tools. A software implementation is available as an open-source Python program called JRA at https://bitbucket.org/jointreadalignment/jra-src.
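The core intuition, that one read's ambiguity can be resolved by other reads, can be illustrated with a deliberately simplified voting scheme. This is not JRA's algorithm: the sketch assumes each read has already been reduced to a set of equally scoring candidate deletions (a hypothetical input format), and it simply picks the deletion that the most reads are compatible with.

```python
from collections import Counter

def joint_deletion_call(candidates_per_read):
    """Toy joint resolution of ambiguous deletion placements.

    candidates_per_read: one entry per read, each a collection of
    (start, length) deletions the read supports equally well when
    aligned on its own (hypothetical input format).
    Returns ((start, length), support), or None if there are no candidates.
    """
    votes = Counter()
    for candidates in candidates_per_read:
        # A read votes once for every placement it is compatible with.
        for deletion in set(candidates):
            votes[deletion] += 1
    if not votes:
        return None
    # Highest support wins; ties go to the leftmost, then shortest, deletion.
    best = max(votes, key=lambda d: (votes[d], -d[0], -d[1]))
    return best, votes[best]


reads = [
    {(100, 50), (104, 50)},  # ambiguous: a repeat allows two placements
    {(104, 50), (108, 50)},  # ambiguous in a different way
    {(104, 50)},             # unique placement
]
print(joint_deletion_call(reads))  # ((104, 50), 3)
```

Each read alone is ambiguous, but only one placement is consistent with all three, which is the kind of cross-read evidence that independent alignment discards.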

September 22, 2019

Bat biology, genomes, and the Bat1K project: To generate chromosome-level genomes for all living bat species.

Bats are unique among mammals, possessing some of the rarest mammalian adaptations, including true self-powered flight, laryngeal echolocation, exceptional longevity, unique immunity, contracted genomes, and vocal learning. They provide key ecosystem services, pollinating tropical plants, dispersing seeds, and controlling insect pest populations, thus driving healthy ecosystems. They account for more than 20% of all living mammalian diversity, and their crown-group evolutionary history dates back to the Eocene. Despite their great numbers and diversity, many species are threatened and endangered. Here we announce Bat1K, an initiative to sequence the genomes of all living bat species (n~1,300) to chromosome-level assembly. The Bat1K genome consortium unites bat biologists (>148 members as of writing), computational scientists, conservation organizations, genome technologists, and any interested individuals committed to a better understanding of the genetic and evolutionary mechanisms that underlie the unique adaptations of bats. Our aim is to catalog the unique genetic diversity present in all living bats to better understand the molecular basis of their unique adaptations; uncover their evolutionary history; link genotype with phenotype; and ultimately better understand, promote, and conserve bats. Here we review the unique adaptations of bats and highlight how chromosome-level genome assemblies can uncover the molecular basis of these traits. We present a novel sequencing and assembly strategy and review the striking societal and scientific benefits that will result from the Bat1K initiative.

September 22, 2019

A survey of localized sequence rearrangements in human DNA.

Genomes mutate and evolve in ways simple (substitution or deletion of bases) and complex (e.g. chromosome shattering). We do not fully understand what types of complex mutation occur, and we cannot routinely characterize arbitrarily-complex mutations in a high-throughput, genome-wide manner. Long-read DNA sequencing methods (e.g. PacBio, nanopore) are promising for this task, because one read may encompass a whole complex mutation. We describe an analysis pipeline to characterize arbitrarily-complex ‘local’ mutations, i.e. intrachromosomal mutations encompassed by one DNA read. We apply it to nanopore and PacBio reads from one human cell line (NA12878), and survey sequence rearrangements, both real and artifactual. Almost all the real rearrangements belong to recurring patterns or motifs: the most common is tandem multiplication (e.g. heptuplication), but there are also complex patterns such as localized shattering, which resembles DNA damage by radiation. Gene conversions are identified, including one between hemoglobin gamma genes. This study demonstrates a way to find intricate rearrangements with any number of duplications, deletions, and repositionings. It demonstrates a probability-based method to resolve ambiguous rearrangements involving highly similar sequences, as occurs in gene conversion. We present a catalog of local rearrangements in one human cell line, and show which rearrangement patterns occur.
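To make the idea of probability-based resolution between highly similar sequences concrete, here is a minimal sketch that scores a read against two candidate source copies (for example, two paralogs) and converts the scores to posterior probabilities. It is not the authors' method: it assumes ungapped, equal-length alignments, a fixed substitution error rate (0.1 here, an arbitrary choice), equal priors, and hypothetical sequences and function names.

```python
import math

def log_likelihood(read, source, error_rate):
    """Log P(read | source) for an ungapped alignment with independent
    substitution errors spread evenly over the three other bases."""
    if len(read) != len(source):
        raise ValueError("sketch handles equal-length, ungapped alignments only")
    ll = 0.0
    for r, s in zip(read, source):
        ll += math.log(1.0 - error_rate) if r == s else math.log(error_rate / 3.0)
    return ll

def source_posteriors(read, sources, error_rate=0.1):
    """Posterior probability of each candidate source, assuming equal priors."""
    lls = [log_likelihood(read, s, error_rate) for s in sources]
    top = max(lls)
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = [math.exp(ll - top) for ll in lls]
    total = sum(weights)
    return [w / total for w in weights]


read = "ACGTACGTAA"
copy_a = "ACGTACGTAA"  # hypothetical paralog A
copy_b = "ACGTACGTTA"  # hypothetical paralog B, one base different
post = source_posteriors(read, [copy_a, copy_b])
print([round(p, 3) for p in post])  # [0.964, 0.036]
```

Rather than forcing a single answer, the output expresses how confident one can be about which copy a segment came from, which is what matters when classifying events such as gene conversion.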

September 22, 2019

Assembly and analysis of a qingke reference genome demonstrate its close genetic relation to modern cultivated barley.

Qingke, the local name for hulless barley on the Tibetan Plateau, is a staple food for Tibetans. The availability of its reference genome sequence could be useful for studies on breeding and molecular evolution. Taking advantage of third-generation sequencing (PacBio), we de novo assembled a 4.84-Gb genome sequence of qingke, cv. Zangqing320, and anchored 4.59 Gb of sequence to seven chromosomes. Of the 46,787 annotated ‘high-confidence’ genes, 31,564 were validated by RNA-sequencing data from 39 wild and cultivated barley genotypes with wide genetic diversity, and the results were also confirmed against the NCBI nonredundant protein database. As some gaps in the Morex reference genome were covered by PacBio reads in the Zangqing320 reference genome, we believe that the Zangqing320 genome provides useful supplements to the Morex genome. Using the qingke genome as a reference, we conducted a genome comparison revealing a close genetic relationship between a hulled barley (cv. Morex) and a hulless barley (cv. Zangqing320), which is strongly supported by the low-diversity regions in the two genomes. Considering the origin of Morex from its breeding pedigree, we then demonstrated a close genomic relationship between modern cultivated barley and qingke. Given this genomic relationship and the large genetic diversity between qingke and modern cultivated barley, we propose that qingke could provide elite genes for barley improvement. © 2017 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

September 22, 2019

Anisogamy evolved with a reduced sex-determining region in volvocine green algae

Male and female gametes differing in size (anisogamy) emerged independently from isogamous ancestors in various eukaryotic lineages, although the genetic basis of this emergence is still unknown. Volvocine green algae are a model lineage for investigating the transition from isogamy to anisogamy. Here we focus on two closely related volvocine genera that bracket this transition: isogamous Yamagishiella and anisogamous Eudorina. We generated de novo nuclear genome assemblies of both sexes of Yamagishiella and Eudorina to identify the dimorphic sex-determining chromosomal region, or mating-type locus (MT), of each. In contrast to the large (>1 Mb) and complex MT of oogamous Volvox, the Yamagishiella and Eudorina MTs are smaller (7–268 kb) and simpler, with only two sex-limited genes: the minus/male-limited MID and the plus/female-limited FUS1. No prominently dimorphic gametologs were identified in either species. Thus, the first step to anisogamy in volvocine algae presumably occurred without an increase in MT size and complexity.

September 22, 2019

Genotype assembly, biological activity and adaptation of spatially separated isolates of Spodoptera litura nucleopolyhedrovirus.

The cotton leafworm Spodoptera litura is a polyphagous insect. It has recently made a comeback as a primary insect pest of cotton in Pakistan due to reductions in pesticide use following the advent of genetically modified cotton resistant to Helicoverpa armigera. Spodoptera litura nucleopolyhedrovirus (SpltNPV) infects S. litura and is recognized as a potential candidate to control this insect. Twenty-two NPV isolates were collected from S. litura in different agro-ecological zones (with collection sites up to 600 km apart) and cropping systems in Pakistan to determine whether there is spatial dispersal and adaptation of the virus and/or adaptation to crops. To this end, the genetic make-up and biological activity of these isolates were measured. Among the SpltNPV isolates tested for speed of kill in 3rd instar larvae of S. litura, TAX1, SFD1, SFD2 and GRW1 killed significantly faster than the other Pakistani isolates. Restriction fragment length analysis of the DNA showed that the Pakistani SpltNPV isolates are all variants of a single SpltNPV biotype. The isolates could be grouped into three genogroups (A–C). According to a Cox proportional hazards analysis, genogroup A viruses killed faster than genogroup C viruses. Sequence analysis showed that the Pakistani SpltNPV isolates are more closely related to each other than to the SpltNPV type species G2 (Pang et al., 2001). This suggests a single introduction of SpltNPV into Pakistan. The SpltNPV-PAK isolates are distinct from Spodoptera littoralis nucleopolyhedrovirus. There was a strong correlation between geographic spread and the genetic variation of SpltNPV, and a marginally significant correlation between the latter and the cropping system. The faster-killing isolates may be good candidates for biological control of S. litura in Pakistan. Copyright © 2018 Elsevier Inc. All rights reserved.

September 22, 2019

Loss of stomach, loss of appetite? Sequencing of the ballan wrasse (Labrus bergylta) genome and intestinal transcriptomic profiling illuminate the evolution of loss of stomach function in fish.

The ballan wrasse (Labrus bergylta) belongs to a large teleost family containing more than 600 species showing several unique evolutionary traits, such as lack of a stomach and hermaphroditism. Agastric fish are found throughout the teleost phylogeny, in quite diverse and unrelated lineages, indicating that stomach loss has occurred independently multiple times in the course of evolution. By assembling the ballan wrasse genome and transcriptome, we aimed to determine the genetic basis for its digestive system function and appetite regulation. Among other things, this knowledge will aid the formulation of aquaculture diets that meet the nutritional needs of agastric species. Long- and short-read sequencing technologies were combined to generate a ballan wrasse genome of 805 Mbp. Analysis of the genome and transcriptome assemblies confirmed the absence of genes that code for proteins involved in gastric function. The gene coding for the appetite-stimulating protein ghrelin was also absent in wrasse. Gene synteny mapping identified several appetite-controlling genes and their paralogs previously undescribed in fish. Transcriptome profiling along the length of the intestine found a declining expression gradient from anterior to posterior, and a distinct expression profile in the hind gut. We showed that gene loss has occurred for all known genes related to stomach function in the ballan wrasse, while the remaining functions of the digestive tract appear intact. The results also show that appetite control in ballan wrasse has undergone substantial changes. The loss of ghrelin suggests that other genes, such as motilin, may play a ghrelin-like role. The wrasse genome offers novel insight into the evolutionary traits of this large family. As the stomach plays a major role in protein digestion, the lack of genes related to stomach digestion in wrasse suggests it requires formulated diets with higher levels of readily digestible protein than those for gastric species.

September 22, 2019

Analysis of the Aedes albopictus C6/36 genome provides insight into cell line utility for viral propagation.

The 50-year-old Aedes albopictus C6/36 cell line is a resource for the detection, amplification, and analysis of mosquito-borne viruses including Zika, dengue, and chikungunya. The cell line is derived from an unknown number of larvae from an unspecified strain of Aedes albopictus mosquitoes. Toward improving the utility of the cell line for research in virus transmission, we present an annotated assembly of the C6/36 genome. The C6/36 genome assembly has the largest contig N50 (3.3 Mbp) of any mosquito assembly, presents the sequences of both haplotypes for most of the diploid genome, reveals independent null mutations in both alleles of the Dicer locus, and indicates a male-specific genome. Gene annotation was computed with publicly available mosquito transcript sequences. Gene expression data from cell line RNA sequencing identified enrichment of growth-related pathways and a conspicuous deficiency in aquaporins and inward rectifier K+ channels. As a test of utility, RNA sequence data from Zika-infected cells were mapped to the C6/36 genome and transcriptome assemblies. Host subtraction reduced the data set by 89%, enabling faster characterization of nonhost reads. The C6/36 genome sequence and annotation should enable additional uses of the cell line to study arbovirus vector interactions and interventions aimed at restricting the spread of human disease.
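Host subtraction means discarding reads that derive from the host so that the remaining (for example, viral) reads can be analyzed faster. The abstract describes mapping reads to the C6/36 assemblies; the sketch below swaps in a simpler k-mer screen purely to illustrate the idea and how a percentage reduction is computed. The k-mer length (21), the 50% shared-k-mer threshold, and all function names are illustrative assumptions.

```python
def kmers(seq, k):
    """All distinct length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def subtract_host(reads, host_sequences, k=21, min_shared_fraction=0.5):
    """Drop reads whose k-mers are mostly found in the host sequences."""
    host_index = set()
    for seq in host_sequences:
        host_index |= kmers(seq, k)
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            # Read shorter than k: cannot be judged, so keep it.
            kept.append(read)
            continue
        shared = len(read_kmers & host_index) / len(read_kmers)
        if shared < min_shared_fraction:
            kept.append(read)
    return kept

def reduction_percent(n_before, n_after):
    """Percentage of reads removed by subtraction."""
    return 100.0 * (n_before - n_after) / n_before


host = ["ACGTACGTACGT"]          # toy host sequence
reads = ["ACGTACGT", "TTTTGGGG"]  # one host-like read, one non-host read
kept = subtract_host(reads, host, k=4)
print(kept, reduction_percent(len(reads), len(kept)))  # ['TTTTGGGG'] 50.0
```

A real pipeline would use a read aligner and handle sequencing errors and reverse complements, which this toy screen ignores.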

September 22, 2019

The complete mitochondrial genome of the hermaphroditic freshwater mussel Anodonta cygnea (Bivalvia: Unionidae): in silico analyses of sex-specific ORFs across order Unionoida.

Doubly uniparental inheritance (DUI) of mitochondrial DNA in bivalves is a fascinating exception to the strictly maternal inheritance practiced by all other animals. Recent work on DUI suggests that there may be unique regions of the mitochondrial genomes that play a role in sex determination and/or sexual development in freshwater mussels (order Unionoida). In this study, one complete mitochondrial genome of the hermaphroditic swan mussel, Anodonta cygnea, is sequenced and compared to the complete mitochondrial genome of the gonochoric duck mussel, Anodonta anatina. An in silico assessment of novel proteins found within freshwater bivalve species (known as F-, H-, and M-open reading frames, or ORFs) is conducted, with special attention to putative transmembrane domains (TMs), signal peptides (SPs), signal cleavage sites (SCS), subcellular localization, and potential control regions. Characteristics of TMs are also examined across freshwater mussel lineages. In silico analyses suggest the presence of SPs and SCSs and provide some insight into possible function(s) of these novel ORFs. The assessed confidence in these structures and functions was highly variable, possibly due to the novelty of these proteins. The number and topology of putative TMs appear to be maintained among both F- and H-ORFs; however, this is not the case for M-ORFs. There does not appear to be a typical control region in H-type mitochondrial DNA, especially given the loss of tandem repeats in unassigned regions when compared to F-type mtDNA. In silico analyses provide a useful tool to discover patterns in DUI and to guide further in situ analyses related to DUI in freshwater mussels. In situ analysis will be necessary to further explore the intracellular localization and possible role of these open reading frames in sex determination in freshwater mussels.
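Putative transmembrane domains are often flagged from hydropathy profiles. The abstract does not name the predictors used, so the following is only an illustrative stand-in: a sliding-window average over the Kyte-Doolittle hydropathy scale, using the commonly cited 19-residue window and 1.6 threshold. Function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_windows(protein, window=19):
    """Mean hydropathy of every window; non-standard residues raise KeyError."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def putative_tm_segments(protein, window=19, threshold=1.6):
    """Half-open (start, end) residue ranges whose windows exceed the
    threshold, with overlapping qualifying windows merged."""
    segments = []
    for i, mean in enumerate(hydropathy_windows(protein, window)):
        if mean > threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend previous segment
            else:
                segments.append((start, end))
    return segments


toy = "K" * 10 + "L" * 25 + "K" * 10  # hydrophobic core flanked by lysines
print(putative_tm_segments(toy))  # [(5, 40)]
```

Hydropathy windows only suggest candidate segments; dedicated predictors model topology and signal peptides as well, which is why confidence in such calls can vary as the abstract reports.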

September 22, 2019

Whole genome sequencing of greater amberjack (Seriola dumerili) for SNP identification on aligned scaffolds and genome structural variation analysis using parallel resequencing

Greater amberjack (Seriola dumerili) is distributed in tropical and temperate waters worldwide and is an important aquaculture fish. We carried out de novo sequencing of the greater amberjack genome to construct a reference genome sequence, with the aims of identifying single nucleotide polymorphisms (SNPs) for breeding amberjack by marker-assisted or gene-assisted selection and of identifying functional genes for biological traits. We obtained 200× coverage and constructed a high-quality genome assembly using next-generation sequencing technology. The assembled sequences were aligned onto a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map by sequence homology. A total of 215 of the longest amberjack sequences, with a total length of 622.8 Mbp (92% of the total length of the genome scaffolds), were lined up on the yellowtail RH map. We resequenced the whole genomes of 20 greater amberjacks and mapped the resulting sequences onto the reference genome sequence. About 186,000 nonredundant SNPs were successfully ordered on the reference genome. Further, using BreakDancer, we found differences in genome structural variation between two greater amberjack populations. We also analyzed the greater amberjack transcriptome and mapped the annotated sequences onto the reference genome sequence.
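Two of the figures above are linked by simple arithmetic: the mean-coverage formula (total sequenced bases divided by genome size) and the total scaffold length implied by "622.8 Mbp is 92% of the total". The sketch below works both through. It assumes, for illustration only, that coverage is measured against the scaffold length; the abstract does not say which genome size the 200× figure uses.

```python
def mean_coverage(total_sequenced_bases, genome_size):
    """Expected mean depth: C = total sequenced bases / genome size."""
    return total_sequenced_bases / genome_size

def total_from_fraction(part_length, fraction):
    """Recover a total length from a part and the fraction it represents."""
    return part_length / fraction


# 622.8 Mbp is stated to be 92% of the total scaffold length.
scaffold_total_mbp = total_from_fraction(622.8, 0.92)
print(round(scaffold_total_mbp, 1))  # 677.0

# Sequence needed for 200x over that length (assumption: coverage is
# computed against scaffold length, not an independent size estimate).
implied_bases = 200 * scaffold_total_mbp * 1e6
print(round(implied_bases / 1e9, 1))  # 135.4 (Gb)
```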

September 22, 2019

Reproducible integration of multiple sequencing datasets to form high-confidence SNP, indel, and reference calls for five human genome reference materials

Benchmark small variant calls from the Genome in a Bottle Consortium (GIAB) for the CEPH/HapMap genome NA12878 (HG001) have been used extensively for developing, optimizing, and demonstrating the performance of sequencing and bioinformatics methods. Here, we develop a reproducible, cloud-based pipeline to integrate multiple sequencing datasets and form benchmark calls, enabling application to arbitrary human genomes. We use these reproducible methods to form high-confidence calls with respect to GRCh37 and GRCh38 for HG001 and four additional broadly consented genomes from the Personal Genome Project that are available as NIST Reference Materials. These new genomes’ broad, open consent, with few restrictions on the availability of samples and data, is enabling a uniquely diverse array of applications. Our new methods produce 17% more high-confidence SNPs, 176% more indels, and 12% larger regions than our previously published calls. To demonstrate that these calls can be used for accurate benchmarking, we compare other high-quality callsets to ours (e.g., Illumina Platinum Genomes), and we demonstrate that the majority of discordant calls are errors in the other callsets. We also highlight challenges in interpreting performance metrics when benchmarking against imperfect high-confidence calls. We show that benchmarking tools from the Global Alliance for Genomics and Health can be used with our calls to stratify performance metrics by variant type and genome context and to elucidate the strengths and weaknesses of a method.
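Once a query callset has been matched against the benchmark calls, stratifying performance by variant type or genome context reduces to counting true positives, false positives, and false negatives per stratum. The matching itself, which the GA4GH benchmarking tools perform with haplotype-aware comparison, is assumed to have happened already in this minimal sketch; the input format and stratum labels are hypothetical.

```python
from collections import defaultdict

def stratified_metrics(comparisons):
    """Precision, recall and F1 per stratum.

    comparisons: iterable of (stratum, outcome) pairs, outcome being
    "TP", "FP" or "FN" after matching a query callset to the benchmark.
    """
    counts = defaultdict(lambda: {"TP": 0, "FP": 0, "FN": 0})
    for stratum, outcome in comparisons:
        counts[stratum][outcome] += 1  # unknown outcome labels raise KeyError
    metrics = {}
    for stratum, c in counts.items():
        tp, fp, fn = c["TP"], c["FP"], c["FN"]
        precision = tp / (tp + fp) if tp + fp else float("nan")
        recall = tp / (tp + fn) if tp + fn else float("nan")
        if tp:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            # No true positives: F1 is 0 if both sides had calls, else undefined.
            f1 = 0.0 if (tp + fp) and (tp + fn) else float("nan")
        metrics[stratum] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


calls = ([("SNP", "TP")] * 8 + [("SNP", "FP")] * 2 + [("SNP", "FN")] * 2
         + [("INDEL", "TP")] * 1 + [("INDEL", "FN")] * 3)
for stratum, m in stratified_metrics(calls).items():
    print(stratum, {k: round(v, 2) for k, v in m.items()})
```

Reporting metrics per stratum shows where a method is weak (here, indel recall) in a way that a single genome-wide number would hide. As the abstract notes, when the benchmark itself is imperfect, some apparent errors in a stratum may be errors in the benchmark rather than the method.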

September 22, 2019

Long-read genome sequence and assembly of Leptopilina boulardi: a specialist Drosophila parasitoid

Background: Leptopilina boulardi is a specialist parasitoid belonging to the order Hymenoptera, which attacks the larval stages of Drosophila. The Leptopilina genus has enormous value in the biological control of pests as well as in understanding several aspects of host-parasitoid biology. However, no member of the Figitidae family has had its genome sequenced. To improve the understanding of parasitoid wasps by generating genomic resources, we sequenced the whole genome of L. boulardi. Findings: Here, we report a high-quality genome of L. boulardi, assembled from 70 Gb of Illumina reads and 10.5 Gb of PacBio reads, forming a total coverage of 230×. The 375 Mb draft genome has an N50 of 275 kb with 6,315 scaffolds >500 bp, and encompasses >95% complete BUSCOs. The GC content of the genome is 28.26%, and RepeatMasker identified 868,105 repeat elements covering 43.9% of the assembly. A total of 25,259 protein-coding genes were predicted using a combination of ab initio and RNA-Seq-based methods, with an average gene size of 3.9 kb. Of the predicted genes, 78.11% could be annotated with at least one function. Conclusion: Our study provides a highly reliable assembly of this parasitoid wasp, which will be a valuable resource to researchers studying parasitoids. In particular, it can help delineate the host-parasitoid mechanisms that are part of the Drosophila-Leptopilina model system.
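Two of the assembly statistics reported above, N50 and GC content, have standard definitions that are easy to compute. The sketch below follows those definitions: N50 is the length L such that sequences of length at least L together cover at least half the assembly, and GC percentage is computed here over unambiguous A/C/G/T bases only (ignoring N, a common but not universal convention). Function names and toy inputs are illustrative.

```python
def n50(lengths):
    """Largest L such that sequences of length >= L cover >= half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("N50 of an empty collection is undefined")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length

def gc_percent(seq):
    """GC content as a percentage of A/C/G/T bases (N and other codes ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


print(n50([2, 3, 4, 5, 6, 8, 10]))     # 6  (10 + 8 + 6 = 24 >= 38 / 2)
print(round(gc_percent("aattgcnn"), 2))  # 33.33
```

In practice the N50 is usually computed after a minimum-length filter (the abstract counts scaffolds >500 bp), so the same assembly can report different N50 values depending on the cutoff.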

September 22, 2019

The sequence of the salamander.

The genome of the aquatic axolotl salamander, a native of Mexico’s lakes, has yielded some surprises, and the technique used could point the way to analysis of other organisms that have complex genomes with large numbers of sequence repeats, such as the lungfish and many species of plants.

September 22, 2019

Genomics of habitat choice and adaptive evolution in a deep-sea fish.

Intraspecific diversity promotes evolutionary change, and when partitioned among geographic regions or habitats can form the basis for speciation. Marine species live in an environment that can provide as much scope for diversification in the vertical as in the horizontal dimension. Understanding the relevant mechanisms will contribute significantly to our understanding of eco-evolutionary processes and effective biodiversity conservation. Here, we provide an annotated genome assembly for the deep-sea fish Coryphaenoides rupestris and re-sequencing data to show that differentiation at non-synonymous sites in functional loci distinguishes individuals living at different depths, independent of horizontal spatial distance. Our data indicate disruptive selection at these loci; however, we find no clear evidence for differentiation at neutral loci that may indicate assortative mating. We propose that individuals with distinct genotypes at relevant loci segregate by depth as they mature (supported by survey data), which may be associated with ecotype differentiation linked to distinct phenotypic requirements at different depths.

September 22, 2019

Transcriptional profiling, molecular cloning, and functional analysis of C1 inhibitor, the main regulator of the complement system in black rockfish, Sebastes schlegelii.

C1-inhibitor (C1inh) plays a crucial role in assuring homeostasis and is the central regulator of complement activation, which is involved in immunity and inflammation. A C1-inhibitor gene from Sebastes schlegelii was identified and designated SsC1inh. The identified genomic DNA and cDNA sequences were 6837 bp and 2161 bp, respectively. The genomic DNA possessed 11 exons, interrupted by 10 introns. The amino acid sequence possessed two immunoglobulin-like domains and a serpin domain. Multiple sequence alignment revealed that the serpin domain of SsC1inh was highly conserved among the analyzed species, whereas the two immunoglobulin-like domains showed divergence. The distinctiveness of teleost C1inh from other homologs was indicated by the phylogenetic analysis, genomic DNA organization, and their extended N-terminal amino acid sequences. Under normal physiological conditions, SsC1inh mRNA was most highly expressed in the liver, followed by the gills. The involvement of SsC1inh in homeostasis was demonstrated by modulated transcription profiles in the liver and spleen upon pathogenic stress by different immune stimulants. The protease inhibitory potential of recombinant SsC1inh (rSsC1inh) and the potentiating effect of heparin on rSsC1inh were demonstrated against C1 esterase and thrombin. For the first time, the anti-protease activity of teleost C1inh against its natural substrates C1r and C1s was demonstrated in this study. The protease assay conducted with recombinant black rockfish C1r and C1s proteins in the presence or absence of rSsC1inh showed that the activities of both proteases were significantly diminished by rSsC1inh. Taken together, results from the present study indicate that SsC1inh plays a significant role in maintaining homeostasis in the immune system of black rockfish. Copyright © 2018. Published by Elsevier Ltd.
