Large genome Archives

September 22, 2019

Whole-genome sequencing of Chinese yellow catfish provides a valuable genetic resource for high-throughput identification of toxin genes.

Naturally derived toxins from animals are good raw materials for drug development. As a representative venomous teleost, Chinese yellow catfish (Pelteobagrus fulvidraco) can provide valuable resources for studies on toxin genes. Its venom glands are located in the pectoral and dorsal fins. Despite these interesting biological traits and its considerable economic value, Chinese yellow catfish still lacks a sequenced genome. Here, we report a high-quality genome assembly of Chinese yellow catfish using a combination of next-generation Illumina and third-generation PacBio sequencing platforms. The final assembly reached 714 Mb, with a contig N50 of 970 kb and a scaffold N50 of 3.65 Mb. We also annotated 21,562 protein-coding genes, of which 97.59% were assigned at least one functional annotation. Based on the genome sequence, we analyzed toxin genes in Chinese yellow catfish, identifying 207 toxin genes and classifying them into three major groups. Interestingly, we also expanded a previously reported sex-related region (to ~6 Mb) in the genome assembly and localized two important toxin genes within this region. In summary, we assembled a high-quality genome of Chinese yellow catfish and performed high-throughput identification of toxin genes from a genomic view. Therefore, the limited number of toxin sequences in public databases will be markedly improved once multi-omics data from more sequenced species are integrated.
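The contig and scaffold N50 values quoted above follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The sketch below illustrates that definition only; the function name `n50` and the toy lengths are assumptions for illustration, not part of the published analysis.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are taken from longest to shortest; N50 is the length of the
    sequence at which the running total first reaches half of the assembly.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Toy example with made-up contig lengths (in kb), not real assembly data.
toy_contigs = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_contigs)
```

Because the statistic depends only on the length distribution, it says nothing about correctness of the assembled sequence, which is why it is normally reported alongside annotation or completeness figures.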

September 22, 2019

The chromosome-level quality genome provides insights into the evolution of the biosynthesis genes for aroma compounds of Osmanthus fragrans.

Sweet osmanthus (Osmanthus fragrans) is a very popular ornamental tree species throughout Southeast Asia and the USA, particularly for its extremely fragrant aroma. We constructed a chromosome-level reference genome of O. fragrans to assist in studies of the evolution, genetic diversity, and molecular mechanism of aroma development. A total of over 118 Gb of polished reads was produced from HiSeq (45.1 Gb) and PacBio Sequel (73.35 Gb), giving 100× depth of coverage for long reads. The combination of Illumina short reads, PacBio long reads, and Hi-C data produced the final chromosome-quality genome of O. fragrans with a genome size of 727 Mb and a heterozygosity of 1.45%. The genome was annotated using de novo and homology comparison and further refined with transcriptome data. The genome of O. fragrans was predicted to have 45,542 genes, of which 95.68% were functionally annotated. Genome annotation identified 49.35% of the genome as repetitive sequences, with long terminal repeats (LTRs) being the most abundant (28.94%). Genome evolution analysis provided evidence of a whole-genome duplication 15 million years ago, which contributed to the current content of 45,242 genes. Metabolic analysis revealed that linalool, a monoterpene, is the main aroma compound. Based on the genome and transcriptome, we further demonstrated the direct connection between terpene synthases (TPSs) and the rich aromatic molecules in O. fragrans. We identified three new flower-specific TPS genes whose expression coincided with the production of linalool. Our results suggest that the high number of TPS genes and the flower tissue- and stage-specific expression of TPS genes might drive the strong, unique aroma production of O. fragrans.
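The depth figure in this abstract can be checked with the standard relation depth = sequenced bases / genome size. The following is a minimal sketch of that arithmetic using the yields and genome size quoted above; the function name `depth_of_coverage` is an assumption for illustration.

```python
def depth_of_coverage(sequenced_bases, genome_size):
    """Average depth of coverage: total sequenced bases per genome base."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sequenced_bases / genome_size


GENOME_SIZE = 727e6          # 727 Mb assembly size, from the abstract
PACBIO_BASES = 73.35e9       # 73.35 Gb of PacBio Sequel reads
HISEQ_BASES = 45.1e9         # 45.1 Gb of HiSeq reads

long_read_depth = depth_of_coverage(PACBIO_BASES, GENOME_SIZE)
short_read_depth = depth_of_coverage(HISEQ_BASES, GENOME_SIZE)
```

With these inputs the long-read depth comes out at roughly 101×, consistent with the 100× reported for long reads.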

September 22, 2019

Phenotypic and genomic comparison of Photorhabdus luminescens subsp. laumondii TT01 and a widely used rifampicin-resistant Photorhabdus luminescens laboratory strain.

Photorhabdus luminescens is an enteric bacterium, which lives in mutualistic association with soil nematodes and is highly pathogenic for a broad spectrum of insects. A complete genome sequence for the type strain P. luminescens subsp. laumondii TT01, which was originally isolated in Trinidad and Tobago, has been described earlier. Subsequently, a rifampicin-resistant P. luminescens strain was generated with superior possibilities for experimental characterization. This strain, which is widely used in research, was described as a spontaneous rifampicin-resistant mutant of TT01 and is known as TT01-RifR. Unexpectedly, upon phenotypic comparison between the rifampicin-resistant strain and its presumed parent TT01, major differences were found with respect to bioluminescence, pigmentation, biofilm formation, haemolysis, and growth. Therefore, we renamed the strain TT01-RifR to DJC. To unravel the genomic basis of the observed differences, we generated a complete genome sequence for strain DJC using the PacBio long-read technology. As strain DJC was supposed to be a spontaneous mutant, only a few sequence differences were expected. In order to distinguish these from potential sequencing errors in the published TT01 genome, we re-sequenced a derivative of strain TT01 in parallel, also using the PacBio technology. The two TT01 genomes differed at only 30 positions. In contrast, the genome of strain DJC varied extensively from TT01, showing 13,000 point mutations, 330 frameshifts, and 220 strain-specific regions with a total length of more than 300 kb in each of the compared genomes. Given the major phenotypic and genotypic differences, the rifampicin-resistant P. luminescens strain, now named strain DJC, has to be considered an independent isolate rather than a derivative of strain TT01. Strains TT01 and DJC both belong to P. luminescens subsp. laumondii.

September 22, 2019

Correcting palindromes in long reads after whole-genome amplification.

Next-generation sequencing requires sufficient DNA to be available. If DNA is limited, whole-genome amplification is applied to generate additional amounts. Such amplification often results in many chimeric DNA fragments, in particular artificial palindromic sequences, which limit the usefulness of long sequencing reads. Here, we present Pacasus, a tool for correcting such errors. Two datasets show that it markedly improves read mapping and de novo assembly, yielding results similar to those that would be obtained with non-amplified DNA. With Pacasus, long-read technologies become available for sequencing targets with very small amounts of DNA, such as single cells or even single chromosomes.
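To make the notion of an artificial palindromic read concrete: such a read looks like a sequence followed by its own reverse complement, so the read is (nearly) identical to its reverse complement. The toy sketch below checks this by position-wise comparison and keeps only the first half of a flagged read. It is an illustration under simplifying assumptions (no insertions or deletions, palindrome centred in the read) and is not the method implemented in Pacasus, which the abstract does not describe; real noisy long reads would need an error-tolerant alignment. All function names and the threshold are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def palindrome_identity(read):
    """Fraction of positions at which a read equals its reverse complement."""
    if not read:
        return 0.0
    rc = reverse_complement(read)
    matches = sum(a == b for a, b in zip(read, rc))
    return matches / len(read)


def split_if_palindromic(read, threshold=0.9):
    """Return the first half of a read flagged as palindromic, else the read.

    The 0.9 threshold is an arbitrary illustrative value.
    """
    if palindrome_identity(read) >= threshold:
        return read[: len(read) // 2]
    return read


# Toy chimeric read: a fragment joined to its own reverse complement.
fragment = "ACGTTG"
chimeric = fragment + reverse_complement(fragment)
corrected = split_if_palindromic(chimeric)
```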

September 22, 2019

Growth factor gene IGF1 is associated with bill size in the black-bellied seedcracker Pyrenestes ostrinus.

Pyrenestes finches are unique among birds in showing a non-sex-determined polymorphism in bill size and are considered a textbook example of disruptive selection. Morphs breed randomly with respect to bill size, and differ in diet and feeding performance relative to seed hardness. Previous breeding experiments are consistent with the polymorphism being controlled by a single genetic factor. Here, we use genome-wide pooled sequencing to explore the underlying genetic basis of bill morphology and identify a single candidate region. Targeted resequencing reveals extensive linkage disequilibrium across a 300 kb region containing the insulin-like growth factor 1 (IGF1) gene, with a single 5-million-year-old haplotype associating with phenotypic dominance of the large-billed morph. We find no genetic similarities controlling bill size in the well-studied Darwin’s finches (Geospiza). Our results show how a single genetic factor may control bill size and provide a foundation for future studies to examine this phenomenon within and among avian species.

September 22, 2019

Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies.

Recent developments in third-gen long-read sequencing and diploid-aware assemblers have resulted in the rapid release of numerous reference-quality assemblies for diploid genomes. However, assembly of highly heterozygous genomes is still problematic when regional heterogeneity is so high that haplotype homology is not recognised during assembly. This results in regional duplication rather than consolidation into allelic variants and can cause issues with downstream analysis, for example variant discovery, or haplotype reconstruction using the diploid assembly with unpaired allelic contigs. A new pipeline, Purge Haplotigs, was developed specifically for third-gen sequencing-based assemblies to automate the reassignment of allelic contigs, and to assist in the manual curation of genome assemblies. The pipeline uses a draft haplotype-fused assembly or a diploid assembly, read alignments, and repeat annotations to identify allelic variants in the primary assembly. The pipeline was tested on a simulated dataset and on four recent diploid (phased) de novo assemblies from third-generation long-read sequencing, and compared with a similar tool. After processing with Purge Haplotigs, haploid assemblies were less duplicated with minimal impact on genome completeness, and diploid assemblies had more pairings of allelic contigs. Purge Haplotigs improves the haploid and diploid representations of third-gen sequencing-based genome assemblies by identifying and reassigning allelic contigs. The implementation is fast and scales well with large genomes, and it is less likely to over-purge repetitive or paralogous elements compared to alignment-only methods. The software is available at https://bitbucket.org/mroachawri/purge_haplotigs under a permissive MIT licence.

September 22, 2019

Unexpected patterns of segregation distortion at a selfish supergene in the fire ant Solenopsis invicta.

The Sb supergene in the fire ant Solenopsis invicta determines the form of colony social organization, with colonies whose inhabitants bear the element containing multiple reproductive queens and colonies lacking it containing only a single queen. Several features of this supergene (including suppressed recombination, presence of deleterious mutations, association with a large centromere, and “green-beard” behavior) suggest that it may be a selfish genetic element that engages in transmission ratio distortion (TRD), defined as significant departures in progeny allele frequencies from Mendelian inheritance ratios. We tested this possibility by surveying segregation ratios in embryo progenies of 101 queens of the “polygyne” social form (3512 embryos) using three supergene-linked markers and twelve markers outside the supergene. Significant departures from Mendelian ratios were observed at the supergene loci in 3-5 times more progenies than expected in the absence of TRD and than found, on average, among non-supergene loci. Also, supergene loci displayed the greatest mean deviations from Mendelian ratios among all study loci, although these typically were modest. A surprising feature of the observed inter-progeny variation in TRD was that significant deviations involved not only excesses of supergene alleles but also similarly frequent excesses of the alternate alleles on the homologous chromosome. As expected given the common occurrence of such “drive reversal” in this system, alleles associated with the supergene gain no consistent transmission advantage over their alternate alleles at the population level.
Finally, we observed low levels of recombination and incomplete gametic disequilibrium across the supergene, including between adjacent markers within a single inversion. Our data confirm the prediction that the Sb supergene is a selfish genetic element capable of biasing its own transmission during reproduction, yet counterselection for suppressor loci evidently has produced an evolutionary stalemate in TRD between the variant homologous haplotypes on the “social chromosome”. Evidence implicates prezygotic segregation distortion as responsible for the TRD we document, with “true” meiotic drive the most likely mechanism. Low levels of recombination and incomplete gametic disequilibrium across the supergene suggest that selection does not preserve a single uniform supergene haplotype responsible for inducing polygyny.
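Transmission ratio distortion is defined above as a significant departure of progeny allele frequencies from Mendelian ratios. One common way to test a single progeny for such a departure is a chi-square goodness-of-fit test against the expected 1:1 ratio; the sketch below shows that test with one degree of freedom using only the standard library. It is an assumption that a test of this kind fits the data layout; the abstract does not state which statistic the authors used, and the counts are invented.

```python
import math


def mendelian_chi_square(count_a, count_b):
    """Chi-square test of two allele counts against a 1:1 Mendelian ratio.

    Returns (chi2, p_value). For one degree of freedom the chi-square
    survival function equals erfc(sqrt(chi2 / 2)).
    """
    total = count_a + count_b
    if total == 0:
        raise ValueError("at least one embryo must be genotyped")
    expected = total / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts for one progeny: embryos carrying each of two alleles.
chi2_skewed, p_skewed = mendelian_chi_square(30, 10)
chi2_even, p_even = mendelian_chi_square(20, 20)
```

When many progenies and loci are tested, as in a survey of this size, the per-test p-values would also need a multiple-testing correction before counting "significant" departures.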

September 22, 2019

An improved genome assembly for Larimichthys crocea reveals hepcidin gene expansion with diversified regulation and function.

Larimichthys crocea (large yellow croaker) is a perciform fish well known for its peculiar physiological properties and economic value. Here, we constructed an improved version of the L. crocea genome assembly, which contained 26,100 protein-coding genes. Twenty-four pseudo-chromosomes of L. crocea were also reconstructed, comprising 90% of the genome assembly. This improved assembly revealed several expansions in gene families associated with olfactory detection, detoxification, and innate immunity. Specifically, six hepcidin genes (LcHamps) were identified in L. crocea, possibly resulting from lineage-specific gene duplication. All LcHamps possessed similar genomic structures and functional domains, but varied substantially with respect to expression pattern, transcriptional regulation, and biological function. LcHamp1 was associated specifically with iron metabolism, while LcHamp2s were functionally diverse, being involved in antibacterial activity, antiviral activity, and regulation of intracellular iron metabolism. This functional diversity among gene copies may have allowed L. crocea to adapt to diverse environmental conditions.

September 22, 2019

Improved reference genome for the domestic horse increases assembly contiguity and composition.

Recent advances in genomic sequencing technology and computational assembly methods have allowed scientists to improve reference genome assemblies in terms of contiguity and composition. EquCab2, a reference genome for the domestic horse, was released in 2007. Although of equal or better quality compared to other first-generation Sanger assemblies, it had many of the shortcomings common to them. In 2014, the equine genomics research community began a project to improve the reference sequence for the horse, building upon the solid foundation of EquCab2 and incorporating new short-read data, long-read data, and proximity ligation data. Here, we present EquCab3. The count of non-N bases in the incorporated chromosomes is improved from 2.33 Gb in EquCab2 to 2.41 Gb in EquCab3. Contiguity has also been improved nearly 40-fold, with a contig N50 of 4.5 Mb, and scaffold contiguity has been enhanced to the point where all but one of the 32 chromosomes consist of a single scaffold.

September 22, 2019

Cryptocurrencies and Zero Mode Wave guides: An unclouded path to a more contiguous Cannabis sativa L. genome assembly

We describe the use of a Decentralized Autonomous Organization (DAO) to crypto-fund the single-molecule sequencing and publication of a Type II Cannabis plant. This resulted in the construction of the most contiguous Cannabis genome assembly to date. The combined use of the Dash cryptocurrency, DAOs, and Pacific Biosciences sequencing delivered a 1.03 Gb genome with an N50 of 665 kb in 77 days from funding to public upload. This represents a 230-fold improvement in contiguity over the first cannabis assemblies in 2011 and a 4-fold improvement over all cannabis assemblies to date. 34 Gb of additional sequencing pushed the assembly to an N50 of 3.8 Mb. Hi-C data from Phase Genomics further scaffolded the assembly to 35 contigs at an N50 of 74 Mb, but this scaffolding requires additional curation. The genome is partially phased and larger than previously reported (2N: 1.33 Gb). The CBCA, THCA and CBDA synthase gene clusters have been phased onto respective contigs, demonstrating tandem repeat expansions.

September 22, 2019

3D molecular cytology of Hop (Humulus lupulus) meiotic chromosomes reveals non-disomic pairing and segregation, aneuploidy, and genomic structural variation.

Hop (Humulus lupulus L.) is an important crop worldwide, known as the main flavoring ingredient in beer. The diversifying brewing industry demands variation in flavors, superior process properties, and sustainable agronomics, which are the focus of advanced molecular breeding efforts in hops. Hop breeders have been limited in their ability to create strains with desirable traits, however, because of the unusual and unpredictable inheritance patterns and associated non-Mendelian genetic marker segregation. Cytogenetic analysis of meiotic chromosome behavior has also revealed conspicuous and prevalent occurrences of multiple, atypical, non-disomic chromosome complexes, including those involving autosomes in late prophase. To explore the role of meiosis in segregation distortion, we undertook 3D cytogenetic analysis of hop pollen mother cells stained with DAPI and FISH. We used telomere FISH to demonstrate that hop exhibits a normal telomere clustering bouquet. We also identified and characterized a new sub-terminal 180 bp satellite DNA tandem repeat family called HSR0, located proximal to telomeres. Highly variable 5S rDNA FISH patterns within and between plants, together with the detection of anaphase chromosome bridges, reflect extensive departures from normal disomic signal composition and distribution. Subsequent FACS analysis revealed variable DNA content in a cultivated pedigree. Together, these findings implicate multiple phenomena, including aneuploidy, segmental aneuploidy, or chromosome rearrangements, as contributing factors to segregation distortion in hop.

September 22, 2019

Noise-Cancelling Repeat Finder: Uncovering tandem repeats in error-prone long-read sequencing data

Tandem DNA repeats can be sequenced with long-read technologies, but cannot be accurately deciphered due to the lack of computational tools taking high error rates of these technologies into account. Here we introduce Noise-Cancelling Repeat Finder (NCRF) to uncover putative tandem repeats of specified motifs in noisy long reads produced by Pacific Biosciences and Oxford Nanopore sequencers. Using simulations, we validated the use of NCRF to locate tandem repeats with motifs of various lengths and demonstrated its superior performance as compared to two alternative tools. Using real human whole-genome sequencing data, NCRF identified long arrays of the (AATGG)n repeat involved in heat shock stress response.
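The idea of finding tandem arrays of a specified motif in error-prone reads can be illustrated with a much simpler stand-in: scan the read for consecutive motif-length windows that each differ from the motif by at most a few substitutions. The sketch below does only that. It is not the NCRF algorithm, which the abstract does not detail, and unlike a real tool for noisy long reads it ignores insertions and deletions. The function names, parameters, and the toy read are all assumptions for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_noisy_repeats(read, motif, max_mismatches=1, min_copies=3):
    """Find runs of consecutive, substitution-tolerant copies of a motif.

    Returns a list of (start, end, copies) tuples with end exclusive.
    Each copy may differ from the motif at up to max_mismatches positions.
    """
    k = len(motif)
    if k == 0:
        raise ValueError("motif must not be empty")
    hits = []
    i = 0
    while i + k <= len(read):
        copies = 0
        j = i
        # Extend the run one motif-length window at a time.
        while j + k <= len(read) and hamming(read[j:j + k], motif) <= max_mismatches:
            copies += 1
            j += k
        if copies >= min_copies:
            hits.append((i, j, copies))
            i = j          # continue after the reported array
        else:
            i += 1         # slide by one base and try again
    return hits


# Toy read: four AATGG copies, the third carrying one substitution (AATCG).
toy_read = "CCC" + "AATGG" * 2 + "AATCG" + "AATGG" + "TTTT"
tolerant_hits = find_noisy_repeats(toy_read, "AATGG", max_mismatches=1)
exact_hits = find_noisy_repeats(toy_read, "AATGG", max_mismatches=0)
```

In this toy case the tolerant scan reports one four-copy array, while the exact scan reports nothing because the single substitution breaks the array into runs shorter than `min_copies`, which is the failure mode that error-aware repeat finders are meant to avoid.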

September 22, 2019

The impact of genome evolution on the allotetraploid Nicotiana rustica – an intriguing story of enhanced alkaloid production.

Nicotiana rustica (Aztec tobacco), like common tobacco (Nicotiana tabacum), is an allotetraploid formed through a recent hybridization event; however, it originated from completely different progenitor species. Here, we report the comparative genome analysis of wild type N. rustica (5 Gb; 2n = 4x = 48) with its three putative diploid progenitors (2.3-3 Gb; 2n = 2x = 24), Nicotiana undulata, Nicotiana paniculata and Nicotiana knightiana. In total, 41% of the N. rustica genome originated from the paternal donor (N. undulata), while 59% originated from the maternal donor (N. paniculata/N. knightiana). Chloroplast genome and gene analyses indicated that N. knightiana is more closely related to N. rustica than N. paniculata. Gene clustering revealed 14,623 ortholog groups common to other Nicotiana species and 207 unique to N. rustica. Genome sequence analysis indicated that N. knightiana is more closely related to N. rustica than N. paniculata, and that the higher nicotine content of N. rustica leaves is the result of the combination of progenitor genomes and of a more active transport of nicotine to the shoot. The availability of four new Nicotiana genome sequences provides insights into how speciation impacts plant metabolism, in particular alkaloid transport and accumulation, and will contribute to a better understanding of the evolution of Nicotiana species.

September 22, 2019

Three New Genome Assemblies Support a Rapid Radiation in Musa acuminata (Wild Banana).

Edible bananas result from interspecific hybridization between Musa acuminata and Musa balbisiana, as well as among subspecies in M. acuminata. Four particular M. acuminata subspecies have been proposed as the main contributors of edible bananas, all of which radiated in a short period of time in southeastern Asia. Clarifying the evolution of these lineages at a whole-genome scale is therefore an important step toward understanding the domestication and diversification of this crop. This study reports the de novo genome assembly and gene annotation of a representative genotype from three different subspecies of M. acuminata. These data are combined with the previously published genome of the fourth subspecies to investigate phylogenetic relationships. Analyses of shared and unique gene families reveal that the four subspecies are quite homogeneous, with a core genome representing at least 50% of all genes and very few M. acuminata species-specific gene families. Multiple alignments indicate high sequence identity between homologous single-copy genes, supporting the close relationships of these lineages. Interestingly, phylogenomic analyses demonstrate high levels of gene tree discordance, due to both incomplete lineage sorting and introgression. This pattern suggests rapid radiation within Musa acuminata subspecies that occurred after the divergence from M. balbisiana. Introgression between M. a. ssp. malaccensis and M. a. ssp. burmannica was detected across the genome, though multiple approaches to resolve the subspecies tree converged on the same topology. To support evolutionary and functional analyses, we introduce the PanMusa database, which enables researchers to explore individual gene families and trees.

September 22, 2019

Extensive and deep sequencing of the Venter/HuRef genome for developing and benchmarking genome analysis tools.

We produced an extensive collection of deep re-sequencing datasets for the Venter/HuRef genome using the Illumina massively parallel DNA sequencing platform. The original Venter genome sequence is a very high-quality phased assembly based on Sanger sequencing. Therefore, researchers developing novel computational tools for the analysis of human genome sequence variation for the dominant Illumina sequencing technology can test and hone their algorithms by making variant calls from these Venter/HuRef datasets and then immediately confirm the detected variants in the Sanger assembly, freeing them of the need for further experimental validation. This process also applies to implementing and benchmarking existing genome analysis pipelines. We prepared and sequenced 200 bp and 350 bp short-insert whole-genome sequencing libraries (sequenced to 100x and 40x genomic coverages respectively) as well as 2 kb, 5 kb, and 12 kb mate-pair libraries (49x, 122x, and 145x physical coverages respectively). Lastly, we produced a linked-read library (128x physical coverage) from which we also performed haplotype phasing.
