New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. Translating these methods to routine research and clinical practice requires robust benchmark sets. We developed the first benchmark set for identification of both false negative and false positive germline SVs, which complements recent efforts emphasizing increasingly comprehensive characterization of SVs. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods, both alignment- and de novo assembly-based, from short-, linked-, and long-read sequencing, as well as optical and electronic mapping. The final benchmark set contains 12,745 isolated, sequence-resolved insertion and deletion calls ≥50 base pairs (bp) discovered by at least 2 technologies or 5 callsets, genotyped as heterozygous or homozygous variants by long reads. The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.66 Gbp and 9,641 SVs supported by at least one diploid assembly. Support for SVs was assessed using svviz with short-, linked-, and long-read sequence data. In general, there was strong support from multiple technologies for the benchmark SVs, with 90% of the Tier 1 SVs having support in reads from more than one technology. The Mendelian genotype error rate was 0.3%, and genotype concordance with manual curation was >98.7%. We demonstrate the utility of the benchmark set by showing it reliably identifies both false negatives and false positives in high-quality SV callsets from short-, linked-, and long-read sequencing and optical mapping.
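The discovery-support rule stated above (insertions and deletions of at least 50 bp found by at least 2 technologies or 5 callsets) and the kind of trio check behind a Mendelian genotype error rate can be sketched as follows. This is a minimal illustration, not the consortium's pipeline: the `CandidateSV` record and its fields are hypothetical, and the Mendelian test assumes autosomal, biallelic sites with genotypes written as allele tuples.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class CandidateSV:
    """Hypothetical record for one merged candidate SV and the evidence behind it."""
    svtype: str               # "INS" or "DEL"
    length: int               # absolute SV size in bp
    technologies: frozenset   # sequencing technologies that reported it
    n_callsets: int           # number of callsets that reported it

def passes_discovery_support(sv: CandidateSV, min_len: int = 50,
                             min_techs: int = 2, min_callsets: int = 5) -> bool:
    """Keep insertions/deletions >= min_len bp found by >= min_techs technologies
    or by >= min_callsets callsets."""
    if sv.svtype not in ("INS", "DEL") or sv.length < min_len:
        return False
    return len(sv.technologies) >= min_techs or sv.n_callsets >= min_callsets

def mendelian_consistent(child, father, mother) -> bool:
    """Genotypes are allele tuples such as (0, 1); assumes an autosomal, biallelic site.
    The child is consistent if one allele can come from each parent."""
    return any(sorted((a, b)) == sorted(child) for a, b in product(father, mother))

def mendelian_error_rate(trios) -> float:
    """Fraction of (child, father, mother) genotype triples that violate inheritance."""
    trios = list(trios)
    errors = sum(not mendelian_consistent(c, f, m) for c, f, m in trios)
    return errors / len(trios)
```

For example, a 120-bp deletion reported by two technologies passes even if only three callsets found it, whereas a 49-bp deletion never passes regardless of support.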
Supernumerary B chromosomes (Bs) are extra karyotype elements found in addition to the standard A chromosomes in some fungi and in thousands of animal and plant species. Bs are distinguished by their non-Mendelian inheritance and represent one of the best examples of genomic conflict. Over recent decades, their genetic composition, function and evolution have remained unresolved questions, although a few successful attempts have been made to address them. A classical view based on cytogenetics and genetics holds that Bs are selfish, rich in DNA repeats and transposons, and in most cases carry no function. Recently, however, the rapid development of high-throughput multi-omics techniques has shifted B research towards a new field that we call “B-omics”. We review the recent literature and add novel perspectives to B research, discussing the role of new technologies in understanding the mechanisms underlying the molecular evolution and function of Bs. The modern view holds that B chromosomes are enriched with genes for many significant biological functions, including, but not limited to, an interesting set of genes related to the cell cycle and chromosome structure. Furthermore, the presence of B chromosomes could favor genomic rearrangements and influence the nuclear environment, affecting the function of other chromatin regions. We hypothesize that B chromosomes might play a key role in driving their own transmission and maintenance inside the cell, as well as offering an extra genomic compartment for evolution.
Into the Thermus Mobilome: Presence, Diversity and Recent Activities of Insertion Sequences Across Thermus spp.
A high level of transposon-mediated genome rearrangement is a common trait among microorganisms isolated from thermal environments, probably contributing to the extraordinary genomic plasticity and horizontal gene transfer (HGT) observed in these habitats. In this work, active and inactive insertion sequences (ISs) spanning the sequenced members of the genus Thermus were characterized, with special emphasis on three T. thermophilus strains: HB27, HB8, and NAR1. A large number of full ISs and fragments derived from different IS families were found, concentrated within the megaplasmids present in most isolates. Potentially active ISs were identified by analyzing transposase integrity, and domestication-related transposition events of ISTth7 were identified in laboratory-adapted HB27 derivatives. Many partial IS copies are scattered throughout the genome and may serve as specific targets for homologous recombination, contributing to genome rearrangement. Moreover, recruitment of 32-bp IS1000 segments as CRISPR spacers was identified, pointing to the adaptability of these elements in the biology of these thermophiles. Further knowledge about the activity and functional diversity of ISs in this genus may contribute to the generation of engineered transposons as new genetic tools and enrich our understanding of the outstanding plasticity shown by these thermophiles.
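Recognizing IS-derived CRISPR spacers amounts to finding each spacer sequence inside an IS element on either strand. The sketch below is a simplified illustration that only reports exact matches; it does not reproduce the study's method, and tolerating mismatches would require a proper aligner. The function names are hypothetical and the sequences in the usage note are toy examples.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))

def spacers_matching_is(spacers: dict, is_seq: str) -> dict:
    """For each spacer found exactly in is_seq, list its 0-based start positions
    and the strand ("+" = same orientation, "-" = reverse complement)."""
    target = is_seq.upper()
    hits = {}
    for name, spacer in spacers.items():
        found = []
        for strand, query in (("+", spacer.upper()), ("-", reverse_complement(spacer))):
            start = target.find(query)
            while start != -1:            # collect every (possibly overlapping) occurrence
                found.append((start, strand))
                start = target.find(query, start + 1)
        if found:
            hits[name] = sorted(found)
    return hits
```

With a toy 32-bp spacer `"A"*16 + "C"*16` embedded in a toy IS sequence at offset 3, the function reports `{"sp1": [(3, "+")]}`; if the IS carries the reverse complement instead, the strand is reported as `"-"`.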
A full-length transcriptome of Sepia esculenta using a combination of single-molecule long-read (SMRT) and Illumina sequencing
Sepia esculenta is an economically important cephalopod species, but its wild-caught fishery has suffered a severe decline due to overfishing and damage to the ocean environment. To restore this seriously declining fishery resource, we need to understand the genetic basis and molecular mechanisms of spawning, reproduction and mortality in the golden cuttlefish. In this study, we generated a full-length transcriptome of S. esculenta from total RNA of tissue samples (brain, optic gland, nidamental gland, ovary and muscle at different developmental stages) using a combination of single-molecule real-time (SMRT) and Illumina RNA-seq technology. A total of 14.16 Gb of SMRT sequencing data were assembled into 94,635 transcripts, while 35.15 Gb of Illumina HiSeq data were assembled into 177,226 non-redundant transcripts. We then merged the SMRT and Illumina assemblies to generate a more complete, full-length S. esculenta transcriptome of 177,951 high-quality transcripts. In total, 81,459 of these transcripts were annotated in at least one of seven functional databases, and 49,189 coding-region nucleotide sequences were identified. Additionally, SSR analysis identified 161,327 simple sequence repeats (SSRs) distributed across 64,933 transcripts. This full-length, high-quality transcriptome of S. esculenta provides an important foundation for future genomic research on the growth, development, reproduction and mortality of cephalopods and for the recovery of this declining fishery resource.
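SSR analysis scans each transcript for short motifs (typically 1 to 6 bp) repeated in tandem above a minimum copy number. The minimal sketch below finds perfect SSRs with regular-expression backreferences. The thresholds in `MIN_REPEATS` are an assumption, modeled on commonly used MISA-style settings, because the abstract does not state the values used; compound and imperfect repeats are not handled.

```python
import re

# Assumed minimum repeat counts per motif length (not taken from the study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq: str, min_repeats: dict = MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Shorter motifs are scanned first; longer motifs overlapping an SSR that
    was already reported are skipped.
    """
    seq = seq.upper()
    covered = [False] * len(seq)
    ssrs = []
    for k in sorted(min_repeats):
        # motif of length k followed by at least (min - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats[k] - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # skip motifs that are repeats of a shorter unit, e.g. "AA" or "ATAT"
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            if any(covered[m.start():m.end()]):
                continue
            for i in range(m.start(), m.end()):
                covered[i] = True
            ssrs.append((m.start(), motif, (m.end() - m.start()) // k))
    return sorted(ssrs)
```

On the toy sequence `"GC" + "A"*12 + "GTC" + "AG"*7 + "TTC" + "CAT"*5 + "G"`, the sketch reports a mononucleotide, a dinucleotide and a trinucleotide SSR; counting such hits over all transcripts gives totals of the kind quoted above.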
Physiological properties and genetic analysis related to exopolysaccharide (EPS) production in the fresh-water unicellular cyanobacterium Aphanothece sacrum (Suizenji Nori).
Clonal phycoerythrin (PE)-rich and PE-poor strains of the unicellular freshwater cyanobacterium Aphanothece sacrum (Suringar) Okada (Suizenji Nori in Japanese) were isolated from traditional open-air aquafarms in Japan. Based on its growth characteristics, A. sacrum appears to be oligotrophic. The optimum temperature for growth was around 20°C. At 20°C, maximum growth and biomass increase were obtained at light intensities of 40 to 80 µmol m⁻² s⁻¹ for the PE-rich strain and 40 to 120 µmol m⁻² s⁻¹ for the PE-poor strain (fluorescent lamps, 12 h light/12 h dark cycles). Purified exopolysaccharide (EPS) of A. sacrum has a molecular weight of ca. 10⁴ kDa, with five major monosaccharides (glucose, xylose, rhamnose, galactose and mannose; ≥85 mol%). We also determined the whole-genome sequences of the two A. sacrum strains. The putative genes involved in the polymerization, chain-length control and export of EPS should help clarify how this extremely high-molecular-weight EPS is synthesized. Putative genes encoding Wzx-Wzy-Wzz and Wza-Wzb-Wzc were conserved in A. sacrum strains FPU1 and FPU3, suggesting that the Wzy-dependent pathway participates in EPS production in A. sacrum.
Biolistic transformation delivers nucleic acids into plant cells by bombarding them with microprojectiles, which are micron-scale particles, typically of gold. Despite the wide use of this technique, little is known about its effect on the cell’s genome. We biolistically transformed linear 48-kb phage lambda DNA and two different circular plasmids into rice (Oryza sativa) and maize (Zea mays) and analyzed the results by whole-genome sequencing and optical mapping. Although some transgenic events showed simple insertions, others showed extreme genome damage in the form of chromosome truncations, large deletions, partial trisomy, and evidence of chromothripsis and breakage-fusion-bridge cycling. Several transgenic events contained megabase-scale arrays of introduced DNA mixed with genomic fragments assembled by nonhomologous or microhomology-mediated joining. Damaged regions of the genome, assayed by the presence of small fragments displaced elsewhere, were often repaired without a trace, presumably by homology-dependent repair (HDR). The results suggest a model whereby successful biolistic transformation relies on a combination of end joining to insert foreign DNA and HDR to repair collateral damage caused by the microprojectiles. The differing levels of genome damage observed among transgenic events may reflect the stage of the cell cycle and the availability of templates for HDR. © 2019 American Society of Plant Biologists. All rights reserved.
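Junctions formed by nonhomologous versus microhomology-mediated joining are commonly told apart by looking at the junction bases that match both joined sources. The sketch below is a simplified illustration of that idea, not the authors' analysis. It assumes the junction read's first base already lines up with the start of source A and its last base with the end of source B (in practice these anchors come from split-read alignments). Bases explained by both sources are reported as microhomology; bases explained by neither are reported as a non-templated insertion.

```python
def _common_prefix_len(s: str, t: str) -> int:
    """Number of leading characters shared by s and t."""
    n = 0
    for x, y in zip(s, t):
        if x != y:
            break
        n += 1
    return n

def junction_microhomology(read: str, source_a: str, source_b: str):
    """Return (microhomology, inserted_bases) at a two-fragment junction.

    Assumes read[0] lines up with source_a[0] and read[-1] with source_b[-1].
    """
    a_len = _common_prefix_len(read, source_a)              # read bases explained by A
    b_len = _common_prefix_len(read[::-1], source_b[::-1])  # read bases explained by B
    overlap = a_len + b_len - len(read)
    if overlap > 0:
        # Bases explained by both sources: the breakpoint sits somewhere inside them.
        return read[len(read) - b_len:a_len], ""
    # Bases explained by neither source are non-templated insertions (empty if blunt).
    return "", read[a_len:len(read) - b_len]
```

With toy sequences `x = "GATTACA"`, `m = "TTG"`, `y = "CCAGT"`, a read `x + m + y` joining `x + m + "AAAAA"` to `"GGGGG" + m + y` yields the microhomology `"TTG"`, while a read `x + y` yields a blunt junction.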
Full-length transcriptome sequences obtained by a combination of sequencing platforms applied to heat shock proteins and polyunsaturated fatty acid biosynthesis in Pyropia haitanensis
Pyropia haitanensis is a high-yield commercial seaweed in China. P. haitanensis farms often suffer from problems such as severe germplasm degeneration, and the mechanisms underlying resistance to abiotic stresses remain unknown because of a lack of genomic information. Although many previous studies have used next-generation sequencing (NGS) technologies, the short reads generated by NGS generally prevent the assembly of full-length transcripts and thus limit the screening of functional genes. In the present study, based on hybrid sequencing (NGS and single-molecule real-time sequencing) of the P. haitanensis thallus transcriptome, we obtained high-quality full-length transcripts with a mean length of 2998 bp and an N50 of 3366 bp. A total of 14,773 unigenes (93.52%) were annotated in at least one database, while approximately 60% of all unigenes were assembled from short Illumina reads. Moreover, we suggest that genes involved in the biosynthesis of polyunsaturated fatty acids and heat shock proteins play an important role in development and resistance to abiotic stresses in P. haitanensis. The present study, together with previously published ones, may facilitate seaweed transcriptome research.
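The mean length and N50 quoted above are standard assembly summary statistics. N50 is the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal sketch of both statistics follows; the function names are illustrative and the lengths in the usage note are toy values, not data from this study.

```python
from statistics import mean

def n50(lengths):
    """Length L of the shortest sequence such that sequences of length >= L
    together contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 requires at least one length")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length

def length_summary(lengths):
    """Count, mean length and N50 of a set of transcript (or contig) lengths."""
    return {"count": len(lengths), "mean": mean(lengths), "n50": n50(lengths)}
```

For toy lengths 2 through 10 (54 bases in total), the cumulative sum from the longest sequence first reaches 27 at length 8, so the N50 is 8 while the mean is 6. Because N50 is weighted by bases, it exceeds the mean whenever long sequences dominate, as in the transcript set described above.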
Many of our major crop species are polyploids, containing more than one genome or set of chromosomes. Polyploid crops present unique challenges, including difficulties in genome assembly, in discriminating between multiple gene and sequence copies, and in genetic mapping, hindering use of genomic data for genetics and breeding. Polyploid genomes may also be more prone to containing structural variation, such as loss of gene copies or sequences (presence–absence variation) and the presence of genes or sequences in multiple copies (copy-number variation). Although the two main types of genomic structural variation commonly identified are presence–absence variation and copy-number variation, we propose that homeologous exchanges constitute a third major form of genomic structural variation in polyploids. Homeologous exchanges involve the replacement of one genomic segment by a similar copy from another genome or ancestrally duplicated region, and are known to be extremely common in polyploids. Detecting all kinds of genomic structural variation is challenging, but recent advances such as optical mapping and long-read sequencing offer potential strategies to help identify structural variants even in complex polyploid genomes. All three major types of genomic structural variation (presence–absence, copy-number, and homeologous exchange) are now known to influence phenotypes in crop plants, with examples including flowering time, frost tolerance, and adaptive and agronomic traits. In this review, we summarize the challenges of genome analysis in polyploid crops, describe the various types of genomic structural variation and the genomics technologies and data that can be used to detect them, and collate the information available to date on the impact of genomic structural variation on crop phenotypes. We highlight the importance of genomic structural variation for the future genetic improvement of polyploid crops.
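Because a homeologous exchange replaces one copy of a segment with its counterpart from the other subgenome, it shifts the dosage ratio between the two homeologs while leaving their combined dosage unchanged. That distinguishes it from presence–absence and copy-number variation. The sketch below illustrates this with read depth, assuming an allotetraploid locus with two copies per subgenome (2:2 when no exchange has occurred) and a known expected total depth; the thresholds and function name are hypothetical, and this is not a method described in the review.

```python
# Expected fraction of reads from homeolog 1 for each (homeolog-1, homeolog-2) copy
# state at a locus with four copies in total; (2, 2) means no exchange.
EXPECTED_SHARE = {(4, 0): 1.0, (3, 1): 0.75, (2, 2): 0.5, (1, 3): 0.25, (0, 4): 0.0}

def call_homeolog_dosage(depth_h1: float, depth_h2: float,
                         expected_total: float, tol: float = 0.25):
    """Classify a locus from read depths on its two homeologous copies."""
    total = depth_h1 + depth_h2
    if total == 0:
        return None, "presence-absence (both copies missing)"
    # An exchange swaps one copy for the other, so total dosage should stay the same.
    if abs(total - expected_total) > tol * expected_total:
        return None, "copy-number change"
    share = depth_h1 / total
    state = min(EXPECTED_SHARE, key=lambda s: abs(EXPECTED_SHARE[s] - share))
    label = "balanced" if state == (2, 2) else "homeologous exchange"
    return state, label
```

For instance, depths of 44 and 16 against an expected total of 60 fit a 3:1 state (one copy replaced), whereas depths of 30 and 0 fall short of the expected total and are flagged as a copy-number change rather than an exchange.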
Multiple cotton genomes (diploid and tetraploid) have been assembled. However, genomic variation between cultivars of allotetraploid upland cotton (Gossypium hirsutum L.), the most widely planted cotton species in the world, remains unexplored. Here, we use single-molecule long-read and Hi-C sequencing technologies to assemble the genomes of two upland cotton cultivars, TM-1 and zhongmiansuo24 (ZM24). Comparisons among the TM-1 and ZM24 assemblies and the genomes of the diploid ancestors reveal extensive genetic variation. Among these variants, the three longest structural variations are located on chromosome A08 of tetraploid upland cotton and together account for ~30% of the total length of this chromosome. Haplotype analyses of the mapping population derived from these two cultivars and of the germplasm panel show suppressed recombination rates in this region. This study provides additional genomic resources for the community, and the identified genetic variation, especially the reduced meiotic recombination on chromosome A08, will aid future breeding.
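Suppressed recombination appears as marker intervals where few individuals in the mapping population switch parental haplotype. The sketch below estimates a per-interval rate in cM/Mb. It is a simplified illustration, not the study's haplotype analysis: it assumes homozygous, RIL-like lines coded 0 (TM-1 origin) or 1 (ZM24 origin) at each marker, and it uses the crude approximation that 1% recombinants equals 1 cM, with no mapping function or RIL-specific correction.

```python
def recombination_rates(positions_bp, genotypes):
    """Per-interval recombination rate (cM/Mb) between adjacent markers.

    positions_bp: sorted marker positions on one chromosome.
    genotypes: one list per individual, giving parental origin per marker
               (0 = TM-1, 1 = ZM24), assuming homozygous RIL-like lines.
    """
    rates = []
    for i in range(len(positions_bp) - 1):
        # A crossover in this interval shows up as a switch in parental origin.
        switches = sum(g[i] != g[i + 1] for g in genotypes)
        r = switches / len(genotypes)
        mb = (positions_bp[i + 1] - positions_bp[i]) / 1e6
        rates.append(100 * r / mb)  # crude: 1% recombinants ~ 1 cM, no mapping function
    return rates
```

In a toy panel of ten lines where two switch origin within the first 1-Mb interval and none switch within the following 4-Mb interval, the sketch returns rates of 20 and 0 cM/Mb; a long run of near-zero intervals is the pattern expected over a region such as the one on A08.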
Efficient crop improvement depends on the application of accurate genetic information contained in diverse germplasm resources. Here we report a reference-grade genome of wild soybean accession W05, with a final assembled genome size of 1013.2 Mb and a contig N50 of 3.3 Mb. The analytical power of the W05 genome is demonstrated by several examples. First, we identify an inversion at the locus determining seed coat color during domestication. Second, a translocation event between chromosomes 11 and 13 of some genotypes is shown to interfere with the assignment of QTLs. Third, we find a region containing copy number variations of the Kunitz trypsin inhibitor (KTI) genes. Such findings illustrate the power of this assembly in the analysis of large structural variations in soybean germplasm collections. The wild soybean genome assembly has wide applications in comparative genomic and evolutionary studies, as well as in crop breeding and improvement programs.
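When two assemblies are compared, the simplest signatures of the large rearrangements mentioned above come from syntenic alignment blocks. A block that lands on a non-homologous chromosome suggests a translocation, and a block on the expected chromosome but in reverse orientation suggests an inversion. The sketch below encodes only that rule. The `AlignedBlock` record and the chromosome names are hypothetical, and it assumes both assemblies are consistently oriented; real comparisons also weigh block size, order and breakpoint evidence.

```python
from typing import NamedTuple

class AlignedBlock(NamedTuple):
    """Hypothetical summary of one syntenic alignment block between two assemblies."""
    query_chrom: str
    ref_chrom: str
    strand: str  # "+" if collinear orientation, "-" if reverse-complemented

def classify_block(block: AlignedBlock, homolog_of: dict) -> str:
    """Label a block relative to the reference chromosome expected to be homologous."""
    if block.ref_chrom != homolog_of[block.query_chrom]:
        return "translocation"  # block lands on a non-homologous chromosome
    if block.strand == "-":
        return "inversion"      # same chromosome, opposite orientation
    return "collinear"
```

For example, a block from query `chr11` that aligns to reference `chr13` is labeled a translocation, mirroring the chromosome 11/13 event described above.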