Tilapias are the second most farmed fishes in the world and a sustainable source of food. Like many other fish, tilapias are sexually dimorphic and sex is a commercially important trait in these fish. In this study, we developed a significantly improved assembly of the tilapia genome using the latest genome sequencing methods and show how it improves the characterization of two sex determination regions in two tilapia species.A homozygous clonal XX female Nile tilapia (Oreochromis niloticus) was sequenced to 44X coverage using Pacific Biosciences (PacBio) SMRT sequencing. Dozens of candidate de novo assemblies were generated and an optimal assembly…
Herpesviruses have relatively large DNA genomes of more than 150 kb that are difficult to clone and sequence. Bacterial artificial chromosome (BAC) cloning of herpesvirus genomes is a powerful technique that greatly facilitates whole viral genome sequencing as well as functional characterization of reconstituted viruses. We describe recently invented technologies for rapid BAC cloning of herpesvirus genomes using CRISPR/Cas9-mediated homology-directed repair. We focus on recent BAC cloning techniques of Epstein-Barr virus (EBV) genomes and discuss the possible advantages of a CRISPR/Cas9-mediated strategy comparatively with precedent EBV-BAC cloning strategies. We also describe the design decisions of this technology as well as…
Cereal grasses of the Triticeae tribe have been the major food source in temperate regions since the dawn of agriculture. Their large genomes are characterized by a high content of repetitive elements and large pericentromeric regions that are virtually devoid of meiotic recombination. Here we present a high-quality reference genome assembly for barley (Hordeum vulgare L.). We use chromosome conformation capture mapping to derive the linear order of sequences across the pericentromeric space and to investigate the spatial organization of chromatin in the nucleus at megabase resolution. The composition of genes and repetitive elements differs between distal and proximal regions.…
PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II’s sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing,…
Forty years ago the advent of Sanger sequencing was revolutionary as it allowed complete genome sequences to be deciphered for the first time. A second revolution came when next-generation sequencing (NGS) technologies appeared, which made genome sequencing much cheaper and faster. However, NGS methods have several drawbacks and pitfalls, most notably their short reads. Recently, third-generation/long-read methods appeared, which can produce genome assemblies of unprecedented quality. Moreover, these technologies can directly detect epigenetic modifications on native DNA and allow whole-transcript sequencing without the need for assembly. This marks the third revolution in sequencing technology. Here we review and compare the…
Structural variation and single-nucleotide variation of the complement factor H (CFH) gene family underlie several complex genetic diseases, including age-related macular degeneration (AMD) and atypical hemolytic uremic syndrome (AHUS). To understand its diversity and evolution, we performed high-quality sequencing of this ~360-kbp locus in six primate lineages, including multiple human haplotypes. Comparative sequence analyses reveal two distinct periods of gene duplication leading to the emergence of four CFH-related (CFHR) gene paralogs (CFHR2 and CFHR4 ~25-35 Mya and CFHR1 and CFHR3 ~7-13 Mya). Remarkably, all evolutionary breakpoints share a common ~4.8-kbp segment corresponding to an ancestral CFHR gene promoter that has…
Discovering the mutational events that fuel adaptation to environmental change remains an important challenge for evolutionary biology. The classroom example of a visible evolutionary response is industrial melanism in the peppered moth (Biston betularia): the replacement, during the Industrial Revolution, of the common pale typica form by a previously unknown black (carbonaria) form, driven by the interaction between bird predation and coal pollution. The carbonaria locus has been coarsely localized to a 200-kilobase region, but the specific identity and nature of the sequence difference controlling the carbonaria-typica polymorphism, and the gene it influences, are unknown. Here we show that the…
Panax ginseng C. A. Meyer, reputed as the king of medicinal herbs, has slow growth, long generation time, low seed production and complicated genome structure that hamper its study. Here, we unveil the genomic architecture of tetraploid P. ginseng by de novo genome assembly, representing 2.98 Gbp with 59 352 annotated genes. Resequencing data indicated that diploid Panax species diverged in association with global warming in Southern Asia, and two North American species evolved via two intercontinental migrations. Two whole genome duplications (WGD) occurred in the family Araliaceae (including Panax) after divergence with the Apiaceae, the more recent one contributing to the ability…
Genetic studies of human evolution require high-quality contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length complementary DNA sequencing with a multiplatform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. By comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single- to mega-base pair-sized variants. We identified ~17,000 fixed human-specific structural variants identifying genic and putative regulatory changes that have emerged in humans since divergence from nonhuman apes. Interestingly, these…
Interferon inducible transmembrane (IFITM) proteins are effectors of the immune system widely characterized for their role in restricting infection by diverse enveloped and non-enveloped viruses. The chicken IFITM (chIFITM) genes are clustered on chromosome 5 and to date four genes have been annotated, namely chIFITM1, chIFITM3, chIFITM5 and chIFITM10. However, due to poor assembly of this locus in the Gallus Gallus v4 genome, accurate characterization has so far proven problematic. Recently, a new chicken reference genome assembly Gallus Gallus v5 was generated using Sanger, 454, Illumina and PacBio sequencing technologies identifying considerable differences in the chIFITM locus over the previous genome…
Spiders swath their eggs with silk to protect developing embryos and hatchlings. Egg case silks, like other fibrous spider silks, are primarily composed of proteins called spidroins (spidroin = spider-fibroin). Silks, and thus spidroins, are important throughout the lives of spiders, yet the evolution of spidroin genes has been relatively understudied. Spidroin genes are notoriously difficult to sequence because they are typically very long (= 10 kb of coding sequence) and highly repetitive. Here, we investigate the evolution of spider silk genes through long-read sequencing of Bacterial Artificial Chromosome (BAC) clones. We demonstrate that the silver garden spiderArgiope argentatahas multiple…
Persimmon (Diospyros kaki) is a tree crop species that originated in East Asia, consists mainly of hexaploid individuals (2n = 6x = 90) with some nonaploid individuals. One of the unique characteristics of persimmon is the continuous accumulation of proanthocyanidins (PAs) in its fruit until the middle of fruit development, resulting in a strong astringent taste even at commercial fruit maturity. Among persimmon cultivars, pollination-constant and non-astringent (PCNA) types cease PA accumulation in early fruit development and become non-astringent at commercial maturity. PCNA is an allelic trait to non-PCNA and is controlled by a single locus called the ASTRINGENCY (AST)…
The cultivated apple is susceptible to several pests including the rosy apple aphid (RAA; Dysaphis plantaginea Passerini), control of which is mainly based on chemical treatments. A few cases of resistance to aphids have been described in apple germplasm resources, laying the basis for the development of new resistant cultivars by breeding. The cultivar ‘Florina’ is resistant to RAA, and recently, the Dp-fl locus responsible for its resistance was mapped on linkage group 8 of the apple genome. In this paper, a chromosome walking approach was performed by using a ‘Florina’ bacterial artificial chromosome (BAC) library. The walking started from…
The divergence patterns of NBS – LRR genes in soybean Rsv3 locus were deciphered and several divergent alleles ( NBS_C, NBS_D and Columbia NBS_E ) were identified as the likely functional candidates of Rsv3. The soybean Rsv3 locus, which confers resistance to the soybean mosaic virus (SMV), has been previously mapped to a region containing five nucleotide binding site-leucine-rich repeats (NBS-LRR) genes (referred to as nbs_A-E) in Williams 82. In resistant cultivars, however, the number of NBS-LRR genes in this region and their divergence from susceptible alleles remain unclear. In the present study, we constructed and screened a bacterial artificial…
Maize is well known for its exceptional structural diversity, including copy number variants (CNVs) and presence/absence variants (PAVs), and there is growing evidence for the role of structural variation in maize adaptation. While PAVs have been described in this important crop species, they have been only scarcely characterized at the sequence level and the extent of presence/absence variation and relative chromosomal landscape of inbred-specific regions remain to be elucidated.De novo genome sequencing of the French F2 maize inbred line revealed 10,044 novel genomic regions larger than 1 kb, making up 88 Mb of DNA, that are present in F2 but not in…