Since the advent of Next-Generation Sequencing (NGS), the cost of de novo genome sequencing and assembly have dropped precipitately, which has spurred interest in genome sequencing overall. Unfortunately the contiguity of the NGS assembled sequences, as well as the accuracy of these assemblies have suffered. Additionally, most NGS de novo assemblies leave large portions of genomes unresolved, and repetitive regions are often collapsed. When compared to the reference quality genome sequences produced before the NGS era, the new sequences are highly fragmented and often prove to be difficult to properly annotate. In some cases the contiguous portions are smaller than the average gene size making the sequence not nearly as useful for biologists as the earlier reference quality genomes including of Human, Mouse, C. elegans, or Drosophila. Recently, new 3rd generation sequencing technologies, long-range molecular techniques, and new informatics tools have facilitated a return to high quality assembly. We will discuss the capabilities of the technologies and assess their impact on assembly projects across the tree of life from small microbial and fungal genomes through large plant and animal genomes. Beyond improvements to contiguity, we will focus on the additional biological insights that can be made with better assemblies, including more complete analysis genes in their flanking regulatory context, in-depth studies of transposable elements and other complex gene families, and long-range synteny analysis of entire chromosomes. We will also discuss the need for new algorithms for representing and analyzing collections of many complete genomes at once.
High-Quality Whole-Genome Sequence of an Estradiol-Degrading Strain, Novosphingobium tardaugens NBRC 16725
In this work we report the complete sequence and assembly of the estradiol-degrading bacterium Novosphingobium tardaugens NBRC 16725 genome into a single contig using the Pacific Biosciences RS II system.
Recombination between loci underlying mate choice and ecological traits is a major evolutionary force acting against speciation with gene flow. The evolution of linkage disequilibrium between such loci is therefore a fundamental step in the origin of species. Here, we show that this process can take place in the absence of physical linkage in hamlets-a group of closely related reef fishes from the wider Caribbean that differ essentially in colour pattern and are reproductively isolated through strong visually-based assortative mating. Using full-genome analysis, we identify four narrow genomic intervals that are consistently differentiated among sympatric species in a backdrop of extremely low genomic divergence. These four intervals include genes involved in pigmentation (sox10), axial patterning (hoxc13a), photoreceptor development (casz1) and visual sensitivity (SWS and LWS opsins) that develop islands of long-distance and inter-chromosomal linkage disequilibrium as species diverge. The relatively simple genomic architecture of species differences facilitates the evolution of linkage disequilibrium in the presence of gene flow.
Icefishes (suborder Notothenioidei; family Channichthyidae) are the only vertebrates that lack functional haemoglobin genes and red blood cells. Here, we report a high-quality genome assembly and linkage map for the Antarctic blackfin icefish Chaenocephalus aceratus, highlighting evolved genomic features for its unique physiology. Phylogenomic analysis revealed that Antarctic fish of the teleost suborder Notothenioidei, including icefishes, diverged from the stickleback lineage about 77 million years ago and subsequently evolved cold-adapted phenotypes as the Southern Ocean cooled to sub-zero temperatures. Our results show that genes involved in protection from ice damage, including genes encoding antifreeze glycoprotein and zona pellucida proteins, are highly expanded in the icefish genome. Furthermore, genes that encode enzymes that help to control cellular redox state, including members of the sod3 and nqo1 gene families, are expanded, probably as evolutionary adaptations to the relatively high concentration of oxygen dissolved in cold Antarctic waters. In contrast, some crucial regulators of circadian homeostasis (cry and per genes) are absent from the icefish genome, suggesting compromised control of biological rhythms in the polar light environment. The availability of the icefish genome sequence will accelerate our understanding of adaptation to extreme Antarctic environments.
Closing gaps in draft genomes is an important post processing step in genome assembly. It leads to more complete genomes, which benefits downstream genome analysis such as annotation and genotyping. Several tools have been developed for gap closing. However, these tools don’t fully utilize the information contained in the sequence data. For example, while it is known that many gaps are caused by genomic repeats, existing tools often ignore many sequence reads that originate from a repeat-related gap.We compare GAPPadder with GapCloser, GapFiller and Sealer on one bacterial genome, human chromosome 14 and the human whole genome with paired-end and mate-paired reads with both short and long insert sizes. Empirical results show that GAPPadder can close more gaps than these existing tools. Besides closing gaps on draft genomes assembled only from short sequence reads, GAPPadder can also be used to close gaps for draft genomes assembled with long reads. We show GAPPadder can close gaps on the bed bug genome and the Asian sea bass genome that are assembled partially and fully with long reads respectively. We also show GAPPadder is efficient in both time and memory usage.In this paper, we propose a new approach called GAPPadder for gap closing. The main advantage of GAPPadder is that it uses more information in sequence data for gap closing. In particular, GAPPadder finds and uses reads that originate from repeat-related gaps. We show that these repeat-associated reads are useful for gap closing, even though they are ignored by all existing tools. Other main features of GAPPadder include utilizing the information in sequence reads with different insert sizes and performing two-stage local assembly of gap sequences. The results show that our method can close more gaps than several existing tools. The software tool, GAPPadder, is available for download at https://github.com/Reedwarbler/GAPPadder .
Characterization of a male specific region containing a candidate sex determining gene in Atlantic cod.
The genetic mechanisms determining sex in teleost fishes are highly variable and the master sex determining gene has only been identified in few species. Here we characterize a male-specific region of 9?kb on linkage group 11 in Atlantic cod (Gadus morhua) harboring a single gene named zkY for zinc knuckle on the Y chromosome. Diagnostic PCR test of phenotypically sexed males and females confirm the sex-specific nature of the Y-sequence. We identified twelve highly similar autosomal gene copies of zkY, of which eight code for proteins containing the zinc knuckle motif. 3D modeling suggests that the amino acid changes observed in six copies might influence the putative RNA-binding specificity. Cod zkY and the autosomal proteins zk1 and zk2 possess an identical zinc knuckle structure, but only the Y-specific gene zkY was expressed at high levels in the developing larvae before the onset of sex differentiation. Collectively these data suggest zkY as a candidate master masculinization gene in Atlantic cod. PCR amplification of Y-sequences in Arctic cod (Arctogadus glacialis) and Greenland cod (Gadus macrocephalus ogac) suggests that the male-specific region emerged in codfishes more than 7.5 million years ago.