Single Molecule, Real-Time (SMRT) Sequencing holds promise for addressing new frontiers in large genome complexities, such as long, highly repetitive, low-complexity regions and duplication events, and differentiating between transcript isoforms that are difficult to resolve with short-read technologies. We present solutions available for both reference genome improvement (>100 MB) and transcriptome research to best leverage long reads that have exceeded 20 Kb in length. Benefits for these applications are further realized with consistent use of size-selection of input sample using the BluePippin™ device from Sage Science. Highlights from our genome assembly projects using the latest P5-C3 chemistry on model organisms…
Reference quality de novo genome assemblies were once solely the domain of large, well-funded genome projects. While next-generation short read technology removed some of the cost barriers, accurate chromosome-scale assembly remains a real challenge. Here we present efforts to de novo assemble the goat (Capra hircus) genome. Through the combination of single-molecule technologies from Pacific Biosciences (sequencing) and BioNano Genomics (optical mapping) coupled with high-throughput chromosome conformation capture sequencing (Hi-C), an inbred San Clemente goat genome has been sequenced and assembled to a high degree of completeness at a relatively modest cost. Starting with 38 million PacBio reads, we integrated…
It is a common knowledge that sex chromosome mutations are better tolerated and more viable compared to changes in autosomes. This is explained by relatively low gene density in both the X and the Y chromosome and by random X chromosome inactivation in mammalian females buffering the effect of X-aneuploidies. However, it is not well understood why apparently similar sex chromosome abnormalities, such as X-monosomy or certain Y chromosome rearrangements, result in different phenotypic effects in different species. It is thought that this is due to species differences in the organization of the Y chromosome, differences in the set of…
PacBio Sequencing is characterized by very long sequence reads (averaging > 10,000 bases), lack of GC-bias, and high consensus accuracy. These features have allowed the method to provide a new gold standard in de novo genome assemblies, producing highly contiguous (contig N50 > 1 Mb) and accurate (> QV 50) genome assemblies. We will briefly describe the technology and then highlight the full workflow, from sample preparation through sequencing to data analysis, on examples of insect genome assemblies, and illustrate the difference these high-quality genomes represent with regard to biological insights, compared to fragmented draft assemblies generated by short-read sequencing.
Background Assemblies of diploid genomes are generally unphased, pseudo-haploid representations that do not correctly reconstruct the two parental haplotypes present in the individual sequenced. Instead, the assembly alternates between parental haplotypes and may contain duplications in regions where the parental haplotypes are sufficiently different. Trio binning is an approach to genome assembly that uses short reads from both parents to classify long reads from the offspring according to maternal or paternal haplotype origin, and is thus helped rather than impeded by heterozygosity. Using this approach, it is possible to derive two assemblies from an individual, accurately representing both parental contributions…
In the wake of constant improvements in sequencing technologies, numerous insect genomes have been sequenced. Currently, 1219 insect genome-sequencing projects have been registered with the National Center for Biotechnology Information, including 401 that have genome assemblies and 155 with an official gene set of annotated protein-coding genes. Comparative genomics analysis showed that the expansion or contraction of gene families was associated with well-studied physiological traits such as immune system, metabolic detoxification, parasitism and polyphagy in insects. Here, we summarize the progress of insect genome sequencing, with an emphasis on how this impacts research on pest control. We begin with a…
Satellite repeats are a structural component of centromeres and telomeres, and in some instances their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50?bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These were dominated by a handful of abundant repeated motifs, frequently…
Suppressed recombination allows divergence between homologous sex chromosomes and the functionality of their genes. Here, we reveal patterns of the earliest stages of sex-chromosome evolution in the diploid dioecious herb Mercurialis annua on the basis of cytological analysis, de novo genome assembly and annotation, genetic mapping, exome resequencing of natural populations, and transcriptome analysis. The genome assembly contained 34,105 expressed genes, of which 10,076 were assigned to linkage groups. Genetic mapping and exome resequencing of individuals across the species range both identified the largest linkage group, LG1, as the sex chromosome. Although the sex chromosomes of M. annua are karyotypically…
African cichlid fishes are well known for their rapid radiations and are a model system for studying evolutionary processes. Here we compare multiple, high-quality, chromosome-scale genome assemblies to elucidate the genetic mechanisms underlying cichlid diversification and study how genome structure evolves in rapidly radiating lineages.We re-anchored our recent assembly of the Nile tilapia (Oreochromis niloticus) genome using a new high-density genetic map. We also developed a new de novo genome assembly of the Lake Malawi cichlid, Metriaclima zebra, using high-coverage Pacific Biosciences sequencing, and anchored contigs to linkage groups (LGs) using 4 different genetic maps. These new anchored assemblies allow…
Supernumerary B chromosomes (Bs) are extra karyotype units in addition to A chromosomes, and are found in some fungi and thousands of animals and plant species. Bs are uniquely characterized due to their non-Mendelian inheritance, and represent one of the best examples of genomic conflict. Over the last decades, their genetic composition, function and evolution have remained an unresolved query, although a few successful attempts have been made to address these phenomena. A classical concept based on cytogenetics and genetics is that Bs are selfish and abundant with DNA repeats and transposons, and in most cases, they do not carry…
Y chromosomes often contain amplified genes which can increase dosage of male fertility genes and counteract degeneration via gene conversion. Here we identify genes with increased copy number on both X and Y chromosomes in various species of Drosophila, a pattern that has previously been associated with sex chromosome drive involving the Slx and Sly gene families in mice. We show that recurrent X/Y co-amplification appears to be an important evolutionary force that has shaped gene content evolution of sex chromosomes in Drosophila. We demonstrate that convergent acquisition and amplification of testis expressed gene families are common on Drosophila sex…
In the past several years, single-molecule sequencing platforms, such as those by Pacific Biosciences and Oxford Nanopore Technologies, have become available to researchers and are currently being tested for clinical applications. They offer exceptionally long reads that permit direct sequencing through regions of the genome inaccessible or difficult to analyze by short-read platforms. This includes disease-causing long repetitive elements, extreme GC content regions, and complex gene loci. Similarly, these platforms enable structural variation characterization at previously unparalleled resolution and direct detection of epigenetic marks in native DNA. Here, we review how these technologies are opening up new clinical avenues that…
The terrestrial isopod Armadillidium vulgare is an original model to study the evolution of sex determination and symbiosis in animals. Its sex can be determined by ZW sex chromosomes, or by feminizing Wolbachia bacterial endosymbionts. Here, we report the sequence and analysis of the ZW female genome of A. vulgare. A distinguishing feature of the 1.72 gigabase assembly is the abundance of repeats (68% of the genome). We show that the Z and W sex chromosomes are essentially undifferentiated at the molecular level and the W-specific region is extremely small (at most several hundreds of kilobases). Our results suggest that…
The discovery of mutations associated with human genetic dis- ease is an exercise in comparative genomics (see Glossary). Although there are many different strategies and approaches, the central premise is that affected persons harbor a significant excess of pathogenic DNA variants as com- pared with a group of unaffected persons (controls) that is either clinically defined1 or established by surveying large swaths of the general population.2 The more exclu- sive the variant is to the disease, the greater its penetrance, the larger its effect size, and the more relevant it becomes to both disease diagnosis and future therapeutic investigation. The…
Circulating DNA in plasma consists of short DNA fragments. The biological processes generating such fragments are not well understood. DNASE1L3 is a secreted DNASE1-like nuclease capable of digesting DNA in chromatin, and its absence causes anti-DNA responses and autoimmunity in humans and mice. We found that the deletion of Dnase1l3 in mice resulted in aberrations in the fragmentation of plasma DNA. Such aberrations included an increase in short DNA molecules below 120 bp, which was positively correlated with anti-DNA antibody levels. We also observed an increase in long, multinucleosomal DNA molecules and decreased frequencies of the most common end motifs…