PacBio 2013 User Group Meeting Presentation Slides: Lisbeth Guethlein from Stanford University School of Medicine looked at highly repetitive and variable immune regions of the orangutan genome. Guethlein reported that “PacBio managed to accomplish in a week what I have been working on for a couple years” (with Sanger sequencing), and the results were concordant. “Long story short, I was a happy customer.”
PAG Conference: Long-read sequencing reveals complex genomic architecture in independent carnivorous plant lineages
In this PAG 2018 presentation, Tanya Renner of Pennsylvania State University shares research using PacBio SMRT Sequencing to understand the genomes and transcriptomes of carnivorous plants. She describes the humped…
This tutorial provides an overview of the PacBio Demultiplex Barcodes analysis application in SMRT Link, followed by de novo assembly of the demultiplexed samples using HGAP4 for the Multiplexed Microbial…
In this presentation Fritz Sedlazeck describes his latest work to obtain comprehensive genomes leveraging long-read sequencing and linked reads.
In this PacBio User Group Meeting presentation, Jonas Korlach and Roberto Lleras share the latest updates to the structural variation application and analysis tools.
In this PacBio User Group Meeting presentation, Tim Smith of the USDA’s Agricultural Research Service describes efforts to generate reference-grade genome assemblies for various bovine species and analyze them to…
Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases.
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
Gene transfer between bacterial species is an important mechanism for adaptation. For example, sets of genes that confer the ability to form nitrogen-fixing root nodules on host plants have frequently moved between Rhizobium species. It is not clear, though, whether such transfer is exceptional, or if frequent inter-species introgression is typical. To address this, we sequenced the genomes of 196 isolates of the Rhizobium leguminosarum species complex obtained from root nodules of white clover (Trifolium repens). Core gene phylogeny placed the isolates into five distinct genospecies that show high intra-genospecies recombination rates and remarkably different demographic histories. Most gene phylogenies were largely concordant with the genospecies, indicating that recent gene transfer between genospecies was rare. In contrast, very similar symbiosis gene sequences were found in two or more genospecies, suggesting recent horizontal transfer. The replication and conjugative transfer genes of the plasmids carrying the symbiosis genes showed a similar pattern, implying that introgression occurred by conjugative plasmid transfer. The only other regions that showed strong phylogenetic discordance with the genospecies classification were two small chromosomal clusters, one neighbouring a conjugative transfer system. Phage-related sequences were observed in the genomes, but appeared to have very limited impact on introgression. Introgression among these closely-related species has been very limited, confined to the symbiosis plasmids and a few chromosomal islands. Both introgress through conjugative transfer, but have been subject to different types of selective forces.
Adaptive archaic introgression of copy number variants and the discovery of previously unknown human genes
As they migrated out of Africa and into Europe and Asia, anatomically modern humans interbred with archaic hominins, such as Neanderthals and Denisovans. The result of this genetic introgression on the recipient populations has been of considerable interest, especially in cases of selection for specific archaic genetic variants. Hsieh et al. characterized adaptive structural variants and copy number variants that are likely targets of positive selection in Melanesians. Focusing on population-specific regions of the genome that carry duplicated genes and show an excess of amino acid replacements provides evidence for one of the mechanisms by which genetic novelty can arise and result in differentiation between human genomes. Science, this issue p. eaax2083. INTRODUCTION: Characterizing genetic variants underlying local adaptations in human populations is one of the central goals of evolutionary research. Most studies have focused on adaptive single-nucleotide variants that either arose as new beneficial mutations or were introduced after interbreeding with our now-extinct relatives, including Neanderthals and Denisovans. The adaptive role of copy number variants (CNVs), another well-known form of genomic variation generated through deletions or duplications that affect more base pairs in the genome, is less well understood, despite evidence that such mutations are subject to stronger selective pressures. RATIONALE: This study focuses on the discovery of introgressed and adaptive CNVs that have become enriched in specific human populations. We combine whole-genome CNV calling and population genetic inference methods to discover CNVs and then assess signals of selection after controlling for demographic history. We examine 266 publicly available modern human genomes from the Simons Genome Diversity Project and genomes of three ancient hominins: a Denisovan, a Neanderthal from the Altai Mountains in Siberia, and a Neanderthal from Croatia.
We apply long-read sequencing methods to sequence-resolve complex CNVs of interest specifically in the Melanesians, an Oceanian population distributed from Papua New Guinea to as far east as the islands of Fiji and known to harbor some of the greatest amounts of Neanderthal and Denisovan ancestry. RESULTS: Consistent with the hypothesis of archaic introgression outside Africa, we find a significant excess of CNV sharing between modern non-African populations and archaic hominins (P = 0.039). Among Melanesians, we observe an enrichment of CNVs with potential signals of positive selection (n = 37 CNVs), of which 19 CNVs likely introgressed from archaic hominins. We show that Melanesian-stratified CNVs are significantly associated with signals of positive selection (P = 0.0323). Many map near or within genes associated with metabolism (e.g., ACOT1 and ACOT2), development and cell cycle or signaling (e.g., TNFRSF10D and CDK11A and CDK11B), or immune response (e.g., IFNLR1). We characterize two of the largest and most complex CNVs on chromosomes 16p11.2 and 8p21.3 that introgressed from Denisovans and Neanderthals, respectively, and are absent from most other human populations. At chromosome 16p11.2, we sequence-resolve a large duplication of >383 thousand base pairs (kbp) that originated from Denisovans and introgressed into the ancestral Melanesian population 60,000 to 170,000 years ago. This large duplication occurs at high frequency (>79%) in diverse Melanesian groups, shows signatures of positive selection, and maps adjacent to Homo sapiens-specific duplications that predispose to rearrangements associated with autism. On chromosome 8p21.3, we identify a Melanesian haplotype that carries two CNVs, a ~6-kbp deletion and a ~38-kbp duplication, with a Neanderthal origin and that introgressed into non-Africans 40,000 to 120,000 years ago.
This CNV haplotype occurs at high frequency (44%) and shows signals consistent with a partial selective sweep in Melanesians. Using long-read sequencing genomic and transcriptomic data, we reconstruct the structure and complex evolutionary history for these two CNVs and discover previously undescribed duplicated genes (TNFRSF10D1, TNFRSF10D2, and NPIPB16) that show an excess of amino acid replacements consistent with the action of positive selection. CONCLUSION: Our results suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation that is absent from current reference genomes. Figure caption: Large adaptive-introgressed CNVs at chromosomes 8p21.3 and 16p11.2 in Melanesians. The magnifying glasses highlight structural differences between the archaic (top) and reference (bottom) genomes. Neanderthal (red) and Denisovan (blue) haplotypes encompassing large CNVs occur at high frequencies in Melanesians (44 and 79%, respectively) but are absent (black) in all non-Melanesians. These CNVs create positively selected genes (TNFRSF10D1, TNFRSF10D2, and NPIPB16) that are absent from the reference genome. Abstract: Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations.
Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.
A High-Quality Grapevine Downy Mildew Genome Assembly Reveals Rapidly Evolving and Lineage-Specific Putative Host Adaptation Genes.
Downy mildews are obligate biotrophic oomycete pathogens that cause devastating plant diseases on economically important crops. Plasmopara viticola is the causal agent of grapevine downy mildew, a major disease in vineyards worldwide. We sequenced the genome of Pl. viticola with PacBio long reads and obtained a new 92.94 Mb assembly with high contiguity (359 scaffolds for an N50 of 706.5 kb) due to a better resolution of repeat regions. This assembly presented a high level of gene completeness, recovering 1,592 genes encoding secreted proteins involved in plant-pathogen interactions. Plasmopara viticola had a two-speed genome architecture, with secreted protein-encoding genes preferentially located in gene-sparse, repeat-rich regions and evolving rapidly, as indicated by pairwise dN/dS values. We also used short reads to assemble the genome of Plasmopara muralis, a closely related species infecting grape ivy (Parthenocissus tricuspidata). The lineage-specific proteins identified by comparative genomics analysis included a large proportion of RxLR cytoplasmic effectors and, more generally, genes with high dN/dS values. We identified 270 candidate genes under positive selection, including several genes encoding transporters and components of the RNA machinery potentially involved in host specialization. Finally, the Pl. viticola genome assembly generated here will allow the development of robust population genomics approaches for investigating the mechanisms involved in adaptation to biotic and abiotic selective pressures in this species. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
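For readers unfamiliar with the contiguity statistic quoted above, the following is a minimal sketch of how an N50 is conventionally computed from a list of scaffold lengths. It is not the authors' pipeline; the function name `n50` and the toy lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings coverage to >= 50%.
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, not data from the paper).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)
```

With the toy lengths, the two longest scaffolds (8 + 8 = 16) already cover half of the 32-base total, so the N50 is 8. A higher N50 for a similar total assembly size indicates that the sequence is held in fewer, longer pieces.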
Recombination between loci underlying mate choice and ecological traits is a major evolutionary force acting against speciation with gene flow. The evolution of linkage disequilibrium between such loci is therefore a fundamental step in the origin of species. Here, we show that this process can take place in the absence of physical linkage in hamlets, a group of closely related reef fishes from the wider Caribbean that differ essentially in colour pattern and are reproductively isolated through strong visually-based assortative mating. Using full-genome analysis, we identify four narrow genomic intervals that are consistently differentiated among sympatric species in a backdrop of extremely low genomic divergence. These four intervals include genes involved in pigmentation (sox10), axial patterning (hoxc13a), photoreceptor development (casz1) and visual sensitivity (SWS and LWS opsins) that develop islands of long-distance and inter-chromosomal linkage disequilibrium as species diverge. The relatively simple genomic architecture of species differences facilitates the evolution of linkage disequilibrium in the presence of gene flow.
Microbes have been critical drivers of evolutionary innovation in animals. To understand the processes that influence the origin of specialized symbiotic organs, we report the sequencing and analysis of the genome of Euprymna scolopes, a model cephalopod with richly characterized host-microbe interactions. We identified large-scale genomic reorganization shared between E. scolopes and Octopus bimaculoides and posit that this reorganization has contributed to the evolution of cephalopod complexity. To reveal genomic signatures of host-symbiont interactions, we focused on two specialized organs of E. scolopes: the light organ, which harbors a monoculture of Vibrio fischeri, and the accessory nidamental gland (ANG), a reproductive organ containing a bacterial consortium. Our findings suggest that the two symbiotic organs within E. scolopes originated by different evolutionary mechanisms. Transcripts expressed in these microbe-associated tissues displayed their own unique signatures in both coding sequences and the surrounding regulatory regions. Compared with other tissues, the light organ showed an abundance of genes associated with immunity and mediating light, whereas the ANG was enriched in orphan genes known only from E. scolopes. Together, these analyses provide evidence for different patterns of genomic evolution of symbiotic organs within a single host. Copyright © 2019 the Author(s). Published by PNAS.
Defenses conferred by microbial symbionts play a vital role in the health and fitness of their animal hosts. An important outstanding question in the study of defensive symbiosis is what determines long-term stability and effectiveness against diverse natural enemies. In this study, we combine genome and transcriptome sequencing, symbiont transfection and parasite protection experiments, and toxin activity assays to examine the evolution of the defensive symbiosis between Drosophila flies and their vertically transmitted Spiroplasma bacterial symbionts, focusing in particular on ribosome-inactivating proteins (RIPs), symbiont-encoded toxins that have been implicated in protection against both parasitic wasps and nematodes. Although many strains of Spiroplasma, including the male-killing symbiont (sMel) of Drosophila melanogaster, protect against parasitic wasps, only the strain (sNeo) that infects the mycophagous fly Drosophila neotestacea appears to protect against parasitic nematodes. We find that RIP repertoire is a major differentiating factor between strains that do and do not offer nematode protection, and that sMel RIPs do not show activity against nematode ribosomes in vivo. We also discovered a strain of Spiroplasma infecting a mycophagous phorid fly, Megaselia nigra. Although both the host and its Spiroplasma are distantly related to D. neotestacea and its symbiont, genome sequencing revealed that the M. nigra symbiont encodes abundant and diverse RIPs, including plasmid-encoded toxins that are closely related to the RIPs in sNeo. Our results suggest that distantly related Spiroplasma RIP toxins may perform specialized functions with regard to parasite specificity and suggest an important role for horizontal gene transfer in the emergence of novel defensive phenotypes.
Combining high-throughput sequencing with targeted sequence capture has become an attractive tool to study specific genomic regions of interest. Most studies have so far focused on the exome using short-read technology. These approaches are not designed to capture intergenic regions needed to reconstruct genomic organization, including regulatory regions and gene synteny. Here, we demonstrate the power of combining targeted sequence capture with long-read sequencing technology for comparative genomic analyses of the haemoglobin (Hb) gene clusters across eight species separated by up to 70 million years. Guided by the reference genome assembly of the Atlantic cod (Gadus morhua) together with genome information from draft assemblies of selected codfishes, we designed probes covering the two Hb gene clusters. Use of custom-made barcodes combined with PacBio RSII sequencing led to highly continuous assemblies of the LA (~100 kb) and MN (~200 kb) clusters, which include syntenic regions of coding and intergenic sequences. Our results revealed an overall conserved genomic organization of the Hb genes within this lineage, yet with several, lineage-specific gene duplications. Moreover, for some of the species examined, we identified amino acid substitutions at two sites in the Hbb1 gene as well as length polymorphisms in its regulatory region, which has previously been linked to temperature adaptation in Atlantic cod populations. This study highlights the use of targeted long-read capture as a versatile approach for comparative genomic studies by generation of a cross-species genomic resource elucidating the evolutionary history of the Hb gene family across the highly divergent group of codfishes. © 2018 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
The enormous sizes of adhesion G protein-coupled receptors (aGPCRs) go along with complex genomic exon-intron architectures giving rise to multiple mRNA variants. There is a need for a comprehensive catalog of aGPCR variants for proper evaluation of the complex functions of aGPCRs found in structural, in vitro and animal model studies. We used an established bioinformatics pipeline to extract, quantify and visualize mRNA variants of aGPCRs from deeply sequenced transcriptomes. Data analysis showed that aGPCRs have multiple transcription start sites, even within introns, and that tissue-specific splicing is frequent. On average, 19 significantly expressed transcript variants are derived from a given aGPCR gene. The domain architecture of the N terminus encoded by transcript variants often differs, and N termini without or with an incomplete seven-helix transmembrane anchor, as well as separate seven-helix transmembrane domains, are frequently derived from aGPCR genes. Experimental analyses of selected aGPCR transcript variants revealed marked functional differences. Our analysis has an impact on the rational design of aGPCR constructs for structural analyses and gene-deficient mouse lines and provides new support for independent functions of both the large N terminus and the transmembrane domain of aGPCRs.