PacBio 2013 User Group Meeting Presentation Slides: Lisbeth Guethlein from Stanford University School of Medicine looked at highly repetitive and variable immune regions of the orangutan genome. Guethlein reported that “PacBio managed to accomplish in a week what I have been working on for a couple years” (with Sanger sequencing), and the results were concordant. “Long story short, I was a happy customer.”
Single Molecule, Real-Time (SMRT) Sequencing provides efficient, streamlined solutions to address new frontiers in plant genomes and transcriptomes. Inherent challenges presented by highly repetitive, low-complexity regions and duplication events are directly addressed with multi-kilobase read lengths exceeding 8.5 kb on average, with many exceeding 20 kb. Differentiating between transcript isoforms that are difficult to resolve with short-read technologies is also now possible. We present solutions available for both reference genome and transcriptome research that best leverage long reads in several plant projects including algae, Arabidopsis, rice, and spinach using only the PacBio platform. Benefits for these applications are further realized with consistent use of size-selection of input sample using the BluePippin™ device from Sage Science. We will share highlights from our genome projects using the latest P5-C3 chemistry to generate high-quality reference genomes with the highest contiguity, contig N50 exceeding 1 Mb, and average base quality of QV50. Additionally, the value of long, intact reads to provide a no-assembly approach to investigate transcript isoforms using our Iso-Seq protocol will be presented for full transcriptome characterization and targeted surveys of genes with complex structures. PacBio provides the most comprehensive assembly with annotation when combining offerings for both genome and transcriptome research efforts. For more focused investigation, PacBio also offers researchers opportunities to easily investigate and survey genes with complex structures.
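To put the "QV50" figure in perspective, the sketch below converts between a quality value and a per-base error probability. It assumes QV follows the standard Phred scale, QV = -10 log10(P_error); the function names are illustrative and do not come from any PacBio software.

```python
import math


def phred_to_error_probability(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_probability_to_phred(p_error: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    if not 0 < p_error <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p_error)


if __name__ == "__main__":
    p = phred_to_error_probability(50)
    # Under the Phred assumption, QV50 is about one expected error per 100,000 bases.
    print(f"QV50 -> error probability {p:.0e}, about 1 error per {round(1 / p):,} bases")
```

Under this assumption, each 10-point increase in QV corresponds to a tenfold drop in expected errors.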
Unique haplotype structure determination in human genome using Single Molecule, Real-Time (SMRT) Sequencing of targeted full-length fosmids.
Determination of unique individual haplotypes is an essential first step toward understanding how identical genotypes having different phases lead to different biological interpretations of function, phenotype, and disease. Genome-wide methods for identifying individual genetic variation have been limited in their ability to acquire phased, extended, and complete genomic sequences that are long enough to assemble haplotypes with high confidence. We explore a recombineering approach for isolation and sequencing of a tiling of targeted fosmids to capture regions of interest from the human genome. Each individual fosmid contains a large genomic fragment (~35 kb) that is sequenced with long-read SMRT technology to generate contiguous long reads. These long reads can be easily assembled de novo for targeted haplotype resolution within an individual’s genome. The P5-C3 chemistry for SMRT Sequencing generated contiguous, full-length fosmid sequences of 30 to 40 kb in a single read, allowing assembly of resolved haplotypes with minimal data processing. The phase preserved in fosmid clones spanned at least two heterozygous variant loci, providing the essential detail of precise haplotype structures. We show complete assembly of haplotypes for various targeted loci, including the complex haplotypes of the KIR locus (~150 to 200 kb) and conserved extended haplotypes (CEHs) of the MHC region. This method is easily applicable to other regions of the human genome, as well as other genomes.
Comparative genome analysis of Clavibacter michiganensis subsp. michiganensis strains provides insights into genetic diversity and virulence.
Clavibacter michiganensis subsp. michiganensis (Cmm) is a Gram-positive actinomycete that causes bacterial canker of tomato (Solanum lycopersicum), a disease that can cause significant losses in tomato production. In this study, we determined the complete genome sequences of 13 California Cmm strains and one saprophytic Clavibacter strain using a combination of Illumina and PacBio sequencing. The California Cmm strains have a genome size (3.2-3.3 Mb) similar to that of the reference strain NCPPB382 (3.3 Mb), with ≥98% sequence identity. Cmm strains from California share ≥92% of their genes (8-10% are novel genes) with the reference Cmm strain NCPPB382. Despite this similarity, we detected significant alterations in California strains with respect to plasmid number, plasmid composition, and genomic island presence, indicating acquisition of unique mechanisms controlling virulence. Plasmids pCM1 and pCM2, which were previously demonstrated to be required for NCPPB382 virulence, also differ in their presence and gene content across Cmm strains. pCM2 is absent in some Cmm strains that still retain virulence in tomato. The saprophytic Clavibacter strain possesses a novel plasmid, pSCM, and lacks the majority of characterized virulence factors. Genome sequence information was also used to design specific and sensitive primer pairs for Cmm detection. A mechanistic understanding of how genomic changes have impacted Cmm virulence and survival across diverse strains will be necessary for developing robust disease control strategies for bacterial canker of tomato.
Arabica coffee, revered for its taste and aroma, has a complex genome. It is an allotetraploid (2n=4x=44) with a genome size of approximately 1.3 Gb, derived from the recent (< 0.6 Mya) hybridization of two diploid progenitors (2n=2x=22), C. canephora (710 Mb) and C. eugenioides (670 Mb). Both parental species diverged recently (< 4.2 Mya) and their genomes are highly homologous. To facilitate assembly, a dihaploid plant was chosen for sequencing. Initial genome assembly attempts with short-read data produced an assembly covering 1,031 Mb of the C. arabica genome with a contig L50 of 9 kb. Using long-read PacBio sequencing at greater than 50x coverage and cutting-edge PacBio software, a de novo PacBio-only genome assembly was constructed that covers 1,042 Mb of the genome with an L50 of 267 kb. The two assemblies were assessed and compared to determine gene content, chimeric regions, and the ability to separate the parental genomes. A genetic map that contains 600 SSRs is being used to anchor the contigs and, together with a search for sub-genome-specific SNPs, to improve sub-genome differentiation. PacBio transcriptome sequencing is currently being added to finalize gene annotation of the polished assembly. The finished genome assembly will be used to guide re-sequencing assemblies of the parental genomes (C. canephora and C. eugenioides) and will serve as a template for GBS analysis and whole-genome re-sequencing of a set of C. arabica accessions representative of the species diversity. The resulting data will provide powerful genomic tools to enable more efficient breeding strategies for this crop, which is highly susceptible to climate change and is the main source of income for millions of small farmers in producing countries.
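The abstract above reports contiguity as a length-valued "contig L50" (9 kb versus 267 kb). A length-valued statistic of this kind is commonly called N50: the contig length at which contigs of that size or longer account for at least half of the assembled bases. The following is a minimal sketch of that calculation; the function name and the toy contig lengths are hypothetical and are not data from the coffee assemblies.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; N50 is the length of the contig at which the running
    total first reaches at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches the total")


if __name__ == "__main__":
    # Hypothetical toy assembly (arbitrary units), for illustration only.
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]
    print(n50(toy_contigs))
```

Because the statistic depends only on the sorted length distribution, two assemblies covering a similar total span can differ in N50 by more than an order of magnitude, which is the comparison the abstract draws.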
It is common knowledge that sex chromosome mutations are better tolerated and more viable than changes in autosomes. This is explained by the relatively low gene density of both the X and the Y chromosome and by random X chromosome inactivation in mammalian females, which buffers the effect of X-aneuploidies. However, it is not well understood why apparently similar sex chromosome abnormalities, such as X-monosomy or certain Y chromosome rearrangements, result in different phenotypic effects in different species. It is thought that this is due to species differences in the organization of the Y chromosome, differences in the set of genes escaping X-inactivation, and the presence of species- or lineage-specific sex-linked genes with functions in development and reproduction. Current knowledge about species differences in sex chromosome organization and function is limited, despite the availability of reference genome assemblies for most domestic species. Sequence assembly of the X chromosome in most species appears to be rather patchy, containing multiple gaps and possible misassemblies, and is poorest in the pseudoautosomal region and in regions containing putative lineage-specific sequences. The Y chromosome, on the other hand, is typically not included in the reference genome and is studied separately, and a complete sequence assembly of the male-specific portion of the Y is not yet available for any domestic species. In this talk I will discuss comparative organization and function of animal sex chromosomes and related phenotypes, proceeding from our research in horses.
The killer immunoglobulin-like receptor (KIR) genes belong to the immunoglobulin superfamily and are widely studied due to the critical role they play in coordinating the innate immune response to infection and disease. Highly accurate, contiguous, long reads, like those generated by SMRT Sequencing, when combined with target-enrichment protocols, provide a straightforward strategy for generating complete de novo assembled KIR haplotypes. We have explored two different methods to capture the KIR region: one using fosmid clones and one using Nimblegen capture.
Maize is an amazingly diverse crop. A study in 2005 demonstrated that half of the genome sequence and one-third of the gene content between two inbred lines of maize were not shared. This diversity, which is more than two orders of magnitude larger than the diversity found between humans and chimpanzees, highlights the inability of a single reference genome to represent the full pan-genome of maize and all its variants. Here we present and review several efforts to characterize the complete diversity within maize using the highly accurate long reads of PacBio Single Molecule, Real-Time (SMRT) Sequencing. These methods provide a framework for a pan-genomic approach that can be applied to studies of a wide variety of important crop species.
Richard Kuo from the Roslin Institute gave this PAG 2017 talk about using the PacBio Iso-Seq data to generate genome annotations that outperform current gold-standard annotations. Included: findings from a…
PAG PacBio Workshop: Comparative analyses of next generation technologies for generating chromosome-level reference genome assemblies
At PAG 2017, Rockefeller University’s Erich Jarvis offered an in-depth comparison of methods for generating highly contiguous genome assemblies, using hummingbird as the basis to evaluate a number of sequencing…
AGBT Conference: A community effort using multiple technologies to produce a dramatically improved genome assembly of the Zika virus mosquito vector
At AGBT 2017, the Broad Institute’s Daniel Neafsey reported a large collaborative effort to sequence the mosquito that carries Zika virus. The team is using long-read PacBio sequencing to produce…
User Group Meeting: New genotype to phenotype associations in viral metagenomes enabled by SMRT Sequencing
In this PacBio User Group Meeting lightning talk, Shawn Polson of the University of Delaware speaks about viral metagenomes, which are more challenging to distinguish than their bacterial counterparts because…
Forest tree species are increasingly subject to severe mortalities from exotic pests, diseases, and invasive organisms, accelerated by climate change. Forest health issues are threatening multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species, or among surviving trees, introgression of resistance genes into threatened tree species in reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being employed as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of a reference genome with chromosome-scale sequences for Chinese chestnut (C. mollissima), the disease-resistance donor for American chestnut restoration. We also demonstrate the value of the genome as a platform for research and species restoration, including new insights into the evolution of blight resistance in Asian chestnut species, the locations in the genome of ecologically important signatures of selection differentiating American chestnut from Chinese chestnut, the identification of candidate genes for disease resistance, and preliminary comparisons of genome organization with related species.
In the wake of constant improvements in sequencing technologies, numerous insect genomes have been sequenced. Currently, 1219 insect genome-sequencing projects have been registered with the National Center for Biotechnology Information, including 401 that have genome assemblies and 155 with an official gene set of annotated protein-coding genes. Comparative genomics analysis showed that the expansion or contraction of gene families was associated with well-studied physiological traits such as immune system, metabolic detoxification, parasitism and polyphagy in insects. Here, we summarize the progress of insect genome sequencing, with an emphasis on how this impacts research on pest control. We begin with a brief introduction to the basic concepts of genome assembly, annotation and metrics for evaluating the quality of draft assemblies. We then provide an overview of genome information for numerous insect species, highlighting examples from prominent model organisms, agricultural pests and disease vectors. We also introduce the major insect genome databases. The increasing availability of insect genomic resources is beneficial for developing alternative pest control methods. However, many opportunities remain for developing data-mining tools that make maximal use of the available insect genome resources. Although rapid progress has been achieved, many challenges remain in the field of insect genomics. © 2019 The Royal Entomological Society.
The use of Online Tools for Antimicrobial Resistance Prediction by Whole Genome Sequencing in MRSA and VRE.
The antimicrobial resistance (AMR) crisis represents a serious threat to public health and has resulted in concentrated efforts to accelerate development of rapid molecular diagnostics for AMR. In combination with publicly-available web-based AMR databases, whole genome sequencing (WGS) offers the capacity for rapid detection of antibiotic resistance genes. Here we studied the concordance between WGS-based resistance prediction and phenotypic susceptibility testing results for methicillin-resistant Staphylococcus aureus (MRSA) and vancomycin-resistant Enterococcus (VRE) clinical isolates using publicly-available tools and databases. Clinical isolates prospectively collected at the University of Pittsburgh Medical Center between December 2016 and December 2017 underwent WGS. Antibiotic resistance gene content was assessed from assembled genomes by BLASTn search of online databases. Concordance between WGS-predicted resistance profile and phenotypic susceptibility as well as sensitivity, specificity, and positive and negative predictive values (PPV, NPV) were calculated for each antibiotic/organism combination, using the phenotypic results as the gold standard. Phenotypic susceptibility testing and WGS results were available for 1242 isolate/antibiotic combinations. Overall concordance was 99.3% with a sensitivity, specificity, PPV, NPV of 98.7% (95% CI, 97.2-99.5%), 99.6% (95% CI, 98.8-99.9%), 99.3% (95% CI, 98.0-99.8%), 99.2% (95% CI, 98.3-99.7%), respectively. Additional identification of point mutations in housekeeping genes increased the concordance to 99.4% and the sensitivity to 99.3% (95% CI, 98.2-99.8%) and NPV to 99.4% (95% CI, 98.4-99.8%). WGS can be used as a reliable predictor of phenotypic resistance for both MRSA and VRE using readily-available online tools. Copyright © 2019. Published by Elsevier Ltd.
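The performance measures named in this abstract follow from a 2x2 comparison of WGS predictions against the phenotypic gold standard. The sketch below shows the standard definitions, treating "resistant" as the positive class. The function name and the counts are hypothetical illustrations, not the study's data, and the confidence intervals reported in the abstract are not reproduced here.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard test-performance measures from 2x2 counts.

    tp: predicted resistant, phenotypically resistant
    fp: predicted resistant, phenotypically susceptible
    tn: predicted susceptible, phenotypically susceptible
    fn: predicted susceptible, phenotypically resistant
    """
    return {
        "sensitivity": tp / (tp + fn),  # resistant isolates correctly predicted
        "specificity": tn / (tn + fp),  # susceptible isolates correctly predicted
        "ppv": tp / (tp + fp),          # resistant predictions that are correct
        "npv": tn / (tn + fn),          # susceptible predictions that are correct
        "concordance": (tp + tn) / (tp + fp + tn + fn),  # overall agreement
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    for name, value in diagnostic_metrics(tp=90, fp=5, tn=95, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are conditioned on the phenotypic result, whereas PPV and NPV are conditioned on the WGS prediction, so the latter pair also depends on how common resistance is among the isolates tested.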