The three classes of genes that comprise the MHC gene family are actively involved in determining donor-recipient compatibility for organ transplant, as well as susceptibility to autoimmune diseases via cross-reacting immunization. Specifically, the class I genes HLA-A, -B, and -C and the class II genes HLA-DR, -DQ, and -DP are considered medically important for genetic analysis to determine histocompatibility. They are highly polymorphic and have thousands of alleles implicated in disease resistance and susceptibility. The importance of full-length HLA gene sequencing for genotyping, detection of null alleles, and phasing is now widely acknowledged. While DNA-sequencing-based HLA genotyping has become routine, only 7% of…
Sequence-based typing (SBT) is considered the gold-standard method for HLA typing. Current SBT methods are laborious and prone to phase ambiguity and genotyping uncertainty. As a result, the NGS community is rapidly seeking to remedy these challenges and to produce high-resolution, high-throughput HLA sequencing suited to a clinical setting. Today, second-generation NGS technologies are limited in their ability to yield the full-length HLA sequences required for adequate phasing and identification of novel alleles. Here we present the use of Single Molecule, Real-Time (SMRT) sequencing as a means of determining full length/long…
Human MHC class I genes HLA-A, -B, -C, and class II genes HLA-DR, -DP, and -DQ play a critical role in the immune system as major factors responsible for organ transplant rejection. They have a direct or linkage-based association with several diseases, including cancer and autoimmune diseases, and are important targets for clinical and drug sensitivity research. HLA genes are also highly polymorphic, and their diversity originates from exonic combinations as well as recombination events. A large number of new alleles are expected to be encountered if these genes are sequenced through the UTRs. Thus allele-level resolution is strongly preferred…
MHC class I and II genes are critically monitored by high-resolution sequencing for organ transplant decisions due to their role in GVHD. Their direct or linkage-based causal associations have increased their prominence as targets for drug sensitivity, autoimmune, cancer, and infectious disease research. Monitoring HLA genes can, however, be difficult due to their highly polymorphic nature. Allele-level resolution is thus strongly preferred. However, most studies have historically focused on the peptide-binding domains of the HLA genes, due to technological challenges. As a result, knowledge about the functional role of polymorphisms outside of exons 2 and 3 of HLA genes was…
Outside of the simplest cases (haploid, bacteria, or inbreds), genomic information is not carried in a single reference per individual, but rather has higher ploidy (n ≥ 2) for almost all organisms. The existence of two or more highly related sequences within an individual makes it extremely difficult to build high-quality, highly contiguous genome assemblies from short DNA fragments. Based on the earlier work on a polyploidy-aware assembler, FALCON (https://github.com/PacificBiosciences/FALCON), we developed new algorithms and software (“FALCON-unzip”) for de novo haplotype reconstructions from SMRT Sequencing data. We generate two datasets for developing the algorithms and the prototype software:…
The Pacific Biosciences Iso-Seq method, which can produce high-quality isoform sequences of 10 kb and longer, has been used to annotate many important plant and animal genomes. Here, we develop an algorithm called IsoPhase that post-processes Iso-Seq data to retrieve allele-specific isoform information. Using simulated data, we show that for both diploid and tetraploid genomes, IsoPhase results in good SNP recovery with low FDR at error rates consistent with CCS reads. We apply IsoPhase to a haplotype-resolved genome assembly and an Iso-Seq dataset of multiple fetal tissues from an F1 cross of Angus x Brahman cattle subspecies. IsoPhase-called haplotypes were validated…
Incomplete annotation of genomes represents a major impediment to understanding biological processes, functional differences between species, and evolutionary mechanisms. Often, genes that are large, embedded within duplicated genomic regions, or associated with repeats are difficult to study by short-read expression profiling and assembly. In addition, most genes in eukaryotic organisms produce alternatively spliced isoforms, broadening the diversity of proteins encoded by the genome, which are difficult to resolve with short-read methods. Short-read RNA sequencing (RNA-seq) works by physically shearing transcript isoforms into smaller pieces and bioinformatically reassembling them, leaving opportunity for misassembly or incomplete capture of the full diversity of…
An important need in analyzing complex genomes is the ability to separate and phase haplotypes. While whole genome assembly can deliver this information, it cannot reveal whether there is allele-specific gene or isoform expression. The PacBio Iso-Seq method, which can produce high-quality transcript sequences of 10 kb and longer, has been used to annotate many important plant and animal genomes. We present an algorithm called IsoPhase that post-processes Iso-Seq data for transcript-based haplotyping. We applied IsoPhase to a maize Iso-Seq dataset consisting of two homozygous parents and two F1 cross hybrids. We validated the majority of the SNPs called with…
The PacBio Iso-Seq method produces high-quality, full-length transcripts of 10 kb and longer and has been used to annotate many important plant and animal genomes. We describe here the full Iso-Seq ecosystem that enables researchers to achieve high-quality genome annotations. The Iso-Seq Express workflow is a 1-day protocol that requires only 60-300 ng of total RNA and supports multiplexing of different tissues. Sequencing on a single SMRT Cell 8M on the Sequel II System produces up to 4 million full-length reads, sufficient to exhaustively characterize a whole transcriptome on the order of 15,000-17,000 genes with 100,000 or more…
With Single Molecule, Real-Time (SMRT) Sequencing and the Sequel Systems, you can easily and affordably sequence complete transcript isoforms in genes of interest or across the entire transcriptome. The Iso-Seq method allows users to generate full-length cDNA sequences up to 10 kb in length, with no assembly required, to confidently characterize full-length transcript isoforms.
Melissa Laird Smith discussed how the Icahn School of Medicine at Mount Sinai uses long-read sequencing for translational research. She gave several examples of targeted sequencing projects run on the Sequel System including CYP2D6, phased mutations of GLA in Fabry’s disease, structural variation breakpoint validation in glioblastoma, and full-length immune profiling of TCR sequences.
Human MHC class I genes HLA-A, -B, -C, and class II genes HLA-DR, -DQ, and -DP play a critical role in the immune system as primary factors responsible for organ transplant rejection. Additionally, the HLA genes are important targets for clinical and drug sensitivity research because of their direct or linkage-based association with several diseases, including cancer and autoimmune diseases. HLA genes are highly polymorphic, and their diversity originates from exonic combinations as well as recombination events. With full-length gene sequencing, a significant increase of new alleles in the HLA database is expected, stressing the need for high-resolution sequencing.…
In this ASHG workshop presentation, Stuart Scott of the Icahn School of Medicine at Mount Sinai discussed using the PacBio system for amplicon sequencing in pharmacogenomics and clinical genomics workflows. Accurate, phased amplicon sequencing of the CYP2D6 gene, for example, has allowed his team to reclassify up to 20% of samples, providing data that’s critical for drug metabolism and dosing. In clinical genomics, Scott presented several case studies illustrating the utility of highly accurate, long-read sequencing for assessing copy number variants and for confirming a suspected medical diagnosis in rare disease patients. He noted that the latest Sequel System…
In this presentation, Elizabeth Tseng explains how PacBio’s full-length RNA sequencing using the Iso-Seq method can characterize full-length transcripts without the need for computational transcript assembly. The Iso-Seq method is fully supported bioinformatically through PacBio’s SMRT Analysis software, which outputs high-quality, full-length transcript sequences that can be used for genome annotation and novel gene discovery. Elizabeth shows that the highly accurate reads can be used to discover allele-specific isoform expression in transcriptome data.
In this presentation, Justin Blethrow provides an overview of recent and upcoming developments across PacBio’s SMRT Sequencing product portfolio, and their implications for PacBio’s major applications. In presenting the product roadmap, he illustrates how key new products coming in 2019 will make SMRT Sequencing dramatically more affordable and easy to use, and how they will enable customers to routinely produce highly accurate, single-molecule long reads.