June 1, 2021

Candidate gene screening using long-read sequencing

We have developed several candidate gene screening applications for both neuromuscular and neurological disorders. The power behind these applications comes from long-read sequencing, which gives access to previously unresolvable and even unsequenceable genomic regions. SMRT Sequencing offers uniform coverage, a lack of sequence context bias, and very high accuracy. It is also possible to directly detect epigenetic signatures and to characterize full-length gene transcripts through assembly-free isoform sequencing. In addition to calling the bases, SMRT Sequencing uses the kinetic information from each nucleotide to distinguish between modified and native bases.

June 1, 2021

Highly contiguous de novo human genome assembly and long-range haplotype phasing using SMRT Sequencing

The long reads, random error, and unbiased sampling of SMRT Sequencing enable high-quality de novo assembly of the human genome. PacBio long reads are capable of resolving genomic variations at all size scales, including SNPs, insertions, deletions, inversions, translocations, and repeat expansions, all of which are both important for understanding the genetic basis of human disease and difficult to access with other technologies. To demonstrate this, we report a new high-quality, diploid-aware de novo assembly of Craig Venter’s well-studied genome.

June 1, 2021

Multiplex target enrichment using barcoded multi-kilobase fragments and probe-based capture technologies

Target enrichment capture methods allow scientists to rapidly interrogate important genomic regions of interest for variant discovery, including SNPs, gene isoforms, and structural variation. Custom targeted sequencing panels are important for characterizing heterogeneous, complex diseases and uncovering the genetic basis of inherited traits, with more uniform coverage than PCR-based strategies. With the increasing availability of high-quality reference genomes, customized gene panels are readily designed with high specificity to capture genomic regions of interest, thus enabling scientists to expand their research scope from a single individual to larger cohort studies or population-wide investigations. Coupled with PacBio® long-read sequencing, these technologies can capture 5 kb fragments of genomic DNA (gDNA), which are useful for interrogating intronic, exonic, and regulatory regions, characterizing complex structural variations, distinguishing between gene duplications and pseudogenes, and interpreting variant haplotypes. In addition, SMRT® Sequencing offers the lowest GC bias and can sequence through repetitive regions. We demonstrate the additional insights possible by using in-depth long-read capture sequencing for key immunology, drug-metabolizing, and disease-causing genes such as HLA, filaggrin, and cancer-associated genes.

June 1, 2021

Phased human genome assemblies with Single Molecule, Real-Time Sequencing

In recent years, human genomic research has focused on comparing short-read data sets to a single human reference genome. However, it is becoming increasingly clear that significant structural variations present in individual human genomes are missed or ignored by this approach. Additionally, remapping short-read data limits the phasing of variation among individual chromosomes. This reduces the newly sequenced genome to a table of single nucleotide polymorphisms (SNPs) with little to no information as to the co-linearity (phasing) of these variants, resulting in a “mosaic” reference representing neither of the parental chromosomes. The variation between the homologous chromosomes is lost in this representation, including allelic variations, structural variations, or even genes present in only one chromosome, leading to lost information regarding allele-specific gene expression and function. To address these limitations, we have made significant progress integrating haplotype information directly into the genome assembly process with long reads. The FALCON-Unzip algorithm leverages a string graph assembly approach to facilitate identification and separation of heterozygosity during the assembly process, producing a highly contiguous assembly with phased haplotypes that represents the genome in its diploid state. The outputs of the assembler are pairs of sequences (haplotigs) containing the allelic differences, including SNPs and structural variations, present in the two sets of chromosomes. The development and testing of our de novo diploid assembler was facilitated and carefully validated using inbred reference model organisms and F1 progeny, which allowed us to ascertain the accuracy and concordance of haplotigs relative to the two inbred parental assemblies. Examination of the results confirmed that our haplotype-resolved assemblies are “Gold Level” reference genomes having a quality similar to that of Sanger-sequencing, BAC-based assembly approaches.
We further sequenced and assembled two well-characterized human samples into their respective phased diploid genomes with gap-free contig N50 sizes greater than 23 Mb and haplotig N50 sizes greater than 380 kb. Results of these assemblies and a comparison between the haplotype sets are presented.
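For readers unfamiliar with the contiguity metric quoted above, N50 is the length such that sequences of at least that length account for half of the total assembled bases. The following is a minimal sketch of that calculation; the function name and the toy lengths are illustrative assumptions, not FALCON-Unzip output or values from these assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or haplotig) lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one length")


# Toy lengths (hypothetical, arbitrary units), used only to exercise the function.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print("N50 of toy contigs:", n50(toy_contigs))
```

Because a few very long sequences can dominate the total, N50 is usually read alongside the total assembly size and the number of contigs rather than on its own.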

June 1, 2021

The MHC Diversity in Africa Project (MDAP) pilot – 125 African high resolution HLA types from 5 populations

The major histocompatibility complex (MHC), or human leukocyte antigen (HLA) in humans, is a highly diverse gene family with a key role in immune response to disease, and has been implicated in autoimmune disease, cancer, infectious disease susceptibility, and vaccine response. It has clinical importance in the field of solid organ and bone marrow transplantation, where donor and recipient matching of HLA types is key to transplanted organ outcomes. The Sanger based typing (SBT) methods currently used in clinical practice do not capture the full diversity across this region, and require specific reference sequences to deconvolute ambiguity in HLA types. However, reference databases are based largely on European populations, and the full extent of diversity in Africa remains poorly understood. Here, we present the first systematic characterisation of HLA diversity within Africa in the pilot phase of the MHC Diversity in Africa Project, together with an evaluation of methods to carry out scalable, cost-effective, and reliable typing of this region in African populations.
To sample a geographically representative panel of African populations we obtained 125 samples, 25 each from the Zulu (South Africa), Igbo (Nigeria), Kalenjin (Kenya), Moroccan and Ashanti (Ghana) groups. For methods validation we included two controls from the International Histocompatibility Working Group (IHWG) collection with known typing information. Sanger typing and Illumina HiSeq X sequencing of these samples indicated potentially novel Class I and Class II alleles; however, we found poor correlation between HiSeq X sequencing and SBT for both classes. Long Range PCR and high resolution PacBio RS-II typing of 4 of these samples identified 7 novel Class II alleles, highlighting the high levels of diversity in these populations, and the need for long read sequencing approaches to characterise this comprehensively.
We have now expanded this approach to the entire pilot set of 125 samples. We present these confirmed types and discuss a workflow for scaling this to 5000 individuals across Africa.
The large number of new alleles identified in our pilot points to a high level of African HLA diversity and to the utility of high resolution methods. The MDAP project will provide a framework for accurate HLA typing, in addition to providing an invaluable resource for imputation in GWAS, boosting power to identify and resolve HLA disease associations.

June 1, 2021

T-cell receptor profiling using PacBio sequencing of SMARTer libraries

T-cells play a central part in the immune response in humans and related species. T-cell receptors (TCRs), heterodimers located on the T-cell surface, specifically bind foreign antigens displayed on the MHC complex of antigen-presenting cells. The wide spectrum of potential antigens is addressed by the diversity of TCRs created by V(D)J recombination. Profiling this repertoire of TCRs could be useful for applications including, but not limited to, diagnosis, monitoring response to treatments, and examining T-cell development and diversification.

June 1, 2021

Comprehensive variant detection in a human genome with PacBio high-fidelity reads

Human genomic variations range in size from single nucleotide substitutions to large chromosomal rearrangements. Sequencing technologies tend to be optimized for detecting particular variant types and sizes. Short reads excel at detecting SNVs and small indels, while long or linked reads are typically used to detect larger structural variants or phase distant loci. Long reads are more easily mapped to repetitive regions, but tend to have lower per-base accuracy, making it difficult to call short variants. The PacBio Sequel System produces two main data types: long continuous reads (up to 100 kbp), generated by single passes over a long template, and Circular Consensus Sequence (CCS) reads, generated by calculating the consensus of many sequencing passes over a single shorter template (500 bp to 20 kbp). The long-range information in continuous reads is useful for genome assembly and structural variant detection. The higher base accuracy of CCS reads makes it possible to detect and phase short variants in single molecules. Recent improvements in library preparation protocols and sequencing chemistry have increased the length, accuracy, and throughput of CCS reads. For the human sample HG002, we collected 28-fold coverage of 15 kbp high-fidelity CCS reads with an average read quality above Q20 (99% accuracy). The length and accuracy of these reads allow us to detect SNVs, indels, and structural variants not only in the Genome in a Bottle (GIAB) high confidence regions, but also in segmental duplications, HLA loci, and clinically relevant “difficult-to-map” genes. As with continuous long reads, we call structural variants at 90.0% recall compared to the GIAB structural variant benchmark “truth” set, with the added advantages of base pair resolution for variant calls and improved recall at compound heterozygous loci.
With minimap2 alignments, GATK4 HaplotypeCaller variant calls, and simple variant filtration, we have achieved a SNP F-Score of 99.51% and an INDEL F-Score of 80.10% against the GIAB short variant benchmark “truth” set, in addition to calling variants outside of the high confidence region established by GIAB using previous technologies. With the long-range information available in 15 kbp reads, we applied the read-backed phasing tool WhatsHap to generate phase blocks with a mean length of 65 kbp across the entire genome. Using an alignment-based approach, we typed all major MHC class I and class II genes to at least 3-field precision. This new data type has the potential to expand the GIAB high confidence regions and “truth” benchmark sets to many previously difficult-to-map genes and allow a single sequencing protocol to address both short variants and large structural variants.
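Two of the figures quoted above follow standard definitions that are easy to check by hand. A Phred quality value Q corresponds to a per-base accuracy of 1 - 10^(-Q/10), which is why Q20 is glossed as 99%. An F-score is the harmonic mean of precision and recall. The sketch below illustrates both; the function names are illustrative, and the precision and recall inputs are hypothetical placeholders, since the abstract reports only the combined F-scores.

```python
import math


def phred_to_accuracy(q):
    """Convert a Phred quality value Q to per-base accuracy, 1 - 10^(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy):
    """Inverse conversion: per-base accuracy (below 1.0) to a Phred quality value."""
    return -10.0 * math.log10(1.0 - accuracy)


def f_score(precision, recall):
    """F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


print(f"Q20 -> accuracy {phred_to_accuracy(20):.4f}")
# Hypothetical precision and recall, chosen only to exercise the function;
# they are not the values behind the F-scores reported in the abstract.
print(f"F-score for precision 0.90, recall 0.80: {f_score(0.90, 0.80):.4f}")
```

Because the F-score is a harmonic mean, it is pulled toward the weaker of precision and recall, so a low value can reflect a shortfall in either one.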

February 5, 2021

AGBT 2015 Highlights: Customer interviews day 1

PacBio customers discuss their applications of PacBio SMRT Sequencing and long reads, including Lemuel Racacho (Children’s Hospital of Eastern Ontario Research Institute), Matthew Blow (JGI), Yuta Suzuki (U. of Tokyo),…
