Today’s genome sequencing is nothing short of transformative. But the technology’s true potential isn’t unlocked by chemistry alone; it’s the software that brings it to life. If sequencing is the engine of insight, then software and bioinformatics serve as the navigation system, analyzing the data stream and charting the course toward meaningful results. These tools find signal in the noise and turn potential into meaning. This matters most in long-read sequencing, where both data richness and complexity run high and software acts as the intelligence layer that makes discovery possible.
This guide breaks down the full workflow of long-read data analysis: from primary and secondary steps that process raw data from instruments and organize reads, to the tertiary tools that interpret variant calls in biological context. Along the way, you’ll discover how PacBio solutions, from SMRT Link to analysis pipelines, make this complex process intuitive, accessible, and actionable.
Genome sequencing software
Preparing your runs
Before a single base is called, genome sequencing relies on careful software-guided preparation. Upstream of data generation, analysis software plays a critical role in shaping the success of a sequencing experiment. These early steps include designing the run, setting parameters for sample loading and barcoding, monitoring progress in real time, and assessing quality metrics as the instrument collects data. When these processes are well-supported, researchers can avoid wasted runs, detect issues early, and make informed adjustments that ultimately improve downstream results.
For PacBio systems, these pre-sequencing and real-time functions are managed through SMRT Link, the sequencing software command center. SMRT Link enables researchers to plan and launch sequencing experiments with tools for run setup, sample management, and barcode assignment. During the run, users can track progress live, view instrument health metrics, and check sequencing quality as data comes in. SMRT Link visualization dashboards and summary reports help ensure each run is on target and performing within expectations.
SMRT Link serves as the operational core of the PacBio software ecosystem. Whether run locally or accessed via the cloud, it provides a consistent environment for managing runs, monitoring progress, troubleshooting issues, assessing quality metrics, and executing analytical pipelines. SMRT Link software includes a user-friendly web interface, giving users flexibility to support real-time decision-making.
Far from a cursory step, this planning and monitoring phase has a significant impact. A well-prepared run not only maximizes the use of each SMRT Cell but also improves the consistency and interpretability of results. Because downstream analysis is only as good as the input data, investing in good run setup and real-time oversight helps protect every subsequent step of the sequencing process.
Primary analysis
Primary analysis begins immediately upon sequencing and focuses on translating raw signal data into nucleotide sequences. In PacBio HiFi sequencing platforms like the Revio and Vega systems, these signals are generated by fluorescence and processed through sophisticated algorithms to produce accurate base calls. This typically occurs on-instrument, ensuring that base-level accuracy is resolved as part of the sequencing run itself. The result is a collection of long, highly accurate reads, which form the foundation for downstream analysis.
Unlike older sequencing technologies that require extensive error correction, HiFi reads do not require additional error correction by the user. This simplifies the early stages of analysis and reduces the need for computational polishing, allowing researchers to move quickly from raw output to actionable data. That said, quality filtering, performance checks, and data normalization are still important steps in ensuring consistency and reliability before transitioning to secondary analysis.
In a typical workflow, these reads are accompanied by confidence scores and metadata that allow researchers to assess base quality, identify anomalies, and evaluate overall run success. These metrics serve as the first checkpoint in a sequencing experiment and ensure that only the best possible data advances to the next step. This level of initial quality control helps avoid downstream errors and facilitates high-integrity research.
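To make the idea of confidence scores concrete, here is a minimal sketch of how per-base Phred qualities can be turned into a read-level quality check. It assumes FASTQ-style Phred+33 encoding; the function names and the `min_q` threshold of 20 are illustrative choices, not settings taken from any PacBio tool.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score Q into an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def read_quality(qual_string: str, offset: int = 33) -> float:
    """Read-level Phred score implied by the mean per-base error probability.

    Assumes a FASTQ quality string with Phred+33 encoding (an assumption
    of this sketch, not a statement about any particular file).
    """
    if not qual_string:
        raise ValueError("empty quality string")
    probs = [phred_to_error_prob(ord(c) - offset) for c in qual_string]
    mean_error = sum(probs) / len(probs)
    return -10 * math.log10(mean_error)


def passes_filter(qual_string: str, min_q: float = 20.0) -> bool:
    """Keep a read only if its read-level quality meets the chosen threshold."""
    return read_quality(qual_string) >= min_q
```

Averaging error probabilities, rather than averaging the Phred scores themselves, keeps a few low-quality bases from being hidden by many high-quality ones.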
For HiFi sequencing, additional post-primary analysis steps are also performed on-instrument before the reads even leave the system. These include demultiplexing of barcoded samples and automatic detection of methylation patterns directly from sequencing data without the need for bisulfite conversion or special library prep. Compared with traditional short-read methylation workflows, this native detection removes extra steps, bringing additional value to each sequencing run while preserving the original sample integrity.
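As a rough illustration of what demultiplexing means, the sketch below assigns each read to the sample whose barcode best matches the start of the read. It is a deliberately simplified toy: the function names, the prefix-only matching, and the `max_mismatches` tolerance are assumptions made for illustration, and on-instrument demultiplexing uses more robust scoring than this.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads: dict, barcodes: dict, max_mismatches: int = 1) -> dict:
    """Map each read name to a sample name, or None if no confident match.

    reads:    {read_name: sequence}
    barcodes: {sample_name: barcode_sequence}
    """
    assignments = {}
    for name, seq in reads.items():
        best_sample, best_dist, tie = None, None, False
        for sample, barcode in barcodes.items():
            prefix = seq[:len(barcode)]
            if len(prefix) < len(barcode):
                continue  # read is shorter than the barcode; cannot compare
            dist = hamming(prefix, barcode)
            if best_dist is None or dist < best_dist:
                best_sample, best_dist, tie = sample, dist, False
            elif dist == best_dist:
                tie = True  # two barcodes fit equally well: ambiguous
        confident = best_dist is not None and best_dist <= max_mismatches and not tie
        assignments[name] = best_sample if confident else None
    return assignments
```

Reads that match no barcode within the tolerance, or that match two barcodes equally well, are left unassigned rather than guessed.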
Secondary analysis
Once high-quality reads are in hand, secondary analysis organizes and analyzes them to prepare for biological interpretation. This step typically includes genome assembly or read alignment, variant detection, and optional polishing or phasing of the results.
For genomes that have no reference sequences available, reads must be pieced together from scratch through de novo assembly. Tools like hifiasm, which are tailored to long-read input, produce highly contiguous assemblies even in repeat-rich regions. As a “ground truth” representation of an organism’s genome, these assemblies provide the foundation for novel annotation and comparative analysis.
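Contiguity is commonly summarized with the N50 statistic: the contig length at which contigs of that size or longer contain at least half of the assembled bases. The sketch below computes it from a list of contig lengths; the function name is illustrative, and N50 is a widely used convention rather than a metric named in this article.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")
```

For example, `n50([100, 80, 50, 30, 20, 10, 10])` returns 80, because the two longest contigs (180 bases) already cover half of the 300 total bases. A higher N50 generally indicates a less fragmented assembly, although it says nothing about correctness on its own.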
Alternatively, where reference genomes are available, reads are mapped to reference sequences through read alignment, which is important for identifying key differences between a sequenced individual and the reference genome. The PacBio pbmm2 tool, a wrapper around minimap2 optimized for HiFi reads, enables precise and fast alignment to known genomic coordinates for accurate variant detection.
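Aligners typically report each mapping in SAM/BAM format, where a start position and a CIGAR string describe how the read lines up against the reference. As a small sketch of how those genomic coordinates are derived, the function below computes the reference interval covered by an alignment. The function name is hypothetical; the operator rules follow the SAM specification.

```python
import re

CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = frozenset("MDN=X")    # operations that advance along the reference
QUERY_CONSUMING = frozenset("MIS=X")  # operations that advance along the read


def alignment_span(pos: int, cigar: str):
    """Return (ref_start, ref_end, query_length) for one alignment.

    pos is the 1-based leftmost mapping position, as in SAM;
    ref_end is inclusive.
    """
    tokens = CIGAR_TOKEN.findall(cigar)
    if not tokens or "".join(n + op for n, op in tokens) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    ref_len = sum(int(n) for n, op in tokens if op in REF_CONSUMING)
    query_len = sum(int(n) for n, op in tokens if op in QUERY_CONSUMING)
    return pos, pos + ref_len - 1, query_len
```

For a read mapped at position 100 with CIGAR `5S10M2I3D20M`, this returns `(100, 132, 37)`: the soft clip and insertion add to the read length only, while the deletion adds to the reference span only.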
Variant detection is the next major step, and in the context of PacBio workflows, this often means running the WGS Variant Pipeline, a consolidated, comprehensive workflow for detecting and annotating genomic variation. The pipeline includes tools such as DeepVariant for detecting SNVs and small indels, pbsv for structural variants, TRGT for tandem repeats, Paraphase for duplicated gene genotyping, HiFiCNV for copy number variation, and HiPhase for phasing. Together, they cover the full spectrum of variation types. The pipeline is accessible to users of all experience levels and can be run through bioinformatics platforms such as DNAnexus, FormBio, Terra, and DNAstack. These platforms have been vetted by PacBio and offer a streamlined interface that supports secure, reproducible, and collaborative analysis.
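To show how these variant classes differ at the data level, the sketch below sorts a VCF-style REF/ALT allele pair into a broad category by size. It is an illustration only: the function name is hypothetical, and the 50 bp cutoff between small indels and structural variants is a common convention assumed here, not a threshold taken from the pipeline.

```python
def classify_variant(ref: str, alt: str, sv_min_len: int = 50) -> str:
    """Assign a broad variant class from VCF-style REF and ALT alleles."""
    # Symbolic alleles (e.g. <DEL>) and breakend notation describe
    # structural events without spelling out the sequence.
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "structural (symbolic)"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    size = abs(len(ref) - len(alt))
    if size == 0:
        return "MNV"  # same-length, multi-base substitution
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    return f"structural {kind}" if size >= sv_min_len else f"small {kind}"
```

Under these assumptions, `classify_variant("A", "G")` yields `"SNV"` and `classify_variant("A", "ATT")` yields `"small insertion"`. Tandem repeats, copy number changes, and paralog genotypes need the dedicated callers listed above and are not captured by a size rule like this one.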
One powerful example of impactful analysis pipelines is the HiFi somatic WDL, a variant calling workflow built for paired tumor-normal samples used in cancer research. This pipeline leverages the accuracy and read length of HiFi sequencing to detect low-frequency SNVs, structural variants, CNVs, and methylation in cancer genomes. Experimental data on the COLO829 melanoma and HCC1395 breast cancer cell lines show that this workflow not only detects key driver mutations, but also provides phasing and epigenetic insights in a single assay. It demonstrates how analytical precision can amplify biological relevance.
While the examples described here refer primarily to DNA-based applications, PacBio also has a suite of secondary analysis solutions for other applications, such as full-length isoform sequencing with the Iso-Seq method and microbial applications, including metagenome assembly and profiling and full-length 16S rRNA sequencing.
Bioinformatics for interpretation
Tertiary analysis is not just the final step in the sequencing data journey; it’s the stage where sequences gain context, and data transforms into insight with functional or clinical relevance.
This stage involves integrating annotations, functional predictions, population data, and phenotypic context to assess the relevance of variants and assemblies. Functional annotation is typically the first step, assigning known gene features to the sequences and variants. Reference databases like GENCODE, RefSeq, and ClinVar provide essential context to understand whether a mutation disrupts a coding region or lies in a regulatory element. Predictive algorithms can then estimate the likely impact of variants on protein function or gene expression.
Variant prioritization further refines the results, ranking mutations based on known pathogenicity, population frequency, or disease association. In cancer studies or inherited disease research, this prioritization helps zero in on actionable insights. Interpretation may also involve visual inspection in genome browsers like IGV, comparative analysis across multiple samples, or validation through additional experimental techniques.
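A minimal sketch of this kind of prioritization is shown below: rare variants are kept and then ordered by clinical significance, with population frequency as a tiebreaker. The `AnnotatedVariant` fields, the significance labels, and the 1% allele-frequency cutoff are all hypothetical choices for illustration; real annotation sources use their own schemas and thresholds.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedVariant:
    name: str
    clinical_significance: str  # hypothetical label, e.g. "pathogenic"
    population_af: float        # allele frequency in a reference population


# Lower rank means higher priority; unknown labels are treated as "uncertain".
SIGNIFICANCE_RANK = {
    "pathogenic": 0,
    "likely_pathogenic": 1,
    "uncertain": 2,
    "likely_benign": 3,
    "benign": 4,
}


def prioritize(variants, max_af: float = 0.01):
    """Drop common variants, then rank by significance and rarity."""
    rare = [v for v in variants if v.population_af <= max_af]
    return sorted(
        rare,
        key=lambda v: (
            SIGNIFICANCE_RANK.get(v.clinical_significance, 2),
            v.population_af,
        ),
    )
```

Filtering on frequency before ranking reflects the usual reasoning that a variant common in the general population is unlikely to explain a rare disease, whatever its predicted effect.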
For more advanced applications, researchers often conduct comparative analyses across conditions or individuals. This could involve detecting differentially methylated regions, identifying population-specific variants, or correlating genotype with phenotype.
To support trusted and meaningful interpretation, the PacBio Compatible program connects PacBio-generated data with best-in-class bioinformatics solutions across the industry. This includes partnerships with leading platforms, tool developers, and cloud providers, ensuring that no matter your project scope or research focus, you have access to reliable, validated pipelines and visualizations. It also fosters a community of support, where shared standards and documentation help simplify integration and reproducibility across tools and teams.
PacBio bioinformatics tools
PacBio provides a comprehensive toolkit built around SMRT Link and its complementary analysis workflows and tools. SMRT Link enables users to manage sequencing runs, track performance, and execute modular pipelines for demultiplexing, alignment, variant detection, phasing, and methylation calling, all in one place.
But SMRT Link Cloud takes this accessibility to another level. As a fully hosted version maintained by PacBio, it eliminates the need for servers, software installs, or IT troubleshooting. Users can log in and monitor runs from anywhere with an internet connection. With seamless integration with PacBio Compatible partners and cloud storage environments like AWS, Azure, and Google Cloud, SMRT Link Cloud opens the door to scalable, collaborative long-read analysis, whether you’re a solo researcher or a nationwide consortium.
Whether you’re analyzing a single microbial genome or 10,000 human samples, SMRT Link offers the infrastructure, the tools, and the flexibility to get the job done, from wherever you may be.
Turning long-read sequencing data into discovery
Genome sequencing doesn’t stop at the sequencer. It continues through layers of digital analysis that transform base calls into biological understanding. In long-read sequencing, where accuracy and context are paramount, this software ecosystem must be powerful, precise, and flexible.
PacBio meets that need with tools that span every analytical stage. These tools are the connective tissue of a data-driven research ecosystem, allowing scientists to move from questions to answers with exceptional speed and accuracy.