Basecamp Research’s recent selection of PacBio HiFi sequencing to support its Trillion Gene Atlas initiative highlights an important shift in biological AI: the future will not be shaped by model scale alone, but by the quality, completeness, and context of the underlying data.
As AI becomes an increasingly important tool in drug discovery and biological design, one challenge is coming into sharper focus: models are only as powerful as the biology they can see. For many applications, this means moving beyond fragmented views of sequence space and toward richer, more representative datasets that capture how genes are organized and function in nature.
This is why combining metagenomics with HiFi sequencing is so compelling for AI-driven discovery.
AI needs better biological data
In AI, better inputs lead to better outputs. Biology is no exception.
Many of today’s biological datasets are limited by bias, incompleteness, or lack of genomic context. Reference databases represent only a fraction of the diversity found in the natural world, and conventional sequencing approaches can make it difficult to reconstruct complete genomes or preserve the broader genomic context in which genes and other elements are organized.
For teams building foundation models for biology, those gaps matter. Sequence alone is valuable, but sequence with context can be transformative.
Basecamp Research’s Trillion Gene Atlas is built around that idea. This initiative aims to dramatically expand access to evolutionary genetic diversity by generating biological data at extraordinary scale. The goal is not simply to collect more sequences, but rather to build a richer map of biology that AI can learn from.
To build this dataset, Basecamp Research is working with PacBio to generate large-scale environmental and host-associated metagenomic data using HiFi sequencing on the Revio system. The project is expected to include roughly 100,000 deeply sequenced samples from more than 30 countries, creating a highly diverse dataset designed to support next-generation biological AI models. This kind of dataset allows AI models to train on biology in a form that more accurately reflects real-world complexity.
Why HiFi metagenomics matters
Metagenomics opens a direct window into the complexity of life in the environment, from soil and oceans to host-associated microbiomes. But capturing that complexity in a form that is useful for AI requires more than throughput. It requires data that is both highly accurate and sufficiently complete to preserve biological context.
PacBio HiFi sequencing is uniquely well suited to this challenge.
By combining long reads with high accuracy, HiFi sequencing enables researchers to assemble more complete genomes directly from complex samples and to resolve variation at the strain level. It can also improve reconstruction of features that are often difficult to recover with shorter or noisier reads, such as repetitive regions, structural variation, mobile genetic elements, plasmids, phage genomes, and operons.
In metagenomics, these capabilities are especially important. Environmental and host-associated samples are inherently complex, containing mixtures of organisms with diverse abundances and genomic architectures. When reads are both long and accurate, researchers can recover a more faithful representation of that complexity and generate data that is more biologically meaningful for downstream modeling.
From raw data to discovery
As biological foundation models mature, one of the biggest opportunities is connecting large-scale sequencing data to practical applications in therapeutic and drug development. But to reach that potential, AI systems need training data that reflects biology as it exists in the real world.
This is where long-read metagenomics can make all the difference.
There is a growing push to train AI models not on the fragmented genetic snapshots produced by conventional sequencing approaches, but on data that retains the broader structure of genomes and the evolutionary relationships embedded within them. This richer context can help models infer function more effectively, identify novel biological patterns, and support more informed design decisions.
In other words, the value of HiFi metagenomics lies not only in discovering what is present in a sample, but also in making that information usable for the next generation of biological AI.
Scaling the next era of biology
The Trillion Gene Atlas also reflects another major trend: the need for infrastructure to scale alongside the biology.
Generating and interpreting data at this scale requires more than sequencing alone. It depends on integrated workflows that connect sample collection, sequencing, assembly, accelerated computing, and AI. As organizations push toward petabase-scale analysis, the quality of each layer in that stack becomes increasingly important.
The PacBio Revio system with SPRQ-Nx chemistry is designed to support that kind of scale while delivering the long, highly accurate reads needed for high-resolution genomic analysis. In projects where completeness and context matter, this combination can help unlock insights that may otherwise remain hidden.
For biological AI, this matters because scaling up poor-quality data only caps the quality of the output, whereas scaling up high-quality, context-rich data creates a stronger foundation for discovery.
A new chapter for therapeutic innovation
The collaboration between PacBio and Basecamp Research underscores a broader industry shift. AI-designed therapeutics will not be built on algorithms alone. Instead, they will be built on more complete maps of biology that better represent natural diversity and the way genomes are organized and have evolved.
HiFi sequencing has an important role to play in that future.
By enabling more complete, accurate, and context-preserving metagenomic data, PacBio is helping researchers move from fragmented sequence collections toward a deeper understanding of biological systems. That, in turn, can help power AI models designed to uncover new functions, identify promising candidates, and accelerate the path from discovery to application.
To see more examples of how HiFi sequencing is helping uncover new insights, visit our microbial genomics page or our microbiome + metagenomics page, and explore HiFi data yourself on our datasets page.