February 24, 2013  |  Agrigenomics

AGBT Day 3 Highlights: Long-Read Sequence Data Makes a Difference for Human, Crop Studies

The third day of sessions has wrapped up at AGBT, and
we’ve got one day left to go. It’s the last leg of the marathon! We’ve been
having a great time here, enjoying the opportunity to meet with old friends and
make new acquaintances as well. Today’s talks included two of particular
interest to us: one from Eric Schadt and another from Mike Schatz.

Eric Schadt, founder and director of the Icahn Institute
for Genomics and Multiscale Biology at Mount Sinai School of Medicine in New
York, gave a talk during the afternoon session entitled “Whole Human Genome
SMRT® Sequencing Reveals Uncharacterized Structural Variations
Providing a Path to More Informed Diagnostic Testing.” In it, he illustrated
the use of long-read PacBio sequence data to study repeat expansion regions in
the human genome.

Schadt said that his mandate at Mount Sinai is to ensure
that when a patient walks through the door, the clinical and research teams are
able to make the most of the digital universe of information to offer better
treatment for that patient. For Schadt, who is opening a CLIA-certified
sequencing lab, this means using an integrative omics approach, building
predictive models of disease conditions, and matching key driver genes against
existing drugs to identify better treatments, as a starting point. In the
process of pulling together as many layers of information as possible, Schadt’s
team found that long-read sequence data from the PacBio RS added a critical dimension that wasn’t available otherwise, especially
when it came to longer-range information, extreme GC content regions in the
human genome, and highly repetitive regions. As a proof of principle, the team
generated SMRT Sequencing data for a well-studied CEPH individual and looked at
genes often queried for carrier screening with a specific focus on repeat
expansions that are longer than reads generated by short-read sequencers. With
a mean read length of 4,000 bases, Schadt and his colleagues were able to resolve an
impressive 84% of the ~10,000 repeat structures they were looking for, because
they had sequence reads that fully spanned these regions.
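
To make "fully spanned" concrete, here is a minimal Python sketch. It assumes that repeat regions and aligned reads are both available as (chromosome, start, end) intervals. The function name, the interval representation, and the sample coordinates are hypothetical illustrations, not Schadt's actual analysis or data.

```python
def fraction_spanned(repeats, reads, flank=0):
    """Return the fraction of repeat regions covered end to end by a single read.

    repeats, reads: lists of (chrom, start, end) tuples (hypothetical format).
    flank: extra bases of unique sequence required on each side of the repeat.
    """
    if not repeats:
        return 0.0
    spanned = 0
    for chrom, r_start, r_end in repeats:
        # A read spans the repeat if it starts at or before the repeat
        # (minus flank) and ends at or after it (plus flank).
        if any(
            c == chrom and s <= r_start - flank and e >= r_end + flank
            for c, s, e in reads
        ):
            spanned += 1
    return spanned / len(repeats)


# Made-up coordinates for illustration only.
repeats = [("chr1", 100, 600), ("chr1", 5000, 12000), ("chr2", 50, 250)]
reads = [("chr1", 0, 4000), ("chr1", 4500, 8500), ("chr2", 0, 300)]
print(fraction_spanned(repeats, reads))
```

In this toy input the 7,000-base repeat on chr1 is longer than any read, so no read spans it and it stays unresolved. That is the situation longer reads are meant to avoid.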

In a talk during the evening session, Mike Schatz, an
assistant professor at Cold Spring Harbor Laboratory, spoke about “Assembling
Crop Genomes with Single Molecule Sequencing.” Crops are important to sequence
— 15 crops represent 90% of the world’s food, Schatz said — but are notoriously
difficult to study because of their large genome size, high repeat content, and
higher ploidy. Along with Sergey Koren and Adam Phillippy, he has built a
pipeline to create hybrid genome assemblies using PacBio long reads combined
with shorter-read sequence — either CCS reads from PacBio or data from another
sequencing platform. In an example he offered of a rice strain, an attempted
genome assembly using just Illumina reads yielded a contig N50 of 16 kb, but
adding PacBio long reads boosted the contig N50 to 25 kb. Ultimately,
Schatz said, he expects that as PacBio’s read length improves, this kind of
approach could routinely generate megabase-size contigs or even pull plant
chromosomes into single contigs.
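
For readers unfamiliar with the N50 statistic quoted above, here is a minimal Python sketch using the common definition: the contig length at which contigs of that size or larger hold at least half of the assembled bases. The function name and the contig lengths are hypothetical and are not the rice assembly data from the talk.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (in bases).

    Contigs are sorted from longest to shortest and summed until the
    running total reaches half of all assembled bases. The length of
    the contig that crosses that mark is the N50.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths for illustration only.
example = [40_000, 25_000, 16_000, 9_000, 6_000, 4_000]
print(n50(example))
```

Because N50 is weighted by bases rather than by contig count, merging a few mid-sized contigs into longer ones can raise it noticeably even when many short contigs remain.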

For more information on Mike Schatz’s work using SMRT
Sequencing, check out this case study describing an automated pipeline for genome finishing with
PacBio long reads.
