An article published today in Genetics in Medicine from Jason Merker, Euan Ashley, and colleagues at Stanford University reports the first successful application of PacBio whole genome sequencing to identify a disease-causing mutation. (Check out Stanford’s news release here.) The authors describe an individual who presented over 20 years with a series of benign tumors in his heart and glands. The individual satisfied the clinical criteria for Carney complex, but after eight years of genetic evaluation, including whole genome short-read sequencing, experts were still unable to pinpoint the underlying genetic mutation and confirm a diagnosis.
Ultimately, the authors turned to the Sequel System to evaluate structural variants, large genetic differences that involve at least 50 base pairs and are often discoverable only with long-read sequencing. This quickly led to the identification of the causative mutation: a 2.2 kb deletion that affects PRKAR1A, the gene involved in Carney complex. This case demonstrates the ability of long-read sequencing on the Sequel System to reveal genetic variation that is inaccessible with short-read technologies and highlights the potential to apply PacBio sequencing to precision medicine.
A human genome has around 20,000 structural variants (differences ≥50 bp) spanning 10 Mb, more base pairs than single nucleotide variants and small indels put together. Because structural variants tend to lie in repetitive regions of the genome and/or are larger than short-read sequencers can span, the vast majority (80%) are identified only by long-read sequencing. This means even so-called “whole” genome sequencing with short reads misses much of the variation in a human genome. 
Figure 1. Structural variation in the human genome. (a) Types of structural variation. (b) Differences between two typical human genomes. (c) Structural variants detected in a typical human genome with PacBio sequencing compared to short-read sequencing.
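The 50 bp size threshold described above can be sketched in a few lines of Python. This is only an illustration: the function name and category labels are hypothetical, and it considers only insertions and deletions expressed as reference and alternate alleles. Other structural variant types, such as inversions and translocations, are not captured by a simple length difference.

```python
SV_MIN_BP = 50  # size threshold for structural variants quoted in the post


def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant by the length difference between its alleles.

    Hypothetical helper for illustration; handles substitutions,
    insertions, and deletions only.
    """
    if len(ref) == len(alt):
        # Same length: a single-base change or a longer substitution.
        return "SNV" if len(ref) == 1 else "multi-nucleotide substitution"
    size = abs(len(alt) - len(ref))
    if size >= SV_MIN_BP:
        return "structural variant"
    return "small indel"


# A 2.2 kb deletion, written as an anchor base plus 2,200 deleted bases.
print(classify_variant("A" * 2201, "A"))
```

Under this sketch, the 2.2 kb PRKAR1A deletion falls well above the threshold, while a 49 bp insertion would still count as a small indel.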
Carney complex, a multiple neoplasia syndrome, is exceedingly rare, with fewer than 750 cases ever reported. Most individuals with the syndrome have a mutation that inactivates one of the two copies of the gene PRKAR1A. However, in the case reported today, clinical sequencing of PRKAR1A did not reveal any mutations. Then, short-read whole genome sequencing was applied to look for mutations throughout the genome, but it was uninformative. Ashley, Merker and colleagues were then driven to apply PacBio long-read sequencing to evaluate structural variants missed by previous methods.
The Sequel System was used to generate approximately eight-fold coverage of the human genome. Reads were mapped with NGM-LR, and structural variants were called with PBHoney, yielding 6,971 deletions and 6,821 insertions. These were filtered for rare, genic variants associated with disease genes, which left only six candidates for manual evaluation. One of the six variants was a heterozygous 2.2 kb deletion that removes the first coding exon of PRKAR1A. The variant was evaluated with Sanger sequencing in the individual and his parents, which demonstrated that the deletion is a de novo mutation not present in the parents.
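The filtering step described above (rare, genic, and associated with a disease gene) might look roughly like the following Python sketch. It is not the authors' pipeline and does not reflect PBHoney's output format: the record fields, the 1% frequency cutoff, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class StructuralVariant:
    kind: str                     # "DEL" or "INS"
    length_bp: int
    gene: Optional[str]           # overlapping gene symbol, if any (hypothetical annotation)
    population_frequency: float   # hypothetical frequency in a control set


def candidate_variants(
    variants: Iterable[StructuralVariant],
    disease_genes: Set[str],
    max_frequency: float = 0.01,  # assumed cutoff for "rare"
) -> List[StructuralVariant]:
    """Keep variants that are rare, overlap a gene, and hit a known disease gene."""
    return [
        v
        for v in variants
        if v.population_frequency <= max_frequency
        and v.gene is not None
        and v.gene in disease_genes
    ]


# Made-up example records; only the first passes all three filters.
calls = [
    StructuralVariant("DEL", 2200, "PRKAR1A", 0.0),
    StructuralVariant("INS", 310, None, 0.0),
    StructuralVariant("DEL", 120, "PRKAR1A", 0.35),
]
print(candidate_variants(calls, {"PRKAR1A"}))
```

The point of the sketch is the shape of the funnel: thousands of calls go in, and a handful of candidates small enough for manual review come out.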
Approximately two-thirds of individuals with presumed genetic disorders remain undiagnosed even after short-read exome and whole genome sequencing. It is hypothesized that many of the undiagnosed cases are explained by variants missed by short-read sequencing technologies, most notably structural variants, variants in GC-rich regions of the genome, and repeat expansions. The study published today provides a proof-of-principle demonstrating that PacBio long-read sequencing identifies previously overlooked structural variants, even at relatively low sequencing coverage. We are excited by upcoming studies that will evaluate many more cases to elucidate the improvement in diagnostic yield from long-read sequencing, and to demonstrate that precision medicine requires a comprehensive view of genetic variation.
 Merker JD, et al. (2017). Genetics in Medicine.
 Huddleston J, et al. (2017). Genome Research, 27(5):677-685.
 English AC, et al. (2015). BMC Genomics, 16:286.
 Biesecker LG, et al. (2011). Genome Biology, 12(9):128.
Last month, we co-hosted the 2nd annual SMRT Leiden conference with Leiden University Medical Center. SMRT Leiden featured three days of excellent presentations, including one day focused on bioinformatics. If you missed it, we’ve prepared this quick recap to cover the highlights. In addition, several of the presentations are available to download, and you can check out tweets from day 1 and day 2.
The meeting kicked off with a clinical angle: Eric Schadt from the Icahn School of Medicine at Mount Sinai gave a keynote talk about capturing the clinically actionable genome. Noting that we are in an age of data explosion, Schadt presented ideas for how to take advantage of that to improve human health — and ultimately to model individual health trajectories for optimal decision-making in the clinic. At Mount Sinai, Schadt said genetic testing is becoming more comprehensive, citing examples like a pan-ethnic carrier screen and pregnancy-related testing that starts before conception and follows the infant after birth. SMRT Sequencing is important for these efforts because of its excellent accuracy and long reads, which enable phasing variants and resolving complex regions. By combining technologies, Schadt said his team improved carrier screening to deliver meaningful results to more than 60% of patients, compared to fewer than 7% with traditional testing. Schadt’s colleague Robert Sebra also gave a clinical talk, in which he said that the ideal approach will be whole genome sequencing with long reads to capture challenging genes, pseudogenes, and other important but complex elements. While that is not yet practical, he noted that previous efforts in the lab to sequence whole human genomes took a year and 1,000 SMRT Cells on the PacBio RS II; with the Sequel System, that now takes 50 SMRT Cells and can be completed in two weeks.
Two keynote presentations focused on genome evolution. Shinichi Morishita from the University of Tokyo spoke about bacterial metagenomics, for which PacBio sequencing improved the detection rate for mobile elements and methylation motifs. He also works on centromeres, for which he uses PacBio sequencing with the Hi-C method. Jason Underwood from the University of Washington presented the use of long reads to compare apes and humans in order to find elements specific to humans. His team is using SMRT Sequencing to generate high-quality primate genomes, such as the recent Susie3 assembly, and to annotate them. These projects have improved structural variation detection and increased discovery of human-specific events. Underwood said high-quality PacBio assemblies would be available in the next year or two for gibbon, bonobo, and rhesus macaque.
The Max Planck Institute’s Stefan Mundlos kicked off the afternoon with a keynote about using topologically associated domains, CRISPR, and other approaches to elucidate skeletal disease. Following that, several presentations focused on the use of SMRT Sequencing to resolve challenging regions in the human genome. Adam Ameur from Uppsala University is using PacBio sequencing for targeted and whole-genome methods to resolve repeats, low frequency mutations, and more. As part of the Swedish 1000 Genomes Project, his team has sequenced two whole genomes with SMRT Sequencing so far, finding about 20,000 structural variants in each one — 80% of which were missed by short-read sequencing. From NUI Galway, Brian McStay presented on the genomic architecture of regions on human acrocentric chromosomes. These regions are difficult to sequence due to repetitive DNA, but he was able to target and sequence them successfully with NimbleGen capture and SMRT Sequencing. Our own Tyson Clark spoke about using amplification-free targeted enrichment for analyzing genomic regions associated with repeat expansion disorders.
A number of great talks focused on plants, animals, and microbes. Felix Bemm from MPI Tübingen focused on Arabidopsis, in which structural variation was being missed with short-read sequencers. By incorporating PacBio sequencing, his team was able to explore NLR complexity; they also produced 10 platinum-grade genomes for a deep dive into structural variants. The University of Rochester’s Amanda Larracuente is studying Y chromosome dynamics in Drosophila. By adding SMRT Sequencing data to their pipeline, her team improved coverage for elusive Y genes and now have as much as 40% of the Y chromosome in contigs. Wasp parasites captured our attention in a talk from Ken Kraaijeveld at VU Amsterdam. He studied asexual and sexually reproducing parasites to understand the differences in mutation accumulation in their genomes, finding that transposable elements may play a role in reduced recombination.
From the University of Oslo, Ave Tooming-Klunderud spoke about targeted sequence capture in a cod study. Focusing on a 300 kb region of hemoglobin genes, the team analyzed eight species and optimized the sample prep protocol with barcoding, which resulted in using just nine SMRT Cells. Richard Kuo from the University of Edinburgh presented data from using the Iso-Seq method to understand chicken transcriptomes; the approach improved detection of lncRNAs, transcripts that were missed in previous annotations, and splicing diversity. Finally, Thomas Otto from the Wellcome Trust Sanger Institute gave a keynote talk about long-read sequencing of parasite genomes, with a focus on Plasmodium falciparum. Otto noted that the first assembly for this genome cost $18 million (that was back in 2002), and today on the PacBio RS II System it only takes five SMRT Cells. Because the genome has only 19% GC content, SMRT Sequencing is more successful at calling intergenic regions that can’t be mapped using short-read data.
We really enjoyed two talks about immune-related genes. Marvyn Koning from LUMC, our host, spoke about B cells and the adaptive immune system. Sequencing has been difficult because of the high mutation rate across many locations, but Koning developed a method called ARTISAN PCR to anchor primers in one region that didn’t change. With PacBio sequencing, the approach yields much higher accuracy than short-read sequencing. Julie Karl from the University of Wisconsin-Madison talked about sequencing the complex MHC region in macaques. For this work, SMRT Sequencing has been essential to achieve the accuracy needed for a genomic region that’s even more complex than the human MHC locus.
We were treated to some proteogenomic talks as well. In a keynote presentation, Gloria Sheynkman from the Dana-Farber Cancer Institute spoke about approaches to understand the complexity of splice diversity and the proteins they produce. One method is ORF-seq, which measures the isoforms in various functional groups and relies on SMRT Sequencing to characterize the isoforms. And NKI’s Gosia Komor presented a proteogenomic analysis of alternative splicing for a colorectal cancer biomarker study. With the Iso-Seq method, the team is building up the reference set of isoforms to find those associated with cancer risk.
Finally, our own Lance Hepler offered a look at new applications for SMRT Sequencing, including new software for detection of minor variants and structural variants and multiplexed whole genome sequencing for microbes. The new Juliet tool for characterizing minor variant frequency and pbsv for increased structural variant sensitivity will both be included in SMRT Link 5, due to be released this summer. Hepler also noted that with the multiplexing protocol a single SMRT Cell on the Sequel System will be able to sequence up to 12 microbes with genomes of ~4.5 Mb; the protocol works for the PacBio RS II System as well.
We are thankful to all of the fantastic speakers who shared their research, to our gracious host Yahya Anvar and the entire LUMC, and to everyone who attended the event. We look forward to seeing you again next year in Leiden!
A large group of scientists published a new reference genome assembly for maize. It was generated with SMRT Sequencing and other technologies, and represents a major leap forward in accurately portraying and annotating the genome of this important crop.
“Improved maize reference genome with single-molecule technologies” comes from lead author Yinping Jiao, senior author Doreen Ware, and collaborators at Cold Spring Harbor Laboratory, the USDA ARS, and many other institutions. They embarked on the project because the existing reference for maize, based on Sanger technology and released in 2009, “is composed of more than 100,000 small contigs, many of which are arbitrarily ordered and oriented, markedly complicating detailed analysis of individual loci and impeding investigation of intergenic regions crucial to our understanding of phenotypic variation and genome evolution,” the authors explain. A higher-quality assembly would be extremely useful for crop breeding and selection programs as well as basic research.
The new reference is based on PacBio sequencing data, which led to a preliminary assembly with fewer than 3,000 contigs and a contig N50 of 1.2 Mb. Scientists then layered in data from an optical map, a BAC-based minimum tiling path, and a high-density genetic map. The end result: a high-quality 2 Gb assembly with just 2,522 gaps. “The new maize B73 reference genome has 240-fold higher contiguity than the recently published short-read genome assembly of maize cultivar PH207,” they report.
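For readers unfamiliar with the contig N50 statistic quoted above, here is a minimal Python sketch of the usual definition: the length of the contig at which the running total of contigs, sorted longest first, reaches at least half of the total assembly length. The function name and the toy contig lengths are illustrative only and are not taken from the maize assembly.

```python
from typing import Iterable


def contig_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are sorted longest first; N50 is the length of the contig
    at which the cumulative length first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: running total always reaches total")


# Toy example: total length is 290, so half is 145; 80 + 70 = 150 reaches it.
print(contig_n50([80, 70, 50, 40, 30, 20]))  # prints 70
```

A higher N50 means more of the assembly sits in long contigs, which is why the statistic is used as a shorthand for contiguity.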
To assess the new assembly, the team compared it to the previous Sanger-based reference. That “revealed more than 99.9% sequence identity and a 52-fold increase in the mean contig length, with 84% of the BACs spanned by a single contig from the long reads assembly,” the authors write. ChIP-seq analysis showed that centromeres in the new assembly were mostly intact and correctly placed. The new assembly fixed many known mis-oriented regions in the reference genome, and an updated annotation consolidated gene models with the support of 111,000 full-length transcripts from SMRT Sequencing. “Our reference assembly also vastly improved the coverage of regulatory sequences, decreasing the number of genes exhibiting gaps in the 3-kb region(s) flanking coding sequence from 20% to <1%,” the team adds.
The scientists interrogated transposable elements, which are well known and important in the maize genome. The previous maize annotation had few intact representations of these elements; for long terminal repeat retrotransposon copies, not even 1% were complete. The team incorporated “a new homology-independent annotation pipeline” and uncovered 1.2 Gb of intact retrotransposons, about half of which were “nested retrotransposon copies disrupted by the insertion of other transposable elements,” they note. “Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize.” This information will contribute to a better understanding of the diversity and evolution of maize varieties.
In closing, the scientists write, “Our improved assembly of the B73 genome, generated using single-molecule technologies, demonstrates that additional assemblies of other maize inbred lines and similar high-quality assemblies of other repeat-rich and large-genome plants are feasible.”
In a preprint available from bioRxiv, scientists from the University of Lausanne and Swiss Institute of Bioinformatics present the first SMRT Sequencing results from isolates of the fungal pathogen Candida glabrata. “Comparative Genomics Of Two Sequential Candida glabrata Clinical Isolates” comes from Luis Andre Vale-Silva, Emmanuel Beaudoing, Van Du T. Tran, and Dominique Sanglard.
The study involved two C. glabrata samples collected at different times from an HIV-positive patient diagnosed with oropharyngeal candidiasis. Scientists initially turned to short-read sequencing to analyze the genomes, which were of particular interest because C. glabrata is known to rapidly develop resistance to antifungal therapies. However, because sequence data had to be aligned to a reference genome, the assemblies “did not reflect actual genome rearrangements of strains DSY562 and DSY565,” the scientists report. “We therefore undertook an alternative genome sequencing approach using PacBio technologies enabling de novo assembly of large reads.”
The PacBio assemblies featured contig N50s longer than a megabase for each isolate, “highlighting the high quality of the assemblies,” the authors note. “Assembled contigs almost reconstituted the entire set of chromosomes that is known from the [reference] genome.” They generated data for both nuclear and mitochondrial genomes.
The team was eager to learn more about adhesins in these strains, since adherence to host cells is an important factor in increased virulence of C. glabrata. “It is estimated that C. glabrata contains 63 ORF with adhesin properties,” the scientists write, underscoring their interest in genome-wide data.
Based on SMRT Sequencing data, “we determined the presence of more than 100 adhesin-like genes in both DSY strains, which was not yet anticipated from other genome-wide studies,” the authors report. “This number exceeds by far the numbers published for [the reference genome] and therefore suggests that an expansion of this gene family occurred in our isolates.”
Vale-Silva et al. note that further studies will be needed to determine whether this pattern holds up for other strains. “Since no equivalent C. glabrata genome assembly has been yet published using a PacBio approach, we can still not confirm whether or not the investigated isolates constitute a unique case,” they conclude. “Now that a more extensive repertoire of adhesins is available from our studies, such analysis may be undertaken in the future.”
Interested in microbial genomics? Check out our latest SMRT Grant opportunity today.
The Joint Genome Institute recently announced results from a project that used SMRT Sequencing to generate high-quality genome assemblies and detect epigenetic modifications for fungal species that represent the earliest branches of that kingdom’s phylogeny. The work was done as part of the 1000 Fungal Genomes Project, which aims to better characterize a diverse range of fungal species.
Published in Nature Genetics, “Widespread adenine N6-methylation of active genes in fungi” comes from lead author Stephen Mondo, senior author Igor Grigoriev, and collaborators at JGI and other institutions. The major finding is that N6-methyldeoxyadenine (6mA) is seen at the earliest stages of fungal evolution, in groups that have not been studied much in genomics. “By and large, early-diverging fungi are very poorly understood compared to other lineages. However, many of these fungi turn out to be important in a variety of ways,” Mondo said.
The study involved analyzing 16 fungal genomes with SMRT Sequencing, which generates genome-wide epigenetic data while it sequences the DNA. Scientists discovered that 6mA, which is present at low levels in many plant and animal species, was much more common in these early-diverging fungi. As many as 2.8% of adenine bases were methylated, “far exceeding levels observed in other eukaryotes and more derived fungi,” they report in the paper. The previous highest 6mA rate was observed in Chlamydomonas reinhardtii, an alga with 0.4% of its adenines methylated.
The team also found that the presence of 6mA and 5-methylcytosine (5mC) is inversely correlated, and that 6mA appears to boost gene expression while 5mC suppresses it. “Our analysis has shown that 6mA modifications are associated with expressed genes and is preferentially deposited based on gene function and conservation, revealing 6mA as a marker of expression for important functionally-relevant genes,” Grigoriev said.
In the paper, the authors write, “Our results show a striking contrast in the genomic distributions of 6mA and 5-methylcytosine and reinforce a distinct role for 6mA as a gene-expression-associated epigenomic mark in eukaryotes.”
To learn more, watch this presentation from Mondo on 6mA in fungal genomes.
This week the HudsonAlpha Institute for Biotechnology and the University of Georgia are co-hosting CROPS 2017, a meeting focused on genomic technologies and their use in crop improvement and breeding programs. The three-day event attracts over 200 attendees involved in research and breeding for a range of important crop species. PacBio is proud to be a sponsor of the conference.
HudsonAlpha’s Jeremy Schmutz kicked off the meeting with an introductory talk about trends in plant genomics, expanded transcriptome resources, and the improved representation of all plant genomes with many new genome assemblies. Schmutz, who also works with the Joint Genome Institute, highlighted efforts to generate higher-quality, more complete reference genomes for economically important plants used for food, fiber, or biomass. He told attendees that SMRT Sequencing has made a significant difference in those efforts. His team has generated high-quality plant assemblies with the PacBio RS II Sequencing System and more recently with the Sequel System for several cotton genomes as well as Brachypodium, peanut, sorghum, and more.
According to Schmutz, the big difference with SMRT Sequencing is that the assemblies it produces are of high enough quality to be useful for functional studies such as genotype-phenotype associations, which are essential for breeding and selection programs. “We can now generate high-quality reference genomes for most plants,” he said. “PacBio has made that possible.” Schmutz noted that SMRT Sequencing has been successful even for very challenging plant genomes with highly repetitive elements, GC-rich regions, areas of high and low complexity, and of course varying degrees of ploidy.
Schmutz, his colleague Jane Grimwood and their team at the Genome Sequencing Center have made the transition to the higher-capacity Sequel System, which enables comparable results with lower project costs. Schmutz said they are generating, on average, 4 Gb to 8 Gb per SMRT Cell. For a project sequencing the 2.6 Gb Brazilian cotton genome, he said, the preliminary assembly is at least as good as a previous cotton assembly generated on the PacBio RS II — but data collection took just five weeks, compared with almost five months for the older assembly. Even in this preliminary stage, the Sequel System cotton assembly has an impressive contig N50 of 2.2 Mb.
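The per-cell yield figures above translate into a rough SMRT Cell count with simple arithmetic. The Python sketch below uses the 2.6 Gb genome size and the 4 Gb to 8 Gb yield range from the talk, but the 50-fold target coverage is a hypothetical value chosen for illustration; the post does not say what coverage the cotton project used.

```python
def cells_needed(genome_bp: int, target_coverage: int, yield_bp_per_cell: int) -> int:
    """Estimate SMRT Cells required to reach a target fold-coverage.

    Uses integer ceiling division so a partial cell rounds up to a whole one.
    """
    required_bp = genome_bp * target_coverage
    return -(-required_bp // yield_bp_per_cell)


GENOME_BP = 2_600_000_000   # 2.6 Gb cotton genome, from the talk
TARGET_COVERAGE = 50        # assumed target, not stated in the post

# Low and high ends of the reported per-cell yield range.
print(cells_needed(GENOME_BP, TARGET_COVERAGE, 4_000_000_000))  # prints 33
print(cells_needed(GENOME_BP, TARGET_COVERAGE, 8_000_000_000))  # prints 17
```

Doubling the per-cell yield roughly halves the number of cells, which is the main reason higher-capacity instruments lower project costs.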
Schmutz noted that the point of the CROPS meeting isn’t to present new assemblies for their own sake; it’s all about how these resources are being used by the plant community. By including scientists studying many different crop species, he hopes to accelerate the uptake of new genome-based approaches by as many research groups as possible. The meeting covered topics such as breeding and selection strategies, functional work to identify the mechanisms underlying important traits, and automated phenotyping. “We really focus on the translation of genomics directly into crop improvement platforms,” Schmutz said.
A sweeping new report on Klebsiella pneumoniae sequence data from scientists at the Houston Methodist Research Institute, Weill Cornell Medical College, and other institutions found more diversity than expected in strains of the pathogen in a Texas population. The publication also indicates the emergence of a virulent, antibiotic-resistant strain of this organism.
Published in mBio, “Population Genomic Analysis of 1,777 Extended-Spectrum Beta-Lactamase-Producing Klebsiella pneumoniae Isolates, Houston, Texas: Unexpected Abundance of Clonal Group 307” comes from lead author Wesley Long, senior author James Musser, and collaborators.
K. pneumoniae is a dangerous source of infection, often acquired in hospitals and increasingly resistant to antibiotics. Scientists launched this study to contribute new genomic information that might be used to inform new therapeutics. They sequenced nearly 1,800 isolates collected from patients in the Houston Methodist Hospital system over four years, and then selected five key strains for deeper analysis with SMRT Sequencing.
Previous Klebsiella studies in the U.S. had determined that clonal group 258 was dominant in this country. In this project, however, scientists found that this group represented just a quarter of isolates. More than 35% of strains belonged to clonal group 307, with isolates collected in a number of hospitals. The remaining cases represented a number of different strain types. “We discovered that CG307 strains have been abundant in Houston for many years,” the scientists report, noting that this strain is as virulent as pandemic K. pneumoniae strains. “Our results may portend the emergence of an especially successful clonal group of antibiotic-resistant K. pneumoniae.”
The team used SMRT Sequencing to generate reference-grade genome assemblies and annotations for five strains “chosen to represent regions of the phylogenetic tree for which existing reference genomes deposited in [publicly] available databases were lacking,” the authors report. “In addition, genomes containing the blaNDM-1 and OXA-48 genes… were chosen to allow more in-depth analysis of these important strains.”
All five strains were represented in closed genome assemblies, with two to five plasmids for each. Analysis revealed that a reference strain previously collected in Pittsburgh and one of the Houston isolates “are lineally descended from a common ancestor organism,” the scientists write.
Sequencing efforts were followed up with transcriptome analysis and mouse models to produce data that could be relevant for the development of new therapies. The team also used the whole genome data to generate “classifiers that accurately predict clinical antimicrobial resistance for 12 of the 16 antibiotics tested,” they write. “We conclude that analysis of large, comprehensive, population-based strain samples can assist understanding of the molecular diversity of these organisms and contribute to enhanced translational research.”
We can’t wait for ASM Microbe! It’s one of the largest microbiology conferences in the world and is taking place June 1-5 in New Orleans. With more than 10,000 people expected to attend, it’s a key event for connecting with this community, hearing the latest scientific and clinical advances, and introducing microbial experts to the opportunities afforded by long-read SMRT Sequencing.
If you’ll be at the conference, please stop by booth #1328 to meet the PacBio team. We will be happy to provide updates on the newest enhancements coming to SMRT Sequencing in the next few months. One of those involves multiplexing bacterial sequencing to get results from many different organisms in a single SMRT Cell — exciting stuff!
This year’s meeting will feature several talks and posters using PacBio data to help monitor infectious disease, investigate metagenomes, characterize viruses and more. Here is the full list of posters – don’t miss Jessica Sieber, winner of the 2016 SMRT Grant Program, who will be presenting data in her poster on June 3rd at 12:15 pm.
Check out these talks utilizing PacBio Sequencing data:
Friday, June 2
Session 119 – Hot Topics in Invasive Staphylococcus aureus Infections
3:30 – 3:45 PM – Room #267
Comparative Genomics of Colonizing and Infectious S. aureus Isolates from Single Hosts
R. Altman et al. Icahn School of Medicine at Mount Sinai
Session 114 – Clinical Application of Genomics of Relevant Health Care Associated MDR Bacteria
5:00 – 5:15 PM – Room #265
Continuous Surveillance by Whole Genome Sequencing Identifies Genomic Characteristics of Highly Transmissible MRSA
J. Sullivan et al. Icahn School of Medicine at Mount Sinai
Saturday, June 3
Session 274 – Long Reads: Transforming Microbiology
3:15 – 3:45 PM – Room #267
Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing
Julia Oh et al. The Jackson Lab
3:45 – 4:00 PM – Room #267
Population Structure Analysis of Mycoplasma pneumoniae Using Whole Genome Sequencing
H. Diaz et al. Centers for Disease Control and Prevention
4:00 – 4:30 PM – Room #267
The Long and the Short of it: Metagenomic Binning, Assembly, and Discovery with Long Read Sequencing Platforms
Lizzy Wilbanks et al. University of California, Santa Barbara
4:30 – 4:45 PM – Room #267
2000 Genomes and Counting: An Update on NCTC3000, a Type Culture Reference Genome Project
Alexander et al. Public Health England
Sunday, June 4
Session 432 – Marine Microbial Activities and Interactions
3:45 – 4:00 PM – Room #238
Outstanding Abstract Award: Viral Predation Drives the Diversification of Natural Microbial Populations
A. Hussain et al. Massachusetts Institute of Technology
We look forward to seeing you in New Orleans!
The May issue of Genome Research is a special edition focusing on advances in sequencing technologies and genome assembly techniques. The research papers selected for this special issue cover reference-grade genome assemblies, structural variant detection, diploid assemblies, and other features enabled by new high-quality sequencing tools.
The issue kicks off with a perspective from NHGRI’s Adam Phillippy, who reflects on the history of sequencing and assembly. Dusting off publications from as early as 1979, he illustrates the remarkable pace of advances in this field for the past four decades. Phillippy has worked with just about every kind of sequence data, so his view of the current landscape is particularly instructive. “The biggest gains in contig lengths have come from single-molecule sequencing,” he writes. “Critically, 10-kb reads are longer than the most common repeats in both microbial and vertebrate genomes and can therefore generate highly continuous assemblies. In fact, the complete reconstruction of bacterial genomes—a process that used to require teams of people—is now automated and routine.” Phillippy also notes that long-read sequencing assemblies have spurred “a renewed interest in repetitive sequences, which can be properly analyzed for the first time” and are “even revealing new variation in the human genome.”
We are very pleased that more than half of the papers in this special issue feature our sequencing data and genome assemblies derived therefrom, underscoring PacBio’s leading role in long-read sequencing and de novo assembly. We congratulate all the authors for their exciting contributions to this special issue and encourage you to review these excellent publications:
- Discovery and genotyping of structural variation from long-read haploid genome sequence data: Scientists used SMRT Sequencing to scan human genomes for structural variants, finding that more than 89% of those found had been missed in the 1,000 Genomes Project.
- Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly: An exploration of the latest human reference assembly, which expands the number of alternate loci and for the first time includes sequence coverage of centromeres.
Plant and Animal Genomes
- Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data: This project used SMRT Sequencing data to generate genomes of three relatives of the model plant Arabidopsis thaliana, assembling all three genomes into only a few hundred contigs. Integration of optical mapping and chromosome conformation capture techniques yielded chromosome-scale assemblies of these repetitive plant genomes. The scaffolds even revealed some of the heterochromatic regions that are not present in gold standard reference sequences.
- Single-molecule sequencing resolves the detailed structure of complex satellite DNA loci in Drosophila melanogaster: Long-read PacBio sequencing allowed scientists to characterize complex satellite DNA regions, which have been challenging to resolve due to their repetitive nature.
- Combination of short-read, long-read, and optical mapping assemblies reveals large-scale tandem repeat arrays with population genetic implications: This analysis of Eurasian crow genomes found that assembling two high-quality genome references using SMRT sequencing, combined with optical mapping, made it possible to recover missing regions and correct errors in a previous short-read-only assembly.
- An improved assembly and annotation of the allohexaploid wheat genome identifies complete families of agronomic genes and provides genomic evidence for chromosomal translocations: Scientists use SMRT Sequencing of full-length cDNAs for genome annotation of a new wheat genome assembly, identifying protein-coding genes and noncoding RNA genes with high confidence.
New Tools for Long-Read Data
- Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm: Scientists present a new hybrid assembly algorithm to combine short-read and long-read data for optimal accuracy and contiguity.
- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation: Based on Celera Assembler, Canu was designed for long-read data and significantly reduces computational time for genome assembly.
- HINGE: long-read assembly achieves optimal repeat resolution: This assembler focuses on resolving challenging repeats.
- Fast and accurate de novo genome assembly from long uncorrected reads: For long-read assembly, scientists pair Racon with miniasm to rapidly generate high-quality consensus sequences without an error-correction step.
- HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies: This tool performs fast, high-resolution haplotype assembly from data produced by long-read sequencing, short-read sequencing, and other genome analysis technologies.
- HySA: a Hybrid Structural variant Assembly approach using next-generation and single-molecule sequencing technologies: This method calls structural variants from human genomes using short-read and long-read sequence data; tests showed it improved detection rates for several types of variants.
A publication in BMC Genomics upends some of the conventional wisdom about variants that may cause virulence in Mycobacterium tuberculosis. Scientists at San Diego State University used SMRT Sequencing to produce a complete assembly of the pathogen, finding that earlier assemblies encountered problems due to GC bias and repetitive DNA.
“SMRT genome assembly corrects reference errors, resolving the genetic basis of virulence in Mycobacterium tuberculosis” comes from Afif Elghraoui, Samuel Modlin, and Faramarz Valafar. The team used long-read PacBio sequencing on an attenuated strain of M. tuberculosis, which is often compared to a virulent strain to highlight sources of pathogenicity. The same strain was previously sequenced with Sanger technology and published in 2008.
The sequencing process required just two SMRT Cells to achieve an average of 217-fold coverage, and assembly resulted in a single contig. Later, the scientists went back to the data and found that they obtained the same sequence using data from only one of the SMRT Cells. A comparison of the new assembly to the previous one, as well as to a reference assembly of the virulent M. tuberculosis strain, found that the Sanger assembly overstated the genetic differences between the two microbes.
“Our assembly reveals that the number of H37Ra-specific variants is less than half of what the Sanger-based H37Ra reference sequence indicates, undermining and, in some cases, invalidating the conclusions of several studies,” the authors report. Many of the previous sequencing errors were found in genes known to be repetitive and GC-rich. “Our results constrain the set of genomic differences possibly affecting virulence by more than half, which focuses laboratory investigation on pertinent targets and demonstrates the power of SMRT sequencing for producing high-quality reference genomes,” they add.
Elghraoui et al. note that SMRT Sequencing offers significant advantages in accuracy and read length. “The random error profile of this technology allows for consensus accuracy to increase as a function of sequencing depth,” they write, reporting a QV greater than 60 for their assembly. In addition, the long reads “allowed us to easily and unambiguously capture known structural variants in H37Ra, as well as two novel to the strain.”
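For readers less familiar with quality values, the QV figure and the depth argument quoted above can be illustrated with a small sketch. It assumes the standard Phred definition (QV = -10 · log10 of the error probability) and a deliberately simplified model in which read errors at a position are independent and the consensus is a plain majority vote. The function names and the 15% per-read error rate are hypothetical choices for illustration; this is not the consensus algorithm used by the authors.

```python
import math


def qv_to_error_prob(qv):
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-qv / 10)


def error_prob_to_qv(p):
    """Convert an error probability to a Phred-scaled quality value."""
    return -10 * math.log10(p)


def majority_error(per_read_error, depth):
    """Probability that more than half of `depth` independent reads are
    wrong at one position (simplified model: every wrong read is assumed
    to support the same wrong call, which overstates the true risk)."""
    need = depth // 2 + 1
    return sum(
        math.comb(depth, k)
        * per_read_error ** k
        * (1 - per_read_error) ** (depth - k)
        for k in range(need, depth + 1)
    )


if __name__ == "__main__":
    # QV 60 corresponds to roughly one expected error per million bases.
    print(qv_to_error_prob(60))

    # With random (non-systematic) errors, the chance of a wrong majority
    # call shrinks quickly as depth grows. 0.15 is an illustrative value.
    for depth in (1, 5, 11, 21, 31):
        print(depth, majority_error(0.15, depth))
```

The key point of the sketch is the assumption of randomness: if errors were systematic (the same wrong base in most reads), additional depth would not drive the consensus error down in this way.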
These results lead the authors to “advise caution when analyzing GC-rich and repetitive sequences among reference genomes, not to mention draft genomes,” they write. “As de novo assembly can be routinely performed for microbes using single-molecule sequencing, we strongly recommend this for mycobacteria.”
Microbiology fans can find the PacBio team at the upcoming ASM conference in booth #1328.
A new publication in BMC Genomics explores the use of RNA normalization and 5’ cap selection to enhance results from Iso-Seq studies using SMRT Sequencing. Scientists from the University of Edinburgh report that these modifications significantly boosted transcriptome coverage in a study of chicken.
“Normalized long read RNA sequencing in chicken reveals transcriptome complexity similar to human” comes from lead author Richard Kuo, senior author David Burt, and collaborators. The team chose this project because existing chicken annotation resources have far fewer genes than expected, with very little evidence of alternative splicing. This situation was believed to have stemmed from prior technology limitations.
In earlier studies, “researchers had to choose between low-throughput, costly methods to generate accurate full-length transcript models, such as cDNA cloning or high-throughput, cheaper methods to generate imprecise transcript models, such as short read RNA sequencing,” Kuo et al. write. “The current status of chicken annotation represents a prime example of this trade off.” The annotation has just over 17,000 genes and fewer than 18,000 transcripts, far fewer on both counts than other vertebrates.
RNA sequencing based on short-read data is particularly challenged in identifying essential transcription characteristics, the authors note: transcript start and termination sites, transcriptional noise, and exon chaining. These problems “are practically eliminated with long read sequencing where the full-length of a transcript may be sequenced in a single read,” they add.
For this project, scientists deployed SMRT Sequencing, tweaking the Iso-Seq protocol to incorporate RNA normalization as well as 5’ cap selection. They analyzed RNA from chicken brain and embryonic tissues, normalizing both libraries but using 5’ cap selection only for embryo samples. They also collected short-read data to compare results.
This approach yielded some 60,000 transcripts and 29,000 genes, including more than 20,000 novel lncRNA transcripts. The team also found nearly 15,000 unmapped reads from both libraries, likely representing “a significant number of genes that are not currently represented in the Chicken annotations due to gaps in the genome assembly,” they report. They compared their findings to results from Thomas et al., an earlier publication using SMRT Sequencing for chicken transcriptome analysis that did not include the modifications. Kuo et al. estimate that their normalization protocol “appears to have provided a transcriptome coverage efficiency of more than 5 times that of the previous study,” they write. “This means that for every SMRT cell used with the normalization method, 5 SMRT cells would be required without normalization to achieve the same amount of transcriptome coverage.”
The team’s new PacBio-based transcriptome “suggests a level of transcriptional complexity that is more consistent with expectations based on the well-characterised human genome,” the scientists conclude. “Using PacBio sequencing to create a high quality transcriptome annotation can correct [underrepresentation] issues that are common in many of the public annotations.”
Richard Kuo will be speaking about this research at the SMRT Leiden conference taking place this week in the Netherlands. Follow along at #SMRTLeiden!
Screening for pathogenic variants associated with polycystic kidney disease is now more accurate and affordable with SMRT Sequencing. A new paper in Human Mutation from scientists at Leiden University Medical Center and other institutes reports the evaluation of long-read PacBio sequencing as a potential replacement for costly, time-consuming Sanger pipelines.
“Detecting PKD1 variants in polycystic kidney disease patients by single-molecule long-read sequencing” comes from lead author Daniel Borràs, senior author Seyed Yahya Anvar, and collaborators. The team notes that previous efforts to get away from conventional tools by implementing short-read sequencing were never successful enough for clinical use. “A genetic diagnosis of autosomal dominant polycystic kidney disease (ADPKD) is challenging due to allelic heterogeneity, high GC-content, and homology of the PKD1 gene with six pseudogenes,” the scientists explain. In earlier studies, ambiguities from short-read sequencing “produced low true positive variant detection rates of 28% to 50% for the duplicated region of PKD1, and many false positives (10%) due to misalignments, low quality alignments and contamination by residual amplification of pseudogenes.”
The team predicted that long-read sequencing could address these issues, and evaluated the technology on 19 previously analyzed samples. They designed long-range PCR products to cover the coding regions of PKD1 and PKD2, and used PacBio’s Long Amplicon Analysis tool to reconstruct alleles from reads 3 kb or longer. Results were compared to those obtained previously from Sanger sequencing (requiring laborious long-range PCR, followed by many nested PCR reactions) and multiplex ligation-dependent probe amplification (MLPA). An initial examination of coverage found that “all PKD1 and PKD2 exons … from 19 ADPKD patients could be completely covered using long-reads.”
Variants detected with SMRT Sequencing and other approaches were compared; scientists determined that 17 high-confidence variants were detected by PacBio but not by Sanger. PacBio sequencing missed one pathogenic insertion, resulting in accurate calls for 18 of the 19 samples tested. “This provided a diagnosis for 94.7% of the patients, resulting in the correct detection of all PKD1 substitutions, single-nucleotide deletions, large deletions, one deletion-insertion, and 3 out of 4 insertions or duplications,” the scientists report.
These results point to SMRT Sequencing as an excellent replacement for older technologies to scan PKD1 and other medically relevant genes for pathogenic variants. “On top of reducing the PCR amplification steps required and limiting the implicit PCR artifacts, single molecule sequencing improves sequence alignments and aids in discriminating between homologous or repeated sequences, such as PKD1 pseudogenes,” the scientists write. “This provides a cleaner dataset for variant calling.”
The scientists conclude, “This method is highly valuable for a diagnostic setting, as it increases the resolution power of clinically relevant but difficult to sequence or to resolve genomic regions.”
Senior author Anvar from Leiden University Medical Center will be presenting at the SMRT Leiden events taking place this week. Follow along at #SMRTLeiden!
It’s DNA Day, the annual celebration of the discovery of the double helix, the completion of the Human Genome Project, and all things genetic. We like to take the opportunity to look back at DNA-based advances from the past year, and progress has been truly stunning. Just when we think it couldn’t get more awe-inspiring, scientists generate new results that prove us wrong.
One of the most impressive feats in the past year has been the proliferation of population-specific, reference-grade human genomes. From the Chinese genome assembly that recovered nearly 13 Mb of sequence missed in GRCh38 and produced new insights around alternative splicing to the diploid Korean genome assembly that detected nearly 12,000 novel structural variants — including several specific to Asian populations — these new resources are showing us how much sequencing must be done to represent the universe of natural human genetic variation. Several other country or population genome projects have reported results or are in the works, and we’re eager to see how this data fills in the blanks to help us better understand the human genome. Structural variation in particular is being detected more comprehensively than ever, with even small amounts of long-read sequencing helping scientists to connect these elements to their likely function.
We’ve also seen compelling work from the plant and animal research community. Just in the past year, scientists have published new high-quality genome assemblies for quinoa and goat, shattering contiguity records even for challenging genomes. In maize, researchers reported new studies that produced accurate gene copy number counts and a more complex transcriptome than anticipated. Alternative splicing was also the focus of a sorghum study. And we were delighted to learn that the Genome 10K (G10K) and Bird 10,000 Genomes (B10K) initiatives announced plans to ramp up their efforts to generate high-quality de novo genome assemblies.
On the microbial front, we were especially fascinated by a new report detailing the epigenetic changes that occur as free-living bacteria morph into symbiotic bacteria associated with a host. There was also a project that investigated how drug-resistance plasmids are swapped across bacterial species by analyzing the entire “mobilome” of carbapenemase-producing Enterobacteriaceae. And since we’re suckers for extremophile research, we couldn’t resist this genome profile of a single-celled diatom living in the Antarctic Ocean.
All of these projects were accomplished with SMRT Sequencing. On DNA Day, we’d like to congratulate the entire research community working to improve our understanding of genomics.
Today is Earth Day, a great time to reflect on the growing trend of conservation genomics. We are proud that many scientists are using PacBio long-read sequencing with the goal of rescuing endangered species and preserving delicate ecosystems around the world.
One of the first examples we saw of this approach came from Oliver Ryder at the San Diego Zoo Institute for Conservation Research. Ryder and his team performed SMRT Sequencing for the ‘alalā, a Hawaiian crow, which no longer existed in the wild. In this video, he describes how having a high-quality genome assembly for this bird will have a significant impact on biologists’ ability to breed and reintroduce healthy crows back to their native environment. Ryder is also a founder of the Genome 10K (G10K) project, which aims to create high-quality assemblies for 10,000 vertebrate species as part of a large-scale conservation effort.
We’ve also been impressed by public support for a crowdfunded conservation genomics project — this one for the kākāpō bird, a critically endangered species found only in New Zealand. David Iorns, founder of the Genetic Rescue Foundation, is using SMRT Sequencing to build a reference-grade de novo genome assembly for the bird, followed by resequencing all 125 remaining kākāpōs. These members of the parrot family are facing fertility issues, a major population bottleneck, and other challenges that make a conservation effort necessary to prevent them from going extinct.
Recently, conservation expert Rebecca Johnson from the Australian Museum Research Institute gave a talk on the de novo genome assembly of a koala. This lovable marsupial species has been on the radar of conservation biologists who want to protect it in part because it has a number of unique and interesting features. Johnson used SMRT Sequencing to analyze the 3.6 Gb genome, yielding what she calls the best marsupial assembly to date.
This year, Earth Day is also marked by the first-ever March for Science, including more than 500 marches across the globe to support better research funding and pro-science policies. We’ll be cheering on all the scientists involved in conservation genomics and other important efforts to protect our planet and all the creatures that call it home!
Researchers from the Okinawa Institute of Advanced Sciences published a compelling review article describing several recent clinically relevant projects they have completed using SMRT Sequencing. Released in the journal Human Cell, “Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area” comes from lead author Kazuma Nakano, senior author Takashi Hirano, and collaborators.
The team adopted long-read PacBio sequencing as an alternative to short-read sequencers that missed too many important genomic elements. “PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization,” they write. “These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions.”
The scientists present several examples of how this technology has made a difference in their work. Many of these studies were previously unpublished. While we can’t cover them all, here are a few vignettes that caught our attention:
- The team fully sequenced the genome of the Kurono strain of Mycobacterium tuberculosis, yielding a single, circular contig. GC content was as high as 80% across the genome, which also featured “117 sets of >1000 bp identical sequence pairs.”
- They sequenced the genome of a multidrug-resistant isolate of Acinetobacter baumannii collected in a Nepalese hospital. The assembly, represented in two circular contigs for a chromosome and its plasmid, included several genes conferring drug resistance.
- SMRT Sequencing allowed the scientists to perform de novo assembly and methylation detection for several variants of Leptospira interrogans in a study designed to identify mechanisms underlying virulence in the zoonotic disease leptospirosis.
- A flu study relied on SMRT Sequencing for whole genome analysis of 48 influenza viruses isolated in Okinawa. The study included at least one sample from the H1N1 pandemic in 2009. “Our genomic data set contained temporal and spatial information about the seasonal and pandemic prevalence of flu in Okinawa,” the authors report. “Such insight gleaned will help elucidate the mechanism of acquired resistance to vaccines and drugs and thus inform future drug and vaccine development.”
- The team used long-read sequencing to explore why the incidence of gastric cancer is lower in Okinawa than anywhere else in Japan, despite consistent prevalence of Helicobacter pylori across the country. By conducting whole genome sequencing and methylation detection for eight H. pylori strains, the scientists spotted virulence factor-dependent motifs.
This review demonstrates several of the clinically relevant applications for SMRT Sequencing. PacBio “has significantly impacted basic science and biology and is reaching its influence into the clinical/medical atmosphere,” the scientists write. The technology is “ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization.”
A preprint from scientists at the University of Florida, Centro de Investigaciones Principe Felipe, and other institutes describes a new analysis tool to help boost quality of transcriptome studies. “SQANTI: extensive characterization of long read transcript sequences for quality control in full-length transcriptome identification and quantification” comes from lead author Manuel Tardaguila, senior author Ana Conesa, and collaborators.
The automated pipeline for Structural and Quality Annotation of Novel Transcript Isoforms (SQANTI) was developed as a quality-assessment tool for transcripts discovered with SMRT Sequencing. SQANTI “calculates up to 35 different descriptors of transcript quality and creates a wide range of summary graphs to aid in the interpretation of the sequencing output,” the authors report.
Development of this new pipeline was spurred by the realization that different transcript analysis tools yielded different results, even for the same data set. “As an example, sequencing the mouse neural transcriptome with PacBio long reads, we obtained ~ 80,000, 12,000 and 16,000 different transcripts when applying Tapis, IDP or the ToFU pipeline, respectively,” the scientists write. “Implementing a comprehensive, quality aware analysis of PacBio reads is fundamental at a time when long read transcriptome sequencing is becoming more popular and important conclusions on transcriptome diversity will be drawn from these data.”
SQANTI consists of tools to classify transcripts by comparison to a reference annotation, analyze data by more than 30 metrics, and generate graphs to report results. The team tested it using neural tissue from mice, performing extensive RT-PCR validation to measure transcript expression. PacBio sequencing of the tissue identified many novel transcripts, but “an important fraction of the novel sequences are presumably bioinformatics or retrotranscription artifacts that can be removed by using SQANTI descriptors,” the scientists report.
They also evaluated results against data from short-read sequencing. “A comparison of Iso-Seq over the classical RNA-seq approaches solely based on short-reads demonstrates that the PacBio transcriptome not only succeeds in capturing the most robustly expressed fraction of transcripts, but also avoids quantification errors caused by unaccounted 3’ end variability in the reference,” Tardaguila et al. write. “SQANTI allows the user to maximize the analytical outcome of long read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.”
A new publication in Genome Research shows how the use of SMRT Sequencing, in combination with other technologies, can reveal far more about repetitive DNA and structural variants than short-read sequencing alone. In this paper, scientists compared genome assemblies produced with short reads, long reads, and optical maps to understand the performance of each approach.
From Uppsala University, the University of Munich, and Bionano Genomics, the team studied the Eurasian crow for this project. The resulting paper, “Combination of short-read, long-read and optical mapping assemblies reveals large-scale tandem repeat arrays with population genetic implications,” comes from lead author Matthias Weissensteiner, senior author Jochen Wolf, and collaborators. They used an existing short-read assembly and generated a de novo PacBio long-read assembly and an optical map with Bionano Genomics, all from the same individual.
The PacBio assembly alone delivered a major improvement over the short-read assembly. Contiguity increased by almost 90-fold, with the long-read assembly featuring a contig N50 longer than 8.5 Mb. The SMRT Sequencing assembly also resolved more than 70 Mb of sequence missed in the short-read assembly, including nearly 16 Mb of repetitive elements.
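Since contig N50 is the contiguity metric cited here, a brief sketch of how it is conventionally computed may be useful: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The function name and the toy contig lengths below are hypothetical and are not taken from the crow assemblies.

```python
def contig_n50(lengths):
    """Return the N50 of a list of contig lengths: the length L such that
    contigs of length >= L together cover at least half of the assembly."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy example: total is 290, so the half-way point is 145.
    # 80 + 70 = 150 reaches it, so the N50 is 70.
    print(contig_n50([80, 70, 50, 40, 30, 20]))
```

A higher N50 means that more of the assembly sits in long contigs, which is why it is commonly used to compare the contiguity of assemblies of the same genome.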
The various assemblies were then compared and joined to determine how each source of information contributed to a final, high-quality genome resource. This step allowed the team to spot mis-assemblies, which occurred more frequently in the short-read assembly. They detected 43 mis-joins in the short-read assembly, and fewer than half that number in the long-read assembly.
One of the motivating factors for this project was an interest in understanding the repetitive DNA associated with constitutive heterochromatin, which has an influence on recombination. To that end, the team analyzed large tandem repeat arrays in the crow genome and used population resequencing data to estimate effects on recombination rate. “We characterized 36 previously unidentified large repetitive regions in the proximity of sequence assembly breakpoints, the majority of which contained complex arrays of a 14-kb satellite repeat or its 1.2-kb subunit,” the scientists report. They determined that the recombination rate was “significantly reduced in these regions.”
“Our results demonstrate the potential of combining independent technologies to discover previously inaccessible genomic features,” Weissensteiner et al. write. “With an emerging picture of genome architecture affecting the distribution of genetic diversity across genomes, the integration of large tandem repeat arrays into genome assemblies constitutes an important improvement.”
We’re delighted to see the release of another high-quality avian genome, which will support ongoing efforts in the B10K and G10K projects to represent as many species as possible.
We are excited to announce our 2017 series of SMRT Community Events and User Group Meetings (UGMs), where you can learn first-hand how members of the scientific community are leveraging the latest capabilities of SMRT Sequencing for a growing number of applications.
Our vibrant community of users is enthusiastic about sharing tips, exchanging ideas, and developing new applications. These upcoming events will facilitate just that — and we hope you can join us!
We are now accepting registrations for our SMRT Leiden, Americas East Coast UGM and Asia Pacific UGM. Please save the date for our Americas West Coast UGM, SMRT Developers Meeting, and EMEA UGM, taking place later in the year.
- May 2 – 4, SMRT Leiden: SMRT Scientific Symposium & Informatics Developers Meeting, Leiden, The Netherlands
- May 31 – June 1, APAC User Group Meeting, Seoul, South Korea
- June 27 – 28, Americas East Coast User Group Meeting & Workshops, Baltimore, MD
- September 6 – 7, Americas West Coast User Group Meeting & Workshops, Palo Alto, CA
- Fall 2017, SMRT Informatics Developers Meeting, TBD, MD
- November 2 – 3, EMEA User Group Meeting, Barcelona, Spain
Call for Speakers
Our scientific advisory committee is currently reviewing speakers for the East Coast UGM & Workshops. If you are interested in sharing your latest research, please submit a proposal when you register. The deadline for consideration is Wednesday, May 17.
We look forward to seeing you at our upcoming SMRT Community Events & User Group Meetings!
A new preprint offers an enticing look at transcriptome results from analysis of a hummingbird using SMRT Sequencing. In this study, scientists found new clues to explain unique attributes of the bird’s metabolism. The work was made possible through full-length isoform sequencing, which allowed deep, assembly-free analysis even though no reference genome was available.
“Single molecule, full-length transcript sequencing provides insight into the extreme metabolism of ruby-throated hummingbird Archilochus colubris” is now available on BioRxiv. From Rachael Workman, Alexander Myrka, Elizabeth Tseng, William Wong, Kenneth Welch, and Winston Timp, the paper describes a project designed to better understand how hummingbirds switch metabolic gears to focus on sugars or lipids as needed. “This metabolic flexibility is remarkable both in that the birds can switch between exclusive use of each fuel type within minutes,” they write, “and in that de novo lipogenesis from dietary sugar precursors is the principle way in which fat stores are built, sometimes at exceptionally high rates, such as during the few days prior to a migratory flight.”
The team used the Iso-Seq method with long-read PacBio data to generate full-length isoform sequences, focusing on the liver of Archilochus colubris. According to the paper, this represents “the first high-coverage transcriptome of any single avian tissue.” They also aligned transcripts to Calypte anna, a recently completed hummingbird assembly that also made use of SMRT Sequencing.
Workman et al. report that the use of long-read PacBio data allowed for more accurate views of isoforms and alternative splicing, even without a reference genome. “Using full-length transcript data, we found alignment unnecessary to generate clear pictures of the gene isoforms,” they note. “The long reads negate the need for transcript assembly, a precarious analysis in the absence of a genome.” Nearly half of the reads in the final analysis covered full-length genes, including the 5’ and 3’ ends as well as the polyA tail.
The team used the COGENT pipeline to assign transcripts to gene families and focus on unique isoforms. “COGENT is specifically designed for transcriptome assembly in the absence of a reference genome, allowing for isoforms of the same gene to be distinctly identified from different gene families,” the scientists write. Their analysis generated a highly diverse set of isoforms, which the authors believe “represents a nearly complete transcriptome of the hummingbird liver.”
With that dataset, the scientists found genes unique to hummingbird. “These genes showed a specific enrichment for pathways involved in lipid metabolism — suggesting that the hummingbird has evolved variants of these genes to achieve its high levels of metabolic efficiency,” they report.
The scientists note that follow-up functional assays will be an important next step in understanding and verifying the function of many genes of interest.
We’re excited to be heading to Washington, DC, for the annual meeting of the American Association for Cancer Research. The PacBio team always enjoys hearing about the latest in cancer translational research at AACR, along with thousands of leading scientists in the field.
Many of those scientists have already learned that SMRT Sequencing provides a unique view into cancer, revealing structural variation, phasing distant variants, and delivering full-length isoform sequences. With uniform coverage, industry-leading consensus accuracy, and reads extending to tens of kilobases, PacBio long-read sequencing gives researchers the ability to monitor and make sense of even the most complex changes in tumor DNA.
If you’ll be attending AACR, stop by booth #1617 to get a first look at the new Integrative Genomics Viewer (IGV v3) featuring greatly improved support for SMRT Sequencing data. We’ll be demonstrating the new features in IGV v3 with a PacBio whole genome sequencing dataset (the SK-BR-3 Human Breast Cancer cell line). Visit us to see how PacBio data visualized in IGV v3 reveals the hidden landscape of somatic structural variants in a cancer genome including translocations, gene fusions, and novel mobile element insertion sites.
In addition, check out these posters from our scientists to see SMRT Sequencing data in action for cancer studies:
SMRT Sequencing of Full-length Androgen Receptor Isoforms in Prostate Cancer Reveals Previously Hidden Drug Resistant Variants
Tyson Clark, Ph.D., PacBio
Sunday, April 2, 1 p.m. – 5 p.m., Abstract #425/25, Section 17
Simplified Sequencing of Full-length Isoforms in Cancer on the PacBio Sequel System
Meredith Ashby, Ph.D., PacBio
Monday, April 3, 1 p.m. – 5 p.m., Abstract #2442/29, Section 17
Detection of Low-frequency Somatic Variants using Single-molecule, Real-time Sequencing
Primo Baybayan, PacBio
Wednesday, April 5, 8 a.m. – 12 p.m., Abstract #5366/22, Section 15
Finally, we’ll be launching a new SMRT Grant program at AACR. Just tell us how full-length isoform sequencing of your cancer samples will drive new discoveries in your research for a chance to win library construction, PacBio sequencing, and bioinformatics analysis for your project. Check out the rules and submit your 250-word proposal by May 15. Many thanks to our partner GENEWIZ for helping us make this grant program possible!