This blog features voices from PacBio — and our partners and colleagues — discussing the latest research, publications, and updates about SMRT Sequencing. Check back regularly or sign up to have our blog posts delivered directly to your inbox.
Blog readers may recall that last year’s SMRT Grant winner was Renying Zhuo from the Chinese Academy of Forestry. We’re pleased to report that the project is now complete!
Zhuo proposed sequencing the genomes of two strains of the Sedum alfredii plant from the same ecosystem — one that accumulates cadmium ions from polluted soil and one that doesn’t. The goal was to use high-quality assemblies for comparative genomic analysis to determine the genetic mechanisms responsible for this remediation effect.
Plant DNA was sequenced on the Sequel System by RTL Genomics, and genome assembly was performed by Computomics. (We’re also grateful to Sage Science and Experiment, the other co-sponsors of the SMRT Grant program, for making this a worldwide democratic event.) Both plant genomes made it into the “1 Mb contig N50 club” (#1MbCtgClub on Twitter), with contig N50s of 1.09 Mb for Sedum alfredii HE and 1.26 Mb for Sedum alfredii NHZ.
Zhuo and his team will now dive into a deep, detailed comparative analysis between the two genomes to identify genes associated with metal accumulation. Ultimately, they hope the results can be used to improve bioremediation efforts for soils contaminated with heavy metals.
Detailed stats for the two plant assemblies from Computomics:
| Metric | Sedum alfredii HE | Sedum alfredii NHZ |
|---|---|---|
| Total contig length [bp] | 235,739,357 | 397,076,979 |
| Longest contig [bp] | 3,521,758 | 5,719,050 |
| Contigs > 1 Mb [#] | 74 | 117 |
| Contig N50 [bp] | 1,087,129 | 1,256,010 |
| Contig L50 [#] | 65 | 91 |
| BUSCO complete [%] | 88.5 | 90.1 |
| BUSCO complete, single copy [%] | 60.9 | 21.8 |
| BUSCO complete, duplicated [%] | 27.6 | 68.4 |
| BUSCO fragmented [%] | 2.4 | 1.7 |
| BUSCO missing [%] | 9.1 | 8.1 |
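The N50 and L50 rows are two views of the same calculation: sort contigs from longest to shortest and add up their lengths until the running total reaches half of the total contig length. N50 is the length of the contig that crosses that halfway mark, and L50 is how many contigs it took to get there. Here is a minimal Python sketch of that common definition; the contig lengths are toy values, since the individual Sedum contigs are not listed in this post, and assembly tools can differ slightly in how they handle ties and rounding.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths in bp.

    Contigs are summed from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the total
    length, and L50 is the number of contigs summed up to that point.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be a non-empty list of positive integers")


# Toy contig lengths (invented, not the Sedum assemblies)
toy_contigs = [100, 80, 60, 40, 20]
print(n50_l50(toy_contigs))  # (80, 2)
```

With a total of 300 bp, the halfway mark is 150 bp; the two longest contigs (100 + 80) pass it, so N50 is 80 and L50 is 2.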
Voting is now open for this year’s Plant and Animal SMRT Grant program. Check out the five finalists and cast your vote by April 5!
Efforts to produce a reference-grade goat genome assembly for improved breeding programs have paid off. A new Nature Genetics publication reports a high-quality, highly contiguous assembly that can be used to develop genotyping tools for quick, reliable analysis of traits such as milk and meat quality or adaptation to harsh environments. The study also offers a look at how different scaffolding approaches perform with SMRT Sequencing data.
“Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome” comes from lead authors Derek Bickhart, Benjamin Rosen, and Sergey Koren; senior author Tim Smith; and collaborators. The large team of scientists is affiliated with the USDA Agricultural Research Service, National Human Genome Research Institute, the University of Washington, and many other organizations.
The project was motivated by a clear need to develop methods for high-quality livestock genome assemblies to benefit breeding communities. Goat offers a particular boost to developing countries, where these animals are a primary source of textile fiber, milk, and meat. “A finished, accurate reference genome is essential for advanced genomic selection of productive traits and gene editing in agriculturally relevant plant and animal species,” the scientists report. Previous efforts to sequence the goat genome with short reads resulted in a highly fragmented assembly that could not resolve repetitive and other challenging regions. For this work, the team analyzed the genome of a highly homozygous male San Clemente goat (Capra hircus) using a number of technologies.
They chose SMRT Sequencing because its long reads could characterize even the most difficult genomic regions. “Initial assembly of the PacBio data alone resulted in a contig NG50 … of 3.8 Mb,” the team reports. PacBio contigs were then connected with optical mapping and Hi-C data to create extremely long scaffolds in the final 2.92 Gb assembly. “These combined technologies produced what is, to our knowledge, the most continuous de novo mammalian assembly to date, with chromosome-length scaffolds and only 649 gaps,” they write. The assembly is 400 times more continuous than the previous short-read assembly.
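The goat team reports NG50 rather than N50. The two differ only in the yardstick: N50 counts toward half of the assembly’s own total length, while NG50 counts toward half of an estimated genome size, which keeps assemblies with different total sizes comparable. A short Python sketch, using toy contig lengths and hypothetical genome-size estimates:

```python
def ng50(lengths, genome_size):
    """Return the NG50: the length of the contig at which contigs, summed
    longest-first, first cover half of the estimated genome size (instead
    of half of the assembly's own total length, as in N50).

    Returns None if the contigs cover less than half of genome_size.
    """
    target = genome_size / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


toy_contigs = [100, 80, 60, 40, 20]          # 300 bp assembled (toy values)
print(ng50(toy_contigs, genome_size=300))    # 80, identical to N50 here
print(ng50(toy_contigs, genome_size=500))    # 40, a larger genome raises the bar
print(ng50(toy_contigs, genome_size=1000))   # None, assembly covers < half
```

When the assembly is smaller than the genome estimate, NG50 drops below N50, so NG50 is the stricter measure of contiguity.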
To learn more about how these technologies complement each other, the scientists analyzed results from optical mapping and Hi-C data separately. They found that Hi-C data yielded one-tenth as many scaffolds as optical mapping did, but it led to more misoriented contigs, which were correlated with restriction site density. “Ultimately, we found that sequential scaffolding with optical mapping data followed by Hi-C data yielded an assembly with the highest continuity and best agreement with the [radiation hybrid] map,” the team reports, noting that this approach is significantly less expensive than generating a short-read draft genome assembly and manually finishing it to high quality.
The final assembly includes notoriously difficult regions, such as centromeric DNA and the Y chromosome. Two chromosomes appear to be completely assembled, and two others seem to include “the elusive p arm,” Bickhart et al. write.
Of course, since the scientists were focused on building a resource that would help breeding programs, they also assessed its potential impact in that space. “Chromosome-scale continuity of the ARS1 assembly was found to have appreciable positive impact on genetic marker order for the existing C. hircus 52K SNP chip,” they report.
Going forward, the team hopes to generate a phased diploid assembly for C. hircus.
Our team of scientist reviewers has considered hundreds of submissions for the latest SMRT Grant award and narrowed the selection to five finalists. Now it’s your turn! We welcome the community to vote for their favorite project now through April 5th. The winner will receive SMRT Sequencing and genome assembly or Iso-Seq analysis sponsored by PacBio and our partners, the Arizona Genomics Institute and Computomics.
Here’s a look at the entries from our five finalists:
Project: Temple Pitviper
Principal Investigators: Mrinalini Mrinalini, National University of Singapore; Ryan McCleary, Utah State University; Manjunatha Kini, National University of Singapore
The highly venomous snake Tropidolaemus wagleri, common to Southeast Asia, has a number of unique features that merit further study. Its venom contains toxic proteins not found in other species of snake, including a group of novel toxins that have not been well characterized. This reptile also has sex-specific phenotypes, which is unusual for snakes; interestingly, these differences are not seen until the snake reaches sexual maturity, but the biological trigger for this is not understood.
Project: Solar-powered Slug
Principal Investigators: Carola Greve, Zoological Research Museum A. Koenig; Alexander Donath, Zoological Research Museum A. Koenig
Scientists propose sequencing the genome of Elysia timida, a Mediterranean sea slug that has the rare ability to consume algae and keep the ingested chloroplasts functioning. Inside the slug, these chloroplasts continue photosynthesis, building up a starch reservoir that can feed the slug for three months. The project aims to scour the genome for genes associated with this unique ability, to understand the mollusk’s eco-friendly biology, as well as the process of incorporating organelles.
Project: Pink Pigeon
Principal Investigators: Matthew Clark, Earlham Institute; Cock Van Oosterhout, University of East Anglia
This effort would use the Iso-Seq method to generate the transcriptome of the pink pigeon, an endangered species native to Mauritius. The species suffers high levels of infertility and pathogen susceptibility, possibly related to a population bottleneck. Scientists would use SMRT Sequencing data to study the bird’s loss of genetic variation and to find variants associated with fitness and pathogen resistance.
Project: Explosive Beetle
Principal Investigators: Tanya Renner, San Diego State University; Aman Gill, University of California, Berkeley; Wendy Moore, University of Arizona; Kipling Will, University of California, Berkeley; Athula Attygalle, Stevens Institute of Technology
The bombardier beetle (Brachinus elongatulus) is known for its ability to “explosively discharge a toxic mix of quinones, oxygen, and water vapor at over 100°C,” this proposal says. Scientists would sequence the insect’s 500 Mb genome to understand insect chemical biosynthesis and biodiversity. This would represent the first genome sequence for the beetle suborder Adephaga.
Project: Dancing with Dingoes
Principal Investigators: Bill Ballard, University of New South Wales; Claire Wade, University of Sydney, Sydney, Australia
This team aims to sequence the 2.5 Gb genome of the Australian dingo and compare it to that of the wild wolf and domestic dog to understand the evolutionary process that led from wild animal to pet. According to the proposal, this project will also “inform aspects of indigenous Australian culture and advance our understanding of the Australian continent’s top-level predator.”
Congratulations to all five finalists for their excellent proposals – may the most interesting genome win! Help support your favorite project now until April 5th.
A new genome assembly has remarkable promise to boost the global food supply. Scientists from King Abdullah University of Science and Technology and other institutions sequenced quinoa, a nutritious grain that can grow in marginal lands and other suboptimal environments. Their assembly offers new clues that could help improve breeding efforts to make the plant more accessible worldwide.
“The genome of Chenopodium quinoa” was published recently in Nature by lead author David Jarvis, senior author Mark Tester, and a large group of collaborators. They focused on this plant, which is believed to have been domesticated more than 7,000 years ago in South America, because it is rapidly becoming accepted as a superfood with potential to address the growing food supply challenge. Quinoa is a relatively low-sugar, gluten-free grain with lots of nutrients. But expanding its use as a crop around the world requires new breeding efforts, the authors report. They used SMRT Sequencing to generate a high-quality, chromosome-scale genome assembly for the allotetraploid plant, a valuable resource that can now be used by breeding programs to produce shorter, higher-yielding plants with increased stress tolerance and other desirable traits.
The team sequenced a plant from coastal Chile, followed by scaffolding with Bionano Genomics and Dovetail Genomics tools. The assembly is 1.39 Gb, represented in fewer than 3,500 scaffolds. Ninety percent of the genome is covered in just 439 scaffolds. “This assembly represents a substantial improvement over the previously published quinoa draft genome sequence, which contained more than 24,000 scaffolds with 25% missing data,” the scientists report. Iso-Seq analysis and other annotation methods resulted in nearly 45,000 gene models, while a BUSCO analysis found more than 97% of its benchmark set of conserved plant genes in the assembly. The group also sequenced two diploids from ancestral quinoa relatives.
One of the most exciting findings from the project was the discovery of a transcription factor that is believed to regulate production of saponins, bitter-tasting molecules in the quinoa shell. A premature stop codon found in sweet quinoa strains suggests that it may be possible to breed these saponins out to produce a plant more amenable for farming.
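A premature stop codon is a mutation that introduces a stop signal early in a gene’s reading frame, so the encoded protein is cut short and may lose its function. The Python sketch below shows one simple way to flag such a change when comparing two alleles; the sequences are invented toy examples, not the actual quinoa transcription factor, and the code assumes each sequence starts at the first base of the reading frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_stop_codon(cds):
    """Return the 0-based codon index of the first in-frame stop codon,
    reading from the first base, or None if no stop codon is found."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_premature_stop(cds):
    """True if a stop codon occurs before the last complete codon."""
    last_codon = len(cds) // 3 - 1
    idx = first_stop_codon(cds)
    return idx is not None and idx < last_codon


# Toy alleles: one C->T change turns CAG (glutamine) into TAG (stop)
bitter = "ATGGCTCAGAAATGGTAA"
sweet = "ATGGCTTAGAAATGGTAA"
print(has_premature_stop(bitter), has_premature_stop(sweet))  # False True
```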
“These resources provide the foundation for accelerating the genetic improvement of the crop, with the objective of enhancing global food security for a growing world population,” Jarvis et al. write.
A recent effort to understand the genetic mechanisms by which drug-resistance elements are swapped among bacteria built on previous studies of Enterobacteriaceae isolates collected at the National Institutes of Health Clinical Center. The work was made possible by high-quality genome assemblies of these organisms generated earlier with SMRT Sequencing technology.
In this project, scientists from the U.S., France, and Brazil teamed up to learn precisely how drug-resistance plasmids are spread from one species to another. They report the results of that investigation in mBio with the publication “Mechanisms of Evolution in High-Consequence Drug Resistance Plasmids” from lead author Susu He, senior author Fred Dyda, and collaborators. The team focused on the full complement of mobile elements (or the “mobilome”) found in carbapenemase-producing Enterobacteriaceae. “The availability of highly accurate plasmid assemblies for these strains based on long-read PacBio SMRT sequencing allows for the unambiguous and precise annotation of mobile elements,” they report.
The scientists analyzed plasmid evolution from isolates collected during an outbreak of carbapenem-resistant Klebsiella pneumoniae at the NIH Clinical Center in 2011 and 2012 as well as from other samples collected at the center over several years. By tracking target site duplications in samples, the team could infer the evolution of drug resistance. “We are able to propose the exact historical molecular events underlying plasmid rearrangements which provide a basis for understanding how antibiotic-resistant strains change over time, with significant implications for combating plasmid-mediated antimicrobial resistance,” they write.
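When a transposon inserts into DNA, it typically duplicates a few bases of the target site, leaving short identical repeats on either side of the element. These target site duplications (TSDs) act as footprints of individual insertion events. Here is a minimal Python sketch of checking for a TSD, assuming the element’s boundaries are already annotated and the expected repeat length is known; the function name and toy sequence are illustrative only, not the authors’ pipeline.

```python
def has_tsd(seq, element_start, element_end, tsd_len):
    """Check for a target site duplication (TSD) around a mobile element.

    seq           -- plasmid sequence as a string (treated as linear here)
    element_start -- 0-based index of the element's first base
    element_end   -- 0-based index one past the element's last base
    tsd_len       -- expected TSD length in bp (depends on the element)

    Returns True if the tsd_len bases immediately upstream of the element
    are identical to the tsd_len bases immediately downstream.
    """
    if element_start < tsd_len or element_end + tsd_len > len(seq):
        return False  # not enough flanking sequence to compare
    left = seq[element_start - tsd_len:element_start]
    right = seq[element_end:element_end + tsd_len]
    return left == right


# Toy example: a 5-bp repeat GATTC flanks an 8-bp stand-in "element"
plasmid = "CCCGATTC" + "TTGGCCAA" + "GATTCAAA"
print(has_tsd(plasmid, 8, 16, 5))  # True
```

Real plasmid analyses also have to handle circular coordinates, imperfect repeats, and cases where later rearrangements have moved the two copies of a TSD apart.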
Of course, that raises the question of which evolutionary mechanisms are causing the changes they characterized. The scientists found two mobile element types — IS26 and Tn3 transposons — that appeared to be driving drug resistance evolution in the K. pneumoniae samples studied. However, they note, there was no clear explanation for that discovery. “This analysis revealed that plasmid reorganizations occurring in the natural context of colonization of human hosts were overwhelmingly driven by genetic rearrangements carried out by replicative transposons working in concert with the process of homologous recombination,” the authors report, adding that perhaps this kind of information will one day inform new approaches to combat antibiotic resistance.
“The rapidly decreasing cost of high-quality, long-read sequencing will enable the type of analysis described here to be applied more broadly to the problem of how resistance plasmids evolve in patients, hospitals, and the environment,” the scientists conclude.
Now, with the Sequel System and the recently released protocols for multiplexed microbial genome assembly (template preparation and data analysis), this application is even more accessible for the scientific community.
A recent Nature publication from a large team of scientists in Europe, Canada, and the US reports the use of SMRT Sequencing to elucidate the genome of Fragilariopsis cylindrus, a single-celled eukaryotic diatom adapted to living in polar waters of the Antarctic Ocean. The work has implications for the biotechnology industry, which looks to extremophiles as a potential source of important enzymes.
“Evolutionary genomics of the cold-adapted diatom Fragilariopsis cylindrus” comes from lead author Thomas Mock, senior author Igor Grigoriev, and many collaborators at the University of East Anglia, Earlham Institute, Joint Genome Institute, University of California, Berkeley, and several other organizations. The project investigated how this diatom evolved to thrive in its extreme environment, frequently living in high salinity directly under sea ice.
To achieve this, the team started by sequencing the F. cylindrus genome using both Sanger and PacBio systems. For SMRT Sequencing, the scientists produced two libraries with different insert sizes (4 kb and 20 kb) and ran seven SMRT Cells, which yielded 63-fold coverage of the genome. The team used the diploid-aware FALCON assembler, which generated a 59.7 Mb assembly with 745 primary contigs. Comparing it with the Sanger assembly, the scientists determined that the PacBio assembly was highly accurate in both sequence (ranging from 99.65% to 100%) and structure (validated against fosmid sequences).
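Coverage and assembly size together imply roughly how much sequence the seven SMRT Cells produced. The back-of-envelope Python sketch below assumes the 63-fold figure was calculated against the 59.7 Mb assembly size; the paper may have used a different genome size estimate or counted only a subset of reads, so treat the result as approximate.

```python
def implied_yield(genome_size_bp, fold_coverage, n_cells):
    """Back-of-envelope sequencing yield implied by a coverage figure.

    Returns (total bases, mean bases per SMRT Cell), assuming coverage was
    computed against genome_size_bp.
    """
    total = fold_coverage * genome_size_bp
    return total, total / n_cells


total, per_cell = implied_yield(59.7e6, 63, 7)
print(f"{total / 1e9:.2f} Gb total, ~{per_cell / 1e6:.0f} Mb per SMRT Cell")
# 3.76 Gb total, ~537 Mb per SMRT Cell
```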
F. cylindrus is characterized by highly divergent alleles, which represent nearly a quarter of its genome. An analysis of those genes determined that the “divergent alleles were differentially expressed across environmental conditions, including darkness, low iron, freezing, elevated temperature and increased CO2,” the scientists report. “Alleles with the largest ratio of non-synonymous to synonymous nucleotide substitutions also show the most pronounced condition-dependent expression, suggesting a correlation between diversifying selection and allelic differentiation.” The team hypothesized that allele diversification took place after the last glacial period and has been maintained because the variety of gene content allows for rapid adaptation to a changing environment.
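The ratio in the quote is conventionally written ω = dN/dS. As a rough guide only (the post does not say which estimator the authors used), its simplest uncorrected form compares the fraction of nonsynonymous sites that differ between two alleles with the fraction of synonymous sites that differ:

```latex
\omega = \frac{d_N}{d_S} \approx \frac{p_N}{p_S} = \frac{N_d / N}{S_d / S}
```

Here $N_d$ and $S_d$ are the observed nonsynonymous and synonymous differences, and $N$ and $S$ are the numbers of nonsynonymous and synonymous sites. Values of ω above 1 are usually read as a sign of diversifying selection, which is why alleles with the largest ratios fit the authors’ link between selection and allelic differentiation. Published analyses normally apply a multiple-hit correction (for example Jukes-Cantor) or use likelihood methods rather than this raw form.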
The Earlham Institute issued a press release about the project, including this comment from scientist Pirita Paajanen: “This is the first time at EI that a genome of this type was assembled into chromosomes. It is only very recently that the technology has been developed to cope with such a highly heterozygous organism and the data show that this diatom does actually have a large amount of variation within their genes.”
The second day of AGBT featured a number of great talks and posters, and also our user workshop called “Covering All the Bases with SMRT Sequencing.” We’d like to thank the hundreds of attendees who crowded into the room for this event!
The workshop kicked off with Nezih Cereb, CEO of Histogenetics, who spoke about using long-read PacBio sequencing for typing HLA class I and II genes, which are important for applications such as matching organ transplants to recipients. The company has been performing industrial-scale SMRT Sequencing since it first acquired its PacBio RS II instrument, but recently increased capacity further by adding the Sequel System. Histogenetics types thousands of HLA samples each day with these instruments, and Cereb noted that SMRT Sequencing is essential for its ability to phase mutations in the HLA alleles. This layer of information cannot be accessed with short-read or Sanger technologies but is critical for understanding an individual’s immune function. Cereb told attendees that the Sequel System has performed so well that his company acquired three more of these sequencers to boost HLA typing throughput and allow new investigations into other complex regions, such as KIR. He concluded by saying that sequencing the full HLA genes is now the gold standard for typing samples.
Next up was Margaret Roy from Calico Life Sciences presenting results of a de novo genome sequence for the naked mole rat. The rodent has a remarkably long life span and resistance to cancer, both of which make it an appealing model to the Calico team. There were two existing assemblies for it, but both had been done with short-read sequencing and were highly fragmented. Roy and her team prepared SMRT Sequencing libraries size-selected for fragments of at least 25 kb and at least 45 kb, and sequenced them on both the PacBio RS II and Sequel Systems. While the assembly is not yet complete, Roy told attendees that its metrics look good: the 2.5 Gb genome is represented in just 493 contigs, with the largest contig covering 71 Mb. The team is working to add scaffolding data from BioNano Genomics and will integrate additional data sets in the near future to achieve a high-quality final assembly for annotation. Roy said that the Sequel has been a welcome addition for the project, because lab members can load a tenth of the library onto a SMRT Cell and get five times the amount of data they would have with the PacBio RS II system. Once the project is complete, Roy said, she anticipates publishing the genome and releasing it publicly.
The final workshop speaker was our CSO, Jonas Korlach, who offered a look at the Sequel System’s current performance and the improvements in the works. He showed a map of PacBio sequencer installations, noting that there are now about as many Sequel Systems in labs as PacBio RS II systems. He also reviewed some exciting applications of SMRT Sequencing, including shotgun metagenomics, human de novo assemblies, Iso-Seq analysis, and more. Looking ahead, Korlach said users can expect the Sequel System throughput to double this year and again next year, followed by a new SMRT Cell with eight times the number of zero-mode waveguides by the end of 2018. In total, this will enable a 30-fold increase of throughput, which should make it possible to complete a de novo human assembly for about $1,000. For only structural variation coverage, the cost could be as little as $200 per person.
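The 30-fold figure comes from multiplying the roadmap steps rather than adding them. A quick Python check, assuming each gain applies on top of the previous ones:

```python
from math import prod

# Fold-increase steps from the roadmap described in the talk
roadmap_steps = {
    "throughput doubles this year": 2,
    "throughput doubles again next year": 2,
    "new SMRT Cell with 8x as many ZMWs": 8,
}

total_fold = prod(roadmap_steps.values())
print(f"{total_fold}-fold")  # 32-fold
```

The product is 32, which the talk rounded to about 30-fold.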
In other conference talks during the day, Emma Teeling from University College Dublin made a compelling case for her unique study of bats. These organisms have not been well represented by the genomics community, but she expressed hope that it would be possible in the not-too-distant future to achieve chromosome-level assemblies for bats using long-read sequencing and other advanced technologies. Separately, Mark DePristo from Google’s Verily Life Sciences unit presented results of a deep learning tool trained to spot variants from images of sequence reads. DeepVariant, which won an award for accurate SNP calling in the PrecisionFDA competition, has been used to call variants in PacBio data with excellent results; DePristo noted that it’s one of the few diploid variant callers available.
In one of the last talks of the day, Mike Schatz from Johns Hopkins University and Cold Spring Harbor Laboratory shared results of sequencing, assembling, and analyzing personalized, phased diploid genomes with Illumina, 10x, and PacBio data. The PacBio and 10x assemblies were most contiguous, but Schatz pointed out that the 10x assembly had many unknown bases, whereas the PacBio assembly was made up of complete contigs. Those platforms also led to more structural variant calls than the short-read data, but the 10x approach was not able to detect the range of variants that SMRT Sequencing could, missing long insertions and other events. Schatz reported a large and unexpected number of translocations identified with PacBio data, noting that follow-up studies confirmed they were real. He also said that SMRT Sequencing data has the best concordance, outperforming both Illumina and 10x results. His talk really got the audience excited about the power of using personalized diploid genomes to mine for structural variation and understand its effects on regulation.
It’s hard to believe there’s only one day left. We’re already wearing down but eager to see what else AGBT has in store for us!
We’re thrilled to be at the AGBT conference this week, taking place this year in Hollywood, Fla. On the first full day of the meeting, everyone’s mandatory wristbands look shiny and new (we suspect by the end of the week they’ll be as wilted as us). And we’ve even been getting that work/life balance down thanks to some beach volleyball with our friends from BioNano Genomics and Swift Biosciences.
At the opening session on Monday, Eimear Kenny from the Icahn School of Medicine at Mount Sinai showed why it’s essential to fully understand natural genetic diversity in a fascinating talk about a deep analysis of hospital patients from across New York City. She offered one example of a variant that’s incredibly rare in most populations, but is found in about 2% of people of Puerto Rican descent and is likely pathogenic. We were delighted to hear the presentation, which fits nicely with the efforts of many PacBio users to generate population-specific reference genomes to help characterize the full breadth of natural genetic variation.
Tuesday’s talks included an infectious disease theme, with several speakers supporting the idea that the global incidence of viral outbreaks is rising. These talks made it clear that the optimal response to these health threats involves having complete genome assemblies, including accessory genomes, where genes associated with antimicrobial resistance are often found.
The infectious disease theme continued with a talk from the Broad Institute’s Daniel Neafsey, who presented results of an ongoing effort to produce a high-quality genome assembly for Aedes aegypti, the mosquito responsible for transmitting Zika and other viruses. As part of the Aedes Genome Working Group, Neafsey’s team is working to replace a 10-year-old Sanger assembly of this mosquito with a PacBio-based assembly. The project is not yet complete, but already represents a big step forward despite the organism’s high heterozygosity and highly repetitive content: the new assembly used FALCON-Unzip to reduce the number of contigs by at least 10-fold and boost the contig N50 to nearly 2 Mb. Most importantly, gene content is far more complete in the new assembly. Neafsey offered the example of a sex determination gene, Nix, which was absent from the 2007 assembly but was found in the new assembly and could be essential for CRISPR-based efforts to control mosquito populations. Neafsey also showed the results of several scaffolding technologies — including Dovetail Genomics, 10x Genomics, and BioNano Genomics — and noted that the final result should include data from all three approaches. In the next few weeks, the team will integrate all this information, remove homologous contigs, and complete annotation. Neafsey noted that he hopes this work inspires other research communities to improve older draft assemblies they’ve been working with for other organisms. If you’re attending AGBT, you can see more assembly details at poster #1105.
Speaking of major genome improvements, Jason Underwood from the University of Washington and PacBio spoke about long-read genome assemblies of primates, as well as a new approach to understanding transcripts. The standard pipeline for building better primate assemblies in his lab involves PacBio sequencing, FALCON assembly, and scaffolding with BioNano Genomics mapping. Structural variants are then called with the SMRT-SV protocol. This has resulted in drastic improvements, such as a 560-fold more complete and contiguous gorilla assembly. Underwood also spoke about projects designed to understand human-specific variation that can be identified with these improved resources. Using the Iso-Seq method, the team is sequencing full-length cDNAs; in one recent study, they used the Sequel System to generate 118,000 full-length reads from a single SMRT Cell. They also developed Iso-Cross, through which they compare transcripts from two closely related organisms to each other (such as human and chimpanzee); the ones that map better to the organism they came from are more likely to have functional and specific roles. One example Underwood showed was a 1.9 kb human-specific deletion that removes an exon found in our close primate relatives. He told attendees that their investigation has turned up 200 human-specific variants that seem likely to have functional importance.
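The post does not describe how Iso-Cross scores transcripts, so the following is only a hypothetical illustration of the idea it describes: align each full-length transcript to both its own species’ genome and a close relative’s genome, then flag transcripts that match their own genome noticeably better. The `TranscriptHit` record, the percent-identity inputs, and the 1-point threshold are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class TranscriptHit:
    """Hypothetical alignment summary for one full-length transcript."""
    name: str
    identity_own: float    # % identity to the genome it was sequenced from
    identity_other: float  # % identity to the related species' genome


def lineage_biased(hits, min_gap=1.0):
    """Return names of transcripts whose identity to their own genome
    exceeds their identity to the related genome by at least min_gap
    percentage points (candidates for species-specific roles)."""
    return [h.name for h in hits if h.identity_own - h.identity_other >= min_gap]


hits = [
    TranscriptHit("tx1", 99.9, 98.0),
    TranscriptHit("tx2", 99.8, 99.7),
    TranscriptHit("tx3", 99.5, 95.0),
]
print(lineage_biased(hits))  # ['tx1', 'tx3']
```

A fuller comparison would also weigh alignment coverage and splice junctions, since a lineage-specific exon loss like the 1.9 kb deletion described above shows up as a missing segment rather than a small drop in identity.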
We invite all AGBT attendees to visit us in suite 317, where our fun Lego station lets everyone build their own plastic doppelganger!
We’re heading cross-country to the Advances in Genome Biology and Technology (AGBT) Meeting starting Monday in sunny Hollywood, Florida. There will be several great opportunities to learn about how scientists are using SMRT Sequencing and the Sequel System throughout the meeting, and we hope you have time to enjoy at least a few.
We’ll be hosting a one-hour workshop on Wednesday, February 15th, at 3:30 pm in the Grand Ballroom. Speakers will include Calico’s Margaret Roy, sharing her experience using the Sequel System for de novo sequencing of the naked mole-rat genome; Nezih Cereb of Histogenetics, discussing high-throughput HLA Class I whole-gene and HLA Class II long-range typing using targeted sequencing; and our own CSO Jonas Korlach, presenting the road map for Sequel performance improvements. The workshop won’t be live streamed, but we will be recording talks to share afterward and we will be blogging from the event. We’ll also be providing coffee and dessert in case you need an afternoon energy boost!
We’re looking forward to several program presentations that will include SMRT Sequencing results. On Tuesday, February 14th, don’t miss talks in the concurrent general biology session from Jason Underwood and Daniel Neafsey. The next night, Mark DePristo and Mike Schatz will speak in the informatics session. In addition, there will be several posters demonstrating SMRT Sequencing technology in various applications. Check out the complete list of our AGBT 2017 presentations and activities.
As usual, we’re proud to be sponsoring AGBT and helping to facilitate this event. Don’t forget to stop by our hospitality suite (#317) to build your own minifigure doppelganger and pose them with our Sequel System in the ‘lab’. We hope to see you there!
If you didn’t get to the Plant and Animal Genome meeting this year, you missed a great workshop featuring SMRT Sequencing users and the fascinating projects they’re working on across plant, animal, agricultural, and conservation sciences and human health. Here are quick summaries of each talk, with full video recordings available for more detail.
Our event kicked off with PacBio CSO Jonas Korlach welcoming attendees and delivering an update on the genomics community’s impressive advances with SMRT Sequencing. There are now more than 2,000 publications citing the PacBio long-read technology — a rate of about 30 per week. He also spoke about improvements to the platform, including better assembly tools such as FALCON and FALCON-Unzip as well as the recently released Sequel System chemistry that delivers 5-8 Gb of data per SMRT Cell and significantly reduces DNA input requirements. These improvements make it possible to run a broader range of projects on the Sequel System.
Representing the plant side of the conference, the University of Arizona’s Rod Wing spoke about using SMRT Sequencing to produce high-quality genome assemblies for several varieties of rice. He’s undertaken this project to help develop higher-yielding, hardier strains of rice to feed the rapidly growing global population. In the work he presented, his team sequenced two parents of a common hybrid strain, generating the highest-quality publicly available assemblies of Indica rice ever produced. His data illustrated how long-read PacBio sequencing allows for excellent contiguity in assemblies, with one strain represented in just 19 contigs and a strain featuring eight complete chromosomes, including centromeres. Wing also included data from a third genome being sequenced with the Sequel System.
Other speakers focused on animals or insects. Rebecca Johnson from the Australian Museum Research Institute reported on the de novo genome assembly of a koala, a genome about 3.6 Gb in size. The work was undertaken due to conservation concerns for these marsupials, which have many biologically interesting features such as a gestation period of just 35 days. With SMRT Sequencing, her team produced what Johnson called the best marsupial assembly to date; analysis showed that only 5% of BUSCO genes were not represented. The assembly allowed her team to study lactation-related genes that are important for koala development, as well as immune elements (for example, koalas have been found to harbor novel antimicrobials that show effectiveness against drug-resistant bacteria). Johnson’s genome work continues, and she told attendees that she fully expects to achieve a chromosome-level assembly for the animal.
Richard Kuo from the Roslin Institute spoke about using the Iso-Seq method to study brain and embryo tissues from chicken. He said this approach addresses the limitations of other gene expression methods that miss long non-coding RNAs (lncRNAs), the full universe of isoforms, and more. By producing full-length transcripts from the transcription start site to the transcription end site without any assembly required, SMRT Sequencing is ideal for characterizing the transcriptome. With the chicken project, Kuo evaluated the importance of protocols such as normalization and using 5’ cap selection, both of which provided richer data sets. Kuo told attendees that using the Iso-Seq method allows scientists to immediately leapfrog to the best available annotations, producing more information on the transcriptome than ever.
Rockefeller University’s Erich Jarvis presented an update on his work with bird genomes for the B10K project. He offered a comparison of assembly techniques for the hummingbird genome, which has been analyzed with everything from short-read sequencers to genome mapping tools. The PacBio-powered assemblies consistently ranked as the highest quality, with the fewest contigs and the best accuracy. He also presented four genes associated with vocal learning that were complete in the PacBio assembly, showing the importance of incorporating long reads into the assembly.
Representing the insect front, Ben Matthews from Rockefeller University reported on a genome assembly project for Aedes aegypti, the common vector for Zika, dengue, and yellow fever. Noting that mosquitoes are believed to be the deadliest creatures in the world, he said that a clear and complete understanding of their genomes will be essential to thwarting public health threats. The original assembly of the Aedes aegypti genome is nine years old and hasn’t been improved much, so Matthews and his colleagues turned to SMRT Sequencing for a new version of the same strain. The effort yielded a much better assembly, boosting the contig N50 from 83 kb to about 2 Mb. Further analysis showed that 7,500 transcripts map to the new assembly but not to the old one, indicating a significant amount of new gene content. Matthews anticipates that this new assembly will replace the old one for the entire mosquito research community and serve as an important resource for understanding resistance to repellents and for designing guide RNAs for CRISPR genome modification to constrain population growth.
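For readers less familiar with contig N50, the metric used to compare the old and new mosquito assemblies: sort contigs from longest to shortest and add up their lengths; the N50 is the length of the contig at which that running total first reaches half of the total assembly size, and the L50 is the number of contigs it took to get there. The Python sketch below is a minimal illustration of that standard definition only; the contig lengths in the usage line are made-up toy values, not data from the Aedes aegypti assembly, and individual assembly tools may handle edge cases slightly differently.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bp.

    N50: length of the contig at which the running total of lengths,
    summed from longest to shortest, first reaches half the assembly size.
    L50: number of contigs needed to reach that point.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running_total = 0
    for count, length in enumerate(lengths, start=1):
        running_total += length
        if running_total >= half_total:
            return length, count


# Toy example (hypothetical lengths): total 5 Mb, so half is 2.5 Mb.
# 2.0 Mb alone falls short; 2.0 + 1.5 = 3.5 Mb crosses the halfway mark.
print(n50_l50([200_000, 2_000_000, 800_000, 1_500_000, 500_000]))  # -> (1500000, 2)
```

Longer contigs push the N50 up and the L50 down, which is why a jump from 83 kb to about 2 Mb signals a far less fragmented assembly.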
Many thanks to all of our workshop speakers for a great event!
Scientists from the University of Washington and McDonnell Genome Institute recently reported in Genome Research the results of an in-depth assessment of structural variation in the human genome using SMRT Sequencing technology. They found far more variation than expected and suggest using this approach to establish a comprehensive database of structural variants that would aid future studies.
“Discovery and genotyping of structural variation from long-read haploid genome sequence data” comes from lead author John Huddleston, senior author Evan Eichler, and collaborators. The team fully sequenced two haploid human cell lines (CHM1 and CHM13) with SMRT Sequencing to greater than 60-fold coverage each. Then, using an assembly-based approach called SMRT-SV, they mined the data for structural variants ranging in size from as small as 2 bp to as large as 28 kb. “While our understanding of single-nucleotide variants (SNVs) is beginning to approach nearly complete sensitivity for the euchromatic portion of the genome, structural variants or SVs … have fared far worse because of their stronger association with repetitive DNA,” the authors report. “We sought to build a verifiable gold standard for human genetic variation by first eliminating the complexity of diploidy and then applying an alternate sequencing technology that improves sensitivity over repetitive regions of the human genome.”
The team conducted a thorough investigation of insertions, deletions, and other types of structural variation. Across the board, they discovered significantly more variation than anticipated, detecting five times as many indels (>7 bp) and structural variants (<1 kb) as other methods could find. “The theoretical amount of genetic variation in a single human diploid genome far exceeds expectations established by previous whole-genome studies,” they write. “Although this represents only a fraction of variant sites between two haplotypes, this missing variation accounts for most of the variant base pairs between two human genomes.”
Much of the variation was associated with DNA that was repetitive, GC-rich, or low complexity — all characteristics known to challenge short-read sequencers. “Long-read sequence technology can access these regions because alignments are sufficiently anchored within the flanks,” Huddleston et al. report.
The SMRT-SV protocol resolved more than 460,000 structural variants, nearly 90% of which had been missed even by highly regarded initiatives such as the 1000 Genomes Project. The approach was validated by targeting specific SVs for follow-up analysis as well as by studying 30 other human genomes to confirm the presence of these variants. Results indicate that “the majority of missed variants we discovered are common variants in the human population,” the authors report.
To learn more about the quest for structural variation going back to the Human Genome Project, check out this recent interview with Evan Eichler from Mendelspod.
A new Nature paper from scientists at the Wellcome Trust Sanger Institute and other institutions delves into two Plasmodium genomes and reveals novel information about how these parasites have evolved. SMRT Sequencing was used to generate a reference genome and high-quality draft assembly for the organisms, providing a clear picture of species that have previously been difficult to characterize.
From lead author Gavin Rutledge, senior author Thomas Otto, and collaborators, “Plasmodium malariae and P. ovale genomes provide insights into malaria parasite evolution” reports that prior studies have “been hampered by a lack of genetic information” for species responsible for many malaria infections.
The scientists deployed SMRT Sequencing to generate a reference-grade genome assembly for P. malariae; this new resource, they report, surpasses the quality of previous draft genomes. In terms of contiguity, the new assembly has just 63 scaffolds compared to more than 7,000 in previous drafts, with an N50 of 2.3 Mb, a dramatic improvement over the prior N50 of 6.4 kb. They also produced high-quality draft genomes for P. ovale and a parasite known as “P. malariae-like.”
These assemblies allowed for a better understanding of the phylogeny of these species, in some cases altering previous versions of the Plasmodium family tree. One such change indicates that the malaria parasite that infects rodents is more similar to P. ovale than to other strains that infect humans, suggesting that an ancestral parasite must have switched from primate hosts to rodents. The scientists report that genes related to the invasion process evolve rapidly, contributing to host-specific adaptations. In addition, they write, “The relative dating of speciation events suggests that the move between non-human primates and humans occurred at approximately the same time in two well-separated lineages, suggesting that a common historical event may have promoted host switching and speciation in Plasmodium at the time.”
The team also found previously undiscovered gene content. “The most notable difference in the subtelomeres of P. malariae is the presence of two large gene families that were not apparent in earlier partial genome data,” they note. Proteins associated with these genes are thought to be important for binding to or entering red blood cells.
While this study represents an important advance in elucidating the parasite family with the new reference assembly, the authors call for continued work on this front. “Owing to the importance of rapidly evolving multigene families and genome structure, high quality genomes for all human infective species of Plasmodium are desperately needed,” they conclude.
We were delighted to have so many ASHG attendees join our workshop, titled “Discovering and Targeting Causative Variation Underlying Human Genetic Disease Using SMRT Sequencing.” If you missed it, check out the video recordings, or read our summary below.
The event featured three impressive customer presentations, beginning with Euan Ashley from Stanford University. In his presentation, titled “Towards Precision Medicine,” he started off by acknowledging that “genomic medicine is here” and described how genomes and exomes are now sequenced routinely, with impressive genetic discovery results. For patients with rare and undiagnosed disease, Ashley reported that current sequencing efforts now solve approximately 30% of cases, a real improvement over years past. Despite these gains, he said there is still a need for new approaches to make sense of the remaining 70%. “If you can’t see it, you can’t call it,” “the genome is complex,” and “repeat tracts cause disease”: these are a few of the reasons Ashley gave for why current short-read NGS methods sometimes fail to match the high accuracy established by Sanger sequencing for calling known causal pathogenic variants in Precision Medicine studies.
Ashley then told attendees about the unique attributes of PacBio SMRT Sequencing that make it very well suited to address these challenges. He described how longer read lengths expand both the size and types of variants that can be studied. Toward that end, he said that accurately calling structural variants (SVs) is a major need, since short-read technologies work well for single nucleotide variants (SNVs) but not for longer variants. “Long read approaches reveal previously unseen structural variation,” he said, noting that this information is critical for research into repeat expansion disorders and other diseases tied to such variants.
In a first-of-its-kind study, Ashley described the results of a new low-fold long-read WGS method using the PacBio Sequel System (described in a recently posted bioRxiv preprint). He reported sequencing a translational research sample from his clinic to an average depth of 8.6-fold coverage. Following mapping and genome-wide SV calling, Ashley said, SMRT Sequencing allowed his team to identify six novel SVs occurring in OMIM genes in an individual with complex and varied symptoms. One gene was associated with Carney syndrome, which matched the person’s physiology and was later validated. He also called for the establishment of a massive SV resource, something like the ExAC repository, that would allow scientists to understand common and rare SVs and further facilitate discovery of causative pathogenic SVs in Precision Medicine studies. In separate work, his team used the Iso-Seq method with personalized haplotyping to determine how precision gene silencing could be used for people with hypertrophic cardiomyopathy.
Our next speaker, Melissa Laird Smith from the Icahn School of Medicine at Mount Sinai, spoke about “SMRT Sequencing as a Translational Research Tool to Investigate Germline, Somatic and Infectious Diseases.” In a fascinating and wide-ranging talk, she offered examples of how the Sequel System has been deployed at Mount Sinai for applications including pharmacogenomics, immune profiling, cancer profiling, and more. She cited Stuart Scott’s CYP2D6 work, which involves amplicon sequencing to understand an individual’s drug metabolism profile. Laird Smith said the team can now multiplex 384 samples on each SMRT Cell for 100-fold coverage on the Sequel System. She also presented work on Fabry’s disease spectrum, for which amplicon sequencing resolves phased mutations to make sense of the X-linked disease. In a personalized cancer therapy pipeline, low-coverage PacBio sequencing is used to validate somatic variants found in tumors. Finally, Laird Smith talked about immune profiling, where SMRT Sequencing of full-length single molecule VDJ sequences provides complete, accurate contigs of this highly variable and complex region.
In the final customer presentation, Michael Lutz from the Duke University Medical Center gave a talk entitled “Identification and Characterization of Informative Genetic Structural Variants for Neurodegenerative Diseases.” Focusing on Alzheimer’s disease, ALS, and Lewy body dementia, Lutz spoke about a recently published software tool that can be used in a pipeline with SMRT Sequencing data to find structural variant biomarkers. His team is particularly interested in short sequence repeats and short tandem repeats, which have already been implicated in neurodegenerative disease. In one example, they used SMRT Sequencing to characterize haplotypes of the low-complexity SNCA gene that could explain the differences between traditional Alzheimer’s and the Lewy body form of Alzheimer’s. In another project, Lutz used SMRT Sequencing to phase haplotypes across APOE alleles — something that wasn’t possible with short-read data — for insight into Alzheimer’s patterns of onset, severity, and more.
Lutz spoke in place of Allen Roses, who was originally scheduled to participate in the workshop but sadly passed away last month. Our CSO Jonas Korlach began the workshop with a tribute to the human genetics visionary, whose legacy in neurodegenerative diseases will be felt for decades to come.
Korlach also spoke about recent SMRT Sequencing updates, such as the Integrative Genomics Viewer update that is now optimized for PacBio data and the recently announced Sequel System chemistry (v1.2.1) release. The new chemistry offers a 50-fold reduction in DNA input requirements for 20 kb and 30 kb libraries, with SMRT Cell output ranging from 4 Gb to 8 Gb. He noted the recent data release of structural variation detected in the NA12878 genome, including many more insertions and deletions than short-read-based technologies were able to find. Korlach also congratulated the community of SMRT Sequencing users on their impressive publication rate: in 2016 alone, more than 1,000 papers were published citing the technology! Those include de novo human genomes, deep dives into structural variation and gene expression, plenty of novel findings in human genetic variation, and much more.
Thanks to everyone who participated in the event, and also to the great scientists everywhere who are applying SMRT Sequencing to any number of complex problems to reveal new information about our species and the world around us.
Looks like the sun will be shining on the annual Plant and Animal Genome (PAG) conference next week (despite the recent stormy weather in CA). We’re excited to be part of the event, which is always a great forum for cutting-edge scientific projects, new ways to apply technology, and networking with leaders in the plant and animal realm.
The 25th annual PAG conference will take place January 14-18 in San Diego and SMRT Sequencing will be featured in a variety of activities throughout the event. Visit us at booth #418 to learn more about SMRT Sequencing, the Sequel System, and our latest chemistry release. Attendees can also sign up for daily ‘expert hours’ featuring educational presentations on the Iso-Seq method, sample preparation, and data analysis.
As usual, we’ll be hosting a workshop at the meeting. “SMRT Sequencing for Complete Genomes” will run from 12:50 pm to 3:00 pm on Monday, January 16th in the conference hotel’s San Diego Ballroom. Speakers include Rebecca Johnson from the Australian Museum Research Institute, Erich Jarvis and Ben Matthews from Rockefeller University, the University of Arizona’s Rod Wing, Richard Kuo from the Roslin Institute and our own Jonas Korlach. Be sure to reserve your seat ahead of time or sign up to get the workshop recording if you can’t attend.
In addition, PAG attendees and any other scientists are welcome to join our SMRT Informatics Developers Conference on Wednesday, January 18th, from 12:00 pm to 4:30 pm (reserve your seat). It will also take place in the Town and Country Hotel, in the Sheffield/Hampton Ballroom. This collaborative event will aim to develop and improve data analysis tools for SMRT Sequencing data, with a focus on de novo genome assembly and Iso-Seq full-length transcript sequencing. Past events have been huge hits and have facilitated great advances in the analysis community. Speakers include:
- Sergey Koren, National Institutes of Health
- Jason Chin, PacBio
- Ben Rosen, United States Department of Agriculture
- Shaun Jackman, University of British Columbia
- Roberto Lleras, PacBio
- Kin Fai Au, University of Iowa
- Richard Kuo, Roslin Institute, University of Edinburgh
- Rachael Workman, Johns Hopkins University
- Richard Hall, PacBio
Lastly, we’re pleased to announce our fourth annual Plant and Animal SMRT Grant Program, which encourages scientists to tell us about their ‘most interesting genome in the world’ for a chance to win sequencing on the Sequel System. Entries are due by Jan. 31.
We hope to see you in San Diego!
We are pleased to announce the launch of a new version of our chemistry, SMRT Cells, and software for the Sequel System. The V4 software, V2 chemistry, and SMRT Cells tuned for the new sequencing chemistry kits will be available on January 23rd.
These new releases allow the system to achieve mean read lengths of 10-18 kb, with half of the data in reads >20 kb, and throughput of 5-8 Gb per SMRT Cell. This enhancement improves results for important applications such as structural variant detection, targeted sequencing, metagenomics, minor variant detection, and isoform sequencing. The software release includes updates to the base calling algorithm that increase accuracy, as well as new features designed for clinical research applications. In addition to the performance improvements, the Sequel System is now capable of loading 80 kb sequencing libraries.
This release improves users’ ability to perform low-fold structural variant detection and key targeted sequencing applications. For structural variant detection, users can now achieve the same or better results using, on average, half as many SMRT Cells as with the previously available chemistry. The longer reads provided by the new chemistry also enable the detection of larger structural variants; in particular, sensitivity for insertions larger than 5 kb increases 3-fold. For targeted sequencing, the new chemistry and software give users more flexibility. For example, for minor variant detection, customers can either gain detection sensitivity or reduce cost per sample through increased sample multiplexing.
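Planning how many SMRT Cells a structural variant project needs comes down to coverage arithmetic: mean coverage is total sequenced bases divided by genome size. The Python sketch below illustrates that back-of-the-envelope estimate under stated assumptions: the ~3.1 Gb human genome size and the 10-fold coverage target are illustrative choices, the 5-8 Gb figure is treated as per-SMRT Cell yield, and the function name is hypothetical. It ignores reads lost to filtering or mapping and does not model the read-length gains that drive the improved SV sensitivity, so treat its output as a rough lower bound rather than a PacBio planning tool.

```python
import math


def smrt_cells_needed(target_coverage, genome_size_bp, yield_per_cell_bp):
    """Rough lower bound on SMRT Cells needed to reach a target mean coverage.

    Mean coverage = total sequenced bases / genome size, so the bases
    required are target_coverage * genome_size_bp. Losses from filtering
    and mapping are ignored (hypothetical helper, for illustration only).
    """
    if min(target_coverage, genome_size_bp, yield_per_cell_bp) <= 0:
        raise ValueError("all inputs must be positive")
    bases_needed = target_coverage * genome_size_bp
    return math.ceil(bases_needed / yield_per_cell_bp)


HUMAN_GENOME_BP = 3_100_000_000  # assumption: approximate human genome size
TARGET_COVERAGE = 10             # assumption: illustrative low-fold SV target

for yield_gb in (5, 8):  # low and high end of the quoted 5-8 Gb throughput
    cells = smrt_cells_needed(TARGET_COVERAGE, HUMAN_GENOME_BP, yield_gb * 1_000_000_000)
    print(f"{yield_gb} Gb per cell -> {cells} SMRT Cells for {TARGET_COVERAGE}-fold coverage")
# -> 5 Gb per cell -> 7 SMRT Cells for 10-fold coverage
# -> 8 Gb per cell -> 4 SMRT Cells for 10-fold coverage
```

Rounding up with `math.ceil` reflects that partial SMRT Cells can’t be run; in practice, labs typically add margin for filtered reads.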
In a statement for the release, Kevin Corcoran, our Senior Vice President of Market Development, said, “This release is part of our continued commitment to increasing the performance of the Sequel System, and we are very pleased with the data we are seeing both internally and at our beta-test sites.”
We’re excited to be participating again in the Precision Medicine World Conference (PMWC), an independent conference series founded in 2009 and co-hosted with Stanford Health Care, UCSF, Intermountain Health, Duke University, and Duke Health. Considered to be the preeminent precision medicine conference, it attracts recognized leaders, top global researchers and medical professionals, and innovators across the healthcare and biotechnology sectors.
PMWC provides an exceptional forum for the exchange of information about the latest advances in technology (e.g. DNA sequencing technology), in clinical implementation (e.g. cancer and beyond), research, and all aspects related to the regulatory and reimbursement sectors.
From January 23rd to the 25th, some 1,300 attendees will descend on the Computer History Museum in Mountain View, Calif. The event will kick off the evening of January 22nd with a reception honoring Jennifer Doudna, who will receive the Luminary Award for her groundbreaking work on CRISPR-Cas9 genome editing technology, and James Allison, who will receive the Pioneer Award for his work on cancer immunotherapy through immune checkpoint blockade.
PacBio founder Stephen Turner will be speaking at PMWC about the advancements made possible by SMRT Sequencing. If you’ll be attending the meeting, be sure to check out the session “Advancing the Clinic with Emerging NGS Technologies” on January 25th at 10:30 am.
Registration for the meeting is still open. Use the code PacBio by January 15th to receive a 5% discount. We hope to see you there!
A new article in Drug Discovery & Development from Stuart Scott and Yao Yang at the Icahn School of Medicine at Mount Sinai offers a compelling look at enhanced analysis of the CYP2D6 gene. The article, “Long-Read CYP2D6 Sequencing Enables Full Gene Characterization and Novel Allele Discovery,” describes the Mount Sinai team’s efforts to provide better resolution of this region with SMRT Sequencing.
The gene is important for drug development because the enzyme it encodes is involved in metabolizing nearly a quarter of all drugs frequently prescribed today. According to the article, “Variants within this gene can help predict how patients respond to medications ranging from painkillers to antipsychotics, which makes CYP2D6 an essential gene to consider when implementing pharmacogenomics into clinical care.”
The scientists turned to PacBio long-read sequencing to characterize this genomic region because the gene includes many deletions, duplications, and other structural variants that would be challenging to capture with other technologies. To date, more than 100 alleles of the gene are known, and new ones are constantly being discovered. Most methods for querying the gene are limited by the number of alleles they can recognize or by their inability to capture complex structural variations and DNA duplications. SMRT Sequencing, however, allowed the scientists to detect all CYP2D6 alleles with great accuracy and to phase them.
While validating the new SMRT Sequencing pipeline for CYP2D6 analysis, the scientists discovered three novel alleles. “In fact, ~20% of the samples used to evaluate CYP2D6 SMRT sequencing were revised to either a non-genotyped or novel star (*) allele,” Scott and Yang write, “which highlights how long-read sequencing can reveal previously unrecognized variation in well-studied genes and specimens that were previously tested by other technologies.”
The authors conclude that establishing this approach for CYP2D6 analysis could address some of the consistency problems that have been seen in previous pharmacogenomics studies, which likely have been caused by different allele frequencies among patient populations. They recommend that long-read CYP2D6 analysis should be used for diverse populations in clinical trials to improve the community’s understanding of the natural variation in this gene across ethnicities. This information will be essential for developing better prescription and dosing guidelines for all patients.
A new publication from scientists at the University of California, Davis, and the USDA Agricultural Research Service presents important findings about a fungus that threatens global grape production. As part of the project, the team used SMRT Sequencing to generate a new assembly of the fungal genome, resulting in a more complete assembly than a previous short-read attempt.
“Condition-dependent co-regulation of genomic clusters of virulence factors in the grapevine trunk pathogen Neofusicoccum parvum,” published in Molecular Plant Pathology, comes from lead author Mélanie Massonnet, senior author Dario Cantu, and collaborators. The team was eager to determine why the wood-infecting Neofusicoccum parvum is so pathogenic and virulent.
The scientists had previously produced a genome assembly for the fungus using short-read data, but it was highly fragmented across more than 1,800 contigs. By contrast, the 43.7 Mb PacBio assembly they generated is represented in only 28 contigs, including one that fully covers the mitochondrial genome. More than half of the contigs had telomeric repeats at both ends, “suggesting that these contigs encompass complete chromosomes, telomere-to-telomere,” the authors write. An analysis found the assembly’s accuracy rate to be 99.99976%.
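Assembly accuracy is often expressed as a Phred-scaled quality value, QV = -10 log10(error rate), so it helps to translate the reported 99.99976% into that scale. The Python sketch below does only that arithmetic; it assumes the accuracy figure applies uniformly across the whole 43.7 Mb assembly, which is a simplification, and the helper name is hypothetical rather than part of the authors’ analysis.

```python
import math


def phred_qv(accuracy):
    """Convert a fractional accuracy (e.g. 0.9999976) to a Phred-scaled QV."""
    if not 0 < accuracy < 1:
        raise ValueError("accuracy must be strictly between 0 and 1")
    error_rate = 1 - accuracy
    return -10 * math.log10(error_rate)


ACCURACY = 0.9999976           # 99.99976%, as reported for the N. parvum assembly
ASSEMBLY_SIZE_BP = 43_700_000  # 43.7 Mb PacBio assembly

print(round(phred_qv(ACCURACY), 1))                    # -> 56.2
# Assuming errors are spread evenly, the implied residual error count:
print(round((1 - ACCURACY) * ASSEMBLY_SIZE_BP))        # -> 105
```

In other words, an error rate of 2.4 × 10⁻⁶ corresponds to a QV of roughly 56, or on the order of 100 residual base errors across the entire genome under the uniform-error assumption.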
To understand the differences between the new long-read assembly and the existing short-read one, the team used nucmer and Assemblytics. These analyses showed that repeat reconstruction had been a problem in the short-read assembly, where repetitive regions were consistently reported as shorter than PacBio long-read sequencing revealed them to be. More than 180 sites, totaling 113 kb, were completely missing from the short-read assembly, and structural variation was less likely to be detected with the short-read data.
With this high-quality genome resource as a foundation, the scientists were able to delve into a detailed transcriptome analysis. “Co-expressed gene clusters were significantly enriched not only in genes associated with secondary metabolism, but also with cell wall degradation, suggesting that dynamic co-regulation of transcriptional networks contribute to multiple aspects of N. parvum virulence,” the scientists report. In the majority of these clusters, genes had common motifs in their promoter regions, suggesting that co-regulation is controlled by common transcription factors.
While these findings are important on their own, the scientists underscore the need for additional studies. “Understanding how functions that lead to colonization of certain cell types/tissues, and the corresponding fungal genes activated during subsequent degradation of such host tissues, may help us understand mechanism(s) of cultivar resistance and interactions within the trunk-pathogen community,” they conclude.
To hear more great research from plant and animal scientists using SMRT Sequencing, sign up to attend or receive the recording of our PAG 2017 workshop.
We’re pleased to announce the winner of this year’s SMRT Grant, which launched during the American Society for Microbiology annual meeting this summer. The grant program, co-sponsored by PacBio and the Institute for Genome Sciences (IGS), was very competitive, with over 100 submitted proposals. From this broad range of entries, our judges faced quite a task choosing just one recipient for the grant.
Congratulations to Jessica Sieber from the University of Minnesota Duluth, who impressed reviewers with her proposal, “Metagenomic analysis of the gut microbiota of the 13-lined ground squirrel, a model fat storing hibernator.”
Ground squirrels have been models for human health conditions from diabetes and obesity to longevity and hypothermia. These particular squirrels are scientifically interesting because they almost triple their weight before going into a six-month hibernation, during which they consume nothing. Sieber notes that the hibernation process involves reducing the squirrel’s body temperature to 4 degrees Celsius. While that should be a challenging environment for the animal’s gut microbes, in fact they appear to thrive and may be responsible for folate production to protect the squirrel’s brain during hibernation. A deeper understanding of the role these microbiota play in this process may have downstream implications for human health.
Sieber’s project involves using SMRT Sequencing to produce a high-resolution picture of these gut microbial communities, including how they withstand the cold hibernation temperature. We look forward to learning about the new insights she discovers as a result of this grant!
Thank you to all of the submitters who participated in the grant competition. We look forward to a number of exciting new projects in the coming months!
We recently co-sponsored a webinar with Springer Nature, and if you missed it live, you can now register to watch the recording. Moderated by Nature Publishing Group’s Jayshan Carpen, the webinar is entitled “Reveal hidden genetic variation by combining long-read target capture with SMRT Sequencing” and features several terrific speakers. We’d like to thank Tetsuo Ashizawa from Houston Methodist Research Institute, Melissa Laird Smith from the Icahn Institute for Genomics and Multiscale Biology at Mount Sinai, and our own Meredith Ashby for taking the time to present fascinating data and answer audience questions.
The webinar kicks off with a talk from Ashby, a scientist at PacBio, who discusses the use of targeted capture approaches with long-read SMRT Sequencing. Noting that long reads are necessary for spanning large genomic elements such as insertions and deletions, she shares two case studies to illustrate the process and findings. In one, scientists in Australia used long-read sequencing with Roche NimbleGen capture probes to study transcripts of relaxin genes in samples from both cell lines and prostate cancer patients. They discovered new isoforms, including two fusion genes that could easily be missed with other methods. In the second project, scientists at Baylor College of Medicine developed a large-insert targeted sequencing protocol (called PacBio-LITS) to study Potocki-Lupski syndrome, a rare disorder caused by a duplication event on chromosome 17. Their work uncovered possible rearrangement mechanisms, suggesting opportunities for even more long-read-powered discoveries in this area.
Next, Laird Smith, the Icahn Institute’s assistant director of technology development, presents data from long-read studies of the IGH locus, a remarkably complex region that encodes the VDJ gene segments which are recombined during the adaptive immune response. Nearly half of the IGH region falls in a segmental duplication, making it quite difficult to sequence. Her team is using long-read sequencing to generate fully resolved haplotypes of the IGH locus from ethnically diverse individuals. They have also developed an oligo-based enrichment and long-read sequencing approach that should make it more straightforward to interrogate this challenging genomic region and generate results that include large structural variants.
Finally, Ashizawa, who is Director of the Neuroscience Research Program at Houston Methodist, speaks about ATTCT repeat expansions in the ATXN10 gene, which cause spinocerebellar ataxia type 10 (SCA10). He describes a novel enrichment method that uses CRISPR-Cas9 to target the repeat expansion without the need for amplification. Paired with long-read sequencing, this approach has allowed his team to span extremely long repeat regions while identifying interruption sequence motifs associated with distinct, epilepsy-linked clinical phenotypes. Based on detailed work with family samples, Ashizawa was also able to trace the likely origin of the initial SCA10 mutation, which seems to have occurred first in Asia.
The speakers also respond to audience questions about sequencing GC-rich regions, sample preparation details, read length statistics, and more. This Q&A session nicely expanded on the earlier examples to show how SMRT Sequencing can be combined with capture techniques for an economical means of querying specific regions in the human genome.