This blog features voices from PacBio — and our partners and colleagues — discussing the latest research, publications, and updates about SMRT Sequencing. Check back regularly or sign up to have our blog posts delivered directly to your inbox.
To understand the epigenetic regulation of brain function and behavior, scientists are turning to ants. To understand the ants, they are applying the accurate, long reads of SMRT Sequencing.
While the genetic code of many types of ant has been combed through thanks to several genomes assembled through whole-genome shotgun sequencing, there have only been brief glimpses and guesses regarding gene regulation. Existing assemblies are highly fragmented drafts, making epigenetic studies nearly impossible.
Eager to determine the epigenetic changes responsible for phenotypic and behavioral plasticity in Camponotus floridanus and Harpegnathos saltator ant species, a team of researchers from the Epigenetics Institute of the University of Pennsylvania’s Perelman School of Medicine used SMRT Sequencing to de novo assemble the two genomes, which had been previously sequenced using short reads.
Improved genome continuity led to comprehensive annotations of both protein-coding and non-coding RNAs, and answered some questions about the differential gene expression that allows some worker ants to become acting queens in their colonies.
In a paper published in Cell Reports, first author Emily J. Shields, lead author Roberto Bonasio and their collaborators described how they solved some mechanistic mysteries through PacBio long-read sequencing.
Harpegnathos worker ants are characterized by their unique reproductive and brain plasticity that, in the absence of a queen, allows some of them to transition to a queen-like phenotypic status called “gamergate,” which is accompanied by major changes in brain gene expression.
Previous work by the group in Harpegnathos and in the more conventional Florida carpenter ant Camponotus floridanus had suggested that epigenetic pathways, including those that control histone modifications and DNA methylation, might be responsible for differential deployment of caste-specific traits; pharmacological and molecular manipulation of histone acetylation has been shown to affect caste-specific behavior in Camponotus ants, suggesting a direct role for epigenetics in their social behavior.
“Although the molecular mechanisms by which environmental and developmental cues are converted into epigenetic information on chromatin remain subject of intense investigation, it has become clear that non-coding RNAs play an important role in mediating this flow of information,” the authors write.
In particular, they were interested in long non-coding RNAs (lncRNAs), which are transcripts longer than 200 nucleotides that are not translated into proteins. Although lncRNAs have been annotated extensively in human, mouse, bees, zebrafish, Drosophila melanogaster and Caenorhabditis elegans, no comprehensive annotation in ants had been reported, limiting the reach of ant species as model organisms.
As many cis-regulatory and epigenomic mechanisms take place at short-to-medium range (10–100 kb), the scientists wanted to span large repetitive regions and create longer gap-free regions of sequence (i.e., longer contigs) than those produced by previous short-read assemblies, so they turned to PacBio.
“Long PacBio reads allowed us to assemble across longer repeats than previously possible, greatly improving the contiguity of the Harpegnathos and Camponotus genomes,” the authors write.
They sequenced genomic DNA isolated from Harpegnathos and Camponotus workers using SMRT Sequencing, obtaining a sequence coverage of 70-fold for Harpegnathos and 53-fold for Camponotus, with longer contigs (on average more than 30-fold larger than a prior 2010 assembly) and scaffolds with fewer gaps. The assemblies have scaffold N50 sizes larger than 1 Mb, and gaps smaller than in all other insect genomes available on NCBI at the time of writing.
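For readers less familiar with the N50 statistic quoted above, here is a minimal sketch of how it is commonly computed: sort the contig (or scaffold) lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative assumptions, not values or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L
    together account for at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths supplied")


# Toy example (hypothetical lengths in kb, not real assembly data).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

Because the statistic is weighted by length, a handful of very long contigs can raise N50 substantially, which is why improved contiguity from long reads shows up so clearly in this metric.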
The UPenn team annotated protein-coding genes using a combination of methods, and they discovered more than 300 high-confidence lncRNAs, several of which displayed developmental-, brain-, or caste-specific expression patterns, suggesting important roles in development and brain function.
They were also able to identify some biologically relevant genes missing in the older versions of the genome assemblies. Most notably, a Gp-9-like gene previously unannotated in the Harpegnathos genome was found to be differentially expressed in worker brains compared to gamergates. Mass spectrometry analyses identified two peptides mapping exactly to the newly predicted sequence, confirming the accuracy of the updated gene model.
“This gene was not previously detected as differentially expressed, likely because its closest homolog in the old annotation contains many sequence disparities, reducing the RNA-seq coverage mapped to this gene in both castes,” the authors write.
The UPenn team will use the new assemblies to direct their future explorations of neuroepigenetics in ants. They also hope the work will have a wider impact on the field.
“Our greatly improved Harpegnathos and Camponotus assemblies deliver several critical benefits to further development of these ant species into molecular model organisms,” the authors conclude. “These improvements… will lead to greater understanding of the genetic and epigenetic factors that underlie the behavior of these social insects.”
When humans are infected with the Marburg virus, the result is often lethal, with hemorrhagic fever and other symptoms similar to Ebola. When bats are infected, the result is… nothing. The tiny mammals remain asymptomatic.
In order to crack this antiviral mystery, a multi-institutional team of scientists sequenced, assembled and analyzed the genome of the bat species Rousettus aegyptiacus, a natural reservoir of Marburg virus and the only known reservoir for any filovirus.
Their findings contradicted previous hypotheses about bat antiviral immunity, which assumed that bats had enhanced antiviral defenses, controlling viral replication early in infection, and developing effective adaptive immune responses as a result. The new analysis suggests that an inhibitory immune state may exist instead.
Led by Boston University researchers Thomas B. Kepler and Stephanie S. Pavlovich, with Gustavo Palacios of the United States Army Research Institute of Infectious Diseases and others from Columbia University, the University of Nebraska, the NIH’s National Center for Biotechnology Information, and the Viral Special Pathogens Branch of the Centers for Disease Control and Prevention, the study in Cell describes several differences between immune responses in bats and humans.
Among them were an expanded and diversified KLRC/KLRD family of natural killer cell receptors, MHC class I genes, and type I interferons in the bats, all of which differ dramatically from their functional counterparts in other mammals.
“Such concerted evolution of key components of bat immunity is strongly suggestive of novel modes of antiviral defense,” the authors write.
The stark difference between bat and primate antiviral responses has long motivated scientists to characterize the genes involved in the immune system of bats, but previous efforts relied on genomes generated with low-coverage sequencing or with only short-read sequencing technologies. Such assemblies limit the ability to resolve repetitive regions of the genome where important immune gene loci reside, the authors note. So they turned to PacBio long-read sequencing, combined with paired-end short reads, to generate a high-quality annotated genome for the Egyptian rousette bat. The result is a 1.91 Gb Raegyp2.0 genome, the most contiguous bat genome available.
They used the genome to study two large classes of immune genes: natural killer (NK) cell receptors and type I interferons (IFNs). Previous studies have reported the absence of canonical NK cell receptors in bat genomes, and others have suggested that significant differences exist in type I IFNs between bats and humans. Diving deeper, the new study found an unusual expansion of the KLRC (NKG2) and KLRD (CD94) gene families in R. aegyptiacus relative to other species, “showing genomic evidence of unique features and expression of these receptors that may result in a net inhibitory balance within bat NK cells.”
“The expansion of NK cell receptors is matched by an expansion of potential MHC class I ligands, which are distributed both within and, surprisingly, outside the canonical MHC loci,” the authors note.
They also observed that the type I IFN locus is considerably expanded and diversified in R. aegyptiacus, with members of the IFN-ω subfamily being induced after viral infection and showing antiviral activity.
“All these features strengthen the notion of the unique biology of bats and suggest the existence of a distinct immunomodulatory mechanism used to control viral infection,” the authors conclude.
“Our findings are consistent with the hypothesis that certain key components of the immune system in bats have coevolved with viruses toward a state of respective tolerance and avirulence, although tolerance is likely not the only mechanism at play.”
The team notes that definitive tests of their hypotheses may be possible with the development of further experimental reagents for cytometry and biochemical intervention, and that such reagents are being developed now with information made available by the completed genome project.
And while the genome for R. aegyptiacus is providing useful information about how bats resist viral infections, it is just one species of interest to scientists who would like to better understand the genetics of bats in order to shed light on human and ecological biology. The Bat1K initiative is an effort by more than 140 scientists around the world to decode the genomes of all 1,300 species of bats using SMRT Sequencing and other technologies.
The first reference genome for maize variety B73, completed in 2009, was a major milestone, and an improved version released by Cold Spring Harbor Laboratory scientists in 2017 provided a deeper dive into the genetics of the complex crop. Yet even this new robust reference is not enough for Kelly Dawe, Doreen Ware and Matt Hufford, who have taken up another ambitious project: creating a 26-line pangenome reference collection in just two years.
“Maize is not only an important crop, but an important study species for answering basic questions about how plants grow and adapt to different environments,” says Ware, a computational biologist at USDA and Cold Spring Harbor Laboratory.
Interestingly, the genome differs significantly between individuals. A study comparing genome segments associated with kernel color from two inbred lines revealed that 12 percent of the gene content was not shared – that’s much more diversity within the species than between humans and chimpanzees, which exhibit more than 98 percent sequence similarity. The new project will create multiple reference genomes to reflect this diversity.
“By relying on a single type specimen as the sequence reference for most of the genetic information in maize, we may be missing much of the highly valuable natural variation in maize,” Ware says.
Beyond B73, the most extensively researched maize lines are the core set of 25 inbreds known as the NAM founder lines, which represent a broad cross section of modern maize diversity. SMRT Sequencing and BioNano optical mapping, which were essential in the creation of the groundbreaking 2017 B73 maize reference, will be used in the new $2.8 million National Science Foundation-funded project led by Dawe at the University of Georgia. They will create comprehensive, high-quality assemblies of these 25 inbreds, plus an additional line containing abnormal chromosome 10.
Plant genomes are notoriously difficult to sequence, and maize is particularly challenging because the vast majority of its 2.3 Gb diploid genome — a staggering 85 percent — is made up of highly repetitive transposable elements that other types of sequencing can’t address. Understanding these regulatory and structural elements is crucial to modern breeding efforts that aim to improve productivity across marginal environments and under changing climate.
“The sequenced lines will include varieties from both tropical and temperate regions, and their sequences should help us understand how corn has adapted to these different environments,” said Hufford, a co-principal investigator on the project and assistant professor at Iowa State University. “Understanding the ways corn adapts can facilitate development of lines for novel conditions.”
PacBio Sequencing will be essential as the team assesses the role of structural variation such as presence-absence and copy number variation in the determination of agronomic traits, Ware says.
The assemblies, along with information about the genes and their expression patterns, will be cataloged and made available to the public through her Gramene.org data resource.
“To go from a single reference to a broad perspective on the entire genetic repertoire of genes and gene expression patterns will be a major step forward in how we approach genome analysis in crops,” said Dawe, Distinguished Research Professor in UGA’s Franklin College of Arts and Sciences department of genetics and principal investigator on the grant. “It’s something that has not happened for any crop at this scale.”
Read about Doreen Ware’s original comprehensive maize genome project and about efforts at Corteva Agriscience™, Agriculture Division of DowDuPont™ (formerly DuPont Pioneer) to create their own multiple maize reference library.
The PacBio team was honored to attend an excellent Keystone Symposium in Hannover, Germany recently. The event, “One Million Genomes: From Discovery to Health,” offered a rare look at large-scale human genome projects, with many top-notch speakers.
The meeting featured speakers from many national genomics efforts, including China, Estonia, Israel, the UK, and the US. Each of these individual national efforts is essential to overcome the representation bias seen in human genome databases today. Underrepresented groups are currently less likely to get actionable results from clinical genetic tests, a situation that threatens to confer the benefits of precision medicine disproportionately to people of European ancestry. Many of the new population projects have incorporated SMRT Sequencing, either to produce a reference-grade de novo assembly or to generate structural variation data about participants of diverse ancestry.
A highlight of the meeting for us was the closing talk from Jeong-Sun Seo of Seoul National University Bundang Hospital and Macrogen in South Korea. Professor Seo discussed the GenomeAsia 100K Project and Asian reference genomes. Seo reported on three de novo, reference-grade Asian genomes – Chinese HX1, Japanese JRGv1, and Korean AK1 – all generated with SMRT Sequencing. These genomes enable more accurate re-sequencing of 4.5 billion Asian people, which Seo explained is useful to detect medically relevant variants in this population. At the time of publication, the AK1 genome was the most contiguous personal human genome ever reported, with a contig N50 over 18 Mb.
Seo also presented initial results from a new project that is using PacBio sequencing to detect structural variants in 300 Mongolian individuals. He observed that more of the structural variants in AK1 were detected with PacBio sequencing of the first 30 Mongolian individuals than had been seen in 2,504 individuals from the 1000 Genomes Project, which relied on short-read sequencing. This likely reflects two factors: Asian-specific variation and the greatly increased sensitivity of PacBio sequencing for structural variants.
Also at the event, PacBio scientist Ralph Vogelsang presented a poster about population-scale discovery of structural variants that showed how SMRT Sequencing is uniquely suited to detecting large and often complex variants, which are known to cause disease but are frequently missed by short-read sequencing approaches. The poster also includes a helpful map of ongoing population-focused genome projects.
Congratulations to all of the scientists around the world contributing to these important efforts. We look forward to seeing the many new discoveries they enable!
When German diver Joachim Kreiselmaier reached the deepest parts of the Danube-Aach cave system, he couldn’t believe his eyes: a “strange fish,” with a pale body coloration and smaller eyes and larger nares and barbels than the loaches typically spotted nearby. He had discovered the first cavefish in Europe, and the northernmost in the world.
“This is spectacular, as it was believed that the Pleistocene glaciations prevented fish from colonizing subterranean habitats north of 41° latitude,” said ecologist Jasminca Behrmann-Godel of the Limnological Institute of the University of Konstanz, who examined the fish Kreiselmaier brought back to the surface. “Initial genetic studies, together with knowledge on the geological history of the region, indicate that the cave loach population is amazingly young — not older than 20,000 years.”
The mysteries of the new species, Cave barbatula, will now be investigated by Professor Dr. Arne Nolte of the University of Oldenburg, Germany, and Assistant Professor Dr. Fritz Sedlazeck from the Human Genome Sequencing Center at Baylor College of Medicine in Houston, Texas, as part of the 2018 Plant and Animal SMRT Grant.
The grant will provide Nolte and Sedlazeck with access to the PacBio Sequel System at GENEWIZ, as well as the materials needed and bioinformatics support to conduct comparative genomic sequencing on the newly discovered European cavefish.
“This grant enables us to establish the genome assembly of the European cavefish and identify genetic variants from its surface-water ancestors. We are fascinated by changes in the sensory system and pale pigmentation of the fish and we will compare its genomic makeup with the Mexican cavefish which is an important model organism in developmental biology,” Sedlazeck said. “The outcome of this study will enable us to understand the initial steps that lead to the evolution of cave animals and impact our understanding of how multiple phenotypes evolve among vertebrates.”
“The combination of PacBio’s powerful genomics platforms and GENEWIZ’s depth of experience in DNA and next-generation sequencing provides researchers like Drs. Nolte and Sedlazeck the technology and support they require to further their discoveries and understanding of the world around us,” added Dr. Ginger Zhou, Vice President of Global Next Generation Sequencing for GENEWIZ.
When it comes to bacteria, resistance is not always futile, or so we learned at the annual meeting of the American Society for Microbiology. One of our favorite events of the year, ASM Microbe was full of fun puns, giant pathogen dolls, and amazing science spanning basic molecular biology and physiology, antimicrobial agents and resistance, environment, ecology and evolution, and clinical and public health microbiology.
We invited attendees to get hands on at our booth, quite literally, and our giant interactive hands art piece was a big hit. Four of our Certified Service Providers (The University of Maryland Institute for Genome Sciences, GENEWIZ, Macrogen, and RTL Genomics) were also on hand to answer questions about our technology, which was further highlighted in a presentation by Principal Scientist Cheryl Heiner on “Single Chromosomal Genome Assemblies on the Sequel System with Circulomics High Molecular Weight DNA Extraction for Microbes.”
Microbial multiplexing was a hot topic, and we were excited to share information about new tools we recently released to make it easier and less expensive to sequence microbial genomes on the Sequel System. The streamlined workflow, from library preparation to genome assembly, includes the release of two new 8-plex barcoded adapter kits specifically validated for multiplexing microbial genomes, a multiplexing calculator to ensure even coverage when pooling barcoded samples, streamlined de-multiplexing with SMRT Link v5.1.0, and optimized settings for microbial genome assembly with HGAP4.
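To illustrate the idea behind pooling for even coverage, here is a rough sketch. It is not PacBio's multiplexing calculator; it only shows the underlying reasoning under a simplifying assumption: if every barcoded library yields sequence in proportion to its share of the pool, then each sample's share should be proportional to its genome size so that all samples reach the same fold coverage. The function name, sample names, and genome sizes are hypothetical.

```python
def pooling_fractions(genome_sizes_mb):
    """Return the fraction of the pool each sample should occupy.

    Assumes sequencing yield per sample is proportional to its share
    of the pool, so equal fold coverage requires shares proportional
    to genome size. Real pooling also depends on library size
    distribution and loading behavior, which this sketch ignores.
    """
    total = sum(genome_sizes_mb.values())
    if total <= 0:
        raise ValueError("genome sizes must sum to a positive value")
    return {name: size / total for name, size in genome_sizes_mb.items()}


# Hypothetical two-sample pool: a 2 Mb genome and a 6 Mb genome.
fractions = pooling_fractions({"sample_a": 2.0, "sample_b": 6.0})
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.2%} of the pool")
```

In this toy case the larger genome takes three quarters of the pool, so both samples end up with the same expected coverage.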
Advances enabled by PacBio long-read sequencing technology were also showcased in more than 25 posters and presentations. Several posters focused on bioinformatics methods. Lee Katz’s cleverly titled “Kraken with Kalamari: Contamination Detection” poster attracted a lot of interest. Katz, a scientist with the CDC, described how leveraging a curated database of closed PacBio genomes with Kalamari enables better identification of bacteria when using the popular Kraken metagenomics tool for foodborne disease surveillance. Seok-Hwan Yoon of the Chun Lab presented another bioinformatics-focused poster about the EzBioCloud project, a valuable taxonomy resource that has incorporated a staggering amount of PacBio data.
Other posters delved into understanding the evolution of human pathogens. CDC scientist Michael Weigand shared his findings about whooping cough resurgence in the United States in his talk, “Chromosome Rearrangement, Gene Amplification, and Insertion Sequence Elements in the Genome Evolution of Bordetella pertussis and the Genus Bordetella.” If you have an ASM Microbe login, check out a poster presentation of his work. Bordetella (of the birdie variety) was also the topic of an award-winning abstract, “Clonal Evolution and Genomic Diversification of Bordetella Hinzii in An Immunocompromised Host,” by Adrien Launay of the NIH during a rapid-fire presentation, “Microbe, Know Thy Host.” Three students from the University of New Hampshire working in the lab of Cheryl Whistler presented work on the biology and ecology of Vibrio. Sarah Eggert and Jillian Means investigated the pathogenesis of Vibrio parahaemolyticus, the leading cause of seafood-borne bacterial infections, and the spread of this pathogen into the Gulf of Maine. Jennifer Calawa won an Outstanding Abstract Award for her work on the comparative genomics of two closely related Vibrio fischeri strains with varying symbiotic capabilities. Attendees also learned more about the amazing new resource released by the UK’s National Collection of Type Cultures (NCTC), in partnership with the Wellcome Sanger Institute and PacBio: reference genome assemblies of 3,000 strains of important historic and modern bacteria, including some of the deadliest.
Of course, research interests at ASM Microbe extend well beyond infectious disease. Anne Hatmaker of Tennessee’s Oak Ridge National Lab explored the potential of Megasphaera elsdenii, a bacterium found in the rumen of cattle, in the production of biofuels in her late-breaking abstract.
Finally, we used the meeting to launch what has also become an annual tradition: The Microbial Genomics SMRT Grant Program, made possible this year with the help of the University of Maryland’s Institute for Genome Sciences. You, too, can apply for the chance to win free SMRT Sequencing and bioinformatics analysis by submitting a 250-word proposal by July 20.
SMRT Sequencing is a go-to technology for generating reference-grade human genome assemblies, according to speakers in a recent webinar. In their presentations, Tina Graves-Lindsay from Washington University and Adam Ameur from Uppsala University spoke about diploid assemblies, discovering novel sequence, improving diversity of the current human reference genome, and much more. Finally, our own Paul Peluso gave a presentation that included the technology roadmap showing the next several upgrades for the Sequel System.
Graves-Lindsay began with efforts from the Genome Reference Consortium to “represent the full range of genetic diversity in humans,” a task requiring the generation of many population-specific references. She presented data from two haploid and 13 diploid genomes produced so far, and noted that two others are underway. For each reference, the scientists generate ~60-fold WGS coverage with PacBio, then assemble with FALCON. To assist with assembly QC and scaffolding, they merge the resulting sequence contigs with data from orthogonal long-range technologies such as Bionano Genomics or 10x Genomics. The approach has yielded impressive results: three of the 13 reference genomes achieved chromosome-level assembly; the highest contig N50 reached 26 Mb. To highlight the value of population-specific reference genomes, Graves-Lindsay offered some examples of regions that are not yet represented in the current human reference (GRCh38 build) – such as a 65 kb insertion found in a Yoruban assembly. To further resolve the diploid genome assemblies, her team is running FALCON-Unzip to generate haplotype-resolved contigs. These haplotigs better represent each of the maternal and paternal haplotypes for each genome, as opposed to a single collapsed contig sequence, and will serve as an allele-specific reference for the populations they represent.
Ameur’s talk focused on an effort that came out of SweGen, a population sequencing effort that covered 1,000 individuals in Sweden. His team chose two participants — one male and one female — and used SMRT Sequencing to produce reference-grade assemblies for each. They generated 75-fold WGS coverage for each individual, and combined PacBio assembled contigs with Bionano optical maps to produce highly contiguous genomes. By comparing results to the initial SweGen results, Ameur found that a large proportion of the 20,000 structural variants detected in each reference assembly were missed by short-read sequencing. The new assemblies also included a total of 24 Mb of novel genome sequence, not represented in GRCh38; the vast majority of that data came from repetitive regions 5 kb or longer. While about 30% of the novel sequence had no hits in NCBI, the nearly 70% remaining did match existing sequences, leading Ameur to suspect that at least some of those sequences had been mis-annotated because they were not found in the human reference. Now, his team is going back to the original SweGen short-read WGS data and aligning it against the new reference genomes, which is helping to improve variant detection in the Swedish population, resolve false-positive SNPs, and improve alignment in some coding regions.
The webinar’s final presentation came from Peluso, who offered a quick overview of the features of SMRT Sequencing and its growing use for high-quality assemblies. Of the 65 human assemblies most recently submitted to NCBI, 90% of those with a contig N50 greater than 1 Mb were generated with PacBio data. Ongoing population studies and reference genome projects aim to use SMRT Sequencing on more than 2,400 human genomes globally. Peluso also presented data from the recent effort to sequence a Puerto Rican genome, HG00733, which used the latest advances for the Sequel System (v2.1 chemistry and 5.1 software). The SMRTbell Express Template Prep Kit allowed for faster sample prep and better yield, leading to libraries that generated more than 50% of data in reads longer than 33 kb and a contig N50 of 31.4 Mb. Average output per SMRT Cell was 10 Gb. The new assembly compared favorably to the Sanger-assembled GRCh38.p12, with fewer contigs (982 vs. 1536) and only slightly smaller contig N50 (31.4 Mb vs. 56.4 Mb). Peluso described cost efficiencies using the latest Sequel System improvements for de novo assembly, noting that “the original human reference genome cost $3 billion, and today you can characterize a single human genome with PacBio for around $3,000 (1/1-millionth the cost), and build a reference-quality genome de novo for around $20,000.”
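The arithmetic behind these figures is easy to check. The short sketch below reproduces the cost ratio quoted by Peluso and, as a purely illustrative extension, estimates how many SMRT Cells a given coverage target would take at the 10 Gb average output mentioned above. The ~3.1 Gb human genome size is an approximation, the 60-fold target is borrowed from Graves-Lindsay's talk rather than from the HG00733 project, and both function names are hypothetical.

```python
import math


def fold_reduction(old_cost, new_cost):
    """Return how many times cheaper new_cost is than old_cost."""
    return old_cost / new_cost


def smrt_cells_needed(genome_size_mb, target_coverage, yield_per_cell_mb):
    """Estimate SMRT Cells required to reach a fold-coverage target.

    All sizes are in megabases. This ignores run-to-run variation in
    yield and any filtering of reads before assembly.
    """
    return math.ceil(genome_size_mb * target_coverage / yield_per_cell_mb)


# $3 billion original reference versus roughly $3,000 today (from the talk).
cost_ratio = fold_reduction(3_000_000_000, 3_000)

# Assumed ~3,100 Mb genome, 60-fold coverage, 10,000 Mb (10 Gb) per SMRT Cell.
cells = smrt_cells_needed(3_100, 60, 10_000)

print(f"cost reduction: {cost_ratio:,.0f}-fold")
print(f"estimated SMRT Cells: {cells}")
```

Under these assumptions the estimate comes to 19 SMRT Cells. Actual run counts depend on library quality and per-cell yield, so treat it as a back-of-the-envelope figure only.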
Peluso also announced the availability of FALCON-Phase, an improved phasing assembly tool that incorporates long-range Hi-C data and can be found on GitHub. Looking ahead, he said that simplified library prep is on the roadmap for midyear, with a chemistry update to improve accuracy and yield slated for release in late 2018. Next year, a new SMRT Cell 8M is expected to expand yield and reduce costs significantly.
The event concluded with an audience Q&A covering details about alignment stringency, shared structural variants across the Swedish population, decoy sequences, and more. If you missed the live webinar, watch the recording any time.
Many people who run a sequencing core lab would prefer to focus on science instead of business, but all core lab managers know that it’s imperative to keep a steady stream of clients and projects filling the pipeline. Here, we offer a handful of tips to help you expand your user base.
- Be fast, high-quality, and easy to understand
To you, a sequencing queue may look like a sign of high demand and proof that you’re at the top of your game, but to customers it can be frustrating. Regularly updating processes to improve pipeline efficiency will ensure that your customers are getting the fastest service possible so they can complete their research. And if your lab is consistently backlogged, it may be time to consider expanding your capacity.
Closely related to efficiency is quality: delivering a high-quality product is one of the surest ways to gain happy customers and repeat business, and to prevent customers from expressing negative thoughts about your services. Remember the adage that a happy customer will tell two potential prospects about their experience, but an unhappy customer will tell ten. The PacBio technical support team and your local field applications scientist (FAS) are available via web or phone to help troubleshoot or train on a particular application.
In addition to having an efficient pipeline producing excellent customer data, it’s important to have a mechanism to report easy-to-digest results. Think about the high-level metrics your customers need to understand their results and provide that in a concise report when you deliver their data.
- Differentiate yourself
Your customers need to know why they should choose you over other service providers. Whether it be by application (de novo assembly, Iso-Seq analysis, targeted sequencing, etc.), by organism type (plant, animal, microbial, etc.), or by additional services (HMW DNA isolation, bioinformatics, etc.), own what you are good at and shout it from the rooftops.
- Focus on solutions, not workflows
You, as a service provider, are intimately aware of workflow details because they are essential to your day-to-day operations. However, your customers care about the solutions your workflows and results make possible. Tell prospects about the cool and meaningful science that your services have enabled. Case studies or publication feeds on your webpage are a great way to distribute this information.
- Don’t be afraid to learn new things
Maybe you’re a seasoned pro at generating large libraries for de novo assembly projects and you’ve been curious about providing a long-read RNA sequencing solution. Contact your local FAS and set up a training session! There’s no better way to show that you keep up with the latest and greatest advancements in sequencing technology than by regularly updating your services to reflect the most up-to-date applications of SMRT Sequencing.
- Marketing, marketing, marketing
It may seem obvious to some, but getting the word out about your services is a surefire way to get more interest — and ultimately more projects — into your pipeline. Contrary to popular belief, it doesn’t take a marketing consulting firm or an executive with a decade of experience to get started. From free things like using social media to highlight successful projects and promotional pricing, to low-cost events such as hosting webinars with core facility advocates as guest speakers, and all the way to paid ads and automated email campaigns, there are many ways to get the word out about your services at any budget level.
We hope this list was helpful! We will be posting on each of these tips in more depth throughout the rest of the year. Don’t miss out on any of them by subscribing to our blog.
Ever since researchers sequenced the chimpanzee genome in 2005, they have known that humans share the vast majority of our DNA sequence with chimps, making them our closest living relatives. So what, exactly, sets us apart?
While prior ape genome assemblies were helpful in finding single nucleotide changes, many researchers speculate that a type of variation that is more difficult to resolve, namely structural differences in regulatory DNA or in the copy number of gene families, plays an important role in species adaptation. Large-scale efforts to sequence and assemble more ape genomes over the last 13 years have expanded our knowledge, but many structural variations (SVs) that distinguish the great apes remain unresolved. Additionally, the currently available draft ape genome assemblies, which contain tens to hundreds of thousands of gaps, are often compared against the much higher-quality human genome reference, introducing bias that “humanizes” the ape assemblies.
Now, an effort led by scientists at the University of Washington has closed most of those gaps by producing ab initio chimpanzee and orangutan genome assemblies where most genes are complete and novel gene models are identified.
In a recently published Science paper, first author Zev N. Kronenberg (of the UW Genome Sciences department and presently at Phase Genomics), lead author Evan E. Eichler (of UW and the Howard Hughes Medical Institute), and a multi-institutional team describe how they coupled PacBio long-read sequence assembly and Iso-Seq cDNA sequencing with a multi-platform scaffolding approach to characterize lineage-specific and shared great ape genetic variation ranging from single base-pair to megabase-sized variants.
The team sequenced four genomes—two human, one chimpanzee and one orangutan—to high depth (>65-fold coverage) using SMRT Sequencing data, and generated ~3 Gb assemblies for each species where the majority of the euchromatic DNA mapped to <1,000 large contigs. They then scaffolded the chimpanzee and orangutan genomes without guidance from the human reference genome. By using the same exact methods for assembly, these ape genomes along with the Eichler group’s long-read assembly of the gorilla genome could finally be compared to one another and the human genome on a more level playing field.
“Recent advances in sequencing and mapping technologies now make more detailed investigations possible, not only of individual species but also entire clades of species,” the authors write. “We generated new great ape genome assemblies displaying improved sequence contiguity by orders of magnitude, leading to a more comprehensive understanding of the evolution of structural variation.”
Comparing these new high quality genome assemblies to 86 recently sequenced great ape genomes and a diverse set of human genomes from the Simons Genome Diversity Panel, they identified 17,789 fixed human-specific structural variants, including 11,897 human-specific insertions and 5,892 human-specific deletions. These figures double the number of predicted genic and putative regulatory changes that emerged in humans since divergence from nonhuman apes. Among this set, they focused on SVs that potentially disrupt genes or regulatory sequence, identifying 1,145 human-specific SVs with potential functional effects.
“Unbiased genome scaffolding led to the discovery of novel and more complex subcytogenetic differences between human and other great ape chromosomes that were previously missed,” the authors write. “Projecting these onto the human genome shows potential hotspots of structural variation by size or number of events.”
Among the discoveries were fixed human-specific structural variants enriched near genes that are downregulated in human compared to chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors.
“Differential gene expression, especially in cortical radial glia, has been hypothesized to be a critical effector of brain size and a likely target of unique aspects of human brain evolution,” they write.
The authors identify several potential avenues for future investigation, such as structural variants that alter the human versions of the genes ZNHIT6, GLI3, and two key cell cycle regulators, CDC25C and WEE1. The publication also offers a significant resource to the great ape research community by annotating the ape genes and identifying full length mRNA isoforms with Iso-Seq data combined with short read RNA-seq.
The ape genomes still have some holes in comparison to human due to “upgrades” to the human reference genome using BAC-based long-read sequencing to resolve difficult, biologically relevant genomic regions such as segmental duplications. Eichler has long championed this approach and in a press release that accompanies the Science publication, he says “Our goal is to generate multiple ape genomes with as high quality as the human genome. Only then will we be able to truly understand the genetic differences that make us uniquely human.”
The genomes of 3,000 strains of bacteria, including some of the deadliest in the world, are now available to researchers as part of an ambitious project by the UK’s National Collection of Type Cultures (NCTC), in partnership with the Wellcome Sanger Institute and PacBio.
Plague, cholera, streptomyces, and 250 strains of E. coli are among the reference genomes created, as well as all ‘type strains’ of the bacteria in the collection: the first strains that describe the species and are used to classify them. The genome sequences of these highly valuable strains are fundamental for developing ways to identify specific infections in people, including tests diagnosing bacterial infections in the field to rapidly identify the source of an outbreak and help contain infections.
The collection includes several of the most important known drug-resistant bacteria, such as tuberculosis (one of the top ten causes of death worldwide, infecting 10.4 million and killing 1.7 million people in 2016 alone) and gonorrhoea (the sexually transmitted disease that infects 78 million people a year and is now becoming extremely difficult to treat). It also includes some varieties of historical significance, such as a dysentery-causing Shigella flexneri isolated in 1915 from a soldier in the trenches of World War I, and a sample from the nose of penicillin discoverer Alexander Fleming.
“Historical collections such as the NCTC are of enormous value in understanding current pathogens,” said Julian Parkhill from the Wellcome Sanger Institute. “Knowing very accurately what bacteria looked like before and during the introduction of antibiotics and vaccines, and comparing them to current strains from the same collection, shows us how they have responded to these treatments. This in turn helps us develop new antibiotics and vaccines.”
“PacBio’s comprehensive DNA sequencing enables deep genomic analyses, and we are happy to be partnering with them for this important project,” he added.
Our CSO, Jonas Korlach, stated: “The high-quality genomic maps enabled by SMRT Sequencing allow an unprecedented understanding of these bacteria. We are delighted to be chosen by institutions like Wellcome Sanger to help create such essential resources for the scientific and public health communities.”
Going forward, all the bacterial species in the NCTC collection will be sequenced as they are collected. Researchers can order bacterial strains from the NCTC website. Full information about each strain, including the DNA sequences, is available at EMBL-EBI.
Scientists have made important inroads in understanding why patients with HIV develop neurological disorders despite treatments that otherwise hold the virus at bay. The project was made possible with SMRT Sequencing, which generates reads long enough to span the full HIV envelope.
“Ultradeep single-molecule real-time sequencing of HIV envelope reveals complete compartmentalization of highly macrophage-tropic R5 proviral variants in brain and CXCR4-using variants in immune and peripheral tissues” was recently published in the Journal of NeuroVirology by lead author Robin Brese, senior author Susanna Lamers, and collaborators at the University of Massachusetts Medical School and Bioinfoexperts. In the article, the team describes a novel approach for examining how HIV evolves in the brain, segregated from HIV replicating in peripheral tissues. This may explain why the virus continues to attack the brain even when viral load in the rest of the body is controlled with antiretroviral therapy.
Analyzing individual virus genomes has been the preferred method for studying this phenomenon, but doing so with short-read sequencers “is problematic with HIV because millions of short sequences are generated, which subsequently require assembly, a near impossible feat with HIV [envelope] due to its high sequence variability combined with the error rate of NGS,” the scientists report. For this study, the researchers used SMRT Sequencing to generate full-length sequences of the HIV envelope, eliminating the need for assembly. Tissue samples came from a deceased, 43-year-old male HIV patient who had been responding well to drug therapy but had been diagnosed with HIV-associated dementia. Samples were collected from brain, lymph node, lung, and colon.
Scientists generated full-length envelope sequences — spanning about 2.6 kb each — and aligned nearly 53,000 unique reads. They developed phylogenetic trees showing “that brain-derived viruses were compartmentalized from virus in tissues outside the brain with high branch support,” the team writes. They further add that “variants from all peripheral tissues were intermixed on the tree but independent of the brain clades.” Finally, they note that the depth of sequencing and variation found within the brain samples was compelling, and that “SMRT did not simply reamplify thousands of sequences that were derived from a single or very few proviruses, but likely reflects the true diversity in the tissue.” Interestingly, CXCR4-using variants were found only outside the brain, while viruses within the brain used the CCR5 co-receptor.
“The study is the first to use a SMRT sequencing approach to study HIV compartmentalization in tissues and supports other reports of limited trafficking between brain and non-brain sequences during [combined antiretroviral therapy],” the scientists conclude. “Due to the long sequence length, we could observe changes along the entire envelope gene, likely caused by differential selective pressure in the brain that may contribute to neurological disease.”
LINE-1 (long interspersed nuclear element) insertions cover almost 17% of the human genome, but they are notoriously difficult to resolve accurately with short-read sequencing technology, according to scientists in Portugal. That matters because intronic LINE-1 elements can cause disease. In a recent study, SMRT Sequencing made it possible to analyze the multi-kilobase region and find a mutation causing muscular dystrophy.
In “Exonization of an Intronic LINE-1 Element Causing Becker Muscular Dystrophy as a Novel Mutational Mechanism in Dystrophin Gene,” scientists from several institutes in Portugal report finding a LINE-1 insertion that disrupted an open reading frame in the dystrophin gene. Lead authors Ana Gonçalves and Jorge Oliveira, senior author Rosário Santos, and collaborators describe this work in the journal Genes.
The 50-year-old male patient suffered onset of Duchenne/Becker muscular dystrophy at age 13. Earlier attempts to identify the causative mutation — including multiplex-ligation probe amplification and genomic sequencing — had failed. The scientists used several technologies for this case, deploying SMRT Sequencing to genotype the LINE-1 element that was detected as an interruption in an open reading frame in the dystrophin gene. “An aberrant transcript was identified, containing a 103-nucleotide insertion between exons 51 and 52, with no similarity with the DMD gene,” the authors report. “This corresponded to the partial exonization of a long interspersed nuclear element.” SMRT Sequencing analysis confirmed that a full LINE-1 sequence was present, and perfectly matched an element located in chromosome 2 that might have been its source. Based on the discovery, the patient’s children were also analyzed and his daughter was found to be a carrier of the same mutation.
LINE-1 insertions within a gene are believed to be rare, with just 30 events reported in the literature, the scientists note. Most of those are found in exonic regions. Intronic LINE-1 insertions have been determined to cause disease in three cases, featuring chronic granulomatous disease, familial retinoblastoma, and Chanarin-Dorfman syndrome. It is possible that the small number of events reported is a result of technology limitations: “In the case of intronic LINE-1 insertions, detection may be hampered by the intron’s length and the fact that it mainly affects transcriptional events,” the scientists write.
“To our knowledge, this is the first report of a deep-intronic insertion of a LINE-1 element in the DMD gene shown to cause disease,” the scientists conclude. “Besides its scientific relevance … this finding also reinforces the need to develop comprehensive approaches to identify LINE-1 insertion profiles in the human genome.”
Many investigators rely on targeted sequencing approaches for deep dives into genomic regions of interest. By designing specific probes — often using short-read sequences directed towards the exome and supported by existing reference genomes or transcriptome assemblies — scientists can home in on exactly the area they want to explore.
But what about sequences in intergenic regions not covered by short reads, which could contain crucial regulatory elements varying between populations that might be of functional and evolutionary importance? Or, what about species lacking high-quality reference genomes to guide probe design?
A team of Norwegian researchers is tackling these challenges using PacBio long-read sequencing technology for their target capture experiments. In a pre-print posted on bioRxiv, corresponding author Sissel Jentoft, first author Siv Nam Khang Hoff, and colleagues at the University of Oslo, Roche NimbleGen, and Roche Diagnostics describe how they used the technique to elucidate the evolution of the hemoglobin gene clusters in codfishes.
Hemoglobins (Hbs), key respiratory proteins in most vertebrates, are of great importance for ecological adaptation in fishes, as environmental factors such as temperature directly influence the solubility of O2 in surrounding waters and the ability of Hb to bind O2 at respiratory surfaces.
Previous studies have suggested remarkably high Hb gene copy number variation between codfish species. One study, for example, reported a negative correlation between the number of Hb genes and the depth at which the species occur, suggesting that the more variable environment in sunlit waters has facilitated a larger and more diverse Hb gene repertoire.
Interested in resolving the organization of Hb genes and their flanking genes in a selection of codfishes inhabiting different environmental conditions, the Oslo team turned to SMRT Sequencing to generate long, highly accurate, and continuous assemblies of these specific genomic regions of interest.
“Comparative genetic studies of gene organization or synteny requires longer, more continuous stretches of DNA containing more than one gene,” the authors explain.
Eight codfish species were selected on the basis of phylogenetic and habitat divergence. A highly continuous genome assembly of Atlantic cod (previously created using PacBio sequencing), as well as low-coverage draft genome assemblies of all eight species, were used to design probes spanning both exons and introns of the genomic regions of interest. To enable targeted sequence capture for PacBio sequencing, the team used a modified protocol for sequence capture offered by Roche NimbleGen (the SeqCap EZ protocol) and generated custom barcodes.
“The generation of highly continuous assemblies enabled reconstruction of micro-synteny revealing lineage-specific gene duplications and identification of a relatively large and inter-species variable indel located in the promoter region between the Hbb1 and Hba1 genes,” the authors write.
The results shed light on the evolutionary history of Hb genes across species separated by up to 70 million years of evolution, and reveal genetic variations possibly linked to thermal adaptation, they conclude.
“Our study demonstrates that this approach… is a highly efficient and versatile method to investigate specific genomic regions of interest across distantly related species where genome sequences are lacking,” they add.
For pointers on how you can use SeqCap EZ for target sequence capture on PacBio Systems, check out this protocol.
In eukaryotic organisms, the majority of genes are alternatively spliced to produce multiple transcript isoforms. Gene regulation through alternative splicing can dramatically increase the protein-coding potential of a genome. Therefore, understanding the functional biology of a genome requires knowing the full complement of isoforms. Microarrays and high-throughput cDNA sequencing are useful tools for studying transcriptomes, yet these technologies provide only small snippets of transcripts. Accurately reconstructing complete transcripts to study gene isoforms has been challenging [1, 2].
The Iso-Seq method produces full-length transcripts using Single Molecule, Real-Time (SMRT) Sequencing. Long read lengths enable sequencing of full-length transcripts up to 10 kb or longer, eliminating the need for transcript assembly or inference. The Iso-Seq bioinformatics pipeline, which is freely available through SMRT Analysis, further processes the data into high-quality consensus transcript sequences that enable accurate isoform annotation and open reading frame prediction.
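To make the "no assembly required" point concrete, here is a toy sketch of how full-length reads can be grouped into isoforms simply by comparing their splice junctions. It is only an illustration and not the Iso-Seq pipeline: the function name `intron_chain`, the exon coordinates, and the idea of ignoring small differences at transcript ends are all assumptions made for this example.

```python
from collections import Counter

def intron_chain(exons):
    """Return the introns between consecutive exons as a tuple of (start, end)."""
    exons = sorted(exons)
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))

# Hypothetical full-length reads, each given as a list of (start, end) exon blocks.
reads = [
    [(100, 200), (300, 400), (500, 600)],
    [(105, 200), (300, 400), (500, 610)],  # same junctions, slightly different ends
    [(100, 200), (500, 600)],              # skips the middle exon
]

# Reads that share every splice junction are counted as the same isoform.
isoform_counts = Counter(intron_chain(r) for r in reads)

for chain, count in isoform_counts.items():
    print(f"{count} read(s) with introns {chain}")
```

Because each read already spans the whole transcript, the grouping needs no reconstruction step; with short reads, the same junctions would have to be stitched together by inference.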
Since it does not require a reference genome or existing annotation, the Iso-Seq method has been widely adopted by the scientific community to analyze a variety of important agricultural crops and animals such as coffee, cotton, maize, rabbit, chicken, and many others. In all cases, the researchers discovered a much more diverse and complex transcriptome than previously understood. For example, Kuo et al. expanded the chicken annotation to ~64,000 transcripts, of which ~21,000 were novel lncRNAs not annotated in Ensembl. In another case, Wang et al. were able to expand and correct the maize B73 genome annotation, including the discovery of 867 novel lncRNA transcripts.
The ability to unambiguously determine the full exonic structure of complex genes, with no assembly required, also makes the Iso-Seq method attractive to the study of human diseases. Kohli et al. were able to characterize androgen receptor (AR) isoforms in castration-resistant prostate cancer to show that one novel isoform, AR-V9, was co-expressed with AR-V7 and predictive of drug resistance. Tseng et al. discovered novel splice patterns in the FMR1 gene in premutation carriers for Fragile X-associated Tremor/Ataxia syndrome that were undetected in the control group.
Perhaps somewhat surprisingly, after the Iso-Seq dataset for the MCF-7 breast cancer cell line was released to the public, it was revealed that this well-studied sample contained more cancer fusion genes, two new mitochondrial lncRNAs and novel sample-specific transcripts. In a recently published study, Anvar et al. used this same deep MCF-7 dataset to show that there is widespread coupling of transcript features, where more than 7,000 genes were found to have preferential coupling of 5’ start sites, exons, and polyadenylation sites. Such a study would not have been possible without the ability to precisely determine the starts and ends, as well as the splice junctions, of each transcript isoform.
But the Iso-Seq method is not just limited to eukaryotes. Recently, a new protocol called SMRT-Cappable-seq was developed to sequence the E. coli transcriptome. The result is a dramatic increase in the number of annotated operons and readthrough for the bacterium. Similarly, the Iso-Seq method was used to discover new coding and anti-sense transcripts in the previously poorly annotated human cytomegalovirus.
Since the launch of the Iso-Seq protocol in SMRT Analysis in 2014, the analysis pipeline has seen several improvements. The new Iso-Seq2 protocol, released in SMRT Analysis 5.1 last month, improves both speed and transcript recovery. More importantly, over the past 5 years the bioinformatics community has embraced the technology, sparking the development of additional tools. IsoCon, IDP, and IDP-denovo are error correction methods that work for targeted genes or hybrid data. Specialized long read aligners such as minimap2 now support alternative splicing. Cupcake and TAMA are two lightweight alignment processing tool suites. SQANTI categorizes Iso-Seq transcripts against an existing annotation and combines it with short read expression data. A growing list of community tools is maintained at the Iso-Seq wiki.
We encourage our users to continue finding new ways to utilize full-length transcript sequencing with PacBio and contribute to exciting biological discoveries!
- Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts. GigaScience 1–13 (2017). doi:10.1093/gigascience/gix086
- Wang, M. et al. A global survey of alternative splicing in allopolyploid cotton: landscape, complexity and regulation. New Phytol 217, 163–178 (2017).
- Wang, B. et al. Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing. Nat Comms 7, 11708 (2016).
- Chen, S.-Y., Deng, F., Jia, X., Li, C. & Lai, S.-J. A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing. Sci. Rep. 7, 1–10 (2017).
- Kuo, R. I. et al. Normalized long read RNA sequencing in chicken reveals transcriptome complexity similar to human. BMC Genomics 18, 1–19 (2017).
- Kohli, M. Androgen Receptor Variant AR-V9 Is Coexpressed with AR-V7 in Prostate Cancer Metastases and Predicts Abiraterone Resistance. Clin Cancer Res 23, 1–13 (2017).
- Tseng, E., Tang, H.-T., AlOlaby, R. R., Hickey, L. & Tassone, F. Altered expression of the FMR1 splicing variants landscape in premutation carriers. BBA – Gene Regulatory Mechanisms 1860, 1117–1126 (2017).
- Weirather, J. L. et al. Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing. Nucleic Acids Research 43, e116–e116 (2015).
- Gao, S. et al. Two novel lncRNAs discovered in human mitochondrial DNA using PacBio full-length transcriptome data. Mitochondrion 38, 41–47 (2018).
- Chakraborty, S. MCF-7 breast cancer cell line PacBio generated transcriptome has ~300 novel transcribed regions, un-annotated in both RefSeq and GENCODE, and absent in the liver, heart and brain transcriptomes. 1–8 (2017). doi:10.1101/100974
- Anvar, S. Y. et al. Full-length mRNA sequencing uncovers a widespread coupling between transcription initiation and mRNA processing. Genome Biol. 19, 1–18 (2018).
- Yan, B., Boitano, M., Clark, T. & Ettwiller, L. SMRT-Cappable-seq reveals complex operon variants in bacteria. bioRxiv 1–34 (2018). doi:10.1101/262964
- Balazs, Z. et al. Long-Read Sequencing of Human Cytomegalovirus Transcriptome Reveals RNA Isoforms Carrying Distinct Coding Potentials. Sci. Rep. 1–9 (2017). doi:10.1038/s41598-017-16262-z
References and Resources:
[1] Steijger, T. et al. Assessment of transcript reconstruction methods for RNA-seq. Nat Meth 10, 1177–1184 (2013).
[2] Angelini, C., Canditiis, D. & Feis, I. Computational approaches for isoform detection and estimation: good and bad news. BMC Bioinformatics 15, 135–43 (2014).
[3] Gordon, S. P. et al. Widespread Polycistronic Transcripts in Fungi Revealed by Single-Molecule mRNA Sequencing. PLoS ONE 10, e0132628 (2015).
[4] PacBio MCF-7 blogpost: https://www.pacb.com/blog/data-release-human-mcf-7-transcriptome/
[5] PacBio Iso-Seq GitHub: https://github.com/PacificBiosciences/IsoSeq_SA3nUP/
Nature Methods just published “Accurate detection of complex structural variations using single-molecule sequencing,” a publication that presents the NGMLR aligner and Sniffles structural variant caller, both designed for use with long-read sequencing data. We chatted with developer and lead author Fritz Sedlazeck from the Human Genome Sequencing Center at Baylor to learn more.
Q: Why was a new alignment tool needed when many scientists already use BWA and other methods?
A: When I started my postdoc in Mike Schatz’s lab at Cold Spring Harbor, I had the opportunity to look at the complex SK-BR-3 cell lines. We soon discovered two challenges not addressed effectively by existing aligners: mapping split reads correctly, and handling the random short insertion and deletion errors that are characteristic of long reads.
Q: Why was Sniffles needed for structural variant detection?
A: Most of the methods for structural variant detection focus on paired-end reads. There were no appropriate structural variant calling tools at the time for long-read data, and very few callers that take into account split-read alignments. You have to have a method that parses through the full read.
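As a rough illustration of what "parsing through the full read" can mean, the sketch below walks the CIGAR string of a single long-read alignment and reports insertions and deletions of at least 50 bp. It is a minimal example under stated assumptions and not how Sniffles works: the function name `large_indels`, the 50 bp default, and the example CIGAR are hypothetical, and split-read alignments (a key part of what Sedlazeck describes) are not handled here.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def large_indels(cigar, ref_start, min_size=50):
    """Return (type, reference position, length) for indels >= min_size."""
    events = []
    ref_pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "ID" and length >= min_size:
            events.append(("INS" if op == "I" else "DEL", ref_pos, length))
        if op in "MDN=X":  # operations that consume reference bases
            ref_pos += length
    return events

# Hypothetical alignment starting at reference position 10000.
calls = large_indels("1000M120D500M3I200M60I100M", 10000)
print(calls)
```

In this toy alignment the 3 bp insertion is ignored as the kind of short indel error that long reads carry, while the 120 bp deletion and 60 bp insertion are kept as candidate structural variants.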
Q: When you applied these tools to long-read data, what could you see that wasn’t visible before?
A: Before we started to think about how we could improve the alignments and structural variant calling, we spent a lot of time looking at IGV, focusing on single reads in complex regions like oncogenes. We knew there were some events that were hidden from us, and we saw a lot of noise coming out. That really motivated us to develop these new tools to find the signal in the noise. When we first applied them, very quickly we were detecting these structural variants. Some of the first results from Sniffles were identifications of amplification events and inversions that had not been found before.
Q: You’ve talked about plans to sequence 100 people with SMRT Sequencing from PacBio. What are the goals of that study?
A: This study is aiming at the concept of comprehensive genomes, or what Richard Gibbs calls “super-genomes.” We have SNP calls from Illumina, PacBio reads to call structural variants, and for a few samples we have 10x Genomics data for really long phasing. Our best example so far is a 67 Mb phasing block N50 for SNVs and SVs. This pilot study covers many different ethnicities. The majority of samples are from African Americans, and there are many samples from Hispanic individuals as well. There are just a few Caucasians. We hope to get a good ethnicity-specific structural variant call set that we can use to inform other studies as well. We are confident that we’ll be able to identify many more structural variants that are invisible to short-read data.
Q: How much long-read coverage is needed for accurate structural variant discovery in a human genome?
A: We are aiming for about 10-fold coverage, which leaves us with 5-fold per haplotype. That’s enough for good coverage of each chromosome and lets us see the vast majority of structural variants.
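The arithmetic behind that answer can be sketched with a simple model. The example below assumes that read depth at a given site follows a Poisson distribution and that at least two supporting reads are wanted per haplotype; both are assumptions for illustration, not figures from the interview, and `prob_at_least` is a hypothetical helper name.

```python
import math

def prob_at_least(k, mean_coverage):
    """P(X >= k) for X ~ Poisson(mean_coverage)."""
    below = sum(
        math.exp(-mean_coverage) * mean_coverage**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below

total_coverage = 10
per_haplotype = total_coverage / 2  # diploid genome: 5-fold per haplotype

# Chance that a site on one haplotype is covered by at least 2 reads
# (the threshold of 2 is an assumption chosen for this example).
p = prob_at_least(2, per_haplotype)
print(f"{per_haplotype:.0f}-fold per haplotype, P(>=2 reads) = {p:.3f}")
```

Under these assumptions roughly 96% of sites on each haplotype would have two or more reads, which is consistent with the idea that 10-fold coverage captures the vast majority of variants.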
For more technical detail about Sniffles and NGMLR, check out our blog post covering this paper as a preprint or attend the upcoming LabRoots webinar on May 9, in which Sedlazeck will give a talk entitled “Size Matters: Accurate Detection and Phasing of Structural Variations.”
The PacBio team is just back from Chicago, where we saw outstanding talks and posters at the American Association for Cancer Research (AACR) Annual Meeting and enjoyed that city’s well-deserved reputation for exciting weather. We hope everyone remembered to pack their hats and gloves and enjoyed the late-season snow!
This year multiple researchers presented work featuring the use of the Iso-Seq method for full-length transcript sequencing in cancer research. The first was a poster presented by Yeung Ho from University of Minnesota, entitled ‘The role of androgen receptor variant AR-V9 in prostate cancer’. The poster describes their discovery that the previously reported structure of AR-V9 was incorrect, and that past experiments characterizing AR-V9 expression patterns had not distinguished between it and the related isoform AR-V7. Read the full publication in Clinical Cancer Research to learn more.
Liqing Tian and Jinghui Zhang from St. Jude Children’s Research Hospital also presented work featuring SMRT Sequencing. In their poster on ‘Allelic specificity of immunoglobulin heavy chain (IGH) translocation in B-cell acute lymphoblastic leukemia (B-ALL) unveiled by long-read sequencing’, they shared follow-up studies validating a tantalizing lead they discovered in NALM-6 cell line transcriptome data. That data was collected in collaboration with us using the simplified Iso-Seq sample preparation protocol for the Sequel System introduced at last year’s AACR meeting. They showed that although IGH-DUX4 fusion occurs frequently and is thought to be a driver of B-ALL, DUX4 has translocated into the IGH enhancer that is repressed by immunoglobulin allelic exclusion. They theorized that translocation into the dormant enhancer may be favored either because DUX4 is toxic to the pre-B cell or because the translocation blocks VDJ recombination. Without a functional B cell receptor, such clones would not survive.
Later in the day, our own Meredith Ashby presented a poster detailing recent improvements to the Iso-Seq analysis workflow, increasing both the speed and reliability of the pipeline that will benefit Sequel System users. The new pipeline can process 1 Sequel SMRT Cell in ~5 hours, and 3 SMRT Cells of data in ~13 hours. In addition, the poster described improved precision in isoform detection with the use of synthetic spike-in variants. Meredith also shared new whole transcriptome data from the breast cancer cell line HCC-1954 and normal breast tissue control, demonstrating that the Iso-Seq method reveals a surprising abundance of both novel isoforms and novel junctions.
On Tuesday afternoon, Neetu Singh from King George’s Medical University presented her poster ‘Characterization of oral squamous cell carcinoma transcriptome through long read sequencing technology’. Neetu reported numerous novel isoforms discovered in patient samples using the Iso-Seq method, including transcripts of genes previously implicated in tumor development. For example, she found that tumor samples but not controls expressed novel isoforms of SCAMP3 and type I keratin KRT-17. Similarly, she found many isoforms of KRT6, KRT14, and KRT16 gene fusions.
Interested in the Iso-Seq method for your project? We invite you to enter our upcoming Iso-Seq SMRT Grant Program: Connect the Dots with the Iso-Seq Method. As highlighted in the terrific science presented by PacBio users at AACR this year, full-length transcript sequencing with PacBio allows users to see the complete picture of splice variants, novel isoforms, and gene fusions. Tell us how the Iso-Seq method for RNA sequencing will drive new discoveries in your human-focused research for a chance to win sequencing on the Sequel System. Visit the grant submission website to learn more and get your entry ready for when the SMRT Grant opens May 1.
Structural variants account for most of the base pairs that differ between human genomes, and are known to cause more than 1,000 genetic disorders, including ALS, schizophrenia, and hereditary cancer. Yet they remain overlooked in human genetic research studies because short-read sequencing methods struggle to resolve complex variants, which often involve repetitive DNA.
At a recent webinar co-hosted by Nature Research, Professor Alexander Hoischen joined Principal Scientist Aaron Wenger to discuss how advances in long-read sequencing and structural variant calling algorithms have made it possible to affordably detect the more than 20,000 such variants that are now known to exist in a human genome.
Wenger described methods for calling and visualizing structural variants from low-coverage, long-read sequencing of human genomes, and presented optimal study designs for both gene discovery and population genetics, while Hoischen shared case studies.
New Insights into Neurodevelopmental Disorders
Hoischen’s team at Radboud University Medical Center uses intellectual disability as a model for severe, sporadic neurodevelopmental disorders. Extensive research suggests that more than 60% of moderate to severe cases of intellectual disability are caused by de novo mutations, but even after the application of microarrays, exome sequencing, and short-read whole genome sequencing, 38% of patients remain undiagnosed because no causal variant can be found.
To address these unexplained cases, Hoischen and his colleagues adopted SMRT Sequencing to see if long-read technology could offer new insight. They already knew that information about structural variants associated with intellectual disability was lacking compared to single-nucleotide variants.
In a pilot project that is still underway, Hoischen selected five patient/parent trios and sequenced them to high coverage with the Sequel System. While all 15 people had previously been analyzed with microarrays, exome sequencing, and short-read WGS, SMRT Sequencing still uncovered 21 Mb of genomic sequence in each sample that was essentially new data. Of that, 7 Mb of sequence falls in genic space.
Using PBSV and a new joint calling tool the team beta tested, they found as many as 23,000 structural variants larger than 50 bp per genome. Nearly 70% of those variants were missed by short-read sequencing — 80% of insertions and 55% of deletions were novel, Hoischen said. By broadening their search to variants as small as 20 bp, the team expanded its variant calls to as many as 40,000 in each genome, with similar stats for novel findings. They have used PCR and other approaches to validate many of the calls, showing that the PacBio data is highly accurate.
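To make the effect of the size cutoff concrete, here is a minimal Python sketch of counting calls per variant type at two thresholds. It is only an illustration: the `(svtype, length)` tuples, the `count_by_type` helper, and the toy call list are hypothetical, and treating the cutoff as inclusive is an assumption. It is not the team's pipeline and does not use PBSV.

```python
from collections import Counter

def count_by_type(calls, min_length):
    """Count calls per variant type, keeping those at or above min_length (bp).

    `calls` is an iterable of (svtype, length) tuples. This layout is a
    hypothetical simplification, not a real caller's output format.
    """
    return Counter(svtype for svtype, length in calls if length >= min_length)

# Toy calls, invented purely for illustration.
calls = [
    ("INS", 25), ("INS", 300), ("INS", 50),
    ("DEL", 51), ("DEL", 19), ("DEL", 20),
]

at_50 = count_by_type(calls, 50)  # the larger cutoff keeps fewer calls
at_20 = count_by_type(calls, 20)  # lowering the cutoff expands the call set

print("cutoff 50 bp:", dict(at_50))
print("cutoff 20 bp:", dict(at_20))
```

Lowering the threshold can only add calls, which is why the per-genome total grows when the search is broadened to smaller variants.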
For analysis, Hoischen and his team are focusing on de novo mutations. They use data from the parents to rapidly filter out inherited mutations, getting the patient’s universe of potentially causative variants down to very manageable numbers for follow-up study. In one example, he showed that just 40 structural variants were left to investigate in the patient after this filtering process. Hoischen said this approach is likely to be powerful for clinical applications.
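The trio filtering idea can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the method the team used: the `SV` record, the `same_variant` matching rule, and its `max_shift` and `min_len_ratio` defaults are all hypothetical choices made for the example, and the variants are toy data.

```python
from typing import NamedTuple

class SV(NamedTuple):
    """Hypothetical minimal structural variant record."""
    chrom: str
    pos: int
    svtype: str  # e.g. "INS" or "DEL"
    length: int  # variant length in bp, assumed positive

def same_variant(a, b, max_shift=50, min_len_ratio=0.8):
    """Treat two calls as the same variant if they are close in position and size.

    The tolerances are illustrative assumptions; breakpoints called in
    different samples rarely agree to the exact base.
    """
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_shift:
        return False
    shorter, longer = sorted((a.length, b.length))
    return shorter / longer >= min_len_ratio

def de_novo_candidates(child, mother, father):
    """Keep child calls that match no call in either parent."""
    parental = list(mother) + list(father)
    return [v for v in child if not any(same_variant(v, p) for p in parental)]

# Toy data, invented for illustration only.
child = [
    SV("chr1", 10_000, "DEL", 300),
    SV("chr2", 55_000, "INS", 120),
    SV("chr7", 2_000, "DEL", 64),
]
mother = [SV("chr1", 10_010, "DEL", 310)]  # near-identical to the child's chr1 deletion
father = [SV("chr7", 2_000, "INS", 64)]    # same locus as the child's chr7 call, different type

candidates = de_novo_candidates(child, mother, father)
for v in candidates:
    print(v)
```

In this toy trio, the chr1 deletion is dropped as inherited, while the chr2 insertion and chr7 deletion remain as candidates for follow-up. A real analysis would also have to account for calls missed in a parent, which can make an inherited variant look de novo.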
Deeper Dive into Autism Spectrum Disorder
Another project was also announced during the presentation. Stephen Scherer, director of The Centre for Applied Genomics at The Hospital for Sick Children (SickKids) and Professor of Medicine at the University of Toronto, was named as recipient of the Structural Variant SMRT Grant Program, launched in partnership with GENEWIZ during the American Society of Human Genetics Annual Meeting in October 2017. He will receive sequencing on the Sequel System and bioinformatics support to pursue the project entitled “Using Low-Coverage PacBio SMRT Sequencing to Find Structural Variation Mutations in Autism Families with Multiple Affected Individuals.”
Scherer previously published results of a study in which he used short-read whole-genome sequencing to detect single nucleotide variations, indels, and copy number variations in more than 5,000 samples from families with ASD. Although he was able to successfully identify many variants and associated genes affecting autism risk, the study did not report on structural variation findings and for most families the genetic determinants are still to be resolved.
Congratulations to Dr. Scherer. We look forward to seeing low-coverage whole genome SMRT Sequencing applied to this important and ground-breaking research.
Pop quiz: Which animal accounts for around 20% of all living mammal species, harbors (yet survives) some of the world’s deadliest diseases, lives proportionately longer than humans given its body size, and helps make tequila possible?
From the tiniest bumblebee bat (Craseonycteris thonglongyai) to the large (1 kg) golden-capped fruit bat (Acerodon jubatus), the diversity and rare adaptations in bats have both fascinated and terrified people for centuries. Now, an international consortium of bat biologists, computational scientists, conservation organizations, and genome technologists has set out to decode the genomes of all 1,300 species of bats using SMRT Sequencing and other technologies.
The aim of the Bat1K initiative, as set forth by Emma Teeling of University College Dublin, Sonja Vernes of the Max Planck Institute, and 146 others in this paper in the Annual Review of Animal Biosciences, is to “catalog the unique genetic diversity present in all living bats to better understand the molecular basis of their unique adaptations; uncover their evolutionary history; link genotype with phenotype; and ultimately better understand, promote, and conserve bats.”
The large sequencing project will be accomplished in three phases, starting with 21 representatives of each bat family, followed by 220 representatives for every genus of bat, and then the remaining 1,288 of the species. It will greatly expand upon the 14 bat genome assemblies currently available from the National Center for Biotechnology Information (NCBI) database, which are of varying quality and completeness.
“One primary goal of Bat1K is to standardize assembly strategies to provide assemblies of uniform optimal quality for the bat genomics community through combining multiple sequencing and scaffolding technologies,” the authors write. “We believe it is important not just to generate genome-level data, but to produce high-quality genome sequences that maximize the usefulness and accessibility of the data for all research fields.”
The bat clade exhibits a wide range of chromosomal variation. High-quality, chromosome-level genome assemblies across the group will allow researchers to investigate things like evolutionary trajectories of autosomal and sex chromosomes from nucleotide, syntenic, and phylogenomic perspectives.
The team is also hoping to resolve “some of the most passionate debates in science” centered around the evolutionary history of bats, which has been difficult to piece together due to an impoverished fossil record.
The information they uncover could benefit not only the research community, but the world at large. The authors argue that studying bats will enable us to address some of the most important challenges facing humanity into the next century including improving the well-being of a large and rapidly aging human population, preventing the spread of emergent infectious diseases, maintaining agricultural productivity, and restoring natural ecosystems worldwide.
Bats are suspected reservoirs for some of the deadliest viral diseases, including Ebola, SARS (severe acute respiratory syndrome), rabies, and MERS (Middle East respiratory syndrome). But they appear to be asymptomatic and survive these infections. Figuring out why could increase our understanding of immune function and help prevent viral spillovers into humans.
Bats also exhibit extraordinary longevity—they can live up to 10 times longer than expected given their small body size and high metabolic rate. Only 19 mammal species are known to live proportionately longer than humans given their body size, and 18 of these are bats.
“Bats show few signs of senescence and low to negligible rates of cancer, suggesting they have also evolved unique mechanisms to extend their health spans, rendering them excellent models to study extended mammalian longevity and ageing,” the team writes.
By identifying bats’ cellular repair mechanisms, researchers could also gain insight into inflammatory disorders associated with autoimmune diseases, which are among the fastest growing causes of disease worldwide.
“The ability to modulate inappropriate inflammation in response to stressors without impairing immune function could improve the lives of millions,” the authors write.
Studying the genetics of echolocation, vocal learning, and sensory perception in bats could shed light on human blindness, deafness, and speech disorders, they add. And characterizing bat wing development could improve our understanding of how changes in limb developmental building blocks can lead to human limb malformations.
In regard to the ecosystem, bats perform key services. They pollinate crop species in the tropics (including agave, making possible the distillation of tequila) and disperse seeds across long distances, maintaining plant genetic diversity and aiding the regeneration of forests after clearing. They are able to breach ocean barriers, making them indispensable to isolated island ecosystems. They also feed on crop pests throughout their range; without bats, it is estimated that the United States would spend more than $3 billion a year on pesticides alone, the authors report.
“Bat1K will develop a genomic ark that can be used to benchmark the genomic health of different bat species to uncover populations in need of immediate conservation efforts,” the authors write. “Prioritization of bat genomes is not just desirable but indispensable to confront the many challenges to human well-being, ecosystem function, and biodiversity conservation we now face.”
Catch one of the Bat1K project leaders, Sonja Vernes, as a keynote speaker at the 2018 SMRT Leiden Conference, to be held in the Netherlands June 12-14. The meeting includes two back-to-back events: SMRT Scientific Symposium and the SMRT Informatics Developers Meeting. View the preliminary agenda and register.
We’re told to avoid sugar and refined carbohydrates if we want our teeth to remain strong and cavity-free. But what is the role of microbiota in our oral health?
Cavities – or caries – actually occur as the result of bacterial infection that leads to sustained decalcification of tooth enamel and the layer beneath it, the dentin. Left unchecked, the decay can reach the tooth’s inner layer, with its soft pulp and sensitive nerve fibers, and, in some cases, can cause serious complications such as pyogenic osteomyelitis and the life-threatening bacterial endocarditis.
In addition to diet and host factors, the occurrence and development of dental caries seems to be closely related to the imbalance of the oral microbiota. With this in mind, researchers at Zhejiang University in Hangzhou, China, wanted to create a profile of oral microbiota in early childhood caries, and they turned to PacBio SMRT Sequencing to do so.
As detailed in a paper published in Frontiers in Microbiology, lead author Hui Chen, first author Yuan Wang, and colleagues identified 876 species from 13 known bacterial phyla and 110 genera in saliva samples collected from 41 Chinese preschoolers aged 3–5 years (21 with severe early childhood caries, and 20 who were caries-free).
A shift in the oral core microbiota was observed between the two groups, allowing the researchers to identify both protective and destructive bacteria.
“Our findings indicate that dental caries have a microbial component, which might have potential therapeutic implications,” the authors write.
At the species level, 38 species, including Streptococcus spp., Prevotella spp., and Lactobacillus spp., showed higher abundance in the caries group compared to the caries-free group. This suggests these bacteria may be risk factors for dental caries in children, the authors state.
The researchers also collected samples from the same children six months later. New cavities were developing in 5 children who were initially caries-free. Analyzing their microbiota, the researchers found that 6 species of bacteria that were abundant in the caries-free children, including Abiotrophia spp. and Neisseria spp., were much less abundant in these cases. Those bacteria were also less abundant in the initial caries group, leading the researchers to associate the strains with a healthy oral microbial ecosystem.
The authors say they chose single-molecule real-time sequencing because of its richness and resolution. Previous studies have explored the relationship between microorganisms and the development of caries; however, most of the cariogenic bacteria were only identified at the genus level, they noted.
“Species-level and even strain-level resolution is thought to be important for caries prognosis,” the authors state. “PacBio outperformed other sequencers… in terms of the length of reads, and it reconstructed the greatest portion of the 16S rRNA genome when sequencing the oral microbiota.”
At the HudsonAlpha Institute for Biotechnology, scientists are building on advances in agricultural research to power a clinical pediatric research program. For this work, they’re using the Sequel System to perform whole-genome sequencing on trios of children with developmental disabilities and their parents.
HudsonAlpha researchers have been using SMRT Sequencing to resolve challenging plant genomes, deploying a Sequel System and a PacBio RS II for these complex projects. The success of that program led the institute to add a second Sequel System for use in sequencing human genomes.
The organization is part of the NIH-funded Clinical Sequencing Exploratory Research Program, with faculty investigator Greg Cooper leading an effort to apply whole genome sequencing to better understand the genetic basis of intellectual and developmental disabilities in children and to provide diagnostic information to affected families. More than 500 children and their parents have been enrolled in the study.
In a statement announcing this work, Cooper said, “By applying whole genome PacBio Sequencing in this study we hope to more sensitively identify all sizes of genetic variants, thereby increasing our solve rate for previously undiagnosed children. In many cases, an accurate clinical diagnosis can improve our ability to manage the child’s condition. We also anticipate that we will make novel discoveries through this work that may benefit many families beyond those directly tested here.”
The group’s efforts to diagnose children using short-read sequencing technology have achieved a success rate of about 30 percent, but it is widely known that these platforms are unable to detect certain types of variation that contribute to disease. Structural variants such as repeat expansions and copy number variations are larger and more complex than short-read sequencers can resolve, and likely represent some of the cases that have gone undiagnosed. With PacBio long-read sequencing technology, scientists may be able to produce answers for cases that have proven intractable with other technologies.
“We believe projects like HudsonAlpha’s CSER program to help solve undiagnosed genetic disease in children are among the most important and rewarding uses for our technology,” stated Kevin Corcoran, our Senior Vice President for Market Development. “We look forward to seeing how PacBio sequencing can both improve their clinical sequencing success rate as well as support new discoveries.”