This blog features voices from PacBio — and our partners and colleagues — discussing the latest research, publications, and updates about SMRT Sequencing. Check back regularly or sign up to have our blog posts delivered directly to your inbox.
February 21, 2018
Congratulations to Winston Timp’s team on the publication of their Iso-Seq analysis of hummingbird! The paper is now available at GigaScience.
April 4, 2017
A new preprint offers an enticing look at transcriptome results from analysis of a hummingbird using SMRT Sequencing. In this study, scientists found new clues to explain unique attributes of the bird’s metabolism. The work was made possible through full-length isoform sequencing, which allowed deep, assembly-free analysis even though no reference genome was available.
“Single molecule, full-length transcript sequencing provides insight into the extreme metabolism of ruby-throated hummingbird Archilochus colubris” is now available on bioRxiv. From Rachael Workman, Alexander Myrka, Elizabeth Tseng, William Wong, Kenneth Welch, and Winston Timp, the paper describes a project designed to better understand how hummingbirds switch metabolic gears to focus on sugars or lipids as needed. “This metabolic flexibility is remarkable both in that the birds can switch between exclusive use of each fuel type within minutes,” they write, “and in that de novo lipogenesis from dietary sugar precursors is the principle way in which fat stores are built, sometimes at exceptionally high rates, such as during the few days prior to a migratory flight.”
The team used the Iso-Seq method with long-read PacBio data to generate full-length isoform sequences, focusing on the liver of Archilochus colubris. According to the paper, this represents “the first high-coverage transcriptome of any single avian tissue.” They also aligned transcripts to Calypte anna, a recently completed hummingbird assembly that also made use of SMRT Sequencing.
Workman et al. report that the use of long-read PacBio data allowed for more accurate views of isoforms and alternative splicing, even without a reference genome. “Using full-length transcript data, we found alignment unnecessary to generate clear pictures of the gene isoforms,” they note. “The long reads negate the need for transcript assembly, a precarious analysis in the absence of a genome.” Nearly half of the reads in the final analysis covered full-length genes, including the 5’ and 3’ ends as well as the polyA tail.
The team used the COGENT pipeline to assign transcripts to gene families and focus on unique isoforms. “COGENT is specifically designed for transcriptome assembly in the absence of a reference genome, allowing for isoforms of the same gene to be distinctly identified from different gene families,” the scientists write. Their analysis generated a highly diverse set of isoforms, which the authors believe “represents a nearly complete transcriptome of the hummingbird liver.”
With that dataset, the scientists found genes unique to hummingbird. “These genes showed a specific enrichment for pathways involved in lipid metabolism — suggesting that the hummingbird has evolved variants of these genes to achieve its high levels of metabolic efficiency,” they report.
The scientists note that follow-up functional assays will be an important next step in understanding and verifying the function of many genes of interest.
We can’t resist a good reference genome, so the pre-AGBT workshop entitled “Updating Reference Assemblies: New Technologies, New Sources of Diversity” was right up our alley. Hosted by the McDonnell Genome Institute, a member of the Genome Reference Consortium, the event offered conference attendees useful updates on efforts to expand the diversity of human reference genome sequences by incorporating samples from multiple continents of origin (the Americas, Africa, and Asia in addition to Europe).
NCBI’s Valerie Schneider spoke about opportunities and challenges in mining assemblies other than the current GRCh38 build. There are more human genome assemblies than ever, she said, noting that this is providing new insight into where variants are most commonly found — and also helps focus efforts to represent additional diversity. She also covered recent improvements to the GRCh38 assembly, plus a list of remaining technical challenges, while reporting that 65 new human genomes have been submitted to GenBank since GRCh38 was published. Most of those are based on PacBio data, and Schneider spoke about how those assemblies are used to help understand alternate loci and genetic variants in GRCh38. Going forward, she indicated that assemblies from people of African descent are still needed, offering a major opportunity for improvement.
Tina Graves Lindsay from the McDonnell Genome Institute continued the diversity theme, showing how her team relies on a strategy of 60-fold coverage with PacBio long reads paired with scaffolding technologies to produce reference-grade assemblies. By sequencing genomes from underrepresented ethnicities, including Gambian and Yoruban assemblies she shared, her group has successfully resolved conflicts in GRCh38.
Ed Green from the University of California, Santa Cruz, spoke about updating reference genomes with proximity ligation techniques such as Hi-C and Chicago. The approaches are analogous to mate-pair data, he said, before presenting data from 12 diploid human genomes. In one example, proximity ligation revealed alignment errors in NA19240, a reference just submitted to GenBank that had sections of chromosome 4 incorrectly placed on chromosome 1, among other problems.
We’d like to thank the workshop organizers for a great event!
The Department of Energy has its eyes on an unassuming solution to our bioenergy needs: Aspergillus. The fungal genus contains hundreds of species, which include powerful pathogens, industrial cell factories, and prolific producers of bioactive secondary metabolites.
The DOE’s Joint Genome Institute (JGI) has embarked on an ambitious plan to sequence, annotate and analyze the genomes of 300 Aspergillus fungi, and the first results are in.
In a study published in the Proceedings of the National Academy of Sciences, “Linking secondary metabolites to gene clusters through genome sequencing of six diverse Aspergillus species,” a team led by researchers at the JGI, in partnership with the DOE’s Joint BioEnergy Institute (JBEI) and the Technical University of Denmark (DTU), describes how it applied SMRT Sequencing to four diverse Aspergillus species (A. campestris, A. novofumigatus, A. ochraceoroseus, and A. steynii), producing very high-quality genome assemblies that can serve as reference strains for future comparative genomics analyses.
Two additional strains (A. taichungensis and A. candidus) were also sequenced and a comparative analysis involving these and other Aspergillus genomes was then conducted, allowing the team to identify biosynthetic gene clusters for secondary metabolites (SMs) of interest.
“One of the things we found to be interesting here was the diversity of the species we looked at; we picked four that were distantly related,” says study senior author Mikael R. Andersen, Professor at DTU. “With that diversity comes also chemical diversity, so we were able to find candidate genes for some very diverse types of compounds.”
Using a new analysis method developed by first author Inge Kjaerboelling, the team looked for genes found in all producer species and was able to “elegantly pinpoint the genes,” Andersen adds.
Among the traits they traced were allergens, virulence, and pathogenicity. Aspergillus fungi are also known to contain more than 250 carbohydrate-active enzymes (CAZymes), which break down plant cell walls. Knowledge of how this works could help the DOE as it pursues sustainable alternative fuels using bioenergy feedstock crops.
The fungal species’ secondary metabolites are also of interest to DOE researchers, as these small molecules have the potential to act as biofuel and chemical intermediates. Determining the structures of purified secondary metabolites is often relatively straightforward, but connecting these molecules to their biosynthetic pathways can be quite challenging, says study co-author Scott Baker, a fungal researcher at the Environmental Molecular Sciences Laboratory, a DOE Office of Science User Facility located at the Pacific Northwest National Laboratory.
“We show that using comparative genomics can efficiently lead to reasonable predictions of gene clusters involved in biosynthetic pathways,” Baker says.
The authors hope that by characterizing the identity and roles of secondary metabolites, and the genes necessary for their generation, they will discover potential tools for improving the ability to process recalcitrant biomass into precursors for biofuels and bioproducts.
A new review in Nucleic Acids Research offers a sweeping look at clinical uses for SMRT Sequencing, concluding:
“The myth that SMRT sequencing is too error prone to be diagnostically useful is being expunged and replaced by evidence that it offers advantages over short-read sequencers.”
The authors continued, “Just as second-generation platforms stepped beyond Sanger sequencing and enabled a revolution in genomics medicine, third-generation single molecule sequencing platforms will likely be the next genetic diagnostic revolution.”
“Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics,” written by Simon Ardui, Joris Vermeesch, and Matthew Hestand at KU Leuven and Adam Ameur at Uppsala University, offers a great overview of how SMRT Sequencing is being used in clinically relevant applications ranging from cancer to reproductive medicine and more. The paper notes that SMRT Sequencing offers tremendous benefits because it resolves many problems with short-read platforms: “limitations such as GC bias, difficulties mapping to repetitive elements, trouble discriminating paralogous sequences, and difficulties in phasing alleles.” In addition, SMRT Sequencing has “higher consensus accuracies and can detect epigenetic modifications from native DNA,” Ardui et al. write.
“SMRT sequencing is opening up new diagnostic avenues, such as the ability to determine tandem repeat lengths, interruptions, and even epigenetics in a single test at base pair resolution,” the scientists report. “Long read sequencing is already considered the gold standard for some applications, such as for HLA genotyping for tissue transplants.”
The review walks through many of those applications, offering prominent examples for each. Resolving tandem repeats, for example, is already important for Fragile X syndrome, spinocerebellar ataxia, and other repeat expansion disorders. “Replacing Southern Blots with faster and more direct SMRT sequencing will greatly enhance … repeat disorder diagnostics,” the scientists note. They also cover examples such as distinguishing pseudogenes, needed for CYP2D6 analysis for drug metabolism studies; identifying fusion genes to guide therapy selection for cancer patients; and infectious disease analysis; among many others.
Looking ahead, the review cites data indicating that whole transcriptome and whole genome sequencing on the PacBio system will soon see regular clinical utility. Regarding the Iso-Seq method, “as costs drop and throughput increases, unbiased PacBio expression and isoform detection will become routine in the near future,” the scientists write. They also note that “SMRT sequencing is greatly expanding the utility of WGS, permitting a factor greater in assembly completeness … even nearing reference genome contig sizes and including diploid aware assemblies.”
A project that sparked widespread interest and a successful science crowdsourcing campaign has inspired an international collaboration that produced two high-quality reference genomes, as well as a draft genome of a related beetle. And the results have shed light on the evolution of bioluminescence.
We’ve been following the progress of Team Firefly since the team of scientists from MIT, University of Rochester, Brigham Young University, Indiana University, Cornell University, and Tufts University narrowly lost our 2016 SMRT Grant competition. The project to sequence the genome of the Big Dipper Firefly, Photinus pyralis, was ultimately crowdfunded through the Experiment site and our Genome Galaxy Initiative.
In its latest update, the team announced it had joined forces with collaborators around the globe to bring two more genomes into the fold: the Japanese “Heike” firefly, Aquatica lateralis, and the bioluminescent click-beetle, or “cucubano”, Ignelater luminosus.
The re-named Team Bioluminescent Beetles offered a sneak peek into its results by publishing a pre-print on the bioRxiv service, “Firefly genomes illuminate the origin and evolution of bioluminescence.” The scientists have also made the genome data downloadable at http://www.fireflybase.org.
“One of most intriguing findings of our results so far is that we think that fireflies and click-beetles actually evolved their very similar bioluminescent systems independently, making beetle bioluminescence a possible new example of parallel evolution,” writes team member Tim Fallon, of the Whitehead Institute for Biomedical Research at MIT.
The data also provide insight into the evolution of other traits, including chemical defenses and the viral and microbial holobiome associated with the unique lifestyle of bioluminescent beetles.
Until now, scientists have been in the dark about the genes behind firefly bioluminescence, even though firefly luciferase is widely used in agricultural and biomedical research. The team hopes its findings could help improve these engineered bioluminescent systems, and also aid species conservation efforts.
Interested in pitching your own genome sequencing project? We are partnering with our certified service provider GENEWIZ to offer a ‘Sequence the Tree of Life’ SMRT Grant program. Submit your proposal for a chance to win sequencing on the Sequel System by March 25.
The Mexican salamander, or the axolotl, may have tiny feet, but the feat of decoding its genetic footprint was huge—32 billion base pairs huge, making it ten times bigger than the human genome and the largest ever sequenced.
The accomplishment by an international team of scientists is significant, not only because of its sheer size, but also because of the insights it could provide into tissue regeneration.
The easily recognizable critter has an astounding ability to regenerate body parts, growing lost limbs – bones, muscles, nerves and all – within weeks. It can also repair spinal cord and retinal tissue, and is easy to breed, making it a popular biological model since 1864. But the size of its genome, and the number of repetitive sequences within it, has made full genetic analysis impossible – until now.
In a recent Nature paper, “The axolotl genome and the evolution of key tissue formation regulators,” lead authors Elly Tanaka of the Research Institute of Molecular Pathology in Vienna, Michael Hiller and Gene Myers of the Max Planck Institute of Molecular Cell Biology and Genetics, and Siegfried Schloissnig of the Heidelberg Institute for Theoretical Studies describe how they sequenced, assembled, annotated, and analyzed the complete Ambystoma mexicanum genome using PacBio long-read sequencing, optical mapping and a new genome assembler called MARVEL.
“We sequenced 110 million long reads (32× coverage, N50 read length 14.2 kb) using Pacific Biosciences instruments to avoid the read sampling bias that is often found when using other technologies and to span repeat-rich genomic regions that cause breaks in short-read assemblies,” the authors write.
The long reads allowed the team to overcome several assembly challenges, such as 18.6 Gb of repetitive sequences (65.6% of the contig assembly) and distinct long terminal repeat (LTR) retroelement classes and endogenous retroviruses with elements more than 10 kb long.
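For readers who like to check the arithmetic, here is a minimal Python sketch relating the figures quoted above. It assumes the rounded 32 Gb genome size used in this post; the function names are our own, and the derived values (total sequenced bases, implied mean read length, implied contig assembly size) are rough estimates from those rounded inputs, not numbers reported in the paper.

```python
# Back-of-the-envelope check of the axolotl sequencing figures quoted above.
# All inputs are the rounded values from the post; outputs are estimates only.

GENOME_SIZE_BP = 32e9    # "32 billion base pairs" (rounded)
NUM_READS = 110e6        # "110 million long reads"
COVERAGE = 32            # "32x coverage"
REPEAT_BP = 18.6e9       # "18.6 Gb of repetitive sequences"
REPEAT_FRACTION = 0.656  # "65.6% of the contig assembly"


def total_bases(coverage, genome_size_bp):
    """Coverage is total sequenced bases divided by genome size."""
    return coverage * genome_size_bp


def implied_mean_read_length(coverage, genome_size_bp, num_reads):
    """Mean read length implied by coverage, genome size, and read count."""
    return total_bases(coverage, genome_size_bp) / num_reads


def implied_assembly_size(repeat_bp, repeat_fraction):
    """Assembly size implied by repeat content and its share of the assembly."""
    return repeat_bp / repeat_fraction


sequenced = total_bases(COVERAGE, GENOME_SIZE_BP)
mean_len = implied_mean_read_length(COVERAGE, GENOME_SIZE_BP, NUM_READS)
contig_total = implied_assembly_size(REPEAT_BP, REPEAT_FRACTION)

print(f"Total sequenced bases: {sequenced / 1e9:.0f} Gb")
print(f"Implied mean read length: {mean_len / 1e3:.1f} kb")
print(f"Implied contig assembly size: {contig_total / 1e9:.1f} Gb")
```

With these rounded inputs, the implied mean read length comes out around 9.3 kb. That sits below the quoted read N50 of 14.2 kb, as one would generally expect, since N50 is weighted toward longer reads.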
The researchers observed that LTR expansion is a major contributor to giant genome size in axolotl, a pattern consistent with other large animal and plant genomes. They also isolated genes unique to axolotl and identified one crucial developmental gene that was missing (PAX3); its functions were taken over by another gene (PAX7).
The next step, the authors note, will be to apply methods such as chromatin immunoprecipitation with sequencing (ChIP–seq) or assay for transposase-accessible chromatin using sequencing (ATAC-seq) to investigate the genomic basis of gene regulation during regeneration.
“Together with methods such as CRISPR-mediated gene editing, viral expression methods, transplantation and transgenesis, the axolotl is a powerful system for studying questions such as the evolutionary basis of its remarkable regeneration ability,” they conclude.
The SOLVE-RD research program, a collaboration of 21 participant organizations in 10 nations, announced it has received a €15 million grant from the European Union’s Horizon 2020 initiative. SOLVE-RD aims to improve the diagnosis and treatment of rare diseases, which in total affect millions of Europeans. The program is applying novel diagnostic tools to around 19,000 cases unsolved by prior short-read exome sequencing. Prominent among the planned “multi-omics” approach is long-read genome sequencing, which will reveal the large amount of potentially disease-causing genetic variation that is not accessible with short-read DNA sequencing. SOLVE-RD plans to apply long-read genome sequencing to 500 cases.
Recent studies with PacBio long-read genome sequencing have shown that each human genome has upwards of 20,000 structural variants (differences ≥50 bp), which affect more base pairs than single nucleotide variants and small insertions and deletions together [1]. Short-read sequencing fails to detect most of these structural variants, which often lie in repetitive regions of the genome or are larger than short reads can span [1]. In 2017, Merker et al. reported the first use of PacBio long-read genome sequencing to identify a disease-causing structural variant in a Mendelian disease case undiagnosed by short-read genome sequencing [2]. The study applied low (8-fold) coverage sequencing on the Sequel System to discover structural variants. By applying long-read sequencing to a larger cohort of subjects with rare diseases, the SOLVE-RD program promises to provide valuable insights into the disease classes for which this technology is most useful. At ASHG 2017, Han Brunner, a coordinator of the SOLVE-RD consortium, described initial work on this effort using the Sequel System.
[1] Huddleston J, et al. (2017). Genome Research, 27(5):677-685.
[2] Merker JD, et al. (2018). Genetics in Medicine, 20(1):159-163.
Maize is amazingly diverse. A study comparing genome segments from two inbred lines, for instance, revealed that half of the sequence and one-third of the gene content was not shared – that’s more diversity within the species than between some other species, for example humans and chimpanzees, which exhibit more than 98 percent sequence similarity.
So how can researchers and commercial breeders rely upon a single reference genome to represent the genetic diversity in their germplasm? More and more scientists are deciding they cannot.
At DuPont Pioneer, where DNA sequencing is paramount for R&D to reveal the genetic basis for traits of interest in a variety of commercial crops, an ambitious project has begun: a pan-genome reference collection based on high-speed, high-throughput SMRT Sequencing and assembly.
As described in this case study, the company has developed a way to assemble high-quality reference genomes in just one month, and it has started to create them for several of its own elite breeding lines, as well as select wild strains.
Having multiple genome assemblies of the same high standard for several genotypes will be increasingly important as researchers try to achieve a greater understanding of the impacts of structural variation on plant genomes, says Research Scientist Kevin Fengler, of the Data Science and Informatics group at DuPont Pioneer.
“We want to focus on true structural variation and have confidence in the new discoveries we find in these genomes,” he adds. “Until now, focusing on one reference genome has limited our view. We are just beginning to explore what we have been missing all along.”
From commercialization to crop improvement to answering basic questions about biology, generating and analyzing multiple reference genomes has myriad benefits in a variety of lab settings.
“It has become clear that one single genome is not enough to represent the huge amount of variation in rice genomes,” writes Zhi-Kang Li of the Chinese Academy of Agricultural Sciences in this recent Nature Scientific Data paper about the assembly of an early-mature japonica rice genome.
Rod Wing of the Arizona Genomics Institute agrees. He is aiming to build high-quality reference genomes for 23 additional species of rice using SMRT Sequencing. Beyond providing highly accurate, long-read sequence data, Wing said, the PacBio platform is also useful for full-length RNA sequencing and for characterizing the methylome. As he notes in this blog post and case study, he can take rice tissues at several developmental stages and under many different environmental conditions, isolate RNA, and do Iso-Seq analysis on those samples to enable whole plant transcriptome analysis, which could help the community map gene networks.
Other researchers are also eager to expand their set of references:
- At CROPS 2017, a three-day event focused on genomic technologies and their use in crop improvement and breeding programs, Jeremy Schmutz of the HudsonAlpha Institute for Biotechnology and the Joint Genome Institute discussed his use of PacBio SMRT Sequencing to create several cotton genomes as well as Brachypodium, peanut, sorghum, and more. As he notes in this blog post, SMRT Sequencing has been successful even for very challenging plant genomes with highly repetitive elements, GC-rich regions, areas of high and low complexity, and varying degrees of ploidy.
- Grapes are getting a thorough cataloging, thanks to the efforts of the Cantu Lab at the University of California, Davis. Fellow UC Davis researcher Steve Knapp is also leading efforts to expand the selection of reference genome assemblies for strawberries.
- There is also a pressing need to add to the germplasm of the world’s most valuable plant: coffee. As noted in this case study, there are about 100 species in the Coffea genus, but the particular strains cultivated to produce coffee (a market valued at $90 billion) have very little genetic diversity. In her quest to address disease and climate pressures that threaten the plant, Cornell University researcher Marcela Yepes has begun to create new references for several varieties, starting with Coffea arabica and Coffea eugenioides.
We are looking forward to hearing about additional efforts to expand the reference genome library at the ongoing Plant and Animal Genome XXVI Conference. Look out for PacBio @ PAG, and swing by booth 418 to say hi. We will also be hosting a half-day informatics conference on Wednesday, Jan. 17.
Scientists from the University at Buffalo, Nanyang Technological University, and other institutions published results from an effort to elucidate the Utricularia gibba genome using SMRT Sequencing.
U. gibba, also known as the humped bladderwort, is an aquatic carnivorous flowering plant with a remarkably small genome, especially in light of two whole genome duplication events. Genome sequencing and annotation data are reported in the PNAS publication “Long-read sequencing uncovers the adaptive topography of a carnivorous plant genome” from lead author Tianying Lan, senior author Victor Albert, and collaborators. The scientists were interested in using the plant’s genome to learn more about the post-duplication deletion process as well as traits specific to carnivorous plants.
This particular plant’s genome was previously sequenced with short-read technology, producing an 82 Mb assembly “which revealed that its genome gained and deleted gene duplicates significantly faster than those of other genomes,” the scientists note. By applying PacBio long-read technology, the team was able to significantly improve on the original assembly. The de novo genome project resulted in an assembly with a contig N50 of nearly 3.5 Mb. The total size was about 100 Mb, adding more than 18 Mb missed by the short-read assembly. Twenty-four contigs included telomeres, with four of those representing complete chromosomes. “Remarkably, base pair correction using either the PacBio data or Illumina MiSeq reads from our previous assembly led to extremely minor improvements, only 0.071% and 0.01% of total bases, respectively,” Lan et al. write.
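Since contig N50 is the headline metric here, a quick illustration may help. The Python sketch below uses the common definition of N50: the length of the contig at which the running total, counted from the longest contig down, first reaches half of the total assembly length. The function name and the toy contig lengths are our own for illustration and are not data from the U. gibba assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or read) lengths.

    Contigs are sorted longest first; N50 is the length of the contig at
    which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths): total is 150, so half is 75.
# 50 alone falls short; 50 + 40 = 90 reaches it, so the N50 is 40.
toy_contigs = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

A higher N50 means more of the assembly sits in long contigs, which is why a jump to nearly 3.5 Mb signals a far more contiguous assembly.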
The authors present a more complete count of protein-coding genes thanks to the improved assembly. The tally came to 30,689, a nearly 8% increase from estimates based on the short-read assembly. In addition, “unlike the far shorter scaffolds from [the prior] assembly, our largely chromosome-sized contigs permitted us to conservatively distinguish the [whole genome duplication]-derived and tandem duplicate portions of U. gibba’s genome adaptive landscape,” they write. That unique information enabled the team to discover that tandem duplication events were “enriched in metabolic functions potentially important for a carnivorous plant” — including cysteine protease genes expressed only in the plant’s trap — while syntenic duplicates were “enriched for transcription factor functions,” the scientists report. “Such small-scale, tandem duplicates are therefore revealed as essential elements in the bladderwort’s carnivorous adaptation.”
Transposable elements were another area of investigation, with many more TE-derived events found in the PacBio assembly compared to the previous one. “Serving as a good illustration of the repeat discovery power of PacBio sequencing, ∼47% of the total TE assembly space comprised [large retrotransposon derivatives], whereas these elements amounted to only ∼14.6% of TEs in the previous short-read assembly,” the authors write.
For more information, check out this New York Times article covering the project and hear Tanya Renner, paper co-author, speak about carnivorous plants at the PacBio workshop held at the upcoming Plant & Animal Genome Conference on Monday, January 15th at 12:50 PM. Reserve your seat or register for a recording of the presentation.
One of our favorite January traditions is taking part in the Precision Medicine World Conference (PMWC), a three-day Silicon Valley event focused on exploring challenges and opportunities in personalized medicine. Taking place this year January 22-24 at the Computer History Museum, the slate of more than 350 speakers in 65-plus sessions will offer a cutting-edge look at the field. The conference is co-hosted by Stanford Health Care; the University of California, San Francisco; Johns Hopkins University; the University of Michigan; Duke University; and Duke Health.
On the docket this year: PMWC will cover topics including artificial intelligence and machine learning, CRISPR, immunotherapy, microbiome studies, and more. We’re particularly interested in presentations about national sequencing efforts such as the All of Us initiative, clinical sequencing, and infectious disease monitoring. The conference is also well-known for its award program. The prestigious PMWC Luminary Award is going this year to Emmanuelle Charpentier for CRISPR/Cas9 innovations and to Elizabeth Blackburn for her discoveries about telomeres and telomerase. Meanwhile, the PMWC Pioneer Award will be given to Ronald Levy for pioneering antibodies to treat cancer, Sir John Bell for his role in precision medicine efforts in the UK, and Alan Ashworth for discoveries in breast cancer and more.
Lori Aro, our senior director of clinical genomics, will be speaking at PMWC in the Genomic Profiling Showcase on January 24th at 11:15 am. Her presentation will update attendees on how long-read PacBio sequencing can be a good fit for clinical assay development since it provides the longest average reads, highest consensus accuracy, and most uniform coverage of any sequencing technology available today.
There’s still time to register for PMWC 2018 and they’re offering a 10% discount until January 11. We hope to see you at the meeting!
In an unprecedented crowd-sourced effort stoked by social media, 72 scientists collaborated via 25 conference calls and 3,323 emails to produce a new high-quality Aedes aegypti mosquito genome.
Assembled using PacBio long-read sequencing, the resource could provide the DNA map researchers need to combat the pest and the infectious diseases it spreads, including Zika, dengue, chikungunya, and yellow fever.
Eager to share the results with the scientific community, lead author Leslie B. Vosshall, first author Benjamin Matthews, both of Rockefeller University, and colleagues at several other institutions, published a pre-print of their paper, “Improved Aedes aegypti mosquito reference genome assembly enables biological discovery and vector control” online at bioRxiv.
In it, they describe how they improved upon previous efforts, which failed to produce contiguous sequences of the large (~1.3 Gb) and highly repetitive Ae. aegypti genome. The most recent previous assembly, AaegL4, for instance, produced chromosome-length scaffolds but suffered from short contigs and more than 31,000 gaps.
Using SMRT Sequencing data, the team produced an assembly that is highly contiguous, representing a 93% decrease in the number of contigs. The PacBio contigs were scaffolded end-to-end to the three Ae. aegypti chromosomes using Hi-C technology, resulting in the new AaegL5 reference. They were able to validate local structure, predict structural variants between haplotypes, and generate a dramatically improved gene set annotation.
As co-author Jeffrey Powell, a mosquito researcher at Yale University, told the New York Times at the start of the Aedes Genome Working Group project: “If we’re going to control the creature, we need to know it frontwards and backwards.”
“Having a complete genome sequence of the beast will give us a fundamental understanding of its biology that you can’t get any other way,” he added.
The researchers have already used the new assembly to investigate several scientific questions that could not be addressed with the previous genome, a few of which include:
- The structure of the elusive sex-determining “M” locus. Population-suppression strategies such as Sterile Insect Technique and Incompatible Insect Technique require that only males are released. A strategy that connects a gene for male determination to a gene drive construct has been proposed to effectively bias the population towards males over multiple generations, the authors note.
- More complete accounting of insecticide-detoxifying glutathione-S-transferase genes. This could catalyze the search for new resistance-breaking insecticides.
- The identity of multigene families that encode chemosensory receptors. A doubling in the known number of chemosensory receptors provides opportunities to link odorants on human skin to mosquito attraction, a key first step in the development of novel mosquito repellents.
- The evolution of insecticide resistance and vector differences. Mapping new candidates for dengue vector competence could help devise geographically-specific strategies.
“We predict that AaegL5 will catalyze new biological insights and intervention strategies to fight the deadly arboviral vector,” the authors conclude. “The high-quality genome assembly and annotation described here will enable major advances in mosquito biology and has already allowed us to carry out a number of experiments that were previously impossible.”
In a recent publication, scientists from the University of California, Davis, and PacBio reported results from an investigation of alternative splicing associated with a repeat expansion in the gene linked to fragile X syndrome. They used SMRT Sequencing to detect full-length isoforms (Iso-Seq analysis) associated with individuals at risk of FXTAS, an adult-onset neurodegenerative disorder.
“Altered expression of the FMR1 splicing variants landscape in premutation carriers” comes from lead author Elizabeth Tseng, senior author Flora Tassone, and collaborators. Previous studies from the Tassone lab had used SMRT Sequencing to detect full-length isoforms in samples from premutation carriers (individuals with more CGG repeats than a healthy person, but not enough to cause fragile X syndrome) and had identified a number of known isoforms. In this study, the scientists aimed for a more comprehensive analysis of alternative splicing of the FMR1 gene. “Although evidence suggests a strong role for regulation of the FMR1 gene expression in clinical outcomes,” the team reports, “there have been no detailed molecular characterizations on the role of alternative splicing in the development of FMR1 premutation associated disorders.”
To tackle this challenge, they deployed SMRT Sequencing to characterize transcript isoforms of the FMR1 gene in tissue samples from three premutation carriers and three matched controls, plus blood samples from 30 premutation carriers and 15 controls. The tissue samples were collected from cerebellum, testis, muscle, and heart. Iso-Seq analysis yielded up to 28,000 full-length transcript reads from the premutation carrier tissue samples. In total, the authors identified 49 unique FMR1 isoforms, including 16 previously characterized isoforms and a number of novel ones. This study has revealed new splicing patterns and a novel 140-bp exon that were shown to have elevated expression in premutation and FXTAS samples compared to normal controls. Of the 49 FMR1 isoforms, the scientists note, “30 of them were exclusively detected in premutation carriers based on sequencing results.”
This study underscores the power of Iso-Seq analysis in comprehensively characterizing full-length transcript isoforms for a gene of interest. By eliminating the need for sequence assembly, as is required using short-read sequencing methods, Iso-Seq analysis returns each isoform sequence in its entirety in a single read, thereby enabling the discovery of novel exons, intron retention, fusion transcripts and, ultimately, previously undetected novel isoforms.
“Our findings suggest that an abnormal alternative splicing process is present in individuals with premutation alleles,” the team concludes. “The characterization of the expression levels of the different FMR1 isoforms is fundamental for understanding the regulation of the FMR1 gene as imbalance in their expression could lead to an altered functional diversity with neurotoxic consequences.”
Nematodes are both simple and complex, making them one of the most attractive animal taxa to study basic biological processes, including genome evolution. Studies in the nematode Caenorhabditis elegans, for instance, have provided invaluable insights into almost all aspects of biology, from developmental to neurobiology and human diseases.
However, the high degree of fragmentation of current genome assemblies for many organisms complicates almost all types of genomic analysis. As the authors of a recent Cell Reports paper, Single-Molecule Sequencing Reveals the Chromosome-Scale Genomic Architecture of the Nematode Model Organism Pristionchus pacificus, point out, “general questions of chromosome evolution cannot be addressed if genome assemblies consist of thousands of contigs.”
SMRT Sequencing was able to remedy this problem. By sequencing the genome of P. pacificus with the PacBio Sequel System, Christian Roedelsperger, Ralf J. Sommer, and other colleagues from the Max Planck Institute for Developmental Biology generated an assembly that reduced the number of contigs from 12,395 to 135 and simplified their search for clues into developmental systems drift, the genetics of phenotypic plasticity, and genome evolution.
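To put the contig counts quoted above on a percentage scale, here is a minimal arithmetic sketch. The function name `contig_reduction` is purely illustrative and does not come from the paper.

```python
def contig_reduction(before: int, after: int) -> float:
    """Percent decrease in contig count between two assemblies."""
    return 100.0 * (before - after) / before


# Contig counts quoted in the post: 12,395 before, 135 after.
print(f"{contig_reduction(12_395, 135):.1f}% fewer contigs")
```

With the figures from the post, this works out to a reduction of roughly 98.9%.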
P. pacificus has become an increasingly important model species, used in comparison to two other free-living nematode species, C. elegans and C. briggsae, to investigate how various biological pathways and their underlying regulatory programs are modified during evolution.
Populated primarily by self-fertilizing hermaphrodites with a low frequency of males, all three species undergo frequent recombination among different genetic lineages. Their genomes range in size from 100-160 Mb, but all have five autosomes and one sex chromosome. Many of their shared features are controlled by completely different molecular programs, a phenomenon referred to as “developmental systems drift,” making them particularly useful in comparative biology.
P. pacificus is also one of the most promising animal models in the investigation of “phenotypic plasticity,” the property of a single genotype to form distinct phenotypes in response to different environmental influences. In P. pacificus, for instance, young nematode larvae either develop directly into adults or into non-feeding, long-lived dauer larvae, which can disperse to find more suitable environments. They also exhibit two different mouth morphs that are specialized for either bacterial or predatory feeding.
For these reasons, the Max Planck team was particularly interested in unravelling some of their genetic mysteries. They sequenced the genome of the P. pacificus reference strain (PS312) on the Sequel System to 100-fold coverage. The resulting de novo assembly enabled ordering and orientation of contigs for all six P. pacificus chromosomes. “This allowed us to robustly characterize chromosomal patterns of gene density, repeat content, nucleotide diversity, linkage disequilibrium, and macrosynteny,” the authors write.
Among their findings was the discovery of a major translocation from autosomes to the sex chromosome during the evolution of the lineage leading to C. elegans. “These findings highlight the impact of large-scale chromosomal rearrangements in nematode genome evolution and emphasize the need for high-quality genome assemblies to robustly study these events,” add the authors. “The new P. pacificus assembly will allow more rigorous genomewide analysis in all fields of genomics, and will greatly enhance the capacity to map and identify causal genes for various phenotypes.”
Following on the heels of the first nearly complete assembly of the hexaploid bread wheat genome, scientists from the University of California, Davis, the USDA Agricultural Research Service, Johns Hopkins University, and many other institutions recently published a high-quality genome assembly for one of wheat’s diploid ancestors. Both efforts incorporated SMRT Sequencing to improve contiguity of the assemblies. The new publication reveals that the ancestral plant’s genome has evolved more quickly than usual, driven largely by repeats.
The paper, “Genome sequence of the progenitor of the wheat D genome Aegilops tauschii,” comes from senior author Jan Dvořák; lead authors Ming-Cheng Luo, Yong Gu, Daniela Puiu, Hao Wang, and Sven Twardziok; and collaborators. “Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat,” the scientists note. “The large size and highly repetitive nature of the Ae. tauschii genome has until now precluded the development of a reference-quality genome sequence.”
To tackle this difficult genome, the team used a number of genome analysis technologies, including SMRT Sequencing, BAC sequencing, optical mapping, and more. Scientists from Johns Hopkins contributed 35-fold PacBio coverage of the Ae. tauschii genome, which is 4.3 Gb in size and organized into seven chromosomes.
With a high-quality assembly in hand, the team turned to exploring unique features of the ancestral wheat genome. “Compared to other sequenced plant genomes … the Ae. tauschii genome contains unprecedented amounts of very similar repeated sequences,” the scientists report. Transposable elements, including the frequent long terminal repeat retrotransposons, accounted for nearly 85% of the sequence.
“Our genome comparisons reveal that the Ae. tauschii genome has a greater number of dispersed duplicated genes than other sequenced genomes and its chromosomes have been structurally evolving an order of magnitude faster than those of other grass genomes,” the team writes. “We propose that the vast amounts of very similar repeated sequences cause frequent errors in recombination and lead to gene duplications and structural chromosome changes that drive fast genome evolution.”
Scientists championed their cases, school children sifted through species, and thousands of members of the public from around the globe took to social media to weigh in. Now the results are in, and high-quality genome assemblies for 25 organisms integral to United Kingdom ecosystems can begin.
As mentioned last month, we teamed up with the Wellcome Trust Sanger Institute on a project to celebrate their twenty-fifth anniversary. Sanger scientists will use the Sequel System and complementary technologies to produce reference-grade assemblies for squirrels, scallops, and sharks, as well as balsam, blackberries, bats, butterflies, bees, and many others.
The final five of the 25 candidates were chosen by a public vote via the I’m a Scientist, Get Me Out of Here campaign. After five weeks of online engagement, more than 130 live chats between school children and scientists, and nearly 5,000 votes, the results were announced December 8. They include:
- Common Starfish
- Fen Raft Spider
- Lesser Spotted Catshark
- Asian Hornet
- Eurasian Otter
“The project could reveal why some brown trout migrate to the open ocean, whilst others don’t, or tell us more about the magneto receptors in robins’ eyes that allow them to ‘see’ the magnetic fields of the Earth,” the Institute states in their announcement. “It could also shed light on why Red Squirrels are vulnerable to the squirrel pox virus, yet Grey Squirrels can carry and spread the virus without becoming ill.”
Once completed, the assemblies will be made publicly available for future studies to understand the biodiversity of the UK and aid the conservation and understanding of these species.
Once again, we’d like to congratulate the Sanger team on a remarkable 25 years!
The ability to study the speciation of an animal in real-time is a dream come true for evolutionary and developmental biologists. A group of Japanese researchers has gotten that opportunity, thanks in part to SMRT Sequencing.
Scientists at the University of Tokyo were the first to create a reference genome for an inbred strain of the medaka fish (Oryzias latipes), genome size ~800 Mb, in 2007. The genome assembly was created using Sanger sequencing, but contained low-quality regions and 97,933 sequence gaps. So, the team started from scratch with long-read sequencing to generate genome assemblies with far less missing sequence.
In a paper published in Nature Communications, senior authors Hiroyuki Takeda and Shinichi Morishita report new assemblies generated via PacBio long-read sequencing from three geographically isolated medaka strains. These high-quality assemblies allowed them to dive deeper than ever before into the genetics of the fish, and to discover new insights into how previously difficult-to-detect centromeres and large-scale structural variants evolve and contribute to genome diversity during vertebrate speciation.
“Highly accurate long contigs have been useful in enumeration of structural variants (SVs), filling gaps such as centromeres, extending contigs to telomeres, and phasing haplotypes,” the authors write.
The team focused its attention on centromeres, which are difficult to sequence and assemble with short-read and even Sanger platforms. “Once speciation is completed, representative centromeric monomers are highly diversified among 282 species; however, centromere evolution during speciation and its relevance with speciation are unknown,” the authors note.
With this in mind, the team sequenced the genomes of three medaka inbred strains derived from different local subpopulations: HNI from northern Japan, Hd-rR from southern Japan (the strain sequenced for the original reference genome), and HSOK from east Korea.
Originally considered a single species since they can mate and produce healthy offspring under laboratory conditions, the strains have accumulated genetic mutations and phenotypic diversity over a long period of geographical separation. They are now thought to be in the middle of speciation, making them the perfect platform for analyzing this type of evolution, the authors report.
Combining PacBio data with centromere-specific DNA probes and fluorescence in situ hybridization experiments, the team reports obtaining “an unprecedented resource of centromeric repeats of length 20–345 kbp in vertebrates.”
They found that the position of centromeres tended to be preserved unless chromosomal rearrangement took place on a large scale. This happened to the medaka, which remained the same for millions of years, until fissions, fusions, and translocations shaped its genome.
The scientists further discovered that this evolution happened at a different pace among the three strains, depending on the shape and sequence of the centromeres. Centromeric monomers in acrocentric chromosomes evolved more slowly than those in non-acrocentric chromosomes, the team reports. Using AgIn software, the authors estimated methylation states of CpG sites from kinetic SMRT Sequencing information and found divergent methylation patterns, suggesting that centromeres accumulate epigenetic diversity as well as sequence diversity during speciation.
They observed that each local strain has independently experienced thousands of mid-sized (1-50 kbp) insertion events—not enough to cause reproductive isolation, but possibly enough to participate in the regulation of genes and contribute to phenotypic variations.
“These findings reveal the potential of non-acrocentric centromere evolution to contribute to speciation,” conclude the authors. “Further analysis of the mid-sized insertions associated with novel transcripts and increased transcription will provide important clues to the genomic basis for vertebrate speciation.”
Unraveling the role of the microbiome in human health and environmental samples is an emerging priority in scientific study. But despite the best advances in sequencing technology, identifying the bacteria, fungi, and other organisms present in complex samples remains a huge challenge.
Metagenomic shotgun sequencing can read chromosomes, plasmids, and bacteriophages, and comparison to reference genome sequences can be used to place them into putative taxa and species bins, but these methods fail to sufficiently distinguish between genomes that are very similar.
A team of scientists from the Icahn School of Medicine at Mount Sinai, Sema4, and other institutions has come up with a novel solution: a computational method that uses PacBio long-read sequencing of metagenomic DNA to identify methylated motifs and create an epigenetic barcode that enables more precise microbiome analysis.
The process takes advantage of methyl groups which are added to nucleotides in bacteria and archaea in a highly sequence-specific manner, and these motifs often differ among species and strains.
The team took advantage of inter-pulse duration values that represent the time it takes a DNA polymerase to translocate from one nucleotide to the next during SMRT Sequencing. This measure can distinguish between methylated and non-methylated bases. They calculated methylation scores across motifs of several bacterial samples and murine fecal samples and created methylation profiles, which were used alongside sequence composition features to assemble contigs into species- and strain-level bins.
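The idea of a per-motif methylation score can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a score defined as the mean log2 ratio of native inter-pulse durations (IPDs) to control IPDs at each motif site, and the function names and example values below are hypothetical.

```python
import math
from statistics import mean


def motif_methylation_score(native_ipds, control_ipds):
    """Mean log2(native IPD / control IPD) across the sites of one motif.

    Assumption for illustration: a slower polymerase (larger IPD) at a
    site relative to an unmethylated control suggests a modified base.
    """
    return mean(math.log2(n / c) for n, c in zip(native_ipds, control_ipds))


def profile_distance(profile_a, profile_b):
    """Euclidean distance between two contigs' methylation profiles,
    computed over the motifs present in both (dict: motif -> score)."""
    motifs = sorted(set(profile_a) & set(profile_b))
    return math.dist([profile_a[m] for m in motifs],
                     [profile_b[m] for m in motifs])


# Hypothetical IPD values, invented for this example only.
contig_1 = {"GATC": motif_methylation_score([2.0, 4.0], [1.0, 1.0]),
            "CCWGG": motif_methylation_score([1.0, 1.0], [1.0, 1.0])}
contig_2 = {"GATC": motif_methylation_score([1.0, 1.0], [1.0, 1.0]),
            "CCWGG": motif_methylation_score([1.0, 1.0], [1.0, 1.0])}

print(contig_1, contig_2, profile_distance(contig_1, contig_2))
```

Contigs with a small profile distance would be candidates for the same bin; in the published method, methylation profiles are used together with sequence composition features rather than on their own.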
In a paper published in Nature Biotechnology, senior author Gang Fang describes how the method was also able to link mobile genetic elements, including antibiotic resistance-encoding plasmids, to their host species in a real microbiome sample.
Although their sequence coverages and composition profiles often differ, plasmid and chromosomal DNA of the bacterial host are methylated by the same set of methyltransferases, resulting in matching methylation profiles, the authors note.
“The biomedical community has long needed a microbiome analysis method capable of resolving individual species and strains with high resolution,” Fang said in a statement.
The method could ultimately prove useful in both research and clinical settings, since it allows for linking mobile genetic elements to their bacterial hosts. This information makes it possible for scientists to more accurately predict virulence and antibiotic resistance of individual bacterial species and strains, among other important traits.
In a new publication, scientists from Anthony Nolan Research Institute and the UCL Cancer Institute present an in-depth analysis of the utility of SMRT Sequencing for Human Leukocyte Antigen (HLA) typing. They assessed more than 100 cell lines and found that PacBio long-read sequencing significantly improves the accuracy of HLA typing.
“Single molecule real-time (SMRT®) DNA sequencing of HLA genes at ultra-high resolution from 126 International HLA and Immunogenetics Workshop cell lines” comes from lead author Thomas Turner, senior author Steven Marsh, and collaborators. The scientists implemented SMRT Sequencing to perform high-resolution HLA typing for 126 B-lymphoblastoid cell lines, including a group of 107 cell lines established in 1987 that is now an essential resource for the community. The goal of the present study was to increase the resolution of the reference sequences in the IMGT/HLA database and improve standardization of HLA typing calls for these cell lines.
HLA genes — used to evaluate donor-recipient tissue match before organ transplant, as well as other immune-related traits — are among the most polymorphic in the genome. Characterizing them has been a challenge with short-read sequencing platforms, but recent efforts to perform full-length sequencing and phasing of the genes with PacBio long-read sequencing have generated impressive results. Indeed, the authors write, “Anthony Nolan’s Histocompatibility Laboratory now routinely uses SMRT sequencing for HLA typing.”
For this project, scientists carried out amplicon sequencing for full-length gene analysis of HLA class I genes and partial analysis of class II genes. In total, they sequenced 931 HLA alleles, with 96% yielding results that matched previously established HLA types for those cell lines. Of the few dozen discrepancies, Turner et al. discovered that 10 harbored novel alleles and 13 were different because of zygosity results, while many others included allele types not previously reported for those cell lines. Confirmation studies showed that these SMRT Sequencing results accurately resolved ambiguities and corrected errors in earlier HLA typing efforts. “We identified numerous discrepancies and novel intronic polymorphisms, extended several alleles to full genomic sequences, and confirmed the existence of some alleles identified by other researchers,” the team reports.
“The work presented here has further demonstrated the efficacy of SMRT sequencing to provide the highest resolution, unambiguous HLA typing data when full genes are sequenced,” the scientists conclude. “This knowledge ensures the continued usefulness of the reference cell line panel as a resource to the immunogenetics community in the age of next generation DNA sequencing.”
What can one koala tell us about an endemic that threatens the survival of its species? A great deal, it turns out.
While doing a deep dive into the genome of a wild female koala, a team of Australian scientists led by Matthew Hobbs and Andrew King of the Australian Museum Research Institute were able to unravel some of the complexity of the species-specific gammaretrovirus KoRV.
The results, published recently in Nature, paint a picture of a rapidly evolving and diversifying virus, with implications for the long-term survival of the koala, as well as our understanding of retroviral-host species interactions.
The study allowed the researchers to see interspecies transmission, multimerization of sequences in the long terminal repeats, and recombination between different retroviruses, processes which have been reported for other retroviruses but occurred millions of years ago, rather than in very recent times, as for the koala.
KoRV is a retrovirus closely related to gibbon ape leukemia virus, and is thought to be the result of an interspecies transmission. Implicated in the pathogenesis of two major koala diseases, hematopoietic neoplasia and the endemic chlamydiosis, it is considered to be a significant threat to the survival of the species.
Several KoRV subtypes have been proposed. Presumed to be the original transmitted strain, KoRV-A is endogenous and widespread in northern Australian koalas, which are thought to be 100 percent infected. KoRV-B is a more recent, more virulent subtype believed to be the result of recombination. Additional variants — KoRV-C, D, E, F, G, H and I — have also recently been identified, but it has been difficult to examine the population of KoRV and KoRV-like insertions in any koala genome.
PacBio long-read sequencing technology finally made it possible. DNA from the koala’s spleen was sequenced to give an estimated overall coverage of 57.3-fold based on a genome size of 3.5 Gb. The authors used SMRT Sequencing due to its capacity to generate sequences of up to ~70 kb that carry full-length (8.4 kb) KoRV insertions and substantial flanking koala genome sequence. This provided a considerable advantage over short reads, which could not resolve the different KoRV insertion sites or types.
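The coverage figure above follows from simple arithmetic: coverage is total sequenced bases divided by genome size. The sketch below uses the two numbers quoted in the post; the function name is illustrative only.

```python
def sequenced_bases(coverage: float, genome_size_bp: float) -> float:
    """Total bases implied by a fold-coverage and a genome size."""
    return coverage * genome_size_bp


# 57.3-fold coverage of an estimated 3.5 Gb genome, as quoted in the post.
total = sequenced_bases(57.3, 3.5e9)
print(f"about {total / 1e9:.0f} Gb of sequence")
```

With those inputs, the estimate comes to roughly 200 Gb of long-read data.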
“Obtaining sequence data from elements (such as retroviruses) that are repeated throughout the genome cannot be done with short read sequencing technology, which is why we used long read PacBio sequencing in our study,” the authors add.
The team reported putative somatic integrations of five distinct forms of KoRV (KoRV-A, KoRV-B, KoRV-D, KoRV-E), as well as germline evidence of KoRV-A. They also found an endogenous recombinant element (recKoRV) in which most of the KoRV protein-coding region was replaced with an ancient, endogenous retroelement.
“This diverse pool of viral variants in the same animal highlights the range of strategies being used by this retrovirus as it invades, or comes to equilibrium with its new host,” the authors add.
“As KoRV-A, B and potentially other more pathogenic KoRV types sweep through koala populations, we might expect to see worsening effects of chlamydial disease. This highlights the importance of understanding the complex mix of KoRV types present in an individual animal.”
Read the full report and learn more about the project in this video presentation from a PAG 2017 Workshop by Rebecca Johnson, a co-author on the paper and the Director of the Australian Museum Research Institute. Johnson will also be presenting her work at the Advances in Genome Biology and Technology meeting in February 2018.
In a recent paper, scientists in Germany call for a genomic database of Klebsiella pneumoniae strains to accelerate strain identification as well as drug-resistance status. To that end, they used SMRT Sequencing to generate high-quality assemblies for 16 isolates collected in German hospitals.
“Monitoring microevolution of OXA-48-producing Klebsiella pneumoniae ST147 in a hospital setting by SMRT sequencing” comes from lead authors Andreas Zautner and Boyke Bunk, senior authors Jorg Overmann and Wolfgang Bohne, and collaborators at University Medical Center and other institutes in Germany.
The urgency to characterize K. pneumoniae strains comes from the rapid rise of carbapenem-resistant Klebsiella: drug resistance, and increasingly multidrug resistance (MDR), is a major public health threat with these infections. “A continuous monitoring of [strain type] distribution and its association with resistance and virulence genes is essential for early detection of successful K. pneumoniae lineages,” the scientists report.
K. pneumoniae strains carry plasmids encoding different types of carbapenemase, which confers resistance to the carbapenem class of antibiotics. OXA-48 is currently the most common carbapenemase found in K. pneumoniae isolates in Germany, according to the authors; similar strains are commonly found in North Africa, the Middle East, and European countries along the Mediterranean. The team chose to focus on OXA-48 strains, selecting 16 isolates collected in 2013 and 2014 for whole genome SMRT Sequencing.
The technology choice was no accident. “A comprehensive K. pneumoniae database of closed genomes is necessary for a complete understanding of the genome plasticity of these organisms and can significantly improve the tracking of MDR isolates,” the scientists write. With SMRT Sequencing, they were able to generate closed genomes. In most cases they used a single SMRT Cell per strain, and “a consensus concordance of QV60 could be confirmed for all genomes,” they report.
Based on the 16 genome assemblies, the scientists determined that half of the isolates shared the same type, ST147, and differed by no more than 25 SNPs throughout the core genome. They identified several plasmids, including a novel linear plasmid prophage of Klebsiella oxytoca. “The comparative whole-genome analysis revealed several rearrangements of mobile genetic elements and losses of chromosomal and plasmidic regions in the ST147 isolates,” they write.
“Single molecule real-time sequencing allowed monitoring of the genetic and epigenetic microevolution of MDR OXA-48-producing K. pneumoniae,” the team concludes, noting that the approach was amenable to spotting individual SNPs, as well as complex rearrangements.