Unraveling Malaria Mysteries with Long-Read Sequencing
Thursday, May 30, 2019
Malaria is a complicated killer, and efforts to develop effective vaccines have been hindered by gaps in our understanding of both the parasite that causes the infection, Plasmodium falciparum, and its transmitter, the mosquito.
Like many virulent parasites, P. falciparum has evaded close genetic scrutiny due to its complex and changing composition. Its 23 Mb haploid genome is extremely AT-rich (~80%) and contains stretches of highly repetitive sequences, especially in telomeric and subtelomeric regions. To complicate matters further, it expands its genetic diversity during mitosis via homologous recombination, leading to the acquisition of new variants of virulence-associated surface adhesion molecules.
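For readers who want to check a figure like the ~80% AT content on their own sequences, here is a minimal Python sketch. The function name `at_fraction` and the toy sequence are illustrative assumptions, not part of any PacBio tool, and ambiguous bases such as N are simply skipped.

```python
def at_fraction(sequence):
    """Return the fraction of A/T bases among the unambiguous bases (A, C, G, T)."""
    # Hypothetical helper for illustration; not taken from any published pipeline.
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    at_count = sum(1 for b in bases if b in "AT")
    return at_count / len(bases)


# Toy sequence (made up): 8 of its 10 bases are A or T.
toy = "ATATATATGC"
print(f"AT fraction: {at_fraction(toy):.2f}")
```

On a real genome, the same calculation would be run per chromosome or in sliding windows to see where AT content peaks.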
Attempts to decode the P. falciparum genome with short reads have resulted in extremely fragmented assemblies of more than 20,000 contigs each, with N50 contig sizes of less than 2 kb. That is barely long enough to contain a single intact gene, let alone to resolve highly homologous gene families that are often chained end-to-end across large regions. Early shotgun sequencing missed polymorphisms such as insertions and deletions, copy number variants, chromosomal rearrangements and other structural variants in P. falciparum’s hypervariable and highly repetitive regions.
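N50 is the contig length at which contigs of that size or longer cover at least half of the total assembly, so a higher N50 means a more contiguous assembly. The Python sketch below shows the calculation under that common definition. The function name `n50` and the contig lengths are made up for illustration and are not data from the assemblies discussed here.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given a list of contig lengths."""
    # Hypothetical helper for illustration; assembly tools report this metric directly.
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the total defines N50.
        if running >= half_total:
            return length


# Toy lengths in bp (made up): the same 32 bp split into many or few contigs.
fragmented = [2, 2, 2, 3, 3, 4, 8, 8]
contiguous = [20, 12]
print(n50(fragmented), n50(contiguous))
```

The two toy assemblies have the same total length, but the one with fewer, longer contigs has the higher N50.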
In 2016, PacBio collaborated with scientists from Institut Pasteur and Cold Spring Harbor to create a complete telomere-to-telomere de novo assembly in which all 14 chromosomes were resolved into single contigs. Even extremely AT-rich regions were resolved with uniform coverage and subtelomeric regions of all chromosomes were successfully assembled in a single run for the first time.
As PacBio microbiology expert Meredith Ashby explained in a recent Labroots webinar, the assembly was “game changing.”
The most challenging parts of a genome are often the most important to decode, she explained. Regions of high homology, for example, facilitate recombination events, a key mechanism for rapid genome evolution. In other words, they are “where most of the exciting things are happening” in terms of immune evasion and drug resistance.
Not only has the new reference genome facilitated better analysis of these areas, as well as structural variants and large-scale changes in the genome, it has also enabled better SNP calling, Ashby said. This is important because some traits, including drug resistance, may be SNP driven.
By mapping short reads to single-molecule sequenced reference genomes, you can more confidently tell the difference between genes and pseudogenes, or between genes and new duplications, she said. This matters because many clinical isolates are sequenced using short-read technology.
Since the new reference genome was published, an additional 15 Plasmodium isolates have been assembled to near completeness. There have already been several publications about new discoveries enabled by these new assemblies, from asexual replication to locally divergent selection.
Host with the most
To fully understand malaria, however, we must also understand its host. The genome of malaria vector Anopheles coluzzii was recently assembled from a single individual using our new low-input protocol.
PacBio technology has also enabled the assembly of a much-improved genome of Aedes aegypti, which transmits yellow fever, Zika, dengue and chikungunya.
Like that of Plasmodium, the A. aegypti genome is highly repetitive, and early sequencing attempts resulted in assemblies with 37,000 contigs. SMRT Sequencing reduced this to around 2,500 contigs and increased the contig N50 from 84 kb to 11.8 Mb, a far more contiguous assembly.
The AaegL5 assembly revealed an enormous number of new genes, including a far more comprehensive catalogue of odorant, gustatory and ionotropic receptors, which could provide important information for pest control strategies based on feeding and mating. The Rockefeller University researchers also identified hotspots that were under selective pressure for insecticide resistance.
Also of interest to infectious disease researchers: findings involving serine proteases, which mediate immune responses, and metalloproteases, which are linked to mosquito–Plasmodium interactions. Half of the 404 serine protease and metalloprotease gene models were improved in the AaegL5 assembly, and 49 novel proteases were discovered, Ashby said. Other vector competence hotspots were also identified, such as QTLs on chromosome 2 that were linked to systemic dengue virus dissemination in midgut-infected mosquitoes.
“Malaria, yellow fever, Zika, dengue and chikungunya cause millions of deaths worldwide every year,” Ashby said. “Hopefully these new references will yield new insight into all kinds of things that are important to reduce the global burden of infectious diseases.”
To learn more about the application of SMRT Sequencing technology in infectious disease research, watch the full Labroots presentation, or visit the PacBio team at the American Society for Microbiology (ASM) Microbe 2019 at booth 1160.