AGBT Day 1 & 2 Highlights: Hello GRCh38 & SMRT Sequencing for Pathogen Screening
Saturday, February 15, 2014
AGBT 2014 is off to a roaring start: the opening reception was hastily moved indoors when an impressive thunderstorm joined the party. Wednesday’s kickoff plenary session offered an insightful view of the recently released human genome reference, known as GRCh38, which is available under GenBank accession GCA_000001405.15. Valerie Schneider from the National Center for Biotechnology Information gave a presentation on the latest build, highlighting improvements that range from alternate loci to modeled centromeres to error correction of individual bases.
The Genome Reference Consortium resolved more than 1,000 reported issues from build 37 with the release of this new build 38. They added 175 regions of alternate loci, which brought more than 3 Mb of novel sequence into the reference. Thirty-one of the new alternate loci came from a single region, the KIR genomic region. The new reference genotypes were contributed by Dan Geraghty, based on fosmid sequencing and assembly done, in part, using PacBio® methods. The scaffold N50 for each chromosome increased by an average of 23 Mb, a measure that indicates the reference assembly is truly improving, Schneider said.
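For readers less familiar with the metric: scaffold N50 is the length such that scaffolds of that length or longer contain at least half of the assembled bases, so a higher N50 generally means the sequence sits in fewer, longer pieces. Here is a minimal Python sketch of that calculation. The function name and the scaffold lengths are made up for illustration and are not GRC data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (in Mb) for one chromosome, before and
# after gaps are closed. Illustrative values only, not real GRC figures.
before = [40, 30, 20, 10]
after = [70, 20, 10]

print(n50(before), n50(after))
```

In this toy case the total length is unchanged, but merging the two longest scaffolds raises the N50, which is the kind of shift Schneider described.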
Annotation results for the new reference have also come back, and early statistics show that GRCh38 has more genes and protein-coding transcripts, and fewer genes with partial coding sequences, than its predecessor.
Adding centromere sequence was a major change, since these highly repetitive regions had previously been represented by 3 Mb gaps in the genome. The challenge has been accurately assembling these regions, Schneider noted. A collaboration with scientists in Jim Kent’s lab led to a new process for assembling these regions and allowed for their inclusion in build 38. Also, use of a haploid human sequence from a hydatidiform mole in collaboration with Evan Eichler has enabled scientists to resolve regions that were previously not represented properly, she added.
We were particularly eager for Schneider’s presentation because, like the good folks at the Genome Reference Consortium, we have spent a lot of time lately thinking about ways to improve human genome assembly. Our release this week of a 54x de novo human dataset represents our attempt to add to the existing knowledge and continue to build an important resource for the community to use.
On Thursday evening, two speakers presented SMRT® Sequencing data in the genomic technologies applications session. Ulf Gyllensten, a professor in the Department of Immunology, Genetics and Pathology at Uppsala University, presented data on microbial screening in a hospital setting. His team recently started using the PacBio RS II to examine patients with unknown bacterial infections that are resistant to multiple therapies and to study infections spreading through the hospital. With SMRT Sequencing, Gyllensten is able to deliver results to clinicians in just three days, faster than other NGS methods he has tried. He told AGBT attendees that good libraries will usually yield a very complete genome, showing as an example an organism with a 70% AT genome that assembled into a single contig representing the full genome. His team is also working to use SMRT Sequencing for HPV screening tests to help prevent cervical cancer. In a study of DNA samples taken from rural villages in Laos, Gyllensten and his colleagues used PacBio sequencing to identify more than twice as many HPV types, including novel strains, as could be identified with the highest-quality genotyping tools. See video here.
Later, Jason Ladner from the Center for Genomic Sciences at the U.S. Army Medical Research Institute of Infectious Diseases reported on studies of viral populations using SMRT Sequencing. The platform’s long reads and lack of amplification are useful in studying viruses, he told attendees as he presented data on reconstructing the frequency of viral haplotypes in mixed populations. SMRT Sequencing allows him to accurately distinguish between closely related haplotypes, even ones that diverge from each other by less than 1%, he added. Ladner also showed data from direct sequencing of viral populations, noting that one or two SMRT Cells provide enough coverage for either resequencing or de novo assembly in this process.
We’ll be back with highlights of days 3 and 4 from AGBT soon!