Dear editor, The single-molecule real-time (SMRT) sequencing platform presented by Pacific Biosciences (PacBio) is regarded as a third-generation sequencing technology (Eid et al., 2009, Roberts et al., 2013). PacBio delivers long reads from several to tens of kilobases (kbs), which are ideal for filling unsequenced gaps due to unusual sequence contexts, such as high-GC content or repeat-rich regions (Bashir et al., 2012, Berlin et al., 2015, Chaisson et al., 2015). PacBio long reads are also favorable for detecting large DNA fragments harboring structural variations (SVs), such as inversions, translocations, duplications, and large insertions/deletions (indels) (Ritz et al., 2010, English et al., 2014). However, one drawback of PacBio is the high error rate of base calling for single pass coverage of the genome (Au et al., 2012, Koren et al., 2012). This drawback can be mitigated by increasing sequencing coverage to achieve high consensus accuracy, but the requirements may be prohibitive for the de novo assembly of large- or medium-size genomes using only PacBio when considering both budgetary and computational costs. Alternatively, PacBio may be used for assembly improvement of near-finished reference genomes, especially for filling gaps in which unsequenced bases are represented by the letter N (English et al., 2012). Here, we combined PacBio (~15x) with Illumina reads (~40x) to improve the genome assemblies of African wild (Oryza barthii) and cultivated rice (O. glaberrima), and to infer large SVs between O. barthii and O. glaberrima.
Journal: Molecular plant