Assembling a plant genome is challenging due to the abundance of repetitive sequences, yet no standard is available to evaluate the assembly of repeat space. LTR retrotransposons (LTR-RTs) are the predominant interspersed repeat that is poorly assembled in draft genomes. Here, we propose a reference-free genome metric called LTR Assembly Index (LAI) that evaluates assembly continuity using LTR-RTs. After correcting for LTR-RT amplification dynamics, we show that LAI is independent of genome size, genomic LTR-RT content, and gene space evaluation metrics (i.e., BUSCO and CEGMA). By comparing genomic sequences produced by various sequencing techniques, we reveal the significant gain of assembly continuity by using long-read-based techniques over short-read-based methods. Moreover, LAI can facilitate iterative assembly improvement with assembler selection and identify low-quality genomic regions. To apply LAI, intact LTR-RTs and total LTR-RTs should contribute at least 0.1% and 5% to the genome size, respectively. The LAI program is freely available on GitHub: https://github.com/oushujun/LTR_retriever.
Journal: Nucleic acids research