April 21, 2020

Long-read sequence and assembly of segmental duplications.

Authors: Vollger, Mitchell R and Dishuck, Philip C and Sorensen, Melanie and Welch, AnneMarie E and Dang, Vy and Dougherty, Max L and Graves-Lindsay, Tina A and Wilson, Richard K and Chaisson, Mark J P and Eichler, Evan E

We have developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication Assembler (SDA) constructs graphs in which paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33-79 megabase pairs (Mb) of duplications in which approximately half of the loci are diverged (<99.8%) compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy-number-variant genetic diversity at the base-pair level.
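
The graph idea in the abstract can be illustrated with a toy sketch. It is not the SDA implementation: the function names, the read encoding (a mapping from variant ID to whether the read carries it), the scoring rule, and the greedy clustering step are all assumptions made for illustration. Paralogous sequence variants (PSVs) are nodes, reads that carry two PSVs together add attraction, reads that carry one but not the other add repulsion, and reads are then assigned to the PSV cluster they share the most variants with.

```python
from collections import defaultdict
from itertools import combinations


def build_psv_graph(reads):
    """Score PSV pairs: +1 per read carrying both (attraction),
    -1 per read carrying exactly one of the two (repulsion).
    Hypothetical scoring rule, chosen for illustration only."""
    score = defaultdict(int)
    for calls in reads.values():
        for a, b in combinations(sorted(calls), 2):
            if calls[a] and calls[b]:
                score[(a, b)] += 1
            elif calls[a] != calls[b]:
                score[(a, b)] -= 1
    return dict(score)


def cluster_psvs(psvs, score):
    """Greedy partition (an illustrative stand-in, not the published method):
    add each PSV to the cluster with the highest positive total edge score,
    or open a new cluster when no cluster attracts it."""
    clusters = []
    for p in sorted(psvs):
        best, best_gain = None, 0
        for c in clusters:
            gain = sum(score.get(tuple(sorted((p, q))), 0) for q in c)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            clusters.append({p})
        else:
            best.add(p)
    return clusters


def assign_reads(reads, clusters):
    """Assign each read to the cluster sharing the most PSVs it carries."""
    groups = defaultdict(list)
    for rid, calls in reads.items():
        carried = {p for p, has in calls.items() if has}
        overlaps = [len(carried & c) for c in clusters]
        if overlaps and max(overlaps) > 0:
            groups[overlaps.index(max(overlaps))].append(rid)
    return dict(groups)


# Made-up toy data: True = read carries the PSV, False = read covers the
# site but lacks the variant. Two paralogs are implied (p1/p2 and p3/p4).
reads = {
    "r1": {"p1": True, "p2": True, "p3": False, "p4": False},
    "r2": {"p1": True, "p2": True, "p3": False},
    "r3": {"p1": False, "p2": False, "p3": True, "p4": True},
    "r4": {"p2": False, "p3": True, "p4": True},
}
score = build_psv_graph(reads)
clusters = cluster_psvs({p for calls in reads.values() for p in calls}, score)
groups = assign_reads(reads, clusters)
print(clusters, groups)
```

In this sketch each resulting read group would then be assembled separately, which corresponds to the "partition and assembly" step the abstract describes; real data would additionally need PSV detection and handling of sequencing error, both omitted here.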

Journal: Nature Methods
DOI: 10.1038/s41592-018-0236-3
Year: 2019
