July 7, 2019

Dense and accurate whole-chromosome haplotyping of individual genomes.

The diploid nature of the human genome is neglected in many analyses done today, where a genome is perceived as a set of unphased variants with respect to a reference genome. This lack of haplotype-level analyses can be explained by a lack of methods that can produce dense and accurate chromosome-length haplotypes at reasonable costs. Here we introduce an integrative phasing strategy that combines global, but sparse haplotypes obtained from strand-specific single-cell sequencing (Strand-seq) with dense, yet local, haplotype information available through long-read or linked-read sequencing. We provide comprehensive guidance on the required sequencing depths and reliably assign more than 95% of alleles (NA12878) to their parental haplotypes using as few as 10 Strand-seq libraries in combination with 10-fold coverage PacBio data or, alternatively, 10X Genomics linked-read sequencing data. We conclude that the combination of Strand-seq with different technologies represents an attractive solution to chart the genetic variation of diploid genomes.


July 7, 2019

Variant review with the Integrative Genomics Viewer.

Manual review of aligned reads for confirmation and interpretation of variant calls is an important step in many variant calling pipelines for next-generation sequencing (NGS) data. Visual inspection can greatly increase the confidence in calls, reduce the risk of false positives, and help characterize complex events. The Integrative Genomics Viewer (IGV) was one of the first tools to provide NGS data visualization, and it currently provides a rich set of tools for inspection, validation, and interpretation of NGS datasets, as well as other types of genomic data. Here, we present a short overview of IGV’s variant review features for both single-nucleotide variants and structural variants, with examples from both cancer and germline datasets. IGV is freely available at https://www.igv.org. Cancer Res; 77(21); e31-34. ©2017 American Association for Cancer Research.


July 7, 2019

Tools for annotation and comparison of structural variation.

The impact of structural variants (SVs) on a variety of organisms and diseases like cancer has become increasingly evident. Methods for SV detection when studying genomic differences across cells, individuals or populations are being actively developed. Currently, just a few methods are available to compare different SV callsets, and no specialized methods are available to annotate SVs that account for the unique characteristics of these variant types. Here, we introduce SURVIVOR_ant, a tool that compares types and breakpoints for candidate SVs from different callsets and enables fast comparison of SVs to genomic features such as genes and repetitive regions, as well as to previously established SV datasets such as those from the 1000 Genomes Project. As proof of concept we compared 16 SV callsets generated by different SV calling methods on a single genome, the Genome in a Bottle sample HG002 (Ashkenazi son), and annotated the SVs with gene annotations, 1000 Genomes Project SV calls, and four different types of repetitive regions. Computation time to annotate 134,528 SVs with 33,954 annotations was 22 seconds on a laptop.
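The idea of comparing SVs by type and breakpoints can be illustrated with a minimal sketch. This is not SURVIVOR_ant's actual algorithm or interface: the `SV` record, the `same_event` and `shared_calls` helpers, and the 1,000 bp breakpoint tolerance are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record: chromosome, breakpoints, and type."""
    chrom: str
    start: int
    end: int
    svtype: str


def same_event(a, b, max_dist=1000):
    """Treat two calls as the same event if they share chromosome and type
    and both breakpoints lie within max_dist bp (an assumed tolerance)."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= max_dist
        and abs(a.end - b.end) <= max_dist
    )


def shared_calls(callset_a, callset_b, max_dist=1000):
    """Return the calls in callset_a that have a match in callset_b."""
    return [
        a for a in callset_a
        if any(same_event(a, b, max_dist) for b in callset_b)
    ]


caller_1 = [SV("chr1", 10_000, 15_000, "DEL"), SV("chr2", 50_000, 52_000, "DUP")]
caller_2 = [SV("chr1", 10_120, 14_900, "DEL"), SV("chr2", 50_000, 52_000, "DEL")]
print(shared_calls(caller_1, caller_2))
```

In this toy input only the chr1 deletion is reported as shared; the chr2 calls have identical coordinates but differ in type, so they are not merged. The pairwise scan is quadratic and would need an interval index for callsets of realistic size.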


July 7, 2019

Hunting structural variants: Population by population

Until recently, most population-scale genome sequencing studies have focused on identifying single nucleotide variants (SNVs) to explore genetic differences between individuals. Like so many SNV-based genome-wide association studies, however, these efforts have had difficulty identifying causative genetic mechanisms underlying most complex functions. More and more, the genomics community has realised that structural variation is likely responsible for many of the traits and phenotypes that scientists have not been able to attribute to SNVs. This class of variants, defined as genetic differences of 50 bp or larger, accounts for most of the DNA sequence differences between any two people. Structural variants (SVs) are also already known to cause many common and rare diseases including ALS, schizophrenia, leukemia, Carney complex, and Huntington’s disease. Despite the importance of SVs, these larger variants have been understudied and underreported compared to their single-nucleotide counterparts. One reason is that they remain difficult to detect. Their length often means they cannot be fully spanned using short sequencing reads. They also often occur in highly repetitive or GC-rich regions of the genome, making them challenging targets. As such, this class of human genetic variation has remained vastly under-explored in global populations and is now ripe for discovery.


July 7, 2019

Detection of complex structural variation from paired-end sequencing data

Detecting structural variants (SVs) from sequencing data is a key problem in genome analysis, but the full diversity of SVs is not captured by most methods. We introduce the Automated Reconstruction of Complex Structural Variants (ARC-SV) method, which detects a broad class of structural variants from paired-end whole genome sequencing (WGS) data. Analysis of samples from NA12878 and HuRef suggests that complex SVs are often misclassified by traditional methods. We validated our results both experimentally and by comparison to whole genome assembly and PacBio data; ARC-SV compares favorably to existing algorithms in general and gives state-of-the-art results on complex SV detection. By expanding the range of detectable SVs compared to commonly-used algorithms, ARC-SV allows additional information to be extracted from existing WGS data.


July 7, 2019

Copy number variation probes inform diverse applications

A major contributor to inter-individual genomic variability is copy number variation (CNV). CNVs change the diploid status of the DNA, involve one or multiple genes, and may disrupt coding regions, affect regulatory elements, or change gene dosage. While some of these changes may have no phenotypic consequences, others underlie disease, explain evolutionary processes, or impact the response to medication.


July 7, 2019

Comparative and population genomic landscape of Phellinus noxius: A hypervariable fungus causing root rot in trees.

The order Hymenochaetales of white rot fungi contains some of the most aggressive wood decayers causing tree deaths around the world. Despite their ecological importance and the impact of diseases they cause, little is known about the evolution and transmission patterns of these pathogens. Here, we sequenced and undertook comparative genomic analyses of Hymenochaetales genomes using brown root rot fungus Phellinus noxius, wood-decomposing fungus Phellinus lamaensis, laminated root rot fungus Phellinus sulphurascens and trunk pathogen Porodaedalea pini. Many gene families of lignin-degrading enzymes were identified from these fungi, reflecting their ability as white rot fungi. Comparing against distant fungi highlighted the expansion of 1,3-beta-glucan synthases in P. noxius, which may account for its fast-growing attribute. We identified 13 linkage groups conserved within Agaricomycetes, suggesting the evolution of stable karyotypes. We determined that P. noxius has a bipolar heterothallic mating system, with an unusual, highly expanded ~60 kb A locus as a result of accumulating gene transposition. We investigated the population genomics of 60 P. noxius isolates across multiple islands of the Asia Pacific region. Whole-genome sequencing showed this multinucleate species contains abundant poly-allelic single nucleotide polymorphisms with atypical allele frequencies. Different patterns of intra-isolate polymorphism reflect mono-/heterokaryotic states which are both prevalent in nature. We have shown two genetically separated lineages with one spanning across many islands despite the geographical barriers. Both populations possess extraordinary genetic diversity and show contrasting evolutionary scenarios. These results provide a framework to further investigate the genetic basis underlying the fitness and virulence of white rot fungi. © 2017 John Wiley & Sons Ltd.


July 7, 2019

Repetitive sequences in malaria parasite proteins.

Five species of parasite cause malaria in humans with the most severe disease caused by Plasmodium falciparum. Many of the proteins encoded in the P. falciparum genome are unusually enriched in repetitive low-complexity sequences containing a limited repertoire of amino acids. These repetitive sequences expand and contract dynamically and are among the most rapidly changing sequences in the genome. The simplest repetitive sequences consist of single amino acid repeats such as poly-asparagine tracts that are found in approximately 25% of P. falciparum proteins. More complex repeats of two or more amino acids are also common in diverse parasite protein families. There is no universal explanation for the occurrence of repetitive sequences and it is possible that many confer no function to the encoded protein and no selective advantage or disadvantage to the parasite. However, there are increasing numbers of examples where repetitive sequences are important for parasite protein function. We discuss the diverse roles of low-complexity repetitive sequences throughout the parasite life cycle, from mediating protein-protein interactions to enabling the parasite to evade the host immune system.© FEMS 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.


July 7, 2019

Genomic patterns of de novo mutation in simplex autism.

To further our understanding of the genetic etiology of autism, we generated and analyzed genome sequence data from 516 idiopathic autism families (2,064 individuals). This resource includes >59 million single-nucleotide variants (SNVs) and 9,212 private copy number variants (CNVs), of which 133,992 and 88 are de novo mutations (DNMs), respectively. We estimate a mutation rate of ~1.5 × 10⁻⁸ SNVs per site per generation with a significantly higher mutation rate in repetitive DNA. Comparing probands and unaffected siblings, we observe several DNM trends. Probands carry more gene-disruptive CNVs and SNVs, resulting in severe missense mutations and mapping to predicted fetal brain promoters and embryonic stem cell enhancers. These differences become more pronounced for autism genes (p = 1.8 × 10⁻³, OR = 2.2). Patients are more likely to carry multiple coding and noncoding DNMs in different genes, which are enriched for expression in striatal neurons (p = 3 × 10⁻³), suggesting a path forward for genetically characterizing more complex cases of autism. Copyright © 2017 Elsevier Inc. All rights reserved.


July 7, 2019

Contributions of Zea mays subspecies mexicana haplotypes to modern maize.

Maize was domesticated from lowland teosinte (Zea mays ssp. parviglumis), but the contribution of highland teosinte (Zea mays ssp. mexicana, hereafter mexicana) to modern maize is not clear. Here, two genomes for Mo17 (a modern maize inbred) and mexicana are assembled using a meta-assembly strategy after sequencing of 10 lines derived from a maize-teosinte cross. Comparative analyses reveal a high level of diversity between Mo17, B73, and mexicana, including three Mb-size structural rearrangements. The maize spontaneous mutation rate is estimated to be 2.17 × 10⁻⁸ to 3.87 × 10⁻⁸ per site per generation with a nonrandom distribution across the genome. A higher deleterious mutation rate is observed in the pericentromeric regions, and might be caused by differences in recombination frequency. Over 10% of the maize genome shows evidence of introgression from the mexicana genome, suggesting that mexicana contributed to maize adaptation and improvement. Our data offer a rich resource for constructing the pan-genome of Zea mays and genetic improvement of modern maize varieties.


July 7, 2019

Hidden genetic variation shapes the structure of functional elements in Drosophila.

Mutations that add, subtract, rearrange, or otherwise refashion genome structure often affect phenotypes, although the fragmented nature of most contemporary assemblies obscures them. To discover such mutations, we assembled the first new reference-quality genome of Drosophila melanogaster since its initial sequencing. By comparing this new genome to the existing D. melanogaster assembly, we created a structural variant map of unprecedented resolution and identified extensive genetic variation that has remained hidden until now. Many of these variants constitute candidates underlying phenotypic variation, including tandem duplications and a transposable element insertion that amplifies the expression of detoxification-related genes associated with nicotine resistance. The abundance of important genetic variation that still evades discovery highlights how crucial high-quality reference genomes are to deciphering phenotypes.


July 7, 2019

Scaffolding of long read assemblies using long range contact information.

Long read technologies have revolutionized de novo genome assembly by generating contigs orders of magnitude longer than those of short read assemblies. Although assembly contiguity has increased, it usually does not reconstruct a full chromosome or an arm of the chromosome, resulting in an unfinished chromosome level assembly. To increase the contiguity of the assembly to the chromosome level, different strategies are used which exploit long range contact information between chromosomes in the genome. We develop a scalable and computationally efficient scaffolding method that can boost the assembly contiguity to a large extent using genome-wide chromatin interaction data such as Hi-C. We demonstrate an algorithm that uses Hi-C data for longer-range scaffolding of de novo long read genome assemblies. We tested our methods on the human and goat genome assemblies. We compare our scaffolds with the scaffolds generated by LACHESIS based on various metrics. Our new algorithm SALSA produces more accurate scaffolds compared to the existing state of the art method LACHESIS.


July 7, 2019

SV2: Accurate structural variation genotyping and de novo mutation detection from whole genomes.

Structural variation (SV) detection from short-read whole genome sequencing is error prone, presenting significant challenges for population or family-based studies of disease. Here we describe SV2, a machine-learning algorithm for genotyping deletions and duplications from paired-end sequencing data. SV2 can rapidly integrate variant calls from multiple structural variant discovery algorithms into a unified call set with high genotyping accuracy and capability to detect de novo mutations. SV2 is freely available on GitHub (https://github.com/dantaki/SV2). Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com


July 7, 2019

Copy number variation and expression analysis reveals a nonorthologous pinta gene family member involved in butterfly vision.

Vertebrate (cellular retinaldehyde-binding protein) and Drosophila (prolonged depolarization afterpotential is not apparent [PINTA]) proteins with a CRAL-TRIO domain transport retinal-based chromophores that bind to opsin proteins and are necessary for phototransduction. The CRAL-TRIO domain gene family is composed of genes that encode proteins with a common N-terminal structural domain. Although there is an expansion of this gene family in Lepidoptera, there is no lepidopteran ortholog of pinta. Further, the function of these genes in lepidopterans has not yet been established. Here, we explored the molecular evolution and expression of CRAL-TRIO domain genes in the butterfly Heliconius melpomene in order to identify a member of this gene family as a candidate chromophore transporter. We generated and searched a four tissue transcriptome and searched a reference genome for CRAL-TRIO domain genes. We expanded an insect CRAL-TRIO domain gene phylogeny to include H. melpomene and used 18 genomes from 4 subspecies to assess copy number variation. A transcriptome-wide differential expression analysis comparing four tissue types identified a CRAL-TRIO domain gene, Hme CTD31, upregulated in heads suggesting a potential role in vision for this CRAL-TRIO domain gene. RT-PCR and immunohistochemistry confirmed that Hme CTD31 and its protein product are expressed in the retina, specifically in primary and secondary pigment cells and in tracheal cells. Sequencing of eye protein extracts that fluoresce in the ultraviolet identified Hme CTD31 as a possible chromophore binding protein. Although we found several recent duplications and numerous copy number variants in CRAL-TRIO domain genes, we identified a single copy pinta paralog that likely binds the chromophore in butterflies.© The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 7, 2019

Ultraaccurate genome sequencing and haplotyping of single human cells.

Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, which is dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10⁻⁸ and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs. Copyright © 2017 the Author(s). Published by PNAS.
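For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch computes it under the standard definition: the length L such that contigs of length L or longer together cover at least half of the total assembled length. The function name `n50` and the example lengths are illustrative assumptions, not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or haplotype block) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example: total length is 32, and the two longest contigs (8 + 8)
# already cover half of it, so the N50 is 8.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Comparing `2 * running` against `total` keeps the arithmetic in integers and avoids rounding issues when the total length is odd.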

