Menu
July 7, 2019  |  

Genomic patterns of de novo mutation in simplex autism.

To further our understanding of the genetic etiology of autism, we generated and analyzed genome sequence data from 516 idiopathic autism families (2,064 individuals). This resource includes >59 million single-nucleotide variants (SNVs) and 9,212 private copy number variants (CNVs), of which 133,992 and 88 are de novo mutations (DNMs), respectively. We estimate a mutation rate of ~1.5 × 10(-8) SNVs per site per generation with a significantly higher mutation rate in repetitive DNA. Comparing probands and unaffected siblings, we observe several DNM trends. Probands carry more gene-disruptive CNVs and SNVs, resulting in severe missense mutations and mapping to predicted fetal brain promoters and embryonic stem cell enhancers. These differences become more pronounced for autism genes (p = 1.8 × 10(-3), OR = 2.2). Patients are more likely to carry multiple coding and noncoding DNMs in different genes, which are enriched for expression in striatal neurons (p = 3 × 10(-3)), suggesting a path forward for genetically characterizing more complex cases of autism. Copyright © 2017 Elsevier Inc. All rights reserved.


July 7, 2019  |  

Chromosome evolution in the free-living flatworms: first evidence of intrachromosomal rearrangements in karyotype evolution of Macrostomum lignano (Platyhelminthes, Macrostomida).

The free-living flatworm Macrostomum lignano is a hidden tetraploid. Its genome was formed by a recent whole genome duplication followed by chromosome fusions. Its karyotype (2n = 8) consists of a pair of large chromosomes (MLI1), which contain regions of all other chromosomes, and three pairs of small metacentric chromosomes. Comparison of MLI1 with metacentrics was performed by painting with microdissected DNA probes and fluorescent in situ hybridization of unique DNA fragments. Regions of MLI1 homologous to small metacentrics appeared to be contiguous. Besides the loss of DNA repeat clusters (pericentromeric and telomeric repeats and the 5S rDNA cluster) from MLI1, the difference between small metacentrics MLI2 and MLI4 and regions homologous to them in MLI1 were revealed. Abnormal karyotypes found in the inbred DV1/10 subline were analyzed, and structurally rearranged chromosomes were described with the painting technique, suggesting the mechanism of their origin. The revealed chromosomal rearrangements generate additional diversity, opening the way toward massive loss of duplicated genes from a duplicated genome. Our findings suggest that the karyotype of M. lignano is in the early stage of genome diploidization after whole genome duplication, and further studies on M. lignano and closely related species can address many questions about karyotype evolution in animals.


July 7, 2019  |  

SV2: Accurate structural variation genotyping and de novo mutation detection from whole genomes.

Structural Variation (SV) detection from short-read whole genome sequencing is error prone, presenting significant challenges for population or family-based studies of disease.Here we describe SV2, a machine-learning algorithm for genotyping deletions and duplications from paired-end sequencing data. SV2 can rapidly integrate variant calls from multiple structural variant discovery algorithms into a unified call set with high genotyping accuracy and capability to detect de novo mutations. SV2 is freely available on GitHub (https://github.com/dantaki/SV2).Supplementary data are available at Bioinformatics online.© The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com


July 7, 2019  |  

Copy number variation and expression analysis reveals a nonorthologous pinta gene family member involved in butterfly vision.

Vertebrate (cellular retinaldehyde-binding protein) and Drosophila (prolonged depolarization afterpotential is not apparent [PINTA]) proteins with a CRAL-TRIO domain transport retinal-based chromophores that bind to opsin proteins and are necessary for phototransduction. The CRAL-TRIO domain gene family is composed of genes that encode proteins with a common N-terminal structural domain. Although there is an expansion of this gene family in Lepidoptera, there is no lepidopteran ortholog of pinta. Further, the function of these genes in lepidopterans has not yet been established. Here, we explored the molecular evolution and expression of CRAL-TRIO domain genes in the butterfly Heliconius melpomene in order to identify a member of this gene family as a candidate chromophore transporter. We generated and searched a four tissue transcriptome and searched a reference genome for CRAL-TRIO domain genes. We expanded an insect CRAL-TRIO domain gene phylogeny to include H. melpomene and used 18 genomes from 4 subspecies to assess copy number variation. A transcriptome-wide differential expression analysis comparing four tissue types identified a CRAL-TRIO domain gene, Hme CTD31, upregulated in heads suggesting a potential role in vision for this CRAL-TRIO domain gene. RT-PCR and immunohistochemistry confirmed that Hme CTD31 and its protein product are expressed in the retina, specifically in primary and secondary pigment cells and in tracheal cells. Sequencing of eye protein extracts that fluoresce in the ultraviolet identified Hme CTD31 as a possible chromophore binding protein. Although we found several recent duplications and numerous copy number variants in CRAL-TRIO domain genes, we identified a single copy pinta paralog that likely binds the chromophore in butterflies.© The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 7, 2019  |  

A 3-way hybrid approach to generate a new high-quality chimpanzee reference genome (Pan_tro_3.0).

The chimpanzee is arguably the most important species for the study of human origins. A key resource for these studies is a high-quality reference genome assembly; however, as with most mammalian genomes, the current iteration of the chimpanzee reference genome assembly is highly fragmented. In the current iteration of the chimpanzee reference genome assembly (Pan_tro_2.1.4), the sequence is scattered across more then 183 000 contigs, incorporating more than 159 000 gaps, with a genome-wide contig N50 of 51 Kbp. In this work, we produce an extensive and diverse array of sequencing datasets to rapidly assemble a new chimpanzee reference that surpasses previous iterations in bases represented and organized in large scaffolds. To this end, we show substantial improvements over the current release of the chimpanzee genome (Pan_tro_2.1.4) by several metrics, such as increased contiguity by >750% and 300% on contigs and scaffolds, respectively, and closure of 77% of gaps in the Pan_tro_2.1.4 assembly gaps spanning >850 Kbp of the novel coding sequence based on RNASeq data. We further report more than 2700 genes that had putatively erroneous frame-shift predictions to human in Pan_tro_2.1.4 and show a substantial increase in the annotation of repetitive elements. We apply a simple 3-way hybrid approach to considerably improve the reference genome assembly for the chimpanzee, providing a valuable resource for the study of human origins. Furthermore, we produce extensive sequencing datasets that are all derived from the same cell line, generating a broad non-human benchmark dataset.© The Author 2017. Published by Oxford University Press.


July 7, 2019  |  

The state of whole-genome sequencing

Over the last decade, a technological paradigm shift has slashed the cost of DNA sequencing by over five orders of magnitude. Today, the cost of sequencing a human genome is a few thousand dollars, and it continues to fall. Here, we review the most cost-effective platforms for whole-genome sequencing (WGS) as well as emerging technologies that may displace or complement these. We also discuss the practical challenges of generating and analyzing WGS data, and how WGS has unlocked new strategies for discovering genes and variants underlying both rare and common human diseases.


July 7, 2019  |  

Timing, rates and spectra of human germline mutation.

Germline mutations are a driving force behind genome evolution and genetic disease. We investigated genome-wide mutation rates and spectra in multi-sibling families. The mutation rate increased with paternal age in all families, but the number of additional mutations per year differed by more than twofold between families. Meta-analysis of 6,570 mutations showed that germline methylation influences mutation rates. In contrast to somatic mutations, we found remarkable consistency in germline mutation spectra between the sexes and at different paternal ages. In parental germ line, 3.8% of mutations were mosaic, resulting in 1.3% of mutations being shared by siblings. The number of these shared mutations varied significantly between families. Our data suggest that the mutation rate per cell division is higher during both early embryogenesis and differentiation of primordial germ cells but is reduced substantially during post-pubertal spermatogenesis. These findings have important consequences for the recurrence risks of disorders caused by de novo mutations.


July 7, 2019  |  

Resolving complex structural genomic rearrangements using a randomized approach.

Complex chromosomal rearrangements are structural genomic alterations involving multiple instances of deletions, duplications, inversions, or translocations that co-occur either on the same chromosome or represent different overlapping events on homologous chromosomes. We present SVelter, an algorithm that identifies regions of the genome suspected to harbor a complex event and then resolves the structure by iteratively rearranging the local genome structure, in a randomized fashion, with each structure scored against characteristics of the observed sequencing data. SVelter is able to accurately reconstruct complex chromosomal rearrangements when compared to well-characterized genomes that have been deeply sequenced with both short and long reads.


July 7, 2019  |  

Comparative genomics of early-diverging mushroom-forming fungi provides insights into the origins of lignocellulose decay capabilities.

Evolution of lignocellulose decomposition was one of the most ecologically important innovations in fungi. White-rot fungi in the Agaricomycetes (mushrooms and relatives) are the most effective microorganisms in degrading both cellulose and lignin components of woody plant cell walls (PCW). However, the precise evolutionary origins of lignocellulose decomposition are poorly understood, largely because certain early-diverging clades of Agaricomycetes and its sister group, the Dacrymycetes, have yet to be sampled, or have been undersampled, in comparative genomic studies. Here, we present new genome sequences of ten saprotrophic fungi, including members of the Dacrymycetes and early-diverging clades of Agaricomycetes (Cantharellales, Sebacinales, Auriculariales, and Trechisporales), which we use to refine the origins and evolutionary history of the enzymatic toolkit of lignocellulose decomposition. We reconstructed the origin of ligninolytic enzymes, focusing on class II peroxidases (AA2), as well as enzymes that attack crystalline cellulose. Despite previous reports of white rot appearing as early as the Dacrymycetes, our results suggest that white-rot fungi evolved later in the Agaricomycetes, with the first class II peroxidases reconstructed in the ancestor of the Auriculariales and residual Agaricomycetes. The exemplars of the most ancient clades of Agaricomycetes that we sampled all lack class II peroxidases, and are thus concluded to use a combination of plesiomorphic and derived PCW degrading enzymes that predate the evolution of white rot.


July 7, 2019  |  

High quality maize centromere 10 sequence reveals evidence of frequent recombination events.

The ancestral centromeres of maize contain long stretches of the tandemly arranged CentC repeat. The abundance of tandem DNA repeats and centromeric retrotransposons (CR) has presented a significant challenge to completely assembling centromeres using traditional sequencing methods. Here, we report a nearly complete assembly of the 1.85 Mb maize centromere 10 from inbred B73 using PacBio technology and BACs from the reference genome project. The error rates estimated from overlapping BAC sequences are 7 × 10(-6) and 5 × 10(-5) for mismatches and indels, respectively. The number of gaps in the region covered by the reassembly was reduced from 140 in the reference genome to three. Three expressed genes are located between 92 and 477 kb from the inferred ancestral CentC cluster, which lies within the region of highest centromeric repeat density. The improved assembly increased the count of full-length CR from 5 to 55 and revealed a 22.7 kb segmental duplication that occurred approximately 121,000 years ago. Our analysis provides evidence of frequent recombination events in the form of partial retrotransposons, deletions within retrotransposons, chimeric retrotransposons, segmental duplications including higher order CentC repeats, a deleted CentC monomer, centromere-proximal inversions, and insertion of mitochondrial sequences. Double-strand DNA break (DSB) repair is the most plausible mechanism for these events and may be the major driver of centromere repeat evolution and diversity. In many cases examined here, DSB repair appears to be mediated by microhomology, suggesting that tandem repeats may have evolved to efficiently repair frequent DSBs in centromeres.


July 7, 2019  |  

Exploring structural variants in environmentally sensitive gene families.

Environmentally sensitive plant gene families like NBS-LRRs, receptor kinases, defensins and others, are known to be highly variable. However, most existing strategies for discovering and describing structural variation in complex gene families provide incomplete and imperfect results. The move to de novo genome assemblies for multiple accessions or individuals within a species is enabling more comprehensive and accurate insights about gene family variation. Earlier array-based genome hybridization and sequence-based read mapping methods were limited by their reliance on a reference genome and by misplacement of paralogous sequences. Variant discovery based on de novo genome assemblies overcome the problems arising from a reference genome and reduce sequence misplacement. As de novo genome sequencing moves to the use of longer reads, artifacts will be minimized, intact tandem gene clusters will be constructed accurately, and insights into rapid evolution will become feasible. Copyright © 2016 Elsevier Ltd. All rights reserved.


July 7, 2019  |  

Next-generation sequencing-based detection of germline L1-mediated transductions.

While active LINE-1 (L1) elements possess the ability to mobilize flanking sequences to different genomic loci through a process termed transduction influencing genomic content and structure, an approach for detecting polymorphic germline non-reference transductions in massively-parallel sequencing data has been lacking.Here we present the computational approach TIGER (Transduction Inference in GERmline genomes), enabling the discovery of non-reference L1-mediated transductions by combining L1 discovery with detection of unique insertion sequences and detailed characterization of insertion sites. We employed TIGER to characterize polymorphic transductions in fifteen genomes from non-human primate species (chimpanzee, orangutan and rhesus macaque), as well as in a human genome. We achieved high accuracy as confirmed by PCR and two single molecule DNA sequencing techniques, and uncovered differences in relative rates of transduction between primate species.By enabling detection of polymorphic transductions, TIGER makes this form of relevant structural variation amenable for population and personal genome analysis.


July 7, 2019  |  

Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences.

Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore technologies (ONT) produce reads over 10?kb in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10-15%. Complex and computationally intensive pipelines are required to assemble such reads.We present a new mapper, minimap and a de novo assembler, miniasm, for efficiently mapping and assembling SMRT and ONT reads without an error correction stage. They can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold Caenorhabditis elegans data in 9?min, orders of magnitude faster than the existing pipelines, though the consensus sequence error rate is as high as raw reads. We also introduce a pairwise read mapping format and a graphical fragment assembly format, and demonstrate the interoperability between ours and current tools.https://github.com/lh3/minimap and https://github.com/lh3/miniasmhengli@broadinstitute.orgSupplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.


July 7, 2019  |  

Transposons passively and actively contribute to evolution of the two-speed genome of a fungal pathogen.

Genomic plasticity enables adaptation to changing environments, which is especially relevant for pathogens that engage in “arms races” with their hosts. In many pathogens, genes mediating virulence cluster in highly variable, transposon-rich, physically distinct genomic compartments. However, understanding of the evolution of these compartments, and the role of transposons therein, remains limited. Here, we show that transposons are the major driving force for adaptive genome evolution in the fungal plant pathogen Verticillium dahliae We show that highly variable lineage-specific (LS) regions evolved by genomic rearrangements that are mediated by erroneous double-strand repair, often utilizing transposons. We furthermore show that recent genetic duplications are enhanced in LS regions, against an older episode of duplication events. Finally, LS regions are enriched in active transposons, which contribute to local genome plasticity. Thus, we provide evidence for genome shaping by transposons, both in an active and passive manner, which impacts the evolution of pathogen virulence. © 2016 Faino et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019  |  

Strategies for sequence assembly of plant genomes

The field of plant genome assembly has greatly benefited from the development and widespread adoption of next-generation DNA sequencing platforms. Very high sequencing throughputs and low costs per nucleotide have considerably reduced the technical and budgetary constraints associated with early assembly projects done primarily with a traditional Sanger-based approach. Those improvements led to a sharp increase in the number of plant genomes being sequenced, including large and complex genomes of economically important crops. Although next-generation DNA sequencing has considerably improved our understanding of the overall structure and dynamics of many plant genomes, severe limitations still remain because next-generation DNA sequencing reads typically are shorter than Sanger reads. In addition, the software tools used to de novo assemble sequences are not necessarily designed to optimize the use of short reads. These cause challenges, common to many plant species with large genome sizes, high repeat contents, polyploidy and genome-wide duplications. This chapter provides an overview of historical and current methods used to sequence and assemble plant genomes, along with new solutions offered by the emergence of technologies such as single molecule sequencing and optical mapping to address the limitations of current sequence assemblies.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.