July 7, 2019

Read-based phasing of related individuals.

Read-based phasing deduces the haplotypes of an individual from sequencing reads that cover multiple variants, while genetic phasing takes only genotypes as input and applies the rules of Mendelian inheritance to infer haplotypes within a pedigree of individuals. Combining both into an approach that uses these two independent sources of information (reads and pedigree) has the potential to deliver results better than either alone. We provide a theoretical framework combining read-based phasing with genetic haplotyping, and describe a fixed-parameter algorithm and its implementation for finding an optimal solution. We show that leveraging reads of related individuals jointly in this way yields more phased variants, at higher accuracy, than phasing each individual separately, both in simulated and real data. Coverages as low as 2× for each member of a trio yield haplotypes that are as accurate as when analyzed separately at 15× coverage per individual. Software: https://bitbucket.org/whatshap/whatshap (contact: t.marschall@mpi-inf.mpg.de). © The Author 2016. Published by Oxford University Press.
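To make the idea of read-based phasing concrete, here is a toy sketch for a single individual. It assumes a minimum-error-correction style objective (count the allele observations that disagree with the chosen haplotype pair), which the abstract does not spell out, and it uses brute-force enumeration rather than the fixed-parameter algorithm the authors describe. It ignores pedigree information entirely. The names `mec_cost` and `phase_brute_force` are hypothetical.

```python
from itertools import product

def mec_cost(reads, haplotype):
    """Count allele observations that conflict with the haplotype pair.

    Each read maps a variant index to an observed allele (0 or 1). All
    variants are assumed heterozygous, so the second haplotype is the
    complement of the first; a read is charged against whichever of the
    two it fits better.
    """
    complement = [1 - allele for allele in haplotype]
    total = 0
    for read in reads:
        d_hap = sum(1 for pos, allele in read.items() if haplotype[pos] != allele)
        d_comp = sum(1 for pos, allele in read.items() if complement[pos] != allele)
        total += min(d_hap, d_comp)
    return total

def phase_brute_force(reads, n_variants):
    """Enumerate haplotypes with the first allele fixed to 0 (this removes
    the mirror-image solution) and return the first one with minimal cost."""
    best = None
    for tail in product((0, 1), repeat=n_variants - 1):
        hap = (0,) + tail
        cost = mec_cost(reads, hap)
        if best is None or cost < best[1]:
            best = (hap, cost)
    return best

# Five error-free toy reads, each covering two neighbouring variants.
reads = [
    {0: 0, 1: 1},
    {1: 1, 2: 0},
    {0: 1, 1: 0},
    {1: 0, 2: 1},
    {2: 0, 3: 0},
]
haplotype, cost = phase_brute_force(reads, 4)
print(haplotype, cost)
```

Enumeration is exponential in the number of variants, so this only illustrates the objective; it is not a practical phasing method.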


July 7, 2019

Resolving complex structural genomic rearrangements using a randomized approach.

Complex chromosomal rearrangements are structural genomic alterations involving multiple instances of deletions, duplications, inversions, or translocations that co-occur either on the same chromosome or represent different overlapping events on homologous chromosomes. We present SVelter, an algorithm that identifies regions of the genome suspected to harbor a complex event and then resolves the structure by iteratively rearranging the local genome structure, in a randomized fashion, with each structure scored against characteristics of the observed sequencing data. SVelter is able to accurately reconstruct complex chromosomal rearrangements when compared to well-characterized genomes that have been deeply sequenced with both short and long reads.
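The iterate-and-score idea can be sketched as randomized hill climbing over candidate structures. Everything in this example is an illustrative assumption rather than SVelter's actual method: the block representation, the three move types, and the scoring rule (supported minus unsupported block adjacencies) are hypothetical stand-ins for the read-based scoring the abstract mentions.

```python
import random

def score(structure, observed):
    """Supported minus unsupported adjacencies between consecutive blocks."""
    pairs = list(zip(structure, structure[1:]))
    supported = sum(1 for pair in pairs if pair in observed)
    return supported - (len(pairs) - supported)

def random_move(structure, rng):
    """Return a copy with one random deletion, duplication, or inversion.

    A deletion drawn for a single-block structure falls through to an
    inversion, so the structure never becomes empty.
    """
    s = list(structure)
    i = rng.randrange(len(s))
    move = rng.choice(["delete", "duplicate", "invert"])
    if move == "delete" and len(s) > 1:
        del s[i]
    elif move == "duplicate":
        s.insert(i, s[i])
    else:
        name, strand = s[i]
        s[i] = (name, "-" if strand == "+" else "+")
    return s

def resolve(start, observed, steps=200, seed=0):
    """Keep a randomly proposed structure only if it scores strictly higher."""
    rng = random.Random(seed)
    best, best_score = list(start), score(start, observed)
    for _ in range(steps):
        candidate = random_move(best, rng)
        candidate_score = score(candidate, observed)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

# Reference order of four blocks, all on the forward strand.
reference = [("a", "+"), ("b", "+"), ("c", "+"), ("d", "+")]
# Toy evidence: block b is deleted and block c is inverted.
observed = {(("a", "+"), ("c", "-")), (("c", "-"), ("d", "+"))}

best, best_score = resolve(reference, observed)
print(best, best_score)
```

Because only improving moves are accepted, the search can stall in a local optimum; a fuller treatment would add restarts or accept some non-improving moves.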


July 7, 2019

Coevolution between Nuclear-encoded DNA replication, recombination, and repair genes and plastid genome complexity.

Disruption of DNA replication, recombination, and repair (DNA-RRR) systems has been hypothesized to cause highly elevated nucleotide substitution rates and genome rearrangements in the plastids of angiosperms, but this theory remains untested. To investigate nuclear-plastid genome (plastome) coevolution in Geraniaceae, four different measures of plastome complexity (rearrangements, repeats, nucleotide insertions/deletions, and substitution rates) were evaluated along with substitution rates of 12 nuclear-encoded, plastid-targeted DNA-RRR genes from 27 Geraniales species. Significant correlations were detected for nonsynonymous (dN) but not synonymous (dS) substitution rates for three DNA-RRR genes (uvrB/C, why1, and gyrA), supporting a role for these genes in accelerated plastid genome evolution in Geraniaceae. Furthermore, correlation between dN of uvrB/C and plastome complexity suggests the presence of a nucleotide excision repair system in plastids. Significant correlations were also detected between plastome complexity and 13 of the 90 nuclear-encoded organelle-targeted genes investigated. Comparisons revealed significant acceleration of dN in plastid-targeted genes of Geraniales relative to Brassicales, suggesting this correlation may be an artifact of elevated rates in this gene set in Geraniaceae. Correlation between dN of plastid-targeted DNA-RRR genes and plastome complexity supports the hypothesis that the aberrant patterns in angiosperm plastome evolution could be caused by dysfunction in DNA-RRR systems. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 7, 2019

Large-scale mitogenomics enables insights into Schizophora (Diptera) radiation and population diversity.

True flies are insects of the order Diptera and encompass one of the most diverse groups of animals on Earth. Within dipterans, Schizophora represents a recent radiation of insects that was used as a model to develop a pipeline for generating complete mitogenomes using various sequencing platforms and strategies. 91 mitogenomes from 32 different species were sequenced and assembled with high fidelity, using amplicon, whole genome shotgun or single molecule sequencing approaches. Based on the novel mitogenomes, we estimate the origin of Schizophora within the Cretaceous-Paleogene (K-Pg) boundary, about 68.3 Ma. Detailed analyses of the blowfly family (Calliphoridae) place its origin at 22 Ma, concomitant with the radiation of grazing mammals. The emergence of ectoparasitism within calliphorids was dated 6.95 Ma for the screwworm fly and 2.3 Ma for the Australian sheep blowfly. Varying population histories were observed for the blowfly Chrysomya megacephala and the housefly Musca domestica samples in our dataset. Whereas blowflies (n = 50) appear to have undergone selective sweeps and/or severe bottlenecks in the New World, houseflies (n = 14) display variation among populations from different zoogeographical zones and low levels of gene flow. The reported high-throughput mitogenomics approach for insects enables new insights into schizophoran diversity and population history of flies.


July 7, 2019

Indica rice genome assembly, annotation and mining of blast disease resistance genes.

Rice is a major staple food crop in the world. Over 80% of the rice cultivation area is under indica rice. Currently, genomic resources are lacking for indica as compared to japonica rice. In this study, we generated deep-sequencing data (Illumina and Pacific Biosciences sequencing) for one of the indica rice cultivars, HR-12 from India. We assembled over 86% (389 Mb) of the rice genome and annotated 56,284 protein-coding genes from the HR-12 genome using Illumina and PacBio sequencing. Comprehensive comparative analyses between indica and japonica subspecies genomes revealed a large number of indica-specific variants including SSRs, SNPs and InDels. To mine disease resistance genes, we sequenced a few indica rice cultivars that are reported to be highly resistant (Tetep and Tadukan) and susceptible (HR-12 and Co-39) against blast fungal isolates in many countries including India. Whole genome sequencing of rice genotypes revealed a high rate of mutations in defense-related genes (NB-ARC, LRR and PK domains) in resistant cultivars as compared to susceptible ones. This study has identified the R-genes Pi-ta and Pi54 from the durably resistant indica cultivars Tetep and Tadukan, which can be used in marker-assisted selection in rice breeding programs. This is the first report of a whole genome sequencing approach to characterize Indian rice germplasm. The genomic resources from our work will have a greater impact in understanding global rice diversity, genetics and molecular breeding.


July 7, 2019

BAC-pool sequencing and assembly of 19 Mb of the complex sugarcane genome.

Sequencing plant genomes is often challenging because of their complex architecture and high content of repetitive sequences. Sugarcane has one of the most complex genomes. It is highly polyploid, preserves intact homeologous chromosomes from its parental species and contains >55% repetitive sequences. Although bacterial artificial chromosome (BAC) libraries have emerged as an alternative for accessing the sugarcane genome, sequencing individual clones is laborious and expensive. Here, we present a strategy for sequencing and assembling reads produced from the DNA of pooled BAC clones. A set of 178 BAC clones, randomly sampled from the SP80-3280 sugarcane BAC library, was pooled and sequenced using the Illumina HiSeq2000 and PacBio platforms. A hybrid assembly strategy was used to generate 2,451 scaffolds comprising 19.2 Mb of assembled genome sequence. Scaffolds of ≥20 kb corresponded to 80% of the assembled sequences, and the full sequences of forty BACs were recovered in one or two contigs. Alignment of the BAC scaffolds with the chromosome sequences of sorghum showed a high degree of collinearity and gene order. The alignment of the BAC scaffolds to the 10 sorghum chromosomes suggests that the genome of the SP80-3280 sugarcane variety is ~19% contracted in relation to the sorghum genome. In conclusion, our data show that sequencing pools composed of high numbers of BAC clones may help to construct a reference scaffold map of the sugarcane genome.


July 7, 2019

No evidence for extensive horizontal gene transfer in the genome of the tardigrade Hypsibius dujardini.

Tardigrades are meiofaunal ecdysozoans that are key to understanding the origins of Arthropoda. Many species of Tardigrada can survive extreme conditions through cryptobiosis. In a recent paper [Boothby TC, et al. (2015) Proc Natl Acad Sci USA 112(52):15976-15981], the authors concluded that the tardigrade Hypsibius dujardini had an unprecedented proportion (17%) of genes originating through functional horizontal gene transfer (fHGT) and speculated that fHGT was likely formative in the evolution of cryptobiosis. We independently sequenced the genome of H. dujardini. As expected from whole-organism DNA sampling, our raw data contained reads from nontarget genomes. Filtering using metagenomics approaches generated a draft H. dujardini genome assembly of 135 Mb with superior assembly metrics to the previously published assembly. Additional microbial contamination likely remains. We found no support for extensive fHGT. Among 23,021 gene predictions we identified 0.2% strong candidates for fHGT from bacteria and 0.2% strong candidates for fHGT from nonmetazoan eukaryotes. Cross-comparison of assemblies showed that the overwhelming majority of HGT candidates in the Boothby et al. genome derived from contaminants. We conclude that fHGT into H. dujardini accounts for at most 1-2% of genes and that the proposal that one-sixth of tardigrade genes originate from functional HGT events is an artifact of undetected contamination.


July 7, 2019

A time- and cost-effective strategy to sequence mammalian Y Chromosomes: an application to the de novo assembly of gorilla Y.

The mammalian Y Chromosome sequence, critical for studying male fertility and dispersal, is enriched in repeats and palindromes, and thus, is the most difficult component of the genome to assemble. Previously, expensive and labor-intensive BAC-based techniques were used to sequence the Y for a handful of mammalian species. Here, we present a much faster and more affordable strategy for sequencing and assembling mammalian Y Chromosomes of sufficient quality for most comparative genomics analyses and for conservation genetics applications. The strategy combines flow sorting, short- and long-read genome and transcriptome sequencing, and droplet digital PCR with novel and existing computational methods. It can be used to reconstruct sex chromosomes in a heterogametic sex of any species. We applied our strategy to produce a draft of the gorilla Y sequence. The resulting assembly allowed us to refine gene content, evaluate copy number of ampliconic gene families, locate species-specific palindromes, examine the repetitive element content, and produce sequence alignments with human and chimpanzee Y Chromosomes. Our results inform the evolution of the hominine (human, chimpanzee, and gorilla) Y Chromosomes. Surprisingly, we found the gorilla Y Chromosome to be similar to the human Y Chromosome, but not to the chimpanzee Y Chromosome. Moreover, we have utilized the assembled gorilla Y Chromosome sequence to design genetic markers for studying the male-specific dispersal of this endangered species. © 2016 Tomaszkiewicz et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Third-generation sequencing and the future of genomics

Third-generation long-range DNA sequencing and mapping technologies are creating a renaissance in high-quality genome sequencing. Unlike second-generation sequencing, which produces short reads a few hundred base-pairs long, third-generation single-molecule technologies generate over 10,000 bp reads or map over 100,000 bp molecules. We analyze how increased read lengths can be used to address long-standing problems in de novo genome assembly, structural variation analysis and haplotype phasing.


July 7, 2019

Exploring structural variants in environmentally sensitive gene families.

Environmentally sensitive plant gene families, such as NBS-LRRs, receptor kinases, defensins and others, are known to be highly variable. However, most existing strategies for discovering and describing structural variation in complex gene families provide incomplete and imperfect results. The move to de novo genome assemblies for multiple accessions or individuals within a species is enabling more comprehensive and accurate insights about gene family variation. Earlier array-based genome hybridization and sequence-based read mapping methods were limited by their reliance on a reference genome and by misplacement of paralogous sequences. Variant discovery based on de novo genome assemblies overcomes the problems arising from a reference genome and reduces sequence misplacement. As de novo genome sequencing moves to the use of longer reads, artifacts will be minimized, intact tandem gene clusters will be constructed accurately, and insights into rapid evolution will become feasible. Copyright © 2016 Elsevier Ltd. All rights reserved.


July 7, 2019

The Atlantic salmon genome provides insights into rediploidization.

The whole-genome duplication 80 million years ago of the common ancestor of salmonids (salmonid-specific fourth vertebrate whole-genome duplication, Ss4R) provides unique opportunities to learn about the evolutionary fate of a duplicated vertebrate genome in 70 extant lineages. Here we present a high-quality genome assembly for Atlantic salmon (Salmo salar), and show that large genomic reorganizations, coinciding with bursts of transposon-mediated repeat expansions, were crucial for the post-Ss4R rediploidization process. Comparisons of duplicate gene expression patterns across a wide range of tissues with orthologous genes from a pre-Ss4R outgroup unexpectedly demonstrate far more instances of neofunctionalization than subfunctionalization. Surprisingly, we find that genes that were retained as duplicates after the teleost-specific whole-genome duplication 320 million years ago were not more likely to be retained after the Ss4R, and that the duplicate retention was not influenced to a great extent by the nature of the predicted protein interactions of the gene products. Finally, we demonstrate that the Atlantic salmon assembly can serve as a reference sequence for the study of other salmonids for a range of purposes.


July 7, 2019

Single-molecule sequencing assists genome assembly improvement and structural variation inference.

Dear editor, The single-molecule real-time (SMRT) sequencing platform presented by Pacific Biosciences (PacBio) is regarded as a third-generation sequencing technology (Eid et al., 2009, Roberts et al., 2013). PacBio delivers long reads from several to tens of kilobases (kbs), which are ideal for filling unsequenced gaps due to unusual sequence contexts, such as high-GC content or repeat-rich regions (Bashir et al., 2012, Berlin et al., 2015, Chaisson et al., 2015). PacBio long reads are also favorable for detecting large DNA fragments harboring structural variations (SVs), such as inversions, translocations, duplications, and large insertions/deletions (indels) (Ritz et al., 2010, English et al., 2014). However, one drawback of PacBio is the high error rate of base calling for single pass coverage of the genome (Au et al., 2012, Koren et al., 2012). This drawback can be mitigated by increasing sequencing coverage to achieve high consensus accuracy, but the requirements may be prohibitive for the de novo assembly of large- or medium-size genomes using only PacBio when considering both budgetary and computational costs. Alternatively, PacBio may be used for assembly improvement of near-finished reference genomes, especially for filling gaps in which unsequenced bases are represented by the letter N (English et al., 2012). Here, we combined PacBio (~15x) with Illumina reads (~40x) to improve the genome assemblies of African wild (Oryza barthii) and cultivated rice (O. glaberrima), and to infer large SVs between O. barthii and O. glaberrima.


July 7, 2019

Evolutionary redesign of the Atlantic cod (Gadus morhua L.) Toll-like receptor repertoire by gene losses and expansions.

Genome sequencing of the teleost Atlantic cod demonstrated loss of the Major Histocompatibility Complex (MHC) class II, an extreme gene expansion of MHC class I and gene expansions and losses in the innate pattern recognition receptor (PRR) family of Toll-like receptors (TLR). In a comparative genomic setting, using an improved version of the genome, we characterize PRRs in Atlantic cod with emphasis on TLRs, demonstrating the loss of TLR1/6, TLR2 and TLR5 and expansion of TLR7, TLR8, TLR9, TLR22 and TLR25. We find that Atlantic cod TLR expansions are strongly influenced by diversifying selection likely to increase the detectable ligand repertoire through neo- and subfunctionalization. Using RNAseq we find that Atlantic cod TLRs display likely tissue or developmental stage-specific expression patterns. In a broader perspective, a comprehensive vertebrate TLR phylogeny reveals that the Atlantic cod TLR repertoire is extreme with regard to losses and expansions compared to other teleosts. In addition, we identify a substantial shift in TLR repertoires following the evolutionary transition from an aquatic vertebrate (fish) to a terrestrial (tetrapod) lifestyle. Collectively, our findings provide new insight into the function and evolution of TLRs in Atlantic cod as well as the evolutionary history of vertebrate innate immunity.


July 7, 2019

Haemonchus contortus: genome structure, organization and comparative genomics

One of the first genome sequencing projects for a parasitic nematode was that for Haemonchus contortus. The open access data from the Wellcome Trust Sanger Institute provided a valuable early resource for the research community, particularly for the identification of specific genes and genetic markers. Later, a second sequencing project was initiated by the University of Melbourne, and the two draft genome sequences for H. contortus were published back-to-back in 2013. There is a pressing need for long-range genomic information for genetic mapping, population genetics and functional genomic studies, so we are continuing to improve the Wellcome Trust Sanger Institute assembly to provide a finished reference genome for H. contortus. This review describes this process, compares the H. contortus genome assemblies with draft genomes from other members of the strongylid group and discusses future directions for parasite genomics using the H. contortus model. Copyright © 2016 Elsevier Ltd. All rights reserved.

