September 21, 2019

in silico Whole Genome Sequencer & Analyzer (iWGS): a computational pipeline to guide the design and analysis of de novo genome sequencing studies.

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in non-model organisms. To reduce the costs and challenges of de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to give the user great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS. Copyright © 2016 Author et al.


September 21, 2019

Recent advances in bioinformatics for fish genomics

In the past few years, we have contributed to about one-fifth of the reported fish genomes. Based on this experience, here we outline recent advances in bioinformatics for fish genomics, with an emphasis on the development of software for genome assembly, genome annotation and evolutionary analysis. This review should be helpful for newcomers to genome analysis of both animals and plants.

In the past decade, whole genome sequences of approximately 50 fish species have been reported [1]. We were involved in about one-fifth of these international efforts from 2014 to 2017, including mudskippers (2014) [2], Chinese large yellow croaker [3], Chinese barbel fishes [4], Asian arowana [5,6], Channel catfish [7], seahorses [8], Japanese flounder [9], Chinese clearhead icefish [10] and Northern snakehead [11]. We are also in charge of the China Aquatic 10-100-1,000 Genomics Program [12], in which ~100 fish genomes are sequencing targets for the next 3 to 5 years. Based on our previous experience with fish genomic studies, here we outline recent advances in the related bioinformatics to share with general readers. Since the basic informatics comprises genome assembly, genome annotation and evolutionary analysis, we discuss them one by one in this order.


September 21, 2019

A distinct and genetically diverse lineage of the hybrid fungal pathogen Verticillium longisporum population causes stem striping in British oilseed rape.

Population genetic structures illustrate evolutionary trajectories of organisms adapting to differential environmental conditions. Verticillium stem striping disease on oilseed rape was mainly observed in continental Europe, but has recently emerged in the United Kingdom. The disease is caused by the hybrid fungal species Verticillium longisporum, which originates from at least three separate hybridization events, yet hybrids between Verticillium progenitor species A1 and D1 are mainly responsible for Verticillium stem striping. We reveal a hitherto undescribed dichotomy within V. longisporum lineage A1/D1 that correlates with the geographic distribution of the isolates, with an ‘A1/D1 West’ and an ‘A1/D1 East’ cluster. Genome comparison between representatives of the A1/D1 West and East clusters excluded population distinctiveness through separate hybridization events. Remarkably, the A1/D1 West population, which is genetically more diverse than the entire A1/D1 East cluster, caused the sudden emergence of Verticillium stem striping in the UK, whereas in continental Europe Verticillium stem striping is predominantly caused by the more genetically uniform A1/D1 East population. The observed genetic diversity of the A1/D1 West population argues against a recent introduction of the pathogen into the UK, and instead suggests that the pathogen was previously established in the UK and remained latent or unnoticed as an oilseed rape pathogen until recently. © 2017 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd.


September 21, 2019

PacBio assembly of a Plasmodium knowlesi genome sequence with Hi-C correction and manual annotation of the SICAvar gene family.

Plasmodium knowlesi has risen in importance as a zoonotic parasite that has been causing regular episodes of malaria throughout South East Asia. The P. knowlesi genome sequence generated in 2008 highlighted and confirmed many similarities and differences in Plasmodium species, including a global view of several multigene families, such as the large SICAvar multigene family encoding the variant antigens known as the schizont-infected cell agglutination proteins. However, repetitive DNA sequences are the bane of any genome project, and this and other Plasmodium genome projects have not been immune to the gaps, rearrangements and other pitfalls created by these genomic features. Today, long-read PacBio and chromatin conformation technologies are overcoming such obstacles. Here, based on the use of these technologies, we present a highly refined de novo P. knowlesi genome sequence of the Pk1(A+) clone. This sequence and annotation, referred to as the ‘MaHPIC Pk genome sequence’, includes manual annotation of the SICAvar gene family with 136 full-length members categorized as type I or II. This sequence provides a framework that will permit a better understanding of the SICAvar repertoire, selective pressures acting on this gene family and mechanisms of antigenic variation in this species and other pathogens.


September 21, 2019

Assessing genome assembly quality using the LTR Assembly Index (LAI).

Assembling a plant genome is challenging due to the abundance of repetitive sequences, yet no standard is available to evaluate the assembly of repeat space. LTR retrotransposons (LTR-RTs) are the predominant interspersed repeat that is poorly assembled in draft genomes. Here, we propose a reference-free genome metric called LTR Assembly Index (LAI) that evaluates assembly continuity using LTR-RTs. After correcting for LTR-RT amplification dynamics, we show that LAI is independent of genome size, genomic LTR-RT content, and gene space evaluation metrics (i.e., BUSCO and CEGMA). By comparing genomic sequences produced by various sequencing techniques, we reveal the significant gain of assembly continuity by using long-read-based techniques over short-read-based methods. Moreover, LAI can facilitate iterative assembly improvement with assembler selection and identify low-quality genomic regions. To apply LAI, intact LTR-RTs and total LTR-RTs should contribute at least 0.1% and 5% to the genome size, respectively. The LAI program is freely available on GitHub: https://github.com/oushujun/LTR_retriever.
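The applicability thresholds stated above (intact LTR-RTs at least 0.1% and total LTR-RTs at least 5% of genome size) can be checked with a few lines of Python. This is a minimal sketch, not part of LTR_retriever: the function name `lai_applicable` and its parameters are hypothetical, and the base-pair counts in the usage lines are made-up illustrative numbers, not measurements.

```python
def lai_applicable(genome_size_bp, intact_ltr_bp, total_ltr_bp,
                   min_intact_pct=0.1, min_total_pct=5.0):
    """Return True if an assembly meets the LTR-RT content thresholds
    quoted in the abstract (hypothetical helper, not the LAI program)."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    # Percentage of the genome covered by intact and by all LTR-RTs.
    intact_pct = 100.0 * intact_ltr_bp / genome_size_bp
    total_pct = 100.0 * total_ltr_bp / genome_size_bp
    return intact_pct >= min_intact_pct and total_pct >= min_total_pct


# Illustrative values only: 0.5% intact and 15% total LTR-RT content.
ok = lai_applicable(400_000_000, 2_000_000, 60_000_000)
# Illustrative values only: 0.05% intact content is below the 0.1% threshold.
too_few_intact = lai_applicable(100_000_000, 50_000, 10_000_000)
```

An assembly that fails this check would fall outside the range in which the abstract says LAI should be applied.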


September 21, 2019

Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.

We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
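The seed-selection idea described above (using the longest reads as seeds) can be illustrated with a toy Python sketch. It is not the HGAP implementation: the function name `select_seed_reads`, the `seed_coverage` parameter, and its default of 30x are assumptions made for illustration, and the later steps of recruiting reads and building the consensus are not shown.

```python
def select_seed_reads(read_lengths, genome_size, seed_coverage=30.0):
    """Pick the longest reads until their summed length reaches
    genome_size * seed_coverage (an assumed, illustrative target).

    Returns (length_cutoff, seed_lengths)."""
    target_bases = genome_size * seed_coverage
    seeds = []
    total = 0
    # Walk reads from longest to shortest, stopping once the target is met.
    for length in sorted(read_lengths, reverse=True):
        if total >= target_bases:
            break
        seeds.append(length)
        total += length
    # The shortest selected seed defines the length cutoff.
    cutoff = seeds[-1] if seeds else 0
    return cutoff, seeds


# Toy read lengths and a toy 1 kb "genome"; values are illustrative only.
lengths = [12000, 3000, 9000, 1500, 7000, 500, 11000]
cutoff, seeds = select_seed_reads(lengths, genome_size=1000, seed_coverage=30.0)
```

In this sketch, the reads shorter than the cutoff would be the ones mapped to the seeds to build the preassembled reads.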


September 21, 2019

Population sequencing reveals clonal diversity and ancestral inbreeding in the grapevine cultivar Chardonnay.

Chardonnay is the basis of some of the world’s most iconic wines and its success is underpinned by a historic program of clonal selection. There are numerous clones of Chardonnay available that exhibit differences in key viticultural and oenological traits that have arisen from the accumulation of somatic mutations during centuries of asexual propagation. However, the genetic variation that underlies these differences remains largely unknown. To address this knowledge gap, a high-quality, diploid-phased Chardonnay genome assembly was produced from single-molecule real-time sequencing, and combined with re-sequencing data from 15 different Chardonnay clones. There were 1620 markers identified that distinguish the 15 clones. These markers were reliably used for clonal identification of independently sourced genomic material, as well as in identifying a potential genetic basis for some clonal phenotypic differences. The predicted parentage of the Chardonnay haplomes was elucidated by mapping sequence data from the predicted parents of Chardonnay (Gouais blanc and Pinot noir) against the Chardonnay reference genome. This enabled the detection of instances of heterosis, with differentially expanded gene families being inherited from the parents of Chardonnay. Most surprisingly, however, the patterns of nucleotide variation present in the Chardonnay genome indicate that Pinot noir and Gouais blanc share an extremely high degree of kinship that has resulted in the Chardonnay genome displaying characteristics that are indicative of inbreeding.

