July 7, 2019

Phylogeny of dermatophytes with genomic character evaluation of clinically distinct Trichophyton rubrum and T. violaceum

Trichophyton rubrum and T. violaceum are prevalent agents of human dermatophyte infections, the former being found on glabrous skin and nail, while the latter is confined to the scalp. The two species are phenotypically different but are highly similar phylogenetically. The taxonomy of dermatophytes is currently being reconsidered on the basis of molecular phylogeny. Molecular species definitions do not always coincide with existing concepts which are guided by ecological and clinical principles. In this article, we aim to bring phylogenetic and ecological data together in an attempt to develop new species concepts for anthropophilic dermatophytes. Focus is on the T. rubrum complex with analysis of rDNA ITS supplemented with LSU, TUB2, TEF3 and ribosomal protein L10 gene sequences. In order to explore genomic differences between T. rubrum and T. violaceum, one representative for both species was whole genome sequenced. Draft sequences were compared with currently available dermatophyte genomes. Potential virulence factors of adhesins and secreted proteases were predicted and compared phylogenetically. General phylogeny showed clear gaps between geophilic species of Arthroderma, but multilocus distances between species were often very small in the derived anthropophilic and zoophilic genus Trichophyton. Significant genome conservation between T. rubrum and T. violaceum was observed, with a high similarity at the nucleic acid level of 99.38 % identity. Trichophyton violaceum contains more paralogs than T. rubrum. About 30 adhesion genes were predicted among dermatophytes. Seventeen adhesins were common between T. rubrum and T. violaceum, while four were specific for the former and eight for the latter. Phylogenetic analysis of secreted proteases reveals considerable expansion and conservation among the analyzed species. Multilocus phylogeny and genome comparison of T. rubrum and T. violaceum underlined their close affinity. 
The possibility that they represent a single species exhibiting different phenotypes due to different localizations on the human body is discussed.


July 7, 2019

ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers.

The long-range sequencing information captured by linked reads, such as those available from 10× Genomics (10xG), helps resolve genome sequence repeats, and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding methodology that uses linked reads to organize genome assemblies further into contiguous drafts. Our approach departs from other read alignment-dependent linked read scaffolders, including our own (ARCS), and uses a kmer-based mapping approach. The kmer mapping strategy has several advantages over read alignment methods, including better usability and faster processing, as it precludes the need for input sequence formatting and draft sequence assembly indexing. The reliance on kmers instead of read alignments for pairing sequences relaxes the workflow requirements, and drastically reduces the run time. Here, we show how linked reads, when used in conjunction with Hi-C data for scaffolding, improve a draft human genome assembly of PacBio long-read data five-fold (baseline vs. ARKS NG50 = 4.6 vs. 23.1 Mbp, respectively). We also demonstrate how the method provides further improvements of a megabase-scale Supernova human genome assembly (NG50 = 14.74 Mbp vs. 25.94 Mbp before and after ARKS), which itself exclusively uses linked read data for assembly, with an execution speed six to nine times faster than competitive linked read scaffolders (~10.5 h compared to 75.7 h, on average). Following ARKS scaffolding of a human genome 10xG Supernova assembly (of cell line NA12878), fewer than 9 scaffolds cover each chromosome, except the largest (chromosome 1, n = 13). ARKS uses a kmer mapping strategy instead of linked read alignments to record and associate the barcode information needed to order and orient draft assembly sequences. The simplified workflow, when compared to that of our initial implementation, ARCS, markedly improves run time performances on experimental human genome datasets. 
Furthermore, the novel distance estimator in ARKS utilizes barcoding information from linked reads to estimate gap sizes. It accomplishes this by modeling the relationship between known distances of a region within contigs and calculating associated Jaccard indices. ARKS has the potential to provide correct, chromosome-scale genome assemblies, promptly. We expect ARKS to have broad utility in helping refine draft genomes.
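
The Jaccard index mentioned above can be illustrated with a minimal sketch. This is not the ARKS implementation: the function name, the barcode labels, and the idea of comparing two plain sets of barcodes are assumptions made only to show how the index is computed (shared barcodes divided by all distinct barcodes).

```python
def jaccard_index(barcodes_a, barcodes_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of barcodes.

    Hypothetical helper for illustration; not part of ARKS.
    """
    a, b = set(barcodes_a), set(barcodes_b)
    union = a | b
    if not union:
        # Two empty sets share nothing; define the index as 0.0 here.
        return 0.0
    return len(a & b) / len(union)


# Made-up barcode sets for two regions of a draft assembly.
region_1 = {"BX01", "BX02", "BX03", "BX04"}
region_2 = {"BX03", "BX04", "BX05"}

# 2 shared barcodes out of 5 distinct barcodes.
print(jaccard_index(region_1, region_2))
```

Intuitively, regions that lie close together on the same molecule tend to share more barcodes, so a higher index suggests a smaller gap; how ARKS converts the index into a gap size is described only at a high level in the abstract.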


July 7, 2019

Genomes and transcriptomes of duckweeds.

Duckweeds (Lemnaceae family) are the smallest flowering plants and are adapted to the aquatic environment. They are regarded as a promising sustainable feedstock owing to their high starch storage, fast propagation, and global distribution. Duckweed genome size varies 13-fold, ranging from 150 Mb in Spirodela polyrhiza to 1,881 Mb in Wolffia arrhiza. With the development of sequencing technology and bioinformatics, five duckweed genomes from the Spirodela and Lemna genera have been sequenced and assembled. The genome annotations reveal that they share similar protein orthologs, whereas repeat content mainly explains the genome size difference. The gene families responsible for cell growth and expansion, lignin biosynthesis, and flowering are greatly contracted. However, the glutamate synthase gene family has expanded, indicating its significance in ammonia assimilation and nitrogen transport. The transcriptome has been comprehensively sequenced for the genera Spirodela, Landoltia, and Lemna, under various treatments such as abscisic acid, radiation, heavy metal, and starvation. Analysis of the underlying molecular mechanisms and the regulatory network would accelerate their applications in the fields of bioenergy and phytoremediation. Comparative genomics has shown that duckweed genomes contain relatively low gene numbers and more contracted gene families, which may parallel their highly reduced morphology with a simple leaf and primary roots. Still, we are waiting for the advancement of long read sequencing technology to resolve the complex genomes and transcriptomes of the unsequenced Wolffiella and Wolffia, given their large genome sizes and the similarity in their polyploidy.


July 7, 2019

Clustering of circular consensus sequences: accurate error correction and assembly of single molecule real-time reads from multiplexed amplicon libraries.

Targeted resequencing with high-throughput sequencing (HTS) platforms can be used to efficiently interrogate the genomes of large numbers of individuals. A critical issue for research and applications using HTS data, especially from long-read platforms, is error in base calling arising from technological limits and bioinformatic algorithms. We found that the community standard long amplicon analysis (LAA) module from Pacific Biosciences is prone to substantial bioinformatic errors that raise concerns about findings based on this pipeline, prompting the need for a new method. A single molecule real-time (SMRT) sequencing-error correction and assembly pipeline, C3S-LAA, was developed for libraries of pooled amplicons. By uniquely leveraging the structure of SMRT sequence data (comprised of multiple low quality subreads from which higher quality circular consensus sequences are formed) to cluster raw reads, C3S-LAA produced accurate consensus sequences and assemblies of overlapping amplicons from single sample and multiplexed libraries. In contrast, despite read depths in excess of 100X per amplicon, the standard long amplicon analysis module from Pacific Biosciences generated unexpected numbers of amplicon sequences with substantial inaccuracies in the consensus sequences. A bootstrap analysis showed that the C3S-LAA pipeline per se was effective at removing bioinformatic sources of error, but in rare cases a read depth of nearly 400X was not sufficient to overcome minor but systematic errors inherent to amplification or sequencing. C3S-LAA uses a divide and conquer processing algorithm for SMRT amplicon-sequence data that generates accurate consensus sequences and local sequence assemblies. Solving the confounding bioinformatic source of error in LAA allowed for the identification of limited instances of errors due to DNA amplification or sequencing of homopolymeric nucleotide tracts. 
For research and development in genomics, C3S-LAA allows meaningful conclusions and biological inferences to be made from accurately polished sequence output.


July 7, 2019

Genome size estimation of Chinese cultured Artemisia annua L.

Almost all antimalarial artemisinin is extracted from the traditional Chinese medicinal plant Artemisia annua L. However, because genomic information is insufficient and genetic backgrounds are unresolved, the regulatory mechanism of the artemisinin biosynthetic pathway is not yet clear. The genome size of genuine A. annua plants is an especially important and fundamental parameter, which is helpful for further insight into genomic studies of artemisinin biosynthesis and improvement. In the current study, the genome sizes of A. annua samples, collected with barcoding identification, were estimated to be 1.38-1.49 Gb by flow cytometry (FCM), with Nipponbare as the benchmark calibration standard and soybean and maize as two internal standards, used individually and simultaneously. The genome size estimate for seven A. annua strains from five Chinese provinces (Shandong, Hunan, Chongqing, Sichuan, and Hainan), with a low coefficient of variation (CV = 2.96%), was relatively accurate and 12.87% (220 Mb) less than previous reports on a foreign A. annua species measured with a single control. This result facilitates the scheduling of the A. annua whole genome sequencing project, optimization of assembly methods, and insight into its subsequent genetics and evolution.


July 7, 2019

Genomic insights into date palm origins.

With the development of next-generation sequencing technology, the amount of date palm (Phoenix dactylifera L.) genomic data has grown rapidly and yielded new insights into this species and its origins. Here, we review advances in understanding of the evolutionary history of the date palm, with a particular emphasis on what has been learned from the analysis of genomic data. We first record current genomic resources available for date palm including genome assemblies and resequencing data. We discuss new insights into its domestication and diversification history based on these improved genomic resources. We further report recent discoveries such as the existence of wild ancestral populations in remote locations of Oman and high differentiation between African and Middle Eastern populations. While genomic data are consistent with the view that domestication took place in the Gulf region, they suggest that the process was more complex involving multiple gene pools and possibly a secondary domestication. Many questions remain unanswered, especially regarding the genetic architecture of domestication and diversification. We provide a road map to future studies that will further clarify the domestication history of this iconic crop.


July 7, 2019

Alignment-free genome comparison enables accurate geographic sourcing of white oak DNA.

The application of genomic data and bioinformatics for the identification of restricted or illegally-sourced natural products is urgently needed. The taxonomic identity and geographic provenance of raw and processed materials have implications in sustainable-use commercial practices, and relevance to the enforcement of laws that regulate or restrict illegally harvested materials, such as timber. Improvements in genomics make it possible to capture and sequence partial-to-complete genomes from challenging tissues, such as wood and wood products. In this paper, we report the success of an alignment-free genome comparison method, [Formula: see text] that differentiates different geographic sources of white oak (Quercus) species with a high level of accuracy with a very small amount of genomic data. The method is robust to sequencing errors, different sequencing laboratories and sequencing platforms. This method offers an approach based on genome-scale data, rather than panels of pre-selected markers for specific taxa. The method provides a generalizable platform for the identification and sourcing of materials using a unified next generation sequencing and analysis framework.
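
To give a feel for what "alignment-free" comparison means in general, the sketch below compares two sequences by their k-mer count profiles. It does not reproduce the statistic used in the paper, which is elided in the abstract. The cosine dissimilarity, the function names, the choice of k, and the toy sequences are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_dissimilarity(counts_a, counts_b):
    """Return 1 - cosine similarity of two k-mer count profiles.

    A generic stand-in measure, not the paper's statistic.
    """
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[kmer] * counts_b[kmer] for kmer in shared)
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty profile shares nothing with any other profile.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# Toy sequences; real inputs would be sequencing reads or assemblies.
sample_a = "ACGTACGTACGT"
sample_b = "ACGTACGTTCGT"
k = 3  # illustrative k-mer length only

d = cosine_dissimilarity(kmer_counts(sample_a, k), kmer_counts(sample_b, k))
print(f"k-mer profile dissimilarity: {d:.3f}")
```

Because the profiles are built from raw sequence without aligning to a reference, an approach of this kind can in principle be applied to any taxon, which matches the generalizability the abstract emphasizes.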


January 23, 2017

Tutorial: Long Amplicon Analysis application

This tutorial provides an overview of the Long Amplicon Analysis (LAA) application. The LAA algorithm generates highly accurate, phased and full-length consensus sequences from long amplicons. Applications of LAA include…

