September 22, 2019

A graph-based approach to diploid genome assembly.

Constructing high-quality haplotype-resolved de novo assemblies of diploid genomes is important for revealing the full extent of structural variation and its role in health and disease. Current assembly approaches often collapse the two sequences into one haploid consensus sequence and, therefore, fail to capture the diploid nature of the organism under study. Thus, building an assembler capable of producing accurate and complete diploid assemblies, while being resource-efficient with respect to sequencing costs, is a key challenge to be addressed by the bioinformatics community. We present a novel graph-based approach to diploid assembly, which combines accurate Illumina data and long-read Pacific Biosciences (PacBio) data. We demonstrate the effectiveness of our method on a pseudo-diploid yeast genome and show that we require as little as 50× coverage Illumina data and 10× PacBio data to generate accurate and complete assemblies. Additionally, we show that our approach has the ability to detect and phase structural variants. Availability: https://github.com/whatshap/whatshap. Supplementary data are available at Bioinformatics online.


September 22, 2019

Whole genome and transcriptome maps of the entirely black native Korean chicken breed Yeonsan Ogye.

Yeonsan Ogye (YO), an indigenous Korean chicken breed (Gallus gallus domesticus), has entirely black external features and internal organs. In this study, the draft genome of YO was assembled using a hybrid de novo assembly method that takes advantage of high-depth Illumina short reads (376.6X) and low-depth Pacific Biosciences (PacBio) long reads (9.7X). The contig and scaffold NG50s of the hybrid de novo assembly were 362.3 Kbp and 16.8 Mbp, respectively. The completeness (97.6%) of the draft genome (Ogye_1.1) was evaluated with single-copy orthologous genes using Benchmarking Universal Single-Copy Orthologs and found to be comparable to the current chicken reference genome (galGal5; 97.4%; contigs were assembled with high-depth PacBio long reads (50X) and scaffolded with short reads) and superior to other avian genomes (92%-93%; assembled with short read-only or hybrid methods). Compared to galGal4 and galGal5, the draft genome included 551 structural variations including the fibromelanosis (FM) locus duplication, related to hyperpigmentation. To comprehensively reconstruct transcriptome maps, RNA sequencing and reduced representation bisulfite sequencing data were analyzed from 20 tissues, including 4 black tissues (skin, shank, comb, and fascia). The maps included 15,766 protein-coding and 6,900 long noncoding RNA genes, many of which were tissue-specifically expressed and displayed tissue-specific DNA methylation patterns in the promoter regions. We expect that the resulting genome sequence and transcriptome maps will be valuable resources for studying domestic chicken breeds, including black-skinned chickens, as well as for understanding genomic differences between breeds and the evolution of hyperpigmented chickens and functional elements related to hyperpigmentation.
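For readers unfamiliar with the NG50 statistic quoted above, the following is a minimal sketch of how it is commonly computed. It is not the authors' code; the function names `ng50` and `n50` and the toy lengths are assumptions made for illustration. NG50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of an estimated genome size; N50 uses the total assembly length instead.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of a set of contig or scaffold lengths.

    Walk the lengths from longest to shortest and return the length at
    which the running total first reaches half of genome_size. Returns
    None if the assembly covers less than half of the genome.
    """
    half = genome_size / 2
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if total >= half:
            return length
    return None


def n50(lengths):
    """N50 is the same statistic measured against the assembly's own total length."""
    return ng50(lengths, sum(lengths))


# Toy example (hypothetical lengths, not data from the study).
toy_lengths = [100, 80, 60, 40, 20]
toy_ng50 = ng50(toy_lengths, genome_size=400)
toy_n50 = n50(toy_lengths)
```

Because NG50 is normalized by genome size rather than assembly size, it allows assemblies of different total length to be compared for the same species.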


September 22, 2019

Evidence of non-tandemly repeated rDNAs and their intragenomic heterogeneity in Rhizophagus irregularis

Arbuscular mycorrhizal fungus (AMF) species are some of the most widespread symbionts of land plants. Our much improved reference genome assembly of a model AMF, Rhizophagus irregularis DAOM-181602 (total contigs = 210), facilitated a discovery of repetitive elements with unusual characteristics. R. irregularis has only 10 or 11 copies of complete 45S rDNAs, whereas the general eukaryotic genome has tens to thousands of rDNA copies. R. irregularis rDNAs are highly heterogeneous and lack a tandem repeat structure. These findings provide evidence for the hypothesis that rDNA heterogeneity depends on the lack of tandem repeat structures. RNA-Seq analysis confirmed that all rDNA variants are actively transcribed. Observed rDNA/rRNA polymorphisms may modulate translation by using different ribosomes depending on biotic and abiotic interactions. The non-tandem repeat structure and intragenomic heterogeneity of AMF rDNA/rRNA may facilitate successful adaptation to various environmental conditions, increasing host compatibility of these symbiotic fungi.


September 22, 2019

PBHoover and CigarRoller: a method for confident haploid variant calling on Pacific Biosciences data and its application to heterogeneous population analysis

Motivation: Single Molecule Real-Time (SMRT) sequencing has important and underutilized advantages that amplification-based platforms lack. These include the lack of systematic error (e.g. GC-bias) and complete de novo assembly (including large repetitive regions) without scaffolding. SMRT sequencing, however, suffers from a high random error rate and low sequencing depth (older chemistries). Here, we introduce PBHoover, software that uses a heuristic calling algorithm in order to make base calls with high certainty in low coverage regions. This software is also capable of mixed population detection with high sensitivity. PBHoover's CigarRoller attachment improves sequencing depth in low-coverage regions through CIGAR-string correction. Results: We tested both modules on 348 M. tuberculosis clinical isolates sequenced on C1 or C2 chemistries. On average, CigarRoller improved percentage of usable read count from 68.9% to 99.98% in C1 runs and from 50% to 99% in C2 runs. Using the greater depth provided by CigarRoller, PBHoover was able to make base and variant calls 99.95% concordant with Sanger calls (QV33). PBHoover also detected antibiotic-resistant subpopulations that went undetected by Sanger. Using C1 chemistry, subpopulations as small as 9% of the total colony can be detected by PBHoover. This provides the most sensitive amplification-free molecular method for heterogeneity analysis and is in line with phenotypic methods' sensitivity. This sensitivity significantly improves with the greater depth and lower error rate of the newer chemistries. Availability and Implementation: Executables are freely available under GNU GPL v3+ at http://www.gitlab.com/LPCDRP/pbhoover and http://www.gitlab.com/LPCDRP/CigarRoller. PBHoover is also available on bioconda: https://anaconda.org/bioconda/pbhoover.


September 22, 2019

Comparative genomics of Escherichia coli sequence type 219 clones from the same patient: Evolution of the IncI1 blaCMY-carrying plasmid in vivo.

This study investigates the evolution of an Escherichia coli sequence type 219 clone in a patient with recurrent urinary tract infection, comparing isolate EC974 obtained prior to antibiotic treatment and isolate EC1515 recovered after exposure to several β-lactam antibiotics (ceftriaxone, cefixime, and imipenem). EC974 had a smooth colony morphology, while EC1515 had a rough colony morphology on sheep blood agar. RAPD-PCR analysis suggested that both isolates belonged to the same clone. Antimicrobial susceptibility tests showed that EC1515 was more resistant to piperacillin/tazobactam, cefepime, cefpirome, and ertapenem than EC974. Comparative genomic analysis was used to investigate the genetic changes of EC974 and EC1515 within the host, and showed three plasmids with replicons IncI1, P0111, and IncFII in both isolates. P0111-type plasmids pEC974-2 and pEC1515-2 contained the antibiotic resistance genes aadA2, tetA, and drfA12. IncFII-type plasmids pEC974-3 and pEC1515-3 contained the antibiotic resistance genes blaTEM-1, aadA1, aadA22, sul3, and inuF. Interestingly, blaCMY-111 and blaCMY-4 were found in very similar IncI1 plasmids that also contained aadA22 and aac(3)-IId, from isolates EC974 (pEC974-1) and EC1515 (pEC1515-1), respectively. The results showed in vivo amino acid substitutions converting blaCMY-111 to blaCMY-4 (R221W and A238V substitutions). Conjugation experiments showed a high frequency of IncI1 and IncFII plasmid co-transference. Transconjugants and DH5α cells harboring blaCMY-4 or blaCMY-111 showed higher levels of resistance to ampicillin, amoxicillin, cefazolin, cefuroxime, cefotaxime, cefixime, and ceftazidime, but not piperacillin/tazobactam, cefepime, or ertapenem. All known genes (outer membrane proteins and extended-spectrum AmpC β-lactamases) involved in ertapenem (ETP) resistance in E. coli were identical between EC974 and EC1515.
This is the first study to identify the evolution of an IncI1 plasmid within the host, and to characterize blaCMY-111 in E. coli.


September 22, 2019

A chromosome scale assembly of the model desiccation tolerant grass Oropetium thomaeum

Oropetium thomaeum is an emerging model for desiccation tolerance and genome size evolution in grasses. A high-quality draft genome of Oropetium was recently sequenced, but the lack of a chromosome scale assembly has hindered comparative analyses and downstream functional genomics. Here, we reassembled Oropetium, and anchored the genome into ten chromosomes using Hi-C based chromatin interactions. A combination of high-resolution RNAseq data and homology-based gene prediction identified thousands of new, conserved gene models that were absent from the V1 assembly. This includes thousands of new genes with high expression across a desiccation timecourse. The sorghum and Oropetium genomes have a surprising degree of chromosome-level collinearity, and several chromosome pairs have near perfect synteny. Other chromosomes are collinear in the gene rich chromosome arms but have experienced pericentric translocations. Together, these resources will be useful for the grass comparative genomic community and further establish Oropetium as a model resurrection plant.


September 22, 2019

Integrating long-range connectivity information into de Bruijn graphs.

The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously to facilitate comparison. However, unlike the string graph, a de Bruijn graph does not retain long range information that is inherent in the read data. For this reason, applications that rely on de Bruijn graphs can produce sub-optimal results given their input data. We present a novel assembly graph data structure: the Linked de Bruijn Graph (LdBG). Constructed by adding annotations on top of a de Bruijn graph, it stores long range connectivity information through the graph. We show that with error-free data it is possible to losslessly store and recover sequence from a Linked de Bruijn graph. With assembly simulations we demonstrate that the LdBG data structure outperforms both our de Bruijn graph and the String Graph Assembler (SGA). Finally we apply the LdBG to Klebsiella pneumoniae short read data to make large (12 kbp) variant calls, which we validate using PacBio sequencing data, and to characterize the genomic context of drug-resistance genes. Linked de Bruijn Graphs and associated algorithms are implemented as part of McCortex, which is available under the MIT license at https://github.com/mcveanlab/mccortex. Supplementary data are available at Bioinformatics online.
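To make the loss of long-range information concrete, here is a minimal sketch of a plain de Bruijn graph built from reads. It is not the McCortex implementation and does not show the LdBG annotations; the function name `build_de_bruijn`, the choice of (k-1)-mers as nodes, and the toy reads are all assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph in which nodes are (k-1)-mers.

    Each k-mer in a read contributes one edge, from its first k-1 bases
    to its last k-1 bases. The result maps every node to the set of
    nodes that follow it in at least one read.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two hypothetical reads that share the 3-mer "TGC" in the middle.
reads = ["ATGCA", "TTGCC"]
graph = build_de_bruijn(reads, k=3)

# Both reads pass through node "TG" and then "GC", where the graph branches.
successors_of_gc = graph["GC"]
```

In this toy graph, node `GC` has two successors (`CA` and `CC`), and nothing in the graph records that the path entering from `AT` continued to `CA` while the path entering from `TT` continued to `CC`. That pairing is exactly the kind of read-level connectivity that the abstract says a plain de Bruijn graph discards and that the LdBG annotations are meant to retain.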


September 22, 2019

A reference genome of the Chinese hamster based on a hybrid assembly strategy.

Accurate and complete genome sequences are essential in biotechnology to facilitate genome-based cell engineering efforts. The current genome assemblies for Cricetulus griseus, the Chinese hamster, are fragmented and replete with gap sequences and misassemblies, consistent with most short-read-based assemblies. Here, we completely resequenced C. griseus using single molecule real time sequencing and merged this with Illumina-based assemblies. This generated a more contiguous and complete genome assembly than either technology alone, reducing the number of scaffolds by >28-fold, with 90% of the sequence in the 122 longest scaffolds. Most genes are now found in single scaffolds, including up- and downstream regulatory elements, enabling improved study of noncoding regions. With >95% of the gap sequence filled, important Chinese hamster ovary cell mutations have been detected in draft assembly gaps. This new assembly will be an invaluable resource for continued basic and pharmaceutical research. © 2018 The Authors. Biotechnology and Bioengineering Published by Wiley Periodicals, Inc.


September 22, 2019

Analysis of the draft genome of the red seaweed Gracilariopsis chorda provides insights into genome size evolution in Rhodophyta.

Red algae (Rhodophyta) underwent two phases of large-scale genome reduction during their early evolution. The red seaweeds did not attain genome sizes or gene inventories typical of other multicellular eukaryotes. We generated a high-quality 92.1 Mb draft genome assembly from the red seaweed Gracilariopsis chorda, including methylation and small (s)RNA data. We analyzed these and other Archaeplastida genomes to address three questions: 1) What is the role of repeats and transposable elements (TEs) in explaining Rhodophyta genome size variation, 2) what is the history of genome duplication and gene family expansion/reduction in these taxa, and 3) is there evidence for TE suppression in red algae? We find that the number of predicted genes in red algae is relatively small (4,803-13,125 genes), particularly when compared with land plants, with no evidence of polyploidization. Genome size variation is primarily explained by TE expansion with the red seaweeds having the largest genomes. Long terminal repeat elements and DNA repeats are the major contributors to genome size growth. About 8.3% of the G. chorda genome undergoes cytosine methylation among gene bodies, promoters, and TEs, and 71.5% of TEs contain methylated-DNA with 57% of these regions associated with sRNAs. These latter results suggest a role for TE-associated sRNAs in RNA-dependent DNA methylation to facilitate silencing. We postulate that the evolution of genome size in red algae is the result of the combined action of TE spread and the concomitant emergence of its epigenetic suppression, together with other important factors such as changes in population size.


September 22, 2019

A synthetic-diploid benchmark for accurate variant-calling evaluation.

Existing benchmark datasets for use in evaluating variant-calling accuracy are constructed from a consensus of known short-variant callers, and they are thus biased toward easy regions that are accessible by these algorithms. We derived a new benchmark dataset from the de novo PacBio assemblies of two fully homozygous human cell lines, which provides a relatively more accurate and less biased estimate of small-variant-calling error rates in a realistic context.


September 22, 2019

Whole-genome resequencing and pan-transcriptome reconstruction highlight the impact of genomic structural variation on secondary metabolite gene clusters in the grapevine Esca pathogen Phaeoacremonium minimum.

The Ascomycete fungus Phaeoacremonium minimum is one of the primary causal agents of Esca, a widespread and damaging grapevine trunk disease. Variation in virulence among Pm. minimum isolates has been reported, but the underlying genetic basis of the phenotypic variability remains unknown. The goal of this study was to characterize intraspecific genetic diversity and explore its potential impact on virulence functions associated with secondary metabolism, cellular transport, and cell wall decomposition. We generated a chromosome-scale genome assembly, using single molecule real-time sequencing, and resequenced the genomes and transcriptomes of multiple isolates to identify sequence and structural polymorphisms. Numerous insertion and deletion events were found for a total of about 1 Mbp in each isolate. Structural variation in this extremely gene dense genome frequently caused presence/absence polymorphisms of multiple adjacent genes, mostly belonging to biosynthetic clusters associated with secondary metabolism. Because of the observed intraspecific diversity in gene content due to structural variation we concluded that a transcriptome reference developed from a single isolate is insufficient to represent the virulence factor repertoire of the species. We therefore compiled a pan-transcriptome reference of Pm. minimum comprising a non-redundant set of 15,245 protein-coding sequences. Using naturally infected field samples expressing Esca symptoms, we demonstrated that mapping of meta-transcriptomics data on a multi-species reference that included the Pm. minimum pan-transcriptome allows the profiling of an expanded set of virulence factors, including variable genes associated with secondary metabolism and cellular transport.


September 22, 2019

In vivo evolution of drug-resistant Mycobacterium tuberculosis in patients during long-term treatment.

Drug-resistant tuberculosis is a significant challenge in the control of tuberculosis worldwide. To investigate the in vivo evolution of drug-resistant M. tuberculosis, this study sequenced the draft genomes of 18 serial isolates from four pre-extensively drug-resistant (pre-XDR) tuberculosis patients to track continuous genetic alterations. All of the isolates harbored between 1303 and 1309 single nucleotide polymorphisms (SNPs) with M. tuberculosis H37Rv as the reference. SNPs ranged from 0 to 12 within patients. The evolution rates were higher than the reported SNPs of 0.5 in the four patients. All the isolates exhibited mutations at sites of known drug targets, while some contained mutations in uncertain drug targets including folC, proZ, and pyrG. The compensatory substitutions for rescuing these deleterious mutations during evolution were only found in RpoC I491T in one patient. Many loci with microheterogeneity showed transient mutations in different isolates. Ninety-three SNPs exhibited significant association with refractory pre-XDR TB isolates. Our results showed evolutionary changes in the serial genetic characteristics of the pre-XDR TB patients due to accumulation of the fixed drug-resistance-related mutations and the transient mutations under continuous antibiotic pressure over several years.


September 22, 2019

Temperature responses of mutation rate and mutational spectrum in an Escherichia coli strain and the correlation with metabolic rate.

Temperature is a major determinant of spontaneous mutation, but the precise mode, and the underlying mechanisms, of the temperature influences remain less clear. Here we used a mutation accumulation approach combined with whole-genome sequencing to investigate the temperature dependence of spontaneous mutation in an Escherichia coli strain. Experiments were performed under aerobic conditions at 25, 28 and 37 °C, three temperatures that were non-stressful for the bacterium but caused significantly different bacterial growth rates. Mutation rate did not differ between 25 and 28 °C, but was higher at 37 °C. Detailed analyses of the molecular spectrum of mutations were performed; a particularly interesting finding is that higher temperature led to a bias of mutation to coding, relative to noncoding, DNA. Furthermore, the temperature response of mutation rate was extremely similar to that of metabolic rate, consistent with the idea that metabolic rate predicts mutation rate. Temperature affects mutation rate and the types of mutation supply, both being crucial for the opportunity of natural selection. Our results help understand how temperature drives evolutionary speed of organisms and thus the global patterns of biodiversity. This study also lends support to the metabolic theory of ecology for linking metabolic rate and molecular evolution rate.


September 22, 2019

Opposite polarity monospore genome de novo sequencing and comparative analysis reveal the possible heterothallic life cycle of Morchella importuna.

Morchella is a popular edible fungus worldwide due to its rich nutrition and unique flavor. Many research efforts have been made on the domestication and cultivation of Morchella all over the world. In recent years, the cultivation of Morchella was successfully commercialized in China. However, the biology is not well understood, which restricts the further development of the morel fungus cultivation industry. In this paper, we performed de novo sequencing and assembly of the genomes of two monospores with a different mating type (M04M24 and M04M26) isolated from the commercially cultivated strain M04. Gene annotation and comparative genome analysis were performed to study differences in carbohydrate-active enzyme (CAZyme) content, transcription factors, duplicated sequences, structure of mating type sites, and differences at the gene and functional levels between the two monospore strains of M. importuna. Results showed that the de novo assembled haploid M04M24 and M04M26 genomes were 48.98 and 51.07 Mb, respectively. A complete fine physical map of M. importuna was obtained from genome coverage and gene completeness evaluation. A total of 10,852 and 10,902 common genes and 667 and 868 endemic genes were identified from the two monospore strains, respectively. The Gene Ontology (GO) and KAAS (KEGG Automatic Annotation Server) enrichment analyses showed that the endemic genes performed different functions. The two monospore strains had 99.22% collinearity with each other, accompanied with certain position and rearrangement events. Analysis of complete mating-type loci revealed that the two monospore M. importuna strains contained an independent mating-type structure and remained conserved in sequence and location. The phylogenetic and divergence time of M. importuna was analyzed at the whole-genome level for the first time.
The bifurcation time of morel and tuber was estimated to be 201.14 million years ago (Mya); the two monospore strains with a different mating type represented the evolution of different nuclei, and the single copy homologous genes between them were also different due to a genetic differentiation distance about 0.65 Mya. Compared with truffles, M. importuna had an extension of 28 clusters of orthologous genes (COGs) and a contraction of two COGs. The two different polar nuclei with different degrees of contraction and expansion suggested that they might have undergone different evolutionary processes. The different mating-type structures, together with the functional clustering and enrichment analysis results of the endemic genes of the two different polar nuclei, imply that M. importuna might be a heterothallic fungus and the interaction between the endemic genes may be necessary for its complete life history. Studies on the genome of M. importuna facilitate a better understanding of morel biology and evolution.


September 22, 2019

Groundnut entered post-genome sequencing era: Opportunities and challenges in translating genomic information from genome to field

Cultivated groundnut or peanut (Arachis hypogaea) is an allopolyploid crop with a large complex genome and a genetic barrier for exchanging genetic diversity from its wild relatives due to ploidy differences. Optimum genetic and genomic resources are key for accelerating the process for trait mapping and gene discovery and deploying diagnostic markers in genomics-assisted breeding. The better utilization of different aspects of peanut biology such as genetics, genomics, transcriptomics, proteomics, epigenomics, metabolomics, and interactomics can be of great help to groundnut genetic improvement programs across the globe. The availability of a high-quality reference genome is core to all the “omics” approaches, and hence optimum genomic resources are a must for fully exploiting the potential of modern science into conventional breeding. In this context, groundnut is passing through a very critical and transformational phase by making available the required genetic and genomic resources such as reference genomes of progenitors, resequencing of diverse lines, transcriptome resources, germplasm diversity panel, and multi-parent genetic populations for conducting high-resolution trait mapping, identification of associated markers, and development of diagnostic markers for selected traits. Lastly, the available resources have been deployed in translating genomic information from genome to field by developing improved groundnut lines with enhanced resistance to root-knot nematode, rust, and late leaf spot and high oleic acid. In addition, the International Peanut Genome Initiative (IPGI) has made available the high-quality reference genome for cultivated tetraploid groundnut, which will facilitate better utilization of genetic resources in groundnut improvement.
In parallel, the development of high-density genotyping platforms, such as Axiom_Arachis array with 58 K SNPs, and constitution of training population will initiate the deployment of the modern breeding approach, genomic selection, for achieving higher genetic gains in less time with more precision.

