April 21, 2020

A multi-task convolutional deep neural network for variant calling in single molecule sequencing.

The accurate identification of DNA sequence variants is an important but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5-15%. To meet this need, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieves F1-scores of 99.67%, 95.78%, and 90.53% on 1KP common variants, and 98.65%, 92.57%, and 87.26% for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows Clairvoyante is sample agnostic and finds variants in less than 2 h on a standard server. Furthermore, we present 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source ( https://github.com/aquaskyline/Clairvoyante ), with modules to train, utilize and visualize the model.
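For readers unfamiliar with the metric, the F1-score quoted above is conventionally the harmonic mean of precision and recall. The sketch below uses that standard definition; the function name and the counts in the usage note are illustrative assumptions, not values or code from Clairvoyante.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall for a set of variant calls.

    Standard textbook definition; the abstract does not spell out its
    exact benchmarking procedure, so treat this as an illustration only.
    """
    called = true_positives + false_positives   # all variants the caller reported
    truth = true_positives + false_negatives    # all variants in the truth set
    if called == 0 or truth == 0:
        return 0.0
    precision = true_positives / called
    recall = true_positives / truth
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

As a hypothetical example, a caller with 90 correct calls, 10 spurious calls and 10 missed variants has precision and recall of 0.9 each, so `f1_score(90, 10, 10)` is also 0.9.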


April 21, 2020

Study of the whole genome, methylome and transcriptome of Cordyceps militaris.

The complete genome of Cordyceps militaris was sequenced using single-molecule real-time (SMRT) sequencing technology at a coverage over 300×. The genome size was 32.57 Mb, and 14 contigs ranging from 0.35 to 4.58 Mb with an N50 of 2.86 Mb were assembled, including 4 contigs with telomeric sequences on both ends and an additional 8 contigs with telomeric sequences on either the 5′ or 3′ end. A methylome database of the genome was constructed using SMRT and m4C and m6A methylated nucleotides, and many unknown modification types were identified. The major m6A methylation motifs are GA and GGAG, and the major m4C methylation motif is GC or CG/GC. In the C. militaris genome DNA, there were four types of methylated nucleotides that we confirmed using high-resolution LCMS-IT-TOF. Using PacBio Iso-Seq, a total of 31,133 complete cDNA sequences were obtained in the fruiting body. The conserved domains of the nontranscribed regions of the genome include TATA boxes, which are the initial regions of genome replication. There were 406 structural variants between the HN and CM01 strains, and there were 1,114 structural variants between the HN and ATCC strains.


April 21, 2020

Biodegradation of naphthalene, BTEX, and aliphatic hydrocarbons by Paraburkholderia aromaticivorans BN5 isolated from petroleum-contaminated soil.

To isolate bacteria responsible for the biodegradation of naphthalene, BTEX (benzene, toluene, ethylbenzene, and o-, m-, and p-xylene), and aliphatic hydrocarbons in petroleum-contaminated soil, three enrichment cultures were established using soil extract as the medium supplemented with naphthalene, BTEX, or n-hexadecane. Community analyses showed that Paraburkholderia species were predominant in the naphthalene and BTEX cultures, but relatively minor in the n-hexadecane culture. Paraburkholderia aromaticivorans BN5 was able to degrade naphthalene and all BTEX compounds, but not n-hexadecane. The genome of strain BN5 harbors genes encoding 29 monooxygenases, including two alkane 1-monooxygenases, and 54 dioxygenases, indicating that strain BN5 has versatile metabolic capabilities for diverse organic compounds; the ability of strain BN5 to degrade short chain aliphatic hydrocarbons was verified experimentally. The biodegradation pathways of naphthalene and BTEX compounds were bioinformatically predicted and verified experimentally through the analysis of their metabolic intermediates. Some genomic features, including the encoding of the biodegradation genes on a plasmid and the low sequence homologies of biodegradation-related genes, suggest that the biodegradation potentials of strain BN5 may have been acquired via horizontal gene transfers and/or gene duplication, resulting in enhanced ecological fitness by enabling strain BN5 to degrade all compounds including naphthalene, BTEX, and short aliphatic hydrocarbons in contaminated soil.


April 21, 2020

Reference genome and comparative genome analysis for the WHO reference strain for Mycobacterium bovis BCG Danish, the present tuberculosis vaccine.

Mycobacterium bovis bacillus Calmette-Guérin (M. bovis BCG) is the only vaccine available against tuberculosis (TB). In an effort to standardize the vaccine production, three substrains, i.e. BCG Danish 1331, Tokyo 172-1 and Russia BCG-1, were established as the WHO reference strains. Reference genomes exist for both BCG Tokyo 172-1 and Russia BCG-1, but not for BCG Danish. In this study, we set out to determine the completely assembled genome sequence for BCG Danish and to establish a workflow for genome characterization of engineering-derived vaccine candidate strains. By combining second (Illumina) and third (PacBio) generation sequencing in an integrated genome analysis workflow for BCG, we could construct the completely assembled genome sequence of BCG Danish 1331 (07/270) (and an engineered derivative that is studied as an improved vaccine candidate, a SapM KO), including the resolution of the analytically challenging long duplication regions. We report the presence of a DU1-like duplication in BCG Danish 1331, while this tandem duplication was previously thought to be exclusively restricted to BCG Pasteur. Furthermore, comparative genome analyses of publicly available data for BCG substrains showed the absence of a DU1 in certain BCG Pasteur substrains and the presence of a DU1-like duplication in some BCG China substrains. By integrating publicly available data, we provide an update to the genome features of the commonly used BCG strains. We demonstrate how this analysis workflow enables the resolution of genome duplications and of the genome of engineered derivatives of the BCG Danish vaccine strain. The BCG Danish WHO reference genome will serve as a reference for future engineered strains and the established workflow can be used to enhance BCG vaccine standardization.


April 21, 2020

The genome of broomcorn millet.

Broomcorn millet (Panicum miliaceum L.) is the most water-efficient cereal and one of the earliest domesticated plants. Here we report its high-quality, chromosome-scale genome assembly using a combination of short-read sequencing, single-molecule real-time sequencing, Hi-C, and a high-density genetic map. Phylogenetic analyses reveal two sets of homologous chromosomes that may have merged ~5.6 million years ago, both of which exhibit strong synteny with other grass species. Broomcorn millet contains 55,930 protein-coding genes and 339 microRNA genes. We find Paniceae-specific expansion in several subfamilies of the BTB (broad complex/tramtrack/bric-a-brac) subunit of ubiquitin E3 ligases, suggesting enhanced regulation of protein dynamics may have contributed to the evolution of broomcorn millet. In addition, we identify the coexistence of all three C4 subtypes of carbon fixation candidate genes. The genome sequence is a valuable resource for breeders and will provide the foundation for studying the exceptional stress tolerance as well as C4 biology.


April 21, 2020

Whole Genome Analysis of Lactobacillus plantarum Strains Isolated From Kimchi and Determination of Probiotic Properties to Treat Mucosal Infections by Candida albicans and Gardnerella vaginalis.

Three Lactobacillus plantarum strains ATG-K2, ATG-K6, and ATG-K8 were isolated from Kimchi, a Korean traditional fermented food, and their probiotic potentials were examined. All three strains were free of antibiotic resistance, hemolysis, and biogenic amine production and therefore assumed to be safe, as supported by whole genome analyses. These strains demonstrated several basic probiotic functions including a wide range of antibacterial activity, bile salt hydrolase activity, hydrogen peroxide production, and heat resistance at 70°C for 60 s. Further studies of antimicrobial activities against Candida albicans and Gardnerella vaginalis revealed growth inhibitory effects from culture supernatants, coaggregation effects, and killing effects of the three probiotic strains, with better efficacy toward C. albicans. In vitro treatment of bacterial lysates of the probiotic strains to the RAW264.7 murine macrophage cell line resulted in innate immunity enhancement via IL-6 and TNF-α production without lipopolysaccharide (LPS) treatment and anti-inflammatory effects via significantly increased production of IL-10 when co-treated with LPS. However, the degree of probiotic effect was different for each strain as the highest TNF-α and the lowest IL-10 production by the RAW264.7 cell were observed in the K8 lysate treated group compared to the K2 and K6 lysate treated groups, which may be related to genomic differences such as chromosome size (K2: 3,034,884 bp, K6: 3,205,672 bp, K8: 3,221,272 bp), plasmid numbers (K2: 3, K6 and K8: 1), or total gene numbers (K2: 3,114, K6: 3,178, K8: 3,186). Although more correlative inspections to connect genomic information and biological functions are needed, genomic analyses of the three strains revealed distinct genomic compositions of each strain. Also, this finding suggests genome level analysis may be required to accurately identify microorganisms. Nevertheless, L. plantarum ATG-K2, ATG-K6, and ATG-K8 demonstrated their potential as probiotics for mucosal health improvement in both microbial and immunological contexts.


April 21, 2020

Genomics-driven discovery of a biosynthetic gene cluster required for the synthesis of BII-Rafflesfungin from the fungus Phoma sp. F3723.

Phomafungin is a recently reported broad spectrum antifungal compound but its biosynthetic pathway is unknown. We combed publicly available Phoma genomes but failed to find any putative biosynthetic gene cluster that could account for its biosynthesis. Therefore, we sequenced the genome of one of our Phoma strains (F3723) previously identified as having antifungal activity in a high-throughput screen. We found a biosynthetic gene cluster that was predicted to synthesize a cyclic lipodepsipeptide that differs in the amino acid composition compared to Phomafungin. Antifungal activity guided isolation yielded a new compound, BII-Rafflesfungin, the structure of which was determined. We describe the NRPS-t1PKS cluster ‘BIIRfg’ compatible with the synthesis of the cyclic lipodepsipeptide BII-Rafflesfungin [HMHDA-L-Ala-L-Glu-L-Asn-L-Ser-L-Ser-D-Ser-D-allo-Thr-Gly]. We report new Stachelhaus codes for Ala, Glu, Asn, Ser, Thr, and Gly. We propose a mechanism for BII-Rafflesfungin biosynthesis, which involves the formation of the lipid part by BIIRfg_PKS followed by activation and transfer of the lipid chain by a predicted AMP-ligase on to the first PCP domain of the BIIRfg_NRPS gene.


April 21, 2020

Reconstruction of the full-length transcriptome atlas using PacBio Iso-Seq provides insight into the alternative splicing in Gossypium australe.

Gossypium australe F. Mueller (2n = 2x = 26, G2 genome) possesses valuable characteristics. For example, the delayed gland morphogenesis trait causes cottonseed protein and oil to be edible while retaining resistance to biotic stress. However, the gene sequences and their alternative splicing (AS) in G. australe remain unclear, hindering the exploration of species-specific biological morphogenesis. Here, we report the first sequencing of the full-length transcriptome of the Australian wild cotton species, G. australe, using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq) from the pooled cDNA of ten tissues to identify transcript loci and splice isoforms. We reconstructed the G. australe full-length transcriptome and identified 25,246 genes, 86 pre-miRNAs and 1468 lncRNAs. Most genes (12,832, 50.83%) exhibited two or more isoforms, suggesting a high degree of transcriptome complexity in G. australe. A total of 31,448 AS events in five major types were found among the 9944 gene loci. Among these five major types, intron retention was the most frequent, accounting for 68.85% of AS events. A total of 29,718 polyadenylation sites were detected from 14,536 genes, 7900 of which have alternative polyadenylation sites (APA). In addition, based on our AS events annotations, RNA-Seq short reads from germinating seeds showed that differential expression of these events occurred during seed germination. Ten AS events that were randomly selected were further confirmed by RT-PCR amplification in leaf and germinating seeds. The reconstructed gene sequences and their AS in G. australe would provide information for exploring beneficial characteristics in G. australe.


April 21, 2020

Mitochondrial genome and transcriptome analysis of five alloplasmic male-sterile lines in Brassica juncea.

Alloplasmic lines, in which the nuclear genome is combined with wild cytoplasm, are often characterized by cytoplasmic male sterility (CMS), regardless of whether it was derived from sexual or somatic hybridization with wild relatives. In this study, we sequenced and analyzed the mitochondrial genomes of five such alloplasmic lines in Brassica juncea. The assembled and annotated mitochondrial genomes of the five alloplasmic lines were found to have virtually identical gene contents. They preserved most of the ancestral mitochondrial segments, and the same candidate male sterility gene (orf108) was found harbored in mitotype-specific sequences. We also detected promiscuous sequences of chloroplast origin that were conserved among plants of the Brassicaceae, and found the RNA editing profiles to vary across the five mitochondrial genomes. On the basis of our characterization of the genetic nature of five alloplasmic mitochondrial genomes, we speculated that the putative candidate male sterility gene orf108 may not be responsible for the CMS observed in Brassica oxyrrhina and Diplotaxis catholica. Furthermore, we propose the potential coincidence of CMS in alloplasmic lines. Our findings lay the foundation for further elucidation of the male sterility gene.


April 21, 2020

Improved annotation of the domestic pig genome through integration of Iso-Seq and RNA-seq data.

Our understanding of the pig transcriptome is limited. RNA transcript diversity among nine tissues was assessed using poly(A) selected single-molecule long-read isoform sequencing (Iso-seq) and Illumina RNA sequencing (RNA-seq) from a single White cross-bred pig. Across tissues, a total of 67,746 unique transcripts were observed, including 60.5% predicted protein-coding, 36.2% long non-coding RNA and 3.3% nonsense-mediated decay transcripts. On average, 90% of the splice junctions were supported by RNA-seq within tissue. A large proportion (80%) represented novel transcripts, mostly produced by known protein-coding genes (70%), while 17% corresponded to novel genes. On average, four transcripts per known gene (tpg) were identified; an increase over current EBI (1.9 tpg) and NCBI (2.9 tpg) annotations and closer to the number reported in the human genome (4.2 tpg). Our new pig genome annotation extended more than 6000 known gene borders (5′ end extension, 3′ end extension, or both) compared to EBI or NCBI annotations. We validated a large proportion of these extensions by independent pig poly(A) selected 3′-RNA-seq data, or human FANTOM5 Cap Analysis of Gene Expression data. Further, we detected 10,465 novel genes (81% non-coding) not reported in current pig genome annotations. More than 80% of these novel genes had transcripts detected in > 1 tissue. In addition, more than 80% of novel intergenic genes with at least one transcript detected in liver tissue had H3K4me3 or H3K36me3 peaks mapping to their promoter and gene body, respectively, in independent liver chromatin immunoprecipitation data. These validated results show significant improvement over current pig genome annotations.


April 21, 2020

A hybrid de novo genome assembly of the honeybee, Apis mellifera, with chromosome-length scaffolds.

The ability to generate long sequencing reads and access long-range linkage information is revolutionizing the quality and completeness of genome assemblies. Here we use a hybrid approach that combines data from four genome sequencing and mapping technologies to generate a new genome assembly of the honeybee Apis mellifera. We first generated contigs based on PacBio sequencing libraries, which were then merged with linked-read 10x Chromium data followed by scaffolding using a BioNano optical genome map and a Hi-C chromatin interaction map, complemented by a genetic linkage map. Each of the assembly steps reduced the number of gaps and incorporated a substantial amount of additional sequence into scaffolds. The new assembly (Amel_HAv3) is significantly more contiguous and complete than the previous one (Amel_4.5), based mainly on Sanger sequencing reads. N50 of contigs is 120-fold higher (5.381 Mbp compared to 0.053 Mbp) and we anchor > 98% of the sequence to chromosomes. All of the 16 chromosomes are represented as single scaffolds with an average of three sequence gaps per chromosome. The improvements are largely due to the inclusion of repetitive sequence that was unplaced in previous assemblies. In particular, our assembly is highly contiguous across centromeres and telomeres and includes hundreds of AvaI and AluI repeats associated with these features. The improved assembly will be of utility for refining gene models, studying genome function, mapping functional genetic variation, identification of structural variants, and comparative genomics.
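Contig N50, the contiguity statistic compared above, is commonly defined as the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size. A minimal sketch of that common definition follows; the function name and the toy lengths in the usage note are made up for illustration and are not taken from the Amel_HAv3 assembly.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Uses the common definition: sort contigs from longest to shortest and
    report the length at which the cumulative sum reaches half the total.
    """
    ordered = sorted(contig_lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_total:
            return length
```

For the toy lengths `[2, 3, 4, 5, 6]` the total is 20; the two longest contigs (6 and 5) already cover 11, so the N50 is 5.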


April 21, 2020

Long-read based assembly and synteny analysis of a reference Drosophila subobscura genome reveals signatures of structural evolution driven by inversions recombination-suppression effects.

Drosophila subobscura has long been a central model in evolutionary genetics. Presently, its use is hindered by the lack of a reference genome. To bridge this gap, here we used PacBio long-read technology, together with the available wealth of genetic marker information, to assemble and annotate a high-quality nuclear and complete mitochondrial genome for the species. With the obtained assembly, we performed the first synteny analysis of genome structure evolution in the subobscura subgroup. We generated a highly-contiguous ~129 Mb-long nuclear genome, consisting of six pseudochromosomes corresponding to the six chromosomes of a female haploid set, and a complete 15,764 bp-long mitogenome, and provide an account of their numbers and distributions of codifying and repetitive content. All 12 identified paracentric inversion differences in the subobscura subgroup would have originated by chromosomal breakage and repair, with some associated duplications, but no evidence of direct gene disruptions by the breakpoints. Between lineages, inversion fixation rates were 10 times higher in continental D. subobscura than in the two small oceanic-island endemics D. guanche and D. madeirensis. Within D. subobscura, we found contrasting ratios of chromosomal divergence to polymorphism between the A sex chromosome and the autosomes. We present the first high-quality, long-read sequencing of a D. subobscura genome. Our findings generally support genome structure evolution in this species being driven indirectly, through the inversions’ recombination-suppression effects in maintaining sets of adaptive alleles together in the face of gene flow. The resources developed will serve to further establish the subobscura subgroup as a model for comparative genomics and an evolutionary indicator of global change.


April 21, 2020

Improving the sensitivity of long read overlap detection using grouped short k-mer matches.

Single-molecule, real-time sequencing (SMRT) developed by Pacific BioSciences produces longer reads than second-generation sequencing technologies such as Illumina. The increased read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and characterize the intra-species variations. It also holds the promise to decipher the community structure in complex microbial communities because long reads help metagenomic assembly. One key step in genome assembly using long reads is to quickly identify reads forming overlaps. Because PacBio data has higher sequencing error rate and lower coverage than popular short read sequencing technologies (such as Illumina), efficient detection of true overlaps requires specially designed algorithms. In particular, there is still a need to improve the sensitivity of detecting small overlaps or overlaps with high error rates in both reads. Addressing this need will enable better assembly for metagenomic data produced by third-generation sequencing technologies. In this work, we designed and implemented an overlap detection program named GroupK, for third-generation sequencing reads based on grouped k-mer hits. While using k-mer hits for detecting reads’ overlaps has been adopted by several existing programs, our method uses a group of short k-mer hits satisfying statistically derived distance constraints to increase the sensitivity of small overlap detection. Grouped k-mer hit was originally designed for homology search. We are the first to apply group hit for long read overlap detection. The experimental results of applying our pipeline to both simulated and real third-generation sequencing data showed that GroupK enables more sensitive overlap detection, especially for datasets of low sequencing coverage. GroupK is best used for detecting small overlaps for third-generation sequencing data. It provides a useful supplementary tool to existing ones for more sensitive and accurate overlap detection. The source code is freely available at https://github.com/Strideradu/GroupK .
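The general idea of grouping short k-mer hits under distance constraints can be sketched as follows. This is a simplified illustration, not GroupK's actual algorithm: the function names, the greedy grouping rule, and the `max_gap` and `max_diagonal_shift` thresholds are all assumptions made here (the abstract says the real constraints are statistically derived), and reverse-complement matches are ignored.

```python
from collections import defaultdict


def kmer_hits(read_a, read_b, k):
    """Return (i, j) pairs where read_a[i:i+k] equals read_b[j:j+k]."""
    positions_in_b = defaultdict(list)
    for j in range(len(read_b) - k + 1):
        positions_in_b[read_b[j:j + k]].append(j)
    hits = []
    for i in range(len(read_a) - k + 1):
        # .get avoids inserting empty entries for k-mers absent from read_b
        for j in positions_in_b.get(read_a[i:i + k], ()):
            hits.append((i, j))
    return hits


def group_hits(hits, max_gap, max_diagonal_shift):
    """Greedily chain hits that are close in read_a and lie on similar diagonals.

    A hit joins the first existing group whose last hit is at most max_gap
    positions behind it in read_a and whose diagonal (i - j) differs by at
    most max_diagonal_shift; otherwise it starts a new group.
    """
    groups = []
    for i, j in sorted(hits):
        for group in groups:
            last_i, last_j = group[-1]
            close_enough = 0 < i - last_i <= max_gap
            same_diagonal = abs((i - j) - (last_i - last_j)) <= max_diagonal_shift
            if close_enough and same_diagonal:
                group.append((i, j))
                break
        else:
            groups.append([(i, j)])
    return groups
```

A candidate overlap could then be reported when the largest group, for example `max(group_hits(hits, 3, 1), key=len)`, contains enough hits; allowing a small diagonal shift is one way to tolerate the insertions and deletions typical of long-read errors.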


April 21, 2020

Valinomycin, produced by Streptomyces sp. S8, a key antifungal metabolite in large patch disease suppressiveness.

Large patch disease, caused by Rhizoctonia solani AG2-2, is the most devastating disease in Zoysiagrass (Zoysia japonica). Current large patch disease control strategies rely primarily upon the use of chemical pesticides. Streptomyces sp. S8 is known to possess exceptional antagonistic properties that could potentially suppress the large patch pathogen found at turfgrass plantations. This study aims to demonstrate the feasibility of using the strain as a biological control mechanism. Sequencing of the S8 strain genome revealed a valinomycin biosynthesis gene cluster. This cluster is composed of the vlm1 and vlm2 genes, which are known to produce antifungal compounds. In order to verify this finding for the large patch pathogen, a valinomycin biosynthesis knockout mutant was created via the CRISPR/Cas9 system. The mutant lost antifungal activity against the large patch pathogen. Consequently, it is anticipated that eco-friendly microbial preparations derived from the S8 strain can be utilized to biologically control large patch disease.


April 21, 2020

The wild sweetpotato (Ipomoea trifida) genome provides insights into storage root development.

Sweetpotato (Ipomoea batatas (L.) Lam.) is the seventh most important crop in the world and is mainly cultivated for its underground storage root (SR). The genetic studies of this species have been hindered by a lack of high-quality reference sequence due to its complex genome structure. Diploid Ipomoea trifida is the closest relative and putative progenitor of sweetpotato, and is considered a model species for sweetpotato, including genetic, cytological, and physiological analyses. Here, we generated the chromosome-scale genome sequence of SR-forming diploid I. trifida var. Y22 with high heterozygosity (2.20%). Although the chromosome-based synteny analysis revealed that I. trifida shared a conserved karyotype with Ipomoea nil after the separation, I. trifida had a much smaller genome than I. nil due to more efficient eliminations of LTR-retrotransposons and lack of species-specific amplification bursts of LTR-RTs. A comparison with four non-SR-forming species showed that the evolution of the beta-amylase gene family may be related to SR formation. We further investigated the relationship of the key gene BMY11 (with identity 47.12% to beta-amylase 1) with this important agronomic trait by both gene expression profiling and quantitative trait locus (QTL) mapping. Combining SR morphology and structure, gene expression profiling and qPCR results, we deduced that the products of BMY11 activity in splitting starch granules can be recycled to synthesize larger granules, contributing to starch accumulation and SR swelling. Moreover, we found that the expression patterns of BMY11, sporamin proteins and the key genes involved in carbohydrate metabolism and stele lignification were similar to those of sweetpotato during SR development. We constructed the high-quality genome reference of the highly heterozygous I. trifida through a combined approach, and this genome enables a better resolution of the genomic features and genome evolution of this species. Sweetpotato SR development genes can be identified in I. trifida and these genes perform similar functions and patterns, showing that the diploid I. trifida var. Y22 with typical SR could be considered an ideal model for the studies of sweetpotato SR development.

