Menu
April 21, 2020

Sequence properties of certain GC rich avian genes, their origins and absence from genome assemblies: case studies.

More and more eukaryotic genomes are sequenced and assembled, most of them presented as a complete model in which missing chromosomal regions are filled by Ns and where a few chromosomes may be lacking. Avian genomes often contain sequences with high GC content, which has been hypothesized to be at the origin of many missing sequences in these genomes. We investigated features of these missing sequences to discover why some may not have been integrated into genomic libraries and/or sequenced.The sequences of five red jungle fowl cDNA models with high GC content were used as queries to search publicly available datasets of Illumina and Pacbio sequencing reads. These were used to reconstruct the leptin, TNFa, MRPL52, PCP2 and PET100 genes, all of which are absent from the red jungle fowl genome model. These gene sequences displayed elevated GC contents, had intron sizes that were sometimes larger than non-avian orthologues, and had non-coding regions that contained numerous tandem and inverted repeat sequences with motifs able to assemble into stable G-quadruplexes and intrastrand dyadic structures. Our results suggest that Illumina technology was unable to sequence the non-coding regions of these genes. On the other hand, PacBio technology was able to sequence these regions, but with dramatically lower efficiency than would typically be expected.High GC content was not the principal reason why numerous GC-rich regions of avian genomes are missing from genome assembly models. Instead, it is the presence of tandem repeats containing motifs capable of assembling into very stable secondary structures that is likely responsible.


April 21, 2020

The Impact of cDNA Normalization on Long-Read Sequencing of a Complex Transcriptome

Normalization of cDNA is widely used to improve the coverage of rare transcripts in analysis of transcriptomes employing next-generation sequencing. Recently, long-read technology has been emerging as a powerful tool for sequencing and construction of transcriptomes, especially for complex genomes containing highly similar transcripts and transcript-spliced isoforms. Here, we analyzed the transcriptome of sugarcane, with a highly polyploidy plant genome, by PacBio isoform sequencing (Iso-Seq) of two different cDNA library preparations, with and without a normalization step. The results demonstrated that, while the two libraries included many of the same transcripts, many longer transcripts were removed and many new generally shorter transcripts were detected by normalization. For the same input cDNA and the same data yield, the normalized library recovered more total transcript isoforms, number of predicted gene families and orthologous groups, resulting in a higher representation for the sugarcane transcriptome, compared to the non-normalized library. The non-normalized library, on the other hand, included a wider transcript length range with more longer transcripts above ~1.25 kb, more transcript isoforms per gene family and gene ontology terms per transcript. A large proportion of the unique transcripts comprising ~52% of the normalized library were expressed at a lower level than the unique transcripts from the non-normalized library, across three tissue types tested including leaf, stalk and root. About 83% of the total 5,348 predicted long noncoding transcripts was derived from the normalized library, of which ~80% was derived from the lowly expressed fraction. Functional annotation of the unique transcripts suggested that each library enriched different functional transcript fractions. This demonstrated the complementation of the two approaches in obtaining a complete transcriptome of a complex genome at the sequencing depth used in this study.


April 21, 2020

Complete genome sequence analysis of the thermoacidophilic verrucomicrobial methanotroph “Candidatus Methylacidiphilum kamchatkense” strain Kam1 and comparison with its closest relatives.

The candidate genus “Methylacidiphilum” comprises thermoacidophilic aerobic methane oxidizers belonging to the Verrucomicrobia phylum. These are the first described non-proteobacterial aerobic methane oxidizers. The genes pmoCAB, encoding the particulate methane monooxygenase do not originate from horizontal gene transfer from proteobacteria. Instead, the “Ca. Methylacidiphilum” and the sister genus “Ca. Methylacidimicrobium” represent a novel and hitherto understudied evolutionary lineage of aerobic methane oxidizers. Obtaining and comparing the full genome sequences is an important step towards understanding the evolution and physiology of this novel group of organisms.Here we present the closed genome of “Ca. Methylacidiphilum kamchatkense” strain Kam1 and a comparison with the genomes of its two closest relatives “Ca. Methylacidiphilum fumariolicum” strain SolV and “Ca. Methylacidiphilum infernorum” strain V4. The genome consists of a single 2,2 Mbp chromosome with 2119 predicted protein coding sequences. Genome analysis showed that the majority of the genes connected with metabolic traits described for one member of “Ca. Methylacidiphilum” is conserved between all three genomes. All three strains encode class I CRISPR-cas systems. The average nucleotide identity between “Ca. M. kamchatkense” strain Kam1 and strains SolV and V4 is =95% showing that they should be regarded as separate species. Whole genome comparison revealed a high degree of synteny between the genomes of strains Kam1 and SolV. In contrast, comparison of the genomes of strains Kam1 and V4 revealed a number of rearrangements. There are large differences in the numbers of transposable elements found in the genomes of the three strains with 12, 37 and 80 transposable elements in the genomes of strains Kam1, V4 and SolV respectively. Genomic rearrangements and the activity of transposable elements explain much of the genomic differences between strains. For example, a type 1h uptake hydrogenase is conserved between strains Kam1 and SolV but seems to have been lost from strain V4 due to genomic rearrangements.Comparing three closed genomes of “Ca. Methylacidiphilum” spp. has given new insights into the evolution of these organisms and revealed large differences in numbers of transposable elements between strains, the activity of these explains much of the genomic differences between strains.


April 21, 2020

Whole genome sequence and de novo assembly revealed genomic architecture of Indian Mithun (Bos frontalis).

Mithun (Bos frontalis), also called gayal, is an endangered bovine species, under the tribe bovini with 2n?=?58 XX chromosome complements and reared under the tropical rain forests region of India, China, Myanmar, Bhutan and Bangladesh. However, the origin of this species is still disputed and information on its genomic architecture is scanty so far. We trust that availability of its whole genome sequence data and assembly will greatly solve this problem and help to generate many information including phylogenetic status of mithun. Recently, the first genome assembly of gayal, mithun of Chinese origin, was published. However, an improved reference genome assembly would still benefit in understanding genetic variation in mithun populations reared under diverse geographical locations and for building a superior consensus assembly. We, therefore, performed deep sequencing of the genome of an adult female mithun from India, assembled and annotated its genome and performed extensive bioinformatic analyses to produce a superior de novo genome assembly of mithun.We generated ˜300 Gigabyte (Gb) raw reads from whole-genome deep sequencing platforms and assembled the sequence data using a hybrid assembly strategy to create a high quality de novo assembly of mithun with 96% recovered as per BUSCO analysis. The final genome assembly has a total length of 3.0 Gb, contains 5,015 scaffolds with an N50 value of 1?Mb. Repeat sequences constitute around 43.66% of the assembly. The genomic alignments between mithun to cattle showed that their genomes, as expected, are highly conserved. Gene annotation identified 28,044 protein-coding genes presented in mithun genome. The gene orthologous groups of mithun showed a high degree of similarity in comparison with other species, while fewer mithun specific coding sequences were found compared to those in cattle.Here we presented the first de novo draft genome assembly of Indian mithun having better coverage, less fragmented, better annotated, and constitutes a reasonably complete assembly compared to the previously published gayal genome. This comprehensive assembly unravelled the genomic architecture of mithun to a great extent and will provide a reference genome assembly to research community to elucidate the evolutionary history of mithun across its distinct geographical locations.


April 21, 2020

Single-molecule sequencing detection of N6-methyladenine in microbial reference materials.

The DNA base modification N6-methyladenine (m6A) is involved in many pathways related to the survival of bacteria and their interactions with hosts. Nanopore sequencing offers a new, portable method to detect base modifications. Here, we show that a neural network can improve m6A detection at trained sequence contexts compared to previously published methods using deviations between measured and expected current values as each adenine travels through a pore. The model, implemented as the mCaller software package, can be extended to detect known or confirm suspected methyltransferase target motifs based on predictions of methylation at untrained contexts. We use PacBio, Oxford Nanopore, methylated DNA immunoprecipitation sequencing (MeDIP-seq), and whole-genome bisulfite sequencing data to generate and orthogonally validate methylomes for eight microbial reference species. These well-characterized microbial references can serve as controls in the development and evaluation of future methods for the identification of base modifications from single-molecule sequencing data.


April 21, 2020

Methicillin-Resistant Staphylococcus aureus Blood Isolates Harboring a Novel Pseudo-staphylococcal Cassette Chromosome mec Element.

The aim of this work was to assess a novel pseudo-staphylococcal cassette chromosome mec (?SCCmec) element in methicillin-resistant Staphylococcus aureus (MRSA) blood isolates. Community-associated MRSA E16SA093 and healthcare-associated MRSA F17SA003 isolates were recovered from the blood specimens of patients with S. aureus bacteremia in 2016 and in 2017, respectively. Antimicrobial susceptibility was determined via the disk diffusion method, and SCCmec typing was conducted by multiplex polymerase chain reaction. Whole genome sequencing was carried out by single molecule real-time long-read sequencing. Both isolates belonged to sequence type 72 and agr-type I, and they were negative for Panton-Valentine leukocidin and toxic shock syndrome toxin. The spa-types of E16SA093 and F17SA003 were t324 and t2460, respectively. They had a SCCmec IV-like element devoid of the cassette chromosome recombinase (ccr) gene complex, designated as ?SCCmecE16SA093. The element was manufactured from SCCmec type IV and the deletion of the ccr gene complex and a 7.0- and 31.9-kb portion of each chromosome. The deficiency of the ccr gene complex in the SCCmec unit is likely resulting in mobility loss, which would be an adaptive evolutionary mechanism. The dissemination of this clone should be monitored closely.


April 21, 2020

A Pathovar of Xanthomonas oryzae Infecting Wild Grasses Provides Insight Into the Evolution of Pathogenicity in Rice Agroecosystems

Xanthomonas oryzae (Xo) are critical rice pathogens. Virulent lineages from Africa and Asia and less virulent strains from the US have been well characterized. X. campestris pv. leersiae (Xcl), first described in 1957, causes bacterial streak on the perennial grass, Leersia hexandra, and is a close relative of Xo. L. hexandra, a member of the Poaceae, is highly similar to rice phylogenetically, is globally ubiquitous around rice paddies, and is a reservoir of pathogenic Xo. We used long read, single molecule, real time (SMRT) genome sequences of five strains of Xcl from Burkina Faso, China, Mali and Uganda to determine the genetic relatedness of this organism with Xo. Novel Transcription Activator-Like Effectors (TALEs) were discovered in all five strains of Xcl. Predicted TALE target sequences were identified in the L. perrieri genome and compared to rice susceptibility gene homologs. Pathogenicity screening on L. hexandra and diverse rice cultivars confirmed that Xcl are able to colonize rice and produce weak but not progressive symptoms. Overall, based on average nucleotide identity, type III effector repertoires and disease phenotype, we propose to rename Xcl to X. oryzae pv. leersiae (Xol) and use this parallel system to improve understanding of the evolution of bacterial pathogenicity in rice agroecosystems.


April 21, 2020

Prediction of Host-Specific Genes by Pan-Genome Analyses of the Korean Ralstonia solanacearum Species Complex.

The soil-borne pathogenic Ralstonia solanacearum species complex (RSSC) is a group of plant pathogens that is economically destructive worldwide and has a broad host range, including various solanaceae plants, banana, ginger, sesame, and clove. Previously, Korean RSSC strains isolated from samples of potato bacterial wilt were grouped into four pathotypes based on virulence tests against potato, tomato, eggplant, and pepper. In this study, we sequenced the genomes of 25 Korean RSSC strains selected based on these pathotypes. The newly sequenced genomes were analyzed to determine the phylogenetic relationships between the strains with average nucleotide identity values, and structurally compared via multiple genome alignment using Mauve software. To identify candidate genes responsible for the host specificity of the pathotypes, functional genome comparisons were conducted by analyzing pan-genome orthologous group (POG) and type III secretion system effectors (T3es). POG analyses revealed that a total of 128 genes were shared only in tomato-non-pathogenic strains, 8 genes in tomato-pathogenic strains, 5 genes in eggplant-non-pathogenic strains, 7 genes in eggplant-pathogenic strains, 1 gene in pepper-non-pathogenic strains, and 34 genes in pepper-pathogenic strains. When we analyzed T3es, three host-specific effectors were predicted: RipS3 (SKWP3) and RipH3 (HLK3) were found only in tomato-pathogenic strains, and RipAC (PopC) were found only in eggplant-pathogenic strains. Overall, we identified host-specific genes and effectors that may be responsible for virulence functions in RSSC in silico. The expected characters of those genes suggest that the host range of RSSC is determined by the comprehensive actions of various virulence factors, including effectors, secretion systems, and metabolic enzymes.


April 21, 2020

Complete genome sequence of the Sulfodiicoccus acidiphilus strain HS-1T, the first crenarchaeon that lacks polB3, isolated from an acidic hot spring in Ohwaku-dani, Hakone, Japan.

Sulfodiicoccus acidiphilus HS-1T is the type species of the genus Sulfodiicoccus, a thermoacidophilic archaeon belonging to the order Sulfolobales (class Thermoprotei; phylum Crenarchaeota). While S. acidiphilus HS-1T shares many common physiological and phenotypic features with other Sulfolobales species, the similarities in their 16S rRNA gene sequences are less than 89%. In order to know the genomic features of S. acidiphilus HS-1T in the order Sulfolobales, we determined and characterized the genome of this strain.The circular genome of S. acidiphilus HS-1T is comprised of 2353,189 bp with a G+C content of 51.15 mol%. A total of 2459 genes were predicted, including 2411 protein coding and 48 RNA genes. The notable genomic features of S. acidiphilus HS-1T in Sulfolobales species are the absence of genes for polB3 and the autotrophic carbon fixation pathway, and the distribution pattern of essential genes and sequences related to genomic replication initiation. These insights contribute to an understanding of archaeal genomic diversity and evolution.


April 21, 2020

Whole Genome Analysis of Lactobacillus plantarum Strains Isolated From Kimchi and Determination of Probiotic Properties to Treat Mucosal Infections by Candida albicans and Gardnerella vaginalis.

Three Lactobacillus plantarum strains ATG-K2, ATG-K6, and ATG-K8 were isolated from Kimchi, a Korean traditional fermented food, and their probiotic potentials were examined. All three strains were free of antibiotic resistance, hemolysis, and biogenic amine production and therefore assumed to be safe, as supported by whole genome analyses. These strains demonstrated several basic probiotic functions including a wide range of antibacterial activity, bile salt hydrolase activity, hydrogen peroxide production, and heat resistance at 70°C for 60 s. Further studies of antimicrobial activities against Candida albicans and Gardnerella vaginalis revealed growth inhibitory effects from culture supernatants, coaggregation effects, and killing effects of the three probiotic strains, with better efficacy toward C. albicans. In vitro treatment of bacterial lysates of the probiotic strains to the RAW264.7 murine macrophage cell line resulted in innate immunity enhancement via IL-6 and TNF-a production without lipopolysaccharide (LPS) treatment and anti-inflammatory effects via significantly increased production of IL-10 when co-treated with LPS. However, the degree of probiotic effect was different for each strain as the highest TNF-a and the lowest IL-10 production by the RAW264.7 cell were observed in the K8 lysate treated group compared to the K2 and K6 lysate treated groups, which may be related to genomic differences such as chromosome size (K2: 3,034,884 bp, K6: 3,205,672 bp, K8: 3,221,272 bp), plasmid numbers (K2: 3, K6 and K8: 1), or total gene numbers (K2: 3,114, K6: 3,178, K8: 3,186). Although more correlative inspections to connect genomic information and biological functions are needed, genomic analyses of the three strains revealed distinct genomic compositions of each strain. Also, this finding suggests genome level analysis may be required to accurately identify microorganisms. Nevertheless, L. plantarum ATG-K2, ATG-K6, and ATG-K8 demonstrated their potential as probiotics for mucosal health improvement in both microbial and immunological contexts.


April 21, 2020

Magic-BLAST, an accurate RNA-seq aligner for long and short reads.

Next-generation sequencing technologies can produce tens of millions of reads, often paired-end, from transcripts or genomes. But few programs can align RNA on the genome and accurately discover introns, especially with long reads. We introduce Magic-BLAST, a new aligner based on ideas from the Magic pipeline.Magic-BLAST uses innovative techniques that include the optimization of a spliced alignment score and selective masking during seed selection. We evaluate the performance of Magic-BLAST to accurately map short or long sequences and its ability to discover introns on real RNA-seq data sets from PacBio, Roche and Illumina runs, and on six benchmarks, and compare it to other popular aligners. Additionally, we look at alignments of human idealized RefSeq mRNA sequences perfectly matching the genome.We show that Magic-BLAST is the best at intron discovery over a wide range of conditions and the best at mapping reads longer than 250 bases, from any platform. It is versatile and robust to high levels of mismatches or extreme base composition, and reasonably fast. It can align reads to a BLAST database or a FASTA file. It can accept a FASTQ file as input or automatically retrieve an accession from the SRA repository at the NCBI.


April 21, 2020

Genomics-driven discovery of a biosynthetic gene cluster required for the synthesis of BII-Rafflesfungin from the fungus Phoma sp. F3723.

Phomafungin is a recently reported broad spectrum antifungal compound but its biosynthetic pathway is unknown. We combed publicly available Phoma genomes but failed to find any putative biosynthetic gene cluster that could account for its biosynthesis.Therefore, we sequenced the genome of one of our Phoma strains (F3723) previously identified as having antifungal activity in a high-throughput screen. We found a biosynthetic gene cluster that was predicted to synthesize a cyclic lipodepsipeptide that differs in the amino acid composition compared to Phomafungin. Antifungal activity guided isolation yielded a new compound, BII-Rafflesfungin, the structure of which was determined.We describe the NRPS-t1PKS cluster ‘BIIRfg’ compatible with the synthesis of the cyclic lipodepsipeptide BII-Rafflesfungin [HMHDA-L-Ala-L-Glu-L-Asn-L-Ser-L-Ser-D-Ser-D-allo-Thr-Gly]. We report new Stachelhaus codes for Ala, Glu, Asn, Ser, Thr, and Gly. We propose a mechanism for BII-Rafflesfungin biosynthesis, which involves the formation of the lipid part by BIIRfg_PKS followed by activation and transfer of the lipid chain by a predicted AMP-ligase on to the first PCP domain of the BIIRfg_NRPS gene.


April 21, 2020

Reconstruction of the full-length transcriptome atlas using PacBio Iso-Seq provides insight into the alternative splicing in Gossypium australe.

Gossypium australe F. Mueller (2n?=?2x?=?26, G2 genome) possesses valuable characteristics. For example, the delayed gland morphogenesis trait causes cottonseed protein and oil to be edible while retaining resistance to biotic stress. However, the lack of gene sequences and their alternative splicing (AS) in G. australe remain unclear, hindering to explore species-specific biological morphogenesis.Here, we report the first sequencing of the full-length transcriptome of the Australian wild cotton species, G. australe, using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq) from the pooled cDNA of ten tissues to identify transcript loci and splice isoforms. We reconstructed the G. australe full-length transcriptome and identified 25,246 genes, 86 pre-miRNAs and 1468 lncRNAs. Most genes (12,832, 50.83%) exhibited two or more isoforms, suggesting a high degree of transcriptome complexity in G. australe. A total of 31,448 AS events in five major types were found among the 9944 gene loci. Among these five major types, intron retention was the most frequent, accounting for 68.85% of AS events. 29,718 polyadenylation sites were detected from 14,536 genes, 7900 of which have alternative polyadenylation sites (APA). In addition, based on our AS events annotations, RNA-Seq short reads from germinating seeds showed that differential expression of these events occurred during seed germination. Ten AS events that were randomly selected were further confirmed by RT-PCR amplification in leaf and germinating seeds.The reconstructed gene sequences and their AS in G. australe would provide information for exploring beneficial characteristics in G. australe.


April 21, 2020

Divergent evolutionary trajectories following speciation in two ectoparasitic honey bee mites.

Multispecies host-parasite evolution is common, but how parasites evolve after speciating remains poorly understood. Shared evolutionary history and physiology may propel species along similar evolutionary trajectories whereas pursuing different strategies can reduce competition. We test these scenarios in the economically important association between honey bees and ectoparasitic mites by sequencing the genomes of the sister mite species Varroa destructor and Varroa jacobsoni. These genomes were closely related, with 99.7% sequence identity. Among the 9,628 orthologous genes, 4.8% showed signs of positive selection in at least one species. Divergent selective trajectories were discovered in conserved chemosensory gene families (IGR, SNMP), and Halloween genes (CYP) involved in moulting and reproduction. However, there was little overlap in these gene sets and associated GO terms, indicating different selective regimes operating on each of the parasites. Based on our findings, we suggest that species-specific strategies may be needed to combat evolving parasite communities. © The Author(s) 2019.


April 21, 2020

Improved annotation of the domestic pig genome through integration of Iso-Seq and RNA-seq data.

Our understanding of the pig transcriptome is limited. RNA transcript diversity among nine tissues was assessed using poly(A) selected single-molecule long-read isoform sequencing (Iso-seq) and Illumina RNA sequencing (RNA-seq) from a single White cross-bred pig. Across tissues, a total of 67,746 unique transcripts were observed, including 60.5% predicted protein-coding, 36.2% long non-coding RNA and 3.3% nonsense-mediated decay transcripts. On average, 90% of the splice junctions were supported by RNA-seq within tissue. A large proportion (80%) represented novel transcripts, mostly produced by known protein-coding genes (70%), while 17% corresponded to novel genes. On average, four transcripts per known gene (tpg) were identified; an increase over current EBI (1.9 tpg) and NCBI (2.9 tpg) annotations and closer to the number reported in human genome (4.2 tpg). Our new pig genome annotation extended more than 6000 known gene borders (5′ end extension, 3′ end extension, or both) compared to EBI or NCBI annotations. We validated a large proportion of these extensions by independent pig poly(A) selected 3′-RNA-seq data, or human FANTOM5 Cap Analysis of Gene Expression data. Further, we detected 10,465 novel genes (81% non-coding) not reported in current pig genome annotations. More than 80% of these novel genes had transcripts detected in >?1 tissue. In addition, more than 80% of novel intergenic genes with at least one transcript detected in liver tissue had H3K4me3 or H3K36me3 peaks mapping to their promoter and gene body, respectively, in independent liver chromatin immunoprecipitation data. These validated results show significant improvement over current pig genome annotations.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.