Menu
September 22, 2019

PacBio sequencing and its applications.

Single-molecule, real-time sequencing developed by Pacific BioSciences offers longer read lengths than the second-generation sequencing (SGS) technologies, making it well-suited for unsolved problems in genome, transcriptome, and epigenetics research. The highly-contiguous de novo assemblies using PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discoveries of novel genes and novel isoforms of annotated genes, due to its ability to sequence full-length transcripts or fragments with significant lengths. Additionally, PacBio’s sequencing technique provides information that is useful for the direct detection of base modifications, such as methylation. In addition to using PacBio sequencing alone, many hybrid sequencing strategies have been developed to make use of more accurate short reads in conjunction with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable especially for small-size laboratories than using PacBio Sequencing alone. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.


September 22, 2019

A manganese superoxide dismutase (MnSOD) from red lip mullet, Liza haematocheila: Evaluation of molecular structure, immune response, and antioxidant function.

Manganese superoxide dismutase (MnSOD) is a nuclear-encoded antioxidant metalloenzyme. The main function of this enzyme is to dismutase the toxic superoxide anion (O2-) into less toxic hydrogen peroxide (H2O2) and oxygen (O2). Structural analysis of mullet MnSOD (MuMnSOD) was performed using different bioinformatics tools. Pairwise alignment revealed that the protein sequence matched to that derived from Larimichthys crocea with a 95.2% sequence identity. Phylogenetic tree analysis showed that the MuMnSOD was included in the category of teleosts. Multiple sequence alignment showed that a SOD Fe-N domain, SOD Fe-C domain, and Mn/Fe SOD signature were highly conserved among the other examined MnSOD orthologs. Quantitative real-time PCR showed that the highest MuMnSOD mRNA expression level was in blood cells. The highest expression level of MuMnSOD was observed in response to treatment with both Lactococcus garvieae and lipopolysaccharide (LPS) at 6?h post treatment in the head kidney and blood. Potential ROS-scavenging ability of the purified recombinant protein (rMuMnSOD) was examined by the xanthine oxidase assay (XOD assay). The optimum temperature and pH for XOD activity were found to be 25?°C and pH 7, respectively. Relative XOD activity was significantly increased with the dose of rMuMnSOD, revealing its dose dependency. Activity of rMuMnSOD was inhibited by potassium cyanide (KCN) and N-N’-diethyl-dithiocarbamate (DDC). Moreover, expression of MuMnSOD resulted in considerable growth retardation of both gram-positive and gram-negative bacteria. Results of the current study suggest that MuMnSOD acts as an antioxidant enzyme and participates in the immune response in mullet. Copyright © 2018 Elsevier Ltd. All rights reserved.


September 22, 2019

Transcriptome characterization of moso bamboo (Phyllostachys edulis) seedlings in response to exogenous gibberellin applications.

Moso bamboo (Phyllostachys edulis) is a well-known bamboo species of high economic value in the textile industry due to its rapid growth. Phytohormones, which are master regulators of growth and development, serve as important endogenous signals. However, the mechanisms through which phytohormones regulate growth in moso bamboo remain unknown to date.Here, we reported that exogenous gibberellins (GA) applications resulted in a significantly increased internode length and lignin condensation. Transcriptome sequencing revealed that photosynthesis-related genes were enriched in the GA-repressed gene class, which was consistent with the decrease in leaf chlorophyll concentrations and the lower rate of photosynthesis following GA treatment. Exogenous GA applications on seedlings are relatively easy to perform, thus we used 4-week-old whole seedlings of bamboo for GA- treatment followed by high throughput sequencing. In this study, we identified 932 cis-nature antisense transcripts (cis-NATs), and 22,196 alternative splicing (AS) events in total. Among them, 42 cis-nature antisense transcripts (cis-NATs) and 442 AS events were differentially expressed upon exposure to exogenous GA3, suggesting that post-transcriptional regulation might be also involved in the GA3 response. Targets of differential expression of cis-NATs included genes involved in hormone receptor, photosynthesis and cell wall biogenesis. For example, LAC4 and its corresponding cis-NATs were GA3-induced, and may be involved in the accumulation of lignin, thus affecting cell wall composition.This study provides novel insights illustrating how GA alters post-transcriptional regulation and will shed light on the underlying mechanism of growth modulated by GA in moso bamboo.


September 22, 2019

Long-read sequencing and de novo assembly of a Chinese genome.

Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93?Gb (contig N50: 8.3?Mb, scaffold N50: 22.0?Mb, including 39.3?Mb N-bases), together with 206?Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8?Mb of HX1-specific sequences, including 4.1?Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations.


September 22, 2019

Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon

A significant portion of genes in vertebrate genomes belongs to multigene families, with each family containing several gene copies whose presence/absence, as well as isoform structure, can be highly variable across individuals. Existing de novo techniques for assaying the sequences of such highly-similar gene families fall short of reconstructing end-to-end transcripts with nucleotide-level precision or assigning alternatively spliced transcripts to their respective gene copies. We present IsoCon, a high-precision method using long PacBio Iso-Seq reads to tackle this challenge. We apply IsoCon to nine Y chromosome ampliconic gene families and show that it outperforms existing methods on both experimental and simulated data. IsoCon has allowed us to detect an unprecedented number of novel isoforms and has opened the door for unraveling the structure of many multigene families and gaining a deeper understanding of genome evolution and human diseases.


September 22, 2019

Long non-coding RNA identification: comparing machine learning based tools for long non-coding transcripts discrimination

Long noncoding RNA (lncRNA) is a kind of noncoding RNA with length more than 200 nucleotides, which aroused interest of people in recent years. Lots of studies have confirmed that human genome contains many thousands of lncRNAs which exert great influence over some critical regulators of cellular process. With the advent of high-throughput sequencing technologies, a great quantity of sequences is waiting for exploitation. Thus, many programs are developed to distinguish differences between coding and long noncoding transcripts. Different programs are generally designed to be utilised under different circumstances and it is sensible and practical to select an appropriate method according to a certain situation. In this review, several popular methods and their advantages, disadvantages, and application scopes are summarised to assist people in employing a suitable method and obtaining a more reliable result.


September 22, 2019

Capturing single cell genomes of active polysaccharide degraders: an unexpected contribution of Verrucomicrobia.

Microbial hydrolysis of polysaccharides is critical to ecosystem functioning and is of great interest in diverse biotechnological applications, such as biofuel production and bioremediation. Here we demonstrate the use of a new, efficient approach to recover genomes of active polysaccharide degraders from natural, complex microbial assemblages, using a combination of fluorescently labeled substrates, fluorescence-activated cell sorting, and single cell genomics. We employed this approach to analyze freshwater and coastal bacterioplankton for degraders of laminarin and xylan, two of the most abundant storage and structural polysaccharides in nature. Our results suggest that a few phylotypes of Verrucomicrobia make a considerable contribution to polysaccharide degradation, although they constituted only a minor fraction of the total microbial community. Genomic sequencing of five cells, representing the most predominant, polysaccharide-active Verrucomicrobia phylotype, revealed significant enrichment in genes encoding a wide spectrum of glycoside hydrolases, sulfatases, peptidases, carbohydrate lyases and esterases, confirming that these organisms were well equipped for the hydrolysis of diverse polysaccharides. Remarkably, this enrichment was on average higher than in the sequenced representatives of Bacteroidetes, which are frequently regarded as highly efficient biopolymer degraders. These findings shed light on the ecological roles of uncultured Verrucomicrobia and suggest specific taxa as promising bioprospecting targets. The employed method offers a powerful tool to rapidly identify and recover discrete genomes of active players in polysaccharide degradation, without the need for cultivation.


September 22, 2019

Emergence, retention and selection: A trilogy of origination for functional de novo proteins from ancestral lncRNAs in primates.

While some human-specific protein-coding genes have been proposed to originate from ancestral lncRNAs, the transition process remains poorly understood. Here we identified 64 hominoid-specific de novo genes and report a mechanism for the origination of functional de novo proteins from ancestral lncRNAs with precise splicing structures and specific tissue expression profiles. Whole-genome sequencing of dozens of rhesus macaque animals revealed that these lncRNAs are generally not more selectively constrained than other lncRNA loci. The existence of these newly-originated de novo proteins is also not beyond anticipation under neutral expectation, as they generally have longer theoretical lifespan than their current age, due to their GC-rich sequence property enabling stable ORFs with lower chance of non-sense mutations. Interestingly, although the emergence and retention of these de novo genes are likely driven by neutral forces, population genetics study in 67 human individuals and 82 macaque animals revealed signatures of purifying selection on these genes specifically in human population, indicating a proportion of these newly-originated proteins are already functional in human. We thus propose a mechanism for creation of functional de novo proteins from ancestral lncRNAs during the primate evolution, which may contribute to human-specific genetic novelties by taking advantage of existed genomic contexts.


September 22, 2019

Full-length transcriptome sequences and the identification of putative genes for flavonoid biosynthesis in safflower.

The flower of the safflower (Carthamus tinctorius L.) has been widely used in traditional Chinese medicine for the ability to improve cerebral blood flow. Flavonoids are the primary bioactive components in safflower, and their biosynthesis has attracted widespread interest. Previous studies mostly used second-generation sequencing platforms to survey the putative flavonoid biosynthesis genes. For a better understanding of transcription data and the putative genes involved in flavonoid biosynthesis in safflower, we carry our study.High-quality RNA was extracted from six types of safflower tissue. The RNAs of different tissues were mixed equally and used for multiple size-fractionated libraries (1-2, 2-3 and 3-6 k) library construction. Five cells were carried (2 cells for 1-2 and for 2-3 k libraries and 1 cell for 3-6 k libraries). 10.43Gb clean data and 38,302 de-redundant sequences were captured. 44 unique isoforms were annotated as encoding enzymes involved in flavonoid biosynthesis. The full length flavonoid genes were characterized and their evolutional relationship and expressional pattern were analyzed. They can be divided into eight families, with a large differences in the tissue expression. The temporal expressions under MeJA treatment were also measured, 9 genes are significantly up-regulated and 2 genes are significantly down-regulated. The genes involved in flavonoid synthesis in safflower were predicted in our study. Besides, the SSR and lncRNA are also analyzed in our study.Full-length transcriptome sequences were used in our study. The genes involved in flavonoid synthesis in safflower were predicted in our study. Combined the determination of flavonoids, CtC4H2, CtCHS3, CtCHI3, CtF3H3, CtF3H1 are mainly participated in MeJA promoting the synthesis of flavonoids. Our results also provide a valuable resource for further study on safflower.


September 22, 2019

Long-read sequencing of the human cytomegalovirus transcriptome with the Pacific Biosciences RSII platform.

Long-read RNA sequencing allows for the precise characterization of full-length transcripts, which makes it an indispensable tool in transcriptomics. The human cytomegalovirus (HCMV) genome has been first sequenced in 1989 and although short-read sequencing studies have uncovered much of the complexity of its transcriptome, only few of its transcripts have been fully annotated. We hereby present a long-read RNA sequencing dataset of HCMV infected human lung fibroblast cells sequenced by the Pacific Biosciences RSII platform. Seven SMRT cells were sequenced using oligo(dT) primers to reverse transcribe poly(A)-selected RNA molecules and one library was prepared using random primers for the reverse transcription of the rRNA-depleted sample. Our dataset contains 122,636 human and 33,086 viral (HMCV strain Towne) reads. The described data include raw and processed sequencing files, and combined with other datasets, they can be used to validate transcriptome analysis tools, to compare library preparation methods, to test base calling algorithms or to identify genetic variants.


September 22, 2019

Nearly finished genomes produced using gel microdroplet culturing reveal substantial intraspecies genomic diversity within the human microbiome.

The majority of microbial genomic diversity remains unexplored. This is largely due to our inability to culture most microorganisms in isolation, which is a prerequisite for traditional genome sequencing. Single-cell sequencing has allowed researchers to circumvent this limitation. DNA is amplified directly from a single cell using the whole-genome amplification technique of multiple displacement amplification (MDA). However, MDA from a single chromosome copy suffers from amplification bias and a large loss of specificity from even very small amounts of DNA contamination, which makes assembling a genome difficult and completely finishing a genome impossible except in extraordinary circumstances. Gel microdrop cultivation allows culturing of a diverse microbial community and provides hundreds to thousands of genetically identical cells as input for an MDA reaction. We demonstrate the utility of this approach by comparing sequencing results of gel microdroplets and single cells following MDA. Bias is reduced in the MDA reaction and genome sequencing, and assembly is greatly improved when using gel microdroplets. We acquired multiple near-complete genomes for two bacterial species from human oral and stool microbiome samples. A significant amount of genome diversity, including single nucleotide polymorphisms and genome recombination, is discovered. Gel microdroplets offer a powerful and high-throughput technology for assembling whole genomes from complex samples and for probing the pan-genome of naturally occurring populations.


September 22, 2019

Ensembl 2018

The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.


September 22, 2019

Transcriptome sequencing and comparative analysis of differentially-expressed isoforms in the roots of Halogeton glomeratus under salt stress.

Although Halogeton glomeratus (H. glomeratus) has been confirmed to have a unique mechanism to regulate Na+efflux from the cytoplasm and compartmentalize Na+into leaf vacuoles, little is known about the salt tolerance mechanisms of roots under salinity stress. In the present study, transcripts were sequenced using the BGISEQ-500 sequencing platform (BGI, Wuhan, China). After quality control, approximately 24.08 million clean reads were obtained and the average mapping ratio to the reference gene was 70.00%. When comparing salt-treated samples with the control, a total of 550, 590, 1411 and 2063 DEIs were identified at 2, 6, 24 and 72h, respectively. Numerous differentially-expressed isoforms that play important roles in response and adaptation to salt condition are related to metabolic processes, cellular processes, single-organism processes, localization, biological regulation, responses to stimulus, binding, catalytic activity and transporter activity. Fifty-eight salt-induced isoforms were common to different stages of salt stress; most of these DEIs were related to signal transduction and transporters, which maybe the core isoforms regulating Na+uptake and transport in the roots of H. glomeratus. The expression patterns of 18 DEIs that were detected by quantitative real-time polymerase chain reaction were consistent with their respective changes in transcript abundance as identified by RNA-Seq technology. The present study thoroughly explored potential isoforms involved in salt tolerance on H. glomeratus roots at five time points. Our results may serve as an important resource for the H. glomeratus research community, improving our understanding of salt tolerance in halophyte survival under high salinity stress. Copyright © 2018 Elsevier B.V. All rights reserved.


September 22, 2019

Minimap2: pairwise alignment for nucleotide sequences.

Recent advances in sequencing technologies promise ultra-long reads of ~100 kb in average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 Mb in length. Existing alignment programs are unable or inefficient to process such data at scale, which presses for the development of new alignment algorithms.Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of =100?bp in length, =1?kb genomic reads at error rate ~15%, full-length noisy Direct RNA or cDNA reads and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is =30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.https://github.com/lh3/minimap2.Supplementary data are available at Bioinformatics online.


September 22, 2019

Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing.

We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.