April 21, 2020

The Impact of cDNA Normalization on Long-Read Sequencing of a Complex Transcriptome

Normalization of cDNA is widely used to improve the coverage of rare transcripts in transcriptome analyses that employ next-generation sequencing. Recently, long-read technology has emerged as a powerful tool for sequencing and constructing transcriptomes, especially for complex genomes containing highly similar transcripts and spliced transcript isoforms. Here, we analyzed the transcriptome of sugarcane, a plant with a highly polyploid genome, by PacBio isoform sequencing (Iso-Seq) of two different cDNA library preparations, with and without a normalization step. The results demonstrated that, while the two libraries shared many transcripts, normalization removed many longer transcripts and revealed many new, generally shorter, transcripts. For the same input cDNA and the same data yield, the normalized library recovered more transcript isoforms, predicted gene families, and orthologous groups than the non-normalized library, giving a more complete representation of the sugarcane transcriptome. The non-normalized library, on the other hand, covered a wider range of transcript lengths, with more transcripts longer than ~1.25 kb, and had more transcript isoforms per gene family and more gene ontology terms per transcript. A large proportion of the unique transcripts, comprising ~52% of the normalized library, were expressed at lower levels than the unique transcripts from the non-normalized library across the three tissue types tested (leaf, stalk, and root). About 83% of the 5,348 predicted long noncoding transcripts were derived from the normalized library, and ~80% of these came from the lowly expressed fraction. Functional annotation of the unique transcripts suggested that each library was enriched in different functional fractions of transcripts. These results demonstrate that the two approaches are complementary for obtaining a complete transcriptome of a complex genome at the sequencing depth used in this study.


April 21, 2020

Prediction of Host-Specific Genes by Pan-Genome Analyses of the Korean Ralstonia solanacearum Species Complex.

The soil-borne Ralstonia solanacearum species complex (RSSC) is a group of plant pathogens that is economically destructive worldwide and has a broad host range, including various solanaceous plants, banana, ginger, sesame, and clove. Previously, Korean RSSC strains isolated from potato bacterial wilt samples were grouped into four pathotypes based on virulence tests against potato, tomato, eggplant, and pepper. In this study, we sequenced the genomes of 25 Korean RSSC strains selected based on these pathotypes. The newly sequenced genomes were analyzed to determine the phylogenetic relationships between the strains using average nucleotide identity values, and were compared structurally via multiple genome alignment using Mauve software. To identify candidate genes responsible for the host specificity of the pathotypes, functional genome comparisons were conducted by analyzing pan-genome orthologous groups (POGs) and type III secretion system effectors (T3es). POG analyses revealed 128 genes shared only by tomato-non-pathogenic strains, 8 genes shared only by tomato-pathogenic strains, 5 by eggplant-non-pathogenic strains, 7 by eggplant-pathogenic strains, 1 by pepper-non-pathogenic strains, and 34 by pepper-pathogenic strains. Analysis of T3es predicted three host-specific effectors: RipS3 (SKWP3) and RipH3 (HLK3) were found only in tomato-pathogenic strains, and RipAC (PopC) was found only in eggplant-pathogenic strains. Overall, we identified in silico host-specific genes and effectors that may be responsible for virulence functions in RSSC. The predicted characteristics of these genes suggest that the host range of RSSC is determined by the combined action of various virulence factors, including effectors, secretion systems, and metabolic enzymes.
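The "shared only in" counts above rest on a simple set rule: a gene group counts for a pathotype if it is present in every strain of that pathotype and absent from every strain of the opposite pathotype. The sketch below shows only that rule, using invented strain names and orthologous-group IDs (all hypothetical); the study's actual POG pipeline, clustering thresholds, and handling of partial presence may differ.

```python
# Minimal sketch of pathotype-specific gene selection from presence/absence data.
# Strain names, OG IDs, and group memberships are hypothetical toy values.

def group_specific_genes(presence, in_group, out_group):
    """Return orthologous groups present in every strain of in_group
    and absent from every strain of out_group."""
    # Core set: groups found in all in_group strains.
    core_in = set.intersection(*(presence[s] for s in in_group))
    # Any group seen in at least one out_group strain is excluded.
    seen_out = set.union(*(presence[s] for s in out_group))
    return core_in - seen_out

presence = {
    "strainA": {"OG1", "OG2", "OG3"},
    "strainB": {"OG1", "OG2", "OG4"},
    "strainC": {"OG1", "OG3", "OG5"},
    "strainD": {"OG1", "OG5"},
}
tomato_pathogenic = ["strainA", "strainB"]
tomato_non_pathogenic = ["strainC", "strainD"]

print(sorted(group_specific_genes(presence, tomato_pathogenic, tomato_non_pathogenic)))
print(sorted(group_specific_genes(presence, tomato_non_pathogenic, tomato_pathogenic)))
```

With this toy table the first call yields `['OG2']` and the second `['OG5']`; `OG1` is excluded from both because it is present in every strain.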


April 21, 2020

Chromosome-level assembly of the water buffalo genome surpasses human and goat genomes in sequence contiguity.

Rapid innovation in sequencing technologies and improvements in assembly algorithms have enabled the creation of highly contiguous mammalian genomes. Here we report a chromosome-level assembly of the water buffalo (Bubalus bubalis) genome using single-molecule sequencing and chromatin conformation capture data. PacBio Sequel reads, with a mean length of 11.5 kb, helped resolve repetitive elements and achieve high sequence contiguity. All five B. bubalis sub-metacentric chromosomes were correctly scaffolded with centromeres spanned. Although the index animal was partly inbred, 58% of the genome was haplotype-phased by FALCON-Unzip. This new reference genome improves the contig N50 of the previous short-read-based buffalo assembly more than a thousand-fold and contains only 383 gaps. It surpasses the human and goat references in sequence contiguity and facilitates the annotation of hard-to-assemble gene clusters such as the major histocompatibility complex (MHC).
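Contig N50, the contiguity metric compared above, is the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal sketch of that standard definition follows; the contig lengths are toy values, not data from this assembly.

```python
# Minimal sketch of the standard N50 calculation; lengths are toy values.

def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids rounding issues with total / 2.
        if 2 * running >= total:
            return length

contigs = [100, 80, 60, 40, 20]  # arbitrary units
print(n50(contigs))  # prints 80
```

Here the total is 300, and the two longest contigs (100 + 80 = 180) are the first to reach half of it, so the N50 is 80. The same function applies unchanged to scaffold lengths to obtain scaffold N50.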


April 21, 2020

Complete genome sequence of 3-chlorobenzoate-degrading bacterium Cupriavidus necator NH9 and reclassification of the strains of the genera Cupriavidus and Ralstonia based on phylogenetic and whole-genome sequence analyses.

Cupriavidus necator NH9, a 3-chlorobenzoate (3-CB)-degrading bacterium, was isolated from soil in Japan. In this study, the complete genome sequence of NH9 was obtained via PacBio long-read sequencing to better understand the genetic components contributing to the strain’s ability to degrade aromatic compounds, including 3-CB. The genome of NH9 comprised two circular chromosomes (4.3 and 3.4 Mb) and two circular plasmids (427 and 77 kb) containing 7,290 coding sequences, 15 rRNA genes, and 68 tRNA genes. Kyoto Encyclopedia of Genes and Genomes pathway analysis of the protein-coding sequences in NH9 revealed a capacity to completely degrade benzoate, 2-, 3-, or 4-hydroxybenzoate, 2,3-, 2,5-, or 3,4-dihydroxybenzoate, benzoylformate, and benzonitrile. To validate the identification of NH9, phylogenetic analyses (16S rRNA sequence-based tree and multilocus sequence analysis) and whole-genome sequence analyses (average nucleotide identity, percentage of conserved proteins, and tetranucleotide analyses) were performed, confirming that NH9 is a C. necator strain. Over the course of our investigation, we noticed inconsistencies in the classification of several strains assigned to the two closely related genera Cupriavidus and Ralstonia. Based on whole-genome sequence analysis of 46 Cupriavidus strains and 104 Ralstonia strains, we propose that the taxonomic classification of 41 of the 150 strains should be changed. Our results provide a clear delineation of the two genera based on genome sequences, thus allowing taxonomic identification of strains belonging to these two genera.
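The percentage of conserved proteins (POCP) mentioned above is commonly computed as POCP = (C1 + C2) / (T1 + T2) × 100, where C1 and C2 are the numbers of proteins in each genome with a qualifying hit in the other, and T1 and T2 are the total protein counts. The sketch below assumes the thresholds usually quoted for POCP (identity > 40%, e-value < 1e-5, aligned fraction > 50% of the query); the `Hit` record and its fields are hypothetical, and the study's own settings are not given here.

```python
from dataclasses import dataclass

# Hypothetical record for one BLASTP-style hit; field names are illustrative.
@dataclass
class Hit:
    query_id: str
    identity: float          # percent identity
    evalue: float
    aligned_fraction: float  # aligned length / query length

def count_conserved(hits, min_identity=40.0, max_evalue=1e-5, min_aligned=0.5):
    """Count distinct query proteins with at least one hit passing all thresholds
    (thresholds are assumed defaults, not taken from the study)."""
    return len({h.query_id for h in hits
                if h.identity > min_identity
                and h.evalue < max_evalue
                and h.aligned_fraction > min_aligned})

def pocp(conserved_1, conserved_2, total_1, total_2):
    """Percentage of conserved proteins: (C1 + C2) / (T1 + T2) * 100."""
    return (conserved_1 + conserved_2) / (total_1 + total_2) * 100

hits = [
    Hit("p1", 85.0, 1e-50, 0.95),
    Hit("p1", 45.0, 1e-8, 0.60),   # second hit for p1 is not double-counted
    Hit("p2", 35.0, 1e-20, 0.90),  # fails the identity threshold
    Hit("p3", 60.0, 1e-3, 0.80),   # fails the e-value threshold
]
print(count_conserved(hits))          # prints 1
print(pocp(2500, 2400, 5000, 4800))   # prints 50.0 (toy counts)
```

A 50% POCP value is often cited as a tentative genus boundary, which is why this kind of metric helps when delineating closely related genera such as Cupriavidus and Ralstonia.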


April 21, 2020

Chromosome assembly of Collichthys lucidus, a fish of Sciaenidae with a multiple sex chromosome system.

Collichthys lucidus (C. lucidus) is a commercially important marine fish species distributed in the coastal regions of East Asia, with an X1X1X2X2/X1X2Y multiple sex chromosome system. The karyotype of female C. lucidus is 2n = 48, while that of males is 2n = 47. C. lucidus is therefore also an excellent model for investigating teleost sex determination and sex chromosome evolution. We report the first chromosome-level genome assembly of C. lucidus using Illumina short-read sequencing, PacBio long-read sequencing, and Hi-C technology. An 877 Mb genome was obtained, with contig and scaffold N50 values of 1.1 Mb and 35.9 Mb, respectively. More than 97% of BUSCO genes were identified in the C. lucidus genome, and 28,602 genes were annotated. We identified potential sex-determination genes along the chromosomes and found that chromosome 1 might be involved in the formation of the Y-specific metacentric chromosome. This first chromosome-level reference genome for C. lucidus lays a solid foundation for future population genetics studies, functional mapping of genes underlying important economic traits, and studies of sex determination and sex chromosome evolution in Sciaenidae and other teleosts.


April 21, 2020

Comparative Phylogenomics, a Stepping Stone for Bird Biodiversity Studies

Birds are a group with an immense wealth of genomic resources, with hundreds of forthcoming genomes on the doorstep. We review recent developments in whole-genome sequencing, phylogenomics, and comparative genomics of birds. Short-read-based genome assemblies are common, largely due to the efforts of the Bird 10K genome project (B10K). Chromosome-level assemblies are expected to increase thanks to improved long-read sequencing. The available genomic data have enabled reconstruction of the bird tree of life with increasing confidence and resolution, but challenges remain in resolving the early splits of Neoaves due to their explosive diversification after the Cretaceous-Paleogene (K-Pg) event. Continued genomic sampling of the bird tree of life will not only better reflect birds' evolutionary history but also shed new light on the organization of phylogenetic signal and conflict across the genome. The comparatively simple architecture of avian genomes makes them a powerful system for studying the molecular foundations of bird-specific traits. Birds are on the verge of becoming an extremely rich system for studying biodiversity from the nucleotide up.


April 21, 2020

Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight.

The human genome contains “dark” gene regions that cannot be adequately assembled or aligned using standard short-read sequencing technologies, preventing researchers from identifying mutations within these regions that may be relevant to human disease. Here, we identify regions with few mappable reads, which we call dark by depth, and others with ambiguous alignment, which we call camouflaged. We assess how well long-read and linked-read technologies resolve these regions. Based on standard whole-genome Illumina sequencing data, we identify 36,794 dark regions in 6054 gene bodies from pathways important to human health, development, and reproduction. Of these gene bodies, 8.7% are completely dark and 35.2% are ≥ 5% dark. We identify dark regions in protein-coding exons across 748 genes. Linked-read and long-read sequencing technologies from 10x Genomics, PacBio, and Oxford Nanopore Technologies reduce dark protein-coding regions to approximately 50.5%, 35.6%, and 9.6%, respectively. We present an algorithm to resolve most camouflaged regions and apply it to the Alzheimer’s Disease Sequencing Project. We rescue a rare ten-nucleotide frameshift deletion in CR1, a top Alzheimer’s disease gene, found in disease cases but not in controls. While we could not formally assess the association of the CR1 frameshift mutation with Alzheimer’s disease due to insufficient sample size, we believe it merits investigation in a larger cohort. There remain thousands of potentially important genomic regions that are overlooked by short-read sequencing but largely resolved by long-read technologies.
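The two categories above can be illustrated with a per-position classifier: a position is dark by depth when too few reads cover it, and ambiguously aligned when most covering reads have low mapping quality. The sketch below assumes illustrative thresholds (depth ≤ 5; ≥ 90% of reads with low MAPQ) and a precomputed low-MAPQ fraction per position; these cutoffs and the function names are assumptions, not the study's published parameters, and the study's camouflaged category additionally involves identifying duplicated sequence, which is not modeled here.

```python
# Minimal sketch: label positions and compute the percentage of a gene body
# that is dark. Thresholds and names are illustrative assumptions.

def classify_positions(depths, low_mapq_fractions, max_depth=5, min_low_mapq=0.9):
    """Label each position as 'dark_by_depth', 'ambiguous' (mostly low-MAPQ
    reads, the source's camouflaged-type case), or 'resolved'."""
    if len(depths) != len(low_mapq_fractions):
        raise ValueError("inputs must have equal length")
    labels = []
    for depth, frac in zip(depths, low_mapq_fractions):
        if depth <= max_depth:
            labels.append("dark_by_depth")
        elif frac >= min_low_mapq:
            labels.append("ambiguous")
        else:
            labels.append("resolved")
    return labels

def percent_dark(labels):
    """Percentage of positions that are not resolved."""
    if not labels:
        raise ValueError("no positions")
    dark = sum(1 for label in labels if label != "resolved")
    return 100 * dark / len(labels)

depths = [0, 3, 30, 30, 40]            # toy per-base read depths
fractions = [0.0, 0.0, 0.95, 0.1, 0.0] # toy fraction of low-MAPQ reads
labels = classify_positions(depths, fractions)
print(labels)
print(percent_dark(labels))  # prints 60.0
```

A gene body would then count as "≥ 5% dark" when `percent_dark` over its positions is at least 5.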

