We present a high-quality de novo genome assembly (rheMacS) of the Chinese rhesus macaque (Macaca mulatta) using long-read sequencing and multiplatform scaffolding approaches. Compared to the current Indian rhesus macaque reference genome (rheMac8), rheMacS increases sequence contiguity 75-fold, closing 21,940 of the remaining assembly gaps (60.8 Mbp). We improve gene annotation by generating more than two million full-length transcripts from ten different tissues by long-read RNA sequencing. We sequence-resolve 53,916 structural variants (96% novel) and identify 17,000 ape-specific structural variants (ASSVs) based on comparison to ape genomes. Many ASSVs map within ChIP-seq predicted enhancer regions where apes and macaque…
IGH@ proto-oncogene translocation is a common oncogenic event in lymphoid lineage cancers such as B-ALL, lymphoma and multiple myeloma. Here, to investigate the interplay between IGH@ proto-oncogene translocation and IGH allelic exclusion, we perform long-read whole-genome and transcriptome sequencing along with epigenetic and 3D genome profiling of Nalm6, an IGH-DUX4 positive B-ALL cell line. We detect significant allelic imbalance on the wild-type over the IGH-DUX4 haplotype in expression and epigenetic data, showing IGH-DUX4 translocation occurs on the silenced IGH allele. In vitro, this reduces the oncogenic stress of DUX4 high-level expression. Moreover, patient samples of IGH-DUX4 B-ALL have similar expression…
Flavonoids, theanine and caffeine are the main secondary metabolites of the tea plant (Camellia sinensis), which account for the tea’s unique flavor quality and health benefits. The biosynthesis pathways of these metabolites have been extensively studied at the transcriptional level, but the regulatory mechanisms are still unclear. In this study, to explore the transcriptome diversity and complexity of the tea plant, PacBio Iso-Seq and RNA-seq analyses were combined to obtain full-length transcripts and to profile the changes in gene expression during leaf development. A total of 1,388,066 reads of insert (ROI) were generated with an average length of 1,762 bp, and…
Efficient crop improvement depends on the application of accurate genetic information contained in diverse germplasm resources. Here we report a reference-grade genome of wild soybean accession W05, with a final assembled genome size of 1013.2 Mb and a contig N50 of 3.3 Mb. The analytical power of the W05 genome is demonstrated by several examples. First, we identify an inversion at the locus determining seed coat color during domestication. Second, a translocation event between chromosomes 11 and 13 of some genotypes is shown to interfere with the assignment of QTLs. Third, we find a region containing copy number variations of the Kunitz…
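The contig N50 quoted above is a standard contiguity metric: the length L such that contigs of length at least L together cover at least half of the total assembled bases. The following is a minimal sketch of that calculation, not the W05 authors' pipeline; the function name `contig_n50` and the toy contig lengths are illustrative assumptions.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length of the contig at which the running total,
    taken over contigs sorted from longest to shortest, first
    reaches at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example with hypothetical contig lengths (not real W05 data).
toy_lengths = [2, 3, 4, 5, 6, 10]
print(contig_n50(toy_lengths))
```

In the toy example the total is 30 bp, and the two longest contigs (10 and 6) already cover 16 bp, so the N50 is 6. A larger N50 for the same genome size indicates that more of the assembly sits in long contigs.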
The brown plant hopper (BPH), Nilaparvata lugens, is one of the major pests of rice (Oryza sativa). Plant defenses against insect herbivores have been extensively studied, but our understanding of insect responses to host plants’ resistance mechanisms is still limited. The purpose of this study is to characterize transcripts of BPH and reveal the responses of BPH insects to resistant rice at the transcription level by using two advanced molecular techniques, next-generation sequencing (NGS) and single-molecule, real-time (SMRT) sequencing. The current study obtained 24,891 collapsed isoforms of full-length transcripts, and 20,662 were mapped to known annotated genes, including 17,175 novel…
Normalization of cDNA is widely used to improve the coverage of rare transcripts in analysis of transcriptomes employing next-generation sequencing. Recently, long-read technology has been emerging as a powerful tool for sequencing and construction of transcriptomes, especially for complex genomes containing highly similar transcripts and transcript-spliced isoforms. Here, we analyzed the transcriptome of sugarcane, a plant with a highly polyploid genome, by PacBio isoform sequencing (Iso-Seq) of two different cDNA library preparations, with and without a normalization step. The results demonstrated that, while the two libraries included many of the same transcripts, many longer transcripts were removed and many new generally…
The wide implementation of next-generation sequencing (NGS) technologies has revolutionized the field of medical genetics. However, the short read lengths of currently used sequencing approaches pose a limitation for identification of structural variants, sequencing repetitive regions, phasing alleles and distinguishing highly homologous genomic regions. These limitations may significantly contribute to the diagnostic gap in patients with genetic disorders who have undergone standard NGS, like whole exome or even genome sequencing. Now, the emerging long-read sequencing (LRS) technologies may offer improvements in the characterization of genetic variation and regions that are difficult to assess with the currently prevailing NGS approaches. LRS…
Gossypium australe F. Mueller (2n = 2x = 26, G2 genome) possesses valuable characteristics. For example, the delayed gland morphogenesis trait causes cottonseed protein and oil to be edible while retaining resistance to biotic stress. However, the gene sequences and their alternative splicing (AS) in G. australe remain unclear, hindering the exploration of species-specific biological morphogenesis. Here, we report the first sequencing of the full-length transcriptome of the Australian wild cotton species, G. australe, using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq) from the pooled cDNA of ten tissues to identify transcript loci and splice isoforms. We reconstructed the G. australe full-length transcriptome and…
Our understanding of the pig transcriptome is limited. RNA transcript diversity among nine tissues was assessed using poly(A) selected single-molecule long-read isoform sequencing (Iso-seq) and Illumina RNA sequencing (RNA-seq) from a single White cross-bred pig. Across tissues, a total of 67,746 unique transcripts were observed, including 60.5% predicted protein-coding, 36.2% long non-coding RNA and 3.3% nonsense-mediated decay transcripts. On average, 90% of the splice junctions were supported by RNA-seq within tissue. A large proportion (80%) represented novel transcripts, mostly produced by known protein-coding genes (70%), while 17% corresponded to novel genes. On average, four transcripts per known gene (tpg) were…
Darwin’s bark spider (Caerostris darwini) produces giant orb webs from dragline silk that can be twice as tough as other silks, making it the toughest biological material. This extreme toughness comes from increased extensibility relative to other draglines. We show C. darwini dragline-producing major ampullate (MA) glands highly express a novel silk gene transcript (MaSp4) encoding a protein that diverges markedly from closely related proteins and contains abundant proline, known to confer silk extensibility, in a unique GPGPQ amino acid motif. This suggests C. darwini evolved distinct proteins that may have increased its dragline’s toughness, enabling giant webs. Caerostris darwini’s…
Gene mutation is a common phenomenon in nature that often leads to phenotype differences, such as the variations in flower color that frequently occur in roses. With the aim of revealing the genomic information and inner mechanisms, the differences in the levels of both transcription and secondary metabolism between a pair of natural rose mutants were investigated by using hybrid RNA-sequencing and metabolite analysis. Metabolite analysis showed that glycosylated derivatives of pelargonidin, e.g., pelargonidin 3,5 diglucoside and pelargonidin 3-glucoside, which were not detected in white flowers (Rosa ‘Whilte Mrago Koster’), constituted the major pigments in pink flowers. Conversely, the flavonol…
Taxus cuspidata is well known worldwide for its ability to produce Taxol, one of the top-selling natural anticancer drugs. However, current Taxol production cannot match the increasing needs of the market, and novel strategies should be considered to increase the supply of Taxol. Since the biosynthetic mechanism of Taxol remains largely unknown, elucidating this pathway in detail will be very helpful in exploring alternative methods for Taxol production. Here, we sequenced Taxus cuspidata transcriptomes with next-generation sequencing (NGS) and third-generation sequencing (TGS) platforms. After correction with Illumina reads and removal of redundant reads, more than 180,000 nonredundant transcripts were generated from…
Alfalfa is the most extensively cultivated forage legume. Salinity is a major environmental factor that affects alfalfa’s productivity. However, little is known about the molecular mechanisms underlying alfalfa responses to salinity, especially the relative contribution of its two important components, osmotic and ionic stress. In this study, we constructed the first full-length transcriptome database for alfalfa root tips under continuous NaCl and mannitol treatments for 1, 3, 6, 12, and 24 h (three biological replicates for each time point, including the control group) via PacBio Iso-Seq. This resulted in the identification of 52,787 full-length transcripts, with an average length of…