The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of…
The diverse colours of mature pepper (Capsicum spp.) fruit result from the accumulation of different carotenoids. The carotenoid biosynthetic pathway has been well elucidated in Solanaceous plants, and analysis of candidate genes involved in this process has revealed variations in carotenoid biosynthetic genes in Capsicum spp. However, the allelic variations revealed by previous studies could not fully explain the variation in fruit colour in Capsicum spp. due to technical difficulties in detecting allelic variation in multiple candidate genes in numerous samples. In this study, we uncovered allelic variations in six carotenoid biosynthetic genes, including phytoene synthase (PSY1, PSY2), lycopene ß-cyclase,…
Ethanol-type fermentation is one of three main fermentation types in the acidogenesis of anaerobic treatment systems. Non-spore-forming Ethanoligenens is as a typical genus capable of ethanol-type fermentation in mixed culture (i.e. acetate-ethanol fermentation). This genus can produce ethanol, acetate, CO2, and H2 using carbohydrates, and has application potential in anaerobic bioprocesses. Here, the complete genome sequences and methylome of Ethanoligenens harbinense strains with different autoaggregative and coaggregative abilities were obtained using the PacBio single-molecule real-time sequencing platform. The genome size of E. harbinense strains was about 2.97-3.10?Mb with 55.5% G+C content. 3020-3153 genes were annotated, most of which were methylated…
The enormous sizes of adhesion G protein-coupled receptors (aGPCRs) go along with complex genomic exon-intron architectures giving rise to multiple mRNA variants. There is a need for a comprehensive catalog of aGPCR variants for proper evaluation of the complex functions of aGPCRs found in structural, in vitro and animal model studies. We used an established bioinformatics pipeline to extract, quantify and visualize mRNA variants of aGPCRs from deeply sequenced transcriptomes. Data analysis showed that aGPCRs have multiple transcription start sites even within introns and that tissue-specific splicing is frequent. On average, 19 significantly expressed transcript variants are derived from a…
Genome-wide analysis of DNA methylation patterns using single molecule real-time DNA sequencing has boosted the number of publicly available methylomes. However, there is a lack of tools coupling methylation patterns and the corresponding methyltransferase genes. Here we demonstrate a high-throughput method for coupling methyltransferases with their respective motifs, using automated cloning and analysing the methyltransferases in vectors carrying a strain-specific cassette containing all potential target sites. To validate the method, we analyse the genomes of the thermophile Moorella thermoacetica and the mesophile Acetobacterium woodii, two acetogenic bacteria having substantially modified genomes with 12 methylation motifs and a total of 23…
The complete genome of Cordyceps militaris was sequenced using single-molecule real-time (SMRT) sequencing technology at a coverage over 300×. The genome size was 32.57?Mb, and 14 contigs ranging from 0.35 to 4.58?Mb with an N50 of 2.86?Mb were assembled, including 4 contigs with telomeric sequences on both ends and an additional 8 contigs with telomeric sequences on either the 5′ or 3′ end. A methylome database of the genome was constructed using SMRT and m4C and m6A methylated nucleotides, and many unknown modification types were identified. The major m6A methylation motif is GA and GGAG, and the major m4C methylation…
Xanthomonas oryzae (Xo) are critical rice pathogens. Virulent lineages from Africa and Asia and less virulent strains from the US have been well characterized. X. campestris pv. leersiae (Xcl), first described in 1957, causes bacterial streak on the perennial grass, Leersia hexandra, and is a close relative of Xo. L. hexandra, a member of the Poaceae, is highly similar to rice phylogenetically, is globally ubiquitous around rice paddies, and is a reservoir of pathogenic Xo. We used long read, single molecule, real time (SMRT) genome sequences of five strains of Xcl from Burkina Faso, China, Mali and Uganda to determine…
The soil-borne pathogenic Ralstonia solanacearum species complex (RSSC) is a group of plant pathogens that is economically destructive worldwide and has a broad host range, including various solanaceae plants, banana, ginger, sesame, and clove. Previously, Korean RSSC strains isolated from samples of potato bacterial wilt were grouped into four pathotypes based on virulence tests against potato, tomato, eggplant, and pepper. In this study, we sequenced the genomes of 25 Korean RSSC strains selected based on these pathotypes. The newly sequenced genomes were analyzed to determine the phylogenetic relationships between the strains with average nucleotide identity values, and structurally compared via…
DNA methylation plays important roles in prokaryotes, and their genomic landscapes-prokaryotic epigenomes-have recently begun to be disclosed. However, our knowledge of prokaryotic methylation systems is focused on those of culturable microbes, which are rare in nature. Here, we used single-molecule real-time and circular consensus sequencing techniques to reveal the ‘metaepigenomes’ of a microbial community in the largest lake in Japan, Lake Biwa. We reconstructed 19 draft genomes from diverse bacterial and archaeal groups, most of which are yet to be cultured. The analysis of DNA chemical modifications in those genomes revealed 22 methylated motifs, nine of which were novel. We…
Species of Populus section Leuce are distributed throughout most parts of the Northern Hemisphere and have important economic and ecological significance. However, due to frequent hybridization within Leuce, the phylogenetic relationship between species has not been clarified. The chloroplast (cp) genome is characterized by maternal inheritance and relatively conservative mutation rates; thus, it is a powerful tool for building phylogenetic trees. In this study, we used the PacBio SEQUEL software to determine that the cp genome of Populus tomentosa has a length of 156,558 bp including a long single-copy region (84,717 bp), a small single-copy region (16,555 bp), and a…
In recent genome analyses, population-specific reference panels have indicated important. However, reference panels based on short-read sequencing data do not sufficiently cover long insertions. Therefore, the nature of long insertions has not been well documented. Here, we assembled a Japanese genome using single-molecule real-time sequencing data and characterized insertions found in the assembled genome. We identified 3691 insertions ranging from 100?bps to ~10,000?bps in the assembled genome relative to the international reference sequence (GRCh38). To validate and characterize these insertions, we mapped short-reads from 1070 Japanese individuals and 728 individuals from eight other populations to insertions integrated into GRCh38. With…