June 1, 2021  |  

Isoform sequencing: Unveiling the complex landscape in eukaryotic transcriptome on the PacBio RS II.

Advances in RNA sequencing have accelerated our understanding of the transcriptome; however, isoform discovery remains challenging due to short read lengths. The Iso-Seq Application provides a new alternative to sequence full-length cDNA libraries using long reads from the PacBio RS II. Identification of long and often rare isoforms is demonstrated with rat heart and lung RNA prepared using the Clontech® SMARTer® cDNA preparation kit, followed by agarose-gel size selection in fractions of 1-2 kb, 2-3 kb and 3-6 kb. For each tissue, 1.8 and 1.2 million reads were obtained from 32 and 26 SMRT Cells, respectively. Filtering for reads with both adapters and polyA tail signals yielded >50% putative full-length transcripts. To improve consensus accuracy, we developed an isoform-level clustering algorithm, ICE (Iterative Clustering for Error Correction), and polished the full-length consensus sequences from ICE using Quiver. This method generated full-length transcripts up to 4.5 kb with ≥ 99% post-correction accuracy. Compared with known rat genes, the Iso-Seq method not only recovered the majority of currently annotated isoforms but also revealed several unannotated novel isoforms with identified homologs in the RefSeq database. Additionally, alternative stop sites, extended UTRs, and retained introns were detected.


June 1, 2021  |  

Genome assembly strategies of the recent polyploid, Coffea arabica.

Arabica coffee, revered for its taste and aroma, has a complex genome. It is an allotetraploid (2n=4x=44) with a genome size of approximately 1.3 Gb, derived from the recent (< 0.6 Mya) hybridization of two diploid progenitors (2n=2x=22), C. canephora (710 Mb) and C. eugenioides (670 Mb). The two parental species diverged recently (< 4.2 Mya) and their genomes are highly homologous. To facilitate assembly, a dihaploid plant was chosen for sequencing. Initial genome assembly attempts with short-read data produced an assembly covering 1,031 Mb of the C. arabica genome with a contig L50 of 9 kb. Using PacBio long reads at greater than 50x coverage and current PacBio software, a de novo PacBio-only genome assembly was constructed that covers 1,042 Mb of the genome with an L50 of 267 kb. The two assemblies were assessed and compared to determine gene content, chimeric regions, and the ability to separate the parental genomes. A genetic map containing 600 SSRs is being used to anchor the contigs and, together with a search for sub-genome-specific SNPs, to improve sub-genome differentiation. PacBio transcriptome sequencing is currently being added to finalize gene annotation of the polished assembly. The finished genome assembly will be used to guide re-sequencing assemblies of the parental genomes (C. canephora and C. eugenioides), and will serve as a template for GBS analysis and whole-genome re-sequencing of a set of C. arabica accessions representative of the species diversity. The resulting data will provide powerful genomic tools to enable more efficient coffee breeding strategies for this crop, which is highly susceptible to climate change and is the main source of income for millions of small farmers in producing countries.
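
The contig statistic quoted above is reported as a length ("L50 of 267 kb"); the length-weighted median contig length is more commonly called N50. As a minimal sketch of how such a figure can be computed, assuming only a list of contig lengths is available (the function name `n50` and the toy lengths are illustrative and are not data from this assembly):

```python
def n50(contig_lengths):
    """Return the length L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths in bp, for illustration only.
toy_lengths = [500, 400, 300, 200, 100]
print(n50(toy_lengths))
```

A higher value at similar total assembly size, as in the comparison of the short-read and PacBio-only assemblies, indicates that the same amount of sequence is contained in fewer, longer contigs.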


June 1, 2021  |  

Low-input long-read sequencing for complete microbial genomes and metagenomic community analysis.

Microbial genome sequencing can be done quickly, easily, and efficiently with the PacBio sequencing instruments, resulting in complete de novo assemblies. Alternative protocols have been developed to reduce the amount of purified DNA required for SMRT Sequencing, to broaden applicability to lower-abundance samples. If 50-100 ng of microbial DNA is available, a 10-20 kb SMRTbell library can be made. A 2 kb SMRTbell library only requires a few ng of gDNA when carrier DNA is added to the library. The resulting libraries can be loaded onto multiple SMRT Cells, yielding more than enough data for complete assembly of microbial genomes using the SMRT Portal assembly program HGAP, plus base-modification analysis. The entire process can be done in less than 3 days by standard laboratory personnel. This approach is particularly important for the analysis of metagenomic communities, in which genomic DNA is often limited. From these samples, full-length 16S amplicons can be generated, prepped with the standard SMRTbell library prep protocol, and sequenced. Alternatively, a 2 kb sheared library, made from a few ng of input DNA, can also be used to elucidate the microbial composition of a community, and may provide information about biochemical pathways present in the sample. In both these cases, 1-2 kb reads with >99% accuracy can be obtained from Circular Consensus Sequencing.


June 1, 2021  |  

Analysis of full-length metagenomic 16S genes by Single Molecule, Real-Time Sequencing

High-throughput sequencing of the complete 16S rRNA gene has become a valuable tool for characterizing microbial communities. However, the short reads produced by second-generation sequencing cannot provide taxonomic classification below the genus level. In this study, we demonstrate the capability of PacBio’s Single Molecule, Real-Time (SMRT) Sequencing to generate community profiles using mock microbial community samples from BEI Resources. We also evaluate multiplexing capabilities using PacBio barcodes on pooled samples comprising heterogeneous 16S amplicon populations representing soil, fecal, and mock communities.


June 1, 2021  |  

Full-length sequencing of HLA class I genes of more than 1000 samples provides deep insights into sequence variability

Aim: The vast majority of donor typing relies on sequencing exons 2 and 3 of HLA class I genes (HLA-A, -B, -C). With such an approach certain allele combinations do not result in the anticipated “high resolution” (G-code) typing, due to the lack of exon-phasing information. To resolve ambiguous typing results for a haplotype frequency project, we established a whole gene sequencing approach for HLA class I, facilitating also an estimation of the degree of sequence variability outside the commonly sequenced exons. Methods: Primers were developed flanking the UTR regions resulting in similar amplicon lengths of 4.2-4.4 kb. Using a 4-primer approach, secondary primers containing barcodes were combined with the gene specific primers to obtain barcoded full-gene amplicons in a single amplification step. Amplicons were pooled, purified, and ligated to SMRT bells (i.e. annealing points for sequencing primers) following standard protocols from Pacific Biosciences. Taking advantage of the SMRT chemistry, pools of 48-72 amplicons were sequenced full length and phased in single runs on a Pacific Biosciences RSII instrument. Demultiplexing was achieved using the SMRT portal. Sequence analysis was performed using NGSengine software (GenDx). Results: We successfully performed full-length gene sequencing of 1003 samples, harboring ambiguous typings of either HLA-A (n=46), HLA-B (n=304) or HLA-C (n=653). Despite the high per-read raw error rates typical for SMRT sequencing (~15%) the consensus sequence proved highly reliable. All consensus sequences for exons 2 and 3 were in full accordance with their MiSeq-derived sequences. Unambiguous allelic resolution was achieved for all samples. We observed novel intronic, exonic as well as UTR sequence variations for many of the alleles covered by our data set. This included sequences of 600 individuals with HLA-C*07:01/C*07:02 genotype revealing the extent of sequence variation outside the exons 2 and 3. 
Conclusion: Here we present a whole gene amplification and sequencing approach for HLA class I genes. The maturity of this approach was demonstrated by sequencing more than 1000 samples, achieving fully phased allelic sequences. Extensive sequencing of one common allele combination hints at the yet-to-be-discovered diversity of the HLA system outside the commonly analyzed exons.


June 1, 2021  |  

Phased full-length SMRT Sequencing of HLA DPB1

Aim: In contrast to exon-based HLA-typing approaches, whole gene genotyping crucially depends on full-length sequences submitted to the IMGT/HLA Database. Currently, full-length sequences are provided for only 7 out of 520 HLA-DPB1 alleles. Therefore, we developed a fully phased whole-gene sequencing approach for DPB1, to facilitate further exploration of the allelic structure at this locus. Methods: Primers were developed flanking the UTR-regions of DPB1 resulting in a 12 kb amplicon. Using a 4-primer approach, secondary primers containing barcodes were combined with the gene-specific primers to obtain barcoded full-gene amplicons in a single amplification step. Amplicons were pooled, purified, and ligated to SMRT bells (i.e. annealing points for sequencing primers) following standard protocols from Pacific Biosciences. Taking advantage of the SMRT chemistry, pools of 48 amplicons were sequenced full length in single runs on a Pacific Biosciences RSII instrument. Demultiplexing was performed using the SMRT portal. Sequence analysis was performed using the NGSengine software (GenDx). Results: We analyzed a set of 48 randomly picked samples. With 3 exceptions due to PCR failure, all genotype assignments conformed to standard genotyping results based on exons 2 and 3. Allelic proportions for heterozygous positions were evenly distributed (range 0.4 – 0.6) for all samples, suggesting unbiased amplifications. Despite the high per-read raw error rates typical for SMRT sequencing (~15%) the consensus sequence proved highly reliable. All consensus sequences for exons 2 and 3 were in full accordance with their MiSeq-derived sequences. We describe novel intronic sequence variation of the 7 so far genomically defined alleles, as well as 7 whole-length DPB1 alleles with hitherto unknown intronic regions. One of these alleles (HLA-DPB1*131:01) is classified as rare. 
Conclusion: Here we present a whole gene amplification and sequencing workflow for DPB1 alleles utilizing single molecule real-time (SMRT) sequencing from Pacific Biosciences. Validation of consensus sequences against known exonic sequences highlights the reliability of this technology. This workflow will facilitate amending the IMGT/HLA Database for DPB1.
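
The allelic-proportion check described in the results (heterozygous positions falling within 0.4 to 0.6) can be illustrated with a small sketch. It assumes that per-allele read counts at a heterozygous position are already available; the function names and the example counts are hypothetical and are not taken from the study or from NGSengine.

```python
def allelic_fractions(count_a, count_b):
    """Return the fraction of reads supporting each of two alleles."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no reads cover this position")
    return count_a / total, count_b / total


def is_balanced(count_a, count_b, low=0.4, high=0.6):
    """True if allele A's read fraction lies within [low, high].

    The default bounds mirror the 0.4 to 0.6 range quoted in the abstract.
    """
    frac_a, _ = allelic_fractions(count_a, count_b)
    return low <= frac_a <= high


# Hypothetical read counts for two alleles at one heterozygous position.
print(is_balanced(55, 45))  # roughly even amplification
print(is_balanced(90, 10))  # strongly skewed toward one allele
```

Fractions far outside the range would suggest preferential amplification of one allele, which is the bias the authors report not observing.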


June 1, 2021  |  

Low-input long-read sequencing for complete microbial genomes and metagenomic community analysis

Microbial genome sequencing can be done quickly, easily, and efficiently with the PacBio sequencing instruments, resulting in complete de novo assemblies. Alternative protocols have been developed to reduce the amount of purified DNA required for SMRT Sequencing, to broaden applicability to lower-abundance samples. If 50-100 ng of microbial DNA is available, a 10-20 kb SMRTbell library can be made. The resulting library can be loaded onto multiple SMRT Cells, yielding more than enough data for complete assembly of microbial genomes using the SMRT Portal assembly program HGAP, plus base modification analysis. The entire process can be done in less than 3 days by standard laboratory personnel. This approach is particularly important for analysis of metagenomic communities, in which genomic DNA is often limited. From these samples, full-length 16S amplicons can be generated, prepped with the standard SMRTbell library prep protocol, and sequenced. Alternatively, a 2 kb sheared library, made from a few ng of input DNA, can also be used to elucidate the microbial composition of a community, and may provide information about biochemical pathways present in the sample. In both these cases, 1-2 kb reads with >99.9% accuracy can be obtained from Circular Consensus Sequencing.


April 21, 2020  |  

Detection of transferable oxazolidinone resistance determinants in Enterococcus faecalis and Enterococcus faecium of swine origin in Sichuan Province, China.

The aim of this study was to detect the transferable oxazolidinone resistance determinants (cfr, optrA and poxtA) in E. faecalis and E. faecium of swine origin in Sichuan Province, China. A total of 158 enterococci strains (93 E. faecalis and 65 E. faecium) isolated from 25 large-scale swine farms were screened for the presence of cfr, optrA and poxtA by PCR. The genetic environments of cfr, optrA and poxtA were characterized by whole genome sequencing. Transfer of oxazolidinone resistance determinants was determined by conjugation or electrotransformation experiments. The transferable oxazolidinone resistance determinants, cfr, optrA and poxtA, were detected in zero, six, and one enterococci strains, respectively. The poxtA in one E. faecalis strain was located on a 37,990 bp plasmid, which co-harbored fexB, cat, tet(L) and tet(M), and could be conjugated to E. faecalis JH2-2. One E. faecalis strain harbored two different OptrA variants, including one variant with a single substitution, Q219H, which has not been reported previously. Two optrA-carrying plasmids, pC25-1, with a size of 45,581 bp, and pC54, with a size of 64,500 bp, shared a 40,494 bp identical region that contained genetic context IS1216E-fexA-optrA-erm(A)-IS1216E, which could be electrotransformed into Staphylococcus aureus. Four different chromosomal optrA gene clusters were found in five strains, in which optrA was associated with Tn554 or Tn558 that were inserted into the radC gene. Our study highlights the fact that mobile genetic elements, such as plasmids, IS1216E, Tn554 and Tn558, may facilitate the horizontal transmission of optrA or poxtA. Copyright © 2019. Published by Elsevier Ltd.


April 21, 2020  |  

Whole-Genome Sequencing of a Brucella melitensis Strain (BMWS93) Isolated from a Bank Clerk and Exhibiting Complete Resistance to Rifampin.

Human brucellosis has become the most severe public health problem in the Ulanqab region of Inner Mongolia, China. Brucella melitensis BMWS93 was obtained from a blood sample taken from a bank clerk in the Ulanqab region of Inner Mongolia, China, and antimicrobial susceptibility testing in vitro showed no zone of inhibition, which confirmed resistance to rifampin. Therefore, whole-genome sequencing of this isolate was performed to better understand the mechanism of this resistance. Copyright © 2019 Liu et al.


April 21, 2020  |  

Complete Genome Sequence of Leptospira kmetyi LS 001/16, Isolated from a Soil Sample Associated with a Leptospirosis Patient in Kelantan, Malaysia.

The Gram-negative pathogenic spirochetal bacteria Leptospira spp. cause leptospirosis in humans and livestock animals. Leptospira kmetyi strain LS 001/16 was isolated from a soil sample associated with a leptospirosis patient in Kelantan, which is among the states in Malaysia with a high reported number of disease cases. Here, we report the complete genome sequence of Leptospira kmetyi strain LS 001/16. Copyright © 2019 Yusof et al.


April 21, 2020  |  

A Novel Bacteriophage Exclusion (BREX) System Encoded by the pglX Gene in Lactobacillus casei Zhang.

The bacteriophage exclusion (BREX) system is a novel prokaryotic defense system against bacteriophages. To our knowledge, no study has systematically characterized the function of the BREX system in lactic acid bacteria. Lactobacillus casei Zhang is a probiotic bacterium originating from koumiss. By using single-molecule real-time sequencing, we previously identified N6-methyladenine (m6A) signatures in the genome of L. casei Zhang and a putative methyltransferase (MTase), namely, pglX. This work further analyzed the genomic locus near the pglX gene and identified it as a component of the BREX system. To decipher the biological role of pglX, an L. casei Zhang pglX mutant (ΔpglX) was constructed. Interestingly, m6A methylation of the 5′-ACRCAG-3′ motif was eliminated in the ΔpglX mutant. The wild-type and mutant strains exhibited no significant difference in morphology or growth performance in de Man-Rogosa-Sharpe (MRS) medium. A significantly higher plasmid acquisition capacity was observed for the ΔpglX mutant than for the wild type if the transformed plasmids contained pglX recognition sites (i.e., 5′-ACRCAG-3′). In contrast, no significant difference was observed in plasmid transformation efficiency between the two strains when plasmids lacking pglX recognition sites were tested. Moreover, the ΔpglX mutant had a lower capacity to retain the plasmids than the wild type, suggesting a decrease in genetic stability. Since the Rebase database predicted that the L. casei PglX protein was bifunctional, as both an MTase and a restriction endonuclease, the PglX protein was heterologously expressed and purified but failed to show restriction endonuclease activity. Taken together, the results show that the L. casei Zhang pglX gene is a functional adenine MTase that belongs to the BREX system. IMPORTANCE: Lactobacillus casei Zhang is a probiotic that confers beneficial effects on the host, and it is thus increasingly used in the dairy industry.
The possession of an effective bacterial immune system that can defend against invasion of phages and exogenous DNA is a desirable feature for industrial bacterial strains. The bacteriophage exclusion (BREX) system is a recently described phage resistance system in prokaryotes. This work confirmed the function of the BREX system in L. casei and that the methyltransferase (pglX) is an indispensable part of the system. Overall, our study characterizes a BREX system component gene in lactic acid bacteria. Copyright © 2019 American Society for Microbiology.
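
Whether a plasmid contains pglX recognition sites matters for the transformation experiments described above. The following is a minimal sketch of how 5′-ACRCAG-3′ sites (R = A or G in IUPAC notation) might be located on both strands of a sequence. The function names and the example sequences are hypothetical; this is not the authors' pipeline, and it treats the input as linear, so a site spanning the junction of a circular plasmid would be missed.

```python
import re

# Regex fragments for the IUPAC codes this sketch supports.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYN", "TGCAYRN")


def reverse_complement(seq):
    """Reverse-complement a DNA string that may contain R, Y or N."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_positions(sequence, motif="ACRCAG"):
    """Return sorted (0-based start, strand) pairs for every motif hit.

    A '-' hit is reported at the forward-strand coordinate where the
    reverse complement of the motif begins.
    """
    sequence = sequence.upper()
    hits = []
    for strand, query in (("+", motif), ("-", reverse_complement(motif))):
        pattern = "".join(IUPAC[base] for base in query)
        # A lookahead lets overlapping occurrences be reported as well.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((match.start(), strand))
    return sorted(hits)


# Hypothetical toy sequences, for illustration only.
print(motif_positions("TTACACAGTT"))
print(motif_positions("ACGCAGCTGTGT"))
```

A plasmid for which this returns an empty list would correspond to the "lacking pglX recognition sites" condition in the abstract.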


April 21, 2020  |  

Sensitivity to the two peptide bacteriocin plantaricin EF is dependent on CorC, a membrane-bound, magnesium/cobalt efflux protein.

Lactic acid bacteria produce a variety of antimicrobial peptides known as bacteriocins. Most bacteriocins are understood to kill sensitive bacteria through receptor-mediated disruptions. Here, we report on the identification of the Lactobacillus plantarum plantaricin EF (PlnEF) receptor. Spontaneous PlnEF-resistant mutants of the PlnEF-indicator strain L. plantarum NCIMB 700965 (LP965) were isolated and confirmed to maintain cellular ATP levels in the presence of PlnEF. Genome comparisons resulted in the identification of a single mutated gene annotated as the membrane-bound, magnesium/cobalt efflux protein CorC. All isolates contained a valine (V) at position 334 instead of a glycine (G) in a cysteine-β-synthase domain at the C-terminal region of CorC. In silico template-based modeling of this domain indicated that the mutation resides in a loop between two β-strands. The relationship between PlnEF, CorC, and metal homeostasis was supported by the finding that PlnEF-resistance was lost when PlnEF was applied together with high concentrations of Mg2+, Co2+, Zn2+, or Cu2+. Lastly, PlnEF sensitivity was increased upon heterologous expression of LP965 corC but not the G334V CorC mutant in the PlnEF-resistant strain Lactobacillus casei BL23. These results show that PlnEF kills sensitive bacteria by targeting CorC. © 2019 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.


April 21, 2020  |  

High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution.

Targeted PCR amplification and high-throughput sequencing (amplicon sequencing) of 16S rRNA gene fragments is widely used to profile microbial communities. New long-read sequencing technologies can sequence the entire 16S rRNA gene, but higher error rates have limited their attractiveness when accuracy is important. Here we present a high-throughput amplicon sequencing methodology based on PacBio circular consensus sequencing and the DADA2 sample inference method that measures the full-length 16S rRNA gene with single-nucleotide resolution and a near-zero error rate. In two artificial communities of known composition, our method recovered the full complement of full-length 16S sequence variants from expected community members without residual errors. The measured abundances of intra-genomic sequence variants were in the integral ratios expected from the genuine allelic variants within a genome. The full-length 16S gene sequences recovered by our approach allowed Escherichia coli strains to be correctly classified to the O157:H7 and K12 sub-species clades. In human fecal samples, our method showed strong technical replication and was able to recover the full complement of 16S rRNA alleles in several E. coli strains. There are likely many applications beyond microbial profiling for which high-throughput amplicon sequencing of complete genes with single-nucleotide resolution will be of use. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.


April 21, 2020  |  

Complete Genome Sequence of Lactococcus lactis subsp. cremoris 3107, Host for the Model Lactococcal P335 Bacteriophage TP901-1.

The complete genome sequence of Lactococcus lactis subsp. cremoris 3107, a dairy starter strain and a host for the model lactococcal P335 bacteriophage TP901-1, is reported here. The circular chromosome of L. lactis subsp. cremoris 3107 is among the smallest genomes of currently sequenced lactococcal strains. L. lactis subsp. cremoris 3107 harbors a complement of six plasmids, which appears to be a reflection of its adaptation to the nutrient-rich dairy environment.


April 21, 2020  |  

Complete Genome Sequence of the Wolbachia wAlbB Endosymbiont of Aedes albopictus.

Wolbachia, an alpha-proteobacterium closely related to Rickettsia, is a maternally transmitted, intracellular symbiont of arthropods and nematodes. Aedes albopictus mosquitoes are naturally infected with Wolbachia strains wAlbA and wAlbB. Cell line Aa23 established from Ae. albopictus embryos retains only wAlbB and is a key model to study host-endosymbiont interactions. We have assembled the complete circular genome of wAlbB from the Aa23 cell line using long-read PacBio sequencing at 500× median coverage. The assembled circular chromosome is 1.48 megabases in size, an increase of more than 300 kb over the published draft wAlbB genome. The annotation of the genome identified 1,205 protein coding genes, 34 tRNA, 3 rRNA, 1 tmRNA, and 3 other ncRNA loci. The long reads enabled sequencing over complex repeat regions which are difficult to resolve with short-read sequencing. Thirteen percent of the genome comprised insertion sequence elements distributed throughout the genome, some of which cause pseudogenization. Prophage WO genes encoding some essential components of phage particle assembly are missing, while the remainder are found in five prophage regions/WO-like islands or scattered around the genome. Orthology analysis identified a core proteome of 535 orthogroups across all completed Wolbachia genomes. The majority of proteins could be annotated using Pfam and eggNOG analyses, including ankyrins and components of the Type IV secretion system. KEGG analysis revealed the absence of five genes in wAlbB which are present in other Wolbachia. The availability of a complete circular chromosome from wAlbB will enable further biochemical, molecular, and genetic analyses on this strain and related Wolbachia. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

