Canu Archives - PacBio

June 1, 2021 |

De novo assembly and preliminary annotation of the Schizocardium californicum genome

Animals in the phylum Hemichordata have provided key understanding of the origins and development of body patterning and nervous system organization. However, efforts to sequence and assemble the genomes of highly heterozygous non-model organisms have proven to be difficult with traditional short-read approaches. Long repetitive DNA structures, extensive structural variation between haplotypes in polyploid species, and large genome sizes are limiting factors to achieving highly contiguous genome assemblies. Here we present the highly contiguous de novo assembly and preliminary annotation of an indirect-developing hemichordate genome, Schizocardium californicum, using SMRT Sequencing long reads.

June 1, 2021 |

Single molecule high-fidelity (HiFi) Sequencing with >10 kb libraries

Recent improvements in sequencing chemistry and instrument performance combine to create a new PacBio data type, Single Molecule High-Fidelity reads (HiFi reads). Increased read length and improved library construction enable average read lengths of 10-20 kb with average sequence identity greater than 99% from raw single molecule reads. The resulting reads have accuracy comparable to short-read NGS but with 50-100 times longer read length. Here we benchmark the performance of this data type by sequencing and genotyping the Genome in a Bottle (GIAB) HG002 human reference sample from the National Institute of Standards and Technology (NIST). We further demonstrate the general utility of HiFi reads by analyzing multiple clones of Cabernet Sauvignon. Three different clones were sequenced and de novo assembled with the Canu assembly algorithm, generating draft assemblies of very high contiguity equal to or better than earlier assembly efforts using PacBio long reads. Using the Cabernet Sauvignon Clone 8 assembly as a reference, we mapped the HiFi reads generated from Clone 6 and Clone 47 to identify single nucleotide polymorphisms (SNPs) and structural variants (SVs) that are specific to each of the three samples.

June 1, 2021 |

Comparison of sequencing approaches applied to complex soil metagenomes to resolve proteins of interest

Background: Long-read sequencing presents several potential advantages for providing more complete gene profiling of metagenomic samples. Long reads can capture multiple genes in a single read, and longer reads typically result in assemblies with better contiguity, especially for higher abundance organisms. However, a major challenge with using long reads has been the higher cost per base, which may lead to insufficient coverage of low-abundance species. Additionally, lower single-pass accuracy can make gene discovery for low-abundance organisms difficult.
Methods: To evaluate the pros and cons of long reads for metagenomics, we directly compared PacBio and Illumina sequencing on a soil-derived sample, which included spike-in controls of known concentrations of pure referenced samples. For PacBio sequencing, a 10 kb library was sequenced on the Sequel System with 3.0 chemistry. Highly accurate long reads (HiFi reads) with Q20 and higher were generated for downstream analyses using PacBio Circular Consensus Sequencing (CCS) mode. Results were assessed according to the following criteria: DNA extraction capacity, bioinformatics pipeline status, percentage of proteins with ambiguous amino acids (AAs), total unique error-free genes/$1000, total proteins observed in spike-ins/$1000, proteins of interest/$1000, median length of contigs with proteins, and assembly requirements.
Results: Both methods had areas of superior performance. For Illumina, DNA extraction capacity was higher, the bioinformatics pipeline is well tested, and a lower proportion of proteins had ambiguous AAs. On the other hand, with PacBio, twice as many unique error-free genes, twice as many total proteins from spike-ins, and ~6 times more proteins of interest were found per $1000 cost. PacBio data produced on average 5 times longer contigs capturing proteins of interest. Additionally, assembly was not required for gene or protein finding, whereas it was required with Illumina data.
Conclusions: In this comparison of the PacBio Sequel System with the Illumina NextSeq on a complex microbiome, we conclude that the sequencing system of choice may vary, depending on the goals and resources for the project. PacBio sequencing requires a longer DNA extraction method, and the bioinformatics pipeline may require development. On the other hand, the Sequel System generates hundreds of thousands of long HiFi reads per SMRT Cell, producing more genes, more proteins, and longer contigs, thereby offering more information about the metagenomic samples for a lower cost.

July 19, 2019 |

Single-molecule sequencing resolves the detailed structure of complex satellite DNA loci in Drosophila melanogaster.

Highly repetitive satellite DNA (satDNA) repeats are found in most eukaryotic genomes. SatDNAs are rapidly evolving and have roles in genome stability and chromosome segregation. Their repetitive nature poses a challenge for genome assembly and makes progress on the detailed study of satDNA structure difficult. Here, we use single-molecule sequencing long reads from Pacific Biosciences (PacBio) to determine the detailed structure of all major autosomal complex satDNA loci in Drosophila melanogaster, with a particular focus on the 260-bp and Responder satellites. We determine the optimal de novo assembly methods and parameter combinations required to produce a high-quality assembly of these previously unassembled satDNA loci and validate this assembly using molecular and computational approaches. We determined that the computationally intensive PBcR-BLASR assembly pipeline yielded better assemblies than the faster and more efficient pipelines based on the MHAP hashing algorithm, and it is essential to validate assemblies of repetitive loci. The assemblies reveal that satDNA repeats are organized into large arrays interrupted by transposable elements. The repeats in the center of the array tend to be homogenized in sequence, suggesting that gene conversion and unequal crossovers lead to repeat homogenization through concerted evolution, although the degree of unequal crossing over may differ among complex satellite loci. We find evidence for higher-order structure within satDNA arrays that suggests recent structural rearrangements. These assemblies provide a platform for the evolutionary and functional genomics of satDNAs in pericentric heterochromatin. © 2017 Khost et al.; Published by Cold Spring Harbor Laboratory Press.

July 19, 2019 |

Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity.

Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology. Here we utilized a robust, cost-effective approach to produce high-quality reference genomes. We report a near-complete genome of diploid woodland strawberry (Fragaria vesca) using single-molecule real-time sequencing from Pacific Biosciences (PacBio). This assembly has a contig N50 length of ~7.9 million base pairs (Mb), representing a ~300-fold improvement of the previous version. The vast majority (>99.8%) of the assembly was anchored to 7 pseudomolecules using 2 sets of optical maps from Bionano Genomics. We obtained ~24.96 Mb of sequence not present in the previous version of the F. vesca genome and produced an improved annotation that includes 1496 new genes. Comparative syntenic analyses uncovered numerous, large-scale scaffolding errors present in each chromosome in the previously published version of the F. vesca genome. Our results highlight the need to improve existing short-read based reference genomes. Furthermore, we demonstrate how genome quality impacts commonly used analyses for addressing both fundamental and applied biological questions. © The Authors 2017. Published by Oxford University Press.
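
For readers unfamiliar with the contig N50 metric quoted in this abstract, the following is a minimal Python sketch of how N50 is conventionally computed: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper)."""
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2            # half of the total assembly length
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This contig is the one that brings the running total to the halfway mark.
            return length
    raise ValueError("no contig lengths given")


# Toy data for illustration only; these are not real assembly values.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

A higher N50 means that more of the assembly sits in long contigs, which is why the metric is commonly used to compare the contiguity of successive versions of a genome.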

July 7, 2019 |

Genome sequence of Streptomyces sp. H-KF8, a marine actinobacterium isolated from a northern Chilean Patagonian fjord.

Streptomyces sp. H-KF8 is a fjord-derived marine actinobacterium capable of producing antimicrobial activity. Streptomyces sp. H-KF8 was isolated from sediments of the Comau fjord, located in the northern Chilean Patagonia. Here, we report the 7.7-Mb genome assembly, which represents the first genome of a Chilean marine actinobacterium. Copyright © 2017 Undabarrena et al.

July 7, 2019 |

Complete genome sequence of Edwardsiella hoshinae ATCC 35051.

Edwardsiella hoshinae is a Gram-negative facultative anaerobe that has primarily been isolated from avians and reptiles. We report here the complete and annotated genome sequence of an isolate from a monitor lizard (Varanus sp.), which contains a chromosome of 3,811,650 bp and no plasmids. Copyright © 2017 Reichley et al.

July 7, 2019 |

Complete genome sequences of two Staphylococcus aureus sequence type 5 isolates from California, USA.

Staphylococcus aureus causes a variety of human diseases ranging in severity. The pathogenicity of S. aureus can be partially attributed to the acquisition of mobile genetic elements. In this report, we provide two complete genome sequences from human clinical S. aureus isolates. Copyright © 2017 Hau et al.

July 7, 2019 |

Complete genome sequences of two geographically distinct Legionella micdadei clinical isolates.

Legionella is a highly diverse genus of intracellular bacterial pathogens that cause Legionnaires’ disease (LD), an often severe form of pneumonia. Two L. micdadei sp. clinical isolates, obtained from patients hospitalized with LD from geographically distinct areas, were sequenced using PacBio SMRT Cell technology, identifying incomplete phage regions, which may impact virulence. Copyright © 2017 Osborne et al.

July 7, 2019 |

Complete genome sequence of the livestock-associated methicillin-resistant strain Staphylococcus aureus subsp. aureus 08S00974 (sequence type 398).

We report here the complete genome sequence of the livestock-associated methicillin-resistant Staphylococcus aureus strain 08S00974 from sequence type 398 (ST398 LA-MRSA) isolated from a fatting pig at a farm in Germany. Copyright © 2017 Makarova et al.

July 7, 2019 |

Zinc resistance within swine-associated methicillin-resistant Staphylococcus aureus (MRSA) isolates in the USA is associated with MLST lineage.

Zinc resistance in livestock-associated methicillin-resistant Staphylococcus aureus (LA-MRSA) sequence type (ST) 398 is primarily mediated by the czrC gene co-located with the mecA gene, encoding methicillin resistance, within the type V SCCmec element. Because czrC and mecA are located within the same mobile genetic element, it has been suggested that the use of in-feed zinc as an antidiarrheal agent has the potential to contribute to the emergence and spread of MRSA in swine through increased selection pressure to maintain the SCCmec element in isolates obtained from pigs. In this study we report the prevalence of the czrC gene and phenotypic zinc resistance in US swine-associated LA-MRSA ST5 isolates, MRSA ST5 isolates from humans with no swine contact, and US swine-associated LA-MRSA ST398 isolates. We demonstrate that the prevalence of zinc resistance in US swine-associated LA-MRSA ST5 isolates was significantly lower than the prevalence of zinc resistance in MRSA ST5 isolates from humans with no swine contact, swine-associated LA-MRSA ST398 isolates, and previous reports describing zinc resistance in other LA-MRSA ST398 isolates. Collectively, our data suggest that selection pressure associated with zinc supplementation in feed is unlikely to have played a significant role in the emergence of LA-MRSA ST5 in the US swine population. Additionally, our data indicate that zinc resistance is associated with MLST lineage, suggesting a potential link between genetic lineage and carriage of resistance determinants.
Importance: Our data suggest that coselection thought to be associated with the use of zinc in feed as an antimicrobial agent is not playing a role in the emergence of livestock-associated methicillin-resistant Staphylococcus aureus (LA-MRSA) ST5 in the US swine population. Additionally, our data indicate that zinc resistance is more associated with multilocus sequence type (MLST) lineage, suggesting a potential link between genetic lineage and carriage of resistance markers. This information is important to public health professionals, veterinarians, producers, and consumers. Copyright © 2017 American Society for Microbiology.

July 7, 2019 |

Complete genome sequence of the disinfectant susceptibility testing reference strain Staphylococcus aureus subsp. aureus ATCC 6538.

We report here the complete genome sequence of the methicillin-sensitive Staphylococcus aureus subsp. aureus strain ATCC 6538 (FDA 209, DSM 799, WDCM 00032, and NCTC 10788). Copyright © 2017 Makarova et al.

July 7, 2019 |

De novo whole-genome sequencing of the wood rot fungus Polyporus brumalis, which exhibits potential terpenoid metabolism.

Polyporus brumalis is able to synthesize several sesquiterpenes during fungal growth. Using a single-molecule real-time sequencing platform, we present the 53-Mb draft genome of P. brumalis, which contains 6,231 protein-coding genes. Gene annotation and isolation support genetic information, which can increase the understanding of sesquiterpene metabolism in P. brumalis. Copyright © 2017 Lee et al.

July 7, 2019 |

Whole-genome sequence of the Spodoptera frugiperda Sf9 insect cell line.

The draft whole-genome sequence of the Spodoptera frugiperda Sf9 insect cell line was obtained using long-read PacBio sequence technology and Canu assembly. The final assembled genome consisted of 451 Mbp in 4,577 contigs, with 12,716× mean coverage and a G+C content of 36.53%.

