September 22, 2019

Insights into the evolution of multicellularity from the sea lettuce genome.

We report here the 98.5 Mbp haploid genome (12,924 protein coding genes) of Ulva mutabilis, a ubiquitous and iconic representative of the Ulvophyceae or green seaweeds. Ulva’s rapid and abundant growth makes it a key contributor to coastal biogeochemical cycles; its role in marine sulfur cycles is particularly important because it produces high levels of dimethylsulfoniopropionate (DMSP), the main precursor of volatile dimethyl sulfide (DMS). Rapid growth makes Ulva attractive biomass feedstock but also increasingly a driver of nuisance “green tides.” Ulvophytes are key to understanding the evolution of multicellularity in the green lineage, and Ulva morphogenesis is dependent on bacterial signals, making it an important species with which to study cross-kingdom communication. Our sequenced genome informs these aspects of ulvophyte cell biology, physiology, and ecology. Gene family expansions associated with multicellularity are distinct from those of freshwater algae. Candidate genes, including some that arose following horizontal gene transfer from chromalveolates, are present for the transport and metabolism of DMSP. The Ulva genome offers, therefore, new opportunities to understand coastal and marine ecosystems and the fundamental evolution of the green lineage. Copyright © 2018 Elsevier Ltd. All rights reserved.


September 22, 2019

Complete genome sequence of a blaKPC-2-positive Klebsiella pneumoniae strain isolated from the effluent of an urban sewage treatment plant in Japan.

Antimicrobial resistance genes (ARGs) and the bacteria that harbor them are widely distributed in the environment, especially in surface water, sewage treatment plant effluent, soil, and animal waste. In this study, we isolated a KPC-2-producing Klebsiella pneumoniae strain (GSU10-3) from a sampling site in Tokyo Bay, Japan, near a wastewater treatment plant (WWTP) and determined its complete genome sequence. Strain GSU10-3 is resistant to most β-lactam antibiotics and other antimicrobial agents (quinolones and aminoglycosides). This strain is classified as sequence type 11 (ST11), and a core genome phylogenetic analysis indicated that strain GSU10-3 is closely related to KPC-2-positive Chinese clinical isolates from 2011 to 2017 and is clearly distinct from strains isolated from the European Union (EU), United States, and other Asian countries. Strain GSU10-3 harbors four plasmids, including a blaKPC-2-positive plasmid, pGSU10-3-3 (66.2 kb), which is smaller than other blaKPC-2-positive plasmids and notably carries dual replicons (IncFII [pHN7A8] and IncN). Such downsizing and the presence of dual replicons may promote its maintenance and stable replication, contributing to its broad host range with low fitness costs. A second plasmid, pGSU10-3-1 (159.0 kb), an IncA/C2 replicon, carries a class 1 integron (containing intI1, dfrA12, aadA2, qacEΔ1, and sul1) with a high degree of similarity to a broad-host-range plasmid present in the family Enterobacteriaceae. The plasmid pGSU10-3-2 (134.8 kb), an IncFII(K) replicon, carries the IS26-mediated ARGs [aac(6′)Ib-cr, blaOXA-1, catB4 (truncated), and aac(3)-IId], tet(A), and a copper/arsenate resistance locus. GSU10-3 is the first nonclinical KPC-2-producing environmental Enterobacteriaceae isolate from Japan for which the whole genome has been sequenced. IMPORTANCE: We isolated and determined the complete genome sequence of a KPC-2-producing K. pneumoniae strain from a sampling site in Tokyo Bay, Japan, near a wastewater treatment plant (WWTP). In Japan, the KPC type has been very rarely detected, while IMP is the most predominant type of carbapenemase in clinical carbapenemase-producing Enterobacteriaceae (CPE) isolates. Although laboratory testing thus far suggested that Japan may be virtually free of KPC-producing Enterobacteriaceae, we have detected it in effluent from a WWTP. Antimicrobial resistance (AMR) monitoring of WWTP effluent may contribute to the early detection of future AMR bacterial dissemination in clinical settings and communities; indeed, it will help illuminate the whole picture in which environmental contamination through WWTP effluent plays a part. Copyright © 2018 Sekizuka et al.


September 22, 2019

Comprehensive profiling of four base overhang ligation fidelity by T4 DNA Ligase and application to DNA assembly.

Synthetic biology relies on the manufacture of large and complex DNA constructs from libraries of genetic parts. Golden Gate and other Type IIS restriction enzyme-dependent DNA assembly methods enable rapid construction of genes and operons through one-pot, multifragment assembly, with the ordering of parts determined by the ligation of Watson-Crick base-paired overhangs. However, ligation of mismatched overhangs leads to erroneous assembly, and low-efficiency Watson-Crick pairings can lead to truncated assemblies. Using sets of empirically vetted, high-accuracy junction pairs avoids this issue but limits the number of parts that can be joined in a single reaction. Here, we report the use of comprehensive end-joining ligation fidelity and bias data to predict high-accuracy junction sets for Golden Gate assembly. The ligation profile accurately predicted junction fidelity in ten-fragment Golden Gate assembly reactions and enabled accurate and efficient assembly of a lac cassette from up to 24 fragments in a single reaction.
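As a rough illustration of why junction sets need vetting, the sketch below screens a candidate set of four-base overhangs for the most basic conflicts only: palindromic overhangs (which can pair with themselves), duplicates, and pairs that are exact reverse complements of each other. It does not use the empirical fidelity and bias data described in the abstract, which also capture mismatch ligation; the function names and example overhangs are hypothetical.

```python
from itertools import combinations

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overhang_conflicts(overhangs):
    """List perfect-match conflicts in a candidate overhang set.

    Only exact Watson-Crick conflicts are reported; mismatch ligation
    is NOT modelled here (that would require measured fidelity data).
    """
    problems = []
    normalized = [o.upper() for o in overhangs]

    # A palindromic overhang is its own reverse complement.
    for o in normalized:
        if o == reverse_complement(o):
            problems.append(f"{o} is palindromic")

    # Two junctions must not share an overhang or be reverse complements.
    for a, b in combinations(normalized, 2):
        if a == b:
            problems.append(f"{a} is used twice")
        elif a == reverse_complement(b):
            problems.append(f"{a} and {b} are reverse complements")
    return problems


if __name__ == "__main__":
    # Hypothetical candidate sets, for illustration only.
    print(overhang_conflicts(["AATG", "GCTT", "CGCT"]))
    print(overhang_conflicts(["ACGT", "AATG", "CATT"]))
```

A set that passes this screen can still perform poorly in practice, which is the gap that comprehensive ligation profiling is meant to close.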


September 22, 2019

Complete genome sequence of Burkholderia sp. JP2-270, a rhizosphere isolate of rice with antifungal activity against Rhizoctonia solani.

Burkholderia sp. JP2-270, a bacterium with a strong ability to inhibit the growth of Rhizoctonia solani, was isolated from the rhizosphere of rice. Phylogenetic analysis based on the 16S rRNA gene revealed that JP2-270 belongs to the Burkholderia cepacia complex. Here, we present the complete genome sequence of Burkholderia sp. JP2-270, which consists of three circular chromosomes (Chr1 3,723,585 bp, Chr2 3,274,969 bp, Chr3 1,483,367 bp) and two plasmids (Plas1 15,126 bp, Plas2 428,263 bp). A total of 8193 protein coding genes were predicted in the genome, including 67 tRNA genes, 18 rRNA genes and 4 ncRNA genes. In addition, mutation analysis of Burkholderia sp. JP2-270 revealed that the gene bysR (DM992_17470), encoding a LysR-type transcriptional regulator, was essential for the antagonistic activity of Burkholderia sp. JP2-270 against R. solani GD118 in vitro and in vivo. Identification of a regulatory gene associated with antagonistic activity will contribute to understanding the antagonistic mechanism of Burkholderia sp. JP2-270. Copyright © 2018 Elsevier Ltd. All rights reserved.


September 22, 2019

Genomic Tandem Quadruplication is Associated with Ketoconazole Resistance in Malassezia pachydermatis.

Malassezia pachydermatis is a commensal yeast found on the skin of dogs. However, M. pachydermatis is also considered an opportunistic pathogen and is associated with various canine skin diseases including otitis externa and atopic dermatitis, which usually require treatment using an azole antifungal drug, such as ketoconazole. In this study, we isolated a ketoconazole-resistant strain of M. pachydermatis, designated “KCTC 27587,” from the external ear canal of a dog with otitis externa and analyzed its resistance mechanism. To understand the mechanism underlying ketoconazole resistance of the clinical isolate M. pachydermatis KCTC 27587, the whole genome of the yeast was sequenced using the PacBio platform and was compared with M. pachydermatis type strain CBS 1879. We found that a ~84-kb region in chromosome 4 of M. pachydermatis KCTC 27587 was tandemly quadruplicated. The quadruplicated region contains 52 protein coding genes, including the homologs of ERG4 and ERG11, whose overexpression is known to be associated with azole resistance. Our data suggest that the quadruplication of the ~84-kb region may be the cause of the ketoconazole resistance in M. pachydermatis KCTC 27587.


September 22, 2019

Complete and de novo assembly of the Leishmania braziliensis (M2904) genome.

Leishmania braziliensis is the etiological agent of American mucosal leishmaniasis, one of the most severe clinical forms of leishmaniasis. Here, we report the assembly of the L. braziliensis (M2904) genome into 35 continuous chromosomes. The annotation of 8395 genes is also provided. The public availability of this information will contribute to a better knowledge of this pathogen and help in the search for vaccines and novel drug targets aimed at controlling the disease caused by this Leishmania species.


September 21, 2019

Repair of double-strand breaks induced by CRISPR-Cas9 leads to large deletions and complex rearrangements.

CRISPR-Cas9 is poised to become the gene editing tool of choice in clinical contexts. Thus far, exploration of Cas9-induced genetic alterations has been limited to the immediate vicinity of the target site and distal off-target sequences, leading to the conclusion that CRISPR-Cas9 was reasonably specific. Here we report significant on-target mutagenesis, such as large deletions and more complex genomic rearrangements at the targeted sites in mouse embryonic stem cells, mouse hematopoietic progenitors and a human differentiated cell line. Using long-read sequencing and long-range PCR genotyping, we show that DNA breaks introduced by single-guide RNA/Cas9 frequently resolved into deletions extending over many kilobases. Furthermore, lesions distal to the cut site and crossover events were identified. The observed genomic damage in mitotically active cells caused by CRISPR-Cas9 editing may have pathogenic consequences.


September 21, 2019

Population sequencing reveals clonal diversity and ancestral inbreeding in the grapevine cultivar Chardonnay.

Chardonnay is the basis of some of the world’s most iconic wines and its success is underpinned by a historic program of clonal selection. There are numerous clones of Chardonnay available that exhibit differences in key viticultural and oenological traits that have arisen from the accumulation of somatic mutations during centuries of asexual propagation. However, the genetic variation that underlies these differences remains largely unknown. To address this knowledge gap, a high-quality, diploid-phased Chardonnay genome assembly was produced from single-molecule real time sequencing, and combined with re-sequencing data from 15 different Chardonnay clones. There were 1620 markers identified that distinguish the 15 clones. These markers were reliably used for clonal identification of independently sourced genomic material, as well as in identifying a potential genetic basis for some clonal phenotypic differences. The predicted parentage of the Chardonnay haplomes was elucidated by mapping sequence data from the predicted parents of Chardonnay (Gouais blanc and Pinot noir) against the Chardonnay reference genome. This enabled the detection of instances of heterosis, with differentially-expanded gene families being inherited from the parents of Chardonnay. Most surprisingly however, the patterns of nucleotide variation present in the Chardonnay genome indicate that Pinot noir and Gouais blanc share an extremely high degree of kinship that has resulted in the Chardonnay genome displaying characteristics that are indicative of inbreeding.


July 19, 2019

Examining sources of error in PCR by single-molecule sequencing.

Next-generation sequencing technology has enabled the detection of rare genetic or somatic mutations and contributed to our understanding of disease progression and evolution. However, many next-generation sequencing technologies first rely on DNA amplification, via the Polymerase Chain Reaction (PCR), as part of sample preparation workflows. Mistakes made during PCR appear in sequencing data and contribute to false mutations that can ultimately confound genetic analysis. In this report, a single-molecule sequencing assay was used to comprehensively catalog the different types of errors introduced during PCR, including polymerase misincorporation, structure-induced template-switching, PCR-mediated recombination and DNA damage. In addition to well-characterized polymerase base substitution errors, other sources of error were found to be equally prevalent. PCR-mediated recombination by Taq polymerase was observed at the single-molecule level, and surprisingly found to occur as frequently as polymerase base substitution errors, suggesting it may be an underappreciated source of error for multiplex amplification reactions. Inverted repeat structural elements in lacZ caused polymerase template-switching between the top and bottom strands during replication, and the frequency of these events was measured for different polymerases. For very accurate polymerases, DNA damage introduced during temperature cycling, and not polymerase base substitution errors, appeared to be the major contributor toward mutations occurring in amplification products. In total, we analyzed PCR products at the single-molecule level and present here a more complete picture of the types of mistakes that occur during DNA amplification.


July 19, 2019

Pacific Biosciences sequencing and IMGT/HighV-QUEST analysis of full-length single chain fragment variable from an in vivo selected phage-display combinatorial Library.

Phage-display selection of immunoglobulin (IG) or antibody single chain Fragment variable (scFv) from combinatorial libraries is widely used for identifying new antibodies for novel targets. Next-generation sequencing (NGS) has recently emerged as a new method for the high throughput characterization of IG and T cell receptor (TR) immune repertoires both in vivo and in vitro. However, challenges remain for the NGS sequencing of scFv from combinatorial libraries owing to the scFv length (>800 bp) and the presence of two variable domains [variable heavy (VH) and variable light (VL) for IG] associated by a peptide linker in a single chain. Here, we show that single-molecule real-time (SMRT) sequencing with the Pacific Biosciences RS II platform allows for the generation of full-length scFv reads obtained from an in vivo selection of scFv-phages in an animal model of atherosclerosis. We first amplified the DNA of the phagemid inserts from scFv-phages eluted from an aortic section at the third round of the in vivo selection. From this amplified DNA, 450,558 reads were obtained from 15 SMRT cells. Highly accurate circular consensus sequences from these reads were generated, filtered by quality and then analyzed by IMGT/HighV-QUEST with the functionality for scFv. Full-length scFv were identified and characterized in 348,659 reads. Full-length scFv sequencing is an absolute requirement for analyzing the associated VH and VL domains enriched during the in vivo panning rounds. In order to further validate the ability of SMRT sequencing to provide high quality, full-length scFv sequences, we tracked the reads of an scFv-phage clone P3 previously identified by biological assays and Sanger sequencing. Sixty P3 reads showed 100% identity with the full-length scFv of 767 bp, 53 of them covering the whole insert of 977 bp, which encompassed the primer sequences. The remaining seven reads were identical over a shortened length of 939 bp that excludes the vicinity of primers at both ends. Interestingly, these reads were obtained from each of the 15 SMRT cells. Thus, the SMRT sequencing method and the IMGT/HighV-QUEST functionality for scFv provide a straightforward protocol for characterization of full-length scFv from combinatorial phage libraries.


July 19, 2019

Dissecting the causal mechanism of X-linked Dystonia-Parkinsonism by integrating genome and transcriptome assembly.

X-linked Dystonia-Parkinsonism (XDP) is a Mendelian neurodegenerative disease that is endemic to the Philippines and is associated with a founder haplotype. We integrated multiple genome and transcriptome assembly technologies to narrow the causal mutation to the TAF1 locus, which included a SINE-VNTR-Alu (SVA) retrotransposition into intron 32 of the gene. Transcriptome analyses identified decreased expression of the canonical cTAF1 transcript among XDP probands, and de novo assembly across multiple pluripotent stem-cell-derived neuronal lineages discovered aberrant TAF1 transcription that involved alternative splicing and intron retention (IR) in proximity to the SVA that was anti-correlated with overall TAF1 expression. CRISPR/Cas9 excision of the SVA rescued this XDP-specific transcriptional signature and normalized TAF1 expression in probands. These data suggest an SVA-mediated aberrant transcriptional mechanism associated with XDP and may provide a roadmap for layered technologies and integrated assembly-based analyses for other unsolved Mendelian disorders. Copyright © 2018 Elsevier Inc. All rights reserved.


July 19, 2019

Long-read sequencing and de novo genome assembly of Ammopiptanthus nanus, a desert shrub.

Ammopiptanthus nanus is a rare broad-leaved shrub that is found in the desert and arid regions of Central Asia. This plant species exhibits extremely high tolerance to drought and freezing and has been used in abiotic tolerance research in plants. As a relic of the tertiary period, A. nanus is of great significance to plant biogeographic research in the ancient Mediterranean region. Here, we report a draft genome assembly using the Pacific Biosciences (PacBio) platform and gene annotation for A. nanus. A total of 64.72 Gb of raw PacBio Sequel reads were generated from four 20-kb libraries. After filtering, 64.53 Gb of clean reads were obtained, giving 72.59× coverage depth. Assembly using Canu gave an assembly length of 823.74 Mb, with a contig N50 of 2.76 Mb. The final size of the assembled A. nanus genome was close to the 889 Mb estimated by k-mer analysis. The gene annotation completeness was evaluated using Benchmarking Universal Single-Copy Orthologs; 1,327 of the 1,440 conserved genes (92.15%) could be found in the A. nanus assembly. Genome annotation revealed that 74.08% of the A. nanus genome is composed of repetitive elements and 53.44% is composed of long terminal repeat elements. We predicted 37,188 protein-coding genes, of which 96.53% were functionally annotated. The genomic sequences of A. nanus could be a valuable source for comparative genomic analysis in the legume family and will be useful for understanding the phylogenetic relationships of the Thermopsideae and the evolutionary response of plant species to the Qinghai-Tibetan Plateau uplift.
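The summary statistics quoted in this abstract follow standard definitions, sketched below under stated assumptions. The `n50` function uses the usual definition (the length of the contig at which the running sum of lengths, sorted from longest to shortest, first reaches half the assembly); the contig lengths in the example are invented, since the abstract does not list them. The coverage figure appears consistent with dividing the clean read yield by the 889 Mb k-mer estimate rather than by the assembly length, but that is an inference from the numbers, not something the abstract states.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # First contig at which the cumulative length reaches half the total.
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


def coverage_depth(total_bases, genome_size):
    """Average depth: sequenced bases divided by genome size (same units)."""
    return total_bases / genome_size


def percent(part, whole):
    """Percentage rounded to two decimals, as reported in the abstract."""
    return round(100 * part / whole, 2)


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    print(n50([50, 40, 30, 20, 10]))

    # Figures from the abstract, in Mb: 64.53 Gb of clean reads, 889 Mb estimate.
    print(round(coverage_depth(64_530, 889), 2))

    # BUSCO completeness: 1,327 of 1,440 conserved genes found.
    print(percent(1327, 1440))
```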


July 19, 2019

How well can we create phased, diploid, human genomes?: An assessment of FALCON-Unzip phasing using a human trio

Long-read sequencing technology has allowed researchers to create de novo assemblies with impressive continuity[1,2]. This advancement has dramatically increased the number of reference genomes available and hints at the possibility of a future where personal genomes are assembled rather than resequenced. In 2016 Pacific Biosciences released the FALCON-Unzip framework, which can provide long, phased haplotype contigs from de novo assemblies. This phased genome algorithm enhances assembly accuracy for highly heterozygous organisms and allows researchers to explore questions that require haplotype information such as allele-specific expression and regulation. However, validation of this technique has been limited to small genomes or inbred individuals[3]. As a roadmap to personal genome assembly and phasing, we assess the phasing accuracy of FALCON-Unzip in humans using publicly available data for the Ashkenazi trio from the Genome in a Bottle Consortium[4]. To assess the accuracy of the Unzip algorithm, we assembled the genome of the son using FALCON and FALCON-Unzip, genotyped publicly available short-read data for the mother and the father, and observed the inheritance pattern of the parental SNPs along the phased genome of the son. We found that 72.8% of haplotype contigs share SNPs with only one parent, suggesting that these contigs are correctly phased. Most mis-phased SNPs are random but present in high frequency toward the end of haplotype contigs. Approximately 20.7% of mis-phased haplotype contigs contain clusters of mis-phased SNPs, suggesting that haplotypes were mis-joined by FALCON-Unzip. Mis-joined boundaries in those contigs are located in areas of low SNP density. This research demonstrates that the FALCON-Unzip algorithm can be used to create long and accurate haplotypes for humans and identifies problematic regions that could benefit from future improvement.
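The trio-based check described above can be pictured with a minimal sketch. It assumes each haplotype contig has already been reduced to an ordered list of parent-of-origin labels, one per informative SNP ("M" for mother, "P" for father); that representation, the function names, and the toy data are all hypothetical and are not taken from the study's actual pipeline.

```python
def classify_contig(origins):
    """Classify one haplotype contig by the parental origin of its SNPs."""
    parents = set(origins)
    if not parents:
        return "uninformative"   # no parent-specific SNPs on this contig
    if len(parents) == 1:
        return "single-parent"   # consistent with correct phasing
    return "mixed"               # contains at least one mis-phased SNP


def count_switches(origins):
    """Count adjacent SNP pairs whose parental origin differs.

    One switch with long runs on each side hints at a mis-join; many
    isolated switches look more like scattered mis-phased SNPs.
    """
    return sum(1 for a, b in zip(origins, origins[1:]) if a != b)


def fraction_single_parent(contigs):
    """Fraction of informative contigs whose SNPs come from one parent."""
    informative = [o for o in contigs.values() if o]
    if not informative:
        return 0.0
    single = sum(1 for o in informative if classify_contig(o) == "single-parent")
    return single / len(informative)


if __name__ == "__main__":
    # Toy data, for illustration only.
    toy = {
        "h1": list("MMMM"),
        "h2": list("PPPP"),
        "h3": list("MMPM"),     # isolated mis-phased SNP
        "h4": list("PPPMMM"),   # pattern suggestive of a mis-join
    }
    print(fraction_single_parent(toy))
    print({name: count_switches(o) for name, o in toy.items()})
```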


July 19, 2019

Long-read sequencing across the C9orf72 ‘GGGGCC’ repeat expansion: implications for clinical use and genetic discovery efforts in human disease.

Many neurodegenerative diseases are caused by nucleotide repeat expansions, but most expansions, like the C9orf72 ‘GGGGCC’ (G4C2) repeat that causes approximately 5-7% of all amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) cases, are too long to sequence using short-read sequencing technologies. It is unclear whether long-read sequencing technologies can traverse these long, challenging repeat expansions. Here, we demonstrate that two long-read sequencing technologies, Pacific Biosciences’ (PacBio) and Oxford Nanopore Technologies’ (ONT), can sequence through disease-causing repeats cloned into plasmids, including the FTD/ALS-causing G4C2 repeat expansion. We also report the first long-read sequencing data characterizing the C9orf72 G4C2 repeat expansion at the nucleotide level in two symptomatic expansion carriers using PacBio whole-genome sequencing and a no-amplification (No-Amp) targeted approach based on CRISPR/Cas9. Both the PacBio and ONT platforms successfully sequenced through the repeat expansions in plasmids. Throughput on the MinION was a challenge for whole-genome sequencing; we were unable to attain reads covering the human C9orf72 repeat expansion using 15 flow cells. We obtained 8× coverage across the C9orf72 locus using the PacBio Sequel, accurately reporting the unexpanded allele at eight repeats, and reading through the entire expansion with 1324 repeats (7941 nucleotides). Using the No-Amp targeted approach, we attained >800× coverage and were able to identify the unexpanded allele, closely estimate expansion size, and assess nucleotide content in a single experiment. We estimate the individual’s repeat region was >99% G4C2 content, though we cannot rule out small interruptions. Our findings indicate that long-read sequencing is well suited to characterizing known repeat expansions, and for discovering new disease-causing, disease-modifying, or risk-modifying repeat expansions that have gone undetected with conventional short-read sequencing. The PacBio No-Amp targeted approach may have future potential in clinical and genetic counseling environments. Larger and deeper long-read sequencing studies in C9orf72 expansion carriers will be important to determine heterogeneity and whether the repeats are interrupted by non-G4C2 content, potentially mitigating or modifying disease course or age of onset, as interruptions are known to do in other repeat-expansion disorders. These results have broad implications across all diseases where the genetic etiology remains unclear.
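To make the two quantities discussed above concrete (repeat count and G4C2 content), here is a minimal sketch that operates on a single read given as a plain string. It assumes an error-free read and exact matching of the GGGGCC unit, which real long reads would not satisfy without error-tolerant alignment; the function names and toy sequences are hypothetical and this is not the method used in the study.

```python
import re


def longest_repeat_run(seq, unit="GGGGCC"):
    """Number of units in the longest uninterrupted tandem run of `unit`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = (m.group(0) for m in pattern.finditer(seq.upper()))
    best = max(runs, key=len, default="")
    return len(best) // len(unit)


def unit_content(seq, unit="GGGGCC"):
    """Fraction of bases covered by non-overlapping exact copies of `unit`."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return seq.count(unit) * len(unit) / len(seq)


if __name__ == "__main__":
    # Toy reads, for illustration only.
    unexpanded = "ACGT" + "GGGGCC" * 8 + "TTGA"
    interrupted = "GGGGCC" * 3 + "GGGACC" + "GGGGCC" * 5

    print(longest_repeat_run(unexpanded))
    print(longest_repeat_run(interrupted))
    print(round(unit_content(interrupted), 3))
```

An interruption such as the single substituted unit in the second toy read splits the run and lowers the content fraction, which is the kind of signal the abstract says larger studies will be needed to assess.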


July 19, 2019

From short reads to chromosome-scale genome assemblies.

A high-quality, annotated genome assembly is the foundation for many downstream studies. However, obtaining such an assembly is a complex, reiterative process that requires the assimilation of high-quality data and combines different approaches and data types. While some software packages incorporating multiple steps of genome assembly are commercially available, they may not be flexible enough to be routinely applied to all organisms, particularly to nonmodel species such as pathogenic oomycetes and fungi. If researchers understand and apply the most appropriate, currently available tools for each step, it is possible to customize parameters and optimize results for their organism of study. Based on our experience of de novo assembly and annotation of several oomycete species, this chapter provides a modular workflow from processing of raw reads, to initial assembly generation, through optimization, chromosome-scale scaffolding and annotation, outlining input and output data as well as examples and alternative software used for each step. The accompanying Notes provide background information for each step as well as alternative options. The final result of this workflow could be an annotated, high-quality, validated, chromosome-scale assembly or a draft assembly of sufficient quality to meet specific needs of a project.

