July 7, 2019

Dissecting a hidden gene duplication: the Arabidopsis thaliana SEC10 locus.

Repetitive sequences present a challenge for genome sequence assembly, and highly similar segmental duplications may disappear from assembled genome sequences. Having found a surprising lack of observable phenotypic deviations and non-Mendelian segregation in Arabidopsis thaliana mutants in SEC10, a gene encoding a core subunit of the exocyst tethering complex, we examined whether this could be explained by a hidden gene duplication. Re-sequencing and manual assembly of the Arabidopsis thaliana SEC10 (At5g12370) locus revealed that this locus, comprising a single gene in the reference genome assembly, indeed contains two paralogous genes in tandem, SEC10a and SEC10b, and that a sequence segment of 7 kb in length is missing from the reference genome sequence. Differences between the two paralogs are concentrated in non-coding regions, while the predicted protein sequences exhibit 99% identity, differing only by substitution of five amino acid residues and an indel of four residues. Both SEC10 genes are expressed, although varying transcript levels suggest differential regulation. Homozygous T-DNA insertion mutants in either paralog exhibit a wild-type phenotype, consistent with proposed extensive functional redundancy of the two genes. By these observations we demonstrate that recently duplicated genes may remain hidden even in well-characterized genomes, such as that of A. thaliana. Moreover, we show that the use of the existing A. thaliana reference genome sequence as a guide for sequence assembly of new Arabidopsis accessions or related species has at least in some cases led to error propagation.


July 7, 2019

Genome sequencing of two Neorhizobium galegae strains reveals a noeT gene responsible for the unusual acetylation of the nodulation factors.

The species Neorhizobium galegae comprises two symbiovars that induce nodules on Galega plants. Strains of both symbiovars, orientalis and officinalis, induce nodules on the same plant species, but fix nitrogen only in their own host species. The mechanism behind this strict host specificity is not yet known. In this study, genome sequences of representatives of the two symbiovars were produced, providing new material for studying properties of N. galegae, with a special interest in genomic differences that may play a role in host specificity. The genome sequences confirmed that the two representative strains are much alike at a whole-genome level. Analysis of orthologous genes showed that N. galegae has a higher number of orthologs shared with Rhizobium than with Agrobacterium. The symbiosis plasmid of strain HAMBI 1141 was shown to transfer by conjugation under optimal conditions. In addition, both sequenced strains have an acetyltransferase gene which was shown to modify the Nod factor on the residue adjacent to the non-reducing-terminal residue. The working hypothesis that this gene is of major importance in directing host specificity of N. galegae could not, however, be confirmed. Strains of N. galegae have many genes differentiating them from strains of Agrobacterium, Rhizobium and Sinorhizobium. However, the mechanism behind their ecological difference is not evident. Although the final determinant for the strict host specificity of N. galegae remains to be identified, the gene responsible for the species-specific acetylation of the Nod factors was identified in this study. We propose the name noeT for this gene to reflect its role in symbiosis.


July 7, 2019

Whole-genome sequence of Serratia symbiotica strain CWBI-2.3T, a free-living symbiont of the black bean aphid Aphis fabae.

The gammaproteobacterium Serratia symbiotica is one of the major secondary symbionts found in aphids. Here, we report the draft genome sequence of S. symbiotica strain CWBI-2.3(T), previously isolated from the black bean aphid Aphis fabae. The 3.58-Mb genome sequence might provide new insights to understand the evolution of insect-microbe symbiosis. Copyright © 2014 Foray et al.


July 7, 2019

Safety of the surrogate microorganism Enterococcus faecium NRRL B-2354 for use in thermal process validation.

Enterococcus faecium NRRL B-2354 is a surrogate microorganism used in place of pathogens for validation of thermal processing technologies and systems. We evaluated the safety of strain NRRL B-2354 based on its genomic and functional characteristics. The genome of E. faecium NRRL B-2354 was sequenced and found to comprise a 2,635,572-bp chromosome and a 214,319-bp megaplasmid. A total of 2,639 coding sequences were identified, including 45 genes unique to this strain. Hierarchical clustering of the NRRL B-2354 genome with 126 other E. faecium genomes as well as pbp5 locus comparisons and multilocus sequence typing (MLST) showed that the genotype of this strain is most similar to commensal, or community-associated, strains of this species. E. faecium NRRL B-2354 lacks antibiotic resistance genes, and both NRRL B-2354 and its clonal relative ATCC 8459 are sensitive to clinically relevant antibiotics. This organism also lacks, or contains nonfunctional copies of, enterococcal virulence genes including acm, cyl, the ebp operon, esp, gelE, hyl, IS16, and associated phenotypes. It does contain scm, sagA, efaA, and pilA, although either these genes were not expressed or their roles in enterococcal virulence are not well understood. Compared with the clinical strains TX0082 and 1,231,502, E. faecium NRRL B-2354 was more resistant to acidic conditions (pH 2.4) and high temperatures (60°C) and was able to grow in 8% ethanol. These findings support the continued use of E. faecium NRRL B-2354 in thermal process validation of food products.


July 7, 2019

LoRDEC: accurate and efficient long read error correction.

PacBio single molecule real-time sequencing is a third-generation sequencing technique producing long reads, with comparatively lower throughput and higher error rate. Errors include numerous indels and complicate downstream analysis like mapping or de novo assembly. A hybrid strategy that takes advantage of the high accuracy of second-generation short reads has been proposed for correcting long reads. Mapping of short reads on long reads provides sufficient coverage to eliminate up to 99% of errors, but at the expense of prohibitive running times and considerable amounts of disk and memory space. We present LoRDEC, a hybrid error correction method that builds a succinct de Bruijn graph representing the short reads, and seeks a corrective sequence for each erroneous region in the long reads by traversing chosen paths in the graph. In comparison, LoRDEC is at least six times faster and requires at least 93% less memory or disk space than available tools, while achieving comparable accuracy. Availability and implementation: LoRDEC is written in C++, tested on Linux platforms and freely available at http://atgc.lirmm.fr/lordec. © The Author 2014. Published by Oxford University Press.
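The idea of correcting a long read by walking a de Bruijn graph built from short reads can be illustrated with a toy sketch. This is not LoRDEC's implementation: the function names, the plain Python set standing in for a succinct graph, the `min_count` and `slack` parameters, and the choice of taking the first path found (rather than the path closest to the erroneous region) are all simplifying assumptions made for illustration.

```python
from collections import Counter

def solid_kmers(short_reads, k, min_count=2):
    """Return k-mers seen at least min_count times in the short reads.
    This set implicitly defines the de Bruijn graph (hypothetical helper)."""
    counts = Counter()
    for read in short_reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {kmer for kmer, c in counts.items() if c >= min_count}

def find_path(source, target, solid, k, max_len):
    """Depth-first search from source k-mer to target k-mer through solid
    k-mers. Returns the spelled sequence, or None if no path is found
    within max_len characters. Takes the first path found (an assumption)."""
    stack = [(source, source)]
    while stack:
        node, seq = stack.pop()
        if node == target and len(seq) > k:
            return seq
        if len(seq) >= max_len:
            continue
        for base in "ACGT":
            nxt = node[1:] + base
            if nxt in solid:
                stack.append((nxt, seq + base))
    return None

def correct_long_read(read, solid, k, slack=5):
    """Replace each region between two solid k-mers of the long read with a
    path from the graph, when one exists; otherwise keep the original bases."""
    positions = [i for i in range(len(read) - k + 1) if read[i:i + k] in solid]
    if not positions:
        return read
    out = read[:positions[0] + k]
    prev = positions[0]
    for pos in positions[1:]:
        if pos == prev + 1:
            out += read[pos + k - 1]          # adjacent solid k-mers: copy one base
        else:
            region_len = pos + k - prev       # original span, both k-mers included
            path = find_path(read[prev:prev + k], read[pos:pos + k],
                             solid, k, region_len + slack)
            out += path[k:] if path is not None else read[prev + k:pos + k]
        prev = pos
    return out + read[prev + k:]

# Toy data: two error-free short reads and one long read with a substitution.
truth = "ACGTTGCAGGCTAAGTCC"
graph = solid_kmers([truth, truth], k=5)
noisy = "ACGTTGCATGCTAAGTCC"
corrected = correct_long_read(noisy, graph, k=5)
```

In this sketch the k-mers that are absent from the short reads mark the erroneous region, and the flanking solid k-mers serve as the start and end of the graph traversal.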


July 7, 2019

SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information.

The recent introduction of the Pacific Biosciences RS single molecule sequencing technology has opened new doors to scaffolding genome assemblies in a cost-effective manner. The long read sequence information promises to enhance the quality of incomplete and inaccurate draft assemblies constructed from Next Generation Sequencing (NGS) data. Here we propose a novel hybrid assembly methodology that aims to scaffold pre-assembled contigs in an iterative manner using PacBio RS long read information as a backbone. On a test set comprising six bacterial draft genomes, assembled using either a single Illumina MiSeq or Roche 454 library, we show that even a 50× coverage of uncorrected PacBio RS long reads is sufficient to drastically reduce the number of contigs. Comparisons to the AHA scaffolder indicate our strategy is better able to produce (nearly) complete bacterial genomes. The current work describes our SSPACE-LongRead software, which is designed to upgrade incomplete draft genomes using single molecule sequences. We conclude that the recent advances of the PacBio sequencing technology and chemistry, in combination with the limited computational resources required to run our program, allow genomes to be scaffolded in a fast and reliable manner.
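The general principle of using long reads as a backbone to order contigs can be sketched as follows. This is only an illustration, not the SSPACE-LongRead algorithm: the input format (tuples of read ID, contig ID and start position on the read), the function names, and the `min_support` threshold are hypothetical, and contig orientation, gap sizes and iteration are ignored.

```python
from collections import Counter, defaultdict

def count_links(alignments):
    """Count how many long reads place contig a directly before contig b.
    alignments: iterable of (read_id, contig_id, start_on_read) tuples."""
    by_read = defaultdict(list)
    for read_id, contig, start in alignments:
        by_read[read_id].append((start, contig))
    links = Counter()
    for hits in by_read.values():
        ordered = [contig for _, contig in sorted(hits)]
        for a, b in zip(ordered, ordered[1:]):
            if a != b:
                links[(a, b)] += 1
    return links

def build_scaffolds(contigs, links, min_support=2):
    """Greedily accept the best-supported links, allowing each contig at most
    one successor and one predecessor, then walk the resulting chains."""
    nxt, prv = {}, {}
    for (a, b), support in links.most_common():
        if support < min_support:
            break
        if a in nxt or b in prv:
            continue                      # conflicting link: keep the stronger one
        nxt[a], prv[b] = b, a

    scaffolds, seen = [], set()

    def walk(start):
        chain, c = [], start
        while c is not None and c not in seen:
            chain.append(c)
            seen.add(c)
            c = nxt.get(c)
        return chain

    for c in contigs:                     # chains that have a clear first contig
        if c not in prv and c not in seen:
            scaffolds.append(walk(c))
    for c in contigs:                     # leftovers (e.g. circular chains)
        if c not in seen:
            scaffolds.append(walk(c))
    return scaffolds

# Toy input: four contigs and four long reads spanning some of them.
alignments = [
    ("r1", "c1", 0), ("r1", "c2", 500),
    ("r2", "c1", 10), ("r2", "c2", 600), ("r2", "c3", 1500),
    ("r3", "c2", 0), ("r3", "c3", 800),
    ("r4", "c4", 0),
]
links = count_links(alignments)
scaffolds = build_scaffolds(["c1", "c2", "c3", "c4"], links)
```

Requiring more than one supporting read per link is one simple way to tolerate the high error rate of uncorrected long reads; the threshold used here is arbitrary.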


July 7, 2019

Type I restriction enzymes and their relatives.

Type I restriction enzymes (REases) are large pentameric proteins with separate restriction (R), methylation (M) and DNA sequence-recognition (S) subunits. They were the first REases to be discovered and purified, but unlike the enormously useful Type II REases, they have yet to find a place in the enzymatic toolbox of molecular biologists. Type I enzymes have been difficult to characterize, but this is changing as genome analysis reveals their genes, and methylome analysis reveals their recognition sequences. Several Type I REases have been studied in detail and what has been learned about them invites greater attention. In this article, we discuss aspects of the biochemistry, biology and regulation of Type I REases, and of the mechanisms that bacteriophages and plasmids have evolved to evade them. Type I REases have a remarkable ability to change sequence specificity by domain shuffling and rearrangements. We summarize the classic experiments and observations that led to this discovery, and we discuss how this ability depends on the modular organizations of the enzymes and of their S subunits. Finally, we describe examples of Type II restriction-modification systems that have features in common with Type I enzymes, with emphasis on the varied Type IIG enzymes.


July 7, 2019

Complete genome sequence of the sugar cane endophyte Pseudomonas aurantiaca PB-St2, a disease-suppressive bacterium with antifungal activity toward the plant pathogen Colletotrichum falcatum.

The endophytic bacterium Pseudomonas aurantiaca PB-St2 exhibits antifungal activity and represents a biocontrol agent to suppress red rot disease of sugar cane. Here, we report the completely sequenced 6.6-Mb genome of P. aurantiaca PB-St2. The sequence contains a repertoire of biosynthetic genes for secondary metabolites that putatively contribute to its antagonistic activity and its plant-microbe interactions.


July 7, 2019

First complete genome sequence of Salmonella enterica subsp. enterica serovar Typhimurium strain ATCC 13311 (NCTC 74), a reference strain of multidrug resistance, as achieved by use of PacBio Single-Molecule Real-Time technology.

We report the first complete genomic sequence of Salmonella enterica subsp. enterica serovar Typhimurium strain ATCC 13311, the leading food-borne pathogen and a reference strain used in drug resistance studies. De novo assembly with PacBio sequencing completed its chromosome and one plasmid. They will accelerate the investigation into multidrug resistance in Salmonella Typhimurium. Copyright © 2014 Terabayashi et al.


July 7, 2019

Thirty-thousand-year-old distant relative of giant icosahedral DNA viruses with a pandoravirus morphology.

The largest known DNA viruses infect Acanthamoeba and belong to two markedly different families. The Megaviridae exhibit pseudo-icosahedral virions up to 0.7 µm in diameter and adenine-thymine (AT)-rich genomes of up to 1.25 Mb encoding a thousand proteins. Like their Mimivirus prototype discovered 10 y ago, they entirely replicate within cytoplasmic virion factories. In contrast, the recently discovered Pandoraviruses exhibit larger amphora-shaped virions 1 µm in length and guanine-cytosine-rich genomes up to 2.8 Mb long encoding up to 2,500 proteins. Their replication involves the host nucleus. Whereas the Megaviridae share some general features with the previously described icosahedral large DNA viruses, the Pandoraviruses appear unrelated to them. Here we report the discovery of a third type of giant virus combining an even larger pandoravirus-like particle 1.5 µm in length with a surprisingly smaller 600 kb AT-rich genome, a gene content more similar to Iridoviruses and Marseillevirus, and a fully cytoplasmic replication reminiscent of the Megaviridae. This suggests that pandoravirus-like particles may be associated with a variety of virus families more diverse than previously envisioned. This giant virus, named Pithovirus sibericum, was isolated from a >30,000-y-old radiocarbon-dated sample when we initiated a survey of the virome of Siberian permafrost. The revival of such an ancestral amoeba-infecting virus, used as a safe indicator of the possible presence of pathogenic DNA viruses, suggests that the thawing of permafrost either from global warming or industrial exploitation of circumpolar regions might not be exempt from future threats to human or animal health.

