July 7, 2019

Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage.

Genome assemblies that are accurate, complete and contiguous are essential for identifying important structural and functional elements of genomes and for identifying genetic variation. Nevertheless, most recent genome assemblies remain incomplete and fragmented. While long molecule sequencing promises to deliver more complete genome assemblies with fewer gaps, concerns about error rates, low yields, stringent DNA requirements and uncertainty about best practices may discourage many investigators from adopting this technology. Here, in conjunction with the platinum standard Drosophila melanogaster reference genome, we analyze recently published long molecule sequencing data to identify what governs completeness and contiguity of genome assemblies. We also present a hybrid meta-assembly approach that achieves remarkable assembly contiguity for both Drosophila and human assemblies with only modest long molecule sequencing coverage. Our results motivate a set of preliminary best practices for obtaining accurate and contiguous assemblies, a ‘missing manual’ that guides key decisions in building high quality de novo genome assemblies, from DNA isolation to polishing the assembly. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.


July 7, 2019

Genomic studies of nitrogen-fixing rhizobial strains from Phaseolus vulgaris seeds and nodules.

Rhizobia are soil bacteria that establish symbiotic relationships with legumes and fix nitrogen in root nodules. We recently reported that several nitrogen-fixing rhizobial strains, belonging to Rhizobium phaseoli, R. trifolii, R. grahamii and Sinorhizobium americanum, were able to colonize Phaseolus vulgaris (common bean) seeds. To gain further insight into the traits that support this ability, we analyzed the genomic sequences and proteomes of R. phaseoli (CCGM1) and S. americanum (CCGM7) strains from seeds and compared them with those of the closely related strains CIAT652 and CFNEI73, respectively, isolated only from nodules. In a fine structural study of the S. americanum genomes, the chromosomes, megaplasmids and symbiotic plasmids were highly conserved and syntenic, with the exception of the smaller plasmid, which appeared unrelated. The symbiotic tract of CCGM7 appeared more dispersed, possibly due to the action of transposases. The chromosomes of seed strains had fewer transposases and strain-specific genes. The seed strains CCGM1 and CCGM7 shared about half of their genomes with their closest strains (3353 and 3472 orthologs, respectively), but a large fraction of the rest also had homology with other rhizobia. They contained 315 and 204 strain-specific genes, respectively, particularly abundant in the functions of transcription, motility, energy generation and cofactor biosynthesis. The proteomes of seed and nodule strains were obtained and showed a particular profile for each of the strains. About 82% of the proteins in the comparisons appeared similar. Forty of the most abundant proteins in each strain were identified; in seed strains these proteins were involved in stress responses and in coenzyme and cofactor biosynthesis, and in the nodule strains mainly in central processes. Only 3% of the abundant proteins had hypothetical functions. Functions that were enriched in the genomes and proteomes of seed strains possibly participate in the successful occupancy of the new niche. The genomes of the strains had features possibly related to their presence in the seeds. This study helps to understand traits of rhizobia involved in seed adaptation.


July 7, 2019

Towards integration of population and comparative genomics in forest trees.

The past decade saw the initiation of an ongoing revolution in sequencing technologies that is transforming all fields of biology. This has been driven by the advent and widespread availability of high-throughput, massively parallel short-read sequencing (MPS) platforms. These technologies have enabled previously unimaginable studies, including draft assemblies of the massive genomes of coniferous species and population-scale resequencing. Transcriptomics studies have likewise been transformed, with RNA-sequencing enabling studies in nonmodel organisms, the discovery of previously unannotated genes (novel transcripts), entirely new classes of RNAs and previously unknown regulatory mechanisms. Here we touch upon current developments in the areas of genome assembly, comparative regulomics and population genetics as they relate to studies of forest tree species. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.


July 7, 2019

Probabilistic viral quasispecies assembly

Viruses are pathogens that cause infectious diseases. The swarm of virions is subject to the host’s immune pressure and possibly antiviral therapy. It may escape this selective pressure and gain selective advantage by acquiring one or more of the genomic alterations: single-nucleotide variants (SNVs), loss or gain of one or more amino acids, large deletions, for example, due to alternative splicing, or recombination of different strains. Genotypic antiretroviral drug resistance testing is performed via sequencing. Next-generation sequencing (NGS) technologies revolutionized assessing viral genetic diversity experimentally. In viral quasispecies analysis, there are two main goals: the identification of low-frequency variants and haplotype assembly on a whole-genome scale. PacBio performs single-molecule sequencing. This chapter elaborates human haplotyping and its relationship to probabilistic viral haplotype reconstruction methods. Viral quasispecies assembly has the potential to replace the current de facto diversity estimation by SNV calling. With advances in library preparation, increasing sensitivity of sequencing platforms, and more sophisticated models, it might be possible to detect all or most viral strains in a single individual.


July 7, 2019

The mechanisms whereby the green alga Chlorella ohadii, isolated from desert soil crust, exhibits unparalleled photodamage resistance.

Excess illumination damages the photosynthetic apparatus, with severe implications for plant productivity. Unlike model organisms, the growth of Chlorella ohadii, isolated from desert soil crust, remains unchanged and photosynthetic O2 evolution increases, even when exposed to irradiation twice that of maximal sunlight. Spectroscopic, biochemical and molecular approaches were applied to uncover the mechanisms involved. D1 protein in photosystem II (PSII) is barely degraded, even when exposed to antibiotics that prevent its replenishment. Measurements of various PSII parameters indicate that this complex functions differently from that in model organisms and suggest that C. ohadii activates a nonradiative electron recombination route which minimizes singlet oxygen formation and the resulting photoinhibition. The light-harvesting antenna is very small and carotene composition is hardly affected by excess illumination. Instead of succumbing to photodamage, C. ohadii activates additional means to dissipate excess light energy. It undergoes major structural, compositional and physiological changes, leading to a large rise in photosynthetic rate, lipids and carbohydrate content and inorganic carbon cycling. The ability of C. ohadii to avoid photodamage relies on a modified function of PSII and the dissipation of excess reductants downstream of the photosynthetic reaction centers. The biotechnological potential as a gene source for crop plant improvement is self-evident. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.


July 7, 2019

The complete chloroplast genome sequences for four Amaranthus species (Amaranthaceae).

The amaranth genus contains many important grain and weedy species. We further our understanding of the genus through the development of a complete reference chloroplast genome. A high-quality Amaranthus hypochondriacus (Amaranthaceae) chloroplast genome assembly was developed using long-read technology. This reference genome was used to reconstruct the chloroplast genomes for two closely related grain species (A. cruentus and A. caudatus) and their putative progenitor (A. hybridus). The reference genome was 150,518 bp and possesses a circular structure of two inverted repeats (24,352 bp) separated by small (17,941 bp) and large (83,873 bp) single-copy regions; it encodes 111 genes, 72 for proteins. Relative to the reference chloroplast genome, an average of 210 single-nucleotide polymorphisms (SNPs) and 122 insertion/deletion polymorphisms (indels) were identified across the analyzed genomes. This reference chloroplast genome, along with the reported simple sequence repeats, SNPs, and indels, is an invaluable genetic resource for studying the phylogeny and genetic diversity within the amaranth genus.
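As a quick consistency check on the sizes reported above, the region lengths can be summed. This is a minimal sketch that assumes the 24,352 bp figure is the length of each of the two inverted repeats; the variable names are illustrative only.

```python
# Region lengths (bp) as reported in the abstract.
# Assumption: 24,352 bp is the length of EACH inverted repeat copy.
inverted_repeat = 24_352
small_single_copy = 17_941
large_single_copy = 83_873

# A quadripartite plastome is two inverted repeats plus the two single-copy regions.
total_length = 2 * inverted_repeat + small_single_copy + large_single_copy

print(total_length)  # 150518
```

Under that assumption the regions add up to the reported genome size of 150,518 bp.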


July 7, 2019

The botrydial biosynthetic gene cluster of Botrytis cinerea displays a bipartite genomic structure and is positively regulated by the putative Zn(II)2Cys6 transcription factor BcBot6.

Botrydial (BOT) is a non-host specific phytotoxin produced by the polyphagous phytopathogenic fungus Botrytis cinerea. The genomic region of the BOT biosynthetic gene cluster was investigated and revealed two additional genes named Bcbot6 and Bcbot7. Analysis revealed that the G+C/A+T-equilibrated regions that contain the Bcbot genes alternate with A+T-rich regions made of relics of transposable elements that have undergone repeat-induced point mutations (RIP). Furthermore, BcBot6, a putative Zn(II)2Cys6 transcription factor, was identified as a nuclear protein and the major positive regulator of BOT biosynthesis. In addition, the phenotype of the ΔBcbot6 mutant indicated that BcBot6 and therefore BOT are dispensable for the development, pathogenicity and response to abiotic stresses in the B. cinerea strain B05.10. Finally, our data revealed that B. pseudocinerea, which is also polyphagous and lives in sympatry with B. cinerea, lacks the ability to produce BOT. Identification of BcBot6 as the major regulator of BOT synthesis is the first step towards a comprehensive understanding of the complete regulation network of BOT synthesis and of its ecological role in the B. cinerea life cycle. Copyright © 2016 Elsevier Inc. All rights reserved.


July 7, 2019

Complete genome sequence of Bacillus amyloliquefaciens subsp. plantarum S499, a rhizobacterium that triggers plant defences and inhibits fungal phytopathogens.

Bacillus amyloliquefaciens subsp. plantarum S499 is a plant-beneficial rhizobacterium with good antagonistic potential against phytopathogens through the release of active secondary metabolites. Moreover, it can induce systemic resistance in plants by producing considerable amounts of surfactins. The complete genome sequence of B. amyloliquefaciens subsp. plantarum S499 includes a circular chromosome of 3,927,922 bp and a plasmid of 8,008 bp. A remarkable abundance of genomic regions of putative horizontal origin emerged from the analysis. Furthermore, we highlighted the presence of genes involved in the establishment of interactions with the host plants at the root level and in the competition with other soil-borne microorganisms. More specifically, genes related to the synthesis of amylolysin, amylocyclicin, and butirosin were identified. These antimicrobials were not previously known to be part of the antibiotic arsenal of the strain. The information embedded in the genome will support upcoming studies regarding the application of B. amyloliquefaciens isolates as plant-growth promoters and biocontrol agents. Copyright © 2016 Elsevier B.V. All rights reserved.


July 7, 2019

Chimeras link to tandem repeats and transposable elements in tetraploid hybrid fish

The formation of the allotetraploid hybrid lineage (4nAT) encompasses both distant hybridization and polyploidization processes. The allotetraploid offspring have two sets of sub-genomes inherited from both parental species, and it is therefore important to explore their genetic structure. Herein, we construct a bacterial artificial chromosome library of allotetraploids, and then sequence and analyze the full-length sequences of 19 bacterial artificial chromosomes. Sixty-eight DNA chimeras are identified, which are divided into four models according to the distribution of the genomic DNA derived from the parents. Among the 68 genetic chimeras, 44 (64.71%) are linked to tandem repeats (TRs) and 23 (33.82%) are linked to transposable elements (TEs). The chimeras linked to TRs are related to slipped-strand mispairing and double-strand break repair, while the chimeras linked to TEs benefit from the intervention of recombinases. In addition, TRs and TEs are linked not only with the recombinations, but also with the insertions/deletions of DNA segments. We conclude that DNA chimeras accompanied by TRs and TEs coordinate a balance between the sub-genomes derived from the parents, which reduces the genomic shock effects and favors the evolutionary and adaptive capacity of the allotetraploidization. This is the first report on the relationship between the formation of DNA chimeras and TRs and TEs in polyploid animals.


July 7, 2019

A complete toolset for the study of Ustilago bromivora and Brachypodium sp. as a fungal-temperate grass pathosystem.

Due to their economic relevance, the study of plant pathogen interactions is of importance. However, elucidating these interactions and their underlying molecular mechanisms remains challenging since both host and pathogen need to be fully genetically accessible organisms. Here we present milestones in the establishment of a new biotrophic model pathosystem: Ustilago bromivora and Brachypodium sp. We provide a complete toolset, including an annotated fungal genome and methods for genetic manipulation of the fungus and its host plant. This toolset will enable researchers to easily study biotrophic interactions at the molecular level on both the pathogen and the host side. Moreover, our research on the fungal life cycle revealed a mating type bias phenomenon. U. bromivora harbors a haplo-lethal allele that is linked to one mating type region. As a result, the identified mating type bias strongly promotes inbreeding, which we consider to be a potential speciation driver.


July 7, 2019

Genomic analysis of phylotype I strain EP1 reveals substantial divergence from other strains in the Ralstonia solanacearum species complex.

The Ralstonia solanacearum species complex is a devastating group of phytopathogens with an unusually wide host range and broad geographical distribution. R. solanacearum isolates may differ considerably in various properties including host range and pathogenicity, but the underlying genetic bases remain vague. Here, we sequenced the genome of strain EP1, isolated from Guangdong Province of China, which belongs to phylotype I and is highly virulent to a range of solanaceous crops. Its complete genome contains a 3.95-Mb chromosome and a 2.05-Mb mega-plasmid, which is considerably bigger than reported genomes of other R. solanacearum strains. Both the chromosome and the mega-plasmid have essential house-keeping genes and many virulence genes. Comparative analysis of strain EP1 with 3 other phylotype I strains and 3 strains of phylotypes II, III and IV unveiled substantial genome rearrangements, insertions and deletions. Genome sequences are relatively conserved among the 4 phylotype I strains, but more divergent among strains of different phylotypes. Moreover, the strains exhibited considerable variations in their key virulence genes, including those encoding secretion systems and type III effectors. Our results provide valuable information for further elucidation of the genetic basis of the diverse virulence and host range of the R. solanacearum species complex.


July 7, 2019

Improved assembly of noisy long reads by k-mer validation.

Genome assembly depends critically on read length. Two recent technologies, from Pacific Biosciences (PacBio) and Oxford Nanopore, produce read lengths >20 kb, which yield de novo genome assemblies with vastly greater contiguity than those based on Sanger, Illumina, or other technologies. However, the very high error rates of these two new technologies (~15% per base) make assembly imprecise at repeats longer than the read length and computationally expensive. Here we show that the contiguity and quality of the assembly of these noisy long reads can be significantly improved at a minimal cost, by leveraging the low error rate and low cost of Illumina short reads. Namely, k-mers from the PacBio raw reads that are not present in Illumina reads (which account for ~95% of the distinct k-mers) are deemed sequencing errors and ignored at the seed alignment step. By focusing on the ~5% of k-mers that are error free, read overlap sensitivity is dramatically increased. Of equal importance, the validation procedure can be extended to exclude repetitive k-mers, which prevents read miscorrection at repeats and further improves the resulting assemblies. We tested the k-mer validation procedure using one long-read technology (PacBio) and one assembler (MHAP/Celera Assembler), but it is very likely to yield analogous improvements with alternative long-read technologies and assemblers, such as Oxford Nanopore and BLASR/DALIGNER/Falcon, respectively. © 2016 Carvalho et al.; Published by Cold Spring Harbor Laboratory Press.
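The k-mer validation idea described above can be illustrated with a toy sketch. This is not the authors' implementation or the MHAP code path: the function names, the `max_count` repeat cutoff and the example reads are all hypothetical, and reverse complements are ignored for brevity.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-length substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_trusted_set(short_reads, k, max_count=None):
    """Collect k-mers seen in the accurate short reads.

    If max_count is given (a hypothetical repeat cutoff), k-mers seen more
    often than that are treated as repetitive and excluded.
    """
    counts = Counter(km for read in short_reads for km in kmers(read, k))
    return {km for km, n in counts.items() if max_count is None or n <= max_count}


def validated_seeds(long_read, trusted, k):
    """Return (position, k-mer) pairs of the long read that are in the trusted set."""
    return [(i, km) for i, km in enumerate(kmers(long_read, k)) if km in trusted]


# Toy data (made up): the long read carries errors in its middle section.
short_reads = ["ACGTACGGA", "CGGATTCA"]
noisy_long_read = "ACGTACGCATTCA"

trusted = build_trusted_set(short_reads, k=5)
seeds = validated_seeds(noisy_long_read, trusted, k=5)
print(seeds)  # [(0, 'ACGTA'), (1, 'CGTAC'), (2, 'GTACG'), (8, 'ATTCA')]
```

Only the k-mers that survive validation would be passed on as seeds for overlap detection; the k-mers spanning the erroneous bases are dropped because they never occur in the short reads.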


July 7, 2019

Spontaneous chloroplast mutants mostly occur by replication slippage and show a biased pattern in the plastome of Oenothera.

Spontaneous plastome mutants have been used as a research tool since the beginning of genetics. However, technical restrictions have severely limited their contributions to research in physiology and molecular biology. Here, we used full plastome sequencing to systematically characterize a collection of 51 spontaneous chloroplast mutants in Oenothera (evening primrose). Most mutants carry only a single mutation. Unexpectedly, the vast majority of mutations do not represent single nucleotide polymorphisms but are insertions/deletions originating from DNA replication slippage events. Only very few mutations appear to be caused by imprecise double-strand break repair, nucleotide misincorporation during replication, or incorrect nucleotide excision repair following oxidative damage. U-turn inversions were not detected. Replication slippage is induced at repetitive sequences that can be very small and tend to have high A/T content. Interestingly, the mutations are not distributed randomly in the genome. The underrepresentation of mutations caused by faulty double-strand break repair might explain the high structural conservation of seed plant plastomes throughout evolution. In addition to providing a fully characterized mutant collection for future research on plastid genetics, gene expression, and photosynthesis, our work identified the spectrum of spontaneous mutations in plastids and reveals that this spectrum is very different from that in the nucleus. © 2016 American Society of Plant Biologists. All rights reserved.


July 7, 2019

Assembly of the draft genome of buckwheat and its applications in identifying agronomically useful genes.

Buckwheat (Fagopyrum esculentum Moench; 2n = 2x = 16) is a nutritionally dense annual crop widely grown in temperate zones. To accelerate molecular breeding programmes of this important crop, we generated a draft assembly of the buckwheat genome using short reads obtained by next-generation sequencing (NGS), and constructed the Buckwheat Genome DataBase. After assembling short reads, we determined 387,594 scaffolds as the draft genome sequence (FES_r1.0). The total length of FES_r1.0 was 1,177,687,305 bp, and the N50 of the scaffolds was 25,109 bp. Gene prediction analysis revealed 286,768 coding sequences (CDSs; FES_r1.0_cds) including those related to transposable elements. The total length of FES_r1.0_cds was 212,917,911 bp, and the N50 was 1,101 bp. Of these, the functions of 35,816 CDSs excluding those for transposable elements were annotated by BLAST analysis. To demonstrate the utility of the database, we conducted several test analyses using BLAST and keyword searches. Furthermore, we used the draft genome as a reference sequence for NGS-based markers, and successfully identified novel candidate genes controlling heteromorphic self-incompatibility of buckwheat. The database and draft genome sequence provide a valuable resource that can be used in efforts to develop buckwheat cultivars with superior agronomic traits. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
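For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length. The function name and the toy lengths are illustrative and are not taken from the buckwheat assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of the overall total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (made up), total 30: 10 + 8 = 18 reaches half of 30.
print(n50([10, 8, 5, 4, 3]))  # 8
```

A scaffold N50 of 25,109 bp therefore means that half of the assembled bases lie in scaffolds at least that long.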


July 7, 2019

Exploiting next-generation sequencing to solve the haplotyping puzzle in polyploids: a simulation study.

Haplotypes are the units of inheritance in an organism, and many genetic analyses depend on their precise determination. Methods for haplotyping single individuals use the phasing information available in next-generation sequencing reads, by matching overlapping single-nucleotide polymorphisms while penalizing post hoc nucleotide corrections made. Haplotyping diploids is relatively easy, but the complexity of the problem increases drastically for polyploid genomes, which are found in both model organisms and in economically relevant plant and animal species. Although a number of tools are available for haplotyping polyploids, the effects of the genomic makeup and the sequencing strategy followed on the accuracy of these methods have hitherto not been thoroughly evaluated. We developed the simulation pipeline haplosim to evaluate the performance of three haplotype estimation algorithms for polyploids: HapCompass, HapTree and SDhaP, in settings varying in sequencing approach, ploidy levels and genomic diversity, using tetraploid potato as the model. Our results show that sequencing depth is the major determinant of haplotype estimation quality, that 1 kb PacBio circular consensus sequencing reads and Illumina reads with large insert-sizes are competitive and that all methods fail to produce good haplotypes when ploidy levels increase. Comparing the three methods, HapTree produces the most accurate estimates, but also consumes the most resources. There is clearly room for improvement in polyploid haplotyping algorithms.

