De novo assembly Archives

July 7, 2019

TriPoly: haplotype estimation for polyploids using sequencing data of related individuals.

Knowledge of haplotypes, i.e. phased and ordered marker alleles on a chromosome, is essential to answer many questions in genetics and genomics. By generating short pieces of DNA sequence, modern high-throughput sequencing technologies make it possible to estimate haplotypes for single individuals. In polyploids, however, haplotype estimation methods usually require deep coverage to achieve sufficient accuracy. This often makes sequencing-based approaches too costly to apply to the large populations needed for studies of quantitative trait loci (QTL).

We propose TriPoly, a novel haplotype estimation method for polyploids that combines sequencing data with Mendelian inheritance rules to infer haplotypes in parent-offspring trios. Using realistic simulations of both short- and long-read sequencing data for banana (Musa acuminata) and potato (Solanum tuberosum) trios, we show that TriPoly yields more accurate progeny haplotypes at low coverage than existing methods that work on single individuals. We also apply TriPoly to phase single nucleotide polymorphisms on chromosome 5 for a family of tetraploid potato with 2 parents and 37 offspring sequenced with an RNA capture approach. TriPoly haplotype estimates differ from those of the other methods mainly in regions with imperfect sequencing or mapping difficulties, because TriPoly does not rely solely on sequence reads and aims to avoid phasings that are unlikely to have been passed from the parents to the offspring.

TriPoly has been implemented in Python 3.5.2 (also compatible with Python 2.7.3 and higher) and can be freely downloaded at https://github.com/EhsanMotazedi/TriPoly. Supplementary data are available at Bioinformatics online.
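The Mendelian constraint that TriPoly exploits can be illustrated with a small, exhaustive check: in a tetraploid, each parent transmits two of its four homologues, so an offspring phasing is plausible only if it can be split into one maternal and one paternal gamete. The sketch below is a simplified illustration under stated assumptions, not TriPoly's actual model (which is probabilistic and also weighs read evidence): it ignores recombination and double reduction, encodes haplotypes as strings of 0/1 alleles, and its function names (`gametes`, `is_mendelian_compatible`) are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

# A haplotype is written as a string of alleles at consecutive phased SNPs,
# e.g. "0110" (0 = reference allele, 1 = alternative allele). Hypothetical encoding.
Haplotype = str


def gametes(parent: List[Haplotype], ploidy: int = 4) -> Set[Tuple[Haplotype, ...]]:
    """Every gamete a parent can transmit under simple polysomic inheritance.

    Each gamete carries ploidy // 2 of the parent's homologues.
    Assumption: no recombination and no double reduction.
    """
    k = ploidy // 2
    return {
        tuple(sorted(parent[i] for i in idx))
        for idx in combinations(range(len(parent)), k)
    }


def is_mendelian_compatible(mother: List[Haplotype], father: List[Haplotype],
                            child: List[Haplotype], ploidy: int = 4) -> bool:
    """True if some maternal gamete plus some paternal gamete equals the child's haplotypes."""
    target = Counter(child)
    for g_m in gametes(mother, ploidy):
        for g_f in gametes(father, ploidy):
            if Counter(g_m) + Counter(g_f) == target:
                return True
    return False


mother = ["0000", "0011", "1100", "1111"]
father = ["0000", "0000", "0101", "1010"]
print(is_mendelian_compatible(mother, father, ["0000", "0011", "0000", "0101"]))  # True
print(is_mendelian_compatible(mother, father, ["1111", "1111", "0000", "0000"]))  # False
```

The second child phasing is rejected because it needs two copies of `1111`, while the mother carries only one and the father none. Exhaustive enumeration like this only scales to toy examples; it is meant to show the constraint, not an efficient way to apply it.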

July 7, 2019

Loss of RXFP2 and INSL3 genes in Afrotheria shows that testicular descent is the ancestral condition in placental mammals.

Descent of testes from a position near the kidneys into the lower abdomen or into the scrotum is an important developmental process that occurs in all placental mammals, with the exception of five afrotherian lineages. Since soft-tissue structures like testes are not preserved in the fossil record and since key parts of the placental mammal phylogeny remain controversial, it has been debated whether testicular descent is the ancestral or derived condition in placental mammals. To resolve this debate, we used genomic data of 71 mammalian species and analyzed the evolution of two key genes (relaxin/insulin-like family peptide receptor 2 [RXFP2] and insulin-like 3 [INSL3]) that induce the development of the gubernaculum, the ligament that is crucial for testicular descent. We show that both RXFP2 and INSL3 are lost or nonfunctional exclusively in four afrotherians (tenrec, cape elephant shrew, cape golden mole, and manatee) that completely lack testicular descent. The presence of remnants of once functional orthologs of both genes in these afrotherian species shows that these gene losses happened after the split from the placental mammal ancestor. These “molecular vestiges” provide strong evidence that testicular descent is the ancestral condition, irrespective of persisting phylogenetic discrepancies. Furthermore, the absence of shared gene-inactivating mutations and our estimates that the loss of RXFP2 happened at different time points strongly suggest that testicular descent was lost independently in Afrotheria. Our results provide a molecular mechanism that explains the loss of testicular descent in afrotherians and, more generally, highlight how molecular vestiges can provide insights into the evolution of soft-tissue characters.
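The argument above rests on "gene-inactivating mutations" in RXFP2 and INSL3. As a minimal sketch of what two common signals look like, the code below flags in-frame premature stop codons and coding-sequence lengths that are not a multiple of three (a hint of a frame-shifting indel). It assumes an unspliced coding sequence already extracted in its reading frame; real analyses, including presumably this study's, work exon by exon against an alignment to a functional ortholog. This is not the study's pipeline, and the function names are hypothetical.

```python
from typing import Dict, List, Union

STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stop_codons(cds: str) -> List[int]:
    """0-based codon indices of in-frame stop codons before the final codon."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # The final codon is the expected stop, so only earlier codons are checked.
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def inactivation_signals(cds: str) -> Dict[str, Union[List[int], bool]]:
    """Summarize two simple signs that a coding sequence may be nonfunctional."""
    return {
        "premature_stops": premature_stop_codons(cds),
        # A length that is not a multiple of 3 hints at a frame-shifting indel.
        "frameshift_suspected": len(cds) % 3 != 0,
    }


print(inactivation_signals("ATGAAATGAGGGTAA"))
# {'premature_stops': [2], 'frameshift_suspected': False}
```

Shared mutations at the same codon in several species would suggest a single ancestral loss; the abstract reports the opposite pattern, which is why the losses are interpreted as independent.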

July 7, 2019

Meeting report: mobile genetic elements and genome plasticity 2018

The Mobile Genetic Elements and Genome Plasticity conference was hosted by Keystone Symposia in Santa Fe, NM, USA, February 11–15, 2018. The organizers were Marlene Belfort, Evan Eichler, Henry Levin and Lynn Maquat. The goal of this conference was to bring together scientists from around the world to discuss the function of transposable elements and their impact on host species. Central themes of the meeting included recent innovations in genome analysis and the role of mobile DNA in disease and evolution. The conference included 200 scientists, who participated in poster presentations, short talks selected from abstracts, and invited talks. A total of 58 talks were organized into eight sessions and two workshops. Topics ranged from mechanisms of mobilization to the structure of genomes and the strategies genomes use to defend themselves against transposable elements.

July 7, 2019

Fast-SG: an alignment-free algorithm for hybrid assembly.

Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intense and require considerable coverage, which hinders their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes.

Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using lightweight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878).

Fast-SG opens a door to accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.
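The abstract does not spell out Fast-SG's data structures, but the general idea of alignment-free linking can be sketched: index k-mers that occur exactly once among the contigs, then scan each read and record an edge whenever its unique k-mers jump from one contig to another. This is a deliberately simplified illustration, not Fast-SG's algorithm: it uses a plain dictionary, looks only at the forward strand (no reverse complements), and estimates neither orientation nor gap size. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def unique_kmer_index(contigs: Dict[str, str], k: int) -> Dict[str, Tuple[str, int]]:
    """Map each k-mer that occurs exactly once across all contigs to (contig, offset)."""
    counts: Counter = Counter()
    where: Dict[str, Tuple[str, int]] = {}
    for name, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            counts[kmer] += 1
            where[kmer] = (name, i)
    return {kmer: where[kmer] for kmer, n in counts.items() if n == 1}


def link_contigs(reads: List[str], index: Dict[str, Tuple[str, int]], k: int,
                 min_support: int = 1) -> Dict[Tuple[str, ...], int]:
    """Count, per contig pair, reads whose unique k-mers move from one contig to the other."""
    edges: Counter = Counter()
    for read in reads:
        hits: List[str] = []
        for i in range(len(read) - k + 1):
            hit = index.get(read[i:i + k])
            # Keep a contig only when it differs from the previous hit.
            if hit is not None and (not hits or hits[-1] != hit[0]):
                hits.append(hit[0])
        for a, b in zip(hits, hits[1:]):
            edges[tuple(sorted((a, b)))] += 1
    return {edge: n for edge, n in edges.items() if n >= min_support}


contigs = {"A": "ACGTACGGTCA", "B": "TTGCAGGATCC"}
index = unique_kmer_index(contigs, k=5)
read = "GGTCA" + "NN" + "TTGCA"  # spans the end of A and the start of B
print(link_contigs([read], index, k=5))  # {('A', 'B'): 1}
```

No read is ever aligned base by base; only exact k-mer lookups are performed, which is what makes this family of approaches fast. Raising `min_support` trades sensitivity for robustness against chimeric or erroneous reads.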

July 7, 2019

High-quality genome sequence of the root-knot nematode Meloidogyne arenaria genotype A2-O.

Root-knot nematodes (Meloidogyne spp.) cause serious damage to many crops globally. We report the high-quality genome sequence of Meloidogyne arenaria genotype A2-O. The genome assembly of M. arenaria A2-O is composed of 2,224 contigs with an N50 contig length of 204,551 bp and a total assembly length of 284.05 Mb. Copyright © 2018 Sato et al.
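The N50 reported above is the standard contiguity statistic: the length L such that contigs of length L or longer together cover at least half of the total assembly length. A minimal sketch of the computation follows; the contig lengths in the example are toy values, not the M. arenaria data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L together cover at least half the assembly."""
    sizes = sorted(lengths, reverse=True)
    total = sum(sizes)
    running = 0
    for size in sizes:
        running += size
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return size
    raise ValueError("n50() needs at least one contig length")


print(n50([100, 80, 60, 40, 20]))  # 80
```

Here the total is 300; the two longest contigs (100 + 80 = 180) already exceed half of it, so the N50 is 80. Because N50 depends only on the length distribution, it says nothing about whether contigs are correctly assembled.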

July 7, 2019

Closed genome sequence of Clostridium botulinum strain CFSAN064329 (62A).

Clostridium botulinum is a strictly anaerobic, Gram-positive, spore-forming bacterium that produces botulinum neurotoxin, a potent and deadly proteinaceous exotoxin. Clostridium botulinum strain CFSAN064329 (62A) produces an A1 serotype/subtype botulinum neurotoxin and is frequently utilized in food challenge and detection studies. We report here the closed genome sequence of Clostridium botulinum strain CFSAN064329 (62A).

July 7, 2019

Complete genome sequence of Acinetobacter schindleri SGAir0122 isolated from Singapore Air.

Acinetobacter schindleri strain SGAir0122 was isolated from tropical air samples collected in Singapore. The prevalence of nosocomial infection caused by this Gram-negative bacterium indicates its clinical significance as an opportunistic human pathogen. Its complete genome consists of one chromosome of 3.105 Mb and a plasmid of 181 kb. Copyright © 2018 Kee et al.

July 7, 2019

Complete genome sequence of an avian native NDM-1-producing Salmonella enterica subsp. enterica serovar Corvallis strain.

Carbapenems are an important class of β-lactams and one of the last options for treating severe human infections. We present here the complete genome sequence of avian native carbapenemase-producing Salmonella enterica subsp. enterica serovar Corvallis strain 12-01738, harboring a blaNDM-1-carrying IncA/C2 plasmid, isolated in 2012 from a wild bird (Milvus migrans) in Germany. Copyright © 2018 Hadziabdic et al.

July 7, 2019

Complete genome sequence of Streptomyces sp. strain BSE7F, a Bali mangrove sediment actinobacterium with antimicrobial activities.

Streptomyces sp. strain BSE7F, a novel Streptomyces strain isolated from Indonesian mangrove sediment, displays antimicrobial activity against Gram-positive bacteria, Gram-negative bacteria, and yeast. Bioinformatic analysis of the genome sequence revealed 22 biosynthetic gene clusters, indicating the secondary metabolite capacity of strain BSE7F. Copyright © 2018 Handayani et al.

July 7, 2019

Genome sequence of the soybean cyst nematode (Heterodera glycines) endosymbiont “Candidatus Cardinium hertigii” strain cHgTN10.

In this study, we present the genome sequence of the “Candidatus Cardinium hertigii” strain cHgTN10, an endosymbiotic bacterium of the plant-parasitic nematode Heterodera glycines. This is the first genome assembly reported for an endosymbiont sequenced directly from a tylenchid nematode. Copyright © 2018 Showmaker et al.

July 7, 2019

Complete genome sequences of four Salmonella enterica subsp. enterica serovar Senftenberg and Montevideo isolates associated with a 2016 multistate outbreak in the United States.

A multistate outbreak of 11 Salmonella infections linked to pistachio nuts occurred in 2016. In this announcement, we report the complete genome sequences of four Salmonella enterica subsp. enterica serovar Senftenberg and S. enterica subsp. enterica serovar Montevideo isolates from pistachios collected during the 2016 outbreak investigation.

July 7, 2019

Genome sequence of Geobacillus thermoleovorans SGAir0734, isolated from Singapore air.

The thermophilic bacterium Geobacillus thermoleovorans was isolated from a tropical air sample collected in Singapore. The genome was sequenced on the PacBio RS II platform and consists of one 3.6-Mb chromosome and one 75-kb plasmid. The genome comprises 3,509 protein-coding genes, 88 tRNAs, and 27 rRNAs. Copyright © 2018 Gaultier et al.
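Gene counts like the 3,509 protein-coding genes, 88 tRNAs, and 27 rRNAs above are typically tallied from a genome annotation file. The announcement does not say which annotation tool or file format was used, so the sketch below assumes a GFF3 file and simply counts column 3 (the feature type). In prokaryotic annotations one `CDS` feature usually corresponds to one protein-coding gene, but pseudogenes or multi-line CDS features can make raw counts differ from published totals. The function name and example records are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(gff_lines: Iterable[str]) -> Counter:
    """Tally column 3 (feature type) of a GFF3 annotation."""
    counts: Counter = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # GFF3 allows an embedded FASTA section after this directive
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # GFF3 feature lines have nine tab-separated columns
            counts[fields[2]] += 1
    return counts


example = [
    "##gff-version 3",
    "contig_1\texample\tgene\t1\t900\t.\t+\t.\tID=gene1",
    "contig_1\texample\tCDS\t1\t900\t.\t+\t0\tID=cds1;Parent=gene1",
    "contig_1\texample\ttRNA\t1000\t1075\t.\t-\t.\tID=trna1",
    "##FASTA",
    ">contig_1",
    "ACGT",
]
counts = count_feature_types(example)
print(counts["CDS"], counts["tRNA"], counts["rRNA"])  # 1 1 0
```

With a real file, `count_feature_types(open(path))` works the same way, because iterating over a text file yields lines.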

July 7, 2019

Genome sequence of Bacillus velezensis SGAir0473, isolated from tropical air collected in Singapore.

Bacillus velezensis strain SGAir0473 (Firmicutes) was isolated from tropical air collected in Singapore. Its genome was assembled using short reads and single-molecule real-time sequencing and comprises one 4.18-Mb chromosome. The genome consists of 3,937 protein-coding genes, 86 tRNAs, and 27 rRNAs. Copyright © 2018 Lim et al.

July 7, 2019

Genome sequence of Pantoea ananatis SGAir0210, isolated from outdoor air in Singapore.

Pantoea ananatis SGAir0210 was isolated from outdoor air collected in Singapore. The genome was assembled from long reads generated by single-molecule real-time sequencing complemented with short reads. The genome size was approximately 4.81 Mb, with 4,303 protein-coding genes, 80 tRNAs, and 22 rRNAs identified. Copyright © 2018 Luhung et al.

July 7, 2019

Interpreting whole-genome sequence analyses of foodborne bacteria for regulatory applications and outbreak investigations.

Whole-genome sequence (WGS) analysis has revolutionized the food safety industry by enabling high-resolution typing of foodborne bacteria. Higher resolving power allows investigators to identify origins of contamination during illness outbreaks and regulatory activities quickly and accurately. Government agencies and industry stakeholders worldwide are now analyzing WGS data routinely. Although researchers have published many studies that assess the efficacy of WGS data analysis for source attribution, guidance for interpreting WGS analyses is lacking. Here, we provide the framework for interpreting WGS analyses used by the Food and Drug Administration’s Center for Food Safety and Applied Nutrition (CFSAN). We based this framework on the experiences of CFSAN investigators, collaborations and interactions with government and industry partners, and evaluation of the published literature. A fundamental question for investigators is whether two or more bacteria arose from the same source of contamination. Analysts often count the numbers of nucleotide differences [single-nucleotide polymorphisms (SNPs)] between two or more genome sequences to measure genetic distances. However, using SNP thresholds alone to assess whether bacteria originated from the same source can be misleading. Bacteria that are isolated from food, environmental, or clinical samples are representatives of bacterial populations. These populations are subject to evolutionary forces that can change genome sequences. Therefore, interpreting WGS analyses of foodborne bacteria requires a more sophisticated approach. Here, we present a framework for interpreting WGS analyses that combines SNP counts with phylogenetic tree topologies and bootstrap support. We also clarify the roles of WGS, epidemiological, traceback, and other evidence in forming the conclusions of investigations. Finally, we present examples that illustrate the application of this framework to real-world situations.
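The SNP-count step that the framework starts from can be sketched as a pairwise distance over a core-genome alignment. The sketch assumes the isolates are already aligned to equal length and skips any position where either sequence has a gap or ambiguous base (pairwise deletion, an assumption; pipelines differ here). As the abstract stresses, these counts alone should not decide whether isolates share a source; the phylogenetic topology and bootstrap support that the framework adds are not attempted here. Isolate names and sequences are toy data, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple

NUCLEOTIDES = set("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where both sequences have an unambiguous base and they differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in NUCLEOTIDES and y in NUCLEOTIDES and x != y
    )


def snp_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Pairwise SNP distances for every pair of isolates (names sorted within each pair)."""
    return {
        (p, q): snp_distance(alignment[p], alignment[q])
        for p, q in combinations(sorted(alignment), 2)
    }


aln = {
    "food": "ACGTACGT",
    "clinical1": "ACGTACGA",
    "clinical2": "ACNTTCGA",
}
print(snp_matrix(aln))
# {('clinical1', 'clinical2'): 1, ('clinical1', 'food'): 1, ('clinical2', 'food'): 2}
```

The `N` in `clinical2` is ignored rather than counted as a difference, which is one reason reported SNP distances can vary between pipelines even for the same isolates.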
