Hybrid assembly Archives

July 7, 2019 |

Systems biology-guided biodesign of consolidated lignin conversion

Lignin is the second most abundant biopolymer on Earth, yet its utilization for fungible products is complicated by its recalcitrant nature and remains a major challenge for sustainable lignocellulosic biorefineries. In this study, we used a systems biology approach to reveal the carbon utilization pattern and lignin degradation mechanisms in a unique lignin-utilizing Pseudomonas putida strain (A514). The mechanistic study further guided the design of three functional modules to enable a consolidated lignin bioconversion route. First, P. putida A514 mobilized a dye peroxidase-based enzymatic system for lignin depolymerization. This system could be enhanced by overexpressing a secreted multifunctional dye peroxidase to promote a two-fold enhancement of cell growth on insoluble kraft lignin. Second, A514 employed a variety of peripheral and central catabolism pathways to metabolize aromatic compounds, which can be optimized by overexpressing key enzymes. Third, the β-oxidation of fatty acid was up-regulated, whereas fatty acid synthesis was down-regulated when A514 was grown on lignin and vanillic acid. Therefore, the functional module for polyhydroxyalkanoate (PHA) production was designed to rechannel β-oxidation products. As a result, PHA content reached 73% per cell dry weight (CDW). Further integrating the three functional modules enhanced the production of PHA from kraft lignin and biorefinery waste. Thus, this study elucidated lignin conversion mechanisms in bacteria with potential industrial implications and laid out the concept for engineering a consolidated lignin conversion route.

July 7, 2019 |

svclassify: a method to establish benchmark structural variant calls.

The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned bam files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives. We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz. We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.
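The one-class idea in the abstract above (train only on random non-SV regions, then ask whether a candidate looks abnormal) can be illustrated with a toy scorer. This is a minimal sketch under stated assumptions, not svclassify's actual model: the two annotations (a read-depth ratio and a fraction of discordant read pairs), the z-score rule, and all names and numbers are hypothetical.

```python
from statistics import mean, stdev

def fit_one_class(background):
    """Summarise each annotation over non-SV regions as (mean, standard deviation).

    background: list of equal-length annotation vectors from random non-SV regions.
    """
    columns = list(zip(*background))
    return [(mean(col), stdev(col)) for col in columns]

def abnormality(model, candidate):
    """Score a candidate by its largest absolute z-score across annotations.

    A high score means at least one annotation is unlike most of the genome.
    """
    return max(abs(x - mu) / max(sd, 1e-9) for x, (mu, sd) in zip(candidate, model))

# Hypothetical annotations per region: [read-depth ratio, fraction of discordant pairs]
non_sv_regions = [
    [1.00, 0.02], [0.90, 0.03], [1.10, 0.01],
    [1.00, 0.02], [0.95, 0.04], [1.05, 0.00],
]
model = fit_one_class(non_sv_regions)

likely_deletion = [0.50, 0.40]   # half the depth, many discordant pairs
ordinary_region = [1.00, 0.02]   # looks like the background

print(abnormality(model, likely_deletion) > 3.0)   # scored as abnormal
print(abnormality(model, ordinary_region) < 1.0)   # scored as ordinary
```

Training on the negative class only is what makes this a one-class approach: no labelled true SVs are needed to fit the model, only to evaluate it.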

July 7, 2019 |

Assembly of the draft genome of buckwheat and its applications in identifying agronomically useful genes.

Buckwheat (Fagopyrum esculentum Moench; 2n = 2x = 16) is a nutritionally dense annual crop widely grown in temperate zones. To accelerate molecular breeding programmes of this important crop, we generated a draft assembly of the buckwheat genome using short reads obtained by next-generation sequencing (NGS), and constructed the Buckwheat Genome DataBase. After assembling short reads, we determined 387,594 scaffolds as the draft genome sequence (FES_r1.0). The total length of FES_r1.0 was 1,177,687,305 bp, and the N50 of the scaffolds was 25,109 bp. Gene prediction analysis revealed 286,768 coding sequences (CDSs; FES_r1.0_cds) including those related to transposable elements. The total length of FES_r1.0_cds was 212,917,911 bp, and the N50 was 1,101 bp. Of these, the functions of 35,816 CDSs excluding those for transposable elements were annotated by BLAST analysis. To demonstrate the utility of the database, we conducted several test analyses using BLAST and keyword searches. Furthermore, we used the draft genome as a reference sequence for NGS-based markers, and successfully identified novel candidate genes controlling heteromorphic self-incompatibility of buckwheat. The database and draft genome sequence provide a valuable resource that can be used in efforts to develop buckwheat cultivars with superior agronomic traits. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
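The N50 values quoted in the abstract above follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly length. A minimal sketch, using hypothetical toy lengths rather than the FES_r1.0 scaffolds:

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    The N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Hypothetical toy scaffold lengths in bp (not taken from FES_r1.0).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70: 80 + 70 = 150 is half of the 300 bp total
```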

July 7, 2019 |

ReMILO: reference assisted misassembly detection algorithm using short and long reads.

Contigs assembled from the second generation sequencing short reads may contain misassemblies, and thus complicate downstream analysis or even lead to incorrect analysis results. Fortunately, with more and more sequenced species available, it becomes possible to use the reference genome of a closely related species to detect misassemblies. In addition, long reads of the third generation sequencing technology have been more and more widely used, and can also help detect misassemblies. Here, we introduce ReMILO, a reference assisted misassembly detection algorithm that uses both short reads and PacBio SMRT long reads. ReMILO aligns the initial short reads to both the contigs and reference genome, and then constructs a novel data structure called red-black multipositional de Bruijn graph to detect misassemblies. In addition, ReMILO also aligns the contigs to long reads and finds their differences from the long reads to detect more misassemblies. In our performance test on short read assemblies of human chromosome 14 data, ReMILO can detect 41.8-77.9% extensive misassemblies and 33.6-54.5% local misassemblies. On hybrid short and long read assemblies of S. pastorianus data, ReMILO can also detect 60.6-70.9% extensive misassemblies and 28.6-54.0% local misassemblies. The ReMILO software can be downloaded for free under Artistic License 2.0 from this site: https://github.com/songc001/remilo. Contact: baoe@bjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

July 7, 2019 |

Genomic insights into Photobacterium damselae subsp. damselae strain KC-Na-1, isolated from the finless porpoise (Neophocaena asiaeorientalis)

Photobacterium damselae subsp. damselae (PDD) is a marine bacterium that can infect a variety of marine animals and humans. Although this bacterium has been isolated from several stranded dolphins and whales, its pathogenic role in cetaceans is still unclear. In this study, we report the complete genome of PDD strain KC-Na-1 isolated from a finless porpoise (Neophocaena asiaeorientalis) rescued from the South Sea (Republic of Korea). The sequenced genome comprised two chromosomes and four plasmids. Among the recently identified major virulence factors in PDD, only phospholipase (plpV) was found in strain KC-Na-1. Interestingly, two genes homologous to Vibrio thermostable direct hemolysin (tdh) and its transcriptional regulator toxR, which are known virulence factors associated with Vibrio parahaemolyticus, were encoded on the plasmid pPDD-Na-1-3. Based on these results, strain KC-Na-1 may have potential pathogenicity in humans and other marine animals and also could act as a potential virulent strain. To the best of our knowledge, this is the first report of the complete genome sequence of P. damselae.

July 7, 2019 |

Construction of two whole genome radiation hybrid panels for dromedary (Camelus dromedarius): 5000RAD and 15000RAD.

The availability of genomic resources including linkage information for camelids has been very limited. Here, we describe the construction of a set of two radiation hybrid (RH) panels (5000RAD and 15000RAD) for the dromedary (Camelus dromedarius) as a permanent genetic resource for camel genome researchers worldwide. For the 5000RAD panel, a total of 245 female camel-hamster radiation hybrid clones were collected, of which 186 were screened with 44 custom designed marker loci distributed throughout the camel genome. The overall mean retention frequency (RF) of the final set of 93 hybrids was 47.7%. For the 15000RAD panel, 238 male dromedary-hamster radiation hybrid clones were collected, of which 93 were tested using 44 PCR markers. The final set of 90 clones had a mean RF of 39.9%. This 15000RAD panel is an important high-resolution complement to the main 5000RAD panel and an indispensable tool for resolving complex genomic regions. This valuable genetic resource of dromedary RH panels is expected to be instrumental for constructing a high resolution camel genome map. Construction of the set of RH panels is an essential step toward a chromosome-level, reference-quality genome assembly that is critical for advancing camelid genomics and the development of custom genomic tools.
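A mean retention frequency like those reported above can be computed from a clone-by-marker presence matrix as the fraction of marker tests that came back positive. The sketch below assumes that simple definition (the abstract does not spell out the exact averaging used), and the clone names and calls are hypothetical:

```python
def mean_retention_frequency(panel):
    """Return the fraction of positive marker calls across a whole RH panel.

    panel maps a clone name to a list of 0/1 marker calls (1 = marker retained).
    """
    calls = [call for row in panel.values() for call in row]
    if not calls:
        raise ValueError("panel contains no marker calls")
    return sum(calls) / len(calls)

# Hypothetical toy panel: 3 hybrid clones screened with 4 markers each.
toy_panel = {
    "clone_1": [1, 0, 1, 1],
    "clone_2": [0, 0, 1, 0],
    "clone_3": [1, 1, 0, 0],
}
print(f"{mean_retention_frequency(toy_panel):.1%}")  # prints 50.0% (6 of 12 calls positive)
```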

July 7, 2019 |

Complete genome sequence of Methanobrevibacter smithii strain KB11, isolated from a Korean fecal sample.

The archaeon Methanobrevibacter smithii is a major colonizer of the human gut. Methanobrevibacter smithii strain KB11 was newly isolated from a Korean fecal sample. Here, we present the complete genome sequence of strain KB11 and a brief comparison with that of M. smithii type strain ATCC 35061T.

July 7, 2019 |

FMLRC: Hybrid long read error correction using an FM-index.

Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging “hybrid” assemblies that use long reads for scaffolding and short reads for accuracy. We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read only de novo assembly methods. Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Higher throughput and computational efficiency than existing methods will help make more economical use of emerging long read sequencing technologies.
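To make the Burrows-Wheeler Transform and FM-index mentioned above more concrete, here is a minimal sketch of a single-string FM-index that counts how often a k-mer occurs, using backward search. It is an illustration only and does not reproduce FMLRC: a multi-string BWT over many short reads needs a more careful construction, and the class and function names below are hypothetical.

```python
from collections import Counter

def bwt(text):
    """Burrows-Wheeler transform via naive rotation sorting (toy inputs only)."""
    if "$" in text:
        raise ValueError("'$' is reserved as the end-of-text sentinel")
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

class FMIndex:
    def __init__(self, text):
        self.bwt = bwt(text)
        counts = Counter(self.bwt)
        # first[c]: number of characters in the text that sort before c
        self.first = {}
        total = 0
        for c in sorted(counts):
            self.first[c] = total
            total += counts[c]
        # occ[c][i]: number of occurrences of c in bwt[:i]
        self.occ = {c: [0] for c in counts}
        for ch in self.bwt:
            for c in counts:
                self.occ[c].append(self.occ[c][-1] + (ch == c))

    def count(self, pattern):
        """Count occurrences of pattern in the text by backward search."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):
            if c not in self.first:
                return 0
            lo = self.first[c] + self.occ[c][lo]
            hi = self.first[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo

# Hypothetical toy sequence standing in for accurate short-read data.
index = FMIndex("ACGTACGTTACG")
for kmer in ("ACG", "CGT", "TT", "GG"):
    print(kmer, index.count(kmer))
```

Each query takes time proportional to the k-mer length rather than the size of the indexed data, which is the property that makes this kind of index attractive for checking long-read k-mers against a large set of short reads.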

July 7, 2019 |

Tigmint: correcting assembly errors using linked reads from large molecules.

Genome sequencing yields the sequence of many short snippets of DNA (reads) from a genome. Genome assembly attempts to reconstruct the original genome from which these reads were derived. This task is difficult due to gaps and errors in the sequencing data, repetitive sequence in the underlying genome, and heterozygosity. As a result, assembly errors are common. In the absence of a reference genome, these misassemblies may be identified by comparing the sequencing data to the assembly and looking for discrepancies between the two. Once identified, these misassemblies may be corrected, improving the quality of the assembled sequence. Although tools exist to identify and correct misassemblies using Illumina paired-end and mate-pair sequencing, no such tool yet exists that makes use of the long distance information of the large molecules provided by linked reads, such as those offered by the 10x Genomics Chromium platform. We have developed the tool Tigmint to address this gap. To demonstrate the effectiveness of Tigmint, we applied it to assemblies of a human genome using short reads assembled with ABySS 2.0 and other assemblers. Tigmint reduced the number of misassemblies identified by QUAST in the ABySS assembly by 216 (27%). While scaffolding with ARCS alone more than doubled the scaffold NGA50 of the assembly from 3 to 8 Mbp, the combination of Tigmint and ARCS improved the scaffold NGA50 of the assembly over five-fold to 16.4 Mbp. This notable improvement in contiguity highlights the utility of assembly correction in refining assemblies. We demonstrate the utility of Tigmint in correcting the assemblies of multiple tools, as well as in using Chromium reads to correct and scaffold assemblies of long single-molecule sequencing. Scaffolding an assembly that has been corrected with Tigmint yields a final assembly that is both more correct and substantially more contiguous than an assembly that has not been corrected. Using single-molecule sequencing in combination with linked reads enables a genome sequence assembly that achieves both a high sequence contiguity as well as high scaffold contiguity, a feat not currently achievable with either technology alone.
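One way to look for discrepancies between large molecules and an assembly, as described above, is to ask how many molecules span each position of a contig and to treat poorly supported positions as candidate breakpoints. The sketch below is an assumption about how such a check could work, not Tigmint's implementation; the function name, the `min_spanning` threshold, and the molecule coordinates are all hypothetical.

```python
def weak_positions(contig_length, molecules, min_spanning=2):
    """Return contig positions covered by fewer than min_spanning molecules.

    molecules: list of (start, end) half-open intervals on the contig,
    each representing the extent of one inferred large molecule.
    """
    # Difference array: +1 where a molecule starts, -1 where it ends.
    delta = [0] * (contig_length + 1)
    for start, end in molecules:
        if not 0 <= start < end <= contig_length:
            raise ValueError(f"molecule ({start}, {end}) lies outside the contig")
        delta[start] += 1
        delta[end] -= 1
    weak, depth = [], 0
    for pos in range(contig_length):
        depth += delta[pos]
        if depth < min_spanning:
            weak.append(pos)
    return weak

# Hypothetical 20 bp toy contig: two groups of molecules with nothing bridging them.
toy_molecules = [(0, 10), (0, 9), (2, 10), (12, 20), (11, 20), (12, 20)]
print(weak_positions(20, toy_molecules))  # prints [10, 11]
```

In this toy case the unsupported positions sit between two well-covered regions, which is the pattern a chimeric join would be expected to produce; cutting the contig there would separate the two regions.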

July 7, 2019 |

Genome sequences of five Mycobacterium bovis strains isolated from farmed animals and wildlife in Canada.

Mycobacterium bovis is the causative agent of bovine tuberculosis, an infectious disease that affects both animals and humans and thus presents a risk to public health and the livestock industry. Here, we report the genome sequences of five Mycobacterium bovis strains that represent major genotype clusters observed in farmed animals and wildlife in Canada. © Crown copyright 2018.

July 7, 2019 |

Complete genome sequence of the live attenuated vaccine strain Brucella melitensis Rev.1.

Live attenuated vaccines are essential elements in control programs for the prevention of brucellosis. Here, we report the whole-genome sequence of the original Elberg Brucella melitensis Rev.1 vaccine strain, passage 101 (1970). Commercial lines of the original strain have been successfully used in small ruminants worldwide. Copyright © 2018 Salmon-Divon et al.

July 7, 2019 |

Complete genome sequence of the symbiotic strain Bradyrhizobium icense LMTR 13T, isolated from lima bean (Phaseolus lunatus) in Peru.

The complete genome sequence of Bradyrhizobium icense LMTR 13T, a root nodule bacterium isolated from the legume Phaseolus lunatus, is reported here. The genome consists of a circular 8,322,773-bp chromosome which codes for a large and novel symbiotic island as well as genes putatively involved in soil and root colonization. Copyright © 2018 Ormeño-Orrillo et al.

July 7, 2019 |

First detection of a blaCTX-M-15-carrying plasmid in Vibrio alginolyticus.

Vibrio alginolyticus is a gram-negative halophilic bacterium that is widely distributed in seawater and seafood all over the world and is the main bacterial pathogen of marine animals such as fish, shrimp and shellfish. It is also an important human pathogen causing eye, ear and wound infections, as well as gastroenteritis, septicemia, and necrotizing fasciitis [1]. Resistance to extended-spectrum cephalosporins is rarely observed in V. alginolyticus. Here, we report for the first time the identification of a foodborne V. alginolyticus strain, Vb0506, carrying a plasmid encoding blaCTX-M-15.

July 7, 2019 |

First complete genome sequence of Yersinia massiliensis.

Using a combination of Illumina paired-end sequencing, Pacific Biosciences RS II sequencing, and OpGen Argus whole-genome optical mapping, we report here the first complete genome sequence of Yersinia massiliensis. The completed genome consists of a 4.99-Mb chromosome, a 121-kb megaplasmid, and a 57-kb plasmid. © Crown copyright 2018.

July 7, 2019 |

Improved draft genome sequence of a monoteliosporic culture of the karnal bunt (Tilletia indica) pathogen of wheat.

Karnal bunt of wheat is an internationally quarantined fungal disease caused by Tilletia indica that affects the international commercial seed trade of wheat. We announce here the first improved draft genome assembly of a monoteliosporic culture of the Tilletia indica fungus, consisting of 787 scaffolds with an approximate total genome size of 31.83 Mbp, which is more accurate and closer to complete than the previous version. Copyright © 2018 Kumar et al.
