NGS Archives

April 21, 2020

Complete genome sequence and evolution analysis of Psychrobacter sp. YP14 from Gammaridea Gastrointestinal Microbiota of Yap Trench

Psychrobacter sp. YP14, a moderately psychrophilic bacterium belonging to the class Gammaproteobacteria, was isolated from the gastrointestinal microbiota of Gammaridea from the Yap Trench. The strain has one circular chromosome of 2,895,311 bp with a GC content of 44.66%, containing 2,333 protein-coding genes, 53 tRNA genes and 9 rRNA genes. Four plasmids were completely assembled, with sizes of 13,712 bp, 19,711 bp, 36,270 bp and 8,194 bp, respectively. In particular, the genome contains a putative open reading frame (ORF) for dienelactone hydrolase (DLH), an enzyme related to the degradation of chlorinated aromatic hydrocarbons. To better understand the evolution of Psychrobacter sp. YP14 within this genus, six Psychrobacter strains with publicly available complete genomes (G, PRwf-1, DAB_AL43B, AntiMn-1, P11G5 and P2G3) were selected for comparative genomic analysis. The closest phylogenetic relationship was identified between strains G and K5 based on 16S rRNA gene sequences and ANI (average nucleotide identity) values. Analysis of the pan-genome structure found that YP14 has fewer COG clusters associated with transposons and prophages than PRwf-1, which indicates fewer sequence rearrangements. In addition, analysis of stress response-related genes indicates that strain YP14 has fewer strategies for coping with extreme environments, which is consistent with its intestinal habitat. These differences in the metabolism and stress-response strategies of YP14 should support the study of microbial survival and metabolic mechanisms in the deep-sea environment.

April 21, 2020

Hi-C guided assemblies reveal conserved regulatory topologies on X and autosomes despite extensive genome shuffling

Genome rearrangements that occur during evolution impose major challenges on regulatory mechanisms that rely on three-dimensional genome architecture. Here, we developed a scaffolding algorithm and generated chromosome-length assemblies from Hi-C data to study genome topology in three distantly related Drosophila species. We observe extensive genome shuffling between these species, with roughly one synteny breakpoint every six genes. A/B compartments, a set of large gene-dense topologically associating domains (TADs) and spatial contacts between high-affinity sites (HAS) located on the X chromosome have been maintained over 40 million years, indicating architectural conservation at various hierarchical levels. Evolutionarily conserved genes cluster in the vicinity of HAS, while HAS locations appear evolutionarily flexible, thus uncoupling the functional requirement of dosage compensation from individual positions on the linear X chromosome. Therefore, 3D architecture is preserved even in scenarios of thousands of rearrangements, highlighting its relevance for essential processes such as dosage compensation of the X chromosome.
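
A/B compartments like those described above are commonly called from a Hi-C contact matrix by normalizing each diagonal to an observed/expected ratio, computing the Pearson correlation between bins, and taking the sign of the first principal component. The sketch below illustrates that generic procedure with NumPy on a single chromosome's matrix; it is not the authors' scaffolding or compartment pipeline, and the function names are hypothetical. The eigenvector's sign is arbitrary, so in practice it is oriented with gene density or GC content so that one sign marks the gene-rich A compartment.

```python
import numpy as np


def observed_over_expected(contacts: np.ndarray) -> np.ndarray:
    """Divide each contact count by the mean count at the same genomic distance (diagonal)."""
    n = contacts.shape[0]
    oe = np.zeros((n, n), dtype=float)
    for d in range(n):
        diag = np.diagonal(contacts, offset=d).astype(float)
        expected = diag.mean()
        if expected == 0:
            continue  # leave empty diagonals at zero
        idx = np.arange(n - d)
        oe[idx, idx + d] = diag / expected
        oe[idx + d, idx] = diag / expected  # keep the matrix symmetric
    return oe


def compartment_groups(contacts: np.ndarray) -> np.ndarray:
    """Split bins into two groups by the sign of the first eigenvector of the O/E correlation matrix."""
    corr = np.corrcoef(observed_over_expected(contacts))
    _, eigenvectors = np.linalg.eigh(corr)  # eigenvalues come back in ascending order
    pc1 = eigenvectors[:, -1]  # eigenvector of the largest eigenvalue
    # The overall sign is arbitrary; orient with gene density before naming groups A and B.
    return np.where(pc1 >= 0, 1, -1)


# Toy matrix: distance decay plus a 2x enrichment for same-compartment contacts.
truth = np.array([0, 0, 0, 0, 1, 1, 1, 1])
i, j = np.indices((8, 8))
toy = (1.0 / (np.abs(i - j) + 1)) * np.where(truth[i] == truth[j], 2.0, 1.0)
print(compartment_groups(toy))
```

On the toy matrix the first four bins and the last four bins land in opposite groups; which group gets `1` depends on the eigensolver's sign convention.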

April 21, 2020

Quantifying the Benefit Offered by Transcript Assembly on Single-Molecule Long Reads

Third-generation sequencing technologies benefit transcriptome analysis by generating longer sequencing reads. However, not all single-molecule long reads represent full transcripts due to incomplete cDNA synthesis and the sequencing length limit of the platform. This drives a need for long read transcript assembly. We quantify the benefit that can be achieved by using a transcript assembler on long reads. Adding long-read-specific algorithms, we evolved Scallop to make Scallop-LR, a long-read transcript assembler, to handle the computational challenges arising from long read lengths and high error rates. Analyzing 26 SRA PacBio datasets using Scallop-LR, Iso-Seq Analysis, and StringTie, we quantified the amount by which assembly improved Iso-Seq results. Through combined evaluation methods, we found that Scallop-LR identifies 2100–4000 more (for 18 human datasets) or 1100–2200 more (for eight mouse datasets) known transcripts than Iso-Seq Analysis, which does not do assembly. Further, Scallop-LR finds 2.4–4.4 times more potentially novel isoforms than Iso-Seq Analysis for the human and mouse datasets. StringTie also identifies more transcripts than Iso-Seq Analysis. Adding long-read-specific optimizations in Scallop-LR increases the numbers of predicted known transcripts and potentially novel isoforms for the human transcriptome compared to several recent short-read assemblers (e.g. StringTie). Our findings indicate that transcript assembly by Scallop-LR can reveal a more complete human transcriptome.
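
The comparison above counts how many known transcripts each method recovers. One common way to decide that a predicted multi-exon transcript matches a known one is to require an identical intron chain on the same chromosome and strand, as in gffcompare's exact-match ("=") class. The sketch below assumes that criterion, which may differ from the combined evaluation used in the study; the `Transcript` type and the function names are hypothetical.

```python
from typing import Iterable, NamedTuple, Set, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive


class Transcript(NamedTuple):
    chrom: str
    strand: str
    exons: Tuple[Exon, ...]


def intron_chain_key(tx: Transcript) -> tuple:
    """Key a transcript by chromosome, strand and its ordered intron boundaries."""
    exons = sorted(tx.exons)
    introns = tuple((exons[k][1], exons[k + 1][0]) for k in range(len(exons) - 1))
    return (tx.chrom, tx.strand, introns)


def matched_known(predicted: Iterable[Transcript], reference: Iterable[Transcript]) -> Set[tuple]:
    """Known multi-exon transcripts whose intron chain is reproduced exactly by some prediction."""
    ref_keys = {intron_chain_key(t) for t in reference if len(t.exons) > 1}
    pred_keys = {intron_chain_key(t) for t in predicted if len(t.exons) > 1}
    return ref_keys & pred_keys


def extra_known(method_a, method_b, reference) -> int:
    """How many more known transcripts method A recovers than method B."""
    reference = list(reference)
    return len(matched_known(method_a, reference)) - len(matched_known(method_b, reference))


ref = [Transcript("chr1", "+", ((100, 200), (300, 400), (500, 600))),
       Transcript("chr1", "+", ((100, 200), (500, 600)))]
assembled = [Transcript("chr1", "+", ((120, 200), (300, 400), (500, 650))),
             Transcript("chr1", "+", ((150, 200), (500, 580)))]
unassembled = [Transcript("chr1", "+", ((150, 200), (500, 580)))]
print(extra_known(assembled, unassembled, ref))  # 1
```

Terminal exon ends are deliberately ignored, since long reads often start or stop at slightly different positions than the annotation.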

April 21, 2020

Multiple Long-read Sequencing Survey of Herpes Simplex Virus Lytic Transcriptome

Long-read sequencing (LRS) has become increasingly important in RNA research due to its strength in resolving complex transcriptomic architectures. Currently, two LRS platforms have demonstrated adequate performance: Single Molecule Real-Time sequencing by Pacific Biosciences (PacBio) and nanopore sequencing by Oxford Nanopore Technologies (ONT). Even though these techniques produce lower coverage and are more error-prone than short-read sequencing, they continue to be more successful in identifying transcript isoforms, including polycistronic and multi-spliced RNA molecules, as well as transcript overlaps. Recent reports have successfully applied LRS to investigate the transcriptomes of viruses belonging to various families. These studies have substantially increased the number of known viral RNA molecules. In this work, we used the Sequel and MinION platforms from PacBio and ONT, respectively, to characterize the lytic transcriptome of herpes simplex virus type 1 (HSV-1). In most samples, we analyzed the poly(A) fraction of the transcriptome, but we also performed random oligonucleotide-based sequencing. Besides cDNA sequencing, we also carried out native RNA sequencing. Our investigations identified more than 160 previously undetected transcripts, including coding and non-coding RNAs, multi-spliced transcripts, and polycistronic and complex transcripts. Furthermore, we determined previously unsubstantiated transcriptional start sites, polyadenylation sites, and splice sites. A large number of novel transcriptional overlaps were also detected. Random-primed sequencing revealed that each convergent gene pair produces non-polyadenylated read-through RNAs overlapping the partner genes. We also identified novel replication-associated transcripts overlapping the HSV-1 replication origins, and novel LAT variants with very long 5’ regions that are co-terminal with the LAT-0.7kb transcript.
Overall, our results demonstrated that HSV-1 transcripts form an extremely complex pattern of overlaps and that the entire viral genome is transcriptionally active. Both DNA strands are expressed in most, if not all, viral genes.
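
Transcription start sites in long-read data such as these are often inferred by clustering the 5′ ends of full-length reads. The sketch below shows one simple, hypothetical way to do this for reads on a single strand: positions within a fixed window are chained into a cluster, and clusters supported by enough reads are reported at their most frequent position. The `window` and `min_reads` values are assumptions, not the study's parameters, and for minus-strand reads the 5′ end is the read's rightmost coordinate.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def cluster_tss(read_5p_ends: Iterable[int], window: int = 10,
                min_reads: int = 2) -> List[Tuple[int, int]]:
    """Cluster read 5' end positions (one strand); return (peak position, supporting reads)."""
    counts = Counter(read_5p_ends)
    clusters: List[Dict[int, int]] = []
    last_pos = None
    for pos in sorted(counts):
        # Start a new cluster when the gap to the previous position exceeds the window.
        if last_pos is None or pos - last_pos > window:
            clusters.append({})
        clusters[-1][pos] = counts[pos]
        last_pos = pos
    calls = []
    for cluster in clusters:
        total = sum(cluster.values())
        if total >= min_reads:
            peak = max(cluster, key=cluster.get)  # most frequent 5' end in the cluster
            calls.append((peak, total))
    return calls


print(cluster_tss([100, 101, 101, 105, 500, 900, 903]))  # [(101, 4), (900, 2)]
```

Because positions are chained, a long run of closely spaced 5′ ends can grow into one wide cluster; a maximum cluster width is a common refinement.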

April 21, 2020

The Chinese chestnut genome: a reference for species restoration

Forest tree species are increasingly subject to severe mortalities from exotic pests, diseases, and invasive organisms, accelerated by climate change. Forest health issues are threatening multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species, or among surviving trees, introgression of resistance genes into threatened tree species in reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being employed as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of a reference genome with chromosome-scale sequences for Chinese chestnut (C. mollissima), the disease-resistance donor for American chestnut restoration. We also demonstrate the value of the genome as a platform for research and species restoration, including new insights into the evolution of blight resistance in Asian chestnut species, the locations in the genome of ecologically important signatures of selection differentiating American chestnut from Chinese chestnut, the identification of candidate genes for disease resistance, and preliminary comparisons of genome organization with related species.

April 21, 2020

Characterization of LINE-1 transposons in a human genome at allelic resolution

The activity of the retrotransposon LINE-1 has created a substantial portion of the human genome. Most of this sequence comprises fractured and debilitated LINE-1s. An accurate approximation of the number, location, and sequence of the LINE-1 elements present in any single genome has proven elusive due to the difficulty of assembling and phasing the repetitive and polymorphic regions of the human genome. Through an in-depth analysis of publicly available, deep, long-read assemblies of nearly homozygous human genomes, we defined the location and sequence of all intact LINE-1s in these assemblies. We found 148 and 142 intact LINE-1s in two nearly homozygous assemblies. Combining these assemblies suggests that a diploid human genome contains at least 50% more intact LINE-1s than previous estimates: in this case, 290 intact LINE-1s at 194 loci. We think this is the best approximation, to date, of the number of intact LINE-1s in a single diploid human genome. In addition to counting intact LINE-1 elements, we resolved the sequence of each element, including some LINE-1 elements in unassembled, presumably centromeric regions of the genome. A comparison of the intact LINE-1s in each assembly shows the specific pattern of variation between these genomes, including LINE-1s that remain intact in only one genome, allelic variation in shared intact LINE-1s, and LINE-1s that are unique (presumably young) insertions in only one genome. We found that many old elements (> 6 million years old) remain intact, and comparison of the young and intact LINE-1s across assemblies reinforces the notion that only a small portion of all LINE-1 sequences that may be intact in the genomes of the human population has been uncovered.
This dataset provides the first nearly comprehensive estimate of LINE-1 diversity within an individual, an important dataset in the quest to understand the functional consequences of sequence variation in LINE-1 and the complete set of LINE-1s in the human population.
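
The per-assembly totals above can be related by inclusion–exclusion. If each nearly homozygous assembly contributes at most one intact element per locus and loci have been matched between assemblies (assumptions made here for illustration), then 290 elements at 194 loci imply 290 − 194 = 96 loci intact in both assemblies, leaving 52 loci intact only in the first and 46 only in the second. The sketch below does this bookkeeping on sets of locus identifiers; the function name and the synthetic IDs are hypothetical.

```python
from typing import Dict, Set


def summarize_intact_loci(hap1: Set[str], hap2: Set[str]) -> Dict[str, int]:
    """Count intact elements and loci across two haploid assemblies keyed by shared locus IDs."""
    return {
        "elements": len(hap1) + len(hap2),  # assumes one element per locus per assembly
        "loci": len(hap1 | hap2),           # distinct loci across both assemblies
        "shared": len(hap1 & hap2),         # intact in both assemblies
        "hap1_only": len(hap1 - hap2),
        "hap2_only": len(hap2 - hap1),
    }


# Synthetic locus IDs chosen only to reproduce the reported totals (148 and 142 elements, 194 loci).
hap1 = {f"locus{k}" for k in range(0, 148)}
hap2 = {f"locus{k}" for k in range(52, 194)}
print(summarize_intact_loci(hap1, hap2))
# {'elements': 290, 'loci': 194, 'shared': 96, 'hap1_only': 52, 'hap2_only': 46}
```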

April 21, 2020

Evidence of extensive intraspecific noncoding reshuffling in a 169-kb mitochondrial genome of a basidiomycetous fungus

Comparative genomics of fungal mitochondrial genomes (mitogenomes) has revealed a remarkable pattern of rearrangement between and within major phyla owing to horizontal gene transfer (HGT) and recombination. The role of recombination has been exemplified at a finer evolutionary time scale in basidiomycete fungi, which display a diversity of mitochondrial DNA (mtDNA) inheritance patterns. Here, we assembled the mitogenomes of six species from the basidiomycete order Hymenochaetales and examined 59 mitogenomes from two genetic lineages of Pyrrhoderma noxium. Gene order is largely colinear, while intergenic regions are the major determinants of mitogenome size variation. Substantial sequence divergence was found in shared introns, consistent with the high HGT frequency observed in yeasts, but we also identified a rare case of an intron that has been retained in five species since speciation. In contrast to the hyperdiversity observed in the nuclear genome of P. noxium, intraspecific polymorphism in mitochondrial protein-coding sequences is extremely low. An intron-based phylogeny revealed turnover as well as exchange of introns between the two lineages. Strikingly, some strains harbor introns of mosaic origin from both lineages. Analysis of intergenic sequences indicated substantial differences between and within lineages, and an expansion may be ongoing as a result of exchange between distal intergenes. These findings suggest that mtDNA evolution is usually lineage-specific but that chimeric mitotypes are frequently observed, thus capturing the possible evolutionary processes shaping mitogenomes in a basidiomycete. The large mitogenome sizes reported in various basidiomycetes appear to be a result of interspecific reshuffling of intergenes.
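
Low intraspecific polymorphism of the kind described above is usually quantified with nucleotide diversity (π), the average proportion of differing sites over all pairs of aligned sequences. The sketch below computes π for an alignment of one protein-coding gene, skipping gaps and ambiguous bases pair by pair; it is a generic illustration, and whether the study used π or another statistic is not stated in the abstract.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(aligned: Sequence[str]) -> float:
    """Mean pairwise proportion of differing sites (pi) over equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    per_pair = []
    for a, b in combinations(aligned, 2):
        compared = differing = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in "ACGT" and y in "ACGT":  # pairwise deletion of gaps and Ns
                compared += 1
                differing += x != y
        if compared:
            per_pair.append(differing / compared)
    if not per_pair:
        raise ValueError("need at least two sequences with comparable sites")
    return sum(per_pair) / len(per_pair)


print(nucleotide_diversity(["ACGTACGT", "ACGTACGT", "ACGTACGA"]))  # about 0.0833
```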

April 21, 2020

Exceptional subgenome stability and functional divergence in allotetraploid teff, the primary cereal crop in Ethiopia

Teff (Eragrostis tef) is a cornerstone of food security in the Horn of Africa, where it is prized for stress resilience, grain nutrition, and market value. Despite its overall importance to small-scale farmers and communities in Africa, teff suffers from low production compared to other cereals because of limited intensive selection and molecular breeding. Here we report a chromosome-scale genome assembly of allotetraploid teff (variety ‘Dabbi’) and patterns of subgenome dynamics. The teff genome contains two complete sets of homoeologous chromosomes, with most genes maintained as syntenic gene pairs. By analyzing the history of transposable element activity, we estimate that the teff polyploidy event occurred ~1.1 million years ago (mya) and that the two subgenomes diverged ~5.0 mya. Despite this divergence, we detected no large-scale structural rearrangements, homoeologous exchanges, or biased gene loss, in contrast to most other allopolyploid plant systems. The exceptional subgenome stability observed in teff may enable the ubiquitous and recurrent polyploidy within Chloridoideae, possibly contributing to the increased resilience and diversification of these grasses. The two teff subgenomes have partitioned their ancestral functions, as shown by divergent expression patterns among homoeologous gene pairs across a diverse expression atlas. The most striking differences in homoeolog expression bias are observed during seed development and under abiotic stress, and thus may be related to agronomic traits. Together, these genomic resources will be useful for accelerating breeding efforts for this underutilized grain crop and for gaining fundamental insights into polyploid genome evolution.
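
Dating events from transposable element activity commonly relies on LTR retrotransposons: the two long terminal repeats of an element are identical at insertion, so their divergence K grows with time, and the insertion age is estimated as T = K / (2r). The sketch below computes K with the Kimura two-parameter model. The default rate r = 1.3 × 10⁻⁸ substitutions per site per year is a value often used for grasses and is an assumption here, not necessarily the rate or method behind the teff estimates.

```python
import math

TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences (gaps and Ns skipped)."""
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a != b:
            if frozenset((a, b)) in TRANSITIONS:
                transitions += 1
            else:
                transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    # Undefined (math domain error) when the sequences are too divergent.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def ltr_insertion_age(ltr5: str, ltr3: str, rate: float = 1.3e-8) -> float:
    """Insertion age in years, T = K / (2r), from the divergence of the two LTRs."""
    return kimura_2p(ltr5, ltr3) / (2 * rate)


# One transition in 100 aligned sites gives K of about 0.0101, roughly 388,500 years at the assumed rate.
left = "A" * 100
right = "A" * 99 + "G"
print(round(ltr_insertion_age(left, right)))
```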

April 21, 2020

Complete genome sequence and annotation of the laboratory reference strain Shigella flexneri serovar 5a M90T and genome-wide transcription start site determination

Background: Shigella is a Gram-negative facultative intracellular bacterium that causes bacillary dysentery in humans. Shigella invades cells of the colonic mucosa owing to its virulence plasmid-encoded Type 3 Secretion System (T3SS), and multiplies in the target cell cytosol. Although the laboratory reference strain S. flexneri serotype 5a M90T has been extensively used to understand the molecular mechanisms of pathogenesis, its complete genome sequence is not available, which greatly limits studies employing high-throughput sequencing and systems biology approaches. Results: We have sequenced, assembled, annotated and manually curated the full genome of S. flexneri 5a M90T. This yielded two complete circular contigs: the chromosome and the virulence plasmid (pWR100). To obtain the genome sequence, we employed long-read PacBio DNA sequencing followed by polishing with Illumina RNA-seq data, which provides a new pipeline for preparing gapless, highly accurate genome sequences. Furthermore, we performed a genome-wide analysis of transcriptional start sites (TSS) and determined the length of 5’ untranslated regions (5’-UTRs) under typical culture conditions for the inoculum of in vitro infection experiments. We identified 6,723 primary TSS (pTSS) and 7,328 secondary TSS (sTSS). The annotated S. flexneri 5a M90T genome sequence and its transcriptional start sites are integrated into RegulonDB (http://regulondb.ccg.unam.mx) and RSAT (http://embnet.ccg.unam.mx/rsat/), so that their analysis tools can be applied to the S. flexneri 5a M90T genome. Conclusions: We provide the first complete genome for S. flexneri serotype 5a, specifically the laboratory reference strain M90T. Our work opens the possibility of employing S. flexneri M90T in high-quality systems biology studies such as transcriptomic and differential expression analyses or in genome evolution studies.
Moreover, the catalogue of TSS reported here can be used in molecular pathogenesis studies as a resource for identifying which genes are transcribed before infection of host cells. The genome sequence, together with the analysis of transcriptional start sites, is also a valuable tool for precise genetic manipulation of S. flexneri 5a M90T. The hybrid pipeline reported here, which combines long-read genome sequencing with polishing using RNA-seq data, defines a powerful strategy for genome assembly, polishing and annotation in any type of organism.
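
A common convention for splitting TSSs into primary and secondary classes (used in many dRNA-seq studies, and assumed here rather than taken from this work) assigns each TSS within a fixed distance upstream of an annotated start codon to that gene, labels the TSS with the most supporting reads as primary and any others as secondary, and takes the 5′-UTR length as the distance from TSS to start codon. The 300-nt window, the input tuples and the function name below are hypothetical.

```python
from typing import Dict, List, Tuple

TSS = Tuple[int, str, int]   # (position, strand, supporting reads)
Gene = Tuple[str, int, str]  # (name, start-codon position, strand)
Call = Tuple[int, str, int]  # (TSS position, class, 5'-UTR length)


def classify_tss(tss_sites: List[TSS], genes: List[Gene], max_upstream: int = 300) -> Dict[str, List[Call]]:
    """Return, per gene, (TSS position, 'primary' or 'secondary', 5'-UTR length)."""
    calls: Dict[str, List[Call]] = {}
    for name, start, strand in genes:
        candidates = []
        for pos, tss_strand, reads in tss_sites:
            if tss_strand != strand:
                continue
            # Upstream means lower coordinates on '+' and higher coordinates on '-'.
            utr = start - pos if strand == "+" else pos - start
            if 0 <= utr <= max_upstream:
                candidates.append((pos, reads, utr))
        if not candidates:
            continue
        best = max(range(len(candidates)), key=lambda k: candidates[k][1])  # most reads wins
        calls[name] = sorted(
            (pos, "primary" if k == best else "secondary", utr)
            for k, (pos, _reads, utr) in enumerate(candidates)
        )
    return calls


genes = [("geneA", 1000, "+"), ("geneB", 5000, "-")]
sites = [(950, "+", 40), (800, "+", 100), (600, "+", 500), (5020, "-", 12)]
print(classify_tss(sites, genes))
# {'geneA': [(800, 'primary', 200), (950, 'secondary', 50)], 'geneB': [(5020, 'primary', 20)]}
```

The TSS at position 600 is ignored because it lies 400 nt upstream, outside the assumed window.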

April 21, 2020

Loss-of-function tolerance of enhancers in the human genome

Previous studies have surveyed the potential impact of loss-of-function (LoF) variants and identified LoF-tolerant protein-coding genes. However, the tolerance of human genomes to losing enhancers has not yet been evaluated. Here we present a catalog of LoF-tolerant enhancers identified using structural variants from whole-genome sequences. Using a conservative approach, we estimate that each individual human genome possesses at least 28 LoF-tolerant enhancers on average. We assessed the properties of LoF-tolerant enhancers in a unified regulatory network constructed by integrating tissue-specific enhancers and gene-gene interactions. We find that LoF-tolerant enhancers are more tissue-specific and regulate fewer and more dispensable genes. They are enriched in immune-related cells, while LoF-intolerant enhancers are enriched in kidney and brain/neuronal stem cells. We developed a supervised learning approach to predict the LoF-tolerance of enhancers, which achieved an AUROC of 96%. We predict that 5,677 additional enhancers are likely to be LoF-tolerant and that 75 enhancers are highly LoF-intolerant. Our predictions are supported by a known set of disease enhancers and by novel deletions from PacBio sequencing. The LoF-tolerance scores provided here will serve as an important reference for disease studies.
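
The AUROC reported for the classifier can be read as the probability that a randomly chosen LoF-tolerant enhancer receives a higher score than a randomly chosen LoF-intolerant one. The sketch below computes it directly from that pairwise (Mann–Whitney) definition, counting ties as one half; it is a generic illustration and says nothing about the model or features used in the study.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparisons (label 1 = positive class)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(positives) * len(negatives))


print(auroc([0.9, 0.8, 0.7, 0.3, 0.2], [1, 1, 0, 1, 0]))  # 5 of 6 pairs ordered correctly
```

The double loop is O(P × N); for large enhancer sets a rank-based formula gives the same value more efficiently.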

April 21, 2020

Functional genomics reveals extensive diversity in Staphylococcus epidermidis restriction modification systems compared to Staphylococcus aureus

Staphylococcus epidermidis is a significant opportunistic pathogen of humans. Molecular studies in this species have been hampered by the presence of restriction-modification (RM) systems that limit introduction of foreign DNA. Here we establish the complete genomes and methylomes for seven clinically significant, genetically diverse S. epidermidis isolates and perform the first systematic genomic analyses of the type I RM systems within both S. epidermidis and Staphylococcus aureus. Our analyses revealed marked differences in the gene arrangement, chromosomal location and movement of type I RM systems between the two species. Unlike S. aureus, S. epidermidis type I RM systems demonstrate extensive diversity even within a single genetic lineage. This is contrary to current assumptions and has important implications for approaching the genetic manipulation of S. epidermidis. Using Escherichia coli plasmid artificial modification (PAM) to express S. epidermidis hsdMS, we readily overcame restriction barriers in S. epidermidis and achieved transformation efficiencies equivalent to those of modification-deficient mutants. With these functional experiments we demonstrate how genomic data can be used to predict both the functionality of type I RM systems and the potential for a strain to be transformation proficient. We outline an efficient approach for the genetic manipulation of S. epidermidis from diverse genetic backgrounds, including those that have hitherto been intractable. Additionally, we identified S. epidermidis BPH0736, a naturally restriction-defective, clinically significant, multidrug-resistant ST2 isolate, as an ideal candidate for molecular studies.
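
One way genomic data can inform a transformation strategy is by counting how many recognition sites for a strain's type I systems a plasmid carries, since type I restriction enzymes act on unmodified target sites. The sketch below scans both strands of a sequence for a bipartite, IUPAC-coded target motif; the example motif `AGGNNNNNGAT` is hypothetical and does not correspond to a characterized S. epidermidis system.

```python
import re
from typing import List, Tuple

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif: str) -> str:
    # Wrap in a lookahead so overlapping sites are all reported.
    return "(?=" + "".join(IUPAC[base] for base in motif.upper()) + ")"


def find_sites(sequence: str, motif: str) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for each motif occurrence on either strand."""
    seq = sequence.upper()
    hits = [(m.start(), "+") for m in re.finditer(motif_to_regex(motif), seq)]
    rc_motif = reverse_complement(motif)
    if rc_motif != motif.upper():  # avoid double-counting palindromic motifs
        hits += [(m.start(), "-") for m in re.finditer(motif_to_regex(rc_motif), seq)]
    return sorted(hits)


print(find_sites("TTAGGCCCCCGATTTATCAAAAACCT", "AGGNNNNNGAT"))  # [(2, '+'), (15, '-')]
```

Whether a site actually triggers restriction also depends on its methylation state and on whether the host system is functional, which is what the methylome data address.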

April 21, 2020

Short translational ramp determines efficiency of protein synthesis

It is generally assumed that translation efficiency is governed by translation initiation. However, the efficiency of protein synthesis is regulated by multiple factors, including tRNA abundance, codon composition, mRNA motifs and amino-acid sequence [1–4]. These factors influence the rate of protein synthesis beyond the initiation phase of translation, typically by modulating the rate of peptide-bond formation and, to a lesser extent, that of translocation. The slowdown in translation during the early elongation phase, known as the 5′ translational ramp, likely contributes to the efficiency of protein synthesis [5–9]. Multiple mechanisms have been proposed to explain the molecular basis of this translational ramp, including tRNA abundance bias [6,9], the rate of translation initiation [10–15], mRNA and ribosome structure [11,12,14,16–18], and retention of initiation factors during early elongation events [19]. Here, we show that the amount of synthesized protein (translation efficiency) depends on a short translational ramp comprising the first 5 codons of the mRNA. Using a library of more than 250,000 reporter sequences combined with in vitro and in vivo protein expression assays, we show that differences in the short ramp can change protein abundance by 3 to 4 orders of magnitude. The observed difference does not depend on tRNA abundance, efficiency of translation initiation, or overall mRNA structure. Instead, we show that translation is regulated by amino-acid sequence composition and local mRNA sequence. Single-molecule measurements of translation kinetics indicate substantial ribosome pausing and abortion of protein synthesis at the 4th or 5th codon for distinct amino-acid or nucleotide compositions.
Introducing preferred sequence motifs precisely at these positions within the mRNA improves the synthesis of recombinant proteins, indicating an evolutionarily conserved mechanism for controlling translational efficiency.
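
Analyses like the one above focus on the first five codons of a coding sequence. The sketch below extracts that window from a CDS beginning at its start codon and reports the encoded amino acids and the window's GC fraction, two kinds of sequence features the abstract relates to ramp efficiency. Counting the start codon as codon 1 is an assumption about how the ramp is indexed, and the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def ramp_features(cds: str, n_codons: int = 5) -> dict:
    """Codons, peptide and GC fraction of the first n codons of a CDS (U is read as T)."""
    window = cds.upper().replace("U", "T")[: 3 * n_codons]
    if len(window) < 3 * n_codons:
        raise ValueError("CDS shorter than the ramp window")
    codons = [window[k : k + 3] for k in range(0, len(window), 3)]
    return {
        "codons": codons,
        "peptide": "".join(CODON_TABLE[c] for c in codons),
        "gc_fraction": sum(base in "GC" for base in window) / len(window),
    }


print(ramp_features("ATGAAACGTTTAGGCTGATAA"))
# {'codons': ['ATG', 'AAA', 'CGT', 'TTA', 'GGC'], 'peptide': 'MKRLG', 'gc_fraction': 0.4}
```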

April 21, 2020

Disruption of the kringle 1 domain of prothrombin leads to late onset mortality in zebrafish

The ability to prevent blood loss in response to injury is a critical, evolutionarily conserved function in all vertebrates. Prothrombin (F2) contributes to both primary and secondary hemostasis through the activation of platelets and the conversion of soluble fibrinogen to insoluble fibrin, respectively. Complete prothrombin deficiency has never been observed in humans and is incompatible with life in mice, limiting the ability to understand the entirety of prothrombin’s in vivo functions. We have previously demonstrated the ability of zebrafish to tolerate loss of both pro- and anticoagulant factors that are embryonic lethal in mammals, making them an ideal model for the study of prothrombin deficiency. Using genome editing with TALENs, we have generated a null allele in zebrafish f2. Homozygous mutant embryos develop normally into early adulthood but eventually all die, with the majority succumbing to internal hemorrhage by 2 months of age. We show that despite the extended survival, the mutants are unable to form occlusive thrombi in either the venous or the arterial system as early as 3–5 days of life, and we were able to phenocopy this early hemostatic defect using direct oral anticoagulants. When the equivalent mutation was engineered into the homologous residues of human prothrombin, secretion and activation were severely reduced, suggesting a possible role for kringle 1 in thrombin maturation and the possibility that the F1.2 fragment has a functional role in exerting the procoagulant effects of thrombin. Together, our data demonstrate the conserved function of thrombin in zebrafish, as well as the requirement for kringle 1 for biosynthesis and activation by prothrombinase. Understanding how zebrafish are able to develop normally and survive into early adulthood without prothrombin will provide important insight into its pleiotropic functions as well as the management of patients with bleeding disorders.

April 21, 2020

metaFlye: scalable long-read metagenome assembly using repeat graphs

Long-read sequencing technologies have substantially improved assemblies of many bacterial isolate genomes compared with the fragmented assemblies produced with short-read technologies. However, assembling complex metagenomic datasets remains a challenge even for state-of-the-art long-read assemblers. To address this gap, we present the metaFlye assembler and demonstrate that it generates highly contiguous and accurate metagenome assemblies. In contrast to short-read metagenomics assemblers, which typically fail to reconstruct full-length 16S rRNA genes, metaFlye captures many 16S rRNA genes within long contigs, thus providing new opportunities for analyzing the microbial “dark matter of life”. We also demonstrate that long-read metagenome assemblers significantly improve full-length plasmid and virus reconstruction compared with short-read assemblers and reveal many novel plasmids and viruses.
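
Repeat graphs collapse each repeated sequence into a single shared structure, so repeats appear as branch points where otherwise distinct paths merge and split. Flye and metaFlye build their repeat graph from approximate repeats in long reads, which the sketch below does not attempt; instead it uses a toy exact-k-mer de Bruijn graph, a related but simpler construction, to show how a sequence shared by two reads becomes a branching node. Reverse complements are ignored for brevity, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i : i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph: Dict[str, Set[str]]) -> Set[str]:
    """Nodes with more than one predecessor or successor, i.e. repeat boundaries."""
    indegree: Dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)
    return {n for n in nodes if len(graph.get(n, ())) > 1 or indegree.get(n, 0) > 1}


# Two reads share the repeat "CCC" but have different flanks.
g = de_bruijn(["AAACCCTTT", "GGGCCCAAA"], k=4)
print(branching_nodes(g))  # {'CCC'}
```

Resolving such a node means deciding which incoming path continues into which outgoing path, which long reads spanning the whole repeat make possible.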

April 21, 2020

CENP-C stabilizes the conformation of CENP-A nucleosomes within the inner kinetochore at human centromere

The centromere is a vital locus on each chromosome that seeds the kinetochore, allowing for a physical connection between the chromosome and the mitotic spindle. At the heart of the centromere is the centromere-specific histone H3 variant CENP-A/CENH3. Throughout the cell cycle, the constitutive centromere-associated network is bound to CENP-A chromatin, but how this protein network modifies CENP-A nucleosome dynamics in vivo is unknown. Here, using a combination of biophysical and biochemical analyses, we provide evidence for two structurally distinct populations of CENP-A nucleosomes that co-exist at human centromeres. These two populations display unique sedimentation patterns, which permit purification of inner-kinetochore-bound CENP-A chromatin away from bulk CENP-A nucleosomes. The bulk population of CENP-A nucleosomes has diminished heights and weakened DNA interactions, whereas CENP-A nucleosomes robustly associated with the inner kinetochore are stabilized in an octameric conformation, with restricted access to nucleosomal DNA. Immuno-labeling coupled to atomic force microscopy of these complexes confirms their identity at nanoscale resolution. These data provide a systematic and detailed description of inner-kinetochore-bound CENP-A chromatin from human centromeres, with implications for the state of CENP-A chromatin that is actively engaged during mitosis.
