Clone sequencing Archives

September 22, 2019

Capturing natural product biosynthetic pathways from uncultivated symbiotic bacteria of marine sponges through metagenome mining: a mini-review

Symbiotic bacteria associated with marine sponges have frequently been proposed as the true producers of many bioactive natural products with potent anticancer activities. However, most of these symbiotic bacteria cannot be cultivated under laboratory conditions, which hampers efforts to access their potent compounds and develop them for therapeutic applications. Metagenome mining is a powerful cultivation-independent tool for finding new natural product biosynthetic pathways in highly complex bacterial consortia. Notable natural products whose biosynthetic pathways have been cloned by metagenome mining include onnamide A, psymberin, the polytheonamides, calyculin, and misakinolide A. Subsequent expression of these pathways in easily cultured bacteria, such as Escherichia coli, could enable sustainable production of rare, promising natural products. This review discusses the principles of metagenome mining developed to access natural product biosynthetic pathways from uncultured symbiotic bacteria of marine sponges, including detecting biosynthetic genes in the sponge metagenome, constructing large metagenomic libraries, rapidly screening those libraries, and sequencing clones. For many natural products made by modular polyketide synthases (PKSs) and PKS/non-ribosomal peptide synthetase (NRPS) hybrids, both the biosynthetic pathways and the structures of the final products can be predicted with high accuracy by bioinformatic analysis, sometimes combined with functional validation. Further metagenome sequencing, integrated with single-cell analysis and chemical studies, could provide insight into the remarkable biosynthetic capacity of uncultivated bacterial symbionts, facilitating the discovery and sustainable production of a wide diversity of complex sponge-derived compounds.

September 22, 2019

Recurrent structural variation, clustered sites of selection, and disease risk for the complement factor H (CFH) gene family.

Structural variation and single-nucleotide variation in the complement factor H (CFH) gene family underlie several complex genetic diseases, including age-related macular degeneration (AMD) and atypical hemolytic uremic syndrome (AHUS). To understand its diversity and evolution, we performed high-quality sequencing of this ~360-kbp locus in six primate lineages, including multiple human haplotypes. Comparative sequence analyses reveal two distinct periods of gene duplication that gave rise to four CFH-related (CFHR) gene paralogs (CFHR2 and CFHR4 ~25-35 Mya; CFHR1 and CFHR3 ~7-13 Mya). Remarkably, all evolutionary breakpoints share a common ~4.8-kbp segment corresponding to an ancestral CFHR gene promoter that has expanded independently throughout primate evolution. This segment has been recurrently reused and juxtaposed with a donor duplication containing exons 8 and 9 of ancestral CFH, creating four CFHR fusion genes, including lineage-specific members of the gene family. Combined analysis of >5,000 AMD cases and controls identifies a significant burden of rare missense mutations that cluster at the N terminus of CFH [P = 5.81 × 10^-8; odds ratio (OR) = 9.8 (3.67-Infinity)]. A bipolar clustering pattern of rare nonsynonymous mutations in patients with AMD (P < 10^-3) and AHUS (P = 0.0079) maps to functional domains that show evidence of positive selection during primate evolution. Our structural variation analysis of >2,400 individuals reveals five recurrent rearrangement breakpoints that vary in frequency between AMD cases and controls. These data suggest a dynamic and recurrent pattern of mutation that is critical both to the emergence of new CFHR genes and to predisposition to complex human genetic disease phenotypes.

September 22, 2019

Proteomic detection of immunoglobulin light chain variable region peptides from amyloidosis patient biopsies.

Immunoglobulin light chain (LC) amyloidosis (AL) is caused by deposition of clonal LCs produced by an underlying plasma cell neoplasm. The clonotypic LC sequences are unique to each patient and cannot be reliably detected by either immunoassays or standard proteomic workflows that target the constant regions of LCs. We addressed this issue by developing a novel sequence template-based workflow that detects LC variable (LCV) region peptides directly from AL amyloid deposits. The workflow was implemented in a CAP/CLIA-compliant clinical laboratory dedicated to proteomic subtyping of amyloid deposits extracted from either formalin-fixed paraffin-embedded tissues or subcutaneous fat aspirates. We evaluated the workflow's performance on a validation cohort of 30 AL patients, whose amyloidogenic clones were identified using a novel proteogenomics method, and 30 controls. When identifying the gene family of the AL clone, the workflow had a recall of 93% and a negative predictive value of 98%. Applying the workflow to a clinical cohort of 500 AL amyloidosis samples revealed a bias in the LCV gene families used by AL clones. We also detected similarity between AL clones deposited in multiple organs of patients with systemic AL. In summary, AL proteomic data sets are rich in LCV region peptides of potential clinical significance that can be recovered with advanced bioinformatics.
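
The abstract reports recall and negative predictive value without defining them. Assuming the standard binary definitions (gene-family identification is treated here simply as a yes/no call against an independent reference, which is a simplification), a minimal Python sketch of the two metrics follows. The `ConfusionCounts` type and the example counts are hypothetical illustrations, not the study's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical tally of workflow calls against an independent reference."""
    tp: int  # reference-positive, called positive
    fn: int  # reference-positive, missed (called negative)
    tn: int  # reference-negative, called negative
    fp: int  # reference-negative, called positive (not used by either metric)


def recall(c: ConfusionCounts) -> float:
    """Recall (sensitivity) = TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def negative_predictive_value(c: ConfusionCounts) -> float:
    """NPV = TN / (TN + FN): share of negative calls that are truly negative."""
    return c.tn / (c.tn + c.fn)


if __name__ == "__main__":
    # Illustrative counts only; NOT taken from the study.
    counts = ConfusionCounts(tp=9, fn=1, tn=19, fp=1)
    print(f"recall = {recall(counts):.2f}")  # 9 / 10
    print(f"NPV    = {negative_predictive_value(counts):.2f}")  # 19 / 20
```

Both metrics have false negatives in the denominator and ignore false positives, so a workflow can score well on both while still making some false-positive calls.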

September 22, 2019

High-resolution comparative analysis of great ape genomes.

Genetic studies of human evolution require high-quality, contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length complementary DNA sequencing with a multiplatform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. By comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single-base-pair to megabase-pair-sized variants. We identified ~17,000 fixed human-specific structural variants, revealing genic and putative regulatory changes that have emerged in humans since divergence from nonhuman apes. Interestingly, these variants are enriched near genes that are down-regulated in human relative to chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors. Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.

September 22, 2019

Single-molecule real-time transcript sequencing facilitates common wheat genome annotation and grain transcriptome research.

The large and complex hexaploid genome has greatly hindered genomic studies of common wheat (Triticum aestivum, AABBDD). Here, we investigated transcripts in developing caryopses of common wheat using the emerging single-molecule real-time (SMRT) sequencing technology PacBio RS II, and assessed the resulting data for improving common wheat genome annotation and grain transcriptome research. We obtained 197,709 full-length non-chimeric (FLNC) reads, 74.6% of which were estimated to carry a complete open reading frame. A total of 91,881 high-quality FLNC reads were identified and mapped to 16,188 chromosomal loci, corresponding to 13,162 known genes and 3,026 new genes not previously annotated. Although some FLNC reads could not be mapped unambiguously to the current draft genome sequence, many of them are likely useful for studying highly similar homoeologous or paralogous loci or for improving chromosomal contig assembly in future research. The 91,881 high-quality FLNC reads represented 22,768 unique transcripts, 9,591 of which were newly discovered. We found 180 transcripts that each span two or three previously annotated adjacent loci, suggesting that these loci should be merged to form correct gene models. Finally, our data facilitated the identification of 6,030 genes differentially regulated during caryopsis development, as well as full-length transcripts for 72 transcribed gluten gene members that are important for the end-use quality of common wheat. Our work demonstrates the value of PacBio transcript sequencing for improving common wheat genome annotation by uncovering previously undiscovered loci and full-length transcripts. The resulting resource may aid further structural genomics and grain transcriptome studies of common wheat.

September 22, 2019

Accurate characterization of the IFITM locus using MiSeq and PacBio sequencing shows genetic variation in Galliformes.

Interferon-inducible transmembrane (IFITM) proteins are effectors of the immune system that have been widely characterized for their role in restricting infection by diverse enveloped and non-enveloped viruses. The chicken IFITM (chIFITM) genes are clustered on chromosome 5, and to date four genes have been annotated: chIFITM1, chIFITM3, chIFITM5 and chIFITM10. However, because this locus was poorly assembled in the Gallus gallus v4 genome, accurate characterization has so far proven problematic. Recently, a new chicken reference genome assembly, Gallus gallus v5, was generated using Sanger, 454, Illumina and PacBio sequencing technologies, revealing considerable differences in the chIFITM locus relative to previous genome releases. We re-sequenced the locus using both Illumina MiSeq and PacBio RS II sequencing technologies and mapped RNA-seq data from the European Nucleotide Archive (ENA) to this finalized chIFITM locus. Using SureSelect capture probes designed to the finalized chIFITM locus, we also sequenced the locus in a different chicken breed, the White Leghorn, and in a turkey. We confirmed the Gallus gallus v5 consensus except for two insertions, of 5 bp and 1 bp, within the chIFITM3 and B4GALNT4 genes, respectively, and a single-base-pair deletion within the B4GALNT4 gene. The pull-down revealed a single amino acid substitution, A63V, in the CIL domain of IFITM2 compared with the Red Junglefowl, and 13, 13 and 11 differences between the chicken and turkey IFITM1, IFITM2 and IFITM3, respectively. RNA-seq shows chIFITM2 and chIFITM3 expression in numerous tissue types from different chicken breeds and in avian cell lines, whereas expression of the putative chIFITM1 is limited to the testis, caecum and ileum. Locus resequencing using these capture probes, together with RNA-seq-based expression analysis, will allow further characterization of genetic diversity within Galliformes.

September 22, 2019

Improved high-quality genome assembly and annotation of Tibetan hulless barley

Background: The Tibetan hulless barley (Hordeum vulgare L. var. nudum), called “Qingke” in Chinese and “Ne” in Tibetan, is the staple food of Tibetans and an important livestock feed on the Tibetan Plateau. Tibetan hulless barley has about 3,500 years of cultivation history in China and is mainly produced in Tibet, Qinghai, Sichuan, Yunnan and other areas. It is also highly nutritious and beneficial to health, with contents of beta-glucan, dietary fiber, amylopectin and trace elements that are higher than those of any other cereal crop.

Findings: Here, we report an improved, high-quality assembly of the 4.0-Gb Tibetan hulless barley genome. We used the FALCON assembly package together with scaffolding and error-correction tools on PacBio long reads to produce this improved assembly, which has contig and scaffold N50 lengths of 1.563 Mb and 4.006 Mb, respectively, making it nearly two orders of magnitude more contiguous than the original Tibetan hulless barley genome assembly. We also re-annotated the new assembly and report 61,303 stringently filtered, confident putative protein-coding genes, of which 40,457 are high-confidence (HC) genes. We have developed a new, user-friendly Tibetan hulless barley genome database (THBGD) for downloading and using these data and for better managing information on Tibetan hulless barley genetic resources.

Conclusions: The availability of the new Tibetan hulless barley genome and annotations will take the genetics of Tibetan hulless barley to a new level and will greatly simplify breeders' efforts. It will also enrich the granary of the Tibetan people.

Abbreviations: BLAST, Basic Local Alignment Search Tool; BUSCO, Benchmarking Universal Single-Copy Orthologs; QV, quality value; PacBio, Pacific Biosciences; RNA-seq, RNA sequencing; NGS, next-generation sequencing; TGS, third-generation sequencing; THBGD, Tibetan hulless barley Genome Database
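
Contig and scaffold N50 are the contiguity figures quoted in the barley abstract. As a sketch of the standard definition (not the authors' tooling), N50 can be computed by sorting sequence lengths in descending order and returning the length at which the running total first reaches half of the assembly size. The contig lengths in the usage block are toy values, not data from the assembly.

```python
from collections.abc import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return N50: the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid rounding issues with odd totals.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always returns")


if __name__ == "__main__":
    # Toy contig lengths (bp), not from the barley assembly.
    contigs = [100, 80, 60, 40, 20]
    print(n50(contigs))
```

Under this definition, a scaffold N50 of 4.006 Mb means that scaffolds at least 4.006 Mb long together hold at least half of the assembled bases.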

September 22, 2019

Functional characterization of the mucus barrier on the Xenopus tropicalis skin surface.

Mucosal surfaces represent critical routes for the entry and exit of pathogens. As such, animals have evolved strategies to combat infection at these sites, in particular the production of mucus to prevent pathogen attachment and to promote subsequent movement of the mucus and microbes away from the underlying epithelial surface. Using biochemical, biophysical, and infection studies, we have investigated the host-protective properties of the skin mucus barrier of the Xenopus tropicalis tadpole. Specifically, we have characterized the major structural component of the barrier and shown that it is a mucin glycoprotein (Otogelin-like, or Otogl) with sequence, domain organization, and structural properties similar to those of human gel-forming mucins. This mucin forms the structural basis of a surface barrier (~6 µm thick), which is depleted by knockdown of Otogl. Crucially, Otogl knockdown leads to susceptibility to infection by the opportunistic pathogen Aeromonas hydrophila. To more accurately reflect its structure, tissue localization, and function, we have renamed Otogl as Xenopus Skin Mucin, or MucXS. Our findings characterize an accessible and tractable model system for defining mucus barrier function and host-microbe interactions. Copyright © 2018 the Author(s). Published by PNAS.

September 22, 2019

Identification of candidate genes at the Dp-fl locus conferring resistance against the rosy apple aphid Dysaphis plantaginea

The cultivated apple is susceptible to several pests, including the rosy apple aphid (RAA; Dysaphis plantaginea Passerini), whose control is mainly based on chemical treatments. A few cases of aphid resistance have been described in apple germplasm resources, laying the basis for breeding new resistant cultivars. The cultivar ‘Florina’ is resistant to RAA, and the Dp-fl locus responsible for its resistance was recently mapped on linkage group 8 of the apple genome. In this paper, a chromosome walking approach was performed using a ‘Florina’ bacterial artificial chromosome (BAC) library. The walk started from the available tightly linked molecular markers flanking the resistance region. Successive walking steps were performed to identify the minimum tiling path of BAC clones covering the Dp-fl region on both the “resistant” and “susceptible” chromosomes of ‘Florina’. A genomic region of about 279 kb encompassing the Dp-fl resistance locus was fully sequenced using PacBio technology. By developing new polymorphic markers, we narrowed the mapping interval around the resistance locus to a physical region of 95 kb. Annotation of this sequence identified four candidate genes putatively involved in the RAA resistance response.
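
Choosing a minimum tiling path can be framed, under simplifying assumptions, as a minimum interval-cover problem: if each BAC clone has already been placed on a coordinate axis spanning the target region, a greedy scan that always takes the clone extending coverage furthest yields the fewest clones. The sketch below assumes exact, known clone coordinates and treats clones that merely abut as contiguous; real chromosome walking relies on overlapping clones identified by marker screening. `Clone`, the clone names, and the coordinates are hypothetical.

```python
from typing import NamedTuple, Optional


class Clone(NamedTuple):
    name: str   # hypothetical clone identifier
    start: int  # start coordinate in bp (inclusive)
    end: int    # end coordinate in bp (exclusive)


def minimum_tiling_path(
    clones: list[Clone], region_start: int, region_end: int
) -> Optional[list[Clone]]:
    """Greedy minimum interval cover of [region_start, region_end).

    Returns the fewest clones whose union covers the region without gaps,
    or None if some stretch is not covered by any clone.
    """
    ordered = sorted(clones, key=lambda c: c.start)
    path: list[Clone] = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best: Optional[Clone] = None
        # Scan clones that start at or before the current frontier and keep
        # the one reaching furthest to the right.
        while i < len(ordered) and ordered[i].start <= covered_to:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]
            i += 1
        if best is None or best.end <= covered_to:
            return None  # gap: no clone extends coverage past the frontier
        path.append(best)
        covered_to = best.end
    return path


if __name__ == "__main__":
    library = [
        Clone("A", 0, 100),
        Clone("B", 50, 150),
        Clone("C", 90, 200),
        Clone("D", 140, 260),
        Clone("E", 190, 300),
    ]
    path = minimum_tiling_path(library, 0, 300)
    print([c.name for c in path] if path else "gap in coverage")
```

In this toy library, clones spanning 0-100, 50-150, 90-200, 140-260 and 190-300 bp tile the 0-300 bp region with three clones (A, C and E). Clones passed over in one round never reach past the new frontier, so discarding them is safe.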

September 22, 2019

A combinatorial approach to synthetic transcription factor-promoter combinations for yeast strain engineering.

Despite the need for inducible promoters in strain development, most engineering in Saccharomyces cerevisiae continues to rely on a few constitutively active or inducible promoters. Building on advances that exploit the modular nature of both transcription factors and promoter regions, we have built a library of hybrid promoters that are regulated by a synthetic transcription factor. The hybrid promoters consist of native S. cerevisiae promoters in which the operator regions have been replaced with sequences recognized by the bacterial LexA DNA-binding protein. Correspondingly, the synthetic transcription factor (TF) consists of the DNA-binding domain of LexA fused to the human estrogen-binding domain and the viral activator domain VP16. Because the system uses a bacterial DNA-binding domain, it avoids transcription of native S. cerevisiae genes, and the hybrid promoters can be induced with estradiol, a compound with no detectable impact on S. cerevisiae physiology. Using combinations of one, two or three operator sequence repeats and a set of native S. cerevisiae promoters, we obtained a series of hybrid promoters that can be induced to different levels using the same synthetic TF and a given estradiol concentration. This set of promoters, combined with our synthetic TF, has the potential to regulate numerous genes or pathways simultaneously, at multiple desired levels, in a single strain. © 2017 The Authors. Yeast published by John Wiley & Sons, Ltd.
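
The combinatorial design in the yeast abstract, crossing a set of native backbone promoters with one, two or three LexA operator repeats, can be sketched as a Cartesian product. The promoter names and the naming scheme below are hypothetical placeholders; the abstract does not list the specific native promoters used.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class HybridPromoter:
    backbone: str      # native S. cerevisiae promoter (hypothetical name)
    lexa_repeats: int  # number of LexA operator repeats replacing the native operator

    @property
    def name(self) -> str:
        # Hypothetical naming scheme, e.g. "pNATIVE_A-2xLexA"
        return f"{self.backbone}-{self.lexa_repeats}xLexA"


# Placeholder backbones; the abstract does not name the promoters used.
NATIVE_PROMOTERS = ["pNATIVE_A", "pNATIVE_B", "pNATIVE_C"]
OPERATOR_REPEATS = [1, 2, 3]  # one, two or three operator repeats, as in the abstract


def design_library(backbones: list[str], repeat_counts: list[int]) -> list[HybridPromoter]:
    """Enumerate every backbone x repeat-count combination."""
    return [HybridPromoter(b, n) for b, n in product(backbones, repeat_counts)]


if __name__ == "__main__":
    for design in design_library(NATIVE_PROMOTERS, OPERATOR_REPEATS):
        print(design.name)
```

With three placeholder backbones and three repeat counts, the sketch yields nine designs; the induction level of each would still have to be measured experimentally.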

September 22, 2019

De novo assembly of a young Drosophila Y chromosome using single-molecule sequencing and chromatin conformation capture.

While short-read sequencing technology has resulted in a sharp increase in the number of species with genome assemblies, these assemblies are typically highly fragmented. Repeats pose the largest challenge for reference genome assembly, and pericentromeric regions and the repeat-rich Y chromosome are typically excluded from sequencing projects. Here, we assemble the genome of Drosophila miranda using long reads for contig formation, chromatin interaction maps for scaffolding, and short reads, optical mapping and bacterial artificial chromosome (BAC) clone sequencing for consensus validation. Our assembly recovers entire chromosomes and contains large fractions of repetitive DNA, including about 41.5 Mb of pericentromeric and telomeric regions and >100 Mb of the recently formed, highly repetitive neo-Y chromosome. While Y chromosome evolution is typically characterized by global sequence loss and shrinkage, the neo-Y has increased in size almost 3-fold because of the accumulation of repetitive sequences. Our high-quality assembly allows us to reconstruct the chromosomal events that led to the unusual sex chromosome karyotype in D. miranda, including the independent de novo formation of a pair of sex chromosomes at two distinct time points and the reversion of a former Y chromosome to an autosome.

September 22, 2019

Ma orthologous genes in Prunus spp. shed light on a noteworthy NBS-LRR cluster conferring differential resistance to root-knot nematodes.

Root-knot nematodes (RKNs) are major polyphagous pests that severely challenge plants worldwide, especially perennials. Specific genetic resistance in plants relies mainly on NBS-LRR genes, which are pivotal for pathogen control. In Prunus spp., the Ma plum gene and the RMja almond gene confer different spectra of resistance to RKNs. While previous work on the Ma gene allowed it to be cloned and its peculiar TIR-NBS-LRR (TNL) structure to be deciphered, all that was known about RMja was that it maps on the same chromosome as Ma. We carried out high-resolution mapping using a segregating almond F2 progeny of 1,448 seedlings derived from resistant (R) and susceptible (S) parental accessions to locate RMja precisely on the peach genome, the reference sequence for Prunus species. We showed that RMja maps in the Ma resistance cluster and that the Ma ortholog is the best candidate for RMja. This co-localization is a crucial step toward unravelling the molecular determinants of resistance to RKNs. We then sequenced the genomes of both almond parents by NGS and aligned them onto the RKN-susceptible peach reference genome. We produced a BAC library of the R parental accession and, from two overlapping BAC clones, obtained a 336-kb sequence encompassing the RMja candidate region. We could thus investigate sequence polymorphism across three Ma orthologous regions: plum (complete R spectrum), almond (incomplete R spectrum) and peach (null R spectrum). We showed that the Ma TNL cluster has evolved orthologs with a unique conserved structure composed of five repeated post-LRR (PL) domains, which contain most of the polymorphism. In addition to supporting the orthologous relationship between Ma and RMja, our results suggest that the polymorphism in the PL sequences might underlie differential resistance interactions with RKNs and an original immune mechanism in woody perennials. Our study also illustrates how PL exon duplications and losses shape TNL structure and give rise to atypical PL domain repeats of as yet unknown role.

September 22, 2019

Evolutionary conservation of Y Chromosome ampliconic gene families despite extensive structural variation.

Despite claims that the mammalian Y Chromosome is on a path to extinction, comparative sequence analysis of primate Y Chromosomes has shown that the decay of the ancestral single-copy genes has all but ceased in this eutherian lineage. The suite of single-copy Y-linked genes is highly conserved among most eutherian Y Chromosomes owing to strong purifying selection to retain dosage-sensitive genes. In contrast, the ampliconic regions of the Y Chromosome, which contain testis-specific genes that encode most of the transcripts on eutherian Y Chromosomes, evolve rapidly and are thought to undergo species-specific turnover. However, ampliconic genes are known from only a handful of species, limiting insight into their long-term evolutionary dynamics. We used a clone-based sequencing approach employing both long- and short-read sequencing technologies to assemble ~2.4 Mb of representative ampliconic sequence dispersed across the domestic cat Y Chromosome, and identified the major ampliconic gene families and repeat units. We analyzed fluorescence in situ hybridization, qPCR, and whole-genome sequence data from 20 cat species and found that ampliconic gene families are conserved across the cat family Felidae but show high transcript diversity, copy number variation, and structural rearrangement. Our analysis of ampliconic gene evolution reveals a complex pattern of long-term gene content stability despite extensive structural variation on a nonrecombining background. © 2018 Brashear et al.; Published by Cold Spring Harbor Laboratory Press.
