Background Assemblies of diploid genomes are generally unphased, pseudo-haploid representations that do not correctly reconstruct the two parental haplotypes present in the individual sequenced. Instead, the assembly alternates between parental haplotypes and may contain duplications in regions where the parental haplotypes are sufficiently different. Trio binning is an approach to genome assembly that uses short reads from both parents to classify long reads from the offspring according to maternal or paternal haplotype origin, and is thus helped rather than impeded by heterozygosity. Using this approach, it is possible to derive two assemblies from an individual, accurately representing both parental contributions in their entirety with higher continuity and accuracy than is possible with other methods.Results We used trio binning to assemble reference genomes for two species from a single individual using an interspecies cross of yak (Bos grunniens) and cattle (Bos taurus). The high heterozygosity inherent to interspecies hybrids allowed us to confidently assign >99% of long reads from the F1 offspring to parental bins using unique k-mers from parental short reads. Both the maternal (yak) and paternal (cattle) assemblies contain over one third of the acrocentric chromosomes, including the two largest chromosomes, in single haplotigs.Conclusions These haplotigs are the first vertebrate chromosome arms to be assembled gap-free and fully phased, and the first time assemblies for two species have been created from a single individual. Both assemblies are the most continuous currently available for non-model vertebrates.MbmegabaseskbkilobasesMYAmillions of years agoMHCmajor histocompatibility complexSMRTsingle molecule real time
Transcriptional initiation of a small RNA, not R-loop stability, dictates the frequency of pilin antigenic variation in Neisseria gonorrhoeae.
Neisseria gonorrhoeae, the sole causative agent of gonorrhea, constitutively undergoes diversification of the Type IV pilus. Gene conversion occurs between one of the several donor silent copies located in distinct loci and the recipient pilE gene, encoding the major pilin subunit of the pilus. A guanine quadruplex (G4) DNA structure and a cis-acting sRNA (G4-sRNA) are located upstream of the pilE gene and both are required for pilin antigenic variation (Av). We show that the reduced sRNA transcription lowers pilin Av frequencies. Extended transcriptional elongation is not required for Av, since limiting the transcript to 32 nt allows for normal Av frequencies. Using chromatin immunoprecipitation (ChIP) assays, we show that cellular G4s are less abundant when sRNA transcription is lower. In addition, using ChIP, we demonstrate that the G4-sRNA forms a stable RNA:DNA hybrid (R-loop) with its template strand. However, modulating R-loop levels by controlling RNase HI expression does not alter G4 abundance quantified through ChIP. Since pilin Av frequencies were not altered when modulating R-loop levels by controlling RNase HI expression, we conclude that transcription of the sRNA is necessary, but stable R-loops are not required to promote pilin Av. © 2019 John Wiley & Sons Ltd.
CRISPR-Cas9 and BEs system are poised to become the gene editing tool of choice in clinical contexts, however large fragment deletion was found in Cas9-mediated mutation cells without animal level validation. By analyzing 16 gene-edited rabbit lines (including 112 rabbits) generated using SpCas9, BEs, xCas9 and xCas9-BEs with long-range PCR genotyping and long-read sequencing by PacBio platform, we show that extending thousands of bases fragment deletions in single-guide RNA/Cas9 and xCas9 system mutation rabbit, but few large deletions were found in BEs-induced mutation rabbits. We firstly validated that no large fragment deletion induced by BEs system at animal level, suggesting that BE systems can be beneficial tools for the further development of highly accurate and secure gene therapy for the clinical treatment of human genetic disorders
Brassica napus (AACC, 2n = 38) is an important oilseed crop grown worldwide. However, little is known about the population evolution of this species, the genomic difference between its major genetic groups, such as European and Asian rapeseed, and the impacts of historical large-scale introgression events on this young tetraploid. In this study, we reported the de novo assembly of the genome sequences of an Asian rapeseed (B. napus), Ningyou 7, and its four progenitors and compared these genomes with other available genomic data from diverse European and Asian cultivars. Our results showed that Asian rapeseed originally derived from European rapeseed but subsequently significantly diverged, with rapid genome differentiation after hybridization and intensive local selective breeding. The first historical introgression of B. rapa dramatically broadened the allelic pool but decreased the deleterious variations of Asian rapeseed. The second historical introgression of the double-low traits of European rapeseed (canola) has reshaped Asian rapeseed into two groups (double-low and double-high), accompanied by an increase in genetic load in the double-low group. This study demonstrates distinctive genomic footprints and deleterious SNP (single nucleotide polymorphism) variants for local adaptation by recent intra- and interspecies introgression events and provides novel insights for understanding the rapid genome evolution of a young allopolyploid crop. © 2019 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Optimized Cas9 expression systems for highly efficient Arabidopsis genome editing facilitate isolation of complex alleles in a single generation.
Genetic resources for the model plant Arabidopsis comprise mutant lines defective in almost any single gene in reference accession Columbia. However, gene redundancy and/or close linkage often render it extremely laborious or even impossible to isolate a desired line lacking a specific function or set of genes from segregating populations. Therefore, we here evaluated strategies and efficiencies for the inactivation of multiple genes by Cas9-based nucleases and multiplexing. In first attempts, we succeeded in isolating a mutant line carrying a 70 kb deletion, which occurred at a frequency of ~?1.6% in the T2 generation, through PCR-based screening of numerous individuals. However, we failed to isolate a line lacking Lhcb1 genes, which are present in five copies organized at two loci in the Arabidopsis genome. To improve efficiency of our Cas9-based nuclease system, regulatory sequences controlling Cas9 expression levels and timing were systematically compared. Indeed, use of DD45 and RPS5a promoters improved efficiency of our genome editing system by approximately 25-30-fold in comparison to the previous ubiquitin promoter. Using an optimized genome editing system with RPS5a promoter-driven Cas9, putatively quintuple mutant lines lacking detectable amounts of Lhcb1 protein represented approximately 30% of T1 transformants. These results show how improved genome editing systems facilitate the isolation of complex mutant alleles, previously considered impossible to generate, at high frequency even in a single (T1) generation.
A Novel Bacteriophage Exclusion (BREX) System Encoded by the pglX Gene in Lactobacillus casei Zhang.
The bacteriophage exclusion (BREX) system is a novel prokaryotic defense system against bacteriophages. To our knowledge, no study has systematically characterized the function of the BREX system in lactic acid bacteria. Lactobacillus casei Zhang is a probiotic bacterium originating from koumiss. By using single-molecule real-time sequencing, we previously identified N6-methyladenine (m6A) signatures in the genome of L. casei Zhang and a putative methyltransferase (MTase), namely, pglX This work further analyzed the genomic locus near the pglX gene and identified it as a component of the BREX system. To decipher the biological role of pglX, an L. casei Zhang pglX mutant (?pglX) was constructed. Interestingly, m6A methylation of the 5′-ACRCAG-3′ motif was eliminated in the ?pglX mutant. The wild-type and mutant strains exhibited no significant difference in morphology or growth performance in de Man-Rogosa-Sharpe (MRS) medium. A significantly higher plasmid acquisition capacity was observed for the ?pglX mutant than for the wild type if the transformed plasmids contained pglX recognition sites (i.e., 5′-ACRCAG-3′). In contrast, no significant difference was observed in plasmid transformation efficiency between the two strains when plasmids lacking pglX recognition sites were tested. Moreover, the ?pglX mutant had a lower capacity to retain the plasmids than the wild type, suggesting a decrease in genetic stability. Since the Rebase database predicted that the L. casei PglX protein was bifunctional, as both an MTase and a restriction endonuclease, the PglX protein was heterologously expressed and purified but failed to show restriction endonuclease activity. Taken together, the results show that the L. casei Zhang pglX gene is a functional adenine MTase that belongs to the BREX system.IMPORTANCELactobacillus casei Zhang is a probiotic that confers beneficial effects on the host, and it is thus increasingly used in the dairy industry. The possession of an effective bacterial immune system that can defend against invasion of phages and exogenous DNA is a desirable feature for industrial bacterial strains. The bacteriophage exclusion (BREX) system is a recently described phage resistance system in prokaryotes. This work confirmed the function of the BREX system in L. casei and that the methyltransferase (pglX) is an indispensable part of the system. Overall, our study characterizes a BREX system component gene in lactic acid bacteria. Copyright © 2019 American Society for Microbiology.
Gammaherpesvirus Readthrough Transcription Generates a Long Non-Coding RNA That Is Regulated by Antisense miRNAs and Correlates with Enhanced Lytic Replication In Vivo.
Gammaherpesviruses, including the human pathogens Epstein?Barr virus (EBV) and Kaposi’s sarcoma-associated herpesvirus (KSHV) are oncogenic viruses that establish lifelong infections in hosts and are associated with the development of lymphoproliferative diseases and lymphomas. Recent studies have shown that the majority of the mammalian genome is transcribed and gives rise to numerous long non-coding RNAs (lncRNAs). Likewise, the large double-stranded DNA virus genomes of herpesviruses undergo pervasive transcription, including the expression of many as yet uncharacterized lncRNAs. Murine gammaperherpesvirus 68 (MHV68, MuHV-4, ?HV68) is a natural pathogen of rodents, and is genetically and pathogenically related to EBV and KSHV, providing a highly tractable model for studies of gammaherpesvirus biology and pathogenesis. Through the integrated use of parallel data sets from multiple sequencing platforms, we previously resolved transcripts throughout the MHV68 genome, including at least 144 novel transcript isoforms. Here, we sought to molecularly validate novel transcripts identified within the M3/M2 locus, which harbors genes that code for the chemokine binding protein M3, the latency B cell signaling protein M2, and 10 microRNAs (miRNAs). Using strand-specific northern blots, we validated the presence of M3-04, a 3.91 kb polyadenylated transcript that initiates at the M3 transcription start site and reads through the M3 open reading frame (ORF), the M3 poly(a) signal sequence, and the M2 ORF. This unexpected transcript was solely localized to the nucleus, strongly suggesting that it is not translated and instead may function as a lncRNA. Use of an MHV68 mutant lacking two M3-04-antisense pre-miRNA stem loops resulted in highly increased expression of M3-04 and increased virus replication in the lungs of infected mice, demonstrating a key role for these RNAs in regulation of lytic infection. Together these findings suggest the possibility of a tripartite regulatory relationship between the lncRNA M3-04, antisense miRNAs, and the latency gene M2.
Newly emerged wheat blast disease is a serious threat to global wheat production. Wheat blast is caused by a distinct, exceptionally diverse lineage of the fungus causing rice blast disease. Through sequencing a recent field isolate, we report a reference genome that includes seven core chromosomes and mini-chromosome sequences that harbor effector genes normally found on ends of core chromosomes in other strains. No mini-chromosomes were observed in an early field strain, and at least two from another isolate each contain different effector genes and core chromosome end sequences. The mini-chromosome is enriched in transposons occurring most frequently at core chromosome ends. Additionally, transposons in mini-chromosomes lack the characteristic signature for inactivation by repeat-induced point (RIP) mutation genome defenses. Our results, collectively, indicate that dispensable mini-chromosomes and core chromosomes undergo divergent evolutionary trajectories, and mini-chromosomes and core chromosome ends are coupled as a mobile, fast-evolving effector compartment in the wheat pathogen genome.
Phylogenetic barriers to horizontal transfer of antimicrobial peptide resistance genes in the human gut microbiota.
The human gut microbiota has adapted to the presence of antimicrobial peptides (AMPs), which are ancient components of immune defence. Despite its medical importance, it has remained unclear whether AMP resistance genes in the gut microbiome are available for genetic exchange between bacterial species. Here, we show that AMP resistance and antibiotic resistance genes differ in their mobilization patterns and functional compatibilities with new bacterial hosts. First, whereas AMP resistance genes are widespread in the gut microbiome, their rate of horizontal transfer is lower than that of antibiotic resistance genes. Second, gut microbiota culturing and functional metagenomics have revealed that AMP resistance genes originating from phylogenetically distant bacteria have only a limited potential to confer resistance in Escherichia coli, an intrinsically susceptible species. Taken together, functional compatibility with the new bacterial host emerges as a key factor limiting the genetic exchange of AMP resistance genes. Finally, our results suggest that AMPs induce highly specific changes in the composition of the human microbiota, with implications for disease risks.
RADAR-seq: A RAre DAmage and Repair sequencing method for detecting DNA damage on a genome-wide scale.
RAre DAmage and Repair sequencing (RADAR-seq) is a highly adaptable sequencing method that enables the identification and detection of rare DNA damage events for a wide variety of DNA lesions at single-molecule resolution on a genome-wide scale. In RADAR-seq, DNA lesions are replaced with a patch of modified bases that can be directly detected by Pacific Biosciences Single Molecule Real-Time (SMRT) sequencing. RADAR-seq enables dynamic detection over a wide range of DNA damage frequencies, including low physiological levels. Furthermore, without the need for DNA amplification and enrichment steps, RADAR-seq provides sequencing coverage of damaged and undamaged DNA across an entire genome. Here, we use RADAR-seq to measure the frequency and map the location of ribonucleotides in wild-type and RNaseH2-deficient E. coli and Thermococcus kodakarensis strains. Additionally, by tracking ribonucleotides incorporated during in vivo lagging strand DNA synthesis, we determined the replication initiation point in E. coli, and its relation to the origin of replication (oriC). RADAR-seq was also used to map cyclobutane pyrimidine dimers (CPDs) in Escherichia coli (E. coli) genomic DNA exposed to UV-radiation. On a broader scale, RADAR-seq can be applied to understand formation and repair of DNA damage, the correlation between DNA damage and disease initiation and progression, and complex biological pathways, including DNA replication.Copyright © 2019 The Authors. Published by Elsevier B.V. All rights reserved.
In lichen symbiosis, polyol transfer from green algae is important for acquiring the fungal carbon source. However, the existence of polyol transporter genes and their correlation with lichenization remain unclear. Here, we report candidate polyol transporter genes selected from the genome of the lichen-forming fungus (LFF) Ramalina conduplicans. A phylogenetic analysis using characterized polyol and monosaccharide transporter proteins and hypothetical polyol transporter proteins of R. conduplicans and various ascomycetous fungi suggested that the characterized yeast’ polyol transporters form multiple clades with the polyol transporter-like proteins selected from the diverse ascomycetous taxa. Thus, polyol transporter genes are widely conserved among Ascomycota, regardless of lichen-forming status. In addition, the phylogenetic clusters suggested that LFFs belonging to Lecanoromycetes have duplicated proteins in each cluster. Consequently, the number of sequences similar to characterized yeast’ polyol transporters were evaluated using the genomes of 472 species or strains of Ascomycota. Among these, LFFs belonging to Lecanoromycetes had greater numbers of deduced polyol transporter proteins. Thus, various polyol transporters are conserved in Ascomycota and polyol transporter genes appear to have expanded during the evolution of Lecanoromycetes. Copyright © 2019 British Mycological Society. Published by Elsevier Ltd. All rights reserved.
Penicillium purpurogenum Produces a Set of Endoxylanases: Identification, Heterologous Expression, and Characterization of a Fourth Xylanase, XynD, a Novel Enzyme Belonging to Glycoside Hydrolase Family 10.
The fungus Penicillium purpurogenum grows on a variety of natural carbon sources and secretes a large number of enzymes which degrade the polysaccharides present in lignocellulose. In this work, the gene coding for a novel endoxylanase has been identified in the genome of the fungus. This gene (xynd) possesses four introns. The cDNA has been expressed in Pichia pastoris and characterized. The enzyme, XynD, belongs to family 10 of the glycoside hydrolases. Mature XynD has a calculated molecular weight of 40,997. It consists of 387 amino acid residues with an N-terminal catalytic module, a linker rich in ser and thr residues, and a C-terminal family 1 carbohydrate-binding module. XynD shows the highest identity (97%) to a putative endoxylanase from Penicillium subrubescens but its highest identity to a biochemically characterized xylanase (XYND from Penicillium funiculosum) is only 68%. The enzyme has a temperature optimum of 60 °C, and it is highly stable in its pH optimum range of 6.5-8.5. XynD is the fourth biochemically characterized endoxylanase from P. purpurogenum, confirming the rich potential of this fungus for lignocellulose biodegradation. XynD, due to its wide pH optimum and stability, may be a useful enzyme in biotechnological procedures related to this biodegradation process.
A physical and genetic map of Cannabis sativa identifies extensive rearrangements at the THC/CBD acid synthase loci.
Cannabis sativa is widely cultivated for medicinal, food, industrial, and recreational use, but much remains unknown regarding its genetics, including the molecular determinants of cannabinoid content. Here, we describe a combined physical and genetic map derived from a cross between the drug-type strain Purple Kush and the hemp variety “Finola.” The map reveals that cannabinoid biosynthesis genes are generally unlinked but that aromatic prenyltransferase (AP), which produces the substrate for THCA and CBDA synthases (THCAS and CBDAS), is tightly linked to a known marker for total cannabinoid content. We further identify the gene encoding CBCA synthase (CBCAS) and characterize its catalytic activity, providing insight into how cannabinoid diversity arises in cannabis. THCAS and CBDAS (which determine the drug vs. hemp chemotype) are contained within large (>250 kb) retrotransposon-rich regions that are highly nonhomologous between drug- and hemp-type alleles and are furthermore embedded within ~40 Mb of minimally recombining repetitive DNA. The chromosome structures are similar to those in grains such as wheat, with recombination focused in gene-rich, repeat-depleted regions near chromosome ends. The physical and genetic map should facilitate further dissection of genetic and molecular mechanisms in this commercially and medically important plant. © 2019 Laverty et al.; Published by Cold Spring Harbor Laboratory Press.
Wild almond species accumulate the bitter and toxic cyanogenic diglucoside amygdalin. Almond domestication was enabled by the selection of genotypes harboring sweet kernels. We report the completion of the almond reference genome. Map-based cloning using an F1 population segregating for kernel taste led to the identification of a 46-kilobase gene cluster encoding five basic helix-loop-helix transcription factors, bHLH1 to bHLH5. Functional characterization demonstrated that bHLH2 controls transcription of the P450 monooxygenase-encoding genes PdCYP79D16 and PdCYP71AN24, which are involved in the amygdalin biosynthetic pathway. A nonsynonymous point mutation (Leu to Phe) in the dimerization domain of bHLH2 prevents transcription of the two cytochrome P450 genes, resulting in the sweet kernel trait. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Expression of mutant Ataxin-1 with an abnormally expanded polyglutamine domain is necessary for the onset and progression of spinocerebellar ataxia type 1 (SCA1). Understanding how Ataxin-1 expression is regulated in the human brain could inspire novel molecular therapies for this fatal, dominantly inherited neurodegenerative disease. Previous studies have shown that the ATXN1 3’UTR plays a key role in regulating the Ataxin-1 cellular pool via diverse post-transcriptional mechanisms. Here we show that elements within the ATXN1 5’UTR also participate in the regulation of Ataxin-1 expression. PCR and PacBio sequencing analysis of cDNA obtained from control and SCA1 human brain samples revealed the presence of three major, alternatively spliced ATXN1 5’UTR variants. In cell-based assays, fusion of these variants upstream of an EGFP reporter construct revealed significant and differential impacts on total EGFP protein output, uncovering a type of genetic rheostat-like function of the ATXN1 5’UTR. We identified ribosomal scanning of upstream AUG codons and increased transcript instability as potential mechanisms of regulation. Importantly, transcript-based analyses revealed significant differences in the expression pattern of ATXN1 5’UTR variants between control and SCA1 cerebellum. Together, the data presented here shed light into a previously unknown role for the ATXN1 5’UTR in the regulation of Ataxin-1 and provide new opportunities for the development of SCA1 therapeutics. Copyright © 2019. Published by Elsevier Inc.