Background: Assemblies of diploid genomes are generally unphased, pseudo-haploid representations that do not correctly reconstruct the two parental haplotypes present in the individual sequenced. Instead, the assembly alternates between parental haplotypes and may contain duplications in regions where the parental haplotypes are sufficiently different. Trio binning is an approach to genome assembly that uses short reads from both parents to classify long reads from the offspring according to maternal or paternal haplotype origin, and is thus helped rather than impeded by heterozygosity. Using this approach, it is possible to derive two assemblies from an individual, accurately representing both parental contributions in their entirety with higher continuity and accuracy than is possible with other methods. Results: We used trio binning to assemble reference genomes for two species from a single individual using an interspecies cross of yak (Bos grunniens) and cattle (Bos taurus). The high heterozygosity inherent to interspecies hybrids allowed us to confidently assign >99% of long reads from the F1 offspring to parental bins using unique k-mers from parental short reads. Both the maternal (yak) and paternal (cattle) assemblies contain over one third of the acrocentric chromosomes, including the two largest chromosomes, in single haplotigs. Conclusions: These haplotigs are the first vertebrate chromosome arms to be assembled gap-free and fully phased, and the first time assemblies for two species have been created from a single individual. Both assemblies are the most continuous currently available for non-model vertebrates. Abbreviations: Mb, megabases; kb, kilobases; MYA, millions of years ago; MHC, major histocompatibility complex; SMRT, single molecule real time.
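The read-classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, reverse complements and sequencing errors are ignored, and reads are assigned by a simple count of parent-specific k-mers, whereas production tools typically also normalize by the size of each parental k-mer set.

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    # k-mers shared by both parents carry no haplotype information
    return mat - pat, pat - mat


def classify_read(read, mat_only, pat_only, k):
    """Assign an offspring long read to a parental bin by counting marker k-mers."""
    read_kmers = kmers(read, k)
    m = len(read_kmers & mat_only)
    p = len(read_kmers & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"  # tie, or no parent-specific k-mers in the read


# Toy data (assumed for illustration): the parents differ at one base.
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], k=4)
print(classify_read("CGTACGT", mat_only, pat_only, k=4))
print(classify_read("CGTTCGT", mat_only, pat_only, k=4))
```

The more the two parental genomes differ, the more parent-specific k-mers each long read contains, which is why high heterozygosity helps this approach.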
CRISPR-Cas9 and base editor (BE) systems are poised to become the gene-editing tools of choice in clinical contexts; however, large fragment deletions have been found in Cas9-mutated cells, and this finding has not been validated at the animal level. By analyzing 16 gene-edited rabbit lines (comprising 112 rabbits) generated using SpCas9, BEs, xCas9 and xCas9-BEs with long-range PCR genotyping and long-read sequencing on the PacBio platform, we show that fragment deletions extending over thousands of bases occur in rabbits mutated with the single-guide RNA/Cas9 and xCas9 systems, whereas few large deletions were found in rabbits mutated with BEs. This is the first validation at the animal level that the BE system does not induce large fragment deletions, suggesting that BE systems can be beneficial tools for the further development of highly accurate and secure gene therapy for the clinical treatment of human genetic disorders.
Benchmarking Transposable Element Annotation Methods for Creation of a Streamlined, Comprehensive Pipeline
Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and allow for annotation of TEs. There are numerous methods for each class of elements with unknown relative performance metrics. We benchmarked existing programs based on a curated library of rice TEs. Using the most robust programs, we created a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a condensed TE library for annotations of structurally intact and fragmented elements. EDTA is open-source and freely available: https://github.com/oushujun/EDTA. List of abbreviations: TE, Transposable Elements; LTR, Long Terminal Repeat; LINE, Long Interspersed Nuclear Element; SINE, Short Interspersed Nuclear Element; MITE, Miniature Inverted Transposable Element; TIR, Terminal Inverted Repeat; TSD, Target Site Duplication; TP, True Positives; FP, False Positives; TN, True Negative; FN, False Negatives; GRF, Generic Repeat Finder; EDTA, Extensive de-novo TE Annotator.
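The abstract does not state which metrics were computed from the TP, FP, TN and FN counts. The sketch below assumes the standard textbook definitions. The class name, function name and example counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts (for example, base pairs) of an annotation compared with a curated reference."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator, denominator):
    """Divide safely; returns 0.0 when the denominator is zero (an assumed convention)."""
    return numerator / denominator if denominator else 0.0


def benchmark_metrics(c):
    """Compute standard classification metrics from a confusion matrix."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


# Made-up counts, for illustration only.
for name, value in benchmark_metrics(Confusion(tp=80, fp=20, tn=90, fn=10)).items():
    print(f"{name}: {value:.3f}")
```

Reporting several metrics together matters here because an annotator can reach high sensitivity simply by over-annotating, which specificity and precision would expose.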
Haplotype phasing of genetic variants is important for interpretation of the maize genome, population genetic analysis, and functional genomic analysis of allelic activity. Accordingly, accurate methods for phasing full-length isoforms are essential for functional genomics study. In this study, we performed an isoform-level phasing study in maize, using two inbred lines and their reciprocal crosses, based on single-molecule full-length cDNA sequencing. To phase and analyze full-length transcripts between hybrids and parents, we developed a tool called IsoPhase. Using this tool, we validated the majority of SNPs called against matching short read data and identified cases of allele-specific, gene-level, and isoform-level expression. Our results revealed that maize parental and hybrid lines exhibit different splicing activities. After phasing 6,847 genes in two reciprocal hybrids using embryo, endosperm and root tissues, we annotated the SNPs and identified large-effect genes. In addition, based on single-molecule sequencing, we identified parent-of-origin isoforms in maize hybrids, different novel isoforms between maize parent and hybrid lines, and imprinted genes from different tissues. Finally, we characterized variation in cis- and trans-regulatory effects. Our study provides measures of haplotypic expression that could increase power and accuracy in studies of allelic expression.
Defining transgene insertion sites and off-target effects of homology-based gene silencing informs the use of functional genomics tools in Phytophthora infestans.
DNA transformation and homology-based transcriptional silencing are frequently used to assess gene function in Phytophthora. Since unplanned side-effects of these tools are not well-characterized, we used P. infestans to study plasmid integration sites and whether knockdowns caused by homology-dependent silencing spread to other genes. Insertions occurred both in gene-dense and gene-sparse regions but disproportionately near the 5′ ends of genes, which disrupted native coding sequences. Microhomology at the recombination site between plasmid and chromosome was common. Studies of transformants silenced for twelve different gene targets indicated that neighbors within 500 nt were often co-silenced, regardless of whether hairpin or sense constructs were employed and regardless of the direction of transcription of the target. However, cis-spreading of silencing did not occur in all transformants obtained with the same plasmid. Genome-wide studies indicated that unlinked genes with partial complementarity with the silencing-inducing transgene were not usually down-regulated. We learned that hairpin or sense transgenes were not co-silenced with the target in all transformants, which informs how screens for silencing should be performed. We conclude that transformation and gene silencing can be reliable tools for functional genomics in Phytophthora but must be used carefully, especially by testing for the spread of silencing to genes flanking the target.
Morphological and genomic characterisation of the hybrid schistosome infecting humans in Europe reveals a complex admixture between Schistosoma haematobium and Schistosoma bovis parasites
Schistosomes cause schistosomiasis, the world's second most important parasitic disease after malaria. A peculiar feature of schistosomes is their ability to produce viable and fertile hybrids. Originally only present in the tropics, schistosomiasis is now also endemic in Europe. Based on two genetic markers, the European species had been identified as a hybrid between the ruminant-infective Schistosoma bovis and the human-infective Schistosoma haematobium. Here we describe for the first time the genomic composition of the European schistosome hybrid (77% of S. haematobium and 23% of S. bovis origins), its morphometric parameters and its compatibility with the European vector snail and intermediate host. Compatibility is a key parameter for the parasite's life cycle progression. We also show that egg morphology (a classical diagnostic parameter) does not allow for differential diagnosis, whereas genetic tests do. Additionally, we performed genome assembly improvement and annotation of S. bovis, the parental species for which no satisfactory genome assembly was available. For the first time since the discovery of hybrid schistosomes, these results reveal at the whole-genome level a complex admixture of parental genomes, highlighting (i) the high permeability of schistosomes to other species' alleles, and (ii) the importance of hybrid formation for pushing species boundaries not only conceptually but also geographically.
Suppressed recombination allows divergence between homologous sex chromosomes and the functionality of their genes. Here, we reveal patterns of the earliest stages of sex-chromosome evolution in the diploid dioecious herb Mercurialis annua on the basis of cytological analysis, de novo genome assembly and annotation, genetic mapping, exome resequencing of natural populations, and transcriptome analysis. The genome assembly contained 34,105 expressed genes, of which 10,076 were assigned to linkage groups. Genetic mapping and exome resequencing of individuals across the species range both identified the largest linkage group, LG1, as the sex chromosome. Although the sex chromosomes of M. annua are karyotypically homomorphic, we estimate that about a third of the Y chromosome has ceased recombining, containing 568 transcripts and spanning 22.3 cM in the corresponding female map. Nevertheless, we found limited evidence for Y-chromosome degeneration in terms of gene loss and pseudogenization, and most X- and Y-linked genes appear to have diverged in the period subsequent to speciation between M. annua and its sister species M. huetii which shares the same sex-determining region. Taken together, our results suggest that the M. annua Y chromosome has at least two evolutionary strata: a small old stratum shared with M. huetii, and a more recent larger stratum that is probably unique to M. annua and that stopped recombining about one million years ago. Patterns of gene expression within the non-recombining region are consistent with the idea that sexually antagonistic selection may have played a role in favoring suppressed recombination. Copyright © 2019, Genetics.
Salmonella Genomic Island 3 Is an Integrative and Conjugative Element and Contributes to Copper and Arsenic Tolerance of Salmonella enterica.
Salmonella genomic island 3 (SGI3) was first described as a chromosomal island in Salmonella 4,[5],12:i:-, a monophasic variant of Salmonella enterica subsp. enterica serovar Typhimurium. The SGI3 DNA sequence detected in Salmonella 4,[5],12:i:- isolated in Japan was identical to a previously reported sequence across its entire length of 81 kb. SGI3 consists of 86 open reading frames, including a copper homeostasis and silver resistance island (CHASRI) and an arsenic tolerance operon, in addition to genes related to conjugative transfer and DNA replication or partitioning, suggesting that the island is a mobile genetic element. We successfully selected transconjugants that acquired SGI3 after filter-mating experiments using the S. enterica serovars Typhimurium, Heidelberg, Hadar, Newport, Cerro, and Thompson as recipients. Southern blot analysis using I-CeuI-digested genomic DNA demonstrated that SGI3 was integrated into a chromosomal fragment of the transconjugants. PCR and sequencing analysis demonstrated that SGI3 was inserted into the 3′ end of the tRNA genes pheV or pheR. The length of the target site was 52 or 55 bp, and a 55-bp attI sequence indicating generation of the circular form of SGI3 was also detected. The transconjugants had a higher MIC against CuSO4 compared to the recipient strains under anaerobic conditions. Tolerance was defined by the cus gene cluster in the CHASRI. The transconjugants also had distinctly higher MICs against Na2HAsO4 compared to recipient strains under aerobic conditions. These findings clearly demonstrate that SGI3 is an integrative and conjugative element and contributes to the copper and arsenic tolerance of S. enterica. Copyright © 2019 American Society for Microbiology.
African cichlid fishes are well known for their rapid radiations and are a model system for studying evolutionary processes. Here we compare multiple, high-quality, chromosome-scale genome assemblies to elucidate the genetic mechanisms underlying cichlid diversification and study how genome structure evolves in rapidly radiating lineages. We re-anchored our recent assembly of the Nile tilapia (Oreochromis niloticus) genome using a new high-density genetic map. We also developed a new de novo genome assembly of the Lake Malawi cichlid, Metriaclima zebra, using high-coverage Pacific Biosciences sequencing, and anchored contigs to linkage groups (LGs) using 4 different genetic maps. These new anchored assemblies allow the first chromosome-scale comparisons of African cichlid genomes. Large intra-chromosomal structural differences (~2-28 megabase pairs) among species are common, while inter-chromosomal differences are rare (<10 megabase pairs total). Placement of the centromeres within the chromosome-scale assemblies identifies large structural differences that explain many of the karyotype differences among species. Structural differences are also associated with unique patterns of recombination on sex chromosomes. Structural differences on LG9, LG11, and LG20 are associated with reduced recombination, indicative of inversions between the rock- and sand-dwelling clades of Lake Malawi cichlids. M. zebra has a larger number of recent transposable element insertions compared with O. niloticus, suggesting that several transposable element families have a higher rate of insertion in the haplochromine cichlid lineage. This study identifies novel structural variation among East African cichlid genomes and provides a new set of genomic resources to support research on the mechanisms driving cichlid adaptation and speciation. © The Author(s) 2019. Published by Oxford University Press.
Comparative genomic analysis of eight novel haloalkaliphilic bacteriophages from Lake Elmenteita, Kenya.
We report complete genome sequences of eight bacteriophages isolated from the haloalkaline Lake Elmenteita, located on the floor of the Kenyan Rift Valley. The bacteriophages were sequenced and annotated, and a comparative genomic analysis was carried out using various bioinformatics tools to determine the relatedness of the bacteriophages to each other and to those in public databases. Basic genome properties such as genome size, percentage coding density, number of open reading frames, percentage GC content and gene organization revealed that the bacteriophages had no relationship to each other. Comparison with other nucleotide sequences in the GenBank database showed no significant similarities, indicating that the phages are novel. At the amino acid level, the phages in our study showed mosaicism, with genes carrying conserved domains shared with already described phages. Phylogenetic analyses of the large terminase gene, responsible for DNA packaging, and the DNA polymerase gene, responsible for replication, further showed diversity among the bacteriophages. Our results give insight into the diversity of bacteriophages in Lake Elmenteita and provide information on their evolution. By providing primary sequence information, this study not only provides novel sequences for biotechnological exploitation but also sets the stage for future studies aimed at better understanding virus diversity and genomes from haloalkaline lakes in the Rift Valley.
Cultivars of purple tea (Camellia sinensis) that accumulate anthocyanins in place of catechins are currently attracting global interest in their use as functional health beverages. RNA-seq of normal (LJ43) and purple Zijuan (ZJ) cultivars identified the transcription factor CsMYB75 and phi (F) class glutathione transferase CsGSTF1 as being associated with anthocyanin hyperaccumulation. Both genes mapped as a quantitative trait locus (QTL) to the purple bud leaf color (BLC) trait in F1 populations, with CsMYB75 promoting the expression of CsGSTF1 in transgenic tobacco (Nicotiana tabacum). Although CsMYB75 elevates the biosynthesis of both catechins and anthocyanins, only anthocyanins accumulate in purple tea, indicating selective downstream regulation. As glutathione transferases in other plants are known to act as transporters (ligandins) of flavonoids, directing them for vacuolar deposition, the role of CsGSTF1 in selective anthocyanin accumulation was investigated. In tea, anthocyanins accumulate in multiple vesicles, with the expression of CsGSTF1 correlated with BLC, but not with catechin content, in diverse germplasm. Complementation of the Arabidopsis tt19-8 mutant, which is unable to express the orthologous ligandin AtGSTF12, restored anthocyanin accumulation, but did not rescue the transparent testa phenotype, confirming that CsGSTF1 did not function in catechin accumulation. Consistent with a ligandin function, transient expression of CsGSTF1 in Nicotiana occurred in the nucleus, cytoplasm and membrane. Furthermore, RNA-Seq of the complemented mutants exposed to 2% sucrose as a stress treatment showed unexpected roles for anthocyanin accumulation in affecting the expression of genes involved in redox responses, phosphate homeostasis and the biogenesis of photosynthetic components, as compared with non-complemented plants. © 2018 The Authors The Plant Journal © 2018 John Wiley & Sons Ltd.
Wolbachia, an alpha-proteobacterium closely related to Rickettsia, is a maternally transmitted, intracellular symbiont of arthropods and nematodes. Aedes albopictus mosquitoes are naturally infected with Wolbachia strains wAlbA and wAlbB. Cell line Aa23 established from Ae. albopictus embryos retains only wAlbB and is a key model to study host-endosymbiont interactions. We have assembled the complete circular genome of wAlbB from the Aa23 cell line using long-read PacBio sequencing at 500× median coverage. The assembled circular chromosome is 1.48 megabases in size, an increase of more than 300 kb over the published draft wAlbB genome. The annotation of the genome identified 1,205 protein coding genes, 34 tRNA, 3 rRNA, 1 tmRNA, and 3 other ncRNA loci. The long reads enabled sequencing over complex repeat regions which are difficult to resolve with short-read sequencing. Thirteen percent of the genome comprised insertion sequence elements distributed throughout the genome, some of which cause pseudogenization. Prophage WO genes encoding some essential components of phage particle assembly are missing, while the remainder are found in five prophage regions/WO-like islands or scattered around the genome. Orthology analysis identified a core proteome of 535 orthogroups across all completed Wolbachia genomes. The majority of proteins could be annotated using Pfam and eggNOG analyses, including ankyrins and components of the Type IV secretion system. KEGG analysis revealed the absence of five genes in wAlbB which are present in other Wolbachia. The availability of a complete circular chromosome from wAlbB will enable further biochemical, molecular, and genetic analyses on this strain and related Wolbachia. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Finding Nemo’s Genes: A chromosome-scale reference assembly of the genome of the orange clownfish Amphiprion percula.
The iconic orange clownfish, Amphiprion percula, is a model organism for studying the ecology and evolution of reef fishes, including patterns of population connectivity, sex change, social organization, habitat selection and adaptation to climate change. Notably, the orange clownfish is the only reef fish for which a complete larval dispersal kernel has been established and was the first fish species for which it was demonstrated that antipredator responses of reef fishes could be impaired by ocean acidification. Despite its importance, molecular resources for this species remain scarce and until now it lacked a reference genome assembly. Here, we present a de novo chromosome-scale assembly of the genome of the orange clownfish Amphiprion percula. We utilized single-molecule real-time sequencing technology from Pacific Biosciences to produce an initial polished assembly comprising 1,414 contigs, with a contig N50 length of 1.86 Mb. Using Hi-C-based chromatin contact maps, 98% of the genome assembly was placed into 24 chromosomes, resulting in a final assembly of 908.8 Mb in length with contig and scaffold N50s of 3.12 and 38.4 Mb, respectively. This makes it one of the most contiguous and complete fish genome assemblies currently available. The genome was annotated with 26,597 protein-coding genes and contains 96% of the core set of conserved actinopterygian orthologs. The availability of this reference genome assembly as a community resource will further strengthen the role of the orange clownfish as a model species for research on the ecology and evolution of reef fishes. © 2018 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
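The contig and scaffold N50 values quoted above follow a standard definition: the length L such that sequences of length L or longer together cover at least half of the total assembly length. The function below is a minimal sketch of that definition, with a hypothetical name and toy lengths. It does not reproduce the assembly statistics pipeline used in the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("n50() requires at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy contig lengths (assumed for illustration): total is 20, and 6 + 5 >= 10.
print(n50([2, 3, 4, 5, 6]))  # 5
```

Because N50 depends only on the length distribution, anchoring contigs into chromosome-scale scaffolds raises the scaffold N50 sharply without adding any sequence.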
Hybrid crops, an important part of modern agriculture, rely on the development of male and female heterotic gene pools. In sunflowers, heterotic gene pools were developed through the use of crop-wild relatives to produce cytoplasmic male sterile female and branching, fertility restoring male lines. Here, we use genomic data from a diversity panel of male, female, and open-pollinated lines to explore the genetic changes brought about during modern improvement. We find the male lines have diverged most from their open-pollinated progenitors and that genetic differentiation is concentrated in chromosomes 8, 10 and 13, due to introgressions from wild relatives. Ancestral variation from open-pollinated varieties almost universally evolved in parallel for both male and female lines, suggesting little or no selection for heterotic overdominance. Furthermore, we show that gene content differs between the male and female lines and that differentiation in gene content is concentrated in high FST regions. This means that the introgressions that brought branching and fertility restoration to the male lines brought with them different gene content from the ancestral haplotypes, including the removal of some genes. Although we find no evidence that genome-wide gene complementation is responsible for heterosis between male and female lines, several of the genes that are largely absent in either the male or female lines are associated with pathogen defense, suggesting complementation may be functionally relevant for crop breeders.
Icefishes (suborder Notothenioidei; family Channichthyidae) are the only vertebrates that lack functional haemoglobin genes and red blood cells. Here, we report a high-quality genome assembly and linkage map for the Antarctic blackfin icefish Chaenocephalus aceratus, highlighting evolved genomic features for its unique physiology. Phylogenomic analysis revealed that Antarctic fish of the teleost suborder Notothenioidei, including icefishes, diverged from the stickleback lineage about 77 million years ago and subsequently evolved cold-adapted phenotypes as the Southern Ocean cooled to sub-zero temperatures. Our results show that genes involved in protection from ice damage, including genes encoding antifreeze glycoprotein and zona pellucida proteins, are highly expanded in the icefish genome. Furthermore, genes that encode enzymes that help to control cellular redox state, including members of the sod3 and nqo1 gene families, are expanded, probably as evolutionary adaptations to the relatively high concentration of oxygen dissolved in cold Antarctic waters. In contrast, some crucial regulators of circadian homeostasis (cry and per genes) are absent from the icefish genome, suggesting compromised control of biological rhythms in the polar light environment. The availability of the icefish genome sequence will accelerate our understanding of adaptation to extreme Antarctic environments.