Wheat (Triticum aestivum L.), a globally important crop, is challenged by increasing temperatures (heat stress, HS). However its polyploid nature, the incompleteness of its genome sequences and annotation, the lack of comprehensive HS-responsive transcriptomes and the unexplored heat sensing and signaling of wheat hinder our full understanding of its adaptations to HS. The recently released genome sequences of wheat, as well as emerging single-molecular sequencing technologies, provide an opportunity to thoroughly investigate the molecular mechanisms of the wheat response to HS. We generated a high-resolution spatio-temporal transcriptome map of wheat flag leaves and filling grain under HS at 0 min, 5 min, 10 min, 30 min, 1 h and 4 h by combining full-length single-molecular sequencing and Illumina short reads sequencing. This hybrid sequencing newly discovered 4947 loci and 70 285 transcripts, generating the comprehensive and dynamic list of HS-responsive full-length transcripts and complementing the recently released wheat reference genome. Large-scale analysis revealed a global landscape of heat adaptations, uncovering unexpected rapid heat sensing and signaling, significant changes of more than half of HS-responsive genes within 30 min, heat shock factor-dependent and -independent heat signaling, and metabolic alterations in early HS-responses. Integrated analysis also demonstrated the differential responses and partitioned functions between organs and subgenomes, and suggested a differential pattern of transcriptional and alternative splicing regulation in the HS response. This study provided comprehensive data for dissecting molecular mechanisms of early HS responses in wheat and highlighted the genomic plasticity and evolutionary divergence of polyploidy wheat. © 2019 The Authors. The Plant Journal published by John Wiley & Sons Ltd and Society for Experimental Biology.
The development of clustered regularly interspaced short-palindromic repeat (CRISPR)-Cas systems for genome editing has transformed the way life science research is conducted and holds enormous potential for the treatment of disease as well as for many aspects of biotech- nology. Here, I provide a personal perspective on the development of CRISPR-Cas9 for genome editing within the broader context of the field and discuss our work to discover novel Cas effectors and develop them into additional molecular tools. The initial demonstra- tion of Cas9-mediated genome editing launched the development of many other technologies, enabled new lines of biological inquiry, and motivated a deeper examination of natural CRISPR-Cas systems, including the discovery of new types of CRISPR-Cas systems. These new discoveries in turn spurred further technological developments. I review these exciting discoveries and technologies as well as provide an overview of the broad array of applications of these technologies in basic research and in the improvement of human health. It is clear that we are only just beginning to unravel the potential within microbial diversity, and it is quite likely that we will continue to discover other exciting phenomena, some of which it may be possible to repurpose as molecular technologies. The transformation of mysterious natural phenomena to powerful tools, however, takes a collective effort to discover, characterize, and engineer them, and it has been a privilege to join the numerous researchers who have contributed to this transformation of CRISPR-Cas systems.
Improvement of the Pacific bluefin tuna (Thunnus orientalis) reference genome and development of male-specific DNA markers.
The Pacific bluefin tuna, Thunnus orientalis, is a highly migratory species that is widely distributed in the North Pacific Ocean. Like other marine species, T. orientalis has no external sexual dimorphism; thus, identifying sex-specific variants from whole genome sequence data is a useful approach to develop an effective sex identification method. Here, we report an improved draft genome of T. orientalis and male-specific DNA markers. Combining PacBio long reads and Illumina short reads sufficiently improved genome assembly, with a 38-fold increase in scaffold contiguity (to 444 scaffolds) compared to the first published draft genome. Through analysing re-sequence data of 15 males and 16 females, 250 male-specific SNPs were identified from more than 30 million polymorphisms. All male-specific variants were male-heterozygous, suggesting that T. orientalis has a male heterogametic sex-determination system. The largest linkage disequilibrium block (3,174?bp on scaffold_064) contained 51 male-specific variants. PCR primers and a PCR-based sex identification assay were developed using these male-specific variants. The sex of 115 individuals (56 males and 59 females; sex was diagnosed by visual examination of the gonads) was identified with high accuracy using the assay. This easy, accurate, and practical technique facilitates the control of sex ratios in tuna farms. Furthermore, this method could be used to estimate the sex ratio and/or the sex-specific growth rate of natural populations.
The ability to generate long sequencing reads and access long-range linkage information is revolutionizing the quality and completeness of genome assemblies. Here we use a hybrid approach that combines data from four genome sequencing and mapping technologies to generate a new genome assembly of the honeybee Apis mellifera. We first generated contigs based on PacBio sequencing libraries, which were then merged with linked-read 10x Chromium data followed by scaffolding using a BioNano optical genome map and a Hi-C chromatin interaction map, complemented by a genetic linkage map.Each of the assembly steps reduced the number of gaps and incorporated a substantial amount of additional sequence into scaffolds. The new assembly (Amel_HAv3) is significantly more contiguous and complete than the previous one (Amel_4.5), based mainly on Sanger sequencing reads. N50 of contigs is 120-fold higher (5.381 Mbp compared to 0.053 Mbp) and we anchor >?98% of the sequence to chromosomes. All of the 16 chromosomes are represented as single scaffolds with an average of three sequence gaps per chromosome. The improvements are largely due to the inclusion of repetitive sequence that was unplaced in previous assemblies. In particular, our assembly is highly contiguous across centromeres and telomeres and includes hundreds of AvaI and AluI repeats associated with these features.The improved assembly will be of utility for refining gene models, studying genome function, mapping functional genetic variation, identification of structural variants, and comparative genomics.
Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation.
We describe a method that adds long-read sequencing to a mix of technologies used to assemble a highly complex cattle rumen microbial community, and provide a comparison to short read-based methods. Long-read alignments and Hi-C linkage between contigs support the identification of 188 novel virus-host associations and the determination of phage life cycle states in the rumen microbial community. The long-read assembly also identifies 94 antimicrobial resistance genes, compared to only seven alleles in the short-read assembly. We demonstrate novel techniques that work synergistically to improve characterization of biological features in a highly complex rumen microbial community.
The availability of plant reference genomes has ushered in a new era of crop genomics. More than 100 plant genomes have been sequenced since 2000, 63% of which are crop species. These genome sequences provide insight into architecture, evolution and novel aspects of crop genomes such as the retention of key agronomic traits after whole genome duplication events. Some crops have very large, polyploid, repeat-rich genomes, which require innovative strategies for sequencing, assembly and analysis. Even low quality reference genomes have the potential to improve crop germplasm through genome-wide molecular markers, which decrease expensive phenotyping and breeding cycles. The next stage of plant genomics will require draft genome refinement, building resources for crop wild relatives, resequencing broad diversity panels, and plant ENCODE projects to better understand the complexities of these highly diverse genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16?kilobases) reads with random errors, we assembled 99% (244?megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4?megabases. Oropetium is an example of a ‘near-complete’ draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.
The genome sequence assembly of the highly heterozygous Ananas comosus and its varieties is an impressive technical achievement. The sequence opens the door to a greater understanding of pineapple morphology and evolution.
Background: Numerous completed or on-going whole genome sequencing projects have highlighted the fact that obtaining a high quality genome sequence is necessary to address comparative genomics questions such as structural variations among genotypes and gain or loss of specific function. Despite the spectacular progress that has been made in sequencing technologies, obtaining accurate and reliable data is still a challenge, both at the whole genome scale and when targeting specific genomic regions. These problems are even more noticeable for complex plant genomes. Most plant genomes are known to be particularly challenging due to their size, high density of repetitive elements and various levels of ploidy. To overcome these problems, we have developed a strategy to reduce genome complexity by using the large insert BAC libraries combined with next generation sequencing technologies. Results: We compared two different technologies (Roche-454 and Pacific Biosciences PacBio RS II) to sequence pools of BAC clones in order to obtain the best quality sequence. We targeted nine BAC clones from different species (maize, wheat, strawberry, barley, sugarcane and sunflower) known to be complex in terms of sequence assembly. We sequenced the pools of the nine BAC clones with both technologies. We compared assembly results and highlighted differences due to the sequencing technologies used. Conclusions: We demonstrated that the long reads obtained with the PacBio RS II technology serve to obtain a better and more reliable assembly, notably by preventing errors due to duplicated or repetitive sequences in the same region.
Since the introduction of next generation sequencing, plant genome assembly projects do not need to rely on dedicated research facilities or community-wide consortia anymore, even individual research groups can sequence and assemble the genomes they are interested in. However, such assemblies are typically not based on the entire breadth of genomic technologies including genetic and physical maps and their contiguities tend to be low compared to the full-length gold standard reference sequences. Recently emerging third generation genomic technologies like long-read sequencing or optical mapping promise to bridge this quality gap and enable simple and cost-effective solutions for chromosomal-level assemblies.
Extract It may be hard to believe, but the idea of sequence assembly is around 40 years old. Consider this pair of quotes from Rodger Staden (Staden 1979): “With modern fast sequencing techniques and suitable computer programs it is now possible to sequence whole genomes without the need of restriction maps.” “If the 5′ end of the sequence from one gel reading is the same as the 3′ end of the sequence from another the data is said to overlap. If the overlap is of sufficient length to distinguish it from being a repeat in the sequence the two sequences must be contiguous. The data from the two gel readings can then be joined to form one longer continuous sequence.” Replace “gel reading” with “read” and these sentences would go unnoticed in the introduction of any paper today. Here you can also see the birth of jargon that now pervades the field: overlaps between reads form contigs (contiguous sequences). Just a few months later, Gingeras et al. (1979) described “Computer programs for the assembly of DNA sequences.” It all sounds so modern, until the discussion mentions FORTRAN code stored on magnetic tapes. How, then, can we fill an entire special issue of Genome Research with “new advances” so many years later? To me, this reflects the beauty of the problem—simple enough to be stated in a single paragraph, yet complex enough to sustain a field of research for decades. This dichotomy is common to many famous computational problems; indeed, mathematical formulations of sequence assembly fall into a class of problems known as “NP-hard” that do not admit an easy solution (Medvedev et al. 2007). There is another reason for continued advances in sequence assembly—advances in sequencing technology. As evident from the Staden quotes above, the first assembly methods were …
Transcription activator-like (TAL) effectors from Xanthomonas citri subsp. malvacearum (Xcm) are essential for bacterial blight of cotton (BBC). Here, by combining transcriptome profiling with TAL effector-binding element (EBE) prediction, we show that GhSWEET10, encoding a functional sucrose transporter, is induced by Avrb6, a TAL effector determining Xcm pathogenicity. Activation of GhSWEET10 by designer TAL effectors (dTALEs) restores virulence of Xcm avrb6 deletion strains, whereas silencing of GhSWEET10 compromises cotton susceptibility to infections. A BBC-resistant line carrying an unknown recessive b6 gene bears the same EBE as the susceptible line, but Avrb6-mediated induction of GhSWEET10 is reduced, suggesting a unique mechanism underlying b6-mediated resistance. We show via an extensive survey of GhSWEET transcriptional responsiveness to different Xcm field isolates that additional GhSWEETs may also be involved in BBC. These findings advance our understanding of the disease and resistance in cotton and may facilitate the development cotton with improved resistance to BBC.
Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat (Triticum aestivum, genomes AABBDD) and an important genetic resource for wheat. The large size and highly repetitive nature of the Ae. tauschii genome has until now precluded the development of a reference-quality genome sequence. Here we use an array of advanced technologies, including ordered-clone genome sequencing, whole-genome shotgun sequencing, and BioNano optical genome mapping, to generate a reference-quality genome sequence for Ae. tauschii ssp. strangulata accession AL8/78, which is closely related to the wheat D genome. We show that compared to other sequenced plant genomes, including a much larger conifer genome, the Ae. tauschii genome contains unprecedented amounts of very similar repeated sequences. Our genome comparisons reveal that the Ae. tauschii genome has a greater number of dispersed duplicated genes than other sequenced genomes and its chromosomes have been structurally evolving an order of magnitude faster than those of other grass genomes. The decay of colinearity with other grass genomes correlates with recombination rates along chromosomes. We propose that the vast amounts of very similar repeated sequences cause frequent errors in recombination and lead to gene duplications and structural chromosome changes that drive fast genome evolution.
Common bread wheat, Triticum aestivum, has one of the most complex genomes known to science, with 6 copies of each chromosome, enormous numbers of near-identical sequences scattered throughout, and an overall haploid size of more than 15 billion bases. Multiple past attempts to assemble the genome have produced assemblies that were well short of the estimated genome size. Here we report the first near-complete assembly of T. aestivum, using deep sequencing coverage from a combination of short Illumina reads and very long Pacific Biosciences reads. The final assembly contains 15 344 693 583 bases and has a weighted average (N50) contig size of 232 659 bases. This represents by far the most complete and contiguous assembly of the wheat genome to date, providing a strong foundation for future genetic studies of this important food crop. We also report how we used the recently published genome of Aegilops tauschii, the diploid ancestor of the wheat D genome, to identify 4 179 762 575 bp of T. aestivum that correspond to its D genome components.© The Author 2017. Published by Oxford University Press.
Wheat is an important global crop with an extremely large and complex genome that contains more transposable elements (TEs) than any other known crop species. Here, we generated a chromosome-scale, high-quality reference genome of Aegilops tauschii, the donor of the wheat D genome, in which 92.5% sequences have been anchored to chromosomes. Using this assembly, we accurately characterized genic loci, gene expression, pseudogenes, methylation, recombination ratios, microRNAs and especially TEs on chromosomes. In addition to the discovery of a wave of very recent gene duplications, we detected that TEs occurred in about half of the genes, and found that such genes are expressed at lower levels than those without TEs, presumably because of their elevated methylation levels. We mapped all wheat molecular markers and constructed a high-resolution integrated genetic map corresponding to genome sequences, thereby placing previously detected agronomically important genes/quantitative trait loci (QTLs) on the Ae. tauschii genome for the first time.