July 19, 2019

Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution.

Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data. Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. We assumed that the most abundant tandem repeat is the centromere DNA, which was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond approximately 50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution. While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animal and plant genomes. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes.

July 19, 2019

Aluminum tolerance in maize is associated with higher MATE1 gene copy number.

Genome structure variation, including copy number variation and presence/absence variation, comprises a large extent of maize genetic diversity; however, its effect on phenotypes remains largely unexplored. Here, we describe how copy number variation underlies a rare allele that contributes to maize aluminum (Al) tolerance. Al toxicity is the primary limitation for crop production on acid soils, which make up 50% of the world’s potentially arable lands. In a recombinant inbred line mapping population, copy number variation of the Al tolerance gene multidrug and toxic compound extrusion 1 (MATE1) is the basis for the quantitative trait locus of largest effect on phenotypic variation. This expansion in MATE1 copy number is associated with higher MATE1 expression, which in turn results in superior Al tolerance. The three MATE1 copies are identical and are part of a tandem triplication. Only three maize inbred lines carrying the three-copy allele were identified from maize and teosinte diversity panels, indicating that copy number variation for MATE1 is a rare, and quite likely recent, event. These maize lines with higher MATE1 copy number are also Al-tolerant, have high MATE1 expression, and originate from regions of highly acidic soils. Our findings show a role for copy number variation in the adaptation of maize to acidic soils in the tropics and suggest that genome structural changes may be a rapid evolutionary response to new environments.

July 19, 2019

CGGBP1 mitigates cytosine methylation at repetitive DNA sequences.

CGGBP1 is a repetitive DNA-binding transcription regulator with target sites at CpG-rich sequences such as CGG repeats, Alu-SINEs and L1-LINEs. The role of CGGBP1 as a possible mediator of CpG methylation, however, remains unknown. At CpG-rich sequences, cytosine methylation is a major mechanism of transcriptional repression. Concordantly, gene-rich regions typically carry lower levels of CpG methylation than the repetitive elements. It is well known that at the interspersed repeats Alu-SINEs and L1-LINEs, high levels of CpG methylation constitute a transcriptional silencing and retrotransposon inactivating mechanism. Here, we have studied genome-wide CpG methylation with or without CGGBP1-depletion. By high throughput sequencing of bisulfite-treated genomic DNA we have identified CGGBP1 to be a negative regulator of CpG methylation at repetitive DNA sequences. In addition, we have studied CpG methylation alterations on Alu and L1 retrotransposons in CGGBP1-depleted cells using a novel bisulfite-treatment and high throughput sequencing approach. The results clearly show that CGGBP1 is a possible bidirectional regulator of CpG methylation at Alus, and acts as a repressor of methylation at L1 retrotransposons.

July 19, 2019

Assembly and diploid architecture of an individual human genome via single-molecule technologies.

We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality.

July 19, 2019

Single-Molecule Real-Time Sequencing combined with optical mapping yields completely finished fungal genome.

Next-generation sequencing (NGS) technologies have increased the scalability, speed, and resolution of genomic sequencing and, thus, have revolutionized genomic studies. However, eukaryotic genome sequencing initiatives typically yield considerably fragmented genome assemblies. Here, we assessed various state-of-the-art sequencing and assembly strategies in order to produce a contiguous and complete eukaryotic genome assembly, focusing on the filamentous fungus Verticillium dahliae. Compared with Illumina-based assemblies of the V. dahliae genome, hybrid assemblies that also include PacBio-generated long reads establish superior contiguity. Intriguingly, provided that sufficient sequence depth is reached, assemblies solely based on PacBio reads outperform hybrid assemblies and even result in fully assembled chromosomes. Furthermore, the addition of optical map data allowed us to produce a gapless and complete V. dahliae genome assembly of the expected eight chromosomes from telomere to telomere. Consequently, we can now study genomic regions that were previously not assembled or poorly assembled, including regions that are populated by repetitive sequences, such as transposons, allowing us to fully appreciate an organism’s biological complexity. Our data show that a combination of PacBio-generated long reads and optical mapping can be used to generate complete and gapless assemblies of fungal genomes. Studying whole-genome sequences has become an important aspect of biological research. The advent of next-generation sequencing (NGS) technologies has nowadays brought genomic science within reach of most research laboratories, including those that study nonmodel organisms. However, most genome sequencing initiatives typically yield (highly) fragmented genome assemblies. Nevertheless, considerable relevant information related to genome structure and evolution is likely hidden in those nonassembled regions.
Here, we investigated a diverse set of strategies to obtain gapless genome assemblies, using the genome of a typical ascomycete fungus as the template. Eventually, we were able to show that a combination of PacBio-generated long reads and optical mapping yields a gapless telomere-to-telomere genome assembly, allowing in-depth genome analyses to facilitate functional studies into an organism’s biology. Copyright © 2015 Faino et al.

July 19, 2019

The pineapple genome and the evolution of CAM photosynthesis.

Pineapple (Ananas comosus (L.) Merr.) is the most economically valuable crop possessing crassulacean acid metabolism (CAM), a photosynthetic carbon assimilation pathway with high water-use efficiency, and the second most important tropical fruit. We sequenced the genomes of pineapple varieties F153 and MD2 and a wild pineapple relative, Ananas bracteatus accession CB5. The pineapple genome has one fewer ancient whole-genome duplication event than sequenced grass genomes and a conserved karyotype with seven chromosomes from before the ρ duplication event. The pineapple lineage has transitioned from C3 photosynthesis to CAM, with CAM-related genes exhibiting a diel expression pattern in photosynthetic tissues. CAM pathway genes were enriched with cis-regulatory elements associated with the regulation of circadian clock genes, providing the first cis-regulatory link between CAM and circadian clock regulation. Pineapple CAM photosynthesis evolved by the reconfiguration of pathways in C3 plants, through the regulatory neofunctionalization of preexisting genes and not through the acquisition of neofunctionalized genes via whole-genome or tandem gene duplication.

July 19, 2019

Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum.

Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a ‘near-complete’ draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.

July 19, 2019

A supergene determines highly divergent male reproductive morphs in the ruff.

Three strikingly different alternative male mating morphs (aggressive ‘independents’, semicooperative ‘satellites’ and female-mimic ‘faeders’) coexist as a balanced polymorphism in the ruff, Philomachus pugnax, a lek-breeding wading bird. Major differences in body size, ornamentation, and aggressive and mating behaviors are inherited as an autosomal polymorphism. We show that development into satellites and faeders is determined by a supergene consisting of divergent alternative, dominant and non-recombining haplotypes of an inversion on chromosome 11, which contains 125 predicted genes. Independents are homozygous for the ancestral sequence. One breakpoint of the inversion disrupts the essential CENP-N gene (encoding centromere protein N), and pedigree analysis confirms the lethality of homozygosity for the inversion. We describe new differences in behavior, testis size and steroid metabolism among morphs and identify polymorphic genes within the inversion that are likely to contribute to the differences among morphs in reproductive traits.

July 19, 2019

Long-read sequence assembly of the gorilla genome.

Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome. Copyright © 2016, American Association for the Advancement of Science.

July 19, 2019

Radical remodeling of the Y chromosome in a recent radiation of malaria mosquitoes.

Y chromosomes control essential male functions in many species, including sex determination and fertility. However, because of obstacles posed by repeat-rich heterochromatin, knowledge of Y chromosome sequences is limited to a handful of model organisms, constraining our understanding of Y biology across the tree of life. Here, we leverage long single-molecule sequencing to determine the content and structure of the nonrecombining Y chromosome of the primary African malaria mosquito, Anopheles gambiae. We find that the An. gambiae Y consists almost entirely of a few massively amplified, tandemly arrayed repeats, some of which can recombine with similar repeats on the X chromosome. Sex-specific genome resequencing in a recent species radiation, the An. gambiae complex, revealed rapid sequence turnover within An. gambiae and among species. Exploiting 52 sex-specific An. gambiae RNA-Seq datasets representing all developmental stages, we identified a small repertoire of Y-linked genes that lack X gametologs and are not Y-linked in any other species except An. gambiae, with the notable exception of YG2, a candidate male-determining gene. YG2 is the only gene conserved and exclusive to the Y in all species examined, yet sequence similarity to YG2 is not detectable in the genome of a more distant mosquito relative, suggesting rapid evolution of Y chromosome genes in this highly dynamic genus of malaria vectors. The extensive characterization of the An. gambiae Y provides a long-awaited foundation for studying male mosquito biology, and will inform novel mosquito control strategies based on the manipulation of Y chromosomes.

July 19, 2019

Analysis of tandem gene copies in maize chromosomal regions reconstructed from long sequence reads.

Haplotype variation involves not only SNPs but also insertions and deletions, in particular gene copy number variations. However, comparisons of individual genomes have been difficult because traditional sequencing methods produce reads that are too short to unambiguously reconstruct chromosomal regions containing repetitive DNA sequences. An example of such a case is the protein gene family in maize that acts as a sink for reduced nitrogen in the seed. Previously, 41-48 gene copies of the alpha zein gene family that spread over six loci spanning between 30- and 500-kb chromosomal regions have been described in two Iowa Stiff Stalk (SS) inbreds. Analyses of those regions were possible because of overlapping BAC clones, generated by an expensive and labor-intensive approach. Here we used single-molecule real-time (Pacific Biosciences) shotgun sequencing to assemble the six chromosomal regions from the Non-Stiff Stalk maize inbred W22 from a single DNA sequence dataset. To validate the reconstructed regions, we developed an optical map (BioNano genome map; BioNano Genomics) of W22 and found agreement between the two datasets. Using the sequences of full-length cDNAs from W22, we found that the error rate of PacBio sequencing seemed to be less than 0.1% after autocorrection and assembly. Expressed genes, some with premature stop codons, are interspersed with nonexpressed genes, giving rise to genotype-specific expression differences. Alignment of these regions with the previously analyzed regions of SS lines reveals, in part, dramatic differences between these two heterotic groups.

July 19, 2019

Comparative genomics of two sequential Candida glabrata clinical isolates.

Candida glabrata is an important fungal pathogen which develops rapid antifungal resistance in treated patients. It is known that azole treatments lead to antifungal resistance in this fungal species and that multidrug efflux transporters are involved in this process. Specific mutations in the transcriptional regulator PDR1 result in upregulation of the transporters. In addition, we showed that the PDR1 mutations can contribute to enhanced virulence in animal models. In this study, we compared the genomes of two specific C. glabrata-related isolates, one of which was azole susceptible (DSY562) while the other was azole resistant (DSY565). DSY565 contained a PDR1 mutation (L280F) and was isolated after 50 d of azole therapy. We expected that genome comparisons between both isolates could reveal additional mutations reflecting host adaptation or even additional resistance mechanisms. The PacBio technology used here yielded 14 major contigs (sizes 0.18-1.6 Mb) and mitochondrial genomes from both DSY562 and DSY565 isolates that were highly similar to each other. Comparisons of the clinical genomes with the published CBS138 genome indicated important genome rearrangements, but not between the clinical strains. Among the unique features, several retrotransposons were identified in the genomes of the investigated clinical isolates. DSY562 and DSY565 each contained a large set of adhesin-like genes (101 and 107, respectively), which exceed by far the number of reported adhesins (63) in the CBS138 genome. Comparison between DSY562 and DSY565 yielded 17 nonsynonymous SNPs (among which was the expected PDR1 mutation) as well as small size indels in coding regions (11) but mainly in adhesin-like genes.
The genomes contained a DNA mismatch repair allele of MSH2 known to be involved in the so-called hyper-mutator phenotype of this yeast species and the number of accumulated mutations between both clinical isolates is consistent with the presence of an MSH2 defect. In conclusion, this study is the first to compare genomes of C. glabrata sequential clinical isolates using the PacBio technology as an approach. The genomes of these isolates taken from the same patient at two different time points exhibited limited variations, even though they were subjected to host pressure. Copyright © 2017 Vale-Silva et al.

July 19, 2019

PacBio sequencing reveals transposable element as a key contributor to genomic plasticity and virulence variation in Magnaporthe oryzae.

The sustainable cultivation of rice, which serves as staple food crop for more than half of the world’s population, is under serious threat due to the huge yield losses inflicted by rice blast disease caused by the globally destructive fungus Magnaporthe oryzae (Pyricularia oryzae) (Dean et al., 2012, Nalley et al., 2016, Deng et al., 2017). This filamentous ascomycete fungus is also capable of causing blast infection on other economically important cereal crops, including wheat, millet, and barley, making it the world’s most important plant pathogenic fungus (Zhong et al., 2016). The advent of whole-genome sequencing technology and the subsequent deployment of next-generation sequencing (NGS) strategies have successfully generated genome assemblies for over 50 isolates of M. oryzae, which have played an instrumental role in enhancing our understanding of how rice blast fungus undertakes host adaptation, host specificity, and host range expansion to overcome host resistance (Dean et al., 2005, Xue et al., 2012, Wu et al., 2015, Zhang et al., 2016). However, research findings obtained from comparative genomic studies conducted using the NGS-assembled genome do not present an in-depth account of the genomic features that contribute to the prevailing genomic variations among M. oryzae species, because NGS assemblies are highly fragmented and lack most of the lineage-specific (LS) regions, which are more plastic than the core genome and enriched with repeats and effector proteins (Raffaele and Kamoun, 2012, Faino et al., 2016).

July 19, 2019

Structure and distribution of centromeric retrotransposons at diploid and allotetraploid Coffea centromeric and pericentromeric regions.

Centromeric regions of plants are generally composed of large array of satellites from a specific lineage of Gypsy LTR-retrotransposons, called Centromeric Retrotransposons. Repeated sequences interact with a specific H3 histone, playing a crucial function on kinetochore formation. To study the structure and composition of centromeric regions in the genus Coffea, we annotated and classified Centromeric Retrotransposons sequences from the allotetraploid C. arabica genome and its two diploid ancestors: Coffea canephora and C. eugenioides. Ten distinct CRC (Centromeric Retrotransposons in Coffea) families were found. The sequence mapping and FISH experiments of CRC Reverse Transcriptase domains in C. canephora, C. eugenioides, and C. arabica clearly indicate a strong and specific targeting mainly onto proximal chromosome regions, which can be associated also with heterochromatin. PacBio genome sequence analyses of putative centromeric regions on C. arabica and C. canephora chromosomes showed an exceptional density of one family of CRC elements, and the complete absence of satellite arrays, contrasting with usual structure of plant centromeres. Altogether, our data suggest a specific centromere organization in Coffea, contrasting with other plant genomes.

July 7, 2019

Dissecting the fungal biology of Bipolaris papendorfii: from phylogenetic to comparative genomic analysis.

Bipolaris papendorfii has been reported as a fungal plant pathogen that rarely causes opportunistic infection in humans. Secondary metabolites isolated from this fungus possess medicinal and anticancer properties. However, its genetics and basic biology are largely unknown. In this study, we report the first draft genome sequence of B. papendorfii UM 226 isolated from the skin scraping of a patient. The assembled 33.4 Mb genome encodes 11,015 putative coding DNA sequences, of which 2.49% are predicted transposable elements. Multilocus phylogenetic and phylogenomic analyses showed B. papendorfii UM 226 clustering with Curvularia species, apart from other plant pathogenic Bipolaris species. Its genomic features suggest that it is a heterothallic fungus with a putative unique gene encoding the LysM-containing protein which might be involved in fungal virulence on host plants, as well as a wide array of enzymes involved in carbohydrate metabolism, degradation of polysaccharides and lignin in the plant cell wall, secondary metabolite biosynthesis (including dimethylallyl tryptophan synthase, non-ribosomal peptide synthetase, polyketide synthase), the terpenoid pathway and caffeine metabolism. This first genomic characterization of B. papendorfii provides the basis for further studies on its biology, pathogenicity and medicinal potential. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
