Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data.Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. We assumed that…
Genome structure variation, including copy number variation and presence/absence variation, comprises a large extent of maize genetic diversity; however, its effect on phenotypes remains largely unexplored. Here, we describe how copy number variation underlies a rare allele that contributes to maize aluminum (Al) tolerance. Al toxicity is the primary limitation for crop production on acid soils, which make up 50% of the world’s potentially arable lands. In a recombinant inbred line mapping population, copy number variation of the Al tolerance gene multidrug and toxic compound extrusion 1 (MATE1) is the basis for the quantitative trait locus of largest effect on…
CGGBP1 is a repetitive DNA-binding transcription regulator with target sites at CpG-rich sequences such as CGG repeats and Alu-SINEs and L1-LINEs. The role of CGGBP1 as a possible mediator of CpG methylation however remains unknown. At CpG-rich sequences cytosine methylation is a major mechanism of transcriptional repression. Concordantly, gene-rich regions typically carry lower levels of CpG methylation than the repetitive elements. It is well known that at interspersed repeats Alu-SINEs and L1-LINEs high levels of CpG methylation constitute a transcriptional silencing and retrotransposon inactivating mechanism.Here, we have studied genome-wide CpG methylation with or without CGGBP1-depletion. By high throughput sequencing of…
We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that…
Next-generation sequencing (NGS) technologies have increased the scalability, speed, and resolution of genomic sequencing and, thus, have revolutionized genomic studies. However, eukaryotic genome sequencing initiatives typically yield considerably fragmented genome assemblies. Here, we assessed various state-of-the-art sequencing and assembly strategies in order to produce a contiguous and complete eukaryotic genome assembly, focusing on the filamentous fungus Verticillium dahliae. Compared with Illumina-based assemblies of the V. dahliae genome, hybrid assemblies that also include PacBio-generated long reads establish superior contiguity. Intriguingly, provided that sufficient sequence depth is reached, assemblies solely based on PacBio reads outperform hybrid assemblies and even result in fully assembled…
Pineapple (Ananas comosus (L.) Merr.) is the most economically valuable crop possessing crassulacean acid metabolism (CAM), a photosynthetic carbon assimilation pathway with high water-use efficiency, and the second most important tropical fruit. We sequenced the genomes of pineapple varieties F153 and MD2 and a wild pineapple relative, Ananas bracteatus accession CB5. The pineapple genome has one fewer ancient whole-genome duplication event than sequenced grass genomes and a conserved karyotype with seven chromosomes from before the ? duplication event. The pineapple lineage has transitioned from C3 photosynthesis to CAM, with CAM-related genes exhibiting a diel expression pattern in photosynthetic tissues. CAM…
Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16?kilobases) reads with random errors, we assembled 99% (244?megabases) of the Oropetium genome into…
Three strikingly different alternative male mating morphs (aggressive ‘independents’, semicooperative ‘satellites’ and female-mimic ‘faeders’) coexist as a balanced polymorphism in the ruff, Philomachus pugnax, a lek-breeding wading bird. Major differences in body size, ornamentation, and aggressive and mating behaviors are inherited as an autosomal polymorphism. We show that development into satellites and faeders is determined by a supergene consisting of divergent alternative, dominant and non-recombining haplotypes of an inversion on chromosome 11, which contains 125 predicted genes. Independents are homozygous for the ancestral sequence. One breakpoint of the inversion disrupts the essential CENP-N gene (encoding centromere protein N), and pedigree…
Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The…
Y chromosomes control essential male functions in many species, including sex determination and fertility. However, because of obstacles posed by repeat-rich heterochromatin, knowledge of Y chromosome sequences is limited to a handful of model organisms, constraining our understanding of Y biology across the tree of life. Here, we leverage long single-molecule sequencing to determine the content and structure of the nonrecombining Y chromosome of the primary African malaria mosquito, Anopheles gambiae. We find that the An. gambiae Y consists almost entirely of a few massively amplified, tandemly arrayed repeats, some of which can recombine with similar repeats on the X…
Haplotype variation not only involves SNPs but also insertions and deletions, in particular gene copy number variations. However, comparisons of individual genomes have been difficult because traditional sequencing methods give too short reads to unambiguously reconstruct chromosomal regions containing repetitive DNA sequences. An example of such a case is the protein gene family in maize that acts as a sink for reduced nitrogen in the seed. Previously, 41-48 gene copies of the alpha zein gene family that spread over six loci spanning between 30- and 500-kb chromosomal regions have been described in two Iowa Stiff Stalk (SS) inbreds. Analyses of…
Candida glabrata is an important fungal pathogen which develops rapid antifungal resistance in treated patients. It is known that azole treatments lead to antifungal resistance in this fungal species and that multidrug efflux transporters are involved in this process. Specific mutations in the transcriptional regulator PDR1 result in upregulation of the transporters. In addition, we showed that the PDR1 mutations can contribute to enhance virulence in animal models. In this study, we were interested to compare genomes of two specific C. glabrata-related isolates, one of which was azole susceptible (DSY562) while the other was azole resistant (DSY565). DSY565 contained a…
The sustainable cultivation of rice, which serves as staple food crop for more than half of the world’s population, is under serious threat due to the huge yield losses inflicted by rice blast disease caused by the globally destructive fungus Magnaporthe oryzae (Pyricularia oryzae) (Dean et al., 2012, Nalley et al., 2016, Deng et al., 2017). This filamentous ascomycete fungus is also capable of causing blast infection on other economically important cereal crops, including wheat, millet, and barley, making it the world’s most important plant pathogenic fungus (Zhong et al., 2016). The advent of whole-genome sequencing technology and the subsequent…
Centromeric regions of plants are generally composed of large array of satellites from a specific lineage ofGypsyLTR-retrotransposons, called Centromeric Retrotransposons. Repeated sequences interact with a specific H3 histone, playing a crucial function on kinetochore formation. To study the structure and composition of centromeric regions in the genusCoffea, we annotated and classified Centromeric Retrotransposons sequences from the allotetraploidC. arabicagenome and its two diploid ancestors:Coffea canephoraandC. eugenioides. Ten distinct CRC (Centromeric Retrotransposons inCoffea) families were found. The sequence mapping and FISH experiments of CRC Reverse Transcriptase domains inC. canephora, C. eugenioides, andC. arabicaclearly indicate a strong and specific targeting mainly onto proximal…
Bipolaris papendorfii has been reported as a fungal plant pathogen that rarely causes opportunistic infection in humans. Secondary metabolites isolated from this fungus possess medicinal and anticancer properties. However, its genetic fundamental and basic biology are largely unknown. In this study, we report the first draft genome sequence of B. papendorfii UM 226 isolated from the skin scraping of a patient. The assembled 33.4 Mb genome encodes 11,015 putative coding DNA sequences, of which, 2.49% are predicted transposable elements. Multilocus phylogenetic and phylogenomic analyses showed B. papendorfii UM 226 clustering with Curvularia species, apart from other plant pathogenic Bipolaris species.…