Menu
July 19, 2019

Discordant inheritance of chromosomal and extrachromosomal DNA elements contributes to dynamic disease evolution in glioblastoma.

To understand how genomic heterogeneity of glioblastoma (GBM) contributes to poor therapy response, we performed DNA and RNA sequencing on GBM samples and the neurospheres and orthotopic xenograft models derived from them. We used the resulting dataset to show that somatic driver alterations including single-nucleotide variants, focal DNA alterations and oncogene amplification on extrachromosomal DNA (ecDNA) elements were in majority propagated from tumor to model systems. In several instances, ecDNAs and chromosomal alterations demonstrated divergent inheritance patterns and clonal selection dynamics during cell culture and xenografting. We infer that ecDNA was unevenly inherited by offspring cells, a characteristic that affects the oncogenic potential of cells with more or fewer ecDNAs. Longitudinal patient tumor profiling found that oncogenic ecDNAs are frequently retained throughout the course of disease. Our analysis shows that extrachromosomal elements allow rapid increase of genomic heterogeneity during GBM evolution, independently of chromosomal DNA alterations.


July 19, 2019

Resequencing of 243 diploid cotton accessions based on an updated A genome identifies the genetic basis of key agronomic traits.

The ancestors of Gossypium arboreum and Gossypium herbaceum provided the A subgenome for the modern cultivated allotetraploid cotton. Here, we upgraded the G. arboreum genome assembly by integrating different technologies. We resequenced 243?G. arboreum and G. herbaceum accessions to generate a map of genome variations and found that they are equally diverged from Gossypium raimondii. Independent analysis suggested that Chinese G. arboreum originated in South China and was subsequently introduced to the Yangtze and Yellow River regions. Most accessions with domestication-related traits experienced geographic isolation. Genome-wide association study (GWAS) identified 98 significant peak associations for 11 agronomically important traits in G. arboreum. A nonsynonymous substitution (cysteine-to-arginine substitution) of GaKASIII seems to confer substantial fatty acid composition (C16:0 and C16:1) changes in cotton seeds. Resistance to fusarium wilt disease is associated with activation of GaGSTF9 expression. Our work represents a major step toward understanding the evolution of the A genome of cotton.


July 19, 2019

The Rosa genome provides new insights into the domestication of modern roses.

Roses have high cultural and economic importance as ornamental plants and in the perfume industry. We report the rose whole-genome sequencing and assembly and resequencing of major genotypes that contributed to rose domestication. We generated a homozygous genotype from a heterozygous diploid modern rose progenitor, Rosa chinensis ‘Old Blush’. Using single-molecule real-time sequencing and a meta-assembly approach, we obtained one of the most comprehensive plant genomes to date. Diversity analyses highlighted the mosaic origin of ‘La France’, one of the first hybrids combining the growth vigor of European species and the recurrent blooming of Chinese species. Genomic segments of Chinese ancestry identified new candidate genes for recurrent blooming. Reconstructing regulatory and secondary metabolism pathways allowed us to propose a model of interconnected regulation of scent and flower color. This genome provides a foundation for understanding the mechanisms governing rose traits and should accelerate improvement in roses, Rosaceae and ornamentals.


July 19, 2019

Introduction: The host-associated microbiome: Pattern, process and function.

An explosion of studies in recent years has established the ubiquity of host-associated microbes and their centrality to host biology (McFall-Ngai et al., 2013; Russell, Dubilier, & Rudgers, 2014). Microbes aid in digestion, modulate development, contribute to host immunity, mediate abiotic stress and more. While relationships with host-associated microbes are ubiquitous and important, they are cer- tainly not monolithic. Characterizing the microbial diversity associ- ated with an ever-broadening array of hosts (diverse animals, plants, algae and protists) has shown that essential functions can be per- formed by microbes that are integrated with the host to varying degrees, ranging from embedded endosymbionts to a variable cast of transient microbes acquired from the environment. The maturing host–microbiome field is now developing a mechanistic understand- ing of host/microbe relationships across this spectrum and the cross- talk mediating these interactions. Similarly, studies across systems are illuminating the ecological and evolutionary factors that shape host–microbe interactions today and providing hints into the origins of specific relationships.


July 19, 2019

Long-read sequencing and de novo genome assembly of Ammopiptanthus nanus, a desert shrub.

Ammopiptanthus nanus is a rare broad-leaved shrub that is found in the desert and arid regions of Central Asia. This plant species exhibits extremely high tolerance to drought and freezing and has been used in abiotic tolerance research in plants. As a relic of the tertiary period, A. nanus is of great significance to plant biogeographic research in the ancient Mediterranean region. Here, we report a draft genome assembly using the Pacific Biosciences (PacBio) platform and gene annotation for A. nanus.A total of 64.72 Gb of raw PacBio sequel reads were generated from four 20-kb libraries. After filtering, 64.53 Gb of clean reads were obtained, giving 72.59× coverage depth. Assembly using Canu gave an assembly length of 823.74 Mb, with a contig N50 of 2.76 Mb. The final size of the assembled A. nanus genome was close to the 889 Mb estimated by k-mer analysis. The gene annotation completeness was evaluated using Benchmarking Universal Single-Copy Orthologs; 1,327 of the 1,440 conserved genes (92.15%) could be found in the A. nanus assembly. Genome annotation revealed that 74.08% of the A. nanus genome is composed of repetitive elements and 53.44% is composed of long terminal repeat elements. We predicted ?37,188 protein-coding genes, of which 96.53% were functionally annotated.The genomic sequences of A. nanus could be a valuable source for comparative genomic analysis in the legume family and will be useful for understanding the phylogenetic relationships of the Thermopsideae and the evolutionary response of plant species to the Qinghai Tibetan Plateau uplift.


July 19, 2019

Adaptation and conservation insights from the koala genome.

The koala, the only extant species of the marsupial family Phascolarctidae, is classified as ‘vulnerable’ due to habitat loss and widespread disease. We sequenced the koala genome, producing a complete and contiguous marsupial reference genome, including centromeres. We reveal that the koala’s ability to detoxify eucalypt foliage may be due to expansions within a cytochrome P450 gene family, and its ability to smell, taste and moderate ingestion of plant secondary metabolites may be due to expansions in the vomeronasal and taste receptors. We characterized novel lactation proteins that protect young in the pouch and annotated immune genes important for response to chlamydial disease. Historical demography showed a substantial population crash coincident with the decline of Australian megafauna, while contemporary populations had biogeographic boundaries and increased inbreeding in populations affected by historic translocations. We identified genetically diverse populations that require habitat corridors and instituting of translocation programs to aid the koala’s survival in the wild.


July 19, 2019

Fern genomes elucidate land plant evolution and cyanobacterial symbioses.

Ferns are the closest sister group to all seed plants, yet little is known about their genomes other than that they are generally colossal. Here, we report on the genomes of Azolla filiculoides and Salvinia cucullata (Salviniales) and present evidence for episodic whole-genome duplication in ferns-one at the base of ‘core leptosporangiates’ and one specific to Azolla. One fern-specific gene that we identified, recently shown to confer high insect resistance, seems to have been derived from bacteria through horizontal gene transfer. Azolla coexists in a unique symbiosis with N2-fixing cyanobacteria, and we demonstrate a clear pattern of cospeciation between the two partners. Furthermore, the Azolla genome lacks genes that are common to arbuscular mycorrhizal and root nodule symbioses, and we identify several putative transporter genes specific to Azolla-cyanobacterial symbiosis. These genomic resources will help in exploring the biotechnological potential of Azolla and address fundamental questions in the evolution of plant life.


July 19, 2019

Population genomics shows no distinction between pathogenic Candida krusei and environmental Pichia kudriavzevii: One species, four names.

We investigated genomic diversity of a yeast species that is both an opportunistic pathogen and an important industrial yeast. Under the name Candida krusei, it is responsible for about 2% of yeast infections caused by Candida species in humans. Bloodstream infections with C. krusei are problematic because most isolates are fluconazole-resistant. Under the names Pichia kudriavzevii, Issatchenkia orientalis and Candida glycerinogenes, the same yeast, including genetically modified strains, is used for industrial-scale production of glycerol and succinate. It is also used to make some fermented foods. Here, we sequenced the type strains of C. krusei (CBS573T) and P. kudriavzevii (CBS5147T), as well as 30 other clinical and environmental isolates. Our results show conclusively that they are the same species, with collinear genomes 99.6% identical in DNA sequence. Phylogenetic analysis of SNPs does not segregate clinical and environmental isolates into separate clades, suggesting that C. krusei infections are frequently acquired from the environment. Reduced resistance of strains to fluconazole correlates with the presence of one gene instead of two at the ABC11-ABC1 tandem locus. Most isolates are diploid, but one-quarter are triploid. Loss of heterozygosity is common, including at the mating-type locus. Our PacBio/Illumina assembly of the 10.8 Mb CBS573T genome is resolved into 5 complete chromosomes, and was annotated using RNAseq support. Each of the 5 centromeres is a 35 kb gene desert containing a large inverted repeat. This species is a member of the genus Pichia and family Pichiaceae (the methylotrophic yeasts clade), and so is only distantly related to other pathogenic Candida species.


July 19, 2019

Extensive intraspecific gene order and gene structural variations between Mo17 and other maize genomes.

Maize is an important crop with a high level of genome diversity and heterosis. The genome sequence of a typical female line, B73, was previously released. Here, we report a de novo genome assembly of a corresponding male representative line, Mo17. More than 96.4% of the 2,183?Mb assembled genome can be accounted for by 362 scaffolds in ten pseudochromosomes with 38,620 annotated protein-coding genes. Comparative analysis revealed large gene-order and gene structural variations: approximately 10% of the annotated genes were mutually nonsyntenic, and more than 20% of the predicted genes had either large-effect mutations or large structural variations, which might cause considerable protein divergence between the two inbred lines. Our study provides a high-quality reference-genome sequence of an important maize germplasm, and the intraspecific gene order and gene structural variations identified should have implications for heterosis and genome evolution.


July 19, 2019

Identification and analysis of adenine N6-methylation sites in the rice genome.

DNA N6-methyladenine (6mA) is a non-canonical DNA modification that is present at low levels in different eukaryotes1-8, but its prevalence and genomic function in higher plants are unclear. Using mass spectrometry, immunoprecipitation and validation with analysis of single-molecule real-time sequencing, we observed that about 0.2% of all adenines are 6mA methylated in the rice genome. 6mA occurs most frequently at GAGG motifs and is mapped to about 20% of genes and 14% of transposable elements. In promoters, 6mA marks silent genes, but in bodies correlates with gene activity. 6mA overlaps with 5-methylcytosine (5mC) at CG sites in gene bodies and is complementary to 5mC at CHH sites in transposable elements. We show that OsALKBH1 may be potentially involved in 6mA demethylation in rice. The results suggest that 6mA is complementary to 5mC as an epigenomic mark in rice and reinforce a distinct role for 6mA as a gene expression-associated epigenomic mark in eukaryotes.


July 19, 2019

High-quality genome assemblies reveal long non-coding RNAs expressed in ant brains.

Ants are an emerging model system for neuroepigenetics, as embryos with virtually identical genomes develop into different adult castes that display diverse physiology, morphology, and behavior. Although a number of ant genomes have been sequenced to date, their draft quality is an obstacle to sophisticated analyses of epigenetic gene regulation. We reassembled de novo high-quality genomes for two ant species, Camponotus floridanus and Harpegnathos saltator. Using long reads enabled us to span large repetitive regions and improve genome contiguity, leading to comprehensive and accurate protein-coding annotations that facilitated the identification of a Gp-9-like gene as differentially expressed in Harpegnathos castes. The new assemblies also enabled us to annotate long non-coding RNAs in ants, revealing caste-, brain-, and developmental-stage-specific long non-coding RNAs (lncRNAs) in Harpegnathos. These upgraded genomes, along with the new gene annotations, will aid future efforts to identify epigenetic mechanisms of phenotypic and behavioral plasticity in ants. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.


July 19, 2019

Deep genome annotation of the opportunistic human pathogen Streptococcus pneumoniae D39.

A precise understanding of the genomic organization into transcriptional units and their regulation is essential for our comprehension of opportunistic human pathogens and how they cause disease. Using single-molecule real-time (PacBio) sequencing we unambiguously determined the genome sequence of Streptococcus pneumoniae strain D39 and revealed several inversions previously undetected by short-read sequencing. Significantly, a chromosomal inversion results in antigenic variation of PhtD, an important surface-exposed virulence factor. We generated a new genome annotation using automated tools, followed by manual curation, reflecting the current knowledge in the field. By combining sequence-driven terminator prediction, deep paired-end transcriptome sequencing and enrichment of primary transcripts by Cappable-Seq, we mapped 1015 transcriptional start sites and 748 termination sites. We show that the pneumococcal transcriptional landscape is complex and includes many secondary, antisense and internal promoters. Using this new genomic map, we identified several new small RNAs (sRNAs), RNA switches (including sixteen previously misidentified as sRNAs), and antisense RNAs. In total, we annotated 89 new protein-encoding genes, 34 sRNAs and 165 pseudogenes, bringing the S. pneumoniae D39 repertoire to 2146 genetic elements. We report operon structures and observed that 9% of operons are leaderless. The genome data are accessible in an online resource called PneumoBrowse (https://veeninglab.com/pneumobrowse) providing one of the most complete inventories of a bacterial genome to date. PneumoBrowse will accelerate pneumococcal research and the development of new prevention and treatment strategies.


July 19, 2019

A near complete, chromosome-scale assembly of the black raspberry (Rubus occidentalis) genome.

The fragmented nature of most draft plant genomes has hindered downstream gene discovery, trait mapping for breeding, and other functional genomics applications. There is a pressing need to improve or finish draft plant genome assemblies.Here, we present a chromosome-scale assembly of the black raspberry genome using single-molecule real-time Pacific Biosciences sequencing and high-throughput chromatin conformation capture (Hi-C) genome scaffolding. The updated V3 assembly has a contig N50 of 5.1 Mb, representing an ~200-fold improvement over the previous Illumina-based version. Each of the 235 contigs was anchored and oriented into seven chromosomes, correcting several major misassemblies. Black raspberry V3 contains 47 Mb of new sequences including large pericentromeric regions and thousands of previously unannotated protein-coding genes. Among the new genes are hundreds of expanded tandem gene arrays that were collapsed in the Illumina-based assembly. Detailed comparative genomics with the high-quality V4 woodland strawberry genome (Fragaria vesca) revealed near-perfect 1:1 synteny with dramatic divergence in tandem gene array composition. Lineage-specific tandem gene arrays in black raspberry are related to agronomic traits such as disease resistance and secondary metabolite biosynthesis.The improved resolution of tandem gene arrays highlights the need to reassemble these highly complex and biologically important regions in draft plant genomes. The updated, high-quality black raspberry reference genome will be useful for comparative genomics across the horticulturally important Rosaceae family and enable the development of marker assisted breeding in Rubus.


July 19, 2019

Long-read sequencing across the C9orf72 ‘GGGGCC’ repeat expansion: implications for clinical use and genetic discovery efforts in human disease.

Many neurodegenerative diseases are caused by nucleotide repeat expansions, but most expansions, like the C9orf72 ‘GGGGCC’ (G4C2) repeat that causes approximately 5-7% of all amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) cases, are too long to sequence using short-read sequencing technologies. It is unclear whether long-read sequencing technologies can traverse these long, challenging repeat expansions. Here, we demonstrate that two long-read sequencing technologies, Pacific Biosciences’ (PacBio) and Oxford Nanopore Technologies’ (ONT), can sequence through disease-causing repeats cloned into plasmids, including the FTD/ALS-causing G4C2 repeat expansion. We also report the first long-read sequencing data characterizing the C9orf72 G4C2 repeat expansion at the nucleotide level in two symptomatic expansion carriers using PacBio whole-genome sequencing and a no-amplification (No-Amp) targeted approach based on CRISPR/Cas9.Both the PacBio and ONT platforms successfully sequenced through the repeat expansions in plasmids. Throughput on the MinION was a challenge for whole-genome sequencing; we were unable to attain reads covering the human C9orf72 repeat expansion using 15 flow cells. We obtained 8× coverage across the C9orf72 locus using the PacBio Sequel, accurately reporting the unexpanded allele at eight repeats, and reading through the entire expansion with 1324 repeats (7941 nucleotides). Using the No-Amp targeted approach, we attained >?800× coverage and were able to identify the unexpanded allele, closely estimate expansion size, and assess nucleotide content in a single experiment. We estimate the individual’s repeat region was >?99% G4C2 content, though we cannot rule out small interruptions.Our findings indicate that long-read sequencing is well suited to characterizing known repeat expansions, and for discovering new disease-causing, disease-modifying, or risk-modifying repeat expansions that have gone undetected with conventional short-read sequencing. The PacBio No-Amp targeted approach may have future potential in clinical and genetic counseling environments. Larger and deeper long-read sequencing studies in C9orf72 expansion carriers will be important to determine heterogeneity and whether the repeats are interrupted by non-G4C2 content, potentially mitigating or modifying disease course or age of onset, as interruptions are known to do in other repeat-expansion disorders. These results have broad implications across all diseases where the genetic etiology remains unclear.


July 19, 2019

Characterization of a human-specific tandem repeat associated with bipolar disorder and schizophrenia.

Bipolar disorder (BD) and schizophrenia (SCZ) are highly heritable diseases that affect more than 3% of individuals worldwide. Genome-wide association studies have strongly and repeatedly linked risk for both of these neuropsychiatric diseases to a 100 kb interval in the third intron of the human calcium channel gene CACNA1C. However, the causative mutation is not yet known. We have identified a human-specific tandem repeat in this region that is composed of 30 bp units, often repeated hundreds of times. This large tandem repeat is unstable using standard polymerase chain reaction and bacterial cloning techniques, which may have resulted in its incorrect size in the human reference genome. The large 30-mer repeat region is polymorphic in both size and sequence in human populations. Particular sequence variants of the 30-mer are associated with risk status at several flanking single-nucleotide polymorphisms in the third intron of CACNA1C that have previously been linked to BD and SCZ. The tandem repeat arrays function as enhancers that increase reporter gene expression in a human neural progenitor cell line. Different human arrays vary in the magnitude of enhancer activity, and the 30-mer arrays associated with increased psychiatric disease risk status have decreased enhancer activity. Changes in the structure and sequence of these arrays likely contribute to changes in CACNA1C function during human evolution and may modulate neuropsychiatric disease risk in modern human populations. Copyright © 2018. Published by Elsevier Inc.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.