AGBT 2015 Workshop Presentation Slides: Dick McCombie from Cold Spring Harbor Laboratory (CSHL) described the use of SMRT Sequencing to analyze a breast cancer cell line with complex genomic events. Still ongoing, the project has already uncovered structural variants missed by other sequencers.
Comprehensive genome and transcriptome structural analysis of a breast cancer cell line using PacBio long read sequencing
Genomic instability is one of the hallmarks of cancer, leading to widespread copy number variations, chromosomal fusions, and other structural variations. The breast cancer cell line SK-BR-3 is an important model for HER2+ breast cancers, which are among the most aggressive forms of the disease and affect one in five cases. Through short read sequencing, copy number arrays, and other technologies, the genome of SK-BR-3 is known to be highly rearranged with many copy number variations, including an approximately twenty-fold amplification of the HER2 oncogene. However, these technologies cannot precisely characterize the nature and context of the identified genomic events and other important mutations may be missed altogether because of repeats, multi-mapping reads, and the failure to reliably anchor alignments to both sides of a variation. To address these challenges, we have sequenced SK-BR-3 using PacBio long read technology. Using the new P6-C4 chemistry, we generated more than 70X coverage of the genome with average read lengths of 9-13kb (max: 71kb). Using Lumpy for split-read alignment analysis, as well as our novel assembly-based algorithms for finding complex variants, we have developed a detailed map of structural variations in this cell line. Taking advantage of the newly identified breakpoints and combining these with copy number assignments, we have developed an algorithm to reconstruct the mutational history of this cancer genome. From this we have discovered a complex series of nested duplications and translocations between chr17 and chr8, two of the most frequent translocation partners in primary breast cancers, resulting in amplification of HER2. We have also carried out full-length transcriptome sequencing using PacBio’s Iso-Seq technology, which has revealed a number of previously unrecognized gene fusions and isoforms. 
Combining long-read genome and transcriptome sequencing technologies enables an in-depth analysis of how changes in the genome affect the transcriptome, including how gene fusions are created across multiple chromosomes. This analysis has established the most complete cancer reference genome available to date, and is already opening the door to applying long-read sequencing to patient samples with complex genome structures.
Over the past decade, RNA sequencing (RNA-seq) has become an indispensable tool for transcriptome-wide analysis of differential gene expression and differential splicing of mRNAs. However, as next-generation sequencing technologies have developed, so too has RNA-seq. Now, RNA-seq methods are available for studying many different aspects of RNA biology, including single-cell gene expression, translation (the translatome) and RNA structure (the structurome). Exciting new applications are being explored, such as spatial transcriptomics (spatialomics). Together with new long-read and direct RNA-seq technologies and better computational tools for data analysis, innovations in RNA-seq are contributing to a fuller understanding of RNA biology, from questions such as when and where transcription occurs to the folding and intermolecular interactions that govern RNA function.
Identification of Initial Colonizing Bacteria in Dental Plaques from Young Adults Using Full-Length 16S rRNA Gene Sequencing.
Development of dental plaque begins with the adhesion of salivary bacteria to the acquired pellicle covering the tooth surface. In this study, we collected in vivo dental plaque formed on hydroxyapatite disks for 6 h from 74 young adults and identified initial colonizing taxa based on full-length 16S rRNA gene sequences. A long-read, single-molecule sequencer, PacBio Sequel, provided 100,109 high-quality full-length 16S rRNA gene sequence reads from the early plaque microbiota, which were assigned to 90 oral bacterial taxa. The microbiota obtained from every individual mostly comprised the 21 predominant taxa with the maximum relative abundance of over 10% (95.8 ± 6.2%, mean ± SD), which included Streptococcus species as well as nonstreptococcal species. A hierarchical cluster analysis of their relative abundance distribution suggested three major patterns of microbiota compositions: a Streptococcus mitis/Streptococcus sp. HMT-423-dominant profile, a Neisseria sicca/Neisseria flava/Neisseria mucosa-dominant profile, and a complex profile with high diversity. No notable variations in the community structures were associated with the dental caries status, although the total bacterial amounts were larger in the subjects with a high number of caries-experienced teeth (≥8) than in those with no or a low number of caries-experienced teeth. Our results revealed the bacterial taxa primarily involved in early plaque formation on hydroxyapatite disks in young adults. IMPORTANCE Selective attachment of salivary bacteria to the tooth surface is an initial and repetitive phase in dental plaque development. We employed full-length 16S rRNA gene sequence analysis with a high taxonomic resolution using a third-generation sequencer, PacBio Sequel, to determine the bacterial composition during early plaque formation in 74 young adults accurately and in detail.
The results revealed 21 bacterial taxa primarily involved in early plaque formation on hydroxyapatite disks in young adults, which include several streptococcal species as well as nonstreptococcal species, such as Neisseria sicca/N. flava/N. mucosa and Rothia dentocariosa. Given that no notable variations in the microbiota composition were associated with the dental caries status, the maturation process, rather than the specific bacterial species that are the initial colonizers, is likely to play an important role in the development of dysbiotic microbiota associated with dental caries. Copyright © 2019 Ihara et al.
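The "predominant taxa" criterion in the abstract above (a maximum relative abundance of over 10% in at least one individual) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the data layout (per-subject read counts keyed by taxon), and the toy counts are all hypothetical and chosen only to show the arithmetic.

```python
from typing import Dict, List

Counts = Dict[str, Dict[str, int]]      # subject -> taxon -> read count
Fractions = Dict[str, Dict[str, float]]  # subject -> taxon -> relative abundance


def relative_abundance(counts: Counts) -> Fractions:
    """Convert per-subject read counts to fractions that sum to 1 per subject."""
    result: Fractions = {}
    for subject, taxa in counts.items():
        total = sum(taxa.values())
        if total == 0:
            raise ValueError(f"subject {subject!r} has no reads")
        result[subject] = {taxon: n / total for taxon, n in taxa.items()}
    return result


def predominant_taxa(rel: Fractions, threshold: float = 0.10) -> List[str]:
    """Return taxa whose maximum relative abundance across subjects exceeds threshold."""
    best: Dict[str, float] = {}
    for taxa in rel.values():
        for taxon, frac in taxa.items():
            best[taxon] = max(best.get(taxon, 0.0), frac)
    return sorted(taxon for taxon, frac in best.items() if frac > threshold)


def predominant_share(rel: Fractions, predominant: List[str]) -> Dict[str, float]:
    """Fraction of each subject's reads that belongs to the predominant taxa."""
    keep = set(predominant)
    return {
        subject: sum(frac for taxon, frac in taxa.items() if taxon in keep)
        for subject, taxa in rel.items()
    }


# Hypothetical toy counts, not data from the study.
toy_counts: Counts = {
    "subject_A": {"Streptococcus mitis": 80, "Neisseria sicca": 15, "Rothia dentocariosa": 5},
    "subject_B": {"Streptococcus mitis": 30, "Neisseria sicca": 68, "Rothia dentocariosa": 2},
}

rel = relative_abundance(toy_counts)
top = predominant_taxa(rel)
share = predominant_share(rel, top)
print(top)
print(share)
```

Averaging the per-subject shares over all individuals would give a summary of the same kind as the mean ± SD figure quoted in the abstract, under the assumption that the figure describes the combined share of the predominant taxa in each individual.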
The antibody repertoire of Bos taurus is characterized by a subset of variable heavy (VH) chain regions with ultralong third complementarity determining regions (CDR3) which, compared to other species, can provide a potent response to challenging antigens like HIV env. These unusual CDR3 can range to over seventy highly diverse amino acids in length and form unique β-ribbon ‘stalk’ and disulfide bonded ‘knob’ structures, far from the typical antigen binding site. The genetic components and processes for forming these unusual cattle antibody VH CDR3 are not well understood. Here we analyze sequences of Bos taurus antibody VH domains and find that the subset with ultralong CDR3 exclusively uses a single variable gene, IGHV1-7 (VHBUL) rearranged to the longest diversity gene, IGHD8-2. An eight nucleotide duplication at the 3′ end of IGHV1-7 encodes a longer V-region producing an extended F β-strand that contributes to the stalk in a rearranged CDR3. A low amino acid variability was observed in CDR1 and CDR2, suggesting that antigen binding for this subset most likely only depends on the CDR3. Importantly a novel, potentially AID mediated, deletional diversification mechanism of the B. taurus VH ultralong CDR3 knob was discovered, in which interior codons of the IGHD8-2 region are removed while maintaining integral structural components of the knob and descending strand of the stalk in place. These deletions serve to further diversify cysteine positions, and thus disulfide bonded loops. Hence, both germline and somatic genetic factors and processes appear to be involved in diversification of this structurally unusual cattle VH ultralong CDR3 repertoire.
α-Difluoromethylornithine reduces gastric carcinogenesis by causing mutations in Helicobacter pylori cagY.
Infection by Helicobacter pylori is the primary cause of gastric adenocarcinoma. The most potent H. pylori virulence factor is cytotoxin-associated gene A (CagA), which is translocated by a type 4 secretion system (T4SS) into gastric epithelial cells and activates oncogenic signaling pathways. The gene cagY encodes a key component of the T4SS and can undergo gene rearrangements. We have shown that the cancer chemopreventive agent α-difluoromethylornithine (DFMO), known to inhibit the enzyme ornithine decarboxylase, reduces H. pylori-mediated gastric cancer incidence in Mongolian gerbils. In the present study, we questioned whether DFMO might directly affect H. pylori pathogenicity. We show that H. pylori output strains isolated from gerbils treated with DFMO exhibit reduced ability to translocate CagA in gastric epithelial cells. Further, we frequently detected genomic modifications in the middle repeat region of the cagY gene of output strains from DFMO-treated animals, which were associated with alterations in the CagY protein. Gerbils did not develop carcinoma when infected with a DFMO output strain containing rearranged cagY or the parental strain in which the wild-type cagY was replaced by cagY with DFMO-induced rearrangements. Lastly, we demonstrate that in vitro treatment of H. pylori by DFMO induces oxidative DNA damage, expression of the DNA repair enzyme MutS2, and mutations in cagY, demonstrating that DFMO directly affects genomic stability. Deletion of mutS2 abrogated the ability of DFMO to induce cagY rearrangements directly. In conclusion, DFMO-induced oxidative stress in H. pylori leads to genomic alterations and attenuates virulence.
Evolution of Goat’s Rue Rhizobia (Neorhizobium galegae): Analysis of Polymorphism of the Nitrogen Fixation and Nodule Formation Genes
The goat’s rue rhizobia (Neorhizobium galegae) represent a convenient model to study the evolution and speciation of symbiotic bacteria. This rhizobial species is composed of two biovars (bv. orientalis and bv. officinalis), which form N2-fixing nodules with certain species of goat’s rue (Galega orientalis and G. officinalis). The cross-inoculation between them results in the formation of nodules unable to fix nitrogen. On the basis of whole-genome sequencing data, we studied the nucleotide polymorphism of 11 N. galegae strains isolated from the North Caucasus ecosystems, where G. orientalis has higher diversity than G. officinalis. The low level of differences in the polymorphism within the group of the sym genes in comparison with the nonsymbiotic genes can be associated with the active participation of host plants in the evolution of rhizobia. The intragenic polymorphism of bv. orientalis proved to be significantly higher than that of bv. officinalis. The level of polymorphism of nonsymbiotic genes was lower than that of the symbiotic genes, which are functionally more homogeneous. The divergence of the nitrogen fixation genes (nif/fix) is more pronounced than that of the nodule formation genes (nod) in the N. galegae biovars. These facts indicate the leading role of the host-specific nitrogen fixation in the evolution of the studied rhizobial species.
Structural and functional characterization of an intradiol ring-cleavage dioxygenase from the polyphagous spider mite herbivore Tetranychus urticae Koch.
Genome analyses of the polyphagous spider mite herbivore Tetranychus urticae (two-spotted spider mite) revealed the presence of a set of 17 genes that code for secreted proteins belonging to the “intradiol dioxygenase-like” subgroup. Phylogenetic analyses indicate that this novel enzyme family has been acquired by horizontal gene transfer. In order to better understand the role of these proteins in T. urticae, we have structurally and functionally characterized one paralog (tetur07g02040). It was demonstrated that this protein is indeed an intradiol ring-cleavage dioxygenase, as the enzyme is able to cleave catechol between two hydroxyl-groups using atmospheric dioxygen. The enzyme was characterized functionally and structurally. The active site of the T. urticae enzyme contains an Fe3+ cofactor that is coordinated by two histidine and two tyrosine residues, an arrangement that is similar to those observed in bacterial homologs. However, the active site is significantly more solvent exposed than in bacterial proteins. Moreover, the mite enzyme is monomeric, while almost all structurally characterized bacterial homologs form oligomeric assemblies. Tetur07g02040 is not only the first spider mite dioxygenase that has been characterized at the molecular level, but is also the first structurally characterized intradiol ring-cleavage dioxygenase originating from a eukaryote. Copyright © 2018 Elsevier Ltd. All rights reserved.
Insights into the evolution and drug susceptibility of Babesia duncani from the sequence of its mitochondrial and apicoplast genomes.
Babesia microti and Babesia duncani are the main causative agents of human babesiosis in the United States. While significant knowledge about B. microti has been gained over the past few years, nothing is known about B. duncani biology, pathogenesis, mode of transmission or sensitivity to currently recommended therapies. Studies in immunocompetent wild type mice and hamsters have shown that unlike B. microti, infection with B. duncani results in severe pathology and ultimately death. The parasite factors involved in B. duncani virulence remain unknown. Here we report the first known completed sequence and annotation of the apicoplast and mitochondrial genomes of B. duncani. We found that the apicoplast genome of this parasite consists of a 34 kb monocistronic circular molecule encoding functions that are important for apicoplast gene transcription as well as translation and maturation of the organelle’s proteins. The mitochondrial genome of B. duncani consists of a 5.9 kb monocistronic linear molecule with two inverted repeats of 48 bp at both ends. Using the conserved cytochrome b (Cytb) and cytochrome c oxidase subunit I (coxI) proteins encoded by the mitochondrial genome, phylogenetic analysis revealed that B. duncani defines a new lineage among apicomplexan parasites distinct from B. microti, Babesia bovis, Theileria spp. and Plasmodium spp. Annotation of the apicoplast and mitochondrial genomes of B. duncani identified targets for development of effective therapies. Our studies set the stage for evaluation of the efficacy of these drugs alone or in combination against B. duncani in culture as well as in animal models. Copyright © 2018 Australian Society for Parasitology. Published by Elsevier Ltd. All rights reserved.