Duplication Archives - Page 21 of 35

September 22, 2019

Whole genome sequencing of greater amberjack (Seriola dumerili) for SNP identification on aligned scaffolds and genome structural variation analysis using parallel resequencing

Greater amberjack (Seriola dumerili) is distributed in tropical and temperate waters worldwide and is an important aquaculture fish. We carried out de novo sequencing of the greater amberjack genome to construct a reference genome sequence to identify single nucleotide polymorphisms (SNPs) for breeding amberjack by marker-assisted or gene-assisted selection as well as to identify functional genes for biological traits. We obtained 200-fold coverage and constructed a high-quality genome assembly using next generation sequencing technology. The assembled sequences were aligned onto a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map by sequence homology. A total of 215 of the longest amberjack sequences, with a total length of 622.8 Mbp (92% of the total length of the genome scaffolds), were lined up on the yellowtail RH map. We resequenced the whole genomes of 20 greater amberjacks and mapped the resulting sequences onto the reference genome sequence. About 186,000 nonredundant SNPs were successfully ordered on the reference genome. Further, we found differences in the genome structural variations between two greater amberjack populations using BreakDancer. We also analyzed the greater amberjack transcriptome and mapped the annotated sequences onto the reference genome sequence.

September 22, 2019

Reproducible integration of multiple sequencing datasets to form high-confidence SNP, indel, and reference calls for five human genome reference materials

Benchmark small variant calls from the Genome in a Bottle Consortium (GIAB) for the CEPH/HapMap genome NA12878 (HG001) have been used extensively for developing, optimizing, and demonstrating performance of sequencing and bioinformatics methods. Here, we develop a reproducible, cloud-based pipeline to integrate multiple sequencing datasets and form benchmark calls, enabling application to arbitrary human genomes. We use these reproducible methods to form high-confidence calls with respect to GRCh37 and GRCh38 for HG001 and 4 additional broadly-consented genomes from the Personal Genome Project that are available as NIST Reference Materials. These new genomes’ broad, open consent with few restrictions on availability of samples and data is enabling a uniquely diverse array of applications. Our new methods produce 17% more high-confidence SNPs, 176% more indels, and 12% larger regions than our previously published calls. To demonstrate that these calls can be used for accurate benchmarking, we compare other high-quality callsets to ours (e.g., Illumina Platinum Genomes), and we demonstrate that the majority of discordant calls are errors in the other callsets. We also highlight challenges in interpreting performance metrics when benchmarking against imperfect high-confidence calls. We show that benchmarking tools from the Global Alliance for Genomics and Health can be used with our calls to stratify performance metrics by variant type and genome context and elucidate strengths and weaknesses of a method.
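As a rough illustration of what stratifying performance metrics by variant type can look like, the sketch below computes precision, recall and F1 per stratum from true-positive, false-positive and false-negative counts. The counts, the `Counts` class and the `metrics` function are hypothetical and are not taken from the paper or from any benchmarking tool; real comparisons would come from a dedicated variant comparison engine.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Comparison counts for one stratum (hypothetical container)."""
    tp: int  # calls that match the benchmark
    fp: int  # calls absent from the benchmark inside confident regions
    fn: int  # benchmark variants that were missed


def metrics(c: Counts) -> dict:
    """Return precision, recall and F1 for one stratum."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up counts, for illustration only.
strata = {
    "SNP": Counts(tp=990, fp=10, fn=10),
    "indel": Counts(tp=90, fp=10, fn=30),
}

report = {name: metrics(c) for name, c in strata.items()}

for name, m in report.items():
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Keeping the strata separate matters because a method can look strong on SNPs while performing much worse on indels; a single pooled number would hide that difference.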

September 22, 2019

Repeat-driven generation of antigenic diversity in a major human pathogen, Trypanosoma cruzi

Trypanosoma cruzi, a zoonotic kinetoplastid protozoan with a complex genome, is the causative agent of American trypanosomiasis (Chagas disease). The parasite uses a highly diverse repertoire of surface molecules, with roles in cell invasion, immune evasion and pathogenesis. Thus far, the genomic regions containing these genes have been impossible to resolve and it has been impossible to study the structure and function of the several thousand repetitive genes encoding the surface molecules of the parasite. We here present an improved genome assembly of a T. cruzi clade I (TcI) strain using high coverage PacBio single molecule sequencing, together with Illumina sequencing of 34 T. cruzi TcI isolates and clones from different geographic locations, sample sources and clinical outcomes. Resolution of the surface molecule gene structure reveals an unusual duality in the organisation of the parasite genome, a core genomic region syntenous with related protozoa flanked by unique and highly plastic subtelomeric regions encoding surface antigens. The presence of abundant interspersed retrotransposons in the subtelomeres suggests that these elements are involved in a recombination mechanism for the generation of antigenic variation and evasion of the host immune response. The comparative genomic analysis of the cohort of TcI strains revealed multiple cases of such recombination events involving surface molecule genes and has provided new insights into T. cruzi population structure.

September 22, 2019

Ploidy variation in Kluyveromyces marxianus separates dairy and non-dairy isolates.

Kluyveromyces marxianus is traditionally associated with fermented dairy products, but can also be isolated from diverse non-dairy environments. Because of thermotolerance, rapid growth and other traits, many different strains are being developed for food and industrial applications but there is, as yet, little understanding of the genetic diversity or population genetics of this species. K. marxianus shows a high level of phenotypic variation but the only phenotype that has been clearly linked to a genetic polymorphism is lactose utilisation, which is controlled by variation in the LAC12 gene. The genomes of several strains have been sequenced in recent years and, in this study, we sequenced a further nine strains from different origins. Analysis of the Single Nucleotide Polymorphisms (SNPs) in 14 strains was carried out to examine genome structure and genetic diversity. SNP diversity in K. marxianus is relatively high, with up to 3% DNA sequence divergence between alleles. It was found that the isolates include haploid, diploid, and triploid strains, as shown by both SNP analysis and flow cytometry. Diploids and triploids contain long genomic tracts showing loss of heterozygosity (LOH). All six isolates from dairy environments were diploid or triploid, whereas 6 out of 7 isolates from non-dairy environments were haploid. This also correlated with the presence of functional LAC12 alleles only in dairy haplotypes. The diploids were hybrids between a non-dairy and a dairy haplotype, whereas triploids included three copies of a dairy haplotype.
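One common way to read ploidy from SNP data is to look at the fraction of reads supporting the alternate allele at each variant site: a haploid shows essentially no intermediate fractions, a diploid clusters near 1/2, and a triploid clusters near 1/3 and 2/3. The sketch below is a minimal version of that idea, not the method used in the study; the function name, thresholds and example values are all assumptions for illustration.

```python
def infer_ploidy(alt_fractions, het_min=0.15, het_max=0.85, min_het_share=0.05):
    """Guess ploidy (1, 2 or 3) from per-site alternate-allele read fractions.

    Thresholds are illustrative assumptions, not values from the study.
    """
    if not alt_fractions:
        raise ValueError("no variant sites supplied")

    # Sites with an intermediate fraction are treated as heterozygous.
    het = [f for f in alt_fractions if het_min <= f <= het_max]
    if len(het) / len(alt_fractions) < min_het_share:
        return 1  # almost no heterozygous sites: consistent with a haploid

    def mean_distance(peaks):
        # Average distance from each heterozygous site to its nearest peak.
        return sum(min(abs(f - p) for p in peaks) for f in het) / len(het)

    diploid_cost = mean_distance([1 / 2])
    triploid_cost = mean_distance([1 / 3, 2 / 3])
    return 2 if diploid_cost <= triploid_cost else 3


# Made-up allele fractions, for illustration only.
print(infer_ploidy([0.98, 1.0, 0.99, 1.0]))         # haploid-like
print(infer_ploidy([0.5, 0.48, 0.52, 1.0]))         # diploid-like
print(infer_ploidy([0.33, 0.35, 0.66, 0.68, 1.0]))  # triploid-like
```

A genome-wide guess like this can be misled by tracts of loss of heterozygosity or by aneuploidy, which is one reason an independent measurement such as flow cytometry is valuable.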

September 22, 2019

Dynamic evolution of α-gliadin prolamin gene family in homeologous genomes of hexaploid wheat.

Wheat Gli-2 loci encode complex groups of α-gliadin prolamins that are important for breadmaking, but also major triggers of celiac disease (CD). Elucidation of α-gliadin evolution provides knowledge to produce wheat with better end-use properties and reduced immunogenic potential. The Gli-2 loci contain a large number of tandemly duplicated genes and highly repetitive DNA, making sequence assembly of their genomic regions challenging. Here, we constructed high-quality sequences spanning the three wheat homeologous α-gliadin loci by aligning PacBio-based sequence contigs with BioNano genome maps. A total of 47 α-gliadin genes were identified with only 26 encoding intact full-length protein products. Analyses of α-gliadin loci and phylogenetic tree reconstruction indicate significant duplications of α-gliadin genes in the last ~2.5 million years after the divergence of the A, B and D genomes, supporting its rapid lineage-independent expansion in different Triticeae genomes. We showed that dramatic divergence in expression of α-gliadin genes could not be attributed to sequence variations in the promoter regions. The study also provided insights into the evolution of CD epitopes and identified a single indel event in the hexaploid wheat D genome that likely resulted in the generation of the highly toxic 33-mer CD epitope.

September 22, 2019

Targeted sequencing by gene synteny, a new strategy for polyploid species: sequencing and physical structure of a complex sugarcane region.

Sugarcane exhibits a complex genome mainly due to its aneuploid nature and high ploidy level, and sequencing of its genome poses a great challenge. Closely related species with well-assembled and annotated genomes can be used to help assemble complex genomes. Here, a stable quantitative trait locus (QTL) related to sugar accumulation in sorghum was successfully transferred to the sugarcane genome. Gene sequences related to this QTL were identified in silico from sugarcane transcriptome data, and molecular markers based on these sequences were developed to select bacterial artificial chromosome (BAC) clones from the sugarcane variety SP80-3280. Sixty-eight BAC clones containing at least two gene sequences associated with the sorghum QTL were sequenced using Pacific Biosciences (PacBio) technology. Twenty BAC sequences were found to be related to the syntenic region, of which nine were sufficient to represent this region. The strategy we propose is called “targeted sequencing by gene synteny,” which is a simpler approach to understanding the genome structure of complex genomic regions associated with traits of interest.

September 22, 2019

Comparative genomic insights into endofungal lifestyles of two bacterial endosymbionts, Mycoavidus cysteinexigens and Burkholderia rhizoxinica.

Endohyphal bacteria (EHB), dwelling within fungal hyphae, markedly affect the growth and metabolic potential of their hosts. To date, two EHB belonging to the family Burkholderiaceae have been isolated and characterized as new taxa, Burkholderia rhizoxinica (HKI 454T) and Mycoavidus cysteinexigens (B1-EBT), in Japan. Metagenome sequencing was recently reported for Mortierella elongata AG77 together with its endosymbiont M. cysteinexigens (Mc-AG77) from a soil/litter sample in the USA. In the present study, we elucidated the complete genome sequence of B1-EBT and compared it with those of Mc-AG77 and HKI 454T. The genomes of B1-EBT and Mc-AG77 contained a higher level of prophage sequences and were markedly smaller than that of HKI 454T. Although the B1-EBT and Mc-AG77 genomes lacked the chitinolytic enzyme genes responsible for invasion into fungal cells, they contained several predicted toxin-antitoxin systems including an insecticidal toxin complex and PIN domain imposing an addiction-like mechanism essential for endohyphal growth control during host colonization. Despite the different host fungi, the alignment of amino acid sequences showed that the HKI 454T genome consisted of 1,265 (32.6%) and 1,221 (31.5%) orthologous coding sequences (CDSs) with those of B1-EBT and Mc-AG77, respectively. This comparative study of three phylogenetically associated endosymbionts has provided insights into their origin and evolution, and suggests the later bacterial invasion and adaptation of B1-EBT to its host metabolism.

September 22, 2019

Challenges of Francisella classification exemplified by an atypical clinical isolate.

The accumulation of sequenced Francisella strains has made it increasingly apparent that the 16S rRNA gene alone is not enough to stratify the Francisella genus into precise and clinically useful classifications. Continued whole-genome sequencing of isolates will provide a larger base of knowledge for targeted approaches with broad applicability. Additionally, examination of genomic information on a case-by-case basis will help resolve outstanding questions regarding strain stratification. We report the complete genome sequence of a clinical isolate, designated here as F. novicida-like strain TCH2015, acquired from the lymph node of a 6-year-old male. Two features were atypical for F. novicida: exhibition of functional oxidase activity and additional gene content, including proposed virulence determinants. These differences, which could potentially impact virulence and clinical diagnosis, emphasize the need for more comprehensive methods to profile Francisella isolates. This study highlights the value of whole-genome sequencing, which will lead to a more robust database of environmental and clinical genomes and inform strategies to improve detection and classification of Francisella strains. Copyright © 2017 Elsevier Inc. All rights reserved.

September 22, 2019

Genetic basis of chromosomally-encoded mcr-1 gene.

Compared with plasmid-borne mcr-1, the occurrence of chromosomally-encoded mcr-1 is rare although it has been reported in several cases. This study aimed to investigate the genetic features of chromosomally-encoded mcr-1 among Escherichia coli strains as well as the potential genetic basis governing mobilisation of mcr-1 in bacterial chromosomes. The genome sequences of 16 E. coli strains containing a chromosomal mcr-1 gene were obtained and analysed. Phylogenetic and whole-genome sequencing (WGS) analysis demonstrated that mcr-1 was associated with four major types of genetic arrangements, namely ISApl1-mcr1-orf, Tn6330, complex Tn6330 and ΔTn6330 in chromosomes of genetically unrelated E. coli strains. The mcr-1-carrying mobile elements were shown to insert into the AT-rich region, which was also the case for ISApl1. Analysis of complete E. coli genome sequences showed that there were multiple copies of ISApl1 present in E. coli chromosomes that also carried mcr-1, whilst all mcr-1-negative chromosomes lacked any copy of ISApl1, suggesting the strong association of ISApl1 and mcr-1. Insertion of ISApl1 into E. coli chromosomes may be a prerequisite for the insertion of mcr-1-carrying mobile elements. Insertion of mcr-1 into E. coli chromosomes would enable it to become intrinsically resistant, which is expected to become more prevalent. Policy on the prudent use of colistin both in veterinary and clinical settings should be imposed globally to further prevent dissemination of mcr-1 in E. coli and other bacterial pathogens. Copyright © 2017 Elsevier B.V. and International Society of Chemotherapy. All rights reserved.

September 22, 2019

Extensive gene amplification as a mechanism for piperacillin-tazobactam resistance in Escherichia coli.

Although the TEM-1 β-lactamase (BlaTEM-1) hydrolyzes penicillins and narrow-spectrum cephalosporins, organisms expressing this enzyme are typically susceptible to β-lactam/β-lactamase inhibitor combinations such as piperacillin-tazobactam (TZP). However, our previous work led to the discovery of 28 clinical isolates of Escherichia coli resistant to TZP that contained only blaTEM-1. One of these isolates, E. coli 907355, was investigated further in this study. E. coli 907355 exhibited significantly higher β-lactamase activity and BlaTEM-1 protein levels when grown in the presence of subinhibitory concentrations of TZP. A corresponding TZP-dependent increase in blaTEM-1 copy number was also observed, with as many as 113 copies of the gene detected per cell. These results suggest that TZP treatment promotes an increase in blaTEM-1 gene dosage, allowing BlaTEM-1 to reach high enough levels to overcome inactivation by the available tazobactam in the culture. To better understand the nature of the blaTEM-1 copy number proliferation, whole-genome sequence (WGS) analysis was performed on E. coli 907355 in the absence and presence of TZP. The WGS data revealed that the blaTEM-1 gene is located in a 10-kb genomic resistance module (GRM) that contains multiple resistance genes and mobile genetic elements. The GRM was found to be tandemly repeated at least 5 times within a p1ESCUM/p1ECUMN-like plasmid when bacteria were grown in the presence of TZP.

IMPORTANCE: Understanding how bacteria acquire resistance to antibiotics is essential for treating infected patients effectively, as well as preventing the spread of resistant organisms. In this study, a clinical isolate of E. coli was identified that dedicated more than 15% of its genome toward tandem amplification of a ~10-kb resistance module, allowing it to escape antibiotic-mediated killing. Our research is significant in that it provides one possible explanation for clinical isolates that exhibit discordant behavior when tested for antibiotic resistance by different phenotypic methods. Our research also shows that GRM amplification is difficult to detect by short-read WGS technologies. Analysis of raw long-read sequence data was required to confirm GRM amplification as a mechanism of antibiotic resistance. Copyright © 2018 Schechter et al.
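The copy-number figures above can be related to each other with some simple arithmetic. The sketch below estimates copy number as the ratio of read depth over a gene to read depth over single-copy sequence, then computes what share of the total genome an amplified module would occupy. It is not the authors' analysis: the depth values and the 5,000-kb baseline genome size (a round figure in the typical range for E. coli) are assumptions chosen for illustration, and both function names are hypothetical.

```python
def estimate_copy_number(target_depth: float, single_copy_depth: float) -> float:
    """Copy number as the ratio of depth over the target to single-copy depth."""
    if single_copy_depth <= 0:
        raise ValueError("single-copy depth must be positive")
    return target_depth / single_copy_depth


def amplified_fraction(copies: float, module_kb: float, base_genome_kb: float) -> float:
    """Share of total DNA occupied by the amplified module.

    base_genome_kb is the genome length excluding the module itself.
    """
    amplified_kb = copies * module_kb
    return amplified_kb / (base_genome_kb + amplified_kb)


# Hypothetical depths chosen so the ratio matches the 113 copies reported.
copies = estimate_copy_number(target_depth=5650.0, single_copy_depth=50.0)

# 10-kb module from the abstract; the 5,000-kb baseline is an assumption.
fraction = amplified_fraction(copies, module_kb=10.0, base_genome_kb=5000.0)

print(f"estimated copies: {copies:.0f}")
print(f"share of genome in amplified module: {fraction:.1%}")
```

Under these assumed numbers the amplified module occupies roughly 18% of the cell's DNA, which is in line with the "more than 15%" statement. A depth ratio alone does not show how the copies are arranged, which fits the abstract's point that long reads were needed to confirm tandem amplification.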

September 22, 2019

Repeated evolution of self-compatibility for reproductive assurance.

Sexual reproduction in eukaryotes requires the fusion of two compatible gametes of opposite sexes or mating types. To meet the challenge of finding a mating partner with compatible gametes, evolutionary mechanisms such as hermaphroditism and self-fertilization have repeatedly evolved. Here, by combining the insights from comparative genomics, computer simulations and experimental evolution in fission yeast, we shed light on the conditions promoting separate mating types or self-compatibility by mating-type switching. Analogous to multiple independent transitions between switchers and non-switchers in natural populations mediated by structural genomic changes, novel switching genotypes readily evolved under selection in the experimental populations. Detailed fitness measurements accompanied by computer simulations show the benefits and costs of switching during sexual and asexual reproduction, governing the occurrence of both strategies in nature. Our findings illuminate the trade-off between the benefits of reproductive assurance and its fitness costs under benign conditions facilitating the evolution of self-compatibility.

September 22, 2019

Comparative genomics of smut pathogens: Insights from orphans and positively selected genes into host specialization.

Host specialization is a key evolutionary process for the diversification and emergence of new pathogens. However, the molecular determinants of host range are poorly understood. Smut fungi are biotrophic pathogens that have distinct and narrow host ranges based on largely unknown genetic determinants. Hence, we aimed to expand comparative genomics analyses of smut fungi by including more species infecting different hosts and to define orphans and positively selected genes to gain further insights into the genetic basis of host specialization. We analyzed nine lineages of smut fungi isolated from eight crop and non-crop hosts: maize, barley, sugarcane, wheat, oats, Zizania latifolia (Manchurian rice), Echinochloa colona (a wild grass), and Persicaria sp. (a wild dicot plant). We assembled two new genomes: Ustilago hordei (strain Uhor01) isolated from oats and U. tritici (strain CBS 119.19) isolated from wheat. The smut genomes were of small sizes, ranging from 18.38 to 24.63 Mb. U. hordei species experienced genome expansions due to the proliferation of transposable elements and the amount of these elements varied among the two strains. Phylogenetic analysis confirmed that Ustilago is not a monophyletic genus and, furthermore, detected misclassification of the U. tritici specimen. The comparison between smut pathogens of crop and non-crop hosts did not reveal distinct signatures, suggesting that host domestication did not play a dominant role in shaping the evolution of smuts. We found that host specialization in smut fungi likely has a complex genetic basis: different functional categories were enriched in orphans and lineage-specific selected genes. The diversification and gain/loss of effector genes are probably the most important determinants of host specificity.

September 22, 2019

Genome sequence, assembly and characterization of two Metschnikowia fructicola strains used as biocontrol agents of postharvest diseases.

The yeast Metschnikowia fructicola was reported as an efficient biological control agent of postharvest diseases of fruits and vegetables, and it is the basis of the commercial formulated product “Shemer.” Several mechanisms of action by which M. fructicola inhibits postharvest pathogens were suggested including iron-binding compounds, induction of defense signaling genes, production of fungal cell wall degrading enzymes and relatively high amounts of superoxide anions. We assembled the whole genome sequence of two strains of M. fructicola using PacBio and Illumina shotgun sequencing technologies. Using PacBio, a high-quality draft genome consisting of 93 contigs, with an estimated genome size of approximately 26 Mb, was obtained. Comparative analysis of M. fructicola proteins with the other three available closely related genomes revealed a shared core of homologous proteins coded by 5,776 genes. Comparing the genomes of the two M. fructicola strains using a SNP calling approach resulted in the identification of 564,302 homologous SNPs with 2,004 predicted high impact mutations. The size of the genome is exceptionally high when compared with those of available closely related organisms, and the high rate of homology among M. fructicola genes points toward a recent whole-genome duplication event as the cause of this large genome. Based on the assembled genome, sequences were annotated with a gene description and gene ontology (GO term) and clustered in functional groups. Analysis of CAZymes family genes revealed 1,145 putative genes, and transcriptomic analysis of CAZyme expression levels in M. fructicola during its interaction with either grapefruit peel tissue or Penicillium digitatum revealed a high level of CAZyme gene expression when the yeast was placed in wounded fruit tissue.

September 22, 2019

Plasmid-mediated quinolone resistance in Shigella flexneri isolated from macaques.

Non-human primates (NHPs) for biomedical research are commonly infected with Shigella spp. that can cause acute dysentery or chronic episodic diarrhea. These animals are often prophylactically and clinically treated with quinolone antibiotics to eradicate these possible infections. However, chromosomally- and plasmid-mediated antibiotic resistance has become an emerging concern for species in the family Enterobacteriaceae. In this study, five individual isolates of multi-drug resistant Shigella flexneri were isolated from the feces of three macaques. Antibiotic susceptibility testing confirmed resistance or decreased susceptibility to ampicillin, amoxicillin-clavulanic acid, cephalosporins, gentamicin, tetracycline, ciprofloxacin, enrofloxacin, levofloxacin, and nalidixic acid. S. flexneri isolates were susceptible to trimethoprim-sulfamethoxazole, and this drug was used to eradicate infection in two of the macaques. Plasmid DNA from all isolates was positive for the plasmid-encoded quinolone resistance gene qnrS, but not qnrA and qnrB. Conjugation and transformation of plasmid DNA from several S. flexneri isolates into antibiotic-susceptible Escherichia coli strains conferred the recipients with resistance or decreased susceptibility to quinolones and beta-lactams. Genome sequencing of two representative S. flexneri isolates identified the qnrS gene on a plasmid-like contig. These contigs showed >99% homology to plasmid sequences previously characterized from quinolone-resistant Shigella flexneri 2a and Salmonella enterica strains. Other antibiotic resistance genes and virulence factor genes were also identified in chromosome and plasmid sequences in these genomes. The findings from this study indicate macaques harbor pathogenic S. flexneri strains with chromosomally- and plasmid-encoded antibiotic resistance genes. To our knowledge, this is the first report of plasmid-mediated quinolone resistance in S. flexneri isolated from NHPs and warrants isolation and antibiotic testing of enteric pathogens before treating macaques with quinolones prophylactically or therapeutically.

September 22, 2019

A manually annotated Actinidia chinensis var. chinensis (kiwifruit) genome highlights the challenges associated with draft genomes and gene prediction in plants.

Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) ‘Hongyang’ draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second genome of an A. chinensis (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed a considerable number of translocation events have occurred following a whole genome duplication (WGD) event, some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these, 3114 (9.4%) were identical to a protein within ‘Hongyang’ The Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation, this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned ‘Hort16A’ cDNAs and comparing with the predicted protein models for Red5 and both the original ‘Hongyang’ assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised ‘Hongyang’ annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes. Our use of the manual annotation tool WebApollo facilitated manual checking and correction of gene models, enabling improvement of computational prediction. This utility was especially relevant for certain types of gene families such as the EXPANSIN-like genes. Finally, this high quality gene set will supply the kiwifruit and general plant community with a new tool for genomics and other comparative analysis.
