Tandem repeat Archives - Page 5 of 6

July 7, 2019

Structure of the type IV secretion system in different strains of Anaplasma phagocytophilum.

Anaplasma phagocytophilum is an intracellular organism in the Order Rickettsiales that infects diverse animal species and is causing an emerging disease in humans, dogs and horses. Different strains have very different cell tropisms and virulence. For example, in the U.S., strains have been described that infect ruminants but not dogs or rodents. An intriguing question is how the strains of A. phagocytophilum differ and what different genome loci are involved in cell tropisms and/or virulence. Type IV secretion systems (T4SS) are responsible for translocation of substrates across the cell membrane by mechanisms that require contact with the recipient cell. They are especially important in organisms such as the Rickettsiales which require T4SS to aid colonization and survival within both mammalian and tick vector cells. We determined the structure of the T4SS in 7 strains from the U.S. and Europe and revised the sequence of the repetitive virB6 locus of the human HZ strain. Although in all strains the T4SS conforms to the previously described split loci for vir genes, there is great diversity within these loci among strains. This is particularly evident in the virB2 and virB6 which are postulated to encode the secretion channel and proteins exposed on the bacterial surface. VirB6-4 has an unusual highly repetitive structure and can have a molecular weight greater than 500,000. For many of the virs, phylogenetic trees position A. phagocytophilum strains infecting ruminants in the U.S. and Europe distant from strains infecting humans and dogs in the U.S. Our study reveals evidence of gene duplication and considerable diversity of T4SS components in strains infecting different animals. The diversity in virB2 is in both the total number of copies, which varied from 8 to 15 in the herein characterized strains, and in the sequence of each copy. The diversity in virB6 is in the sequence of each of the 4 copies in the single locus and the presence of varying numbers of repetitive units in virB6-3 and virB6-4. These data suggest that the T4SS should be investigated further for a potential role in strain virulence of A. phagocytophilum.

July 7, 2019

Structure and evolution of the filaggrin gene repeated region in primates

The evolutionary dynamics of repeat sequences is quite complex, with some duplicates never having differentiated from each other. Two models can explain the complex evolutionary process for repeated genes—concerted and birth-and-death, of which the latter is driven by duplications maintained by selection. Copy number variations caused by random duplications and losses in repeat regions may modulate molecular pathways and therefore affect phenotypic characteristics in a population, resulting in individuals that are able to adapt to new environments. In this study, we investigated the filaggrin gene (FLG), which codes for filaggrin—an important component of the outer layers of mammalian skin—and contains tandem repeats that exhibit copy number variation between and within species. To examine which model best fits the evolutionary pathway for the complete tandem repeats within a single exon of FLG, we determined the repeat sequences in crab-eating macaque (Macaca fascicularis), orangutan (Pongo abelii), gorilla (Gorilla gorilla), and chimpanzee (Pan troglodytes) and compared these with the sequence in human (Homo sapiens).

July 7, 2019

An improved genome assembly uncovers prolific tandem repeats in Atlantic cod.

The first Atlantic cod (Gadus morhua) genome assembly published in 2011 was one of the early genome assemblies exclusively based on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are of a fragmented nature with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enables the generation of more contiguous genome assemblies. By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap-bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual. The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes in the genome indicates a considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.

July 7, 2019

Combination of short-read, long-read and optical mapping assemblies reveals large-scale tandem repeat arrays with population genetic implications.

Accurate and contiguous genome assembly is key to a comprehensive understanding of the processes shaping genomic diversity and evolution. Yet, it is frequently constrained by constitutive heterochromatin, usually characterized by highly repetitive DNA. As a key feature of genome architecture associated with centromeric and telomeric regions it influences meiotic recombination. In this study, we assess the impact of large tandem repeat arrays on the recombination rate landscape in an avian speciation model, the Eurasian crow. We assembled two high-quality genome references using single-molecule real-time sequencing (long-read assembly, LR) and single-molecule restriction maps (optical map assembly, OM). A three-way comparison including the published short-read assembly (SR) constructed for the same individual allowed assessing assembly properties and pinpointing mis-assemblies. Combining information from all three assemblies, we characterized 36 previously unidentified large repetitive regions in the proximity of sequence assembly breakpoints, the majority of which contained complex arrays of a 14-kb satellite repeat or its 1.2-kb subunit. Using genome-wide population re-sequencing data, we estimated the population-scaled recombination rate (ρ) and found it to be significantly reduced in these regions. These findings are consistent with an effect of low recombination in regions adjacent to centromeric or subtelomeric heterochromatin, and add to our understanding of the processes generating widespread heterogeneity in genetic diversity and differentiation along the genome. By combining three independent technologies, our results highlight the importance of adding a layer of information on genome structure inaccessible to each approach independently. Published by Cold Spring Harbor Laboratory Press.

July 7, 2019

Extremely low genomic diversity of Rickettsia japonica distributed in Japan.

Rickettsiae are obligate intracellular bacteria that have small genomes as a result of reductive evolution. Many Rickettsia species of the spotted fever group (SFG) cause tick-borne diseases known as “spotted fevers”. The life cycle of SFG rickettsiae is closely associated with that of the tick, which is generally thought to act as a bacterial vector and reservoir that maintains the bacterium through transstadial and transovarial transmission. Each SFG member is thought to have adapted to a specific tick species, thus restricting the bacterial distribution to a relatively limited geographic region. These unique features of SFG rickettsiae allow investigation of how the genomes of such biologically and ecologically specialized bacteria evolve after genome reduction and the types of population structures that are generated. Here, we performed a nationwide, high-resolution phylogenetic analysis of Rickettsia japonica, an etiological agent of Japanese spotted fever that is distributed in Japan and Korea. The comparison of complete or nearly complete sequences obtained from 31 R. japonica strains isolated from various sources in Japan over the past 30 years demonstrated an extremely low level of genomic diversity. In particular, only 34 single nucleotide polymorphisms were identified among the 27 strains of the major lineage containing all clinical isolates and tick isolates from the three tick species. Our data provide novel insights into the biology and genome evolution of R. japonica, including the possibilities of recent clonal expansion and a long generation time in nature due to the long dormant phase associated with tick life cycles. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

July 7, 2019

Complete genome sequence of Bordetella pertussis Pelita III, the production strain for an Indonesian whole-cell pertussis vaccine.

PT Bio Farma, the sole World Health Organization-approved Indonesian vaccine producer, manufactures a whole-cell whooping cough vaccine (wP) that, as part of a pentavalent diphtheria-tetanus-pertussis/hepatitis B/Haemophilus influenzae b (DTP/HB/Hib) vaccine, is used in Indonesia and many other countries. We report here the whole-genome sequence for Bordetella pertussis Pelita III, PT Bio Farma’s wP production strain. Copyright © 2017 Efendi et al.

July 7, 2019

Complete genome sequence of Mycoplasma bovis strain 08M.

Mycoplasma bovis is a major bacterial pathogen that can cause respiratory disease, mastitis, and arthritis in cattle. We report here the complete and annotated genome sequence of M. bovis strain 08M, isolated from a calf lung with pneumonia in China. Copyright © 2017 Chen et al.

July 7, 2019

Genome sequencing and comparative genomics reveal the potential pathogenic mechanism of Cercospora sojina Hara on soybean.

Frogeye leaf spot, caused by Cercospora sojina Hara, is a common disease of soybean in most soybean-growing countries of the world. In this study, we report a high-quality genome sequence of C. sojina obtained by the Single Molecule Real-Time sequencing method. The 40.8-Mb genome encodes 11,655 predicted genes, and 8,474 genes are revealed by RNA sequencing. The Cercospora sojina genome contains large numbers of gene clusters that are involved in synthesis of secondary metabolites, including mycotoxins and pigments. However, far fewer genes encoding carbohydrate-binding module proteins are identified in the C. sojina genome than in other phytopathogenic fungi. Bioinformatics analysis reveals that C. sojina harbours about 752 secreted proteins, and 233 of them are effectors. During early infection, the genes for metabolite biosynthesis and effectors are significantly enriched, suggesting that they may play essential roles in pathogenicity. We further identify 13 effectors that can inhibit BAX-induced cell death. Taken together, our results provide insights into the infection mechanisms of C. sojina on soybean. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

July 7, 2019

Structural variation offers new home for disease associations and gene discovery

Following completion of the Human Genome Project, most studies of human genetic variation have centered on single nucleotide polymorphisms (SNPs). SNPs are numerous in individual genomes and serve as useful genetic markers in association studies across a population. These markers have been leveraged to identify genetic loci for disease risk and draw associations with numerous traits of interest. Despite their usefulness, SNPs do not tell the whole story. For example, most SNPs are associated with only a small increased risk of disease, and they usually cannot identify on their own which genes are causal. This has resulted in what many researchers have referred to as missing or hidden heritability.

July 7, 2019

Hunting structural variants: Population by population

Until recently, most population-scale genome sequencing studies have focused on identifying single nucleotide variants (SNVs) to explore genetic differences between individuals. Like so many SNV-based genome-wide association studies, however, these efforts have had difficulty identifying causative genetic mechanisms underlying most complex functions. More and more, the genomics community has realised that structural variation is likely responsible for many of the traits and phenotypes that scientists have not been able to attribute to SNVs. This class of variants, defined as genetic differences of 50 bp or larger, accounts for most of the DNA sequence differences between any two people. Structural variants (SVs) are also already known to cause many common and rare diseases including ALS, schizophrenia, leukemia, Carney complex, and Huntington’s disease. Despite the importance of SVs, these larger variants have been understudied and underreported compared to their single-nucleotide counterparts. One reason is that they remain difficult to detect. Their length often means they cannot be fully spanned using short sequencing reads. They also often occur in highly repetitive or GC-rich regions of the genome, making them challenging targets. As such, this class of human genetic variation has remained vastly under-explored in global populations and is now ripe for discovery.

July 7, 2019

Genome of Cnaphalocrocis medinalis granulovirus, the first Crambidae-infecting betabaculovirus isolated from rice leaffolder to be sequenced.

Cnaphalocrocis medinalis is a major pest of rice in South and South-East Asia. Insecticides are the major means farmers use for management. A naturally occurring baculovirus, C. medinalis granulovirus (CnmeGV), has been isolated from the larvae and has the potential for use as a microbial agent. Here, we describe the complete genome sequence of CnmeGV and compare it to other baculovirus genomes. The genome of CnmeGV is 112,060 base pairs in length and has a G+C content of 35.2%. It contains 133 putative open reading frames (ORFs) of at least 150 nucleotides. A hundred and one (101) of these ORFs are homologous to other baculovirus genes including 37 baculovirus core genes. Thirty-two (32) ORFs are unique to CnmeGV with no homologues detected in GenBank, and 53 tandem repeats (TRs) with sequence lengths from 25 to 551 nt are interspersed throughout the genome of CnmeGV. Six (6) homologous regions (hrs) were identified interspersed throughout the genome. Hr2 contains 11 imperfect palindromes and a high content of AT sequence (about 73%). The unique ORF28 contains a coiled-coil region and a zinc finger-like domain of 4-50 residues characterized by two C2C2 zinc finger motifs that putatively bind two atoms of zinc. ORF21 encodes a chit-1 protein, suggesting a horizontal gene transfer from an alphabaculovirus. The putative protein presents two carbohydrate-binding module family 14 (CBM_14) domains, unlike other homologues detected in betabaculoviruses, which contain only one chit-binding region. Gene synteny maps showed the colinearity of sequenced betabaculoviruses. Phylogenetic analysis indicated that CnmeGV grouped in the betabaculoviruses, with a close relation to AdorGV. The cladogram obtained in this work grouped the 17 complete GV genomes in one monophyletic clade. CnmeGV represents a new Crambidae host-isolated virus species from the genus Betabaculovirus and is most closely related to AdorGV. The analyses and information derived from this study will provide a better understanding of the pathological symptoms caused by this virus and its potential use as a microbial pesticide.

July 7, 2019

High quality maize centromere 10 sequence reveals evidence of frequent recombination events.

The ancestral centromeres of maize contain long stretches of the tandemly arranged CentC repeat. The abundance of tandem DNA repeats and centromeric retrotransposons (CR) has presented a significant challenge to completely assembling centromeres using traditional sequencing methods. Here, we report a nearly complete assembly of the 1.85 Mb maize centromere 10 from inbred B73 using PacBio technology and BACs from the reference genome project. The error rates estimated from overlapping BAC sequences are 7 × 10⁻⁶ and 5 × 10⁻⁵ for mismatches and indels, respectively. The number of gaps in the region covered by the reassembly was reduced from 140 in the reference genome to three. Three expressed genes are located between 92 and 477 kb from the inferred ancestral CentC cluster, which lies within the region of highest centromeric repeat density. The improved assembly increased the count of full-length CR from 5 to 55 and revealed a 22.7 kb segmental duplication that occurred approximately 121,000 years ago. Our analysis provides evidence of frequent recombination events in the form of partial retrotransposons, deletions within retrotransposons, chimeric retrotransposons, segmental duplications including higher order CentC repeats, a deleted CentC monomer, centromere-proximal inversions, and insertion of mitochondrial sequences. Double-strand DNA break (DSB) repair is the most plausible mechanism for these events and may be the major driver of centromere repeat evolution and diversity. In many cases examined here, DSB repair appears to be mediated by microhomology, suggesting that tandem repeats may have evolved to efficiently repair frequent DSBs in centromeres.

July 7, 2019

Single-locus enrichment without amplification for sequencing and direct detection of epigenetic modifications.

A gene-level targeted enrichment method for direct detection of epigenetic modifications is described. The approach is demonstrated on the CGG-repeat region of the FMR1 gene, for which large repeat expansions, hitherto refractory to sequencing, are known to cause fragile X syndrome. In addition to achieving a single-locus enrichment of nearly 700,000-fold, the elimination of all amplification steps removes PCR-induced bias in the repeat count and preserves the native epigenetic modifications of the DNA. In conjunction with the single-molecule real-time sequencing approach, this enrichment method enables direct readout of the methylation status and the CGG repeat number of the FMR1 allele(s) for a clonally derived cell line. The current method avoids potential biases introduced through chemical modification and/or amplification methods for indirect detection of CpG methylation events.
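As a rough illustration of what reading the CGG repeat number directly from a sequencing read can mean, the following minimal sketch counts the units in the longest uninterrupted CGG tract of a read. It is not the authors' pipeline: the function name `longest_cgg_run` is hypothetical, and the sketch assumes an error-free read on the CGG strand and ignores interruptions within the repeat (such as AGG units), which a real analysis would need to handle.

```python
import re


def longest_cgg_run(read: str) -> int:
    """Return the number of CGG units in the longest uninterrupted CGG tract.

    Hypothetical helper for illustration only: assumes the read is given on
    the CGG strand and treats any non-CGG triplet as ending the tract.
    """
    # Find every maximal stretch made only of back-to-back CGG units.
    runs = re.findall(r"(?:CGG)+", read.upper())
    # Each unit is 3 bases long; return 0 when the read has no CGG at all.
    return max((len(run) // 3 for run in runs), default=0)


if __name__ == "__main__":
    example_read = "ATCGGCGGCGGTA"  # made-up sequence, not real data
    print(longest_cgg_run(example_read))
```

On long single-molecule reads, a count like this would normally be taken from a consensus of the read rather than from a raw pass, since raw insertion and deletion errors would split the tract.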

July 7, 2019

Alpha-CENTAURI: assessing novel centromeric repeat sequence variation with long read sequencing.

Long arrays of near-identical tandem repeats are a common feature of centromeric and subtelomeric regions in complex genomes. These sequences present a source of repeat structure diversity that is commonly ignored by standard genomic tools. Unlike reads shorter than the underlying repeat structure that rely on indirect inference methods, e.g. assembly, long reads allow direct inference of satellite higher order repeat structure. To automate characterization of local centromeric tandem repeat sequence variation we have designed Alpha-CENTAURI (ALPHA satellite CENTromeric AUtomated Repeat Identification), that takes advantage of Pacific Biosciences long-reads from whole-genome sequencing datasets. By operating on reads prior to assembly, our approach provides a more comprehensive set of repeat-structure variants and is not impacted by rearrangements or sequence underrepresentation due to misassembly. We demonstrate the utility of Alpha-CENTAURI in characterizing repeat structure for alpha satellite containing reads in the hydatidiform mole (CHM1, haploid-like) genome. The pipeline is designed to report local repeat organization summaries for each read, thereby monitoring rearrangements in repeat units, shifts in repeat orientation and sites of array transition into non-satellite DNA, typically defined by transposable element insertion. We validate the method by showing consistency with existing centromere high order repeat references. Alpha-CENTAURI can, in principle, run on any sequence data, offering a method to generate a sequence repeat resolution that could be readily performed using consensus sequences available for other satellite families in genomes without high-quality reference assemblies. Documentation and source code for Alpha-CENTAURI are freely available at http://github.com/volkansevim/alpha-CENTAURI. Contact: ali.bashir@mssm.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

July 7, 2019

Structural and functional analysis of the finished genome of the recently isolated toxic Anabaena sp. WA102.

Very few closed genomes of the cyanobacteria that commonly produce toxic blooms in lakes and reservoirs are available, limiting our understanding of the properties of these organisms. A new anatoxin-a-producing member of the Nostocaceae, Anabaena sp. WA102, was isolated from a freshwater lake in Washington State, USA, in 2013 and maintained in non-axenic culture. The Anabaena sp. WA102 5.7 Mbp genome assembly has been closed with long-read, single-molecule sequencing and separately a draft genome assembly has been produced with short-read sequencing technology. The closed and draft genome assemblies are compared, showing a correlation between long repeats in the genome and the many gaps in the short-read assembly. Anabaena sp. WA102 encodes anatoxin-a biosynthetic genes, as does its close relative Anabaena sp. AL93 (also introduced in this study). These strains are distinguished by differences in the genes for light-harvesting phycobilins, with Anabaena sp. AL93 possessing a phycoerythrocyanin operon. Biologically relevant structural variants in the Anabaena sp. WA102 genome were detected only by long-read sequencing: a tandem triplication of the anaBCD promoter region in the anatoxin-a synthase gene cluster (not triplicated in Anabaena sp. AL93) and a 5-kbp deletion variant present in two-thirds of the population. The genome has a large number of mobile elements (160). Strikingly, there was no synteny with the genome of its nearest fully assembled relative, Anabaena sp. 90. Structural and functional genome analyses indicate that Anabaena sp. WA102 has a flexible genome. Genome closure, which can be readily achieved with long-read sequencing, reveals large scale (e.g., gene order) and local structural features that should be considered in understanding genome evolution and function.
