Satellite repeats are a structural component of centromeres and telomeres, and in some instances their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50?bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These were dominated by a handful of abundant repeated motifs, frequently shared among species, which formed two groups: (1) the (AATGG)n repeat (critical for heat shock response) and its derivatives; and (2) subtelomeric 32-mers involved in telomeric metabolism. Using the densities of abundant repeats, individuals could be classified into species. However clustering did not reproduce the accepted species phylogeny, suggesting rapid repeat evolution. Several abundant repeats were enriched in males vs. females; using Y chromosome assemblies or FIuorescent In Situ Hybridization, we validated their location on the Y. Finally, applying a novel computational tool, we identified many satellite repeats completely embedded within long Oxford Nanopore and Pacific Biosciences reads. Such repeats were up to 59?kb in length and consisted of perfect repeats interspersed with other similar sequences. Our results based on sequencing reads generated with three different technologies provide the first detailed characterization of great ape satellite repeats, and open new avenues for exploring their functions. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
The damage caused by Bradysia odoriphaga is the main factor threatening the production of vegetables in the Liliaceae family. However, few genetic studies of B. odoriphaga have been conducted because of a lack of genomic resources. Many long-read sequencing technologies have been developed in the last decade; therefore, in this study, the transcriptome including all development stages of B. odoriphaga was sequenced for the first time by Pacific single-molecule long-read sequencing. Here, 39,129 isoforms were generated, and 35,645 were found to have annotation results when checked against sequences available in different databases. Overall, 18,473 isoforms were distributed in 25 various Clusters of Orthologous Groups, and 11,880 isoforms were categorized into 60 functional groups that belonged to the three main Gene Ontology classifications. Moreover, 30,610 isoforms were assigned into 44 functional categories belonging to six main Kyoto Encyclopedia of Genes and Genomes functional categories. Coding DNA sequence (CDS) prediction showed that 36,419 out of 39,129 isoforms were predicted to have CDS, and 4319 simple sequence repeats were detected in total. Finally, 266 insecticide resistance and metabolism-related isoforms were identified as candidate genes for further investigation of insecticide resistance and metabolism in B. odoriphaga.
In order to provide a comprehensive resource for human structural variants (SVs), we generated long-read sequence data and analyzed SVs for fifteen human genomes. We sequence resolved 99,604 insertions, deletions, and inversions including 2,238 (1.6 Mbp) that are shared among all discovery genomes with an additional 13,053 (6.9 Mbp) present in the majority, indicating minor alleles or errors in the reference. Genotyping in 440 additional genomes confirms the most common SVs in unique euchromatin are now sequence resolved. We report a ninefold SV bias toward the last 5 Mbp of human chromosomes with nearly 55% of all VNTRs (variable number of tandem repeats) mapping to this portion of the genome. We identify SVs affecting coding and noncoding regulatory loci improving annotation and interpretation of functional variation. These data provide the framework to construct a canonical human reference and a resource for developing advanced representations capable of capturing allelic diversity. Copyright © 2018 Elsevier Inc. All rights reserved.
a-Difluoromethylornithine reduces gastric carcinogenesis by causing mutations in Helicobacter pylori cagY.
Infection by Helicobacter pylori is the primary cause of gastric adenocarcinoma. The most potent H. pylori virulence factor is cytotoxin-associated gene A (CagA), which is translocated by a type 4 secretion system (T4SS) into gastric epithelial cells and activates oncogenic signaling pathways. The gene cagY encodes for a key component of the T4SS and can undergo gene rearrangements. We have shown that the cancer chemopreventive agent a-difluoromethylornithine (DFMO), known to inhibit the enzyme ornithine decarboxylase, reduces H. pylori-mediated gastric cancer incidence in Mongolian gerbils. In the present study, we questioned whether DFMO might directly affect H. pylori pathogenicity. We show that H. pylori output strains isolated from gerbils treated with DFMO exhibit reduced ability to translocate CagA in gastric epithelial cells. Further, we frequently detected genomic modifications in the middle repeat region of the cagY gene of output strains from DFMO-treated animals, which were associated with alterations in the CagY protein. Gerbils did not develop carcinoma when infected with a DFMO output strain containing rearranged cagY or the parental strain in which the wild-type cagY was replaced by cagY with DFMO-induced rearrangements. Lastly, we demonstrate that in vitro treatment of H. pylori by DFMO induces oxidative DNA damage, expression of the DNA repair enzyme MutS2, and mutations in cagY, demonstrating that DFMO directly affects genomic stability. Deletion of mutS2 abrogated the ability of DFMO to induce cagY rearrangements directly. In conclusion, DFMO-induced oxidative stress in H. pylori leads to genomic alterations and attenuates virulence.
Microsatellite marker set for genetic diversity assessment of primitive Chitala chitala (Hamilton, 1822) derived through SMRT sequencing technology.
In present study, single molecule-real time sequencing technology was used to obtain a validated set of microsatellite markers for application in population genetics of the primitive fish, Chitala chitala. Assembly of circular consensus sequencing reads resulted into 1164 sequences which contained 2005 repetitive motifs. A total of 100 sequences were used for primer designing and amplification yielded a set of 28 validated polymorphic markers. These loci were used to genotype n?=?72 samples from three distant riverine populations of India, namely Son, Satluj and Brahmaputra, for determining intraspecific genetic variation. The microsatellite loci exhibited high level of polymorphism with PIC values ranging from 0.281 to 0.901. The genetic parameters revealed that mean heterozygosity ranged from 0.6802 to 0.6826 and the populations were found to be genetically diverse (Fst 0.03-0.06). This indicated the potential application of these microsatellite marker set that can used for stock characterization of C. chitala, in the wild. These newly developed loci were assayed for cross transferability in another notopterid fish, Notopterus notopterus.
Morella rubra, red bayberry, is an economically important fruit tree in south China. Here, we assembled the first high-quality genome for both a female and a male individual of red bayberry. The genome size was 313-Mb, and 90% sequences were assembled into eight pseudo chromosome molecules, with 32 493 predicted genes. By whole-genome comparison between the female and male and association analysis with sequences of bulked and individual DNA samples from female and male, a 59-Kb region determining female was identified and located on distal end of pseudochromosome 8, which contains abundant transposable element and seven putative genes, four of them are related to sex floral development. This 59-Kb female-specific region was likely to be derived from duplication and rearrangement of paralogous genes and retained non-recombinant in the female-specific region. Sex-specific molecular markers developed from candidate genes co-segregated with sex in a genetically diverse female and male germplasm. We propose sex determination follow the ZW model of female heterogamety. The genome sequence of red bayberry provides a valuable resource for plant sex chromosome evolution and also provides important insights for molecular biology, genetics and modern breeding in Myricaceae family. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Long-read sequencing identified intronic repeat expansions in SAMD12 from Chinese pedigrees affected with familial cortical myoclonic tremor with epilepsy.
The locus for familial cortical myoclonic tremor with epilepsy (FCMTE) has long been mapped to 8q24 in linkage studies, but the causative mutations remain unclear. Recently, expansions of intronic TTTCA and TTTTA repeat motifs within SAMD12 were found to be involved in the pathogenesis of FCMTE in Japanese pedigrees. We aim to identify the causative mutations of FCMTE in Chinese pedigrees.We performed genetic linkage analysis by microsatellite markers in a five-generation Chinese pedigree with 55 members. We also used array-comparative genomic hybridisation (CGH) and next-generation sequencing (NGS) technologies (whole-exome sequencing, capture region deep sequencing and whole-genome sequencing) to identify the causative mutations in the disease locus. Recently, we used low-coverage (~10×) long-read genome sequencing (LRS) on the PacBio Sequel and Oxford Nanopore platforms to identify the causative mutations, and used repeat-primed PCR for validation of the repeat expansions.Linkage analysis mapped the disease locus to 8q23.3-24.23. Array-CGH and NGS failed to identify causative mutations in this locus. LRS identified the intronic TTTCA and TTTTA repeat expansions in SAMD12 as the causative mutations, thus corroborating the recently published results in Japanese pedigrees.We identified the pentanucleotide repeat expansion in SAMD12 as the causative mutation in Chinese FCMTE pedigrees. Our study also suggested that LRS is an effective tool for molecular diagnosis of genetic disorders, especially for neurological diseases that cannot be positively diagnosed by conventional clinical microarray and NGS technologies. © Author(s) (or their employer(s)) 2019. No commercial re-use. See rights and permissions. Published by BMJ.
The ability to generate long sequencing reads and access long-range linkage information is revolutionizing the quality and completeness of genome assemblies. Here we use a hybrid approach that combines data from four genome sequencing and mapping technologies to generate a new genome assembly of the honeybee Apis mellifera. We first generated contigs based on PacBio sequencing libraries, which were then merged with linked-read 10x Chromium data followed by scaffolding using a BioNano optical genome map and a Hi-C chromatin interaction map, complemented by a genetic linkage map.Each of the assembly steps reduced the number of gaps and incorporated a substantial amount of additional sequence into scaffolds. The new assembly (Amel_HAv3) is significantly more contiguous and complete than the previous one (Amel_4.5), based mainly on Sanger sequencing reads. N50 of contigs is 120-fold higher (5.381 Mbp compared to 0.053 Mbp) and we anchor >?98% of the sequence to chromosomes. All of the 16 chromosomes are represented as single scaffolds with an average of three sequence gaps per chromosome. The improvements are largely due to the inclusion of repetitive sequence that was unplaced in previous assemblies. In particular, our assembly is highly contiguous across centromeres and telomeres and includes hundreds of AvaI and AluI repeats associated with these features.The improved assembly will be of utility for refining gene models, studying genome function, mapping functional genetic variation, identification of structural variants, and comparative genomics.
In recent genome analyses, population-specific reference panels have indicated important. However, reference panels based on short-read sequencing data do not sufficiently cover long insertions. Therefore, the nature of long insertions has not been well documented. Here, we assembled a Japanese genome using single-molecule real-time sequencing data and characterized insertions found in the assembled genome. We identified 3691 insertions ranging from 100?bps to ~10,000?bps in the assembled genome relative to the international reference sequence (GRCh38). To validate and characterize these insertions, we mapped short-reads from 1070 Japanese individuals and 728 individuals from eight other populations to insertions integrated into GRCh38. With this result, we constructed JRGv1 (Japanese Reference Genome version 1) by integrating the 903 verified insertions, totaling 1,086,173 bases, shared by at least two Japanese individuals into GRCh38. We also constructed decoyJRGv1 by concatenating 3559 verified insertions, totaling 2,536,870 bases, shared by at least two Japanese individuals or by six other assemblies. This assembly improved the alignment ratio by 0.4% on average. These results demonstrate the importance of refining the reference assembly and creating a population-specific reference genome. JRGv1 and decoyJRGv1 are available at the JRG website.