During the past decade, the search for pathogenic mutations in rare human genetic diseases has involved huge efforts to sequence coding regions, or the entire genome, using massively parallel short-read sequencers. However, the approximate current diagnostic rate is <50% using these approaches, and there remain many rare genetic diseases with unknown cause. There may be many reasons for this, but one plausible explanation is that the responsible mutations are in regions of the genome that are difficult to sequence using conventional technologies (e.g., tandem-repeat expansion or complex chromosomal structural aberrations). Despite the drawbacks of high cost and a shortage of standard analytical methods, several studies have analyzed pathogenic changes in the genome using long-read sequencers. The results of these studies provide hope that further application of long-read sequencers to identify the causative mutations in unsolved genetic diseases may expand our understanding of the human genome and diseases. Such approaches may also be applied to molecular diagnosis and therapeutic strategies for patients with genetic diseases in the future.
Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases.
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
Marker gene amplicon sequencing is often preferred over whole genome sequencing for microbial community characterization, due to its lower cost while still enabling assessment of uncultivable organisms. This technique involves many experimental steps, each of which can be a source of errors and bias. We present an up-to-date overview of the whole experimental pipeline, from sampling to sequencing reads, and give information allowing for informed choices at each step of both planning and execution of a microbial community assessment study. When applicable, we also suggest ways of avoiding inherent pitfalls in amplicon sequencing. © 2019 The Society for Applied Microbiology.
The evolution of Bordetella pertussis from a common ancestor similar to Bordetella bronchiseptica has occurred through large-scale gene loss, inactivation and rearrangements, largely driven by the spread of insertion sequence element repeats throughout the genome. B. pertussis is widely considered to be monomorphic, and recent evolution of the B. pertussis genome appears to, at least in part, be driven by vaccine-based selection. Given the recent global resurgence of whooping cough despite the wide-spread use of vaccination, a more thorough understanding of B. pertussis genomics could be highly informative. In this chapter we discuss the evolution of B. pertussis, including how vaccination is changing the circulating B. pertussis population at the gene-level, and how new sequencing technologies are revealing previously unknown levels of inter- and intra-strain variation at the genome-level.
The emergence of third generation sequencing (3GS; long-reads) is bringing the goal of chromosome-size fragments in de novo genome assemblies closer. This allows the exploration of new and broader questions on genome evolution for a number of non-model organisms. However, long-read technologies have higher sequencing error rates, which raises the cost of obtaining enough coverage to reach sufficient quality. In this context, hybrid assemblies, which combine short reads and long reads, provide an efficient and cost-effective alternative approach to generating de novo, chromosome-level genome assemblies. The array of available software programs for hybrid genome assembly, sequence correction and manipulation is constantly being expanded and improved. This makes it difficult for non-experts to find efficient, fast and tractable computational solutions for genome assembly, especially in the case of non-model organisms lacking a reference genome or one from a closely related species. In this study, we review and test the most recent pipelines for hybrid assemblies, comparing the model organism Drosophila melanogaster to a non-model cactophilic Drosophila, D. mojavensis. We show that it is possible to achieve excellent contiguity on this non-model organism using the DBG2OLC pipeline.
Over the past decade, RNA sequencing (RNA-seq) has become an indispensable tool for transcriptome-wide analysis of differential gene expression and differential splicing of mRNAs. However, as next-generation sequencing technologies have developed, so too has RNA-seq. Now, RNA-seq methods are available for studying many different aspects of RNA biology, including single-cell gene expression, translation (the translatome) and RNA structure (the structurome). Exciting new applications are being explored, such as spatial transcriptomics (spatialomics). Together with new long-read and direct RNA-seq technologies and better computational tools for data analysis, innovations in RNA-seq are contributing to a fuller understanding of RNA biology, from questions such as when and where transcription occurs to the folding and intermolecular interactions that govern RNA function.
Genetic sequencing technologies are evolving at a rapid pace with major implications for research and clinical practice. In this review, the authors provide an updated overview of next-generation sequencing (NGS) and emerging methodologies. NGS has tremendously improved sequencing output while being more time- and cost-efficient than Sanger sequencing. The authors describe short-read sequencing approaches, such as sequencing by synthesis, ion semiconductor sequencing, and nanoball sequencing. Third-generation long-read sequencing now promises to overcome many of the limitations of short-read sequencing, such as the inability to reliably resolve repeat sequences and large genomic rearrangements. By combining complementary methods with massively parallel DNA sequencing, a greater insight into the biological context of disease mechanisms is now possible. Emerging methodologies, such as advances in nanopore technology, in situ nucleic acid sequencing, and microscopy-based sequencing, will continue the rapid evolution of this area. These new technologies hold many potential applications for hematological disorders, with the promise of precision and personalized medical care in the future. Thieme Medical Publishers, 333 Seventh Avenue, New York, NY 10001, USA.
In the wake of constant improvements in sequencing technologies, numerous insect genomes have been sequenced. Currently, 1219 insect genome-sequencing projects have been registered with the National Center for Biotechnology Information, including 401 that have genome assemblies and 155 with an official gene set of annotated protein-coding genes. Comparative genomics analysis showed that the expansion or contraction of gene families was associated with well-studied physiological traits such as immune system, metabolic detoxification, parasitism and polyphagy in insects. Here, we summarize the progress of insect genome sequencing, with an emphasis on how this impacts research on pest control. We begin with a brief introduction to the basic concepts of genome assembly, annotation and metrics for evaluating the quality of draft assemblies. We then provide an overview of genome information for numerous insect species, highlighting examples from prominent model organisms, agricultural pests and disease vectors. We also introduce the major insect genome databases. The increasing availability of insect genomic resources is beneficial for developing alternative pest control methods. However, many opportunities remain for developing data-mining tools that make maximal use of the available insect genome resources. Although rapid progress has been achieved, many challenges remain in the field of insect genomics. © 2019 The Royal Entomological Society.
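The abstract above mentions metrics for evaluating the quality of draft assemblies without naming them. One widely used contiguity metric is N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The following is a minimal sketch of that calculation, assuming contig lengths are already available as a list of integers; the function name `n50` and the example lengths are illustrative and do not come from the review.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty).

    Illustrative helper, not taken from the review: contigs are sorted from
    longest to shortest and accumulated until at least half of the total
    assembly length is covered; the length of the contig that crosses that
    threshold is the N50.
    """
    lengths = sorted((int(n) for n in contig_lengths), reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # only reached for an empty input


# Hypothetical contig lengths (in bp) for a toy assembly of 300 bp in total.
example_lengths = [100, 80, 50, 30, 20, 10, 10]
print(n50(example_lengths))  # prints 80
```

N50 rewards contiguity only, so it is usually read alongside completeness and accuracy measures; a misassembled genome can still have a high N50.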
A diverse gene pool and advanced plant phenomics and genomics methods have enhanced genetic gain and the understanding of important agronomic, adaptation and nutritional traits in finger millet. Finger millet (Eleusine coracana L. Gaertn) is an important minor millet for food and nutritional security in semi-arid regions of the world. The crop has wide adaptability and can be grown from the high hills of the Himalayan region to the coastal plains. It provides food grain as well as palatable straw for cattle, and is fairly climate resilient. The crop has a large gene pool with distinct features of both Indian and African germplasm types. Interspecific hybridization between Indian and African germplasm has resulted in greater yield enhancement and disease resistance. The crop has shown numerous advantages over major cereals in terms of stress adaptation, nutritional quality and health benefits. It is an indispensable repository of novel genes for the benefit of mankind. Although rapid strides have been made in allele mining in model crops and major cereals, progress in finger millet genomics is lacking. Comparative genomics has paved the way for marker-assisted selection, where resistance gene homologues of rice for blast and sequence variants for nutritional traits from other cereals have been invariably used. Transcriptomics studies have provided a preliminary understanding of nutritional variation and of drought and salinity tolerance. However, the genetics of many important traits in finger millet is poorly understood and needs systematic efforts from biologists across disciplines. The recently deciphered finger millet genome will enable identification of candidate genes for agronomically and nutritionally important traits. Further, improvements in genome assembly and the application of genomic selection and genome editing in the near future will provide a plethora of information and opportunities to understand the genetics of complex traits.
Alternative splicing of pre-mRNAs is a crucial mechanism for maintaining protein diversity in eukaryotes without requiring a considerable increase in the number of genes. Due to rapid advances in high-throughput sequencing technologies and computational algorithms, it is anticipated that alternative splicing events will be more intensively studied to address different kinds of biological questions. The occurrence of alternative splicing means that all exons can be classified as either constitutively or alternatively spliced, depending on whether they are included in virtually all mature mRNAs. From an evolutionary point of view, therefore, the alternatively spliced exons would have been associated with distinctive biological characteristics in comparison with constitutively spliced exons. In this paper, we first outline the representative types of alternative splicing events and exon classification, and then review sequence and evolutionary features for the alternatively spliced exons. The main purpose is to facilitate understanding of the biological implications of alternative splicing in eukaryotes. This knowledge is also helpful for establishing computational approaches to predict the splicing pattern of exons.
Transposable Elements Adaptive Role in Genome Plasticity, Pathogenicity and Evolution in Fungal Phytopathogens.
Transposable elements (TEs) are agents of genetic variability in phytopathogens as they are a source of adaptive evolution through genome diversification. Although many studies have uncovered information on TEs, the exact mechanism behind TE-induced changes within the genome remains poorly understood. Furthermore, convergent trends towards bigger genomes, emergence of novel genes and gain or loss of genes implicate a TE-regulated genome plasticity of fungal phytopathogens. TEs are able to alter gene expression by revamping the cis-regulatory elements or recruiting epigenetic control. Recent findings show that TEs recruit epigenetic control on the expression of effector genes as part of the coordinated infection strategy. In addition to genome plasticity and diversity, fungal pathogenicity is an area of economic concern. A survey of TE distribution suggests that, given their proximity to pathogenicity genes, TEs may act as sites for the emergence of novel pathogenicity factors via nucleotide changes and expansion or reduction of the gene family. Through a systematic survey of literature, we were able to conclude that the role of TEs in fungi is wide, ranging from genome plasticity and pathogenicity to adaptive behavior in evolution. This review also identifies the gaps in knowledge that require further elucidation for a better understanding of TEs’ contribution to genome architecture and versatility.
The central goal of medical genomics is to understand the inherited basis of sequence variation that underlies human physiology, evolution, and disease. Functional association studies currently ignore millions of bases that span each centromeric region and acrocentric short arm. These regions are enriched in long arrays of tandem repeats, or satellite DNAs, that are known to vary extensively in copy number and repeat structure in the human population. Satellite sequence variation in the human genome is often so large that it is detected cytogenetically, yet due to the lack of a reference assembly and informatics tools to measure this variability, contemporary high-resolution disease association studies are unable to detect causal variants in these regions. Nevertheless, recently uncovered associations between satellite DNA variation and human disease support the view that these regions represent a substantial and biologically important fraction of human sequence variation. Therefore, there is a pressing and unmet need to detect and incorporate this uncharacterized sequence variation into broad studies of human evolution and medical genomics. Here I discuss the current knowledge of satellite DNA variation in the human genome, focusing on centromeric satellites and their potential implications for disease.
Aspergillus oryzae has long been used in traditional fermentation and has promising potential to produce primary and secondary metabolites. Due to the tough cell walls and high drug resistance of A. oryzae, functional genomic characterization studies are relatively limited. The exploitation of selection markers and genetic transformation methods is critical for improving A. oryzae fermentative strains. In this review, we describe the genome sequencing of various A. oryzae strains. Recently developed selection markers and transformation strategies are also described in detail, and the advantages and disadvantages of transformation methods are presented. Lastly, we introduce recent progress on highlighted topics in A. oryzae functional genomics, including conidiation, protein secretion and expression, and secondary metabolites, which will be beneficial for improving the application of A. oryzae to industrial production.
The problem of ‘missing heritability’ affects both common and rare diseases, hindering discovery, diagnosis, and patient care. The ‘missing heritability’ concept has been mainly associated with common and complex diseases where promising modern technological advances, like genome-wide association studies (GWAS), were unable to uncover the complete genetic mechanism of the disease/trait. Although rare diseases (RDs) have low prevalence individually, collectively they are common. Furthermore, multi-level genetic and phenotypic complexity, when combined with the individual rarity of these conditions, poses an important challenge in the quest to identify causative genetic changes in RD patients. In recent years, high throughput sequencing has accelerated discovery and diagnosis in RDs. However, despite the several-fold increase (from ~10% using traditional to ~40% using genome-wide genetic testing) in finding genetic causes of these diseases in RD patients, the majority of RDs, as is the case in common diseases, are also facing the ‘missing heritability’ problem. This review outlines the key role of high throughput sequencing in uncovering the genetics behind RDs, with a particular focus on genome sequencing. We review current advances and challenges of sequencing technologies, bioinformatics approaches, and resources.
Supernumerary B chromosomes (Bs) are extra karyotype units in addition to A chromosomes, and are found in some fungi and thousands of animal and plant species. Bs are uniquely characterized by their non-Mendelian inheritance, and represent one of the best examples of genomic conflict. Over the last decades, their genetic composition, function and evolution have remained unresolved questions, although a few successful attempts have been made to address them. A classical concept based on cytogenetics and genetics is that Bs are selfish and rich in DNA repeats and transposons, and in most cases, they do not carry any function. However, the recent rapid development of large-scale multi-omics techniques has shifted B research towards a newborn field that we call “B-omics”. We review the recent literature and add novel perspectives to B research, discussing the role of new technologies in understanding the mechanisms of the molecular evolution and function of Bs. The modern view states that B chromosomes are enriched with genes for many significant biological functions, including but not limited to an interesting set of genes related to cell cycle and chromosome structure. Furthermore, the presence of B chromosomes could favor genomic rearrangements and influence the nuclear environment, affecting the function of other chromatin regions. We hypothesize that B chromosomes might play a key role in driving their own transmission and maintenance inside the cell, as well as offer an extra genomic compartment for evolution.