Human MHC class I genes HLA-A, -B, -C, and class II genes HLA -DR, -DQ, and -DP play a critical role in the immune system as primary factors responsible for organ transplant rejection. Additionally, the HLA genes are important targets for clinical and drug sensitivity research because of their direct or linkage-based association with several diseases, including cancer, and autoimmune diseases. HLA genes are highly polymorphic, and their diversity originates from exonic combinations as well as recombination events. With full-length gene sequencing, a significant increase of new alleles in the HLA database is expected, stressing the need for high-resolution sequencing.…
Pulmonary non-tuberculous mycobacterial (PNTM) infections occur in patients with chronic lung disease, but also in a distinct group of elderly women without lung defects who share a common body morphology: tall and lean with scoliosis, pectus excavatum, and mitral valve prolapse. In order to characterize the human host susceptibility to PNTM, we performed whole exome sequencing (WES) of 44 individuals in extended families of patients with active PNTM as well as 55 additional unrelated individuals with PNTM. This unique collection of familial cohorts in PNTM represents an important opportunity for a high yield search for genes that regulate mucosal immunity.…
Structural variants (genomic differences =50 base pairs) contribute to the evolution of traits and disease. Most structural variants (SVs) are too small to detect with array comparative genomic hybridization and too large to reliably discover with short-read DNA sequencing.
The recent advent of long-read sequencing technologies is expected to provide reasonable answers to genetic challenges unresolvable by short-read sequencing, primarily the inability to accurately study structural variations, copy number variations, and homologous repeats in complex parts of the genome. However, long-read sequencing comes along with higher rates of random short deletions and insertions, and single nucleotide errors. The relatively higher sequencing accuracy of short-read sequencing has kept it as the first choice of screening for single nucleotide variants and short deletions and insertions. Albeit, short-read sequencing still suffers from systematic errors that tend to occur at specific positions where…
Motivation: Third-generation sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long read error correction tools, which tackle this problem through sampling redundancy and/or leveraging accurate short reads of the same biological samples. Existing studies to asses these tools use simulated data sets, and are not sufficiently comprehensive in the range of software covered or diversity of evaluation measures used. Results: In this paper, we present a categorization and review of long read error correction methods,…
Achieving high intraspecific genetic diversity is a critical goal in ecological restoration as it increases the adaptive potential and long-term resilience of populations. Thus, we investigated genetic diversity within and between pristine sites in a fossil floodplain and compared it to sites restored by hay-transfer between 1997 and 2014. RAD-seq genotyping revealed that the stenoecious flood-plain species Arabis nemorensis is co-occurring with individuals that, based on ploidy, ITS-sequencing and morphology, probably belong to the close relative Arabis sagittata, which has a documented preference for dry calcareous grasslands but has not been reported in floodplain meadows. We show that hay-transfer maintains…
Suppressed recombination allows divergence between homologous sex chromosomes and the functionality of their genes. Here, we reveal patterns of the earliest stages of sex-chromosome evolution in the diploid dioecious herb Mercurialis annua on the basis of cytological analysis, de novo genome assembly and annotation, genetic mapping, exome resequencing of natural populations, and transcriptome analysis. The genome assembly contained 34,105 expressed genes, of which 10,076 were assigned to linkage groups. Genetic mapping and exome resequencing of individuals across the species range both identified the largest linkage group, LG1, as the sex chromosome. Although the sex chromosomes of M. annua are karyotypically…
Yellowhorn (Xanthoceras sorbifolium) is a species of the Sapindaceae family native to China and is an oil tree that can withstand cold and drought conditions. A pseudomolecule-level genome assembly for this species will not only contribute to understanding the evolution of its genes and chromosomes but also bring yellowhorn breeding into the genomic era.Here, we generated 15 pseudomolecules of yellowhorn chromosomes, on which 97.04% of scaffolds were anchored, using the combined Illumina HiSeq, Pacific Biosciences Sequel, and Hi-C technologies. The length of the final yellowhorn genome assembly was 504.2 Mb with a contig N50 size of 1.04 Mb and a…
Phenotypic change is a hallmark of bacterial adaptation during chronic infection. In the case of chronic Pseudomonas aeruginosa lung infection in patients with cystic fibrosis, well-characterized phenotypic variants include mucoid and small colony variants (SCVs). It has previously been shown that SCVs can be reproducibly isolated from the murine lung following the establishment of chronic infection with mucoid P. aeruginosa strain NH57388A. Using a combination of single-molecule real-time (PacBio) and Illumina sequencing we identify a large genomic inversion in the SCV through recombination between homologous regions of two rRNA operons and an associated truncation of one of the 16S rRNA…
A high level of transposon-mediated genome rearrangement is a common trait among microorganisms isolated from thermal environments, probably contributing to the extraordinary genomic plasticity and horizontal gene transfer (HGT) observed in these habitats. In this work, active and inactive insertion sequences (ISs) spanning the sequenced members of the genus Thermus were characterized, with special emphasis on three T. thermophilus strains: HB27, HB8, and NAR1. A large number of full ISs and fragments derived from different IS families were found, concentrating within megaplasmids present in most isolates. Potentially active ISs were identified through analysis of transposase integrity, and domestication-related transposition events…
Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, there are limited open-source tools available. Errors, particularly inversions and fusions across chromosomes, remain higher than alternate scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than the state-of-the-art methods across a variety of Hi-C library preparations…
Downy mildews are obligate biotrophic oomycete pathogens that cause devastating plant diseases on economically important crops. Plasmopara viticola is the causal agent of grapevine downy mildew, a major disease in vineyards worldwide. We sequenced the genome of Pl. viticola with PacBio long reads and obtained a new 92.94?Mb assembly with high contiguity (359 scaffolds for a N50 of 706.5?kb) due to a better resolution of repeat regions. This assembly presented a high level of gene completeness, recovering 1,592 genes encoding secreted proteins involved in plant-pathogen interactions. Plasmopara viticola had a two-speed genome architecture, with secreted protein-encoding genes preferentially located in…
Contaminant sequences that appear in published genomes can cause numerous problems for downstream analyses, particularly for evolutionary studies and metagenomics projects. Our large-scale scan of complete and draft bacterial and archaeal genomes in the NCBI RefSeq database reveals that 2250 genomes are contaminated by human sequence. The contaminant sequences derive primarily from high-copy human repeat regions, which themselves are not adequately represented in the current human reference genome, GRCh38. The absence of the sequences from the human assembly offers a likely explanation for their presence in bacterial assemblies. In some cases, the contaminating contigs have been erroneously annotated as containing…
Scutellaria baicalensis Georgi is important in Chinese traditional medicine where preparations of dried roots, “Huang Qin,” are used for liver and lung complaints and as complementary cancer treatments. We report a high-quality reference genome sequence for S. baicalensis where 93% of the 408.14-Mb genome has been assembled into nine pseudochromosomes with a super-N50 of 33.2 Mb. Comparison of this sequence with those of closely related species in the order Lamiales, Sesamum indicum and Salvia splendens, revealed that a specialized metabolic pathway for the synthesis of 4′-deoxyflavone bioactives evolved in the genus Scutellaria. We found that the gene encoding a specific…
Cell-free DNA (cfDNA) fragments in maternal plasma contain DNA damage and may negatively impact the sensitivity of noninvasive prenatal testing (NIPT). However, some of these DNA damages are potentially reparable. We aimed to recover these damaged cfDNA molecules using PreCR DNA repair mix.cfDNA was extracted from 20 maternal plasma samples and was repaired and sequenced by the Illumina platform. Size profiles and fetal DNA fraction changes of repaired samples were characterized. Targeted sequencing of chromosome Y sequences was used to enrich fetal cfDNA molecules following repair. Single-molecule real-time (SMRT) sequencing platform was employed to characterize long (>250 bp) cfDNA molecules. NIPT…