Tandem repeat (TR) expansions have been implicated in dozens of genetic diseases, including Huntington’s Disease, Fragile X Syndrome, and hereditary ataxias. Furthermore, TRs have recently been implicated in a range of complex traits, including gene expression and cancer risk. While the human genome harbors hundreds of thousands of TRs, analysis of TR expansions has been mainly limited to known pathogenic loci. A major challenge is that expanded repeats are beyond the read length of most next-generation sequencing (NGS) datasets and are not profiled by existing genome-wide tools. We present GangSTR, a novel algorithm for genome-wide genotyping of both short and…
The pig is a well-studied model animal of biomedical and agricultural importance. Genes of this species, Sus scrofa, are known from experiments and predictions and are collected in the NCBI Reference Sequence database. Gene reconstruction from RNA-seq evidence of transcribed genes can now accurately and completely reproduce the biological gene sets of animals and plants. Such a gene set for the pig is reported here, including human orthologs missing from current NCBI and Ensembl reference pig gene sets, additional alternate transcripts, and other improvements. Methodology for accurate and complete gene set reconstruction from RNA is used: the automated SRA2Genes pipeline…
Lactobacillus crispatus is a commonly found bacterium in vertebrate microbiota, particularly the human vagina. We report the first complete genome of a strain isolated from a human vagina, L. crispatus CO3MRSI1.
Contaminant sequences that appear in published genomes can cause numerous problems for downstream analyses, particularly for evolutionary studies and metagenomics projects. Our large-scale scan of complete and draft bacterial and archaeal genomes in the NCBI RefSeq database reveals that 2250 genomes are contaminated by human sequence. The contaminant sequences derive primarily from high-copy human repeat regions, which themselves are not adequately represented in the current human reference genome, GRCh38. The absence of the sequences from the human assembly offers a likely explanation for their presence in bacterial assemblies. In some cases, the contaminating contigs have been erroneously annotated as containing…
Recombination between loci underlying mate choice and ecological traits is a major evolutionary force acting against speciation with gene flow. The evolution of linkage disequilibrium between such loci is therefore a fundamental step in the origin of species. Here, we show that this process can take place in the absence of physical linkage in hamlets, a group of closely related reef fishes from the wider Caribbean that differ essentially in colour pattern and are reproductively isolated through strong visually-based assortative mating. Using full-genome analysis, we identify four narrow genomic intervals that are consistently differentiated among sympatric species in a backdrop of…
The discovery of mutations associated with human genetic disease is an exercise in comparative genomics (see Glossary). Although there are many different strategies and approaches, the central premise is that affected persons harbor a significant excess of pathogenic DNA variants as compared with a group of unaffected persons (controls) that is either clinically defined [1] or established by surveying large swaths of the general population [2]. The more exclusive the variant is to the disease, the greater its penetrance, the larger its effect size, and the more relevant it becomes to both disease diagnosis and future therapeutic investigation. The…
The study of the draft genome of an Antarctic marine ciliate, Euplotes petzi, revealed foreign sequences of bacterial origin belonging to the γ-proteobacterium Francisella that includes pathogenic and environmental species. TEM and FISH analyses confirmed the presence of a Francisella endocytobiont in E. petzi. This endocytobiont was isolated and found to be a new species, named F. adeliensis sp. nov. F. adeliensis grows well at wide ranges of temperature, salinity, and carbon dioxide concentrations, implying that it may colonize new organisms living in deeply diversified habitats. The F. adeliensis genome includes the igl and pdp gene sets (pdpC and pdpE…
Unlike populations with the typical anadromous lifestyle, the Chinese native Dolly Varden char (Salvelinus malma) is landlocked and spends its entire life in fresh water. To explore the effect of freshwater adaptation on its immune system, we constructed a pooled cDNA library of hepatopancreas and spleen of Chinese freshwater Dolly Varden char (S. malma). A total of 27,829 unigenes were generated from 31,233 high-quality transcripts, and 17,670 complete open reading frames (ORFs) were identified. A total of 25,809 unigenes were successfully annotated; the annotation classified more native than adaptive immunity-associated genes, and more genes involved in the toll-like receptor signal pathway than those in complement and…
The iconic orange clownfish, Amphiprion percula, is a model organism for studying the ecology and evolution of reef fishes, including patterns of population connectivity, sex change, social organization, habitat selection and adaptation to climate change. Notably, the orange clownfish is the only reef fish for which a complete larval dispersal kernel has been established and was the first fish species for which it was demonstrated that antipredator responses of reef fishes could be impaired by ocean acidification. Despite its importance, molecular resources for this species remain scarce and until now it lacked a reference genome assembly. Here, we present a…
The antibody repertoire of Bos taurus is characterized by a subset of variable heavy (VH) chain regions with ultralong third complementarity determining regions (CDR3) which, compared to other species, can provide a potent response to challenging antigens like HIV env. These unusual CDR3 can range to over seventy highly diverse amino acids in length and form unique β-ribbon ‘stalk’ and disulfide bonded ‘knob’ structures, far from the typical antigen binding site. The genetic components and processes for forming these unusual cattle antibody VH CDR3 are not well understood. Here we analyze sequences of Bos taurus antibody VH domains and find…
Whole-genome duplications are an important source of evolutionary novelties that change the mode and tempo at which genetic elements evolve within a genome. The Cucurbita genus experienced a whole-genome duplication around 30 million years ago, although the evolutionary dynamics of the coding and noncoding genes in this genus have not yet been scrutinized. Here, we analyzed the genomes of four Cucurbita species, including a newly assembled genome of Cucurbita argyrosperma, and compared the gene contents of these species with those of five other members of the Cucurbitaceae family to assess the evolutionary dynamics of protein-coding and long intergenic noncoding RNA (lincRNA) genes…
A production herd of Czech Simmental cattle (Czech Red Pied, CRP), the conserved subpopulation of this breed, and the ancient local breed Czech Red cattle (CR) were screened for diversity in the antibacterial toll-like receptors (TLRs), which are members of the innate immune system. Polymerase chain reaction (PCR) amplicons of TLR1, TLR2, TLR4, TLR5, and TLR6 from pooled DNA samples were sequenced with PacBio technology, with 3–5× coverage per gene per animal. To increase the reliability of variant detection, the gDNA pools were sequenced in parallel with the Illumina X-ten platform at low coverage (60× per gene). The diversity in conserved…
Icefishes (suborder Notothenioidei; family Channichthyidae) are the only vertebrates that lack functional haemoglobin genes and red blood cells. Here, we report a high-quality genome assembly and linkage map for the Antarctic blackfin icefish Chaenocephalus aceratus, highlighting evolved genomic features for its unique physiology. Phylogenomic analysis revealed that Antarctic fish of the teleost suborder Notothenioidei, including icefishes, diverged from the stickleback lineage about 77 million years ago and subsequently evolved cold-adapted phenotypes as the Southern Ocean cooled to sub-zero temperatures. Our results show that genes involved in protection from ice damage, including genes encoding antifreeze glycoprotein and zona pellucida proteins, are…
Rapidly improving sequencing technology, coupled with computational developments in sequence assembly, is making reference-quality genome assembly economical. Hundreds of vertebrate genome assemblies are now publicly available, and projects are being proposed to sequence thousands of additional species in the next few years. Such dense sampling of the tree of life should give an unprecedented new understanding of evolution and allow a detailed determination of the events that led to the wealth of biodiversity around us. To gain this knowledge, these new genomes must be compared through genome alignment (at the sequence level) and comparative annotation (at the gene level). However,…
To better understand the immune system of shrimp, this study combined PacBio isoform sequencing (Iso-Seq) and Illumina paired-end short reads sequencing methods to discover full-length immune-related molecules of the Pacific white shrimp, Litopenaeus vannamei. A total of 72,648 nonredundant full-length transcripts (unigenes) were generated with an average length of 2545 bp from five main tissues, including the hepatopancreas, cardiac stomach, heart, muscle, and pyloric stomach. These unigenes exhibited a high annotation rate (62,164, 85.57%) when compared against NR, NT, Swiss-Prot, Pfam, GO, KEGG and COG databases. A total of 7544 putative long noncoding RNAs (lncRNAs) were detected and 1164 nonredundant…