The human immunoglobulin heavy chain locus (IGH) remains among the most understudied regions of the human genome. Recent efforts have shown that haplotype diversity within IGH is elevated and exhibits population-specific patterns; for example, our re-sequencing of the locus from only a single chromosome uncovered >100 Kb of novel sequence, including descriptions of six novel alleles and four previously unmapped genes. Historically, this complex locus architecture has hindered the characterization of IGH germline single nucleotide, copy number, and structural variants (SNVs; CNVs; SVs), and as a result, there remains little known about the role of IGH polymorphisms in inter-individual…
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and…
Domestication of clonally propagated crops such as pineapple from South America was hypothesized to be a ‘one-step operation’. We sequenced the genome of Ananas comosus var. bracteatus CB5 and assembled 513 Mb into 25 chromosomes with 29,412 genes. Comparison of the genomes of CB5, F153 and MD2 elucidated the genomic basis of fiber production, color formation, sugar accumulation and fruit maturation. We also resequenced 89 Ananas genomes. Cultivars ‘Smooth Cayenne’ and ‘Queen’ exhibited ancient and recent admixture, while ‘Singapore Spanish’ supported a one-step operation of domestication. We identified 25 selective sweeps, including a strong sweep containing a pair of tandemly duplicated…
Pharmacogenetic testing increasingly is available from clinical and research laboratories. However, only a limited number of quality control and other reference materials currently are available for the complex rearrangements and rare variants that occur in the CYP2D6 gene. To address this need, the Division of Laboratory Systems, CDC-based Genetic Testing Reference Material Coordination Program, in collaboration with members of the pharmacogenetic testing and research communities and the Coriell Cell Repositories (Camden, NJ), has characterized 179 DNA samples derived from Coriell cell lines. Testing included the recharacterization of 137 genomic DNAs that were genotyped in previous Genetic Testing Reference Material Coordination…
α-Gliadins are a major group of gluten proteins in wheat flour that contribute to the end-use properties for food processing and contain major immunogenic epitopes that can cause serious health-related issues including celiac disease (CD). α-Gliadins are also the youngest group of gluten proteins and are encoded by a large gene family. The majority of the gene family members evolved independently in the A, B, and D genomes of different wheat species after their separation from a common ancestral species. To gain insights into the origin and evolution of these complex genes, the genomic regions of the Gli-2 loci encoding…
In the wake of constant improvements in sequencing technologies, numerous insect genomes have been sequenced. Currently, 1219 insect genome-sequencing projects have been registered with the National Center for Biotechnology Information, including 401 that have genome assemblies and 155 with an official gene set of annotated protein-coding genes. Comparative genomics analysis showed that the expansion or contraction of gene families was associated with well-studied physiological traits such as immune system, metabolic detoxification, parasitism and polyphagy in insects. Here, we summarize the progress of insect genome sequencing, with an emphasis on how this impacts research on pest control. We begin with a…
Teak, a member of the Lamiaceae family, produces one of the most expensive hardwoods in the world. High demand coupled with deforestation has caused a decrease in natural teak forests, and future supplies will be reliant on teak plantations. Hence, selection of teak tree varieties for clonal propagation with superior growth performance is of great importance, and access to high-quality genetic and genomic resources can accelerate the selection process by identifying genes underlying desired traits. To facilitate teak research and variety improvement, we generated a highly contiguous, chromosomal-scale genome assembly using high-coverage Pacific Biosciences long reads coupled with high-throughput chromatin conformation…
Foodborne infections caused by lung flukes of the genus Paragonimus are a significant and widespread public health problem in tropical areas. Approximately 50 Paragonimus species have been reported to infect animals and humans, but Paragonimus westermani is responsible for the bulk of human disease. Despite their medical and economic importance, no genome sequence for any Paragonimus species is available. We sequenced and assembled the genome of P. westermani, which is among the largest of the known pathogen genomes with an estimated size of 1.1 Gb. A 922.8 Mb genome assembly was generated from Illumina and Pacific Biosciences (PacBio) sequence data, covering 84% of…
As they migrated out of Africa and into Europe and Asia, anatomically modern humans interbred with archaic hominins, such as Neanderthals and Denisovans. The result of this genetic introgression on the recipient populations has been of considerable interest, especially in cases of selection for specific archaic genetic variants. Hsieh et al. characterized adaptive structural variants and copy number variants that are likely targets of positive selection in Melanesians. Focusing on population-specific regions of the genome that carry duplicated genes and show an excess of amino acid replacements provides evidence for one of the mechanisms by which genetic novelty can arise…
In addition to the BAC-based reference sequence of the accession Columbia-0 from the year 2000, several short read assemblies of THE plant model organism Arabidopsis thaliana were published during the last years. Also, a SMRT-based assembly of Landsberg erecta has been generated that identified translocation and inversion polymorphisms between two genotypes of the species. Here we provide a chromosome-arm level assembly of the A. thaliana accession Niederzenz-1 (AthNd-1_v2c) based on SMRT sequencing data. The best assembly comprises 69 nucleome sequences and displays a contig length of up to 16 Mbp. Compared to an earlier Illumina short read-based NGS assembly (AthNd-1_v1),…
Scutellaria baicalensis Georgi is important in Chinese traditional medicine where preparations of dried roots, “Huang Qin,” are used for liver and lung complaints and as complementary cancer treatments. We report a high-quality reference genome sequence for S. baicalensis where 93% of the 408.14-Mb genome has been assembled into nine pseudochromosomes with a super-N50 of 33.2 Mb. Comparison of this sequence with those of closely related species in the order Lamiales, Sesamum indicum and Salvia splendens, revealed that a specialized metabolic pathway for the synthesis of 4′-deoxyflavone bioactives evolved in the genus Scutellaria. We found that the gene encoding a specific…
Construction of chromosome-level assembly is a vital step in achieving the goal of a ‘Platinum’ genome, but it remains a major challenge to assemble and anchor sequences to chromosomes in autopolyploid or highly heterozygous genomes. High-throughput chromosome conformation capture (Hi-C) technology serves as a robust tool to dramatically advance chromosome scaffolding; however, existing approaches are mostly designed for diploid genomes and often with the aim of reconstructing a haploid representation, thereby having limited power to reconstruct chromosomes for autopolyploid genomes. We developed a novel algorithm (ALLHiC) that is capable of building allele-aware, chromosomal-scale assembly for autopolyploid genomes using Hi-C paired-end…
We present reference-quality genome assembly and annotation for the stout camphor tree (Cinnamomum kanehirae (Laurales, Lauraceae)), the first sequenced member of the Magnoliidae comprising four orders (Laurales, Magnoliales, Canellales and Piperales) and over 9,000 species. Phylogenomic analysis of 13 representative seed plant genomes indicates that magnoliid and eudicot lineages share more recent common ancestry than monocots. Two whole-genome duplication events were inferred within the magnoliid lineage: one before divergence of Laurales and Magnoliales and the other within the Lauraceae. Small-scale segmental duplications and tandem duplications also contributed to innovation in the evolutionary history of Cinnamomum. For example, expansion of the…
The white poplar (Populus alba) is widely distributed in Central Asia and Europe. There are natural populations of white poplar in the Irtysh River basin in China. It can also be cultivated and grows well in northern China. In this study, we sequenced the genome of P. alba by single-molecule real-time technology. De novo assembly of P. alba had a genome size of 415.99 Mb with a contig N50 of 1.18 Mb. A total of 32,963 protein-coding genes were identified. 45.16% of the genome was annotated as repetitive elements. Genome evolution analysis revealed that divergence between P. alba and Populus trichocarpa…
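Several of the assemblies above are summarized by an N50 value (for example, the contig N50 of 1.18 Mb reported for P. alba). As a reminder of what that statistic measures, here is a minimal sketch of the standard N50 definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions, not data or code from any of the studies listed here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are sorted from longest to shortest and accumulated until the
    running sum reaches at least half of the total assembly length; the
    length of the contig that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, a higher value indicates a more contiguous assembly but says nothing about its correctness or completeness.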
Chalcidoidea (chalcidoid wasps) are an abundant and megadiverse insect group with both ecological and economical importance. Here we report a complete mitochondrial genome in Chalcidoidea from Pteromalus puparum (Pteromalidae). Eight tandem repeats followed by 6 reversed repeats were detected in its 3308 bp control region. This long and complex control region may explain failures of amplifying and sequencing of complete mitochondrial genomes in some chalcidoids. In addition to 37 typical mitochondrial genes, an extra identical isoleucine tRNA (trnI) was detected at the opposite end of the control region. This recent mitochondrial gene duplication indicates that gene arrangements in chalcidoids are ongoing.…