During the past decade, the search for pathogenic mutations in rare human genetic diseases has involved huge efforts to sequence coding regions, or the entire genome, using massively parallel short-read sequencers. However, the approximate current diagnostic rate is
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and…
Domestication of clonally propagated crops such as pineapple from South America was hypothesized to be a ‘one-step operation’. We sequenced the genome of Ananas comosus var. bracteatus CB5 and assembled 513 Mb into 25 chromosomes with 29,412 genes. Comparison of the genomes of CB5, F153 and MD2 elucidated the genomic basis of fiber production, color formation, sugar accumulation and fruit maturation. We also resequenced 89 Ananas genomes. Cultivars ‘Smooth Cayenne’ and ‘Queen’ exhibited ancient and recent admixture, while ‘Singapore Spanish’ supported a one-step operation of domestication. We identified 25 selective sweeps, including a strong sweep containing a pair of tandemly duplicated…
A Gram-stain-negative, rod-shaped and red-pigmented strain, HME7025T, was isolated from freshwater sampled in the Republic of Korea. Phylogenetic analysis based on its 16S rRNA gene sequence revealed that strain HME7025T formed a lineage within the family Cytophagaceae of the phylum Bacteroidetes. Strain HME7025T was closely related to the genera Pseudarcicella, Arcicella and Flectobacillus. The 16S rRNA gene sequence similarity values of strain HME7025T were under 94.5 % to its closest phylogenetic neighbours. The major fatty acids of strain HME7025T were iso-C15:0 (41.9 %), summed feature 3 (comprising C16:1 ω7c and/or C16:1 ω6c; 12.2 %) and anteiso-C15:0 (10.8 %). The major respiratory quinone was menaquinone-7. The major…
Pharmacogenetic testing increasingly is available from clinical and research laboratories. However, only a limited number of quality control and other reference materials currently are available for the complex rearrangements and rare variants that occur in the CYP2D6 gene. To address this need, the Division of Laboratory Systems, CDC-based Genetic Testing Reference Material Coordination Program, in collaboration with members of the pharmacogenetic testing and research communities and the Coriell Cell Repositories (Camden, NJ), has characterized 179 DNA samples derived from Coriell cell lines. Testing included the recharacterization of 137 genomic DNAs that were genotyped in previous Genetic Testing Reference Material Coordination…
To track stepwise changes in genetic diversity and antimicrobial resistance in rapidly evolving OXA-232-producing Klebsiella pneumoniae ST14, an emerging carbapenem-resistant high-risk clone, in clinical settings. Twenty-six K. pneumoniae ST14 isolates were collected by the Korean Nationwide Surveillance of Antimicrobial Resistance system over the course of 1 year. Isolates were subjected to whole-genome sequencing and MIC determinations using 33 antibiotics from 14 classes. Single-nucleotide polymorphism (SNP) typing identified 72 unique SNP sites spanning the chromosomes of the isolates, dividing them into three clusters (I, II and III). The initial isolate possessed two plasmids with 18 antibiotic-resistance genes, including blaOXA-232, and exhibited resistance to 11 antibiotic…
We report reference-quality genome assemblies and annotations for two accessions of soybean (Glycine max) and one of Glycine soja, the closest wild relative of G. max. The G. max assemblies are for widely used U.S. cultivars: the northern line ‘Williams 82’ (Wm82); and the southern line ‘Lee’. The Wm82 assembly improves on the previously published assembly, and the Lee and G. soja assemblies are new for these accessions. Comparisons among the three accessions show generally high structural conservation, but nucleotide differences of 1.7 SNPs/kb between Wm82 and Lee, and 4.7 SNPs/kb between these lines and G. soja. SNP distributions and comparisons…
Chlorella vulgaris is a fast-growing freshwater microalga cultivated at the industrial scale for applications ranging from food to biofuel production. To advance our understanding of its biology and to establish genetic tools for biotechnological manipulation, we sequenced the nuclear and organelle genomes of Chlorella vulgaris 211/11P by combining next-generation sequencing and optical mapping of isolated DNA molecules. This hybrid approach allowed the nuclear genome to be assembled into 14 pseudo-molecules with an N50 of 2.8 Mb and 98.9% of the genome scaffolded. The integration of RNA-seq data obtained at two different growth irradiances (high light, HL, versus low light, LL) enabled…
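The N50 statistic quoted in the abstract above is a standard contiguity measure: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The following is a minimal sketch of that definition only. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the Chlorella vulgaris assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Hypothetical helper for illustration: walks the lengths from longest
    to shortest and returns the length at which the running total first
    reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy scaffold lengths (made up for illustration, arbitrary units).
toy_lengths = [5, 4, 3, 2, 1]
print(n50(toy_lengths))
```

Because N50 depends only on the length distribution, it says nothing about whether the sequences are correctly assembled, which is why abstracts typically report it alongside other measures such as the fraction of the genome scaffolded.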
Background: Assemblies of diploid genomes are generally unphased, pseudo-haploid representations that do not correctly reconstruct the two parental haplotypes present in the individual sequenced. Instead, the assembly alternates between parental haplotypes and may contain duplications in regions where the parental haplotypes are sufficiently different. Trio binning is an approach to genome assembly that uses short reads from both parents to classify long reads from the offspring according to maternal or paternal haplotype origin, and is thus helped rather than impeded by heterozygosity. Using this approach, it is possible to derive two assemblies from an individual, accurately representing both parental contributions…
Horizontal transfer of plasmids encoding antimicrobial-resistance and virulence determinants has been instrumental in Staphylococcus aureus evolution, including the emergence of community-associated methicillin-resistant S. aureus (CA-MRSA). In the early 1990s the first CA-MRSA isolated in Western Australia (WA), WA-5, encoded cadmium, tetracycline and penicillin-resistance genes on plasmid pWBG753 (~30 kb). WA-5 and pWBG753 appeared only briefly in WA; however, fusidic-acid-resistance plasmids related to pWBG753 were also present in the first European CA-MRSA at the time. Here we characterized a 72-kb conjugative plasmid pWBG731 present in multiresistant WA-5-like clones from the same period. pWBG731 was a cointegrant formed from pWBG753 and a…
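The read-classification idea behind trio binning, as described in the abstract above, can be illustrated with a toy k-mer sketch: k-mers seen only in one parent's short reads act as markers, and each offspring long read is assigned to the parent whose markers it shares more of. This is a simplified illustration under stated assumptions and not the published implementation. It ignores reverse complements, sequencing errors and count normalisation, and all function names and sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from parental reads."""
    mat, pat = set(), set()
    for read in maternal_reads:
        mat |= kmers(read, k)
    for read in paternal_reads:
        pat |= kmers(read, k)
    # Keep only k-mers private to one parent; shared k-mers carry no signal.
    return mat - pat, pat - mat


def classify_read(read, mat_only, pat_only, k):
    """Assign a read to the parent whose private k-mers it shares more of."""
    read_kmers = kmers(read, k)
    m = len(read_kmers & mat_only)
    p = len(read_kmers & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"  # no private k-mers, or a tie


# Toy parental reads differing at a single heterozygous site (made up).
K = 4
mat_only, pat_only = parent_specific_kmers(["ACGTACGTTAGC"], ["ACGTACGATAGC"], K)

for read in ["TACGTTAGC", "TACGATAGC", "ACGTACG"]:
    print(read, classify_read(read, mat_only, pat_only, K))
```

In this toy setting the third read spans only sequence shared by both parents and so stays unassigned, which mirrors why the approach benefits from heterozygosity: more differences between the parental haplotypes yield more private k-mers to classify reads with.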
New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. Translating these methods to routine research and clinical practice requires robust benchmark sets. We developed the first benchmark set for identification of both false negative and false positive germline SVs, which complements recent efforts emphasizing increasingly comprehensive characterization of SVs. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods, both alignment- and de novo assembly-based,…
The zebra mussel, Dreissena polymorpha, continues to spread from its native range in Eurasia to Europe and North America, causing billions of dollars in damage and dramatically altering invaded aquatic ecosystems. Despite these impacts, there are few genomic resources for Dreissena or related bivalves, with nearly 450 million years of divergence between the zebra mussel and its closest sequenced relative. Although the D. polymorpha genome is highly repetitive, we have used a combination of long-read sequencing and Hi-C-based scaffolding to generate the highest-quality molluscan assembly to date. Through comparative analysis and transcriptomics experiments we have gained insights into processes that…
Neisseria gonorrhoeae, the sole causative agent of gonorrhea, constitutively undergoes diversification of the Type IV pilus. Gene conversion occurs between one of several donor silent copies located in distinct loci and the recipient pilE gene, which encodes the major pilin subunit of the pilus. A guanine quadruplex (G4) DNA structure and a cis-acting sRNA (G4-sRNA) are located upstream of the pilE gene and both are required for pilin antigenic variation (Av). We show that reduced sRNA transcription lowers pilin Av frequencies. Extended transcriptional elongation is not required for Av, since limiting the transcript to 32 nt allows for normal…
Soybean cyst nematode (SCN, Heterodera glycines) is a major pest of soybean that is spreading across major soybean production regions worldwide. Increased SCN virulence has recently been observed in both the United States and China. However, no study has reported a genome assembly for H. glycines at the chromosome scale. Herein, the first chromosome-level reference genome of X12, an unusual SCN race with high infection ability, is presented. Using whole-genome shotgun (WGS) sequencing, PacBio sequencing, Illumina paired-end sequencing, 10X Genomics linked reads and high-throughput chromatin conformation capture (Hi-C) genome scaffolding techniques, a 141.01-Mb assembled genome was obtained with scaffold and…
The NA1 clonal lineage of Phytophthora ramorum is responsible for Sudden Oak Death, an epidemic that has devastated California’s coastal forest ecosystems. An NA1 isolate, Pr102, derived from coast live oak in California, was previously sequenced and reported as a 65 Mb assembly containing 12 Mb of gaps in 2576 scaffolds. Here we report an improved 70 Mb genome in 1512 scaffolds with 6752 bp of gaps after incorporating PacBio P5-C3 long reads. This assembly contains 19494 gene models (average gene length 2515 bp) compared to 16134 genes (average gene length of 1673 bp) in the previous version. We predicted 29 new RXLRs and…