July 7, 2019  |  

An improved genome assembly uncovers prolific tandem repeats in Atlantic cod.

The first Atlantic cod (Gadus morhua) genome assembly, published in 2011, was one of the early genome assemblies based exclusively on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies for complex genomes, although many of these are fragmented, with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enables the generation of more contiguous genome assemblies. By combining data from Illumina, 454 and the longer PacBio sequencing technologies, and by integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% of those in promoter regions and 12% of those in coding sequences are heterozygous in the sequenced individual. The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes indicates considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.
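The abstract attributes most gaps in the first assembly to tandem repeats. As a rough illustration of what a tandem repeat is computationally, the sketch below scans a sequence for perfect tandem repeats. It is a minimal, hypothetical example and not the method used in the study: the function names and the `max_motif` and `min_copies` thresholds are assumptions, and real TR finders also tolerate mismatches and indels between copies.

```python
def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit ('AC' yes, 'ACAC' no)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_perfect_tandem_repeats(seq: str, max_motif: int = 6, min_copies: int = 3):
    """Return (start, motif, copies) for runs of >= min_copies exact, adjacent copies."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for k in range(1, max_motif + 1):
        i = 0
        while i + k <= n:
            motif = seq[i:i + k]
            if not is_primitive(motif):
                i += 1  # e.g. 'ACAC' is skipped; the same array is reported as 'AC'
                continue
            copies = 1
            # Extend while the next k-mer is an exact copy of the motif.
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_copies:
                hits.append((i, motif, copies))
                i += copies * k  # jump past the whole repeat array
            else:
                i += 1
    return hits


print(find_perfect_tandem_repeats("GGACACACACTT"))  # [(2, 'AC', 4)]
```

Skipping non-primitive motifs keeps one array from being reported several times under multiples of its true unit length.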



Combination of short-read, long-read and optical mapping assemblies reveals large-scale tandem repeat arrays with population genetic implications.

Accurate and contiguous genome assembly is key to a comprehensive understanding of the processes shaping genomic diversity and evolution. Yet, it is frequently constrained by constitutive heterochromatin, usually characterized by highly repetitive DNA. As a key feature of genome architecture associated with centromeric and telomeric regions, heterochromatin influences meiotic recombination. In this study, we assess the impact of large tandem repeat arrays on the recombination rate landscape in an avian speciation model, the Eurasian crow. We assembled two high-quality genome references using single-molecule real-time sequencing (long-read assembly, LR) and single-molecule restriction maps (optical map assembly, OM). A three-way comparison including the published short-read assembly (SR) constructed for the same individual allowed us to assess assembly properties and pinpoint mis-assemblies. Combining information from all three assemblies, we characterized 36 previously unidentified large repetitive regions in the proximity of sequence assembly breakpoints, the majority of which contained complex arrays of a 14-kb satellite repeat or its 1.2-kb subunit. Using genome-wide population re-sequencing data, we estimated the population-scaled recombination rate (ρ) and found it to be significantly reduced in these regions. These findings are consistent with an effect of low recombination in regions adjacent to centromeric or subtelomeric heterochromatin, and add to our understanding of the processes generating widespread heterogeneity in genetic diversity and differentiation along the genome. By combining three independent technologies, our results highlight the importance of adding a layer of information on genome structure that is inaccessible to each approach independently. Published by Cold Spring Harbor Laboratory Press.
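The abstract reports ρ without defining it. Under the standard population-genetic convention for a diploid population, ρ is the recombination rate scaled by the effective population size, as below; this is the textbook definition, and the specific estimator used in the study is not stated here.

```latex
\rho = 4 N_e r
```

Here N_e is the effective population size and r is the per-generation recombination rate between adjacent sites (per base pair). Because ρ is a compound parameter, a local drop in ρ reflects lower r, lower local N_e, or both.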



Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

The human reference genome assembly plays a central role in nearly all aspects of today’s basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health. © 2017 Schneider et al.; Published by Cold Spring Harbor Laboratory Press.



Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm.

Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use these data alone, particularly with highly repetitive plant genomes. Errors in the raw data can lead to insertion or deletion errors (indels) in the consensus genome sequence, which in turn create significant problems for downstream analysis; for example, a single indel may shift the reading frame and incorrectly truncate a protein sequence. Here, we describe an algorithm that solves the high error rate problem by combining long, high-error reads with shorter but much more accurate Illumina sequencing reads, whose error rates average <1%. Our hybrid assembly algorithm combines these two types of reads to construct mega-reads, which are both long and accurate, and then assembles the mega-reads using the CABOG assembler, which was designed for long reads. We apply this technique to a large data set of Illumina and PacBio sequences from the species Aegilops tauschii, a large and extremely repetitive plant genome that has resisted previous attempts at assembly. We show that the resulting assembled contigs are far larger than in any previous assembly, with an N50 contig size of 486,807 nucleotides. We compare the contigs to independently produced optical maps to evaluate their large-scale accuracy, and to a set of high-quality bacterial artificial chromosome (BAC)-based assemblies to evaluate base-level accuracy. © 2017 Zimin et al.; Published by Cold Spring Harbor Laboratory Press.
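To make the frameshift point concrete, the sketch below translates a short, made-up coding sequence with the standard genetic code, then deletes a single base. The sequence and the function name are hypothetical and unrelated to the mega-reads algorithm itself; the example only shows how one indel can turn a downstream codon into a premature stop.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated as TCAG x TCAG x TCAG,
# with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate from the first base in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


reference = "ATGCTGACCGGAAAATAA"       # ATG CTG ACC GGA AAA TAA
mutant = reference[:3] + reference[4:]  # delete one base: ATG TGA ...

print(translate(reference))  # MLTGK
print(translate(mutant))     # M  (the shifted frame hits TGA immediately)
```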



The Nephila clavipes genome highlights the diversity of spider silk genes and their complex expression.

Spider silks are the toughest known biological materials, yet are lightweight and virtually invisible to the human immune system, and they thus have revolutionary potential for medicine and industry. Spider silks are largely composed of spidroins, a unique family of structural proteins. To investigate spidroin genes systematically, we constructed the first genome of an orb-weaving spider: the golden orb-weaver (Nephila clavipes), which builds large webs using an extensive repertoire of silks with diverse physical properties. We cataloged 28 Nephila spidroins, representing all known orb-weaver spidroin types, and identified 394 repeated coding motif variants and higher-order repetitive cassette structures unique to specific spidroins. Characterization of spidroin expression in distinct silk gland types indicates that glands can express multiple spidroin types. We find evidence of an alternatively spliced spidroin, a spidroin expressed only in venom glands, evolutionary mechanisms for spidroin diversification, and non-spidroin genes with expression patterns that suggest roles in silk production.



De novo genome and transcriptome assembly of the Canadian beaver (Castor canadensis).

The Canadian beaver (Castor canadensis) is the largest indigenous rodent in North America. We report a draft annotated assembly of the beaver genome, the first for a large rodent and the first mammalian genome assembled directly from uncorrected, moderate-coverage (< 30 ×) long reads generated by single-molecule sequencing. The genome size is estimated at 2.7 Gb by k-mer analysis. We assembled the beaver genome using the new Canu assembler, which is optimized for noisy reads. The resulting assembly was refined using Pilon supported by short reads (80 ×) and checked for accuracy by congruence with an independent short-read assembly. We scaffolded the assembly using the exon-gene models derived from 9805 full-length open reading frames (FL-ORFs) constructed from the beaver leukocyte and muscle transcriptomes. The final assembly comprised 22,515 contigs with a contig N50 of 278,680 bp and a scaffold N50 of 317,558 bp. Maximum contig and scaffold lengths were 3.3 and 4.2 Mb, respectively, with a combined scaffold length representing 92% of the estimated genome size. The completeness and accuracy of the scaffold assembly were demonstrated by the precise exon placement for 91.1% of the 9805 assembled FL-ORFs and for 83.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set used to assess the quality of genome assemblies. Genes involved in dentition and enamel deposition, defining characteristics of rodents with which the beaver is well endowed, were well represented. The study provides insights for genome assembly and an important genomics resource for Castoridae and rodent evolutionary biology. Copyright © 2017 Lok et al.
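Contiguity throughout these abstracts is summarized by N50. The sketch below computes it under the usual definition: the length L such that contigs of length ≥ L together cover at least half of the total. The optional `genome_size` argument gives the common NG50 variant, which measures against an estimated genome size instead of the assembly total. Function name and toy lengths are hypothetical.

```python
def n50(lengths, genome_size=None):
    """Return N50 (or NG50 when genome_size is given) for a list of contig lengths.

    Contigs are taken longest first until their cumulative length reaches half of
    the target; the length of the contig that crosses that threshold is returned.
    """
    if not lengths:
        raise ValueError("no contigs")
    target = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= target:
            return length
    return None  # only for NG50: the assembly covers < 50% of genome_size


toy = [2, 3, 4, 5, 6]                 # hypothetical contig lengths
print(n50(toy))                       # 5  (6 + 5 = 11 >= 20 / 2)
print(n50(toy, genome_size=40))       # 2  (all 20 bases needed to reach 40 / 2)
```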



Complete genome sequence of Mycoplasma bovis strain 08M.

Mycoplasma bovis is a major bacterial pathogen that can cause respiratory disease, mastitis, and arthritis in cattle. We report here the complete and annotated genome sequence of M. bovis strain 08M, isolated from a calf lung with pneumonia in China. Copyright © 2017 Chen et al.



Whole-genome restriction mapping by “subhaploid”-based RAD sequencing: An efficient and flexible approach for physical mapping and genome scaffolding.

Assembly of complex genomes using short reads remains a major challenge and usually yields highly fragmented assemblies. Generation of ultradense linkage maps is promising for anchoring such assemblies, but traditional linkage mapping methods are hindered by the infrequency and unevenness of meiotic recombination, which limit attainable map resolution. Here we develop a sequencing-based “in vitro” linkage mapping approach (called RadMap), in which chromosome breakage and segregation are realized by generating hundreds of “subhaploid” fosmid/bacterial-artificial-chromosome clone pools, and by restriction site-associated DNA sequencing of these clone pools to produce an ultradense whole-genome restriction map that facilitates genome scaffolding. A bootstrap-based minimum spanning tree algorithm is developed for grouping and ordering of genome-wide markers and is implemented in a user-friendly, integrated software package (AMMO). We perform extensive analyses to validate the power and accuracy of our approach in the model plant Arabidopsis thaliana and in human. We also demonstrate the utility of RadMap for enhancing the contiguity of a variety of whole-genome shotgun assemblies generated using either short Illumina reads (300 bp) or long PacBio reads (6-14 kb), with up to 15-fold improvement of N50 (~816 kb-3.7 Mb) and high scaffolding accuracy (98.1-98.5%). RadMap outperforms BioNano and Hi-C when the input assembly is highly fragmented (contig N50 = 54 kb). RadMap can capture wide-ranging contiguity information and provides an efficient and flexible tool for high-resolution physical mapping and scaffolding of highly fragmented assemblies. Copyright © 2017 Dou et al.
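The core idea behind ordering markers with a minimum spanning tree can be sketched as follows: build the MST over pairwise marker distances with Prim's algorithm, then walk it from a leaf. This is a simplified illustration under stated assumptions, not AMMO's implementation; in particular it omits the bootstrap step and the grouping of markers into linkage groups, and the distance matrix is a hypothetical one derived from known positions.

```python
import heapq


def prim_mst(dist):
    """Prim's algorithm on a dense symmetric distance matrix (>= 1 marker).

    Returns the tree as an adjacency list {vertex: [neighbours]}.
    """
    n = len(dist)
    in_tree = [False] * n
    adj = {v: [] for v in range(n)}
    heap = [(0.0, 0, -1)]  # (edge weight, vertex, parent); start from marker 0
    while heap:
        _, v, parent = heapq.heappop(heap)
        if in_tree[v]:
            continue  # stale entry: v was already reached by a cheaper edge
        in_tree[v] = True
        if parent >= 0:
            adj[v].append(parent)
            adj[parent].append(v)
        for u in range(n):
            if not in_tree[u]:
                heapq.heappush(heap, (dist[v][u], u, v))
    return adj


def order_markers(dist):
    """Order markers by a depth-first walk of the MST starting at a leaf.

    For consistent distances along one chromosome the MST is a path, so this
    recovers the linear order (up to reversal). A branched tree signals
    conflicting data; every marker is still visited, but the order is approximate.
    """
    adj = prim_mst(dist)
    start = next(v for v, nbrs in adj.items() if len(nbrs) <= 1)
    order, stack, seen = [], [start], set()
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        order.append(v)
        stack.extend(u for u in adj[v] if u not in seen)
    return order


# Hypothetical markers with known positions; distance = |position difference|.
positions = [0.0, 5.0, 1.0, 9.0, 3.0]
dist = [[abs(a - b) for b in positions] for a in positions]
print(order_markers(dist))  # [0, 2, 4, 1, 3]
```

A bootstrap version would repeat the MST construction on resampled data and keep only well-supported adjacencies; that step is left out here.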



Complete genome sequence of Bifidobacterium animalis subspecies lactis BL3, a preventive probiotic for acute colitis and colon cancer.

We report the genome sequence of Bifidobacterium animalis subspecies lactis BL3, which has preventive properties against acute colitis and colon cancer. The genome of BL3, which was isolated from Korean faeces, consists of a single 1,944,323-bp chromosome with a G+C content of 60.5%. Genome comparison against the closest Bifidobacterium animalis strain revealed four regions that differ markedly in BL3, encoding a flavin-nucleotide-binding protein, a transposase, a multidrug ABC transporter and an ATP-binding protein.
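G+C content, reported here as 60.5%, is the fraction of G and C among the bases of the genome. The sketch below computes it as a percentage; ignoring ambiguous symbols such as N in the denominator is an assumption of this example, since tools differ on that choice.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous A/C/G/T bases; other symbols (e.g. N) are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt


print(round(gc_content("GGCCAT"), 2))  # 66.67
```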



Molecular and genomic features of Mycobacterium bovis strain 1595 isolated from Korean cattle.

The aim of this study was to investigate the molecular characteristics of Mycobacterium (M.) bovis strain 1595, isolated from a native Korean cow, and to conduct a comparative genomic analysis of it. Molecular typing showed that M. bovis 1595 has spoligotype SB0140 and a mycobacterial interspersed repetitive unit-variable number of tandem repeats (MIRU-VNTR) profile of 4-2-5-3-2-7-5-5-4-3-4-3-4-3, representing the most common type of M. bovis in Korea. The complete genome sequence of strain 1595 was determined by single-molecule real-time sequencing, revealing a 4,351,712-bp genome with a G + C content of 65.64% and 4358 protein-coding genes. Comparative genomic analysis with the genomes of Mycobacterium tuberculosis complex strains revealed that all genomes are similar in size and G + C content. Phylogenetic analysis revealed that all strains were within a 0.1% average nucleotide identity value, and MUMmer analysis showed that all genomes had positive collinearity with strain 1595. A sequence comparison based on BLASTP analysis showed that M. bovis AF2122/97 was the strain with the greatest number of proteins completely matching those of M. bovis 1595. This genome sequence analysis will serve as a valuable reference for improving understanding of the virulence and epidemiologic traits among M. bovis isolates in Korea.



Rifamorpholines A-E, potential antibiotics from locust-associated actinobacteria Amycolatopsis sp. Hca4.

Cultivation of the locust-associated rare actinobacterium Amycolatopsis sp. HCa4 has yielded five unusual macrolactams, rifamorpholines A-E. Their structures were determined by interpretation of spectroscopic and crystallographic data. Rifamorpholines A-E possess an unprecedented 5/6/6/6 ring chromophore, representing a new subclass of rifamycin antibiotics. The biosynthetic pathway for compounds 1-5 involves a key 1,6-cyclization that forms the morpholine ring. Compounds 2 and 4 showed potent activities against methicillin-resistant Staphylococcus aureus (MRSA), with MICs of 4.0 and 8.0 µM, respectively.



The blaOXA-23-associated transposons in the genome of Acinetobacter spp. represent an epidemiological situation of the species encountering carbapenems.

High rates of carbapenem resistance in the human pathogen Acinetobacter baumannii threaten public health and need to be scrutinized. A total of 356 A. baumannii and 50 non-baumannii Acinetobacter spp. (NBA) strains collected in 2013 throughout South Korea were studied. The type of blaOXA-23 transposon was determined by PCR mapping, and molecular epidemiology was assessed by MLST. Twelve representative strains and two comparative A. baumannii strains were completely sequenced by single-molecule real-time sequencing. The carbapenem resistance rate was 88% in A. baumannii, mainly due to blaOXA-23, with five exceptional cases associated with ISAba1-blaOXA-51-like. The blaOXA-23 gene in A. baumannii was carried by either Tn2006 (44%) or Tn2009 (54%), with a few exceptions carried by Tn2008 (1.6%). Of the NBA strains, 14% were resistant to carbapenems, two with blaOXA-58 and five with blaOXA-23 associated with Tn2006. The Tn2006-possessing strains belonged to various STs, whereas Tn2008- and Tn2009-possessing strains were limited to ST208 and ST191, respectively. The three transposons were often present in multiple copies in the chromosome, and gene copy number and carbapenem MICs showed linear relationships, very strong for Tn2008 and moderate for Tn2006 and Tn2009. The dissemination of Tn2006 was facilitated by its capability for intercellular transfer, and that of Tn2009 was attributable to the successful dissemination of the ST191 bacterial host carrying the transposon. Tn2008 was infrequent because of its insufficient ability to undergo intercellular transfer and the scarcity of its bacterial host, A. baumannii ST208. Gene amplification is an adaptive mechanism for bacteria that encounter antimicrobial drugs. © The Author 2017. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please email: journals.permissions@oup.com.



Xanthomonas adaptation to common bean is associated with horizontal transfers of genes encoding TAL effectors.

Common bacterial blight is a devastating bacterial disease of common bean (Phaseolus vulgaris) caused by Xanthomonas citri pv. fuscans and Xanthomonas phaseoli pv. phaseoli. These phylogenetically distant strains cause similar symptoms on common bean, suggesting that they have acquired common genetic determinants of adaptation to this host. Transcription Activator-Like (TAL) effectors are bacterial type III effectors that can induce the expression of host genes to promote infection or resistance. Their capacity to bind specific host DNA sequences makes them potential candidates for host adaptation. To study the diversity of tal genes from Xanthomonas strains responsible for common bacterial blight of bean, whole-genome sequences of 17 strains representing the diversity of X. citri pv. fuscans and X. phaseoli pv. phaseoli were obtained by single-molecule real-time sequencing. Analysis of these genomes revealed four tal genes, named tal23A, tal20F, tal18G and tal18H. While tal20F and tal18G were chromosomal, tal23A and tal18H were carried on plasmids and shared between phylogenetically distant strains, suggesting recent horizontal transfers of these genes between X. citri pv. fuscans and X. phaseoli pv. phaseoli strains. Strikingly, tal23A was present in all strains studied, suggesting that it played an important role in adaptation to common bean. In silico predictions of TAL effector targets in the common bean genome suggested that TAL effectors shared by X. citri pv. fuscans and X. phaseoli pv. phaseoli strains target the promoters of genes with similar functions. This could be a trace of convergent evolution among TAL effectors from different phylogenetic groups, and supports the hypothesis that TAL effectors have been involved in adaptation to common bean. Altogether, our results favour a model in which plasmid-borne TAL effectors can contribute to host adaptation by being horizontally transferred between distant lineages.
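In silico TAL effector target prediction rests on the repeat-variable diresidue (RVD) code, in which each repeat of the effector prefers a particular DNA base, typically preceded by a T at position 0. The sketch below is a deliberately simplified scanner, not the prediction tool used in the study: it uses only a few well-known RVDs, requires exact compatibility, scans the forward strand only, and its effector and promoter are hypothetical. Real predictors score candidate sites with weighted, probabilistic models.

```python
# Simplified RVD code: bases each repeat-variable diresidue is commonly reported to prefer.
RVD_CODE = {"HD": "C", "NG": "T", "NI": "A", "NN": "GA", "NS": "ACGT"}


def find_tal_sites(rvds, promoter):
    """Return 0-based positions of a 5' T followed by bases compatible with every RVD.

    Forward strand only, and no mismatches are tolerated.
    """
    promoter = promoter.upper()
    allowed = [RVD_CODE[r] for r in rvds]
    k = len(allowed)
    hits = []
    for i in range(len(promoter) - k):
        if promoter[i] != "T":  # T at position 0, just before the first repeat's base
            continue
        window = promoter[i + 1:i + 1 + k]
        if all(base in ok for base, ok in zip(window, allowed)):
            hits.append(i)
    return hits


# Hypothetical effector and promoter fragment, for illustration only.
rvds = ["HD", "NG", "NI", "NN", "HD"]
print(find_tal_sites(rvds, "GGTCTAGCAATCTAACTT"))  # [2, 10]
```

Both hits match because NN accepts either G (site at 2) or A (site at 10) at the fourth position.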



Identification of novel conjugative plasmids with multiple copies of fosB that confer high-level fosfomycin resistance to vancomycin-resistant Enterococci.

To further characterize the fosB-carrying plasmids of 19 vancomycin-resistant enterococci, the complete sequences of the fosB- and vanA-containing plasmids of Enterococcus faecium (pEMA120) and E. avium (pEA19081) were obtained by single-molecule real-time sequencing. We found that these two plasmids are essentially identical (99.99% nucleotide sequence identity), demonstrating the possibility of interspecies transmission. Comparative analysis revealed that the backbone of pEMA120 is 99% similar to that of pZB18, a conjugative fosB-negative E. faecium plasmid. The traE gene in the transfer region of pEMA120 is disrupted, whereas pZB18 carries an intact traE. The difference in transfer frequency between pEMA120 and pZB18 suggests that this interruption of traE might affect conjugative transfer. Two copies of the fosB gene, each linked to a tnpA gene to form an ISL3-like transposon, were found at separate locations within pEMA120, an arrangement not reported previously. These two fosB-carrying transposons were confirmed by inverse PCR to form circular intermediates. Hybridization of plasmid DNA digested with BsaI, which has a restriction site within the fosB sequence, demonstrated that the presence of multiple copies of fosB per plasmid is common. The total copy number of the fosB gene, as revealed by qRT-PCR, did not correlate with fosfomycin MICs or with growth rates at sub-MICs of fosfomycin in different transconjugants. In susceptibility tests, the fosB gene, regardless of copy number, conferred high fosfomycin MICs ranging from 16,384 to 65,536 µg/ml. This first complete nucleotide sequence of a plasmid carrying two copies of fosB in VRE suggests that the fosB gene can transfer to multiple loci of plasmids via the ISL3-family transposase TnpA, possibly in the form of circular intermediates, leading to the dissemination of high-level fosfomycin resistance in VRE.
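The BsaI hybridization assay works because each fosB copy carries a BsaI site, so extra copies produce extra fragments. BsaI is a Type IIS enzyme that recognizes GGTCTC and cuts outside that site (1 nt downstream on the top strand, 5 nt on the bottom). The sketch below predicts top-strand cut positions and fragment sizes for a circular sequence; the plasmid sequences are not reproduced here, so the input is a hypothetical toy sequence, and sites spanning the sequence origin are not detected.

```python
SITE = "GGTCTC"      # BsaI recognition sequence
SITE_RC = "GAGACC"   # the same site on the opposite strand


def find_all(seq, motif):
    """All start positions of motif in seq (overlaps allowed)."""
    positions, start = [], seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def bsai_top_strand_cuts(seq):
    """Top-strand cut positions (cut falls just before the returned index).

    Forward site at p: GGTCTC N^ -> cut before p + 7.
    Reverse site at p: ^NNNNN GAGACC -> cut before p - 5 (wrapped for a circle).
    Sites that span the end/start junction of the circular sequence are missed.
    """
    seq = seq.upper()
    n = len(seq)
    cuts = [(p + 7) % n for p in find_all(seq, SITE)]
    cuts += [(p - 5) % n for p in find_all(seq, SITE_RC)]
    return sorted(set(cuts))


def circular_fragments(seq):
    """Fragment lengths (largest first) after complete digestion of a circular molecule."""
    n = len(seq)
    cuts = bsai_top_strand_cuts(seq)
    if not cuts:
        return [n]  # uncut circle
    frags = [b - a for a, b in zip(cuts, cuts[1:])]
    frags.append(n - cuts[-1] + cuts[0])  # fragment spanning the origin
    return sorted(frags, reverse=True)


# Hypothetical 56-bp circle with one site in each orientation.
toy = "A" * 10 + "GGTCTC" + "A" * 20 + "GAGACC" + "A" * 14
print(bsai_top_strand_cuts(toy))  # [17, 31]
print(circular_fragments(toy))    # [42, 14]
```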



Genome sequencing and comparative genomics reveal the potential pathogenic mechanism of Cercospora sojina Hara on soybean.

Frogeye leaf spot, caused by Cercospora sojina Hara, is a common disease of soybean in most soybean-growing countries of the world. In this study, we report a high-quality genome sequence of C. sojina obtained by Single Molecule Real-Time sequencing. The 40.8-Mb genome encodes 11,655 predicted genes, and 8,474 genes are detected by RNA sequencing. The C. sojina genome contains large numbers of gene clusters involved in the synthesis of secondary metabolites, including mycotoxins and pigments. However, far fewer carbohydrate-binding module protein-encoding genes are identified in the C. sojina genome than in other phytopathogenic fungi. Bioinformatics analysis reveals that C. sojina harbours about 752 secreted proteins, 233 of which are effectors. During early infection, the genes for metabolite biosynthesis and effectors are significantly enriched, suggesting that they may play essential roles in pathogenicity. We further identify 13 effectors that can inhibit BAX-induced cell death. Taken together, our results provide insights into the infection mechanisms of C. sojina on soybean. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

