September 21, 2019

Recent advances in bioinformatics for fish genomics

In the past few years, we have contributed to ~1/5 of the reported fish genomes. Based on this experience, here we outline recent advances in bioinformatics for fish genomics, with an emphasis on the development of software for genome assembly, genome annotation and evolutionary analysis. This review will be helpful for newcomers to genome analysis of both animals and plants. In the past decade, whole genome sequences of approximately 50 fish species have been reported [1]. We were involved in ~1/5 of these international works from 2014 to 2017, including mudskippers (2014) [2], Chinese large yellow croaker [3], Chinese barbel fishes [4], Asian arowana [5,6], channel catfish [7], seahorses [8], Japanese flounder [9], Chinese clearhead icefish [10] and Northern snakehead [11]. We are also in charge of the China Aquatic 10-100-1,000 Genomics Program [12], in which ~100 fish genomes are sequencing targets for the next 3 to 5 years. Because the basic informatics workflow comprises genome assembly, genome annotation and evolutionary analysis, we discuss them one by one in this order.


September 21, 2019

DNA-guided delivery of single molecules into zero-mode waveguides.

Zero-mode waveguides (ZMWs) are powerful analytical tools corresponding to optical nanostructures fabricated in a thin metallic film capable of confining an excitation volume to the range of attoliters. This small volume of confinement allows single-molecule fluorescence experiments to be performed at physiologically relevant concentrations of fluorescently labeled biomolecules. Exactly one molecule to be studied must be attached at the floor of the ZMW for signal detection and analysis; however, the massive parallelism of these nanoarrays suffers from a Poissonian-limited distribution of these biomolecules. To date, there is no method available that provides full single-molecule occupancy of massively arrayed ZMWs. Here we report the performance of a DNA-guided method that uses steric exclusion properties of large DNA molecules to bias the Poissonian-limited delivery of single molecules. Non-Poissonian statistics were obtained with DNA molecules that contain a free-biotinylated extremity for efficient binding to the floor of the ZMW, which resulted in a decrease of accessibility for a second molecule. Both random-coiled and condensed DNA conformations drove non-Poissonian single-molecule delivery into ZMW arrays. The results suggest that an optimal balance between the rigidity and flexibility of the macromolecule is critical for favorable accessibility and single occupancy. The optimized method provides a means for full exploitation of these massively parallelized analytical tools.
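The Poissonian limit mentioned above can be illustrated numerically. The following is a minimal sketch, assuming each ZMW is loaded independently with a mean of `lam` molecules per well (the function names and the example loading values are illustrative, not taken from the paper). Under that assumption the fraction of singly occupied wells is lam * exp(-lam), which peaks at 1/e (about 37%) when lam = 1.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k molecules in a well when the mean loading is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def occupancy_fractions(lam: float) -> dict:
    """Expected fractions of empty, singly occupied and multiply occupied wells."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


if __name__ == "__main__":
    # Hypothetical mean loadings; single occupancy is highest at lam = 1.
    for lam in (0.5, 1.0, 2.0):
        print(lam, occupancy_fractions(lam))
```

Any loading method that yields more than about 37% singly occupied wells is therefore beating the Poisson limit, which is the sense in which the abstract speaks of non-Poissonian statistics.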


September 21, 2019

Comparative genomics of enterohemorrhagic Escherichia coli O145:H28 demonstrates a common evolutionary lineage with Escherichia coli O157:H7.

Although serotype O157:H7 is the predominant enterohemorrhagic Escherichia coli (EHEC), outbreaks of non-O157 EHEC that cause severe foodborne illness, including hemolytic uremic syndrome, have increased worldwide. In fact, non-O157 serotypes are now estimated to cause over half of all the Shiga toxin-producing Escherichia coli (STEC) cases, and outbreaks of non-O157 EHEC infections are frequently associated with serotypes O26, O45, O103, O111, O121, and O145. Currently, there are no complete genomes for O145 in public databases. We determined the complete genome sequences of two O145 strains (EcO145), one linked to a US lettuce-associated outbreak (RM13514) and one to a Belgium ice-cream-associated outbreak (RM13516). Both strains contain one chromosome and two large plasmids, with genome sizes of 5,737,294 bp for RM13514 and 5,559,008 bp for RM13516. Comparative analysis of the two EcO145 genomes revealed a large core (5,173 genes) and a considerable number of strain-specific genes. Additionally, the two EcO145 genomes display distinct chromosomal architecture, virulence gene profile, phylogenetic origin of Stx2a prophage, and methylation profile (methylome). Comparative analysis of EcO145 genomes to other completely sequenced STEC and other E. coli and Shigella genomes revealed that, unlike any other known non-O157 EHEC strain, EcO145 ascended from a common lineage with EcO157/EcO55. This evolutionary relationship was further supported by the pangenome analysis of the 10 EHEC strains. Of the 4,192 EHEC core genes, EcO145 shares more genes with EcO157 than with any other non-O157 EHEC strain. Our data provide evidence that EcO145 and EcO157 evolved from a common lineage, but ultimately each serotype evolves via a lineage-independent nature to EHEC by acquisition of the core set of EHEC virulence factors, including the genes encoding Shiga toxin and the large virulence plasmid.
The large variation between the two EcO145 genomes suggests a distinctive evolutionary path between the two outbreak strains. The distinct methylome between the two EcO145 strains is likely due to the presence of a BsuBI/PstI methyltransferase gene cassette in the Stx2a prophage of the strain RM13514, suggesting a role of horizontal gene transfer-mediated epigenetic alteration in the evolution of individual EHEC strains.


September 21, 2019

Characterization of multi-drug resistant Enterococcus faecalis isolated from cephalic recording chambers in research macaques (Macaca spp.).

Nonhuman primates are commonly used for cognitive neuroscience research and often surgically implanted with cephalic recording chambers for electrophysiological recording. Aerobic bacterial cultures from 25 macaques identified 72 bacterial isolates, including 15 Enterococcus faecalis isolates. The E. faecalis isolates displayed multi-drug resistant phenotypes, with resistance to ciprofloxacin, enrofloxacin, trimethoprim-sulfamethoxazole, tetracycline, chloramphenicol, bacitracin, and erythromycin, as well as high-level aminoglycoside resistance. Multi-locus sequence typing showed that most belonged to two E. faecalis sequence types (ST): ST 4 and ST 55. The genomes of three representative isolates were sequenced to identify genes encoding antimicrobial resistances and other traits. Antimicrobial resistance genes identified included aac(6′)-aph(2″), aph(3′)-III, str, ant(6)-Ia, tetM, tetS, tetL, ermB, bcrABR, cat, and dfrG, and polymorphisms in parC (S80I) and gyrA (S83I) were observed. These isolates also harbored virulence factors including the cytolysin toxin genes in ST 4 isolates, as well as multiple biofilm-associated genes (esp, agg, ace, SrtA, gelE, ebpABC), hyaluronidases (hylA, hylB), and other survival genes (ElrA, tpx). Crystal violet biofilm assays confirmed that ST 4 isolates produced more biofilm than ST 55 isolates. The abundance of antimicrobial resistance and virulence factor genes in the ST 4 isolates likely relates to the loss of CRISPR-cas. This macaque colony represents a unique model for studying E. faecalis infection associated with indwelling devices, and provides an opportunity to understand the basis of persistence of this pathogen in a healthcare setting.


September 21, 2019

Whole genome sequence of the soybean aphid, Aphis glycines.

Aphids are emerging as model organisms for both basic and applied research. Of the 5,000 estimated species, only three aphids have published whole genome sequences: the pea aphid Acyrthosiphon pisum, the Russian wheat aphid, Diuraphis noxia, and the green peach aphid, Myzus persicae. We present the whole genome sequence of a fourth aphid, the soybean aphid (Aphis glycines), which is an extreme specialist and an important invasive pest of soybean (Glycine max). The availability of genomic resources is important to establish effective and sustainable pest control, as well as to expand our understanding of aphid evolution. We generated a 302.9 Mbp draft genome assembly for Ap. glycines using a hybrid sequencing approach. This assembly shows high completeness with 19,182 predicted genes, 92% of known Ap. glycines transcripts mapping to contigs, and substantial continuity with a scaffold N50 of 174,505 bp. The assembly represents 95.5% of the predicted genome size of 317.1 Mbp based on flow cytometry. Ap. glycines contains the smallest known aphid genome to date, based on updated genome sizes for 19 aphid species. The repetitive DNA content of the Ap. glycines genome assembly (81.6 Mbp or 26.94% of the 302.9 Mbp assembly) shows a reduction in the number of classified transposable elements compared to Ac. pisum, and likely contributes to the small estimated genome size. We include comparative analyses of gene families related to host-specificity (cytochrome P450’s and effectors), which may be important in Ap. glycines evolution. This Ap. glycines draft genome sequence will provide a resource for the study of aphid genome evolution, their interaction with host plants, and candidate genes for novel insect control methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
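The assembly statistics quoted above (scaffold N50, fraction of the predicted genome covered, repeat fraction) follow from simple arithmetic. A minimal sketch, assuming the usual definition of N50 as the length of the sequence at which the cumulative sum of lengths, sorted longest first, reaches half of the total; the function name and the toy scaffold lengths are illustrative only.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Toy scaffold lengths, not real data.
    print(n50([2, 3, 4, 5, 6]))
    # Figures taken from the abstract (Mbp).
    print(round(percent(302.9, 317.1), 1))   # assembly vs. flow-cytometry estimate
    print(round(percent(81.6, 302.9), 2))    # repetitive DNA share of the assembly
```

The two percentages reproduce the 95.5% and 26.94% figures reported in the abstract.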


September 21, 2019

Decreased fitness and virulence in ST10 Escherichia coli harboring blaNDM-5 and mcr-1 against a ST4981 strain with blaNDM-5.

Although coexistence of blaNDM-5 and mcr-1 in Escherichia coli has been reported, little is known about the fitness and virulence of such strains. Three carbapenem-resistant Escherichia coli (GZ1, GZ2, and GZ3) successively isolated from one patient in 2015 were investigated for microbiological fitness and virulence. GZ1 and GZ2 were also resistant to colistin. To verify the association between plasmids and fitness, growth kinetics of the transconjugants were performed. We also analyzed genomic sequences of GZ2 and GZ3 using PacBio sequencing. GZ1 and GZ2 (ST10) co-harbored blaNDM-5 and mcr-1, while GZ3 (ST4981) carried only blaNDM-5. GZ3 demonstrated significantly more rapid growth (P < 0.001) and overgrew GZ2 with a competitive index of 1.0157 (4 h) and 2.5207 (24 h). Increased resistance to serum killing and mice mortality was also identified in GZ3. While GZ2 had four plasmids (IncI2, IncX3, IncHI2, IncFII), GZ3 possessed one plasmid (IncFII). The genetic contexts of blaNDM-5 in GZ2 and GZ3 were identical but inserted into different backbones, IncX3 (102,512 bp) and IncFII (91,451 bp), respectively. The growth was not statistically different between the transconjugants with mcr-1 or blaNDM-5 plasmid and recipient (P = 0.6238). Whole genome sequence analysis revealed that 28 virulence genes were specific to GZ3, potentially contributing to increased virulence of GZ3. Decreased fitness and virulence in a mcr-1 and blaNDM-5 co-harboring ST10 E. coli was found alongside a ST4981 strain with only blaNDM-5. Acquisition of mcr-1 or blaNDM-5 plasmid did not lead to considerable fitness costs, indicating the potential for dissemination of mcr-1 and blaNDM-5 in Enterobacteriaceae.


September 21, 2019

Retrotransposons are the major contributors to the expansion of the Drosophila ananassae Muller F element.

The discordance between genome size and the complexity of eukaryotes can partly be attributed to differences in repeat density. The Muller F element (~5.2 Mb) is the smallest chromosome in Drosophila melanogaster, but it is substantially larger (>18.7 Mb) in D. ananassae. To identify the major contributors to the expansion of the F element and to assess their impact, we improved the genome sequence and annotated the genes in a 1.4-Mb region of the D. ananassae F element, and a 1.7-Mb region from the D element for comparison. We find that transposons (particularly LTR and LINE retrotransposons) are major contributors to this expansion (78.6%), while Wolbachia sequences integrated into the D. ananassae genome are minor contributors (0.02%). Both D. melanogaster and D. ananassae F-element genes exhibit distinct characteristics compared to D-element genes (e.g., larger coding spans, larger introns, more coding exons, and lower codon bias), but these differences are exaggerated in D. ananassae. Compared to D. melanogaster, the codon bias observed in D. ananassae F-element genes can primarily be attributed to mutational biases instead of selection. The 5′ ends of F-element genes in both species are enriched in dimethylation of lysine 4 on histone 3 (H3K4me2), while the coding spans are enriched in H3K9me2. Despite differences in repeat density and gene characteristics, D. ananassae F-element genes show a similar range of expression levels compared to genes in euchromatic domains. This study improves our understanding of how transposons can affect genome size and how genes can function within highly repetitive domains. Copyright © 2017 Leung et al.


September 21, 2019

A distinct and genetically diverse lineage of the hybrid fungal pathogen Verticillium longisporum population causes stem striping in British oilseed rape.

Population genetic structures illustrate evolutionary trajectories of organisms adapting to differential environmental conditions. Verticillium stem striping disease on oilseed rape was mainly observed in continental Europe, but has recently emerged in the United Kingdom. The disease is caused by the hybrid fungal species Verticillium longisporum that originates from at least three separate hybridization events, yet hybrids between Verticillium progenitor species A1 and D1 are mainly responsible for Verticillium stem striping. We reveal a hitherto undescribed dichotomy within V. longisporum lineage A1/D1 that correlates with the geographic distribution of the isolates with an ‘A1/D1 West’ and an ‘A1/D1 East’ cluster. Genome comparison between representatives of the A1/D1 West and East clusters excluded population distinctiveness through separate hybridization events. Remarkably, the A1/D1 West population that is genetically more diverse than the entire A1/D1 East cluster caused the sudden emergence of Verticillium stem striping in the UK, whereas in continental Europe Verticillium stem striping is predominantly caused by the more genetically uniform A1/D1 East population. The observed genetic diversity of the A1/D1 West population argues against a recent introduction of the pathogen into the UK, but rather suggests that the pathogen previously established in the UK and remained latent or unnoticed as oilseed rape pathogen until recently. © 2017 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd.


September 21, 2019

PacBio assembly of a Plasmodium knowlesi genome sequence with Hi-C correction and manual annotation of the SICAvar gene family.

Plasmodium knowlesi has risen in importance as a zoonotic parasite that has been causing regular episodes of malaria throughout South East Asia. The P. knowlesi genome sequence generated in 2008 highlighted and confirmed many similarities and differences in Plasmodium species, including a global view of several multigene families, such as the large SICAvar multigene family encoding the variant antigens known as the schizont-infected cell agglutination proteins. However, repetitive DNA sequences are the bane of any genome project, and this and other Plasmodium genome projects have not been immune to the gaps, rearrangements and other pitfalls created by these genomic features. Today, long-read PacBio and chromatin conformation technologies are overcoming such obstacles. Here, based on the use of these technologies, we present a highly refined de novo P. knowlesi genome sequence of the Pk1(A+) clone. This sequence and annotation, referred to as the ‘MaHPIC Pk genome sequence’, includes manual annotation of the SICAvar gene family with 136 full-length members categorized as type I or II. This sequence provides a framework that will permit a better understanding of the SICAvar repertoire, selective pressures acting on this gene family and mechanisms of antigenic variation in this species and other pathogens.


September 21, 2019

The kinetoplastid-infecting Bodo saltans virus (BsV), a window into the most abundant giant viruses in the sea.

Giant viruses are ecologically important players in aquatic ecosystems that have challenged concepts of what constitutes a virus. Herein, we present the giant Bodo saltans virus (BsV), the first characterized representative of the most abundant group of giant viruses in ocean metagenomes, and the first isolate of a klosneuvirus, a subgroup of the Mimiviridae proposed from metagenomic data. BsV infects an ecologically important microzooplankton, the kinetoplastid Bodo saltans. Its 1.39 Mb genome encodes 1227 predicted ORFs, including a complex replication machinery. Yet, much of its translational apparatus has been lost, including all tRNAs. Essential genes are invaded by homing endonuclease-encoding self-splicing introns that may defend against competing viruses. Putative anti-host factors show extensive gene duplication via a genomic accordion indicating an ongoing evolutionary arms race and highlighting the rapid evolution and genomic plasticity that has led to genome gigantism and the enigma that is giant viruses. © 2018, Deeg et al.


September 21, 2019

Potato late blight field resistance from QTL dPI09c is conferred by the NB-LRR gene R8.

Following the often short-lived protection that major nucleotide binding, leucine-rich-repeat (NB-LRR) resistance genes offer against the potato pathogen Phytophthora infestans, field resistance was thought to provide a more durable alternative to prevent late blight disease. We previously identified the QTL dPI09c on potato chromosome 9 as a more durable field resistance source against late blight. Here, the resistance QTL was fine-mapped to a 186 kb region. The interval corresponds to a larger, 389 kb, genomic region in the potato reference genome of Solanum tuberosum Group Phureja doubled monoploid clone DM1-3 (DM) and from which functional NB-LRRs R8, R9a, Rpi-moc1, and Rpi_vnt1 have arisen independently in wild species. dRenSeq analysis of parental clones alongside resistant and susceptible bulks of the segregating population B3C1HP showed full sequence representation of R8. This was independently validated using long-range PCR and screening of a bespoke bacterial artificial chromosome library. The latter enabled a comparative analysis of the sequence variation in this locus in diverse Solanaceae. We reveal for the first time that broad spectrum and durable field resistance against P. infestans is conferred by the NB-LRR gene R8, which is thought to provide narrow spectrum race-specific resistance.


September 21, 2019

Chromulinavorax destructans, a pathogenic TM6 bacterium with an unusual replication strategy targeting protist mitochondrion

Most of the diversity of microbial life is not available in culture, and as such we lack even a fundamental understanding of the biological diversity of several branches on the tree of life. One branch that is highly underrepresented is the candidate phylum TM6, also known as the Dependentiae. Their biology is known only from reduced genomes recovered from metagenomes around the world and from two isolates infecting amoebae, all of which suggest that they live highly host-associated lifestyles as parasites or symbionts. Chromulinavorax destructans is an isolate from the TM6/Dependentiae that infects and lyses the abundant heterotrophic flagellate Spumella elongata. Chromulinavorax destructans is characterized by a high degree of reduction and specialization for infection, so much so that it was discovered in a screen for giant viruses. Its 1.2 Mb genome shows no metabolic potential, and C. destructans instead relies on an extensive transporter system to import nutrients, and even energy in the form of ATP, from the host. Accordingly, it replicates in a viral-like fashion, while extensively reorganizing and expanding the host mitochondrion. 44% of proteins contain signal sequences for secretion; these include many proteins of unknown function as well as 98 copies of ankyrin-repeat domain proteins, known effectors of host modulation, suggesting the presence of an extensive host-manipulation apparatus.


September 21, 2019

Assessing genome assembly quality using the LTR Assembly Index (LAI).

Assembling a plant genome is challenging due to the abundance of repetitive sequences, yet no standard is available to evaluate the assembly of repeat space. LTR retrotransposons (LTR-RTs) are the predominant interspersed repeat that is poorly assembled in draft genomes. Here, we propose a reference-free genome metric called LTR Assembly Index (LAI) that evaluates assembly continuity using LTR-RTs. After correcting for LTR-RT amplification dynamics, we show that LAI is independent of genome size, genomic LTR-RT content, and gene space evaluation metrics (i.e., BUSCO and CEGMA). By comparing genomic sequences produced by various sequencing techniques, we reveal the significant gain of assembly continuity by using long-read-based techniques over short-read-based methods. Moreover, LAI can facilitate iterative assembly improvement with assembler selection and identify low-quality genomic regions. To apply LAI, intact LTR-RTs and total LTR-RTs should contribute at least 0.1% and 5% to the genome size, respectively. The LAI program is freely available on GitHub: https://github.com/oushujun/LTR_retriever.
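The applicability thresholds stated above (intact LTR-RTs at least 0.1% and total LTR-RTs at least 5% of the genome size) can be checked before computing the index. The sketch below is illustrative only: the function names are hypothetical and are not the interface of LTR_retriever, and `raw_intact_ratio` is a simplified, uncorrected ratio assumed here for illustration. The abstract does not give the formula, and the published index additionally corrects for LTR-RT amplification dynamics.

```python
def lai_applicable(genome_size_bp, intact_ltr_bp, total_ltr_bp):
    """True if the LTR-RT content meets the thresholds stated in the abstract.

    Thresholds: intact LTR-RTs >= 0.1% and total LTR-RTs >= 5% of genome size.
    Integer arithmetic avoids floating-point edge cases at the boundaries.
    """
    intact_ok = intact_ltr_bp * 1000 >= genome_size_bp   # >= 0.1%
    total_ok = total_ltr_bp * 20 >= genome_size_bp       # >= 5%
    return intact_ok and total_ok


def raw_intact_ratio(intact_ltr_bp, total_ltr_bp):
    """Assumed, uncorrected ratio: intact LTR-RT length as a percentage of total."""
    if total_ltr_bp <= 0:
        raise ValueError("total LTR-RT length must be positive")
    return 100.0 * intact_ltr_bp / total_ltr_bp


if __name__ == "__main__":
    # Hypothetical 1 Mb genome with 2 kb of intact and 100 kb of total LTR-RT sequence.
    if lai_applicable(1_000_000, 2_000, 100_000):
        print(raw_intact_ratio(2_000, 100_000))
```

For real analyses, the LAI program in the repository linked above should be used rather than this simplified ratio.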


September 21, 2019

Divergent selection causes whole genome differentiation without physical linkage among the targets in Spodoptera frugiperda (Noctuidae)

The process of speciation involves whole genome differentiation by overcoming gene flow between diverging populations. We have ample knowledge of which evolutionary forces may cause genomic differentiation, and several speciation models have been proposed to explain the transition from genetic to genomic differentiation. However, it is still unclear which critical conditions enable genomic differentiation in nature. The fall armyworm, Spodoptera frugiperda, is observed as two sympatric strains that have different host-plant ranges, suggesting the possibility of ecological divergent selection. In our previous study, we observed that these two strains show genetic differentiation across the whole genome with an unprecedentedly low extent, suggesting the possibility that whole genome sequences started to be differentiated between the strains. In this study, we analyzed whole genome sequences from these two strains from Mississippi to identify critical evolutionary factors for genomic differentiation. The genomic Fst is low (0.017) while 91.3% of 10 kb windows have Fst greater than 0, suggesting genome-wide differentiation with a low extent. We identified nearly 400 outliers of genetic differentiation between strains, and found that physical linkage among these outliers is not a primary cause of genomic differentiation. Fst is not significantly correlated with gene density, a proxy for the strength of selection, suggesting that a genomic reduction in migration rate dominates the extent of local genetic differentiation. Our analyses reveal that divergent selection alone is sufficient to generate genomic differentiation, and any following diversifying factors may increase the level of genetic differentiation between diverging strains in the process of speciation.


September 21, 2019

Real-time DNA sequencing from single polymerase molecules.

We present single-molecule, real-time sequencing data obtained from a DNA polymerase performing uninterrupted template-directed synthesis using four distinguishable fluorescently labeled deoxyribonucleoside triphosphates (dNTPs). We detected the temporal order of their enzymatic incorporation into a growing DNA strand with zero-mode waveguide nanostructure arrays, which provide optical observation volume confinement and enable parallel, simultaneous detection of thousands of single-molecule sequencing reactions. Conjugation of fluorophores to the terminal phosphate moiety of the dNTPs allows continuous observation of DNA synthesis over thousands of bases without steric hindrance. The data report directly on polymerase dynamics, revealing distinct polymerization states and pause sites corresponding to DNA secondary structure. Sequence data were aligned with the known reference sequence to assay biophysical parameters of polymerization for each template position. Consensus sequences were generated from the single-molecule reads at 15-fold coverage, showing a median accuracy of 99.3%, with no systematic error beyond fluorophore-dependent error rates.
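To illustrate how a consensus can be more accurate than any single read, here is a minimal sketch of a per-position plurality vote over reads that are assumed to be already aligned to the same coordinates and padded with `-` for gaps. This is a generic technique chosen for illustration; the abstract does not describe the consensus procedure actually used, and the function name and toy reads are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Plurality base at each column of equal-length, pre-aligned reads.

    Gap characters ('-') are ignored when voting; a column with only gaps
    yields '-'. Ties go to the base seen first in that column.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != "-")
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(consensus)


if __name__ == "__main__":
    # Three toy reads, each with at most one error relative to ACGT.
    print(majority_consensus(["ACGT", "ACGA", "TCGT"]))
```

With independent errors, the chance that a plurality of reads is wrong at the same position falls quickly as coverage grows, which is why the abstract reports accuracy at a stated coverage (15-fold).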

