July 7, 2019

A high throughput screen for active human transposable elements.

Transposable elements (TEs) are mobile genetic sequences that randomly propagate within their host’s genome. This mobility has the potential to affect gene transcription and cause disease. However, TEs are technically challenging to identify, which complicates efforts to assess the impact of TE insertions on disease. Here we present a targeted sequencing protocol and computational pipeline to identify polymorphic and novel TE insertions using next-generation sequencing: TE-NGS. The method simultaneously targets the three subfamilies that are responsible for the majority of recent TE activity (L1HS, AluYa5/8, and AluYb8/9), thereby obviating the need for multiple experiments and reducing the amount of input material required. Here we describe the laboratory protocol and detection algorithm, and a benchmark experiment for the reference genome NA12878. We demonstrate a substantial enrichment for on-target fragments, and high sensitivity and precision to both reference and NA12878-specific insertions. We report 17 previously unreported loci for this individual which are supported by orthogonal long-read evidence, and we identify 1470 polymorphic and novel TEs in 12 additional samples that were previously undocumented in databases of insertion polymorphisms. We anticipate that future applications of TE-NGS alongside exome sequencing of patients with sporadic disease will reduce the number of unresolved cases, and improve estimates of the contribution of TEs to human genetic disease.


July 7, 2019

New high copy tandem repeat in the content of the chicken W chromosome.

The content of repetitive DNA in avian genomes is considerably less than in other investigated vertebrates. The first descriptions of tandem repeats were based on the results of routine biochemical and molecular biological experiments. Both satellite DNA and interspersed repetitive elements were annotated using a library-based approach and de novo repeat identification in the assembled genome. The development of deep-sequencing methods provides high-quality datasets without preassembly, allowing one to annotate repetitive elements from the unassembled part of genomes. In this work, we search the chicken assembly and annotate high copy number tandem repeats from unassembled short raw reads. The tandem repeat (GGAAA)n has been identified and found to be the second most abundant in the chicken genome, after the telomeric repeat (TTAGGG)n. Furthermore, the (GGAAA)n repeat forms expanded arrays on both arms of the chicken W chromosome. Our results highlight the complexity of repetitive sequences and update data about the organization of the W sex chromosome in chicken.


July 7, 2019

Complete genomic and transcriptional landscape analysis using third-generation sequencing: a case study of Saccharomyces cerevisiae CEN.PK113-7D.

Completion of eukaryal genomes can be a difficult task given the highly repetitive sequences along the chromosomes and the short read lengths of second-generation sequencing. Saccharomyces cerevisiae strain CEN.PK113-7D, widely used as a model organism and a cell factory, was selected for this study to demonstrate the superior capability of very long sequence reads for de novo genome assembly. We generated long reads using two common third-generation sequencing technologies (Oxford Nanopore Technology (ONT) and Pacific Biosciences (PacBio)) and used short reads obtained using Illumina sequencing for error correction. Assembly of the reads derived from all three technologies resulted in complete sequences for all 16 yeast chromosomes, as well as the mitochondrial chromosome, in one step. Further, we identified three types of DNA methylation (5mC, 4mC and 6mA). Comparison between the reference strain S288C and strain CEN.PK113-7D identified chromosomal rearrangements against a background of similar gene content between the two strains. We identified full-length transcripts through ONT direct RNA sequencing technology. This allows for the identification of transcriptional landscapes, including untranslated regions (UTRs) (5′ UTR and 3′ UTR) as well as differential gene expression quantification. About 91% of the predicted transcripts could be consistently detected across biological replicates grown either on glucose or ethanol. Direct RNA sequencing identified many polyadenylated non-coding RNAs, rRNAs, telomere-RNA, long non-coding RNA and antisense RNA. This work demonstrates a strategy to obtain complete genome sequences and transcriptional landscapes that can be applied to other eukaryal organisms.


July 7, 2019

Complete genomic analysis of multidrug-resistance Pseudomonas aeruginosa Guangzhou-Pae617, the host of megaplasmid pBM413.

We previously described the novel qnrVC6 and blaIMP-45-carrying megaplasmid pBM413. This study aimed to investigate the complete genome of multidrug-resistant P. aeruginosa Guangzhou-Pae617, a clinical isolate from the sputum of a patient who was suffering from respiratory disease in Guangzhou, China. The genome was sequenced using Illumina HiSeq 2500 and PacBio RS II sequencers and assembled de novo using HGAP. The genome was automatically and manually annotated. The genome of P. aeruginosa Guangzhou-Pae617 is 6,430,493 bp, containing 5881 predicted genes with an average G + C content of 66.43%. The genome showed high similarity to two newly sequenced P. aeruginosa strains isolated from New York, USA. From the whole genome sequence, we identified a type IV pilin, two large prophages, 15 antibiotic resistance genes, 5 genes involved in the “Infectious diseases” pathways, and 335 virulence factors. The antibiotic resistance and virulence factors in the genome of P. aeruginosa strain Guangzhou-Pae617 were identified by complete genomic analysis. This contributes to further study of the antibiotic resistance mechanism and clinical control of P. aeruginosa. Copyright © 2018 Elsevier Ltd. All rights reserved.


July 7, 2019

RepLong: de novo repeat identification using long read sequencing data.

The identification of repetitive elements is important in genome assembly and phylogenetic analyses. The existing de novo repeat identification methods that rely on short reads are ineffective at identifying long repeats. Since long reads are more likely to cover repeat regions completely, using long reads is more favorable for recognizing long repeats. In this study, we propose a novel de novo repeat element identification method, RepLong, based on PacBio long reads. Given that the reads mapped to the repeat regions overlap heavily with each other, the identification of repeat elements is equivalent to the discovery of consensus overlaps between reads, which can be further cast into a community detection problem in the network of read overlaps. In RepLong, we first construct a network of read overlaps based on pair-wise alignment of the reads, where each vertex indicates a read and an edge indicates a substantial overlap between the corresponding two reads. Secondly, the communities whose intra connectivity is greater than the inter connectivity are extracted based on network modularity optimization. Finally, representative reads in each community are extracted to form the repeat library. Comparison studies on Drosophila melanogaster and human long read sequencing data with genome-based and short-read-based methods demonstrate the efficiency of RepLong in identifying long repeats. RepLong can handle lower coverage data and serve as a complementary solution to the existing methods to promote the repeat identification performance on long-read sequencing data. The software of RepLong is freely available at https://github.com/ruiguo-bio/replong. Contact: ywsun@szu.edu.cn or zhuzx@szu.edu.cn. Supplementary data are available at Bioinformatics online.
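The overlap-network idea described in this abstract can be illustrated with a small sketch. This is not the RepLong implementation: the overlap triples, the `min_overlap` threshold, and all function names are hypothetical, and connected components are used as a simple stand-in for the modularity optimization step. The standard Newman modularity score is then computed for the resulting partition to show what "intra connectivity greater than inter connectivity" measures.

```python
from collections import defaultdict

def build_overlap_graph(overlaps, min_overlap):
    """Vertices are read IDs; an edge joins two reads with a substantial overlap."""
    graph = defaultdict(set)
    for read_a, read_b, length in overlaps:
        if length >= min_overlap and read_a != read_b:
            graph[read_a].add(read_b)
            graph[read_b].add(read_a)
    return graph

def connected_components(graph):
    """Stand-in for community detection: group reads linked through overlaps."""
    seen, communities = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        communities.append(members)
    return communities

def modularity(graph, communities):
    """Newman modularity Q = sum over communities of L_c/m - (d_c/(2m))^2."""
    m = sum(len(neighbours) for neighbours in graph.values()) / 2
    if m == 0:
        return 0.0
    q = 0.0
    for members in communities:
        # Each internal edge is seen from both endpoints, hence the division by 2.
        internal = sum(1 for u in members for v in graph[u] if v in members) / 2
        degree = sum(len(graph[u]) for u in members)
        q += internal / m - (degree / (2 * m)) ** 2
    return q

# Hypothetical (read, read, overlap length) triples from pairwise alignment.
toy_overlaps = [
    ("r1", "r2", 900), ("r2", "r3", 850), ("r1", "r3", 800),
    ("r4", "r5", 950), ("r5", "r6", 700), ("r4", "r6", 720),
    ("r3", "r4", 120),  # below the threshold, so no edge is created
]
graph = build_overlap_graph(toy_overlaps, min_overlap=500)
communities = connected_components(graph)
q = modularity(graph, communities)
print(communities, q)
```

In this toy input the two triangles of reads form two communities. A real modularity optimizer would also split densely connected groups that are joined by only a few edges, which connected components cannot do.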


July 7, 2019

Recent progress and prospects for advancing arachnid genomics

Arachnids exhibit tremendous species richness and adaptations of biomedical, industrial, and agricultural importance. Yet genomic resources for arachnids are limited, with the first few spider and scorpion genomes becoming accessible in the last four years. We review key insights from these genome projects, and recommend additional genomes for sequencing, emphasizing taxa of greatest value to the scientific community. We suggest greater sampling of spiders whose genomes are understudied but hold important protein recipes for silk and venom production. We further recommend arachnid genomes to address significant evolutionary topics, including the phenotypic impact of genome duplications. One barrier to high-quality arachnid genomes is the reliance on assemblies based solely on short-read data, which may be overcome by long-range sequencing and other emerging methods.


July 7, 2019

Construction of two whole genome radiation hybrid panels for dromedary (Camelus dromedarius): 5000RAD and 15000RAD.

The availability of genomic resources including linkage information for camelids has been very limited. Here, we describe the construction of a set of two radiation hybrid (RH) panels (5000RAD and 15000RAD) for the dromedary (Camelus dromedarius) as a permanent genetic resource for camel genome researchers worldwide. For the 5000RAD panel, a total of 245 female camel-hamster radiation hybrid clones were collected, of which 186 were screened with 44 custom designed marker loci distributed throughout the camel genome. The overall mean retention frequency (RF) of the final set of 93 hybrids was 47.7%. For the 15000RAD panel, 238 male dromedary-hamster radiation hybrid clones were collected, of which 93 were tested using 44 PCR markers. The final set of 90 clones had a mean RF of 39.9%. This 15000RAD panel is an important high-resolution complement to the main 5000RAD panel and an indispensable tool for resolving complex genomic regions. This valuable genetic resource of dromedary RH panels is expected to be instrumental for constructing a high resolution camel genome map. Construction of the set of RH panels is an essential step toward a chromosome level, reference quality genome assembly that is critical for advancing camelid genomics and the development of custom genomic tools.


July 7, 2019

Genome sequence of Galleria mellonella (greater wax moth).

The larvae of the greater wax moth, Galleria mellonella, are pests of active beehives. In infection biology, these larvae are becoming increasingly attractive as an invertebrate host model. Here, we report on the first genome sequence of Galleria mellonella. Copyright © 2018 Lange et al.


July 7, 2019

Complete genome sequence of a new halophilic archaeon, Haloarcula taiwanensis, isolated from a solar saltern in southern Taiwan.

We report here the completion of the genome sequence of a new species of haloarchaea, Haloarcula taiwanensis, isolated in southern Taiwan. The 3,721,706-bp genome consisted of chromosome I (2,966,258 bp, 63.6% GC content), chromosome II (525,233 bp, 59.6% GC content), plasmid pNYT1 (129,893 bp, 55.3% GC content), and plasmid pNYT2 (100,322 bp, 55.7% GC content).
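As a quick consistency check on the figures above, the four replicon lengths can be summed and a length-weighted GC content derived. The per-replicon values are taken from the abstract; the genome-wide GC figure computed here is a derived estimate, not a value reported by the authors.

```python
# Replicon name -> (length in bp, GC content in percent), as listed in the abstract.
replicons = {
    "chromosome I": (2_966_258, 63.6),
    "chromosome II": (525_233, 59.6),
    "plasmid pNYT1": (129_893, 55.3),
    "plasmid pNYT2": (100_322, 55.7),
}

# The lengths should add up to the reported genome size.
total_bp = sum(length for length, _ in replicons.values())

# Weight each replicon's GC content by its share of the genome.
weighted_gc = sum(length * gc for length, gc in replicons.values()) / total_bp

print(f"total: {total_bp:,} bp")
print(f"length-weighted GC: {weighted_gc:.1f}%")
```

The sum matches the reported 3,721,706 bp, and the weighted GC content lies close to that of chromosome I because that replicon accounts for most of the genome.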


July 7, 2019

Ten steps to get started in Genome Assembly and Annotation.

As a part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to facilitate researchers getting started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR).


July 7, 2019

scanPAV: a pipeline for extracting presence-absence variations in genome pairs.

The recent technological advances in genome sequencing techniques have resulted in an exponential increase in the number of sequenced human and non-human genomes. The ever increasing number of assemblies generated by novel de novo pipelines and strategies demands the development of new software to evaluate assembly quality and completeness. One way to determine the completeness of an assembly is by detecting its Presence-Absence variations (PAV) with respect to a reference, where PAVs between two assemblies are defined as the sequences present in one assembly but entirely missing in the other one. Beyond assembly error or technology bias, PAVs can also reveal real genome polymorphism, as a consequence of species or individual evolution, or horizontal transfer from viruses and bacteria. We present scanPAV, a pipeline for pairwise assembly comparison to identify and extract sequences present in one assembly but not the other. In this note, we use the GRCh38 reference assembly to assess the completeness of six human genome assemblies from various assembly strategies and sequencing technologies including Illumina short reads, 10x Genomics linked-reads, PacBio and Oxford Nanopore long reads, and Bionano optical maps. We also discuss the PAV polymorphism of seven Tasmanian devil whole genome assemblies of normal animal tissues and devil facial tumour 1 (DFT1) and 2 (DFT2) samples, and the identification of bacterial sequences as contamination in some of the tumorous assemblies. The pipeline is available under the MIT License at https://github.com/wtsi-hpag/scanPAV. Supplementary data are available at Bioinformatics online.
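The PAV definition given above (sequences present in one assembly but entirely missing in the other) can be made concrete with a toy sketch. This is not how scanPAV works internally: shared k-mers are used here as an assumed, simplified proxy for sequence presence, reverse complements are ignored, and the function names, the value of `k`, and the sequences are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def absent_sequences(query_assembly, other_assembly, k=5):
    """Names of query sequences that share no k-mer with the other assembly."""
    other_kmers = set()
    for seq in other_assembly.values():
        other_kmers |= kmers(seq, k)
    absent = []
    for name, seq in query_assembly.items():
        query_kmers = kmers(seq, k)
        # Sequences shorter than k yield no k-mers and are skipped.
        if query_kmers and not (query_kmers & other_kmers):
            absent.append(name)
    return absent

# Hypothetical toy assemblies: name -> sequence.
reference = {"chr1": "ACGTACGTTAGCCGATAGGCTA"}
assembly = {
    "ctgA": "ACGTACGTTAGC",          # contained in chr1
    "ctgB": "TTTTTGGGGGCCCCCAAAAA",  # shares no 5-mer with chr1
}

present_only_in_assembly = absent_sequences(assembly, reference)
present_only_in_reference = absent_sequences(reference, assembly)
print(present_only_in_assembly, present_only_in_reference)
```

Running the comparison in both directions mirrors the pairwise nature of the definition: a sequence can be absent from the reference, from the assembly, or from neither.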


July 7, 2019

IWTomics: testing high-resolution sequence-based ‘Omics’ data at multiple locations and scales.

With increased generation of high-resolution sequence-based ‘Omics’ data, detecting statistically significant effects at different genomic locations and scales has become key to addressing several scientific questions. IWTomics is an R/Bioconductor package (integrated in Galaxy) that, exploiting sophisticated Functional Data Analysis techniques (i.e. statistical techniques that deal with the analysis of curves), allows users to pre-process, visualize and test these data at multiple locations and scales. The package provides a friendly, flexible and complete workflow that can be employed in many genomic and epigenomic applications. IWTomics is freely available at the Bioconductor website (http://bioconductor.org/packages/IWTomics) and on the main Galaxy instance (https://usegalaxy.org/). Supplementary data are available at Bioinformatics online.


July 7, 2019

The sequenced angiosperm genomes and genome databases.

Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of humans, animals, and the planet Earth. Despite the numerous advances in genome reports and sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in database reconstruction in the last few years, here we provide a comprehensive review of three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of database and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular with specific groups of researchers, while a timely-updated comprehensive database is more powerful for addressing major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote collaborative studies of important questions in plant biology.


July 7, 2019

Molecular preadaptation to antimony resistance in Leishmania donovani on the Indian subcontinent.

Antimonials (Sb) were used for decades for chemotherapy of visceral leishmaniasis (VL). Now abandoned in the Indian subcontinent (ISC) because of Leishmania donovani resistance, this drug offers a unique model for understanding drug resistance dynamics. In a previous phylogenomic study, we found two distinct populations of L. donovani: the core group (CG) in the Gangetic plains and ISC1 in the Nepalese highlands. Sb resistance was only encountered within the CG, and a series of potential markers were identified. Here, we analyzed the development of resistance to trivalent antimonials (SbIII) upon experimental selection in ISC1 and CG strains. We observed that (i) baseline SbIII susceptibility of parasites was higher in ISC1 than in the CG, (ii) time to SbIII resistance was higher for ISC1 parasites than for CG strains, and (iii) untargeted genomic and metabolomic analyses revealed molecular changes along the selection process: these were more numerous in ISC1 than in the CG. Altogether these observations led to the hypothesis that CG parasites are preadapted to SbIII resistance. This hypothesis was experimentally confirmed by showing that only wild-type CG strains could survive a direct exposure to the maximal concentration of SbIII. The main driver of this preadaptation was shown to be MRPA, a gene involved in SbIII sequestration and amplified in an intrachromosomal amplicon in all CG strains characterized so far. This amplicon emerged around 1850 in the CG, well before the implementation of antimonials for VL chemotherapy, and we discuss here several hypotheses of selective pressure that could have accompanied its emergence. IMPORTANCE: The “antibiotic resistance crisis” is a major challenge for scientists and medical professionals. This steady rise in drug-resistant pathogens also extends to parasitic diseases, with antimony being the first anti-Leishmania drug that fell in the Indian subcontinent (ISC). Leishmaniasis is a major but neglected infectious disease with limited therapeutic options. Therefore, understanding how parasites became resistant to antimonials is of commanding importance. In this study, we experimentally characterized the dynamics of this resistance acquisition and show for the first time that some Leishmania populations of the ISC were preadapted to antimony resistance, likely driven by environmental factors or by drugs used in the 19th century. Copyright © 2018 Dumetz et al.

