June 1, 2021

Evaluating the potential of new sequencing technologies for genotyping and variation discovery in human data.

A first look at Pacific Biosciences RS data. Pacific Biosciences technology offers a fundamentally new data type with the potential to overcome these limitations: its significantly longer reads (now averaging >1 kb) enable more unique seeds for reference alignment. In addition, the lack of amplification in the library construction step avoids a common source of base composition bias. With these potential advantages in mind, we here evaluate the utility of the Pacific Biosciences RS platform for human medical resequencing projects by assessing the quality of the raw sequencing data, as well as its use for SNP discovery and genotyping using the Genome Analysis Toolkit (GATK).


April 21, 2020

De novo assembly and annotation of the Ganoderma australe genome.

The Ganoderma genus has clear biotechnological potential, owing to the large number of biologically active molecules that could be explored. However, available information regarding the biotechnological importance of species within Ganoderma, other than G. lucidum, is quite limited. Genomic studies of little-known species can contribute to knowledge of them, as well as to the search for metabolic pathways and the identification of genes that code for proteins of possible biotechnological relevance. Therefore, the objective of the present study was to obtain the G. australe genome through the use of new sequencing technologies. Genomic DNA from G. australe was sequenced with the PacBio Sequel system, to a depth of 100×. The genome was assembled de novo with the Canu assembly tool, and gene prediction and annotation were performed with a funannotate pipeline. An assembled 84 Mb genome was obtained, and 22,756 putative protein-coding sequences were predicted in the G. australe genome. Ganoderic acid pathways were annotated and listed in the funannotate pipeline, and were recognized using Pfam and antiSMASH signals. Thus, the G. australe genome shows great potential, mainly due to the annotation of putative sequences that could be employed in biotechnological approaches. Copyright © 2019 Elsevier Inc. All rights reserved.


April 21, 2020

How Genomics Is Changing What We Know About the Evolution and Genome of Bordetella pertussis.

The evolution of Bordetella pertussis from a common ancestor similar to Bordetella bronchiseptica has occurred through large-scale gene loss, inactivation and rearrangements, largely driven by the spread of insertion sequence element repeats throughout the genome. B. pertussis is widely considered to be monomorphic, and recent evolution of the B. pertussis genome appears to, at least in part, be driven by vaccine-based selection. Given the recent global resurgence of whooping cough despite the wide-spread use of vaccination, a more thorough understanding of B. pertussis genomics could be highly informative. In this chapter we discuss the evolution of B. pertussis, including how vaccination is changing the circulating B. pertussis population at the gene-level, and how new sequencing technologies are revealing previously unknown levels of inter- and intra-strain variation at the genome-level.


April 21, 2020

Evolutionary superscaffolding and chromosome anchoring to improve Anopheles genome assemblies

Background: New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from 'finished'. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies.

Results: We employed three gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: six with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and three with new assemblies based on re-scaffolding or Pacific Biosciences long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: seven for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further seven with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi.

Conclusions: Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our comparisons show that gene synteny-based computational methods represent a valuable alternative or complementary approach. Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.

Abbreviations: AD, ADSEQ; AGO, AGOUTI-based; AGOUTI, annotated genome optimization using transcriptome information tool; ALN, alignment-based; CAMSA, comparative analysis and merging of scaffold assemblies tool; DP, dynamic programming; FISH, fluorescence in situ hybridization; GA, GOS-ASM; GOS-ASM, gene order scaffold assembler; Kbp, kilobasepairs; Mbp, megabasepairs; OS, ORTHOSTITCH; PacBio, Pacific Biosciences; PB, PacBio-based; PHY, physical-mapping-based; RNAseq, RNA sequencing; QTL, quantitative trait loci; SYN, synteny-based.


April 21, 2020

A critical comparison of technologies for a plant genome sequencing project.

A high-quality genome sequence of any model organism is an essential starting point for genetic and other studies. Older clone-based methods are slow and expensive, whereas faster, cheaper short-read-only assemblies can be incomplete and highly fragmented, which minimizes their usefulness. The last few years have seen the introduction of many new technologies for genome assembly. These new technologies and associated new algorithms are typically benchmarked on microbial genomes or, if they scale appropriately, on larger (e.g., human) genomes. However, plant genomes can be much more repetitive and larger than the human genome, and plant biochemistry often makes obtaining high-quality DNA that is free from contaminants difficult. Reflecting their challenging nature, we observe that plant genome assembly statistics are typically poorer than for vertebrates. Here, we compare Illumina short read, Pacific Biosciences long read, 10x Genomics linked reads, Dovetail Hi-C, and BioNano Genomics optical maps, singly and combined, in producing high-quality long-range genome assemblies of the potato species Solanum verrucosum. We benchmark the assemblies for completeness and accuracy, as well as DNA, compute requirements, and sequencing costs. The field of genome sequencing and assembly is reaching maturity, and the differences we observe between assemblies are surprisingly small. We expect that our results will be helpful to other genome projects, and that these datasets will be used in benchmarking by assembly algorithm developers. © The Author(s) 2019. Published by Oxford University Press.


April 21, 2020

High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution.

Targeted PCR amplification and high-throughput sequencing (amplicon sequencing) of 16S rRNA gene fragments is widely used to profile microbial communities. New long-read sequencing technologies can sequence the entire 16S rRNA gene, but higher error rates have limited their attractiveness when accuracy is important. Here we present a high-throughput amplicon sequencing methodology based on PacBio circular consensus sequencing and the DADA2 sample inference method that measures the full-length 16S rRNA gene with single-nucleotide resolution and a near-zero error rate. In two artificial communities of known composition, our method recovered the full complement of full-length 16S sequence variants from expected community members without residual errors. The measured abundances of intra-genomic sequence variants were in the integral ratios expected from the genuine allelic variants within a genome. The full-length 16S gene sequences recovered by our approach allowed Escherichia coli strains to be correctly classified to the O157:H7 and K12 sub-species clades. In human fecal samples, our method showed strong technical replication and was able to recover the full complement of 16S rRNA alleles in several E. coli strains. There are likely many applications beyond microbial profiling for which high-throughput amplicon sequencing of complete genes with single-nucleotide resolution will be of use. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.


April 21, 2020

Tools and Strategies for Long-Read Sequencing and De Novo Assembly of Plant Genomes.

The commercial release of third-generation sequencing technologies (TGSTs), giving long and ultra-long sequencing reads, has stimulated the development of new tools for assembling highly contiguous genome sequences with unprecedented accuracy across complex repeat regions. We survey here a wide range of emerging sequencing platforms and analytical tools for de novo assembly, provide background information for each of their steps, and discuss the spectrum of available options. Our decision tree recommends workflows for the generation of a high-quality genome assembly when used in combination with the specific needs and resources of a project. Copyright © 2019 Elsevier Ltd. All rights reserved.


April 21, 2020

The role of genomic structural variation in the genetic improvement of polyploid crops

Many of our major crop species are polyploids, containing more than one genome or set of chromosomes. Polyploid crops present unique challenges, including difficulties in genome assembly, in discriminating between multiple gene and sequence copies, and in genetic mapping, hindering use of genomic data for genetics and breeding. Polyploid genomes may also be more prone to containing structural variation, such as loss of gene copies or sequences (presence–absence variation) and the presence of genes or sequences in multiple copies (copy-number variation). Although the two main types of genomic structural variation commonly identified are presence–absence variation and copy-number variation, we propose that homeologous exchanges constitute a third major form of genomic structural variation in polyploids. Homeologous exchanges involve the replacement of one genomic segment by a similar copy from another genome or ancestrally duplicated region, and are known to be extremely common in polyploids. Detecting all kinds of genomic structural variation is challenging, but recent advances such as optical mapping and long-read sequencing offer potential strategies to help identify structural variants even in complex polyploid genomes. All three major types of genomic structural variation (presence–absence, copy-number, and homeologous exchange) are now known to influence phenotypes in crop plants, with examples of flowering time, frost tolerance, and adaptive and agronomic traits. In this review, we summarize the challenges of genome analysis in polyploid crops, describe the various types of genomic structural variation and the genomics technologies and data that can be used to detect them, and collate information produced to date related to the impact of genomic structural variation on crop phenotypes. We highlight the importance of genomic structural variation for the future genetic improvement of polyploid crops.


April 21, 2020

A Pathovar of Xanthomonas oryzae Infecting Wild Grasses Provides Insight Into the Evolution of Pathogenicity in Rice Agroecosystems

Xanthomonas oryzae (Xo) are critical rice pathogens. Virulent lineages from Africa and Asia and less virulent strains from the US have been well characterized. X. campestris pv. leersiae (Xcl), first described in 1957, causes bacterial streak on the perennial grass, Leersia hexandra, and is a close relative of Xo. L. hexandra, a member of the Poaceae, is highly similar to rice phylogenetically, is globally ubiquitous around rice paddies, and is a reservoir of pathogenic Xo. We used long read, single molecule, real time (SMRT) genome sequences of five strains of Xcl from Burkina Faso, China, Mali and Uganda to determine the genetic relatedness of this organism with Xo. Novel Transcription Activator-Like Effectors (TALEs) were discovered in all five strains of Xcl. Predicted TALE target sequences were identified in the L. perrieri genome and compared to rice susceptibility gene homologs. Pathogenicity screening on L. hexandra and diverse rice cultivars confirmed that Xcl are able to colonize rice and produce weak but not progressive symptoms. Overall, based on average nucleotide identity, type III effector repertoires and disease phenotype, we propose to rename Xcl to X. oryzae pv. leersiae (Xol) and use this parallel system to improve understanding of the evolution of bacterial pathogenicity in rice agroecosystems.


April 21, 2020

Comprehensive characterization of T-DNA integration induced chromosomal rearrangement in a birch T-DNA mutant.

Integration of T-DNA into plant genomes via Agrobacterium may interrupt gene structure and generate numerous mutants. These T-DNA-induced mutants are valuable materials for understanding the model of T-DNA integration in plant research. T-DNA integration in plants is complex and still largely unknown. In this work, we report that multiple T-DNA fragments caused chromosomal translocation and deletion in a birch (Betula platyphylla × B. pendula) T-DNA mutant, yl. We performed PacBio genome resequencing of yl, and the results revealed that the two ends of a T-DNA can be integrated into the plant genome independently, because the two ends can be linked to different chromosomes and cause chromosomal translocation. We also found that these T-DNAs were connected into tandem fragments, regardless of direction, before integrating into the plant genome. In addition, the integration of T-DNA in the yl genome also caused the deletion of several chromosomal fragments. We then summarized three cases of T-DNA integration in the yl genome: (1) a T-DNA fragment is linked to the two ends of a double-stranded break (DSB); (2) only one end of a T-DNA fragment is linked to a DSB; (3) a T-DNA fragment is linked to the ends of different DSBs. All the observations in the yl genome supported the DSB repair model. In this study, we present a comprehensive genome analysis of a T-DNA mutant and provide new insight into T-DNA integration in plants. These findings should be helpful for the analysis of T-DNA mutants with special phenotypes.


April 21, 2020

A hybrid de novo genome assembly of the honeybee, Apis mellifera, with chromosome-length scaffolds.

The ability to generate long sequencing reads and access long-range linkage information is revolutionizing the quality and completeness of genome assemblies. Here we use a hybrid approach that combines data from four genome sequencing and mapping technologies to generate a new genome assembly of the honeybee Apis mellifera. We first generated contigs based on PacBio sequencing libraries, which were then merged with linked-read 10x Chromium data followed by scaffolding using a BioNano optical genome map and a Hi-C chromatin interaction map, complemented by a genetic linkage map. Each of the assembly steps reduced the number of gaps and incorporated a substantial amount of additional sequence into scaffolds. The new assembly (Amel_HAv3) is significantly more contiguous and complete than the previous one (Amel_4.5), based mainly on Sanger sequencing reads. N50 of contigs is 120-fold higher (5.381 Mbp compared to 0.053 Mbp) and we anchor >98% of the sequence to chromosomes. All of the 16 chromosomes are represented as single scaffolds with an average of three sequence gaps per chromosome. The improvements are largely due to the inclusion of repetitive sequence that was unplaced in previous assemblies. In particular, our assembly is highly contiguous across centromeres and telomeres and includes hundreds of AvaI and AluI repeats associated with these features. The improved assembly will be of utility for refining gene models, studying genome function, mapping functional genetic variation, identification of structural variants, and comparative genomics.
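
For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sort the contig lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled sequence.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly, not data from the paper.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

A higher N50 indicates that more of the assembly sits in long contigs, which is why it is used here to compare Amel_HAv3 with Amel_4.5.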


April 21, 2020

Retrotranspositional landscape of Asian rice revealed by 3000 genomes.

The recent release of genomic sequences for 3000 rice varieties provides access to the genetic diversity at species level for this crop. We take advantage of this resource to unravel some features of the retrotranspositional landscape of rice. We develop the software TRACKPOSON specifically for the detection of transposable element insertion polymorphisms (TIPs) from large datasets. We apply this tool to 32 families of retrotransposons and identify more than 50,000 TIPs in the 3000 rice genomes. Most polymorphisms are found at very low frequency, suggesting that they may have occurred recently in agro. A genome-wide association study shows that these activations in rice may be triggered by external stimuli, rather than by the alteration of genetic factors involved in transposable element silencing pathways. Finally, the TIPs dataset is used to trace the origin of rice domestication. Our results suggest that rice originated from three distinct domestication events.

