July 7, 2019  |  

Cloche is a bHLH-PAS transcription factor that drives haemato-vascular specification.

Vascular and haematopoietic cells organize into specialized tissues during early embryogenesis to supply essential nutrients to all organs and thus play critical roles in development and disease. At the top of the haemato-vascular specification cascade lies cloche, a gene that when mutated in zebrafish leads to the striking phenotype of loss of most endothelial and haematopoietic cells and a significant increase in cardiomyocyte numbers. Although this mutant has been analysed extensively to investigate mesoderm diversification and differentiation and continues to be broadly used as a unique avascular model, the isolation of the cloche gene has been challenging due to its telomeric location. Here we used a deletion allele of cloche to identify several new cloche candidate genes within this genomic region, and systematically genome-edited each candidate. Through this comprehensive interrogation, we succeeded in isolating the cloche gene and discovered that it encodes a PAS-domain-containing bHLH transcription factor, and that it is expressed in a highly specific spatiotemporal pattern starting during late gastrulation. Gain-of-function experiments show that it can potently induce endothelial gene expression. Epistasis experiments reveal that it functions upstream of etv2 and tal1, the earliest expressed endothelial and haematopoietic transcription factor genes identified to date. A mammalian cloche orthologue can also rescue blood vessel formation in zebrafish cloche mutants, indicating a highly conserved role in vertebrate vasculogenesis and haematopoiesis. The identification of this master regulator of endothelial and haematopoietic fate enhances our understanding of early mesoderm diversification and may lead to improved protocols for the generation of endothelial and haematopoietic cells in vivo and in vitro.


July 7, 2019  |  

Challenges, solutions, and quality metrics of personal genome assembly in advancing precision medicine.

Even though each of us shares more than 99% of the DNA sequence in our genome, millions of small sequence or structural differences exist between individuals, giving us different appearance characteristics or responsiveness to medical treatments. Currently, genetic variants in diseased tissues, such as tumors, are uncovered by exploring the differences between the reference genome and the sequences detected in the diseased tissue. However, the public reference genome was derived from the DNA of multiple individuals. As a result, the reference genome is incomplete and may misrepresent the sequence variants of the general population. A more reliable solution is to compare sequences of diseased tissue with the individual's own genome sequence derived from tissue in a normal state. As the price of sequencing a human genome has dropped dramatically to around $1000, documenting the personal genome of every individual has a promising future. However, de novo assembly of individual genomes at an affordable cost is still challenging. Thus, to date, only a few human genomes have been fully assembled. In this review, we introduce the history of human genome sequencing and the evolution of sequencing platforms, from Sanger sequencing to emerging “third generation sequencing” technologies. We present the currently available de novo assembly and post-assembly software packages for human genome assembly and their requirements for computational infrastructures. We recommend a combined hybrid assembly with long and short reads as a promising way to generate good quality human genome assemblies, and we specify parameters for the quality assessment of assembly outcomes. We provide a perspective on the benefit of using personal genomes as references and suggestions for obtaining a quality personal genome. Finally, we discuss the usage of the personal genome in aiding vaccine design and development, monitoring host immune-response, tailoring drug therapy and detecting tumors. We believe that precision medicine would largely benefit from bioinformatics solutions, particularly for personal genome assembly.


July 7, 2019  |  

The report of my death was an exaggeration: A review for researchers using microsatellites in the 21st century.

Microsatellites, or simple sequence repeats (SSRs), have long played a major role in genetic studies due to their typically high polymorphism. They have diverse applications, including genome mapping, forensics, ascertaining parentage, population and conservation genetics, identification of the parentage of polyploids, and phylogeography. We compare SSRs and newer methods, such as genotyping by sequencing (GBS) and restriction site associated DNA sequencing (RAD-Seq), and offer recommendations for researchers considering which genetic markers to use. We also review the variety of techniques currently used for identifying microsatellite loci and developing primers, with a particular focus on those that make use of next-generation sequencing (NGS). Additionally, we review software for microsatellite development and report on an experiment to assess the utility of currently available software for SSR development. Finally, we discuss the future of microsatellites and make recommendations for researchers preparing to use microsatellites. We argue that microsatellites still have an important place in the genomic age as they remain effective and cost-efficient markers.


July 7, 2019  |  

An ultra-high density genetic linkage map of perennial ryegrass (Lolium perenne) using genotyping by sequencing (GBS) based on a reference shotgun genome assembly.

High density genetic linkage maps that are extensively anchored to assembled genome sequences of the organism in question are extremely useful in gene discovery. To facilitate this process in perennial ryegrass (Lolium perenne L.), a high density single nucleotide polymorphism (SNP)- and presence/absence variant (PAV)-based genetic linkage map has been developed in an F2 mapping population that has been used as a reference population in numerous studies. To provide a reference sequence to which to align genotyping by sequencing (GBS) reads, a shotgun assembly of one of the grandparents of the population, a tenth-generation inbred line, was created using Illumina-based sequencing. The assembly was based on paired-end Illumina reads, scaffolded by mate pair and long jumping distance reads in the range of 3-40 kb, with >200-fold initial genome coverage. A total of 169 individuals from an F2 mapping population were used to construct PstI-based GBS libraries tagged with unique 4-9 nucleotide barcodes, resulting in 284 million reads, with approx. 1.6 million reads per individual. A bioinformatics pipeline was employed to identify both SNPs and PAVs. A core genetic map was generated using high confidence SNPs, to which lower confidence SNPs and PAVs were subsequently fitted in a straightforward binning approach. The assembly comprises 424 750 scaffolds, covering 1.11 Gbp of the 2.5 Gbp perennial ryegrass genome, with a scaffold N50 of 25 212 bp and a contig N50 of 3790 bp. It is available for download, and access to a genome browser has been provided. Comparison of the assembly with available transcript and gene model data sets for perennial ryegrass indicates that approx. 570 Mbp of the gene-rich portion of the genome has been captured. An ultra-high density genetic linkage map with 3092 SNPs and 7260 PAVs was developed, anchoring just over 200 Mb of the reference assembly. The combined genetic map and assembly, combined with another recently released genome assembly, represent a significant resource for the perennial ryegrass genetics community. © The Author 2016. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
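
The scaffold and contig N50 values quoted in this abstract are a standard assembly summary statistic. As a minimal sketch (not the authors' pipeline), the function below computes N50 under its usual definition: the length of the sequence at which the running total of lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy scaffold lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in bp), not values from the paper.
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))  # 100 + 80 = 180 >= 150, so this prints 80
```

A higher N50 generally indicates a more contiguous assembly, although it says nothing about correctness, which is why it is usually reported alongside other quality measures.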


July 7, 2019  |  

Whole-genome sequencing recommendations

Recent technological developments have revolutionized the way we perform genetic analyses. In particular, whole-genome sequencing provides access to the entire genetic makeup of an individual, and it is now an affordable approach for many research groups. As a consequence, genome sequencing is pervading many fields of biological research. Sequencing technologies are evolving rapidly, and so are their applications. Here we provide a first primer on whole-genome sequencing, focusing on two of the most popular applications: (1) de novo genome sequencing, in which the objective is obtaining a high-quality genome assembly that can serve as a reference for a species or variety, and (2) genome resequencing, when there is an available reference genome and the objective is to map sequence variation of an individual or a set of individuals. It is not our intention to provide a comprehensive overview of current methodologies that will likely soon become obsolete, but rather to focus on general principles that will have a more general applicability.


July 7, 2019  |  

Short tandem repeat number estimation from paired-end reads for multiple individuals by considering coalescent tree.

Two types of approaches are mainly considered for the repeat number estimation in short tandem repeat (STR) regions from high-throughput sequencing data: approaches directly counting repeat patterns included in sequence reads spanning the region and approaches based on detecting the difference between the insert size inferred from aligned paired-end reads and the actual insert size. Although the accuracy of repeat numbers estimated with the former approaches is high, the size of target STR regions is limited to the length of sequence reads. On the other hand, the latter approaches can handle STR regions longer than the length of sequence reads. However, repeat numbers estimated with the latter approaches are less accurate than those with the former approaches. We proposed a new statistical model named coalescentSTR that estimates repeat numbers from paired-end read distances for multiple individuals simultaneously by connecting the read generative model for each individual with their genealogy. In the model, the genealogy is represented by handling coalescent trees as hidden variables, and the summation of the hidden variables is taken on coalescent trees sampled based on phased genotypes located around a target STR region with Markov chain Monte Carlo. In the sampled coalescent trees, repeat number information from insert size data is propagated, and more accurate estimation of repeat numbers is expected for STR regions longer than the length of sequence reads. For finding the repeat numbers maximizing the likelihood of the model on the estimation of repeat numbers, we proposed a state-of-the-art belief propagation algorithm on sampled coalescent trees. We verified the effectiveness of the proposed approach from the comparison with existing methods by using simulation datasets and real whole genome and whole exome data for HapMap individuals analyzed in the 1000 Genomes Project.
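
The second class of approaches mentioned in this abstract, which compares the insert size inferred from aligned read pairs with the actual insert size, can be illustrated with a deliberately simplified point estimate. This is a sketch of the general idea only; it is not the coalescentSTR model, which additionally shares information across individuals through sampled coalescent trees. The function name, its parameters, and all numbers are hypothetical, and the sketch assumes a single homozygous allele and a known mean library insert size.

```python
from statistics import mean


def estimate_repeat_number(aligned_distances, library_insert_mean,
                           ref_repeat_number, unit_length):
    """Naive repeat-number estimate from paired-end distances.

    If the sample carries more repeat units than the reference, read pairs
    spanning the STR map closer together on the reference than the true
    fragment length, so the shortfall approximates the inserted bases.
    """
    if unit_length <= 0:
        raise ValueError("unit_length must be positive")
    apparent_insert = mean(aligned_distances)        # distance seen on the reference
    extra_bases = library_insert_mean - apparent_insert
    return ref_repeat_number + extra_bases / unit_length


# Hypothetical example: 4 bp repeat unit, 10 copies in the reference,
# library mean insert size 500 bp, spanning pairs align ~488 bp apart.
estimate = estimate_repeat_number([486, 490, 488, 488], 500, 10, 4)
print(estimate)  # (500 - 488) / 4 = 3 extra units, so this prints 13.0
```

Because insert sizes vary from fragment to fragment, a per-individual estimate like this is noisy, which is consistent with the abstract's remark that insert-size-based estimates are less accurate than direct counting.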


July 7, 2019  |  

Strategies for sequence assembly of plant genomes

The field of plant genome assembly has greatly benefited from the development and widespread adoption of next-generation DNA sequencing platforms. Very high sequencing throughputs and low costs per nucleotide have considerably reduced the technical and budgetary constraints associated with early assembly projects done primarily with a traditional Sanger-based approach. Those improvements led to a sharp increase in the number of plant genomes being sequenced, including large and complex genomes of economically important crops. Although next-generation DNA sequencing has considerably improved our understanding of the overall structure and dynamics of many plant genomes, severe limitations still remain because next-generation DNA sequencing reads typically are shorter than Sanger reads. In addition, the software tools used to de novo assemble sequences are not necessarily designed to optimize the use of short reads. These cause challenges, common to many plant species with large genome sizes, high repeat contents, polyploidy and genome-wide duplications. This chapter provides an overview of historical and current methods used to sequence and assemble plant genomes, along with new solutions offered by the emergence of technologies such as single molecule sequencing and optical mapping to address the limitations of current sequence assemblies.


July 7, 2019  |  

Next-generation sequencing: a diagnostic one-stop shop for Hepatitis C?

Before starting chronic hepatitis C treatment, the viral genotype/subtype has to be accurately determined and potentially coupled with drug resistance testing. Due to the high genetic variability of the hepatitis C virus, this can be a demanding task that can potentially be streamlined by viral whole-genome sequencing using next-generation sequencing as demonstrated by an article in this issue of the Journal of Clinical Microbiology by E. Thomson, C. L. C. Ip, A. Badhan, M. T. Christiansen, W. Adamson, et al. (J Clin Microbiol. 54:2455-2469, 2016, http://dx.doi.org/10.1128/JCM.00330-16). Copyright © 2016, American Society for Microbiology. All Rights Reserved.


July 7, 2019  |  

Effector diversification contributes to Xanthomonas oryzae pv. oryzae phenotypic adaptation in a semi-isolated environment.

Understanding the processes that shaped contemporary pathogen populations in agricultural landscapes is quite important to define appropriate management strategies and to support crop improvement efforts. Here, we took advantage of an historical record to examine the adaptation pathway of the rice pathogen Xanthomonas oryzae pv. oryzae (Xoo) in a semi-isolated environment represented in the Philippine archipelago. By comparing genomes of key Xoo groups we showed that modern populations derived from three Asian lineages. We also showed that diversification of virulence factors occurred within each lineage, most likely driven by host adaptation, and it was essential to shape contemporary pathogen races. This finding is particularly important because it expands our understanding of pathogen adaptation to modern agriculture.


July 7, 2019  |  

LongISLND: in silico sequencing of lengthy and noisy datatypes.

LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. LongISLND is implemented in Java and available at http://bioinform.github.io/longislnd. Contact: hugo.lam@roche.com. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.


July 7, 2019  |  

ChIP-Seq-annotated Heliconius erato genome highlights patterns of cis-regulatory evolution in Lepidoptera.

Uncovering phylogenetic patterns of cis-regulatory evolution remains a fundamental goal for evolutionary and developmental biology. Here, we characterize the evolution of regulatory loci in butterflies and moths using chromatin immunoprecipitation sequencing (ChIP-seq) annotation of regulatory elements across three stages of head development. In the process we provide a high-quality, functionally annotated genome assembly for the butterfly, Heliconius erato. Comparing cis-regulatory element conservation across six lepidopteran genomes, we find that regulatory sequences evolve at a pace similar to that of protein-coding regions. We also observe that elements active at multiple developmental stages are markedly more conserved than elements with stage-specific activity. Surprisingly, we also find that stage-specific proximal and distal regulatory elements evolve at nearly identical rates. Our study provides a benchmark for genome-wide patterns of regulatory element evolution in insects, and it shows that developmental timing of activity strongly predicts patterns of regulatory sequence evolution. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.


July 7, 2019  |  

Genetic basis of priority effects: insights from nectar yeast.

Priority effects, in which the order of species arrival dictates community assembly, can have a major influence on species diversity, but the genetic basis of priority effects remains unknown. Here, we suggest that nitrogen scavenging genes previously considered responsible for starvation avoidance may drive priority effects by causing rapid resource depletion. Using single-molecule sequencing, we de novo assembled the genome of the nectar-colonizing yeast, Metschnikowia reukaufii, across eight scaffolds and complete mitochondrion, with gap-free coverage over gene spaces. We found a high rate of tandem gene duplication in this genome, enriched for nitrogen metabolism and transport. Both high-capacity amino acid importers, GAP1 and PUT4, present as tandem gene arrays, were highly expressed in synthetic nectar and regulated by the availability and quality of amino acids. In experiments with competitive nectar yeast, Candida rancensis, amino acid addition alleviated suppression of C. rancensis by early arrival of M. reukaufii, corroborating that amino acid scavenging may contribute to priority effects. Because niche pre-emption via rapid resource depletion may underlie priority effects in a broad range of microbial, plant and animal communities, nutrient scavenging genes like the ones we considered here may be broadly relevant to understanding priority effects. © 2016 The Author(s).


July 7, 2019  |  

Interchromosomal core duplicons drive both evolutionary instability and disease susceptibility of the Chromosome 8p23.1 region.

Recurrent rearrangements of Chromosome 8p23.1 are associated with congenital heart defects and developmental delay. The complexity of this region has led to inconsistencies in the current reference assembly, confounding studies of genetic variation. Using comparative sequence-based approaches, we generated a high-quality 6.3-Mbp alternate reference assembly of an inverted Chromosome 8p23.1 haplotype. Comparison with nonhuman primates reveals a 746-kbp duplicative transposition and two separate inversion events that arose in the last million years of human evolution. The breakpoints associated with these rearrangements map to an ape-specific interchromosomal core duplicon that clusters at sites of evolutionary inversion (P = 7.8 × 10⁻⁵). Refinement of microdeletion breakpoints identifies a subgroup of patients that map to the same interchromosomal core involved in the evolutionary formation of the duplication blocks. Our results define a higher-order genomic instability element that has shaped the structure of specific chromosomes during primate evolution contributing to rearrangements associated with inversion and disease. © 2016 Mohajeri et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019  |  

Towards integration of population and comparative genomics in forest trees.

The past decade saw the initiation of an ongoing revolution in sequencing technologies that is transforming all fields of biology. This has been driven by the advent and widespread availability of high-throughput, massively parallel short-read sequencing (MPS) platforms. These technologies have enabled previously unimaginable studies, including draft assemblies of the massive genomes of coniferous species and population-scale resequencing. Transcriptomics studies have likewise been transformed, with RNA-sequencing enabling studies in nonmodel organisms, the discovery of previously unannotated genes (novel transcripts), entirely new classes of RNAs and previously unknown regulatory mechanisms. Here we touch upon current developments in the areas of genome assembly, comparative regulomics and population genetics as they relate to studies of forest tree species. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.


July 7, 2019  |  

Probabilistic viral quasispecies assembly

Viruses are pathogens that cause infectious diseases. The swarm of virions is subject to the host’s immune pressure and possibly antiviral therapy. It may escape this selective pressure and gain selective advantage by acquiring one or more of the genomic alterations: single-nucleotide variants (SNVs), loss or gain of one or more amino acids, large deletions, for example, due to alternative splicing, or recombination of different strains. Genotypic antiretroviral drug resistance testing is performed via sequencing. Next-generation sequencing (NGS) technologies revolutionized assessing viral genetic diversity experimentally. In viral quasispecies analysis, there are two main goals: the identification of low-frequency variants and haplotype assembly on a whole-genome scale. PacBio performs single-molecule sequencing. This chapter elaborates human haplotyping and its relationship to probabilistic viral haplotype reconstruction methods. Viral quasispecies assembly has the potential to replace the current de facto diversity estimation by SNV calling. With advances in library preparation, increasing sensitivity of sequencing platforms, and more sophisticated models, it might be possible to detect all or most viral strains in a single individual.

