Goats are an important source of milk, meat, and fiber, especially in developing countries. An advantage of goats as livestock is their low maintenance requirements and high adaptability compared to other milk producers. The global population of domestic goats exceeds 800 million. In Africa, goat production is characterized by low productivity levels, and attempts to introduce more productive breeds have met with poor success due in part to nutritional constraints. It has been suggested that incorporation of selective breeding within the herds adapted for survival could represent one approach to improving food security across Africa. A recently produced genome assembly of a Chinese Yunnan breed goat, based on 192 Gb of short reads across a range of insert sizes from 180 bp to 20 kb, reported a contig N50 of 18.7 kb. The scaffold N50 was improved from 2.2 Mb to 3.1 Mb by addition of fosmid end sequence, with an estimated 140 million Ns in gaps and 91% coverage. The assembly has proven somewhat problematic for pursuing genome-wide association analysis with SNP arrays, apparently due in part to errors in ordering of markers using the draft genome. In order to provide a higher quality assembly, we sequenced the genome of a highly inbred San Clemente breed goat using 458 SMRT cells on the Pacific Biosciences platform. These cells generated 193.5 Gb of sequence after processing into subreads, with a mean subread length of 5,110 bases and a maximum subread length of 40.5 kb. These sequence data were assembled using the recently reported MHAP error correction approach and Celera Assembler v8.2. The contig N50 was 2.5 Mb, with the largest contig spanning 19.5 Mb. Additional characteristics of the assembly will be presented.
Reference quality de novo genome assemblies were once solely the domain of large, well-funded genome projects. While next-generation short read technology removed some of the cost barriers, accurate chromosome-scale assembly remains a real challenge. Here we present efforts to de novo assemble the goat (Capra hircus) genome. Through the combination of single-molecule technologies from Pacific Biosciences (sequencing) and BioNano Genomics (optical mapping) coupled with high-throughput chromosome conformation capture sequencing (Hi-C), an inbred San Clemente goat genome has been sequenced and assembled to a high degree of completeness at a relatively modest cost. Starting with 38 million PacBio reads, we integrated the MinHash Alignment Process (MHAP) with the Celera Assembler (CA) to produce an assembly composed of 3110 contigs with a contig N50 size of 4.7 Mb. This assembly was scaffolded with BioNano genome maps derived from a single IrysChip into 333 scaffolds with an N50 of 23.1 Mb including the complete scaffolding of chromosome 20. Finally, cis-chromosome associations were determined by Hi-C, yielding complete reconstruction of all autosomes into single scaffolds with a final N50 of 91.7 Mb. We hope to demonstrate that our methods are not only cost effective, but improve our ability to annotate challenging genomic regions such as highly repetitive immune gene clusters.
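The contig and scaffold N50 values quoted in the goat assembly abstracts above are length-weighted medians: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The following is a minimal sketch of that calculation in Python; the function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the goat assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to at least
        # half of the assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units), not real data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150, which is half of 300, so N50 is 70
```

Because the statistic depends only on the sorted lengths, joining contigs into scaffolds can raise the N50 substantially without adding any new sequence.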
Commentary from PacBio users on their applications of SMRT Sequencing, including Ulf Gyllensten (Uppsala University), Tim Smith (USDA-ARS) and Bobby Sebra (Icahn School of Medicine)
A high-quality genome assembly from a single, field-collected spotted lanternfly (Lycorma delicatula) using the PacBio Sequel II system
Background: A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region. Results: The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ~20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ~36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. Conclusions: We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
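The ~36× coverage figure in the spotted lanternfly abstract follows from dividing the sequence from unique library molecules (82 Gb) by the assembled genome size (2.3 Gb). A small sketch of that arithmetic is given below; the function name `fold_coverage` is an illustrative assumption, and the inputs are the rounded values quoted in the abstract.

```python
def fold_coverage(sequenced_bases, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return sequenced_bases / genome_size


# Rounded figures quoted in the abstract (in bases).
unique_bases = 82e9    # sequence from unique library molecules
genome_size = 2.3e9    # size of the de novo assembly

depth = fold_coverage(unique_bases, genome_size)
print(f"about {depth:.1f}x coverage")  # 82 / 2.3 is roughly 35.7, i.e. ~36x
```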
Complete genome sequences of two genotype A2 small ruminant lentiviruses isolated from infected U.S. sheep.
Two distinct subgroups of genotype A2 small ruminant lentiviruses (SRLVs) have been identified in the United States that infect sheep with specific host transmembrane protein 154 (TMEM154) diplotypes. Here, we report the first two complete genome sequences of SRLV strains infecting U.S. sheep belonging to genotype A2, subgroups 1 and 2. Copyright © 2017 Workman et al.
Evaluation of 16S rRNA amplicon sequencing using two next-generation sequencing technologies for phylogenetic analysis of the rumen bacterial community in steers.
Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplification primer selection, and read length, which can affect the apparent microbial community. In this study, we compared short read 16S rRNA variable regions, V1-V3, with that of near-full length 16S regions, V1-V8, using highly diverse steer rumen microbial communities, in order to examine the impact of technology selection on phylogenetic profiles. Short paired-end reads from the Illumina MiSeq platform were used to generate V1-V3 sequence, while long “circular consensus” reads from the Pacific Biosciences RSII instrument were used to generate V1-V8 data. The two platforms revealed similar microbial operational taxonomic units (OTUs), as well as similar species richness, Good’s coverage, and Shannon diversity metrics. However, the V1-V8 amplified ruminal community resulted in significant increases in several orders of taxa, such as phyla Proteobacteria and Verrucomicrobia (P < 0.05). Taxonomic classification accuracy was also greater in the near full-length read. UniFrac distance matrices using jackknifed UPGMA clustering also noted differences between the communities. These data support the consensus that longer reads result in a finer phylogenetic resolution that may not be achieved by shorter 16S rRNA gene fragments. Our work on the cattle rumen bacterial community demonstrates that utilizing near full-length 16S reads may be useful in conducting a more thorough study, or for developing a niche-specific database to use in analyzing data from shorter read technologies when budgetary constraints preclude use of near-full length 16S sequencing. Copyright © 2016 Elsevier B.V. All rights reserved.
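The rumen 16S abstract above compares the platforms using Good's coverage and Shannon diversity. As a hedged illustration of what those metrics measure, the sketch below computes both from a vector of per-OTU read counts using their standard textbook definitions. The natural-log base, the function names, and the toy counts are assumptions; the abstract does not state which software or log base the authors used.

```python
import math


def shannon_index(otu_counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over observed OTUs."""
    total = sum(otu_counts)
    return -sum((c / total) * math.log(c / total) for c in otu_counts if c > 0)


def goods_coverage(otu_counts):
    """Good's coverage = 1 - (number of singleton OTUs / total reads)."""
    total = sum(otu_counts)
    singletons = sum(1 for c in otu_counts if c == 1)
    return 1 - singletons / total


# Hypothetical read counts per OTU for one sample, not data from the study.
toy_sample = [5, 3, 1, 1]
print(goods_coverage(toy_sample))   # 2 singletons out of 10 reads
print(shannon_index(toy_sample))
```

Good's coverage approaches 1 when few OTUs are seen only once, which suggests that sequencing depth has captured most of the community present in the sample.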
Zea mays is an important genetic model for elucidating transcriptional networks. Uncertainties about the complete structure of mRNA transcripts limit the progress of research in this system. Here, using single-molecule sequencing technology, we produce 111,151 transcripts from 6 tissues capturing ~70% of the genes annotated in maize RefGen_v3 genome. A large proportion of transcripts (57%) represent novel, sometimes tissue-specific, isoforms of known genes and 3% correspond to novel gene loci. In other cases, the identified transcripts have improved existing gene models. Averaging across all six tissues, 90% of the splice junctions are supported by short reads from matched tissues. In addition, we identified a large number of novel long non-coding RNAs and fusion transcripts and found that DNA methylation plays an important role in generating various isoforms. Our results show that characterization of the maize B73 transcriptome is far from complete, and that maize gene expression is more complex than previously thought.
Genome sequencing and comparative genomics provides insights on the evolutionary dynamics and pathogenic potential of different H-serotypes of Shiga toxin-producing Escherichia coli O104.
Various H-serotypes of the Shiga toxin-producing Escherichia coli (STEC) O104, including H4, H7, H21, and H¯, have been associated with sporadic cases of illness and have caused food-borne outbreaks globally. In the U.S., STEC O104:H21 caused an outbreak associated with milk in 1994. However, there is little known on the evolutionary origins of STEC O104 strains, and how genotypic diversity contributes to pathogenic potential of various O104 H-antigen serotypes isolated from different ecological niches and/or geographical regions. Two STEC O104:H21 (milk outbreak strain) and O104:H7 (cattle isolate) strains were shot-gun sequenced, and the genomes were closed. The intimin (eae) gene, involved in the attaching-effacing phenotype of diarrheagenic E. coli, was not found in either strain. Examining various O104 genome sequences, we found that two “complete” left and right end portions of the locus of enterocyte effacement (LEE) pathogenicity island were present in 13 O104 strains; however, the central portion of LEE was missing, where the eae gene is located. In O104:H4 strains, the missing central portion of the LEE locus was replaced by a pathogenicity island carrying the aidA (adhesin involved in diffuse adherence) gene and antibiotic resistance genes commonly carried on plasmids. Enteroaggregative E. coli-specific virulence genes and European outbreak O104:H4-specific stx2-encoding Escherichia P13374 or Escherichia TL-2011c bacteriophages were missing in some of the O104:H4 genome sequences available from public databases. Most of the genomic variations in the strains examined were due to the presence of different mobile genetic elements, including prophages and genomic island regions.
The presence of plasmids carrying virulence-associated genes may play a role in the pathogenic potential of O104 strains. The two strains sequenced in this study (O104:H21 and O104:H7) are genetically more similar to each other than to the O104:H4 strains that caused an outbreak in Germany in 2011 and strains found in Central Africa. A hypothesis on strain evolution and pathogenic potential of various H-serotypes of E. coli O104 strains is proposed.
Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.
Edwardsiella hoshinae is a Gram-negative facultative anaerobe that has primarily been isolated from avians and reptiles. We report here the complete and annotated genome sequence of an isolate from a monitor lizard (Varanus sp.), which contains a chromosome of 3,811,650 bp and no plasmids. Copyright © 2017 Reichley et al.
Complete genome sequences of two Staphylococcus aureus sequence type 5 isolates from California, USA.
Staphylococcus aureus causes a variety of human diseases ranging in severity. The pathogenicity of S. aureus can be partially attributed to the acquisition of mobile genetic elements. In this report, we provide two complete genome sequences from human clinical S. aureus isolates. Copyright © 2017 Hau et al.
Zinc resistance within swine-associated methicillin-resistant Staphylococcus aureus (MRSA) isolates in the USA is associated with MLST lineage.
Zinc resistance in livestock-associated methicillin resistant Staphylococcus aureus (LA-MRSA) sequence type (ST) 398 is primarily mediated by the czrC gene co-located with the mecA gene, encoding methicillin resistance, within the type V SCCmec element. Because czrC and mecA are located within the same mobile genetic element, it has been suggested that the use of in feed zinc as an antidiarrheal agent has the potential to contribute to the emergence and spread of MRSA in swine through increased selection pressure to maintain the SCCmec element in isolates obtained from pigs. In this study we report the prevalence of the czrC gene and phenotypic zinc resistance in US swine associated LA-MRSA ST5 isolates, MRSA ST5 isolates from humans with no swine contact, and US swine associated LA-MRSA ST398 isolates. We demonstrate that the prevalence of zinc resistance in US swine associated LA-MRSA ST5 isolates was significantly lower than the prevalence of zinc resistance in MRSA ST5 isolates from humans with no swine contact, swine associated LA-MRSA ST398 isolates, and previous reports describing zinc resistance in other LA-MRSA ST398 isolates. Collectively our data suggest that selection pressure associated with zinc supplementation in feed is unlikely to have played a significant role in the emergence of LA-MRSA ST5 in the US swine population. Additionally, our data indicate that zinc resistance is associated with MLST lineage suggesting a potential link between genetic lineage and carriage of resistance determinants. Importance: Our data suggest that coselection thought to be associated with the use of zinc in feed as an antimicrobial agent is not playing a role in the emergence of livestock-associated methicillin resistant Staphylococcus aureus (LA-MRSA) ST5 in the US swine population.
Additionally, our data indicate that zinc resistance is more associated with multi locus sequence type (MLST) lineage suggesting a potential link between genetic lineage and carriage of resistance markers. This information is important to public health professionals, veterinarians, producers, and consumers. Copyright © 2017 American Society for Microbiology.
The evolution of the natural killer complex; a comparison between mammals using new high-quality genome assemblies and targeted annotation.
Natural killer (NK) cells are a diverse population of lymphocytes with a range of biological roles including essential immune functions. NK cell diversity is in part created by the differential expression of cell surface receptors which modulate activation and function, including multiple subfamilies of C-type lectin receptors encoded within the NK complex (NKC). Little is known about the gene content of the NKC beyond rodent and primate lineages, other than it appears to be extremely variable between mammalian groups. We compared the NKC structure between mammalian species using new high-quality draft genome assemblies for cattle and goat; re-annotated sheep, pig, and horse genome assemblies; and the published human, rat, and mouse lemur NKC. The major NKC genes are largely in the equivalent positions in all eight species, with significant independent expansions and deletions between species, allowing us to propose a model for NKC evolution during mammalian radiation. The ruminant species, cattle and goats, have independently evolved a second KLRC locus flanked by KLRA and KLRJ, and a novel KLRH-like gene has acquired an activating tail. This novel gene has duplicated several times within cattle, while other activating receptor genes have been selectively disrupted. Targeted genome enrichment in cattle identified varying levels of allelic polymorphism between the NKC genes concentrated in the predicted extracellular ligand-binding domains. This novel recombination and allelic polymorphism is consistent with NKC evolution under balancing selection, suggesting that this diversity influences individual immune responses and may impact on differential outcomes of pathogen infection and vaccination.
Complete genomic sequences of two Salmonella enterica subsp. enterica serogroup C2 (O:6,8) strains from Central California.
Salmonella enterica subsp. enterica strains RM11060, serotype 6,8:d:-, and RM11065, serotype 6,8:-:e,n,z15, were isolated from environmental samples collected in central California in 2009. We report the complete genome sequences of these two strains. These genomic sequences are distinct and will provide additional data to our understanding of S. enterica genomics.
Escherichia coli serotype O157:H7 strain B6914-MS1 is an isolate from the Centers for Disease Control and Prevention that is missing both Shiga toxin genes and has been used extensively in applied research studies. Here we report the genome sequence of strain B6914-ARS, a B6914-MS1 clone that has unique biofilm properties.