Comparing structural and functional elements of orthologous HSP70s in the fission yeast Schizosaccharomyces pombe and the budding yeast Saccharomyces cerevisiae

This is a research article I did on the heat shock proteins of two species of yeast in 2013.


Seventy-kDa heat shock proteins (HSP70s) form a multigene family that is important for cellular stress response and survival (Lindquist 1988). The HSP70 proteins are approximately 70 kDa in size and are highly conserved across all three domains of life (Bacteria, Archaea, and Eukarya). These genes are either constitutively expressed or heat inducible (Lindquist and Craig 1988). HSP70s are a family of ATPases that contain an N-terminal adenosine triphosphatase domain (also known as the nucleotide binding domain, NBD), a substrate binding domain (SBD), and a C-terminal domain of varying length. These proteins are involved in the transport of proteins across membranes as well as protein folding in the cell (Hartl and Hayer-Hartl 2002). The role of HSP70s in protein folding is important for cell survival during heat shock stress. Higher temperatures can lead to protein misfolding and subsequent aggregation within the cell. HSP70s bind denatured or abnormal proteins via their exposed hydrophobic regions to prevent aggregation (Finley et al. 1984). Binding of these proteins also facilitates refolding into the proper conformation (Wegele et al. 2004). The structure and function of HSP70s are well studied in the yeast Saccharomyces cerevisiae.

Saccharomyces cerevisiae is a single-celled fungus that is used in applications such as beer brewing and bread making. Because the organism has important commercial uses, it has been subject to extensive study. S. cerevisiae has also been used as a model organism to study the function and structure of eukaryotic cells. Like other organisms, S. cerevisiae contains many HSP70 genes. A total of 14 HSP70 genes have been identified, and they are grouped by subcellular location. SSA1-4, SSB1-2 and SSE1-2 are HSP70s that reside in the cytosol (Lindquist and Craig 1988; Mukai et al. 1993); SSC1, SSQ1 and ECM10 are mitochondrial HSP70s (Voos et al. 2002); and Kar2 and LHS1 reside in the endoplasmic reticulum (Normington et al. 1989; Saris et al. 1997). Although some hsp70 genes are heat-inducible, not all members of the family share the same expression profile. Previous studies have shown that SSA2 expression is not temperature dependent, while expression of the SSB proteins decreased when temperature was increased (Craig et al. 1985).

Schizosaccharomyces pombe is a basal member of the fungal phylum Ascomycota, as are the rest of the subphylum Taphrinomycotina (Ebersberger 2012), although Taphrinomycotina may not be a monophyletic grouping (Schoch 2009). Unlike other ascomycetes that reproduce by producing ascospores, Schizosaccharomyces divide by medial fission (Nurse 1976), hence Schizosaccharomyces are known as the “fission yeasts”. S. pombe was originally used as a component in the traditional African sorghum beer “pito” in Ghana (N’guessan 2011) and was not used as a scientific model organism until 1950 (Leupold). S. pombe has since been used as a model organism for various genetic studies (Mitchison 1970, Gutz 1974, Beach and Nurse 1981, Hagan and Hyams 1988, Matsuyama 2006, Kim 2010), although it has not been used as widely as S. cerevisiae because it lacks easily controllable gene expression methods (Zilio 2012). Some recent progress has been made on developing effective methods of gene control that do not induce cellular stress (Zilio 2012).

The complete genome of S. cerevisiae was published in 1996 by Goffeau et al., and the genome of S. pombe was published by Wood et al. in 2002. Interestingly, the genome of S. cerevisiae is marked by a whole genome duplication event that led to the duplication of many genes (Wolfe 1997, Kellis 2004). Comparisons of S. pombe and S. cerevisiae are vital in understanding how HSP70s maintain similar functionality across great timespans, as these two species likely diverged around 425 million years ago (Berbee et al. 2007). These two species also have drastically different genomic arrangements. S. cerevisiae maintains 16 chromosomes and only 250 introns, whereas S. pombe maintains only 3 chromosomes but thousands of introns, which suggests these two species are experiencing very different selection pressures on a genomic scale. If these two species have been experiencing different genomic-scale selective pressures, we would expect that HSP70s might have been shuffled around and undergone significant sequence divergence, yet still remained functionally the same. Specifically, we hypothesized the following: 1.) The presence/absence of regulatory elements (HSE, intron/exon) has been unchanged. 2.) Amino acid sequences have diverged significantly (~5%). 3.) Presence/absence of signal peptides or transmembrane regions has remained unchanged. 4.) Local gene neighborhood synteny has been lost. 5.) 3-dimensional structures have remained unchanged. 6.) The nucleotide binding site has remained functionally unchanged in the lhs1 HSP70 orthologs.


Sequence collection and ortholog detection

Orthologous sequences were detected by first obtaining known HSP70 sequences from S. cerevisiae S288 and then searching known yeast sequences against the S. pombe genome using BLASTp. We considered proteins that had greater than 25% identity to each search sequence to be potential orthologous sequences. We retrieved those potential orthologs and constructed a preliminary tree to make sure that all possible orthologs had been found in S. pombe. Orthologs that had not yet been found in S. pombe but were found in S. cerevisiae were then searched against the genomic sequence of S. pombe using the corresponding S. cerevisiae protein sequence in tBLASTn.
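
The identity cutoff described above can be sketched in Python. This is a minimal illustration, assuming the BLASTp hits were saved in the standard 12-column tabular format (-outfmt 6), in which the third column is percent identity. The function name, sequence identifiers, and example hit lines are hypothetical and are not taken from this study.

```python
import csv
import io

def candidate_orthologs(blast_tabular, min_identity=25.0):
    """Return {query: [subject, ...]} for hits above the identity cutoff.

    Assumes BLAST tabular output (-outfmt 6): qseqid, sseqid, pident, ...
    """
    hits = {}
    reader = csv.reader(io.StringIO(blast_tabular), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, pident = row[0], row[1], float(row[2])
        if pident <= min_identity:
            continue  # keep only hits with greater than 25% identity
        subjects = hits.setdefault(query, [])
        if subject not in subjects:
            subjects.append(subject)
    return hits

# Hypothetical hit lines, for illustration only.
example = (
    "SSA1_query\tpombe_protein_A\t82.1\t640\t110\t2\t1\t640\t1\t638\t0.0\t1050\n"
    "SSA1_query\tpombe_protein_B\t22.4\t310\t200\t9\t5\t300\t10\t315\t1e-05\t48.1\n"
)
print(candidate_orthologs(example))
```

Hits retained by a filter like this are only candidates; as described above, orthology was then checked with a preliminary tree.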

Phylogenetic analysis

S. cerevisiae and S. pombe protein sequences were aligned online using the MAFFT program (Katoh 2013) with G-INS-i parameters in order to achieve an optimal global alignment. Ortholog detection was done using the neighbor joining method with bootstrap resampling on pairwise p-distances of the amino acid (AA) sequences in the MEGA 5.1 program (Felsenstein 1985; Saitou and Nei 1987; Tamura 2011). The best distance model for the protein data set was determined by using ProtTest 3.2 (Darriba 2011), and modeled AA trees were inferred using PhyML 3.0 and MEGA 5.1.
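
For readers unfamiliar with the p-distance, it is simply the proportion of aligned sites at which two sequences differ. The sketch below is a minimal illustration and not the MEGA implementation; it assumes gapped sites are skipped pair by pair (pairwise deletion), and the two short sequences are made up.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues over sites where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: pairwise deletion of gapped sites
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared

# Made-up aligned fragments: 7 comparable sites, 2 differences.
print(p_distance("MSKAVGID", "MAKA-GLD"))
```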

Following AA based tree drawing, the AA alignment was converted codon-by-codon to an aligned, genomic CDS alignment using a lookup table so that the resultant nucleotide alignment matched the AA-based alignment. The best nucleotide model for this data set was determined using jModelTest (Guindon and Gascuel 2003; Posada 2008). Following model selection, maximum likelihood phylogeny was inferred using the PhyML 3.1 standalone version (Guindon 2010) with 100 bootstrap replicates. The amino-acid based tree and nucleotide-based tree were compared for topological differences.
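
One common way to carry out this kind of conversion is to thread each sequence's own codons onto its row of the protein alignment, so that every alignment gap becomes three nucleotide gaps. The sketch below illustrates that idea only; the lookup table actually used is not shown in this article, and the function name and example sequences are hypothetical.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Build one row of a codon alignment from an aligned protein and its unaligned CDS."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    # Allow one trailing stop codon that has no residue in the protein alignment.
    if len(codons) not in (n_residues, n_residues + 1):
        raise ValueError("CDS length does not match the protein sequence")
    out = []
    next_codon = 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)  # one amino acid gap becomes three nucleotide gaps
        else:
            out.append(codons[next_codon])
            next_codon += 1
    return "".join(out)

# Made-up example: the gap in the protein row becomes "---" in the nucleotide row.
print(thread_codons("M-KA", "ATGAAAGCT"))
```

Because the nucleotide columns are derived from the protein columns, the two alignments stay in register, which is what allows the AA and nucleotide trees to be compared directly.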

Exon-Intron Analysis

The NCBI gene database for S. cerevisiae and S. pombe was used to identify exons and introns within hsp70 genes. Genes with introns were analyzed using SPIDEY software to determine position of splice points in genomic DNA and intron phases.

Regulatory elements analysis

The TransFac Match program was used to search for the presence of Heat Shock Elements (HSEs) within the 1000 base pairs (bp) upstream of each hsp70 coding sequence. A matrix and core match score above 85% was taken as strong evidence for an HSE with the nucleotide motif nGAAnnTTCnnGAAn. The sequences GAAnnTTC and TTCnnGAA were also considered as possible partial HSEs.
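
The consensus motifs named above can also be located with a plain pattern search. The sketch below is a simplified stand-in, not the TransFac Match matrix scoring: it assumes exact matches to the consensus on the given strand only, and the function name and test sequences are hypothetical.

```python
import re

# Lookaheads let overlapping occurrences of a motif be reported.
FULL_HSE = re.compile(r"(?=(GAA[ACGT]{2}TTC[ACGT]{2}GAA))")
PARTIAL_HSE = re.compile(r"(?=(GAA[ACGT]{2}TTC|TTC[ACGT]{2}GAA))")

def find_hse(upstream):
    """Return (full, partial) lists of (offset, motif) for an upstream sequence.

    Partial motifs that lie entirely inside a full HSE are not reported twice.
    """
    seq = upstream.upper()
    full = [(m.start(1), m.group(1)) for m in FULL_HSE.finditer(seq)]
    partial = []
    for m in PARTIAL_HSE.finditer(seq):
        start, motif = m.start(1), m.group(1)
        end = start + len(motif)
        inside_full = any(fs <= start and end <= fs + len(fm) for fs, fm in full)
        if not inside_full:
            partial.append((start, motif))
    return full, partial

# Made-up sequences: the first carries a full motif, the second only a partial one.
print(find_hse("ACGTAGAAGCTTCTAGAACCGT"))
print(find_hse("TTTGAAGCTTCAAA"))
```

Offsets here are 0-based from the start of the supplied upstream sequence, so converting them to a distance from the start codon depends on how that sequence was extracted.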

Synteny Analysis

Gene maps of each hsp70 were obtained through the NCBI database. Maps were used to compare the orientation of each gene in its respective genome. Conservation of neighboring genes was also compared.
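
The comparison was made by inspecting gene maps, but the same two checks (orientation and shared neighbors) can be expressed as a small function. This is a hypothetical sketch: the record layout and the neighbor labels are invented for illustration, and it assumes neighboring genes have already been assigned comparable ortholog labels in both species.

```python
def compare_neighborhoods(locus_a, locus_b):
    """Compare two loci given as {"strand": "+" or "-", "neighbors": [labels]}."""
    shared = sorted(set(locus_a["neighbors"]) & set(locus_b["neighbors"]))
    return {
        "same_orientation": locus_a["strand"] == locus_b["strand"],
        "shared_neighbors": shared,
    }

# Invented loci: same strand, but no neighboring genes in common.
locus_yeast = {"strand": "+", "neighbors": ["geneA", "geneB", "geneC"]}
locus_pombe = {"strand": "+", "neighbors": ["geneX", "geneY", "geneZ"]}
print(compare_neighborhoods(locus_yeast, locus_pombe))
```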

Protein characterization

SignalP 4.1 server software was used to predict possible signal peptides in HSP70 amino acid sequences using the eukaryotic organism setting. TMHMM 2.0 server software was used to predict whether the HSP70 proteins were membrane bound by detecting the presence of transmembrane helices. Conserved protein domains in the HSP70s were searched for using the Superfamily HMM library and genome assignments server version 1.75.

Nucleotide binding site analysis

In order to determine if HSP70 proteins were functionally different or merely differed in their nucleotide sequences, the 3-dimensional structures of a subset (the lhs1 genes) were analyzed. The nucleotide binding site (NBS), a conserved site in the nucleotide binding domain (NBD) found in all HSP70s, was targeted. The NBD binds ATP, which allows the substrate binding domain (SBD) to open, so its function should remain highly conserved through time. This should hold particularly for the NBS, which is the cleft within the NBD where ATP binds (Liu 2007).


Orthologous sequence detection

We found a total of 14 hsp70 family proteins in S. cerevisiae, but only 8 orthologous sequences in S. pombe (Table 1). The ssa, ssb, sse, and mitochondrial gene groups were each marked by paralogous duplicates in S. cerevisiae that were not found in S. pombe (Figure 1). All gene names will be referred to as shown in Figure 1. For each of the cellular regions in which Hsp70 proteins were found in S. cerevisiae, S. pombe also had at least one ortholog. S. cerevisiae therefore carried a markedly greater number of paralogs, while the orthologous groups themselves were represented in both species.

Phylogenetic analysis

The best model for the amino acid data set was found to be LG+I+F+G, although a JTT based NJ tree was also inferred for comparison. GTR+I+G was found to be the best model for the genomic CDS nucleotide data set. In the AA based data sets, we observed several notable topological differences between the NJ, p-distance based tree and the NJ, JTT distance based tree, as well as between both NJ trees and the maximum-likelihood, LG+I+F+G based tree, although these differences usually lacked strong bootstrap support. There were no topological differences between the maximum-likelihood, AA based tree and the maximum-likelihood, nucleotide based tree.

Differences between phylogenetic methods were present but subtle, and not usually strongly supported (greater than 95% support) by bootstrap analysis. The fact that the maximum-likelihood trees inferred using the best amino-acid model (LG+I+F+G) and the best nucleotide model (GTR+I+G) produced identical topologies suggests that these trees are correctly representing the evolutionary history of these heat-shock proteins.

Intron analysis of hsp70s

The NCBI gene database revealed no introns within S. cerevisiae hsp70 genes. This was also the case in S. pombe except for pdr13-pombe, which contains two exons and one intron. SPIDEY analysis of pdr13-pombe revealed a 126 nucleotide phase 2 intron located fairly close to the start codon, between positions 125 and 126 of the mRNA. In contrast, the ortholog ssz1-yeast in S. cerevisiae contains no intron (Figure 3).
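
Intron phase follows from simple arithmetic on the number of coding bases that precede the intron: a remainder of 0 means the intron falls between codons, while 1 or 2 means it interrupts a codon after its first or second base. The sketch below shows the calculation under the assumption that the reported mRNA positions are counted from the first base of the start codon; that numbering is an assumption here, not something stated by SPIDEY.

```python
def intron_phase(coding_bases_before_intron):
    """Phase 0: between codons; phase 1 or 2: after the 1st or 2nd base of a codon."""
    if coding_bases_before_intron < 0:
        raise ValueError("position must be non-negative")
    return coding_bases_before_intron % 3

# An intron after coding base 125 (assumed numbering) splits a codon after its 2nd base.
print(intron_phase(125))
```

Under that assumption, an intron between positions 125 and 126 gives 125 mod 3 = 2, which agrees with the phase 2 assignment reported above.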

Regulatory elements analysis

TransFac analysis revealed full HSE regions containing the GAAnnTTCnnGAA sequence in the ssa1-yeast, kar2-yeast, ssa1-pombe, and ssa2-pombe genes. The ssa1 genes from both S. cerevisiae and S. pombe contained HSEs approximately 300 bp upstream of the start codon (Figure 4). All other hsp70 genes contained either no HSEs or partial HSEs with a GAAnnTTC or TTCnnGAA motif. The ssa2-yeast, ssa4-yeast, ssb2-yeast, sse1-yeast, sse2-yeast, bip-pombe, ssc1-pombe, pdr13-pombe, and pss1-pombe genes all contain partial HSEs. Ecm10-yeast, lhs1-yeast, ssa3-yeast, ssb1-yeast, ssc1-yeast, ssz1-yeast, ssq1-yeast, lhs1-pombe, and sks2-pombe contained no HSEs. The mitochondrial ssc1-pombe contains a partial HSE approximately 200 bp upstream of the start codon, whereas its ortholog ecm10-yeast contains no HSE. The same pattern is seen in the intron-containing pdr13-pombe and its ortholog ssz1-yeast. The endoplasmic reticulum kar2-yeast was found to have a full HSE, whereas its ortholog bip-pombe has only a partial HSE. The sse2-yeast and pss1-pombe genes both contain two partial HSEs (Figure 4).

Synteny Analysis

Comparison of gene maps for ssa1 and ssa2 orthologs revealed few syntenic relationships (Figure 5). Though the ssa1 genes showed the same orientation, the ssa2 genes were found to be in opposite orientations. There were no similarities in the neighboring genes between the ssa1 orthologs or between the ssa2 orthologs. The same lack of syntenic relationships was observed for the other hsp70 genes (data not shown).

Protein Characterization

Lhs1-yeast, Lhs1-pombe, and Bip-pombe were predicted to have signal peptides located in their amino-terminal regions. The Bip-pombe ortholog Kar2-yeast was not predicted to have a signal peptide (Figure 6). SignalP identified possible cleavage sites for Lhs1-yeast and Lhs1-pombe between amino acid positions 20-21 and 21-22, respectively. The cleavage site for Bip-pombe was predicted to be between positions 24 and 25. No other HSP70 sequence was predicted to have a signal peptide. The Bip-pombe, Kar2-yeast, Lhs1-pombe, and Lhs1-yeast sequences contained possible transmembrane regions in their N-terminal regions (Figure 7). Bip-pombe had transmembrane helices at positions 7-24. Kar2-yeast and Lhs1-yeast both contained helices at positions 7-29. Lhs1-pombe contained helices near the N-terminus with a probability score of 0.57. All other proteins were predicted to be non-membrane bound. The Superfamily web database search revealed conserved ATPase domains at the N-terminus of Ssa1-yeast (Figure 8). Ssa1-yeast also contained an HSP70 domain as well as the C-terminal HSP70 domain. This pattern was seen in nearly all other HSP70s. The notable exceptions were Lhs1-yeast and Lhs1-pombe: Lhs1-yeast lacked the HSP70 domain, while Lhs1-pombe was missing both the HSP70 domain and the HSP70 C-terminal sub-domain.

Nucleotide binding site analysis

We found that, overall, the structures of LHS1 in S. cerevisiae and S. pombe were nearly identical, and most visual differences were located in the loop portions of the proteins (Figure 9). When looking at the NBS in both species, we found that the two sites differed in amino acid composition but were structurally nearly identical. The differences in amino acid sequence may result in functional differences between these two proteins, as several amino acid changes between the two species altered charge or polarity (Figure 10).


S. cerevisiae and S. pombe are two fungi separated by millions of years of evolution as well as a complete genome rearrangement, and they exhibit markedly different life histories. S. cerevisiae is a budding yeast and so it is likely under less selective pressure for rapid DNA replication and maintains 16 chromosomes. Conversely, S. pombe is likely under great selective pressure for its DNA to rapidly condense to chromosomes, replicate, and then quickly separate to the poles so that fission can occur, and so maintains only 3 chromosomes. This replication strategy can leave S. pombe vulnerable during its replication process, whereas S. cerevisiae is relatively unaffected. We anticipated that these different selective pressures might have caused large differences in the location of hsp70s in the genomes of these two species, while the proteins themselves maintained structural and functional similarities.

When we compared the homologous HSP70s of S. pombe to S. cerevisiae, we found that S. cerevisiae usually contained more hsp70 genes than S. pombe. In some cases, for instance in the mitochondrial hsp70s, the divergences between both paralogs and orthologs are high. It is likely that these mitochondrial HSP70s have ancient origins. However, it is uncertain whether these genes arose before the genome duplication in S. cerevisiae, as we cannot determine from the data presented here whether they were lost in S. pombe or gained in S. cerevisiae. We can say, though, that because the divergences between these mitochondrial HSP70s are high, it is not likely that they are undergoing concerted evolution. Conversely, in the SSA complex of HSP70s, paralogous pairs appear to be undergoing concerted evolution, as the paralogs exhibit almost zero sequence divergence, although with the data presented here we cannot eliminate the possibility that each of these paralogous pairs arose only recently. In order to better answer these questions, more fungal species must be added to the protein family tree of orthologs.

We found that, after modeling the evolutionary distances, the HSP70s of these two species had undergone large amounts of amino acid sequence divergence. In some cases, as in the LHS1 orthologous pair, the pairwise distance exceeds 100%, that is, more than one estimated substitution per site. These findings suggest that these genes have undergone repeated substitutions at the same sites (multiple hits), and that these distances may have reached saturation, so that little additional phylogenetic information can be garnered from comparing these sequences. This suggests that even though HSP70 genes are conserved across all three domains of life, they might actually be poor phylogenetic markers.
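
A model-corrected distance can exceed 100% because it estimates the number of substitutions per site, including multiple hits, rather than the observed fraction of differing sites. The sketch below uses the simple Poisson correction, d = -ln(1 - p), purely as an illustration; it is not the LG+I+F+G model used in this study, and the value of p is made up.

```python
import math

def poisson_corrected_distance(p):
    """Expected substitutions per site under a simple Poisson model, d = -ln(1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1 - p)

# Made-up example: 70% observed differences corresponds to more than
# one estimated substitution per site.
print(round(poisson_corrected_distance(0.70), 3))
```

As p approaches 1 the corrected distance grows without bound, which is the saturation behavior described above.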

After determining that large amounts of sequence divergence had occurred, we then investigated whether or not these significant changes in amino acid sequence were associated with actual functional changes in these proteins. We did not find any significant differences in intron/exon structure save for pdr13-pombe. The presence of a single intron in pdr13-pombe could be attributed to S. pombe having nearly 20 times the number of introns compared to S. cerevisiae. One could reasonably expect that the large number of introns would increase the possibility of gaining introns in hsp70 genes even if the ancestral state was to not have introns. It is also possible that the intron corresponds to the regulatory sequence for a neighboring gene due to the compact nature of the S. pombe genome. Again, the addition of more fungal species to this analysis is necessary to truly assess the ancestral state of introns/exons in these two species and decide whether or not the similarity in the current state is due to convergent evolution or homology.

Analysis of the HSEs in each gene revealed surprising results. The ssa1 gene orthologs contained full HSEs, which contradicts literature stating that they are not heat-inducible. Ssa2-yeast is not heat-inducible either; however, ssa2-pombe contains a full HSE. It is interesting to note that while the homologs ssa3-yeast and ssa4-yeast contain partial HSEs, they have nevertheless been found experimentally to be heat inducible (Boorstein and Craig, 1990; Werner-Washburne et al., 1989). It is possible that ssa3-yeast and ssa4-yeast retained their heat-inducible nature by having a sequence that was still sufficient for binding. It is also possible that, because S. cerevisiae has 4 ssa homologs, ssa1-yeast and ssa2-yeast lost their heat-inducible ability over time because the other genes were able to compensate for this loss. Ssa1-pombe and ssa2-pombe may indeed be heat-inducible; retention of heat inducibility would be important in S. pombe because it does not have a third and fourth gene to compensate for such a loss. This would also suggest that while the S. pombe genes are more closely related to their S. cerevisiae orthologs, functionally they may be more similar to SSA3-yeast and SSA4-yeast. Another thing to note is that the HSE sequence motif was positioned within 300 bp upstream of the coding sequence in genes that had HSEs. This suggests that the position of HSEs is evolutionarily conserved between the two fungi, though the actual HSE sequence motif might have diverged. Overall, we found the presence or absence of HSEs to be highly conserved through time, which we believe supports our hypothesis that these proteins are exhibiting similar functions.

When we analyzed the position of signal peptides, we found that all of the cytosolic and mitochondrial HSP70s lacked a signal peptide sequence. The ER proteins Bip-pombe, Lhs1-yeast, and Lhs1-pombe were predicted to have a signal peptide sequence in the N-terminal region followed by a cleavage site. This motif is consistent with other ER proteins as well as with the experimental data (Baxter et al., 1996). It is interesting to note that SignalP did not predict a signal peptide sequence for the Kar2-yeast protein even though one was predicted in the S. pombe ortholog. The presence of positive signal peptide scores within the first 40 amino acids of Kar2-yeast would suggest a possible sequence (Figure 6). Although the software could not confidently predict a signal peptide, Kar2-yeast has been experimentally verified to contain a signal peptide that is cleaved off after transport into the ER (Normington et al., 1989). This discrepancy could be explained by the presence of an extra 15 amino acids within the signal peptide region of Kar2-yeast that are not found in Bip-pombe. These additional amino acids could be the reason why SignalP did not predict a signal peptide for Kar2-yeast.

The TMHMM software detected transmembrane regions within the N-terminus of the ER proteins. Since these transmembrane helices are located within the signal peptide, it is suggested that these regions could help in the transport of the protein to the ER. None of the cytosolic or mitochondrial HSP70s were predicted to have transmembrane helices. Though prediction of protein characteristics using web software was accurate for almost all of the proteins, these results underscore the importance of experimentally verifying these predictions as well.

We also analyzed the 3-dimensional structure of these HSP70s and found that, for the most part, these proteins were structurally identical, although we did not conduct an electrostatic surface analysis, which might detect more subtle changes in the structure of these proteins. We did notice that the lhs1 genes in S. cerevisiae and S. pombe, which exhibited the highest sequence divergence between orthologs, also diverged structurally both from the remainder of the HSP70 family proteins analyzed here and from each other. This is not surprising given that analysis of conserved protein domains showed identical patterns across all proteins analyzed save for the LHS1 protein group. However, when we focused on the nucleotide binding site of the lhs1 genes, we did not see any obvious structural change there, although we did note potentially important changes in amino acids (e.g. some amino acids shifted from negative to positive charge, hydrophobic to non-polar, etc.). Most of the structural changes were observed in the loops, rather than in the helices or sheets of the proteins, but there is a great need for electrostatic potential analysis of these proteins to more accurately predict whether or not they might be functionally different.

Our studies have shown that the HSP70s in both S. cerevisiae and S. pombe are divergent at a genomic level, but highly conserved at the functional/structural protein level. Some of the discrepancies found in our analysis (e.g. HSE elements and signal peptides) underscore the importance of experimentally verifying each protein. Many of the S. pombe sequences used in this study were inferred from the S. pombe genome sequencing project. It is possible that different regulatory characteristics or novel protein functions may be discovered as future experimentation is done on S. pombe HSP70s. In conclusion, we believe that while these proteins are encountering extremely high rates of mutation and shuffling throughout their respective genomes, they are also experiencing equally strong purifying selection which acts on those mutations to maintain conserved structure and function in these sequences.


Literature Cited/Figures available upon request.