The Evolution of Eusociality in Insects

Epigenetics in Social Insects: A New Direction for Understanding the Evolution of Castes

Originally written April, 2012 by Bryan White

Article 1 Source:

Epigenetics is a relatively new field of biology, and one of its central mechanisms is a recently characterized form of gene inactivation called DNA methylation. DNA methylation is the process in which methyl groups are added to sections of DNA; it occurs primarily on cytosines, although it could in principle occur on other nucleotides. In this paper, the current state and understanding of DNA methylation and how it relates to the development and evolution of insect castes (particularly in the eusocial insect groups) is reviewed. Methylation is not the only possible epigenetic mechanism. Acetylation (the addition of acetyl groups) and ubiquitination (the addition of the ubiquitin protein) of the histone proteins that package DNA are also possible. However, DNA methylation is probably the most common. The end result of DNA methylation is the existence of a secondary language on top of the DNA language that can be modified by environmental factors, can be passed on to the next generation, and can influence the development of offspring. DNA methylation can also have an evolutionary effect by increasing the rate of mutations in genes that are methylated for multiple generations, because genes that are inactivated can accumulate stop codons and other deleterious mutations. Based on this, the authors hypothesize that DNA methylation is potentially the primary method for caste selection in eusocial insects.

Epigenetics brings a whole new aspect to the table for understanding how castes evolved, and how castes are regulated (should a larva develop into a queen or worker?) in eusocial systems. In hymenopteran eusocial species, there is typically a vast amount of physical diversity amongst castes (workers, soldiers, queens and male drones), and researchers have found it hard to explain this diversity using only genetic methods. This is largely because it is well known that what a larva will develop into is determined environmentally, but scientists do not have a clear idea of exactly how that developmental “decision” is enforced. Epigenetics stands as a good explanation for how environmental factors can influence larval development, and the authors suggest this is probably carried out by DNA methylation genes such as DNMT3. Drosophila happens to lack DNMT3, and so the gene was long thought unimportant. The direct connection between the expression of DNMT3 and the genes that are methylated is a new, expanding area of research.

Another difficulty in understanding the evolution of eusociality has been trying to explain it in terms of kin selection, specifically the observation that sisters in haplodiploid species are on average 75% related to one another, compared with 50% in diploid species. The appeal of haplodiploidy as an example of kin selection theory was that it provided a strict means for the regulation of sexual dimorphism (males are made up of only the queen’s genome) and suggested some involvement in the development of castes. However, epigenetics and DNA methylation offer a much better explanation for the existence of both large amounts of sexual dimorphism and phenotypic plasticity. DNA methylation has been found in many eusocial hymenopteran species, as well as primitively social hymenopterans, suggesting that DNA methylation is both a heavily conserved trait and is correlated with sociality, phenotypic plasticity and sexual dimorphism. A better understanding of the phylogenetic location of insect groups that make use of DNA methylation can probably help answer the question of whether DNA methylation is the sole (or primary) source of caste determination.
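The relatedness figures behind the haplodiploidy argument can be worked out directly. The short Python sketch below is only an illustration and is not taken from the reviewed article; the function name is hypothetical, and it assumes full sisters, a diploid mother, and no inbreeding.

```python
def sister_relatedness(father_haploid: bool) -> float:
    """Expected fraction of genes shared by two full sisters.

    Hypothetical helper for illustration: assumes a diploid mother,
    a single father, and no inbreeding.
    """
    # Half of each daughter's genome comes from the mother; a given
    # maternal allele is shared by two sisters with probability 0.5.
    maternal_share = 0.5 * 0.5
    # Half comes from the father. A haploid father gives every daughter
    # his whole genome (always shared); a diploid father gives a random half.
    paternal_share = 0.5 * (1.0 if father_haploid else 0.5)
    return maternal_share + paternal_share


print(sister_relatedness(father_haploid=True))   # haplodiploid system: 0.75
print(sister_relatedness(father_haploid=False))  # diploid system: 0.5
```

The gap between 0.75 and 0.5 is what made haplodiploidy attractive as a kin selection explanation for sterile worker castes.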

The authors also attempt to lay out a conceptual framework for future studies; however, I found their model unclear. What the authors seem to suggest is that eusociality is correlated with DNA methylation, but that methylation is not a requirement for it. They do, however, do a good job outlining the specific areas of DNA methylation that need to be explored and understood to eliminate other possible explanations for the correlation between DNA methylation and eusociality, such as the mechanistic effects that DNA methylation has on gene splicing and whether or not it is possible for eusocial insects to exhibit caste differentiation without DNA methylation genes.

Article 2 Source:

In this article the researchers argue that, to date, all progress on kin selection theory has been largely abstract in nature and has not provided any concrete evidence for the theory. They argue that, in order for kin selection theory to be fulfilled in an empirical system, several stringent conditions must be met.

First, all interactions that are measured must be “additive and pairwise”, that is, they must only affect the pair of individuals involved in the interaction. This means that synergistic effects, such as the simultaneous cooperation of more than two individuals, are unable to be measured or incorporated into any mathematical model of kin selection.

Second, they argue that kin selection theory can only be applied to a very limited subset of population structures because it requires global updating of interactions. Global updating is the idea that any two individuals are competing uniformly for reproduction regardless of their geographic proximity to each other.

Third, they argue that these two requirements can only be met in some limited, artificial world, and that when they are met, the organismal interactions within that world also act according to natural selection theory, so kin selection theory does not provide any additional biological information.

Finally, the authors also argue that the apparent simplicity of kin selection theory compared to that of natural selection theory is an illusion. Since the primary component of kin selection theory is the calculation of inclusive fitness, and the calculation of inclusive fitness requires the state of “all individuals whose fitness is affected by an action, not only those whose payoff is changed” to be known, kin selection theory in effect requires the same information as natural selection theory: the state of all individuals affected, rather than only those whose payoff (fitness) is increased.

In order to overcome the limitations imposed by kin selection theory, the authors propose a general, multi-level model of natural selection theory using only the general principles of population genetics. This model is used to explain how eusociality might evolve in five distinct evolutionary stages.

First, an organism must reach a state where there are clear groups within a population. Groups typically form around resources, nest sites, when parents and offspring stay together, or when flocks go to known breeding grounds.

Second, these groups begin to accumulate traits, otherwise known as pre-adaptations, that will increase the overall cohesion and cooperation of these groups. One such pre-adaptation is when a parent places large numbers of paralyzed prey around her eggs so that when the eggs hatch they will have a food source readily available, and then she moves on to create another nest. The next step towards eusociality would be for the parent to stay near the nest and guard the eggs until they are hatched. However, at this stage, the offspring will still leave the nest and so will the parent; there is still dispersion.

Third is the evolution of clearly eusocial alleles, that is, traits that enforce the primary traits of eusociality. The key traits here are for individuals to stay in the nest instead of dispersing, and then other cooperative pre-adaptations can come into play.

Fourth is probably what can be called the optimization stage in which these eusocial alleles can be selected upon to reinforce the nest/colony structure.

Fifth is the final phase, in which selection operates on the colonies instead of the individual organisms, leading to the evolution of more derived traits such as castes (workers/soldiers), fungal farming, aphid farming, and other highly cooperative activities. Here the authors have outlined the framework through which future studies can be conducted, which will most likely be a combination of behavioral ecology and phylogenetics. My only criticism of this paper is the authors’ use of the words “primitive” and “advanced”, which are common misnomers in evolutionary biology. Better terms would be less derived or more derived, in reference to the ancestral state. For instance, the caste system of most ants is more derived compared to the loose grouping structure of some wasps.

Blood quantum and DNA testing in Native American tribes

Originally written April, 2012, by Bryan White

The issue of what determines whether or not a person is an official member of a Native American tribe constantly changes and evolves with society, culture, and technology. In the New York Times article “Ancestry in a Drop of Blood”, Karen Kaplan traces the struggle of Marilyn Vann, a black woman who considers herself Native American. It had always been a dream of Marilyn’s to identify in an official way with her Native American ancestry, which she knew to be true from many paper documents, by joining the Cherokee Nation. The Cherokee Nation is one of the largest federally recognized tribes, with over 299 thousand members enrolled, over 63% of whom live in Oklahoma. While I agree that official membership in federally recognized tribes should be restricted, the basis for those restrictions should not be centuries-old documents such as the 1907 tribal roll called the “Final Rolls of Citizens and Freedmen of the Five Civilized Tribes in Indian Territory (Dawes)”.

The existence of federally recognized tribes is important both for Native American culture and for American and human culture as a whole. Each tribe has a multitude of unique cultural characteristics that are worth preserving in and of themselves, and allowing members of these tribes federal recognition helps to ensure that each of their different cultures is protected in the way that actual members of that culture feel is proper. For this reason I agree it is important to maintain the cultural integrity of the tribes, so that each member of the tribe has some investment in protecting that culture. Tribal membership, the process of becoming a member of a federally recognized tribe, should therefore carry some requirements that ensure each member both has some investment in protecting the tribal integrity, and also deserves some benefit or reparations for the past suffering of their people. Marilyn Vann felt that she had enough connection with her Cherokee ancestry that she wished to join the tribe officially, and she may well have contributed to its cultural growth and preservation. However, the means by which that integrity is ensured can give rise to many legal and ethical problems, as is demonstrated by the fact that Marilyn Vann was denied membership in the Cherokee tribe.

One such basis for a person gaining tribal membership is that of blood quantum, or the amount of “Native American blood” that a person has. However, the methods by which “blood quantum” is determined, and indeed the very definition of what “blood quantum” is, are tenuous and dubious. For example, under one definition, if a person has even a single drop of Native American blood, that person is considered Native American. However, under another definition, a person must have at least 25% Native American “blood” to be considered an actual Native American. The irony is that this method is the same one purveyed by proponents of racism during a segregated America, and is still widely used by the Federal Government to enforce the idea of race for employees and students. According to this rule, Marilyn Vann could not be determined to have enough Native American blood because her father had been listed as a Freedman (a former slave) rather than a Cherokee, despite her knowledge that she was indeed part Cherokee.

One alternative method for determining one’s claim to Native American ancestry is DNA testing of the potential tribe member. Marilyn Vann sought this method because she wished to be validated in her belief that she was indeed part Cherokee, even though her skin color and facial features probably looked more African American. DNA testing to determine one’s geographic ancestry is carried out by taking a sample of a person’s DNA and comparing several genes to a database in which genetic sequences are already matched to their geographic location. According to this test, Marilyn Vann found out that she was indeed at least 3% Native American.

Should Marilyn Vann then be admitted to the Cherokee Nation? Personally, I believe the cumulative evidence suggests that Marilyn Vann should be considered a full member of the Cherokee Nation because she has taken the time to compile actual paper evidence of her ancestry, as well as genetic evidence. That time and effort suggests that Marilyn is interested in becoming an official member of the Cherokee Nation in good faith, and that her intentions are to maintain the integrity of the tribe and gain a sense of her position in the world as an African American and Native American. If, for example, Marilyn had simply taken a genetic test and found out that she was some small part Native American, then I would question her motives in becoming an official member of the tribe, but this is not the case. Hopefully tribes will begin to integrate the use of DNA testing in a fair way so that people who consider themselves Native Americans will not be denied recognition of their ancestry because of old and outdated methods that are based on the precepts of racism.

Comparing structural and functional elements of orthologous HSP70s in the fission yeast Schizosaccharomyces pombe and the budding yeast Saccharomyces cerevisiae

This is a research article I did on the heat shock proteins of two species of yeast in 2013.


Seventy-kDa Heat Shock Protein (HSP70) is a multigene family of proteins that is important for cellular stress response and survival (Lindquist 1988). The HSP70 proteins are approximately 70 kDa in size and are highly conserved across all three domains of life (Bacteria, Archaea, and Eukarya). These genes are either constitutively expressed or heat inducible (Lindquist and Craig 1988). HSP70s are a family of ATPases that contain an N-terminal adenosine triphosphatase domain (also known as the nucleotide binding domain, NBD), a substrate binding domain (SBD), and a C-terminal domain of varying length. These proteins are involved in the transport of proteins across membranes as well as protein folding in a cell (Hartl and Hayer-Hartl 2002). The role of HSP70s in protein folding is important for cell survival during heat shock stress. Higher temperatures can lead to protein misfolding and subsequent aggregation within the cell. HSP70s bind denatured or abnormal proteins via the exposed hydrophobic regions to prevent aggregation (Finley et al. 1984). Binding of these proteins also facilitates refolding into the proper conformation (Wegele et al. 2004). The structure and function of HSP70s are well studied in the yeast Saccharomyces cerevisiae.

Saccharomyces cerevisiae is a single-celled fungus that is used in applications such as beer brewing and bread making. Because the organism has important commercial uses, it has been subject to extensive study. S. cerevisiae has also been used as a model organism to study the function and structure of eukaryotic cells. Like other organisms, S. cerevisiae contains many HSP70 genes. A total of 14 HSP70 genes have been discovered, and they are grouped by sub-cellular location. SSA1-4, SSB1-2 and SSE1-2 are HSP70s that reside in the cytosol (Lindquist and Craig 1988; Mukai et al. 1993); SSC1, SSQ1 and ECM10 are mitochondrial HSP70s (Voos et al. 2002); and Kar2 and LHS1 reside in the endoplasmic reticulum (Normington et al. 1989; Saris et al. 1997). Although some hsp70 genes are heat-inducible, not all in the family share the same expression profile. Previous studies have shown that SSA2 expression is not temperature dependent, while the SSB proteins had decreased expression when temperature was increased (Craig et al. 1985).

Schizosaccharomyces pombe is a basal member of the fungal phylum Ascomycota, as are the rest of the subphylum Taphrinomycotina (Ebersberger 2012), although Taphrinomycotina may not be a monophyletic grouping (Schoch 2009). Unlike other ascomycetes, which reproduce by producing ascospores, Schizosaccharomyces divide by medial fission (Nurse 1976); hence Schizosaccharomyces are known as the “fission yeasts”. S. pombe was originally used as a component in the traditional African sorghum beer “pito” in Ghana (N’guessan 2011) and was not used as a scientific model organism until 1950 (Leupold). Since then, S. pombe has been used as a model organism for various genetic studies (Mitchison 1970, Gutz 1974, Beach and Nurse 1981, Hagan and Hyams 1988, Matsuyama 2006, Kim 2010). It has not been used as widely as S. cerevisiae because it lacks easily controllable gene expression methods, although some recent progress has been made on developing effective methods of gene control that do not induce cellular stress (Zilio 2012).

The complete genome of S. cerevisiae was published in 1996 by Goffeau et al., and the genome of S. pombe was published by Wood et al. in 2002. Interestingly, the genome of S. cerevisiae is marked by a whole genome duplication event that led to the duplication of many genes (Wolfe 1997, Kellis 2004). Comparisons of S. pombe and S. cerevisiae are vital in understanding how HSP70s maintain similar functionality across great timespans, as these two species likely diverged around 425 million years ago (Berbee et al. 2007). These two species also have drastically different genomic arrangements. S. cerevisiae maintains 16 chromosomes and only 250 introns, whereas S. pombe maintains only 3 chromosomes but thousands of introns, which suggests these two species are experiencing very different selection pressures on a genomic scale. If these two species have been experiencing different genomic-scale selective pressures, we would expect that HSP70s might have been shuffled around and undergone significant sequence divergence, yet still remained functionally the same. Specifically, we hypothesized the following: 1.) The presence/absence of regulatory elements (HSE, intron/exon) has been unchanged. 2.) Amino acid sequences have diverged significantly (~5%). 3.) Presence/absence of signal peptides or transmembrane proteins remained unchanged. 4.) Local gene neighborhood synteny has been lost. 5.) 3-dimensional structures have remained unchanged. 6.) The nucleotide binding site has remained functionally unchanged in the lhs1 HSP70 orthologs.


Sequence collection and ortholog detection

Orthologous sequences were detected by first obtaining known HSP70 sequences from S. cerevisiae S288 and then searching known yeast sequences against the S. pombe genome using BLASTp. We considered proteins that had greater than 25% identity to each search sequence to be potential orthologous sequences. We retrieved those potential orthologs and constructed a preliminary tree to make sure that all possible orthologs had been found in S. pombe. Orthologs that had not yet been found in S. pombe but were found in S. cerevisiae were then searched against the genomic sequence of S. pombe using the corresponding S. cerevisiae protein sequence in tBLASTn.
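The identity cutoff described above can be sketched as a small filter over BLAST tabular output. This is a minimal Python illustration, not the script used in the study: it assumes the hits were saved in the standard 12-column tabular format (query, subject, percent identity, ...), and the function name and the example rows are invented purely for demonstration.

```python
import csv
import io


def filter_hits(blast_tabular: str, min_identity: float = 25.0):
    """Return (query, subject, percent_identity) for hits above the cutoff.

    Assumes tab-separated BLAST output whose first three columns are
    query ID, subject ID, and percent identity.
    """
    hits = []
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment headers
        query, subject, pident = row[0], row[1], float(row[2])
        if pident > min_identity:  # "greater than 25% identity"
            hits.append((query, subject, pident))
    return hits


# Invented example rows (not real results), 12 tabular columns each.
example_rows = [
    ["query1", "subjectA", "82.4", "640", "110", "2", "1", "640", "1", "638", "0.0", "1050"],
    ["query1", "subjectB", "21.7", "300", "220", "9", "5", "300", "10", "310", "1e-05", "48.1"],
    ["query2", "subjectC", "25.0", "410", "300", "6", "3", "410", "2", "405", "2e-20", "95.5"],
]
example = "\n".join("\t".join(r) for r in example_rows)

for hit in filter_hits(example):
    print(hit)
```

Only hits strictly above the cutoff are kept, so a hit at exactly 25.0% identity is excluded; whether the boundary should be inclusive is a choice the text does not specify.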

Phylogenetic analysis

S. cerevisiae and S. pombe protein sequences were aligned online using the MAFFT program (Katoh 2013) with G-INS-i parameters in order to achieve an optimal global alignment. Ortholog detection was done using the neighbor joining method and bootstrap method with pairwise p-distances of the amino acid (AA) sequences in the MEGA5.1 program (Felsenstein 1985; Saitou and Nei 1987; Tamura 2011). The best distance model for the protein data set was determined by using ProtTest 3.2 (Darriba 2011) and modeled AA trees were inferred using PhyML 3.0 and MEGA 5.1.
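For readers unfamiliar with the p-distance, it is simply the proportion of aligned sites at which two sequences differ. The Python sketch below illustrates the idea; it is not MEGA's implementation, the function name is hypothetical, and the choice to skip gapped columns (pairwise deletion) is an assumption, since the gap treatment used is not stated here.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (pairwise
    deletion); this gap handling is an assumption for illustration.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


# Toy aligned fragments, invented for the example.
print(p_distance("MSKAVGIDLG", "MSRAVGID-G"))  # 1 difference over 9 compared sites
```

A matrix of such pairwise values is what a neighbor joining tree is built from.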

Following AA based tree drawing, the AA alignment was converted codon-by-codon to an aligned, genomic CDS alignment using a lookup table so that the resultant nucleotide alignment matched the AA-based alignment. The best nucleotide model for this data set was determined using jModelTest (Guindon and Gascuel 2003; Posada 2008). Following model selection, maximum likelihood phylogeny was inferred using the PhyML 3.1 standalone version (Guindon 2010) with 100 bootstrap replicates. The amino-acid based tree and nucleotide-based tree were compared for topological differences.
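The codon-by-codon conversion can be pictured as threading each unaligned coding sequence through its aligned protein sequence. The Python sketch below is a simplified stand-in for the lookup-table approach described above, not the original script; the function name is hypothetical, and it assumes each CDS is in frame, begins at the first codon, and may or may not end with a stop codon.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Build a codon alignment that mirrors an amino acid alignment.

    Each residue in the aligned protein consumes the next codon of the
    CDS; each gap column becomes three gap characters.
    """
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    # Allow an optional trailing stop codon that has no residue.
    if len(codons) not in (n_residues, n_residues + 1):
        raise ValueError("CDS does not match the protein length")
    next_codon = iter(codons)
    out = []
    for aa in aligned_protein:
        out.append(gap * 3 if aa == gap else next(next_codon))
    return "".join(out)


# Toy example: protein "MK-T" aligned with one gap column.
print(thread_codons("MK-T", "ATGAAAACC"))  # ATGAAA---ACC
```

Because gaps are inserted in whole codons, the resulting nucleotide alignment has exactly three times the column count of the amino acid alignment, which keeps the two trees comparable.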

Exon-Intron Analysis

The NCBI gene database for S. cerevisiae and S. pombe was used to identify exons and introns within hsp70 genes. Genes with introns were analyzed using SPIDEY software to determine position of splice points in genomic DNA and intron phases.

Regulatory elements analysis

The TransFac Match program was used to search for the presence of Heat Shock Elements (HSE) within 1000 base pairs (bp) upstream of each hsp70 coding sequence. Matrix and core match scores above 85% were taken as strong evidence for an HSE with the nucleotide motif nGAAnnTTCnnGAAn. The sequences GAAnnTTC or TTCnnGAA were also considered as possible HSEs.
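The full and partial motifs named above can also be located with a plain pattern search. The Python sketch below swaps in exact regular-expression matching for TransFac's matrix scoring, so it is only an approximation of the analysis; the function name is hypothetical, only the given strand is scanned, and the test sequences are invented.

```python
import re

# 'n' in the motif means any nucleotide; lookaheads allow overlapping hits.
FULL_HSE = re.compile(r"(?=(GAA[ACGT]{2}TTC[ACGT]{2}GAA))")
PARTIAL_HSE = re.compile(r"(?=(GAA[ACGT]{2}TTC|TTC[ACGT]{2}GAA))")


def classify_hse(upstream: str) -> str:
    """Label an upstream region as having a 'full', 'partial', or no HSE.

    Exact-match stand-in for matrix scoring; scans one strand only.
    """
    seq = upstream.upper()
    if FULL_HSE.search(seq):
        return "full"
    if PARTIAL_HSE.search(seq):
        return "partial"
    return "none"


# Invented sequences for illustration.
print(classify_hse("ACGTGAAGCTTCTAGAATT"))  # full
print(classify_hse("CCCTTCAAGAACCC"))       # partial
print(classify_hse("ACGTACGTACGT"))         # none
```

Exact matching is stricter than an 85% matrix score, so a search like this could miss degenerate elements that the Match program would report.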

Synteny Analysis

Gene maps of each hsp70 were obtained through the NCBI database. Maps were used to compare the orientation of each gene in their respective genomes. Conservation of neighboring genes was also compared.

Protein characterization

SignalP 4.1 server software was used to predict possible signal peptides in HSP70 amino acid sequences using the eukaryotic organism setting. TMHMM 2.0 server software was used to predict whether the HSP70 proteins were membrane bound by detecting the presence of transmembrane helices. Conserved protein domains in HSP70s were searched for using the Superfamily HMM library and genome assignments server version 1.75.

Nucleotide binding site analysis

In order to determine if HSP70 proteins were functionally different or merely differed in their nucleotide sequences, the 3-dimensional structures of a subset (the lhs1 genes) were analyzed. The nucleotide binding site (NBS), a conserved site in the nucleotide binding domain (NBD) found in all hsp70 genes, was targeted. This region is responsible for binding ATP which allows the opening of the substrate binding domain (SBD), and so its function should remain highly conserved through time, particularly the NBS site which itself is a cleft within the NBD where ATP binds (Liu 2007).


Orthologous sequence detection

We found a total of 14 hsp70 family proteins in S. cerevisiae, but only found 8 orthologous sequences in S. pombe (Table 1). The ssa gene group was marked by a paralogous duplicate in S. cerevisiae not found in S. pombe (Figure 1), as were the ssb, sse, and mitochondrial groups (Figure 1). All gene names will be referred to as shown in Figure 1. For each of the cellular regions in which Hsp70 proteins were found in S. cerevisiae, S. pombe also had at least one ortholog. However, S. cerevisiae produced a markedly greater number of paralogs, although it maintains the same number of orthologs.

Phylogenetic analysis

The best model for the amino acid data set was found to be LG+I+F+G, although a JTT based NJ tree was also inferred for comparison. GTR+I+G was found to be the best model for the genomic CDS nucleotide data set. In the AA based data sets, there were several notable topological differences between the NJ, p-distance based tree and the NJ, JTT distance based tree, as well as between both NJ trees and the maximum-likelihood, LG+I+F+G based tree, although these differences were not usually supported by strong bootstrap values. There were no topological differences between the maximum-likelihood, AA based tree and the maximum-likelihood, nucleotide based tree.

Differences between phylogenetic methods were present but subtle, and not usually strongly supported (greater than 95% support) by bootstrap analysis. The fact that the maximum-likelihood trees inferred using the best amino-acid model (LG+I+F+G) and the best nucleotide model (GTR+I+G) produced identical topologies suggests that these trees are correctly representing the evolutionary history of these heat-shock proteins.

Intron analysis of hsp70s

The NCBI gene database revealed no introns within S. cerevisiae hsp70 genes. This was also the case in S. pombe except for pdr13-pombe, which contains two exons and one intron. SPIDEY analysis of pdr13-pombe revealed a 126 nucleotide, phase 2 intron located fairly close to the start codon, between positions 125-126 of the mRNA. In contrast, the S. cerevisiae ortholog ssz1-yeast contains no intron (Figure 3).
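Intron phase follows from simple arithmetic on the number of coding nucleotides that precede the intron. The Python sketch below illustrates this; the function name is hypothetical, and it assumes that the reported mRNA position 125 is counted from the first nucleotide of the start codon, which is not stated explicitly here.

```python
def intron_phase(coding_nt_before_intron: int) -> int:
    """Return the intron phase (0, 1, or 2).

    Phase 0: the intron falls between two codons.
    Phase 1: it falls after the first nucleotide of a codon.
    Phase 2: it falls after the second nucleotide of a codon.
    """
    if coding_nt_before_intron < 0:
        raise ValueError("nucleotide count cannot be negative")
    return coding_nt_before_intron % 3


# An intron between coding positions 125 and 126 has 125 coding
# nucleotides upstream of it (assuming numbering starts at the ATG).
print(intron_phase(125))  # 2
```

Under that numbering assumption, 125 leaves a remainder of 2 when divided by 3, which agrees with the phase 2 assignment reported by SPIDEY.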

Regulatory elements analysis

TransFac analysis revealed full HSE regions containing the GAAnnTTCnnGAA sequence in the ssa1-yeast, kar2-yeast, ssa1-pombe, and ssa2-pombe genes. Both ssa1 genes from S. cerevisiae and S. pombe contained HSEs approximately 300 bp upstream of the start codon (Figure 4). All other hsp70 genes contained either no HSEs or only partial HSEs with either a GAAnnTTC or TTCnnGAA motif. The ssa2-yeast, ssa4-yeast, ssb2-yeast, sse1-yeast, sse2-yeast, bip-pombe, ssc1-pombe, pdr13-pombe, and pss1-pombe genes all contain partial HSEs. Ecm10-yeast, lhs1-yeast, ssa3-yeast, ssb1-yeast, ssc1-yeast, ssz1-yeast, ssq1-yeast, lhs1-pombe, and sks2-pombe contained no HSEs. The mitochondrial ssc1-pombe contains a partial HSE approximately 200 bp upstream of the start codon, whereas its ortholog ecm10-yeast contains no HSE. The same pattern is seen in the intron-containing pdr13-pombe and its ortholog ssz1-yeast. The endoplasmic reticulum kar2-yeast was found to have a full HSE, whereas its ortholog bip-pombe has only a partial HSE. The sse2-yeast and pss1-pombe genes both contain two partial HSEs (Figure 4).

Synteny Analysis

Comparison of gene maps for ssa1 and ssa2 orthologs revealed few syntenic relationships (Figure 5). Though the ssa1 genes showed the same orientation, the ssa2 genes were found to be in opposite orientation. There were no similarities in the neighboring genes between ssa1 and ssa2 orthologs. The same lack of syntenic relationships was observed for the other hsp70 genes (data not shown).

Protein Characterization

Lhs1-yeast, Lhs1-pombe, and Bip-pombe were predicted to have signal peptides located in their amino-terminus region. The Bip-pombe ortholog Kar2-yeast was not predicted to have any signal peptides (Figure 6). SignalP software identified possible cleavage sites for Lhs1-yeast and Lhs1-pombe between amino acid positions 20-21 and positions 21-22 respectively. The cleavage site for Bip-pombe was predicted to be between positions 24 and 25. No other HSP70 sequences were predicted to have a signal peptide. Bip-pombe, Kar2-yeast, Lhs1-pombe, and Lhs1-yeast sequences contained possible transmembrane regions in their N-terminus regions (Figure 7). Bip-pombe had transmembrane helices at positions 7-24. Kar2-yeast and Lhs1-yeast both contained helices at positions 7-29. Lhs1-pombe contained helices near the N-terminus region with a 0.57 probability score. All other proteins were predicted to be non-membrane bound. A Superfamily web database search revealed conserved ATPase domains at the N-terminus of ssa1-yeast (Figure 8). Ssa1-yeast also contained an HSP70 domain as well as the C-terminal HSP70 domain. This pattern was seen in nearly all other HSP70s. The notable exceptions were Lhs1-yeast and Lhs1-pombe. Lhs1-yeast lacked the HSP70 domain, while Lhs1-pombe was missing both the HSP70 domain and the HSP70 C-terminal sub-domain.

Nucleotide binding site analysis

We found that overall, the structures of LHS1 in S. cerevisiae and S. pombe were nearly identical, and most visual differences were located on the loop portions of the amino acid sequence (Figure 9). When looking at differences between the NBS in both species, we found that it did differ in its amino acid composition, but structurally the two sites were also nearly identical. The differences in amino acid sequence may result in functional differences between these two proteins, as several amino acids changed in charge and polarity (Figure 10) between these two species.


S. cerevisiae and S. pombe are two fungi separated by millions of years of evolution as well as a complete genome rearrangement, and they exhibit markedly different life histories. S. cerevisiae is a budding yeast and so it is likely under less selective pressure for rapid DNA replication and maintains 16 chromosomes. Conversely, S. pombe is likely under great selective pressure for its DNA to rapidly condense to chromosomes, replicate, and then quickly separate to the poles so that fission can occur, and so maintains only 3 chromosomes. This replication strategy can leave S. pombe vulnerable during its replication process, whereas S. cerevisiae is relatively unaffected. We anticipated that these different selective pressures might have caused large differences in the location of hsp70s in both of these species’ genomes, while the actual proteins themselves maintained structural and functional similarities.

When we compared the homologous HSP70s of S. pombe to S. cerevisiae, we found that S. cerevisiae usually contained more hsp70 genes than S. pombe. In some cases, for instance, in the mitochondrial based hsp70s, the divergences between both paralogs and orthologs are high. It is likely that these mitochondrial HSP70s represent ancient origins. However, it’s uncertain whether or not these genes arose before the genome duplication in S. cerevisiae, as we cannot determine whether they were lost in S. pombe or gained in S. cerevisiae from the data presented here. We can say, though, that since the divergences between these mitochondrial HSP70s are high, it’s not likely they are undergoing concerted evolution. Conversely, in the SSA complex of HSP70s, for example, paralogous pairs of HSP70s appear to be undergoing concerted evolution, as paralogs always exhibit almost zero sequence divergence, although we cannot eliminate the possibility that each of these paralogous pairs arose only recently with the data presented here. In order to better answer these questions, more fungal species must be added to the protein family tree of orthologs.

We found that, after modeling the evolutionary distances, the HSP70s of these two species had undergone large amounts of amino acid sequence divergence. In some cases, as in the LHS1 orthologous pair, the pairwise distance exceeds 100%. These findings suggest that these genes have undergone large numbers of mutations in the same locations (multiple hits), and that it is possible these distances have reached saturation so that no additional phylogenetic information might be garnered from comparing these sequences. This suggests that even though HSP70 genes are conserved across all three domains of life, they might actually be poor phylogenetic markers.
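A modeled distance above 100% is possible because corrected distances estimate substitutions per site, including repeated changes at the same position, rather than the observed fraction of differing sites. The Python sketch below uses the simple Poisson correction, d = -ln(1 - p), to show the effect; this is a deliberately simpler stand-in for the LG+I+F+G model used in the analysis, and the function name and the p values are chosen only for illustration.

```python
import math


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance (expected substitutions per site).

    p is the observed proportion of differing sites; the correction
    accounts for multiple hits at the same site.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in the range [0, 1)")
    return -math.log(1.0 - p)


# Illustrative p values only; these are not measurements from the study.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"observed p = {p:.1f} -> corrected d = {poisson_distance(p):.3f}")
```

As p approaches 1 the corrected distance grows without bound, so small changes in the observed difference produce large swings in the estimate. That instability is the practical meaning of saturation.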

After determining that large amounts of sequence divergence had occurred, we then investigated whether or not these significant changes in amino acid sequence were associated with actual functional changes in these proteins. We did not find any significant differences in intron/exon structure save for pdr13-pombe. The presence of a single intron in pdr13-pombe could be attributed to S. pombe having nearly 20 times the number of introns compared to S. cerevisiae. One could reasonably expect that the large number of introns would increase the possibility of gaining introns in hsp70 genes even if the ancestral state was to not have introns. It is also possible that the intron corresponds to the regulatory sequence for a neighboring gene due to the compact nature of the S. pombe genome. Again, the addition of more fungal species to this analysis is necessary to truly assess the ancestral state of introns/exons in these two species and decide whether or not the similarity in the current state is due to convergent evolution or homology.

Analysis of the HSEs in each gene revealed surprising results. The ssa1 orthologs contained full HSEs, which contradicts the literature stating that they are not heat-inducible. Ssa2-yeast is likewise not heat-inducible, yet ssa2-pombe contains a full HSE. It is interesting to note that while the homologs ssa3-yeast and ssa4-yeast contain only partial HSEs, they have nevertheless been found experimentally to be heat-inducible (Boorstein and Craig, 1990; Werner-Washburne et al., 1989). It is possible that ssa3-yeast and ssa4-yeast retained their heat-inducible nature because their partial sequences are still sufficient for binding. It is also possible that, because S. cerevisiae has 4 ssa homologs, ssa1-yeast and ssa2-yeast lost their heat inducibility over time as the other genes compensated for the loss. Ssa1-pombe and ssa2-pombe may indeed be heat-inducible: because S. pombe lacks a third and fourth ssa gene to compensate for such a loss, retaining heat inducibility would be important. This would also suggest that while the S. pombe genes are more closely related by sequence to their S. cerevisiae orthologs, functionally they may be more similar to SSA3-yeast and SSA4-yeast. Another point to note is that, in genes that had HSEs, the HSE sequence motif was positioned within 300 bp upstream of the coding sequence. This suggests that the position of HSEs is evolutionarily conserved between the two fungi, though the actual HSE sequence motif might have diverged. Overall, we found the presence or absence of HSEs to be highly conserved through time, which we believe supports our hypothesis that these proteins are exhibiting similar functions.
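
The kind of upstream scan discussed above can be sketched as follows. This is a hedged illustration, not the procedure used in this study: it assumes the commonly cited HSE consensus of alternating, inverted nGAAn units, treats three contiguous units as a "full" HSE and two as a "partial" HSE, and uses a 300 bp window taken from the observation above. The pattern definitions and the function name classify_hse are assumptions introduced for the example.

```python
import re

# Assumed consensus: alternating inverted repeats of the 5-bp unit nGAAn.
# Three contiguous units (either orientation) are treated as a "full" HSE,
# two contiguous units as a "partial" HSE.
FULL_HSE = re.compile(r"GAA[ACGT]{2}TTC[ACGT]{2}GAA|TTC[ACGT]{2}GAA[ACGT]{2}TTC")
PARTIAL_HSE = re.compile(r"GAA[ACGT]{2}TTC|TTC[ACGT]{2}GAA")


def classify_hse(upstream, window=300):
    """Classify the region immediately upstream of a coding sequence.

    `upstream` is the 5'-to-3' sequence ending just before the start codon.
    Only the last `window` bases are searched. Returns 'full', 'partial',
    or 'none'.
    """
    region = upstream.upper()[-window:]
    if FULL_HSE.search(region):
        return "full"
    if PARTIAL_HSE.search(region):
        return "partial"
    return "none"
```

Because the reverse complement of GAAnnTTCnnGAA is TTCnnGAAnnTTC, searching for both orientations on one strand also covers motifs on the opposite strand. A scan like this only reports the presence of a motif; as the ssa3-yeast and ssa4-yeast results show, heat inducibility still has to be confirmed experimentally.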

When we analyzed the position of signal peptides, we found that all of the cytosolic and mitochondrial HSP70s lacked a signal peptide sequence. The ER proteins Bip-pombe, Lhs1-yeast, and Lhs1-pombe were predicted to have a signal peptide sequence in the N-terminal region followed by a cleavage site. This motif is consistent with other ER proteins as well as with the experimental data (Baxter et al., 1996). It is interesting to note that SignalP did not predict a signal peptide sequence for the Kar2-yeast protein even though one was predicted in its S. pombe ortholog. The positive signal peptide scores within the first 40 amino acids of Kar2-yeast nevertheless suggest a possible signal sequence (Figure 6). Although the software could not confidently predict a signal peptide, Kar2-yeast has been experimentally verified to contain a signal peptide that is cleaved off after transport into the ER (Normington et al., 1989). This discrepancy could be explained by the presence of an extra 15 amino acids within the signal peptide region of Kar2-yeast that are not found in Bip-pombe. These additional amino acids could be the reason SignalP did not predict a signal peptide for Kar2-yeast.

The TMHMM software detected transmembrane regions within the N-terminus of the ER proteins. Since these predicted transmembrane helices are located within the signal peptide, we suggest that these regions could help in the transport of the protein to the ER. None of the cytosolic or mitochondrial HSP70s were predicted to have transmembrane helices. Though prediction of protein characteristics using web software was accurate for almost all of the proteins, these results underscore the importance of verifying such predictions experimentally.

We also analyzed the 3-dimensional structure of these HSP70s and found that, for the most part, these proteins were structurally nearly identical, although we did not conduct an electrostatic surface analysis, which might detect more subtle changes in structure. We did notice that the lhs1 genes of S. cerevisiae and S. pombe, which exhibited the highest sequence divergence between orthologs, also diverged structurally, both from the remainder of the HSP70 family proteins analyzed here and from each other. This is not surprising given that analysis of conserved protein domains showed identical patterns across all proteins analyzed except the LHS1 protein group. However, when we focused on the nucleotide binding site of Lhs1, we did not see any obvious structural change there, although we did note potentially important changes in amino acids (e.g. some amino acids shifted from negative to positive charge, hydrophobic to non-polar, etc.). Most of the structural changes were observed in the loops rather than in the helices or sheets of the proteins, but electrostatic potential analysis of these proteins is needed to predict more accurately whether they might be functionally different.
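
As a rough illustration of how amino acid changes of the kind noted above could be flagged, the sketch below compares two aligned sequence segments and reports the positions where the residue moves from one physicochemical class to another. The four-class grouping is one common textbook simplification chosen for this example (histidine, for instance, is only partially charged at physiological pH), and the function name property_changes is hypothetical; neither is taken from the analysis in this paper.

```python
# Assumed, simplified residue classes (one-letter codes) for illustration.
RESIDUE_CLASSES = {
    "negative": "DE",
    "positive": "KRH",
    "polar": "STNQCY",
    "nonpolar": "AVLIMFWPG",
}
CLASS_OF = {res: cls for cls, residues in RESIDUE_CLASSES.items() for res in residues}


def property_changes(seg_a, seg_b):
    """Return substitutions between two aligned segments that change class.

    Each result is (position, residue_a, residue_b, class_a, class_b), with
    1-based positions. Gaps and unknown characters are skipped.
    """
    if len(seg_a) != len(seg_b):
        raise ValueError("segments must be aligned to the same length")
    changes = []
    for pos, (a, b) in enumerate(zip(seg_a.upper(), seg_b.upper()), start=1):
        if a == b or a not in CLASS_OF or b not in CLASS_OF:
            continue
        if CLASS_OF[a] != CLASS_OF[b]:
            changes.append((pos, a, b, CLASS_OF[a], CLASS_OF[b]))
    return changes
```

A listing like this only highlights candidate sites, such as a negative-to-positive switch near the nucleotide binding site. Whether such a change alters function would still depend on the structural context and on the electrostatic analysis called for above.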

Our study has shown that the HSP70s of S. cerevisiae and S. pombe are divergent at the sequence level but highly conserved at the functional and structural protein level. Some of the discrepancies found in our analysis (e.g. HSEs and signal peptides) underscore the importance of experimentally verifying each protein. Many of the S. pombe sequences used in this study were inferred from the S. pombe genome sequencing project. It is possible that different regulatory characteristics or novel protein functions will be discovered as further experiments are done on S. pombe HSP70s. In conclusion, we believe that while these genes are experiencing extremely high rates of mutation and shuffling throughout their respective genomes, they are also under equally strong purifying selection, which acts on those mutations to maintain conserved structure and function in these sequences.


Literature Cited/Figures available upon request.