An Annotated Review of Joseph Le Conte’s “Elements of Geology (1896)”: Part 1

Joseph LeConte (February 26, 1823 – July 6, 1901), American geologist, conservationist, and professor at UC Berkeley, made significant contributions to the science of geology following the era of Charles Lyell and Charles Darwin. Importantly, this era represents the major shift in scientific thought towards the principle of Uniformitarianism – that the natural processes we observe today have been in effect on the Earth for millennia. This allowed scientists to make inferences about past events based on their knowledge and observations of current events, or events and processes described by other scientists. It is for this reason I’ve decided to review as much as possible of LeConte’s major work on geology – “The Elements of Geology” – in an attempt to understand 1.) how the science of geology has changed over the past 120 years since the book’s inception and 2.) how, in general, scientific thinking has changed over the past century. How did professors and scientists formulate arguments, describe complex ideas, develop new theories, and present evidence in an age where the computer had not yet been invented? Hopefully this series of posts will shed some light on how scientific thinking and communication strategies have changed over the course of human development.

LeConte begins the Elements with a brief introductory chapter that lays out his framework for the study of geology. Here, LeConte outlines the three principal departments of geology by using analogies to organic science – what we might today call the life sciences or biological sciences. He relates structural geology to the study of anatomy, dynamical or chemical geology to the study of physiology, and historical geology to the study of embryology or developmental biology. Interestingly though, in the second half of his introduction, he highlights a key difference between the two sciences.

But there are two important points of difference between geology and organic science. The central department of organic science is physiology, and both anatomy and embryology are chiefly studied to throw light on this. But the central department of geology, to which the others are subservient, is history. Again : in case of organisms – especially animal organisms – the nature of the changes producing development is such that the record of each previous condition is successively and entirely obliterated ; so that the science of embryology is possible only by direct observation of each successive stage. If this were true also of the earth, a history of the earth would, of course, be impossible. But, fortunately, we find that each previous condition of the earth has left its record indelibly impressed on its structure.


Elements of Geology, pg. 2, Joseph LeConte, D. Appleton and Company, 1896


Namely, LeConte emphasizes that the central tenet of the organic sciences (biology) is the understanding of physiology through the lenses of anatomy and embryology. Conversely, he states that the central tenet of geology is historical geology (the organic equivalent being embryology), and that the history of the earth can only be studied through the lenses of structural and dynamical geology. Interestingly, LeConte alludes here to a principle that might have been known at the time, though its specific mechanism was largely a mystery – that biological organisms selectively kill off cells and tissue during various stages of embryonic development, or even post-embryonic development, through specific biochemical pathways that have evolved to activate at specific times during an organism’s life cycle. In this way, LeConte is correct – biology tends to eliminate its past completely through selective pruning, whereas geology has no pruning mechanism (save, perhaps, volcanic processes). In other words, in LeConte’s view, geological processes tend to preserve more historical information than organic ones.

Furthermore, another interesting facet of science is touched on in this passage. Famously, Ernst Haeckel wrongly concluded from his embryological observations that “Ontogeny recapitulates phylogeny”. That is, the entire evolutionary history of an organism is played out during the embryological development of that organism (e.g., from fish to mammal in the development of a human fetus). This was later shown to be an incorrect conclusion: developmental processes only retain traits or phases as they are relevant to the evolutionary fitness of an organism, and so the presence or absence of developmental phases (e.g., a “fish” phase or a “tail” phase) during embryological development reflects the steps needed to produce a healthy, functioning organism rather than specifically retaining each step of the organism’s evolution in development. That is to say, entire phases of embryological development might be lost or gained regardless of evolutionary history.

Part of the thinking that led to the widespread belief that “Ontogeny recapitulates phylogeny” went in line with the tendency for early scientific thinkers to occasionally, or frequently, embrace teleological thinking – the process of describing scientific processes in terms of their apparent goal – which led to the anthropomorphic description of many processes later shown to be undirected (e.g., Darwinian selection and dynamical geological processes). Interestingly, LeConte’s statement that organic processes obliterate information seems to directly contradict the idea that evolutionary history is reflected during embryonic development, making him a possible early adopter of the more rigorous Darwinian lines of thinking regarding the type of information that embryonic development actually portrays.

In concluding his introduction, LeConte describes the prime objective of geology “as the history of the earth and its inhabitants, as revealed in its structure, and as interpreted by causes still in operation”. This is an interesting “prime objective” of geology and it might find itself at odds with modern interpretations of the geological sciences. Although the study of broad patterns in the history of life and macroevolution is still relegated to geology, in that these patterns tend to pertain to the major geological epochs described in earth’s history, the majority of specific scientific understandings for the “inhabitants” of earth have shifted into their own sciences: paleontology (the study of extinct organisms) and neontology (the study of extant, still living organisms). In this sense, modern students of geology may be confused by LeConte’s introduction and find it strange to learn that they are about to have in-depth discussions regarding the evolutionary history of life on earth, but LeConte might respond that the two budding sciences are still deeply intertwined and so should be studied together.

Next up will be LeConte’s introduction to Dynamical Geology – the science of the active processes of geology as they can be observed in modern times. Stay tuned!

The Evolution of Eusociality in Insects

Epigenetics in Social Insects: A New Direction for Understanding the Evolution of Castes

Originally written April, 2012 by Bryan White

Article 1 Source:

Epigenetics is a new field of biology that deals with, among other mechanisms, a recently discovered method of gene inactivation called DNA methylation. DNA methylation is the process in which sections of DNA are methylated; it primarily occurs on cytosines, although it could occur on other nucleotides. In this paper, the current state and understanding of DNA methylation and how it relates to the development and evolution of insect castes (particularly in the eusocial insect groups) is reviewed. Methylation is not the only possible epigenetic mechanism. Histone acetylation (the addition of acetyl groups) is also possible, as well as ubiquitination (the addition of the ubiquitin protein). However, DNA methylation is probably the most common. The end result of DNA methylation is the existence of a secondary language on top of the DNA language that can be modified by environmental factors, can be passed on to the next generation, and can influence the development of offspring. DNA methylation can also have an evolutionary effect by increasing the rate of mutations in genes that are methylated for multiple generations, because genes that are inactivated can accumulate stop codons and other deleterious mutations. Based on this, the authors hypothesize that DNA methylation is potentially the primary method for caste selection in eusocial insects.

Epigenetics brings a whole new aspect to the table for understanding how castes evolved, and how castes are regulated (should a larva develop into a queen or worker?) in eusocial systems. In hymenopteran eusocial species, there is typically a vast amount of physical diversity amongst castes (workers, soldiers, queens and male drones), and researchers have found it hard to explain this diversity using only genetic methods. This is largely because it is well known that what a larva will develop into is determined environmentally, but scientists do not have a clear idea of exactly how that developmental “decision” is enforced. Epigenetics stands as a good explanation for how environmental factors can influence larval development, and the authors suggest this is probably carried out by DNA methylation genes such as DNMT3, which Drosophila happens to lack and which was therefore thought unimportant. The direct connection between the expression of DNMT3 and the genes that are methylated is a new, expanding area of research.

Another one of the difficulties in understanding the evolution of eusociality has been trying to explain its evolution in terms of kin selection, specifically the observation that sisters in haplodiploid species are on average 75% related to one another, compared with 50% in diploid species. The benefits of a haplodiploidy system as an example of kin selection theory were that it provided a strict means for the regulation of sexual dimorphism (males are made up of only the queen’s genome) and suggested some involvement in the development of castes. However, epigenetics and DNA methylation offer a much better explanation for the existence of both large amounts of sexual dimorphism and phenotypic plasticity. DNA methylation has been found in many eusocial hymenopteran species, as well as primitively social hymenopterans, suggesting that DNA methylation is both a heavily conserved trait and is correlated with sociality, phenotypic plasticity and sexual dimorphism. Better understanding the phylogenetic location of insect groups that make use of DNA methylation can probably help answer the question of whether or not DNA methylation is the sole (or primary) source of caste determination.
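The 75% figure usually quoted for sisters in haplodiploid species can be worked through as simple arithmetic. The sketch below is only an illustration under textbook assumptions (full sisters, a singly mated mother, no inbreeding); the function name is made up for this example and does not come from the paper under review.

```python
from fractions import Fraction

def sister_relatedness(haplodiploid: bool) -> Fraction:
    """Expected fraction of a female's genome shared with a full sister.

    Assumes a singly mated mother and no inbreeding (illustrative only).
    """
    half = Fraction(1, 2)
    # Half of a daughter's genome comes from the mother. Two daughters share
    # any given maternal allele with probability 1/2 in either system.
    maternal = half * half
    # The other half comes from the father. A haploid father passes on his
    # entire genome, so sisters always share it; a diploid father passes on
    # a random half, so sisters share a given paternal allele half the time.
    paternal = half * (Fraction(1) if haplodiploid else half)
    return maternal + paternal

print(sister_relatedness(True))   # 3/4
print(sister_relatedness(False))  # 1/2
```

The difference between the two systems comes entirely from the paternal half of the genome, which is why haplodiploidy was long seen as favoring cooperation among sisters.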

The authors also attempt to lay out a conceptual framework for future studies; however, I found their model unclear. What the authors seem to suggest is that eusociality is correlated with DNA methylation, but that methylation is not a requirement. They do, however, do a good job outlining the specific areas of DNA methylation that need to be explored and understood to eliminate other possible explanations for the correlation between DNA methylation and eusociality, such as understanding the mechanistic effects that DNA methylation has on gene splicing and whether or not it is possible for eusocial insects to exhibit caste differentiation without DNA methylation genes.

Article 2 Source:

In this article the researchers hypothesize that up until this date, all progress on kin selection theory has largely been abstract in nature and not provided any concrete evidence for the theory. They argue that, in order for kin selection theory to be fulfilled in an empirical system, several stringent conditions must be met.

First, all interactions that are measured must be “additive and pairwise”, that is, they must only affect the pair of individuals involved in the interaction. This means that synergistic effects, such as the simultaneous cooperation of more than two individuals, are unable to be measured or incorporated into any mathematical model of kin selection.

Second, they argue that kin selection theory can only be applied to a very limited subset of population structures due to the requirement of global updating of interactions. Global updating is the idea that any two individuals are competing uniformly for reproduction regardless of their geographic proximity to each other.

Third, they argue that these two requirements can only be met in some limited, artificial world, and that when they are met, the organismal interactions within that world are also acting according to the conditions of natural selection theory, so kin selection theory does not provide any additional biological information.

Finally, the authors also argue that the apparent simplicity of kin selection theory compared to that of natural selection theory is an illusion. Since the primary component of kin selection theory is the calculation of inclusive fitness, and the calculation of inclusive fitness requires the state of “all individuals whose fitness is affected by an action, not only those whose payoff is changed” to be known, then in effect kin selection theory requires the same information as natural selection theory: the state of all individuals affected, rather than only those whose payoff (fitness) is increased.
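To make the “additive and pairwise” condition and the information requirement more concrete, here is a minimal sketch of an additive inclusive fitness calculation. The Hamilton-style sum used here (a cost to the actor plus relatedness-weighted benefits to each recipient) is a standard textbook form and is an assumption of this example, not the authors’ own model; the function name and the numbers are hypothetical.

```python
def inclusive_fitness_effect(cost, recipients):
    """Additive inclusive fitness effect of a single action.

    cost:       fitness cost paid by the actor
    recipients: list of (relatedness, benefit) pairs, one entry for
                every individual whose fitness the action affects
    """
    # Each recipient contributes independently (relatedness * benefit);
    # a purely additive sum like this has no term for synergy between
    # three or more cooperating individuals.
    return -cost + sum(r * b for r, b in recipients)

# Hypothetical numbers: a worker pays 1 unit of fitness to give
# 1 unit to each of two sisters (relatedness 0.75).
effect = inclusive_fitness_effect(1.0, [(0.75, 1.0), (0.75, 1.0)])
print(effect > 0)  # True
```

Note that the function cannot be evaluated without listing every affected individual, which is the point made in the final criticism above.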

In order to overcome the limitations imposed by kin selection theory, the authors propose a general, multi-level model of natural selection theory using only the general principles of population genetics. This model is used to explain how eusociality might evolve in five distinct evolutionary stages.

First, an organism must reach a state where there are clear groups within a population. Groups typically form around resources, nest sites, when parents and offspring stay together, or when flocks go to known breeding grounds.

Second, these groups begin to accumulate traits, otherwise known as pre-adaptations, that will increase the overall cohesion and cooperation of these groups. One such pre-adaptation is when a parent places large numbers of paralyzed prey around her eggs so that when the eggs hatch they will have a food source readily available, and then she moves on to create another nest. The next step towards eusociality would be for the parent to stay near the nest and guard the eggs until they are hatched. However, at this stage, the offspring will still leave the nest and so will the parent –  there is still dispersion.

Third is the evolution of clearly eusocial alleles, that is, traits that enforce the primary traits of eusociality. The key traits here are for individuals to stay in the nest instead of dispersing, and then other cooperative pre-adaptations can come into play.

Fourth is probably what can be called the optimization stage in which these eusocial alleles can be selected upon to reinforce the nest/colony structure.

Fifth is the final phase, in which selection operates on the colonies instead of the individual organisms and more derived traits evolve, such as castes (workers/soldiers), fungal farming, aphid farming, and other highly cooperative activities. Here the authors have outlined the framework through which future studies can be conducted, which will most likely be a combination of behavioral ecology and phylogenetics. My only criticism of this paper is the authors’ use of the words “primitive” and “advanced”, which are common misnomers in evolutionary biology. Better terms would be “less derived” or “more derived”, in reference to the ancestral state. For instance, the caste system of most ants is more derived compared to the loose grouping structure of some wasps.

Europa and Ceres – Two Inter-Solar-System Bodies that May Contain Oceans of Liquid Water


Europa, one of Jupiter’s largest moons, is considered to be one of the most likely places within the solar system that might harbor life. Europa possesses a great number of characteristics that might lend themselves to the independent evolution of life, similar to what occurred on Earth. In this essay, I will outline some of those key characteristics and highlight where and why they might suggest Europa is a potential breeding ground for, at the very least, microbial organisms undergoing Darwinian selection.


Layers of Europa’s Crust. Public Domain by Latitude0116 and RP88. Wikimedia Commons.

One of the primary characteristics of Europa that suggests it might harbor life is the presence of a water-ice crust (that is, instead of a rocky crust like on Earth, Europa has a crust made up of frozen water-ice). The presence of frozen water-ice in and of itself, however, is not a major astrobiological finding. More importantly, it is hypothesized that a liquid ocean of water exists beneath the water-ice crust of Europa, warmed by a likely volcanically active iron-nickel core. This liquid ocean is most likely trapped between a rocky mantle and the frozen water-ice crust, forming a bubble where temperatures are warm enough to allow liquid water to exist, with the help of high levels of salts. Evidence for a liquid ocean beneath the frozen crust has been identified by the Hubble Space Telescope in the form of water vapor jets (cryogeysers) erupting from the surface of Europa. This suggests that the ocean is under pressure, most likely created by the heat generated in Europa’s core and rocky layers acting on the liquid ocean trapped beneath the ice.

Heat is most likely generated in the core and rocky layers due to tidal flexing, that is, the gravitational pull of Jupiter causes deformation in the metallic core and the rock and ice layers. This deformation bends crystalline structures, which generates heat. This heat is most likely enough to allow the liquid ocean layer to persist beneath the icy crust.

Europa’s ocean is hypothesized to contain a high level of dissolved “sea salt” (sodium chloride), which would help keep it liquid at low temperatures and present an oceanic environment similar to Earth’s. However, because the concentration of sea salt in Europa’s ocean is theorized to be so high, only extreme halophilic, bacteria-like organisms could survive such conditions. With a surface temperature of -171 degrees Celsius, and a salt concentration significantly higher than Earth’s ocean, this seems like a plausible conclusion. However, this leaves open the possibility that pockets of warmer water, or haloclines (areas of lesser or greater salt concentrations), may provide environments for more complex life forms to exist.

The search for evidence of life on Europa continues with NASA’s Europa Multiple-Flyby Mission, which will conduct multiple, low-angle flybys of Jupiter’s moon Europa. Interestingly, the EMFM probe will possess an ice-penetrating radar, which should allow scientists to take a closer look beneath the surface. Unfortunately, we will not see the results from this mission until, at the earliest, 2026. Until then we will have to rely on near-earth telescope data and the image data that other probe missions have produced.


Ceres, unlike Europa, is not a moon – it is a dwarf planet. Interestingly, it is the only dwarf planet that makes its home within the inner solar system. Specifically, Ceres orbits the sun among the asteroids of the main asteroid belt, between Mars and Jupiter. Similar to Europa, however, Ceres sports a multi-layered crust that houses a large body of frozen water-ice. Unfortunately, it is not currently known whether or not any of the water on Ceres is still liquid. However, Ceres poses an interesting conundrum for astrobiology. Since it is a member of the inner solar system, it stands as one of the possible originating points for life in the solar system. How could life have evolved on this cold, icy rock that is similar in shape to Pluto, though considerably smaller? Well, most likely, due to Ceres’ small size, it would have cooled and formed a solid crust much earlier than the Earth (4.5 billion years ago). If Ceres cooled enough to have a stable atmosphere (albeit a small one due to its small gravity), then the organic chemical reactions needed to produce complex nucleic acids, proteins, and lipid structures may have begun much earlier than they would have on Earth.


Ceres Structural Layers. Public Domain by NASA/JPL. Wikimedia Commons.

The next step would have been for some asteroid or comet to impact Ceres and carry any proto-bacteria-type life forms with it, all the way to Earth. According to this hypothesis, Ceres would have been the “founder” of life in the solar system, giving rise to the earliest forms of bacteria that populated an early Earth. Of course, conditions on Ceres would not have remained favorable to life for very long (in geological time), so any life forms that did evolve on Ceres would not likely have evolved much further than simple bacteria. In that sense, Ceres might be a good place to look for early signs of bacterial life, but we shouldn’t expect to find much more than that.

Comparing structural and functional elements of orthologous HSP70s in the fission yeast Schizosaccharomyces pombe and the budding yeast Saccharomyces cerevisiae

This is a research article I did on the heat shock proteins of two species of yeast in 2013.


Seventy-kD Heat Shock Protein (HSP70) is a multigene family of proteins that is important for cellular stress response and survival (Lindquist 1988). The HSP70 proteins are approximately 70 kDa in size and are highly conserved across all three domains of life (Eukarya, Bacteria, and Archaea). These genes are either constitutively expressed or heat inducible (Lindquist and Craig 1988). HSP70s are a family of ATPases that contain an N-terminal Adenosine Triphosphatase domain (aka. nucleotide binding domain, NBD), a substrate binding domain (SBD), and a C-terminal domain of varying length. These proteins are involved in the transport of proteins across membranes as well as protein folding in a cell (Hartl and Hayer-Hartl 2002). The role of HSP70s in protein folding is important for cell survival during heat shock stress. Higher temperatures can lead to protein misfolding and subsequent aggregation within the cell. HSP70s bind denatured or abnormal proteins via the exposed hydrophobic regions to prevent aggregation (Finley et al. 1984). Binding of these proteins also facilitates refolding into the proper conformation (Wegele et al. 2004). The structure and function of HSP70s are well studied in the yeast Saccharomyces cerevisiae.

Saccharomyces cerevisiae is a single-celled fungus that is used in applications such as beer brewing and bread making. Because the organism has important commercial uses, it has been subject to extensive study. S. cerevisiae has also been used as a model organism to study the function and structure of eukaryotic cells. Like other organisms, S. cerevisiae contains many HSP70 genes. A total of 14 HSP70 genes have been discovered, and they are grouped by sub-cellular location. SSA1-4, SSB1-2 and SSE1-2 are HSP70s that reside in the cytosol (Lindquist and Craig 1988; Mukai et al. 1993); SSC1, SSQ1 and ECM10 are mitochondrial HSP70s (Voos et al. 2002); and Kar2 and LHS1 reside in the endoplasmic reticulum (Normington et al. 1989; Saris et al. 1997). Although some hsp70 genes are heat-inducible, not all in the family share the same expression profile. Previous studies have shown that SSA2 expression is not temperature dependent, while the SSB proteins had decreased expression when temperature was increased (Craig et al. 1985).

Schizosaccharomyces pombe is a basal member of the fungal phylum Ascomycota, as are the rest of the subphylum Taphrinomycotina (Ebersberger 2012), although Taphrinomycotina may not be a monophyletic grouping (Schoch 2009). Unlike other Ascomycetes, which reproduce by producing ascospores, Schizosaccharomyces divide by medial fission (Nurse 1976), hence Schizosaccharomyces are known as the “fission yeasts”. S. pombe was originally used as a component in the traditional African sorghum beer “pito” in Ghana (N’guessan 2011) and was not used as a scientific model organism until 1950 (Leupold). S. pombe has since been used as a model organism for various genetic studies (Mitchison 1970, Gutz 1974, Beach and Nurse 1981, Hagan and Hyams 1988, Matsuyama 2006, Kim 2010), although it has not been used as widely as S. cerevisiae due to its lack of easily controllable gene expression methods (Zilio 2012). Some recent progress has been made on developing effective methods of gene control that do not induce cellular stress (Zilio 2012).

The complete genome of S. cerevisiae was published in 1996 by Goffeau et al., and the genome of S. pombe was published by Wood et al. in 2002. Interestingly, the genome of S. cerevisiae is marked by a whole genome duplication event that led to the duplication of many genes (Wolfe 1997, Kellis 2004). Comparisons of S. pombe and S. cerevisiae are vital in understanding how HSP70s maintain similar functionality across great timespans, as these two species likely diverged around 425 million years ago (Berbee et al. 2007). Interestingly, these two species have drastically different genomic arrangements. S. cerevisiae maintains 16 chromosomes and only 250 introns, whereas S. pombe maintains only 3 chromosomes but thousands of introns, which suggests these two species are experiencing very different selection pressures on a genomic scale. If these two species have been experiencing different genomic-scale selective pressures, we would expect that HSP70s might have been shuffled around and undergone significant sequence divergence, yet still remained functionally the same. Specifically, we hypothesized the following: 1.) The presence/absence of regulatory elements (HSE, intron/exon) has been unchanged. 2.) Amino acid sequences have diverged significantly (~5%). 3.) Presence/absence of signal peptides or transmembrane proteins remained unchanged. 4.) Local gene neighborhood synteny has been lost. 5.) 3-dimensional structures have remained unchanged. 6.) The nucleotide binding site has remained functionally unchanged in the lhs1 HSP70 orthologs.


Sequence collection and ortholog detection

Orthologous sequences were detected by first obtaining known HSP70 sequences from S. cerevisiae S288 and then searching known yeast sequences against the S. pombe genome using BLASTp. We considered proteins that had greater than 25% identity to each search sequence to be potential orthologous sequences. We retrieved those potential orthologs and constructed a preliminary tree to make sure that all possible orthologs had been found in S. pombe. Orthologs that had not yet been found in S. pombe but were found in S. cerevisiae were then searched against the genomic sequence of S. pombe using the corresponding S. cerevisiae protein sequence in tBLASTn.
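The identity cutoff described above can be applied to BLAST tabular output with a few lines of code. This is a minimal sketch rather than the pipeline actually used: it assumes the standard 12-column tabular format (as produced by `-outfmt 6` in BLAST+), where the third column is percent identity, and the function name and the example rows are invented for illustration.

```python
import csv
import io

def candidate_orthologs(blast_tabular: str, min_identity: float = 25.0):
    """Return (query, subject, percent identity) for hits above the cutoff."""
    hits = []
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, pident = row[0], row[1], float(row[2])
        if pident > min_identity:
            hits.append((query, subject, pident))
    return hits

# Made-up example rows, not results from this study.
example = (
    "queryA\tsubject1\t82.5\t640\t110\t2\t1\t640\t1\t638\t0.0\t1050\n"
    "queryA\tsubject2\t21.3\t120\t90\t4\t5\t120\t10\t128\t0.5\t30.1\n"
)
print(candidate_orthologs(example))  # [('queryA', 'subject1', 82.5)]
```

Hits that pass the cutoff are only candidates; as described above, they still need to be checked on a tree before being called orthologs.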

Phylogenetic analysis

S. cerevisiae and S. pombe protein sequences were aligned online using the MAFFT program (Katoh 2013) with G-INS-i parameters in order to achieve an optimal global alignment. Ortholog detection was done using the neighbor joining method and bootstrap method with pairwise p-distances of the amino acid (AA) sequences in the MEGA5.1 program (Felsenstein 1985; Saitou and Nei 1987; Tamura 2011). The best distance model for the protein data set was determined by using ProtTest 3.2 (Darriba 2011) and modeled AA trees were inferred using PhyML 3.0 and MEGA 5.1.
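For readers unfamiliar with the p-distance, it is simply the proportion of aligned sites at which two sequences differ. The sketch below shows the calculation for one pair of aligned sequences. Skipping any column where either sequence has a gap (pairwise deletion) is an assumption of this example; MEGA offers several gap-handling options and the text does not say which was used. The example sequences are invented.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion: ignore columns with a gap
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared

# One substitution (K vs R) over nine ungapped columns.
print(p_distance("MSKAVGIDLG", "MSRAVGID-G"))
```

A neighbor joining tree is then built from the matrix of these values for every pair of sequences.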

Following AA based tree drawing, the AA alignment was converted codon-by-codon to an aligned, genomic CDS alignment using a lookup table so that the resultant nucleotide alignment matched the AA-based alignment. The best nucleotide model for this data set was determined using jModelTest (Guindon and Gascuel 2003; Posada 2008). Following model selection, maximum likelihood phylogeny was inferred using the PhyML 3.1 standalone version (Guindon 2010) with 100 bootstrap replicates. The amino-acid based tree and nucleotide-based tree were compared for topological differences.
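The codon-by-codon conversion can be sketched as follows. This is an illustrative simplification, not the script used in the study: it pairs each aligned residue with the next codon of the unaligned CDS by position and does not check that each codon actually translates to that residue. The function name is hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Build a nucleotide alignment row that mirrors an aligned protein row."""
    residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(cds) < 3 * residues:
        raise ValueError("CDS is too short for the aligned protein")
    # Consume the CDS three bases at a time; a trailing stop codon is ignored.
    codons = (cds[i:i + 3] for i in range(0, 3 * residues, 3))
    row = []
    for ch in aligned_protein:
        # A gap in the protein becomes a three-base gap in the CDS row.
        row.append(gap * 3 if ch == gap else next(codons))
    return "".join(row)

print(thread_codons("MK-V", "ATGAAAGTTTAA"))  # ATGAAA---GTT
```

Because gaps are only ever inserted in multiples of three, the reading frame is preserved across the whole nucleotide alignment.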

Exon-Intron Analysis

The NCBI gene database for S. cerevisiae and S. pombe was used to identify exons and introns within hsp70 genes. Genes with introns were analyzed using SPIDEY software to determine position of splice points in genomic DNA and intron phases.

Regulatory elements analysis

The TransFac Match program was used to search for the presence of Heat Shock Elements (HSE) within 1000 base pairs (bp) upstream of each hsp70 coding sequence. A matrix and core match score above 85% was used to represent strong evidence for an HSE with the nucleotide motif nGAAnnTTCnnGAAn. The sequences GAAnnTTC or TTCnnGAA were also considered for possible HSEs.
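The motifs named above can also be located with plain pattern matching, which is a useful sanity check on matrix-based results. The sketch below is not a reimplementation of TransFac Match: it does exact matching only, scans only the strand it is given, and omits the flanking `n` positions of the full motif since they match any base. The function name and the demo sequence are invented.

```python
import re

# Lookaheads let overlapping motif occurrences all be reported.
FULL_HSE = re.compile(r"(?=(GAA[ACGT]{2}TTC[ACGT]{2}GAA))")
PARTIAL_HSE = re.compile(r"(?=(GAA[ACGT]{2}TTC|TTC[ACGT]{2}GAA))")

def find_hse(upstream: str):
    """Return start positions and sequences of full and partial HSE motifs."""
    seq = upstream.upper()
    full = [(m.start(), m.group(1)) for m in FULL_HSE.finditer(seq)]
    covered = [(start, start + len(motif)) for start, motif in full]
    # Report a partial motif only if it is not already part of a full HSE.
    partial = [
        (m.start(), m.group(1))
        for m in PARTIAL_HSE.finditer(seq)
        if not any(s <= m.start() and m.start() + 8 <= e for s, e in covered)
    ]
    return {"full": full, "partial": partial}

demo = "ccatGAAcgTTCtaGAAttgcaTTCggGAAcc"  # made-up upstream sequence
print(find_hse(demo))
```

In the demo sequence, this reports one full element starting at index 4 and one separate partial element starting at index 22.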

Synteny Analysis

Gene maps of each hsp70 were obtained through the NCBI database. Maps were used to compare the orientation of each gene in their respective genomes. Conservation of neighboring genes was also compared.
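The two comparisons described here (gene orientation and shared neighbors) can be expressed as a small data comparison. This is only a sketch of the idea: in this study the comparison was made from NCBI gene maps, and the record layout, the function name, and the neighbor labels below are all hypothetical.

```python
def compare_neighborhoods(gene_a: dict, gene_b: dict) -> dict:
    """Compare two orthologs by strand and by shared neighboring genes.

    Each gene is a dict with:
      'strand'    : '+' or '-'
      'neighbors' : set of labels for flanking genes, where orthologous
                    neighbors in the two species carry the same label
    """
    shared = gene_a["neighbors"] & gene_b["neighbors"]
    return {
        "same_orientation": gene_a["strand"] == gene_b["strand"],
        "shared_neighbors": sorted(shared),
    }

# Hypothetical records for one ortholog pair.
yeast_gene = {"strand": "+", "neighbors": {"geneA", "geneB", "geneC"}}
pombe_gene = {"strand": "+", "neighbors": {"geneX", "geneY", "geneZ"}}
print(compare_neighborhoods(yeast_gene, pombe_gene))
# {'same_orientation': True, 'shared_neighbors': []}
```

An empty list of shared neighbors, as in this made-up pair, is what a loss of local synteny would look like.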

Protein characterization

SignalP 4.1 server software was used to predict possible signal peptides in HSP70 amino acid sequences using the eukaryotic organism setting. TMHMM 2.0 server software was used to predict whether the HSP70 proteins were membrane bound by detecting the presence of transmembrane helices. Conserved protein domains in HSP70s were searched for using the Superfamily HMM library and genome assignments server version 1.75.

Nucleotide binding site analysis

In order to determine if HSP70 proteins were functionally different or merely differed in their nucleotide sequences, the 3-dimensional structures of a subset (the lhs1 genes) were analyzed. The nucleotide binding site (NBS), a conserved site in the nucleotide binding domain (NBD) found in all hsp70 genes, was targeted. This region is responsible for binding ATP, which allows the opening of the substrate binding domain (SBD), and so its function should remain highly conserved through time, particularly the NBS, which is itself a cleft within the NBD where ATP binds (Liu 2007).


Orthologous sequence detection

We found a total of 14 hsp70 family proteins in S. cerevisiae, but only found 8 orthologous sequences in S. pombe (Table 1). The ssa gene group was marked by a paralogous duplicate in S. cerevisiae not found in S. pombe (Figure 1), as were the ssb, sse, and mitochondrial groups (Figure 1). All gene names will be referred to as shown in Figure 1. For each of the cellular regions in which Hsp70 proteins were found in S. cerevisiae, S. pombe also had at least one ortholog. However, S. cerevisiae produced a markedly greater number of paralogs, although it maintains the same number of orthologs.

Phylogenetic analysis

The best model for the amino acid data set was found to be LG+I+F+G, although a JTT based NJ tree was also inferred for comparison. GTR+I+G was found to be the best model for the genomic CDS nucleotide data set. In the AA based data sets, there were several notable topological differences between the NJ, p-distance based tree and the NJ, JTT distance based tree, as well as between both NJ trees and the maximum-likelihood, LG+I+F+G based tree, although these differences were not usually supported by strong bootstrap values. There were no topological differences between the maximum-likelihood, AA based tree and the maximum-likelihood, nucleotide based tree.

Differences between phylogenetic methods were present but subtle, and not usually strongly supported (greater than 95% support) by bootstrap analysis. The fact that the maximum-likelihood trees inferred using the best amino-acid model (LG+I+F+G) and the best nucleotide model (GTR+I+G) produced identical topologies suggests that these trees are correctly representing the evolutionary history of these heat-shock proteins.

Intron analysis of hsp70s

The NCBI gene database revealed no introns within S. cerevisiae hsp70 genes. This was also the case in S. pombe except for pdr13-pombe, which contains two exons and one intron. SPIDEY analysis of pdr13-pombe revealed a 126-nucleotide phase 2 intron between positions 125 and 126 of the mRNA, fairly close to the start codon. By comparison, the S. cerevisiae ortholog ssz1-yeast contains no intron (Figure 3).
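Intron phase follows directly from the number of coding bases that precede the intron: phase 0 falls between codons, phase 1 after the first base of a codon, and phase 2 after the second. The sketch below assumes the reported positions are counted from the first base of the start codon, which the text does not state explicitly; the `cds_start` parameter is a hypothetical way to account for a 5' UTR if they are not.

```python
def intron_phase(last_exon_base, cds_start=1):
    """Phase of an intron that follows 1-based mRNA position `last_exon_base`.

    `cds_start` is the 1-based mRNA position of the first base of the start
    codon (assumed to be 1 when no 5' UTR is included in the numbering).
    """
    coding_bases_before = last_exon_base - cds_start + 1
    if coding_bases_before < 0:
        raise ValueError("intron lies upstream of the coding sequence")
    return coding_bases_before % 3


# An intron between positions 125 and 126: 125 = 41 * 3 + 2, so it splits
# codon 42 after its second base.
print(intron_phase(125))
```

Under that numbering assumption, the result agrees with the phase 2 assignment reported for pdr13-pombe.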

Regulatory elements analysis

TransFac analysis revealed full HSE regions containing the GAAnnTTCnnGAA sequence in the ssa1-yeast, kar2-yeast, ssa1-pombe, and ssa2-pombe genes. Both ssa1 genes, from S. cerevisiae and S. pombe, contained HSEs approximately 300 bp upstream of the start codon (Figure 4). All other hsp70 genes contained either no HSE or only a partial HSE with a GAAnnTTC or TTCnnGAA motif. The ssa2-yeast, ssa4-yeast, ssb2-yeast, sse1-yeast, sse2-yeast, bip-pombe, ssc1-pombe, pdr13-pombe, and pss1-pombe genes all contain partial HSEs. The ecm10-yeast, lhs1-yeast, ssa3-yeast, ssb1-yeast, ssc1-yeast, ssz1-yeast, ssq1-yeast, lhs1-pombe, and sks2-pombe genes contained no HSEs. The mitochondrial ssc1-pombe contains a partial HSE approximately 200 bp upstream of the start codon, whereas its ortholog ecm10-yeast contains no HSE. The same pattern is seen in the intron-containing pdr13-pombe (partial HSE) and its ortholog ssz1-yeast (no HSE). Conversely, the endoplasmic reticulum gene kar2-yeast was found to have a full HSE, whereas its ortholog bip-pombe has only a partial one. The sse2-yeast and pss1-pombe genes both contain two partial HSEs (Figure 4).
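The full and partial HSE motifs named above can be expressed as simple patterns. The sketch below is not the TransFac procedure; it is a minimal regular-expression scan of the forward strand only, and the function name `classify_hse` and the test sequences are made up for illustration.

```python
import re

# "n" in the motif stands for any nucleotide.
FULL_HSE = re.compile(r"GAA[ACGT]{2}TTC[ACGT]{2}GAA")
PARTIAL_HSE = re.compile(r"GAA[ACGT]{2}TTC|TTC[ACGT]{2}GAA")


def classify_hse(upstream):
    """Classify an upstream region as containing a full, partial, or no HSE.

    `upstream` is the sequence immediately 5' of the start codon, so its last
    base is position -1. Returns (label, position of the first match relative
    to the start codon), or ("none", None) when nothing matches.
    """
    seq = upstream.upper()
    match = FULL_HSE.search(seq)
    if match:
        return "full", match.start() - len(seq)
    match = PARTIAL_HSE.search(seq)
    if match:
        return "partial", match.start() - len(seq)
    return "none", None


# Invented sequences, not taken from either genome.
print(classify_hse("CCCCGAATCTTCAGGAACCCC"))
print(classify_hse("ACGTTTCAAGAATT"))
print(classify_hse("ACGTACGTACGT"))
```

A fuller scan would also check the reverse complement and report every match rather than only the first.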

Synteny Analysis

Comparison of gene maps for ssa1 and ssa2 orthologs revealed few syntenic relationships (Figure 5). Though the ssa1 genes showed the same orientation, the ssa2 genes were found to be in opposite orientations. There were no similarities in the neighboring genes between the ssa1 and ssa2 orthologs. The same absence of syntenic relationships was observed for the other hsp70 genes (data not shown).

Protein Characterization

Lhs1-yeast, Lhs1-pombe, and Bip-pombe were predicted to have signal peptides located in their amino-terminal regions. The Bip-pombe ortholog Kar2-yeast was not predicted to have a signal peptide (Figure 6). SignalP software identified possible cleavage sites for Lhs1-yeast and Lhs1-pombe between amino acid positions 20-21 and 21-22, respectively. The cleavage site for Bip-pombe was predicted to be between positions 24 and 25. No other HSP70 sequence was predicted to have a signal peptide.

The Bip-pombe, Kar2-yeast, Lhs1-pombe, and Lhs1-yeast sequences contained possible transmembrane regions in their N-terminal regions (Figure 7). Bip-pombe had a transmembrane helix at positions 7-24. Kar2-yeast and Lhs1-yeast both contained helices at positions 7-29. Lhs1-pombe contained a helix near the N-terminus with a score of 0.57. All other proteins were predicted to be non-membrane bound.

A Superfamily web database search revealed a conserved ATPase domain at the N-terminus of Ssa1-yeast (Figure 8). Ssa1-yeast also contained an HSP70 domain as well as the C-terminal HSP70 sub-domain. This pattern was seen in nearly all other HSP70s. The notable exceptions were Lhs1-yeast and Lhs1-pombe: Lhs1-yeast lacked the HSP70 domain, while Lhs1-pombe was missing both the HSP70 domain and the HSP70 C-terminal sub-domain.

Nucleotide binding site analysis

We found that overall, the structures of LHS1 in S. cerevisiae and S. pombe were nearly identical, and most visible differences were located in the loop portions of the protein (Figure 9). When comparing the NBS between the two species, we found that the site did differ in its amino acid composition, but structurally the two sites were also nearly identical. The differences in amino acid sequence may still result in functional differences between these two proteins, as several substitutions changed the charge or polarity of the residue (Figure 10).
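Substitutions that change charge or polarity can be flagged by assigning each residue to a physicochemical class and comparing aligned positions. The sketch below uses one common four-class grouping; textbooks differ on where residues such as H, C, Y, and G belong, so the grouping is an assumption, as are the function name `class_changes` and the toy sequences.

```python
# One common grouping of the 20 standard amino acids (an assumption; other
# schemes place H, C, Y, or G differently).
CLASSES = {
    "positive": "KRH",
    "negative": "DE",
    "polar": "STNQCY",
    "nonpolar": "AVLIMFWPG",
}
RESIDUE_CLASS = {aa: name for name, residues in CLASSES.items() for aa in residues}


def class_changes(seq_a, seq_b):
    """List aligned positions where the residue class differs between two sequences.

    Both inputs must come from the same alignment (equal length, '-' for gaps).
    Returns tuples of (1-based column, residue_a, residue_b, class_a, class_b).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for pos, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1):
        if a == "-" or b == "-" or a == b:
            continue
        class_a, class_b = RESIDUE_CLASS.get(a), RESIDUE_CLASS.get(b)
        if class_a is None or class_b is None:
            continue  # skip ambiguous or non-standard residue codes
        if class_a != class_b:
            changes.append((pos, a, b, class_a, class_b))
    return changes


# Toy alignment, not the LHS1 binding site.
for change in class_changes("GKDLS", "GEDIA"):
    print(change)
```

Conservative substitutions such as L to I stay within one class and are not reported, which keeps the output focused on the changes most likely to matter for binding.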


S. cerevisiae and S. pombe are two fungi separated by millions of years of evolution as well as a complete genome rearrangement, and they exhibit markedly different life histories. S. cerevisiae is a budding yeast, and so it is likely under less selective pressure for rapid DNA replication; it maintains 16 chromosomes. Conversely, S. pombe is likely under great selective pressure for its DNA to rapidly condense into chromosomes, replicate, and then quickly separate to the poles so that fission can occur, and so it maintains only 3 chromosomes. This replication strategy can leave S. pombe vulnerable during replication, whereas S. cerevisiae is relatively unaffected. We anticipated that these different selective pressures might have caused large differences in the location of hsp70s in the two species' genomes, but that the proteins themselves would have maintained structural and functional similarities.

When we compared the homologous HSP70s of S. pombe to those of S. cerevisiae, we found that S. cerevisiae usually contained more hsp70 genes than S. pombe. In some cases, for instance in the mitochondrial hsp70s, the divergences between both paralogs and orthologs are high. It is likely that these mitochondrial HSP70s have ancient origins. However, it’s uncertain whether or not these genes arose before the genome duplication in S. cerevisiae, as we cannot determine from the data presented here whether they were lost in S. pombe or gained in S. cerevisiae. We can say, though, that since the divergences between these mitochondrial HSP70s are high, it’s not likely they are undergoing concerted evolution. Conversely, in the SSA complex of HSP70s, for example, paralogous pairs appear to be undergoing concerted evolution, as paralogs always exhibit almost zero sequence divergence, although with the data presented here we cannot eliminate the possibility that each of these paralogous pairs arose only recently. In order to better answer these questions, more fungal species must be added to the protein family tree of orthologs.

We found that, after modeling the evolutionary distances, the HSP70s of these two species had undergone large amounts of amino acid sequence divergence. In some cases, as in the LHS1 orthologous pair, the pairwise distance exceeds 100%, that is, more than one estimated substitution per site. These findings suggest that these genes have undergone repeated substitutions at the same sites (multiple hits), and that the distances may have reached saturation, so that little additional phylogenetic information can be garnered from comparing these sequences. This suggests that even though HSP70 genes are conserved across all three domains of life, they might actually be poor phylogenetic markers.
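A model-corrected distance can exceed 100% because it estimates substitutions per site, including multiple hits, rather than the raw fraction of differing sites. The sketch below uses the simple Poisson correction, d = -ln(1 - p), in place of the LG and JTT models used in this analysis, purely to show the effect; the function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(p):
    """Poisson-corrected distance: expected substitutions per site."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


# The corrected distance passes 1.0 (100%) once p exceeds 1 - 1/e, about 0.632.
for p in (0.1, 0.5, 0.7, 0.9):
    print(p, round(poisson_distance(p), 3))
```

As p approaches its ceiling the corrected distance grows without bound, so small changes in observed differences produce large changes in the estimate. This is the saturation problem described above.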

After determining that large amounts of sequence divergence had occurred, we then investigated whether or not these significant changes in amino acid sequence were associated with actual functional changes in these proteins. We did not find any significant differences in intron/exon structure save for pdr13-pombe. The presence of a single intron in pdr13-pombe could be attributed to S. pombe having nearly 20 times the number of introns compared to S. cerevisiae. One could reasonably expect that the large number of introns would increase the possibility of gaining introns in hsp70 genes even if the ancestral state was to not have introns. It is also possible that the intron corresponds to the regulatory sequence for a neighboring gene due to the compact nature of the S. pombe genome. Again, the addition of more fungal species to this analysis is necessary to truly assess the ancestral state of introns/exons in these two species and decide whether or not the similarity in the current state is due to convergent evolution or homology.

Analysis of the HSEs in each gene revealed surprising results. The ssa1 gene orthologs contained full HSEs, which contradicts the literature stating that they are not heat-inducible. Ssa2-yeast is not heat-inducible either, yet ssa2-pombe contains a full HSE. It is interesting to note that while the homologs ssa3-yeast and ssa4-yeast contain at most partial HSEs, they have nevertheless been found experimentally to be heat-inducible (Boorstein and Craig, 1990; Werner-Washburne et al., 1989). It is possible that ssa3-yeast and ssa4-yeast retained their heat-inducible nature by having a sequence that is still sufficient for binding. Because S. cerevisiae has four ssa homologs, ssa1-yeast and ssa2-yeast might also have lost their heat-inducibility over time, as the other genes could compensate for the loss. It is possible that ssa1-pombe and ssa2-pombe are indeed heat-inducible: retaining heat-inducibility would be important because S. pombe has no third or fourth ssa gene to compensate for such a loss. This would also suggest that while the S. pombe genes are more closely related by sequence to their S. cerevisiae orthologs, functionally they may more closely resemble SSA3-yeast and SSA4-yeast. Another thing to note is that the HSE sequence motif was positioned within 300 bp upstream of the coding sequence in genes that had HSEs. This suggests that the position of HSEs is evolutionarily conserved between the two fungi, though the actual HSE sequence motif might have diverged. Overall, we found the presence or absence of HSEs to be highly conserved through time, which we believe supports our hypothesis that these proteins are exhibiting similar functions.

When we analyzed the position of signal peptides, we found that all of the cytosolic and mitochondrial HSP70s lacked a signal peptide sequence. The ER proteins Bip-pombe, Lhs1-yeast, and Lhs1-pombe were predicted to have a signal peptide sequence in the N-terminal region followed by a cleavage site. This motif is consistent with other ER proteins as well as with the experimental data (Baxter et al., 1996). It is interesting to note that SignalP did not predict a signal peptide sequence for the Kar2-yeast protein even though one was predicted in the S. pombe ortholog. The presence of positive signal peptide scores within the first 40 amino acids of Kar2-yeast would suggest a possible sequence (Figure 6). Although the software could not confidently predict a signal peptide, Kar2-yeast has been experimentally verified to contain a signal peptide that is cleaved off after transport into the ER (Normington et al., 1989). This discrepancy could be explained by the presence of an extra 15 amino acids within the signal peptide region of Kar2-yeast that are not found in Bip-pombe. The addition of these amino acids could be the reason why SignalP did not predict a signal peptide for Kar2-yeast.

The TMHMM software detected transmembrane regions within the N-terminus of the ER proteins. Since these transmembrane helices are located within the signal peptide, it is suggested that these regions could help in the transport of the protein to the ER. None of the cytosolic or mitochondrial HSP70s were predicted to have transmembrane helices. Though prediction of protein characteristics using web software was accurate for almost all of the proteins, these results underscore the importance of experimentally verifying these predictions as well.

We also analyzed the 3-dimensional structure of these HSP70s and found that, for the most part, these proteins were structurally identical, although we did not conduct an electrostatic surface analysis, which might detect more subtle changes in the structure of these proteins. We did notice that the LHS1 proteins of S. cerevisiae and S. pombe, which exhibited the highest sequence divergence between orthologs, also diverged structurally, both from the remainder of the HSP70 family proteins analyzed here and from each other. This is not surprising given that analysis of conserved protein domains showed identical patterns across all proteins analyzed save for the LHS1 protein group. However, when we focused on the nucleotide binding site of LHS1, we did not see any obvious structural change there, although we did note potentially important changes in amino acids (e.g. some amino acids shifted from negative to positive charge, hydrophobic to non-polar, etc.). Most of the structural changes were observed in the loops rather than in the helices or sheets of the proteins, but there is a great need for electrostatic potential analysis of these proteins to more accurately predict whether or not they might be functionally different.

Our studies have shown that the HSP70s in both S. cerevisiae and S. pombe are divergent at a genomic level, but highly conserved at the functional/structural protein level. Some of the discrepancies found in our analysis (e.g. HSE elements and signal peptides) underscore the importance of experimentally verifying each protein. Many of the S. pombe sequences used in this study were inferred from the S. pombe genome sequencing project. It is possible that different regulatory characteristics or novel protein functions may be discovered as future experimentation is done on S. pombe HSP70s. In conclusion, we believe that while these proteins are encountering extremely high rates of mutation and shuffling throughout their respective genomes, they are also experiencing equally strong purifying selection which acts on those mutations to maintain conserved structure and function in these sequences.


Literature Cited/Figures available upon request.

Oceans Around the World – The Sunda and Sahul Shelves

Notes on several key papers regarding biodiversity hotspots in and around the Sunda Shelf.

Article 1: Crandall 2008

Vicariance patterns resulting from Pleistocene sea-level changes in the Sunda Shelf area should be present in both invertebrates and their ectosymbionts. Highly variable results across many different studies spurred the authors to explore a more closely linked hypothesis: that patterns of genetic variation found in marine invertebrates (in this case, two seastars) should closely match those of their ectosymbionts (a mollusk and a crustacean). Most of the four species did show at least some genetic structure, but it was not concordant across species; each species displayed a different pattern of range expansion, most likely due to differences in dispersal and adult survivability.


Map of Sunda and Sahul. CC 3.0 By Maximilian Dörrbecker (Chumwa). Wikimedia Commons.

Article 2: Crandall 2012

Sea-level changes at the end of the Last Glacial Maximum (LGM) should correspond closely with population range expansions of marine species. Before sea levels rose, the Sunda Shelf and neighboring shelves were well above the ocean. Beginning roughly 20,000 years ago, sea levels began to rise rapidly, submerging the Sunda Shelf and facilitating the rapid expansion of marine species into this new habitat. The authors suggest that since the genetic signal of this sea-level rise is present in so many species, the event can be used to calibrate mutation rates that vary among lineages through time; that is, younger lineages tend to show higher mutation rates. The authors report strong support for the idea of time dependency of molecular clocks. This matters because correlating the timing of geologic events with species- and population-level events is a critical aspect of marine phylogeography.
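The logic of calibrating a rate from a dated expansion can be illustrated with the classic mismatch-distribution relation tau = 2ut, where tau is the expansion time in mutational units, u is the mutation rate for the whole sequence, and t is the time since expansion. This is only a sketch of the general idea, not necessarily the procedure used in the paper; the function name and all input values are hypothetical.

```python
def rate_per_site_per_year(tau, years_since_expansion, n_sites):
    """Solve tau = 2 * u * t for u, then scale to a per-site rate.

    tau: expansion time in mutational units (estimated from the genetic data)
    years_since_expansion: independent date for the expansion, in years
    n_sites: length of the sequenced region in nucleotides
    """
    if years_since_expansion <= 0 or n_sites <= 0:
        raise ValueError("time and sequence length must be positive")
    u_sequence = tau / (2.0 * years_since_expansion)  # mutations per sequence per year
    return u_sequence / n_sites


# Hypothetical inputs: tau = 2.0 for a 500 bp marker, expansion dated to 20,000 years.
rate = rate_per_site_per_year(tau=2.0, years_since_expansion=20_000, n_sites=500)
print(rate)
```

Because the geological date sits in the denominator, a younger calibration point gives a higher inferred rate for the same genetic signal, which is the pattern the time-dependency argument rests on.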

Article 3: Kraus 2012

Here the authors investigate the historical biogeography of a genus of freshwater crab, Parathelphusa, in the Sunda region and its relation to Pleistocene sea-level changes. The authors suggest that if Pleistocene sea-level changes are responsible for the diversification of Parathelphusa clades throughout the Sunda region, then the rate of speciation should have greatly increased during that time. However, they find that most clades have Miocene or Pliocene origins, all tracing back to Borneo. Some speciation events did occur toward the end of the Pleistocene, but only rarely and via sporadic dispersal events, as there have been no recent land-bridge connections.