Wednesday, September 23, 2009

The Simulation Argument meets Evolutionary Informatics

From the Simulation Argument:
Unless we are now living in a simulation, our descendants will almost certainly never run an ancestor-simulation.
I think it would be interesting to run a few ancestor-simulations. What is our simulation likely to be like then? What about an evolutionary algorithm?

In evolutionary informatics, two parameters are needed for evolutionary algorithms: heritability and selection.

Heritability implies the following:
  1. "Parents" give rise to "offspring".
  2. Traits from "parents" are passed on to "offspring".
  3. Each "offspring" from a "parent" signifies a new generation.
  4. Variation between generations may or may not occur.

Selection implies that traits that are not represented on a fitness landscape cannot be selected.
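The two parameters can be sketched in a few lines of Python. This is only a minimal illustration, not any particular published algorithm: the function names, the single numeric "trait", the toy landscape with a peak at 3 and all parameter values are assumptions made for the example.

```python
import random

def evolve(fitness, parent, generations=200, offspring_per_parent=10,
           mutation_scale=0.1, seed=0):
    """Minimal loop combining heritability and selection (illustrative only)."""
    rng = random.Random(seed)
    for _ in range(generations):
        # Heritability: each offspring inherits the parent's trait,
        # with a small random variation added.
        offspring = [parent + rng.gauss(0.0, mutation_scale)
                     for _ in range(offspring_per_parent)]
        # Selection: the fittest of parent and offspring founds the next generation.
        parent = max(offspring + [parent], key=fitness)
    return parent

def toy_fitness(x):
    # Hypothetical single-peak fitness landscape with its maximum at x = 3.
    return -(x - 3.0) ** 2

best = evolve(toy_fitness, parent=0.0)
print(best)
```

A trait that the fitness function does not measure has no influence on which individual is kept, which is the sense in which it is "not on the fitness landscape".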

Let's look at Autodock as an example and how it relates to evolutionary informatics. Autodock employs a genetic evolutionary algorithm to try to predict the orientation of a ligand within a protein.

The ligand is the heritable structure. (A ligand is any structure that binds to a protein, e.g. a therapeutic molecule)
The protein is the fitness landscape.
The genetic evolutionary algorithm provides the variation and selection parameters.
Consider the following diagram:
Figure 1: A) Basic layout of memetic algorithms. A population of individuals is randomly seeded with regard to fitness (initialized). The individuals are randomly mutated and their fitness is measured. Individuals with optimal fitness are further mutated until convergence on a local optimum is reached. The process is carried out for the entire initialized population. The global optimum is selected from the various local optima. B) Fitness landscape with local optima (A, B and D) and a global optimum (C). In a memetic algorithm, the initial population of individuals is randomly seeded, and each individual can be viewed as any of the arrows indicated in the figure.

A few important aspects from the figure:
  1. Fitness depends on the phenotype.
  2. Fitness (in the case of Autodock) is the capability of the ligand phenotype to bind and stay bound to the protein.
  3. The parameters for successful binding are many. For Autodock, the following are included:
  • Van der Waals interactions
  • Electrostatic interactions
  • Desolvation
  • Hydrogen bond interactions
  • Torsional free energy
  • Conformational interactions
If certain parameters (above) are not on the fitness landscape for a certain ligand phenotype, the corresponding trait will not aid binding. For example, if a particular area of the protein offers no hydrogen-bonding partners, hydrogen bonding will not help a ligand with hydrogen-bonding groups to bind there. Hydrogen bonding (as a trait) will therefore not be on the fitness landscape and is thus not a selectable trait.

Autodock uses a Solis & Wets search algorithm to probe the fitness landscape of a particular protein (See figure below).
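To give a feel for what a Solis & Wets style local search does, here is a simplified sketch. It is not Autodock's implementation: the function name, the toy two-dimensional "energy", and every parameter value (initial step size `rho`, the streak length of 4, the expansion and contraction factors) are assumptions chosen for illustration. The idea is a random step with a bias vector that remembers successful directions, and a step size that grows after repeated successes and shrinks after repeated failures.

```python
import random

def solis_wets(energy, x, rho=1.0, rho_min=1e-6, max_steps=1000,
               expand=2.0, contract=0.5, streak=4, seed=0):
    """Simplified Solis & Wets style local search (illustrative sketch)."""
    rng = random.Random(seed)
    bias = [0.0] * len(x)
    best = energy(x)
    successes = failures = 0
    for _ in range(max_steps):
        if rho < rho_min:
            break  # step size has collapsed: treat as converged
        # Random deviation centred on the current bias vector.
        dev = [rng.gauss(b, rho) for b in bias]
        forward = [xi + d for xi, d in zip(x, dev)]
        e_fwd = energy(forward)
        if e_fwd < best:
            x, best = forward, e_fwd
            bias = [0.2 * b + 0.4 * d for b, d in zip(bias, dev)]
            successes, failures = successes + 1, 0
        else:
            # Try the opposite direction before counting a failure.
            backward = [xi - d for xi, d in zip(x, dev)]
            e_bwd = energy(backward)
            if e_bwd < best:
                x, best = backward, e_bwd
                bias = [b - 0.4 * d for b, d in zip(bias, dev)]
                successes, failures = successes + 1, 0
            else:
                bias = [0.5 * b for b in bias]
                successes, failures = 0, failures + 1
        # Adapt the step size to recent success or failure streaks.
        if successes >= streak:
            rho, successes = rho * expand, 0
        elif failures >= streak:
            rho, failures = rho * contract, 0
    return x, best

def toy_energy(x):
    # Hypothetical energy surface with a single minimum at (1, 2).
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

start = [5.0, -3.0]
final_x, final_energy = solis_wets(toy_energy, start)
print(final_x, final_energy)
```

In a docking setting the vector `x` would hold the ligand's position, orientation and torsion angles rather than two plain coordinates.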

The surface of a protein is where the binding of the ligand will occur, thus 3-dimensionally, the fitness landscape would resemble something like this:
Rapamycin ligand bound to the mTOR protein.

So how does the algorithm find the local optima within proteins?
With Autodock, a population of individuals (ligands) is randomly placed within the receptor. The ligand-protein interactions of each individual's conformation are measured, and each individual then undergoes a conformational "mutation" (See image below).

Ligand "mutation".

The binding energy for each conformation "mutation" is measured until a local optimum for a specific population of individuals is reached. The binding energy of the local optimum of each population is measured, and the global optimum is the population of individuals that have the best binding energy (See results below).

If the evolutionary algorithm is well designed, the conformation of the global optimum will correspond to the experimentally determined crystallographic pose. The Root Mean Square Deviation (RMSD) of a docked ligand compared to the crystallographic pose is generally used as an indicator. An RMSD value of less than 2 Å is considered a success. In the case of the Autodock software, the global optimum is supposed to correlate with the crystallographic pose (RMSD < 2 Å). As an example, a ligand was docked into a protein with the following results.
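For readers who want to see what the RMSD measure amounts to, here is a small sketch. The coordinates are made up for illustration and do not come from any real structure. The function assumes that both poses list the same atoms in the same order and are already in the same receptor frame, so no superposition is performed.

```python
import math

def rmsd(pose_a, pose_b):
    """Root mean square deviation between two poses given as lists of (x, y, z)."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum((a - b) ** 2
                  for atom_a, atom_b in zip(pose_a, pose_b)
                  for a, b in zip(atom_a, atom_b))
    return math.sqrt(squared / len(pose_a))

# Hypothetical coordinates in angstrom, for illustration only.
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked_pose = [(0.3, -0.4, 0.0), (1.8, -0.4, 0.0), (1.8, 1.1, 0.0)]

value = rmsd(crystal_pose, docked_pose)
print(value, "success" if value < 2.0 else "failure")
```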

Docked ligand positions and binding energies

Now let's consider another example in nature and how heritability and selection are applied.
As an example, consider the following diagram:

A fitness landscape (From here)

Again, a few important aspects from the figure:
  1. Fitness depends on the phenotype.
  2. Fitness in this case is the capability of the phenotype to reproduce (self-replicate).
  3. The parameters for successful self-replication are many. A few examples:
  • A) Fast replicators (e.g. bacteria)
  • B) Intelligent replicators (e.g. monkeys)
  • C) Cooperative replicators (e.g. ants)
  • D) A combination of the above (e.g. humans)
  • E) Population dynamics
  • F) And others.

Therefore, if certain parameters are not on a fitness landscape for a certain phenotype, such as the capacity to construct a car, such a trait will not be selected in the next generation if the population of phenotypes consists of bacteria.

One thing that is interesting about the docking software is that, because it seeds the ligands randomly within the protein and "mutates" the position of each ligand randomly, you will get different results every time. However, docking runs still converge on the same global optimum once the evolutionary algorithm has completed. And the global optimum corresponds reasonably well to the crystallographic pose (the optimal design). That happens only if the software is well designed, of course.

There are many parallels between this evolutionary algorithm and the history of our universe. These include:
  1. Incorporation of randomness (with regards to fitness) as well as selection.
  2. The process is biased towards a few ends just like our own evolutionary history (e.g. An End to Endless Forms: Epistasis, Phenotype Distribution Bias, and Nonuniform Evolution)
  3. Convergence. Our evolutionary history is filled with examples of convergence, e.g.:
  • The spectacular convergence of abiogenesis into a universal highly optimized genetic code that governs just about all life forms on earth.
  • Beautiful structural convergence on several levels. e.g. Convergent Evolution
  • Molecular convergence: Carbonic anhydrases, Prestin, Others
If we are indeed living in a simulation, an evolutionary algorithm of some sort seems plausible.

Question is:
Is this conception of reality with information and algorithms being fundamental categories really compatible with the mechanistic/anti-teleological conception of the material world?

Tuesday, September 22, 2009

Nano-intentionality and Molecular Autonomous Agents

Intrinsic intentionality and inherent goal-directedness of eukaryotic cells is defended by Tecumseh Fitch and minimal molecular autonomous agents are characterized in "On emergence, agency, and organization" (by Stuart Kauffman and Philip Clayton).

The "aboutness" and "goal-directedness" of eukaryotic cells and how it relates to nano-intentionality is defined as follows (p14):
The crucial pre-mental properties of a cell are that it can
1) respond to (somewhat) novel circumstances, eventualities for which it is not specifically-prepared by the evolutionary "memory" instantiated in its DNA.
2) discover, through an individual process of trial and error, some "adaptive" (in the physiological sense) response or solution.
3) in various ways incorporate the results of this discovery into its own structure, thus "recording" or "remembering" (in a non-mental sense) this past, individual history.
It is argued that simple single-celled eukaryotes possess nano-intentionality, and it is stressed that one of the abilities of a nano-intentional structure is its ability to rearrange its physical structure in response to environmental circumstances. An example of eukaryotic chemotaxis (sensory adaptation) was given: the amoeba reacts to environmental signals and adapts to them by inducing structural changes, e.g. when seeking and ingesting food. Chemotaxis involves structural changes in response to environmental circumstances, and it is not limited to eukaryotes, as bacterial cells are also capable of chemotaxis. In this respect, bacterial cells would also qualify, since no other reason was provided for excluding them.

Kauffman and Clayton argue that the simple example of a bacterium that is able to swim up a glucose gradient is an example of an organism acting on its own behalf, and they call such a system a "molecular autonomous agent". They go on to provide a tentative five-part definition of a minimal molecular autonomous agent (p505):
Such a system should be able to
1) Reproduce with heritable variation.
2) Perform at least one work cycle.
3) Have boundaries such that it can be individuated naturally.
4) Engage in self-propagating work and constraint construction.
5) Be able to choose between at least two alternatives.
The earliest life forms emerged about 3000-3400 million years ago (ref) and were likely bacteria.

Many natural sciences aim to detect the actions and intentions of agents. A few of these include forensic science, archeology and SETI.

If it is accepted that bacteria qualify as nano-intentional molecular autonomous agents, then it would seem that evolutionary and molecular biology also fall squarely into the category of natural sciences concerned with detecting the actions and intentions and/or "nano-intentions" of agents over time, be they mental or pre-mental.

If ID science is defined as "detecting the actions and intentions and/or "nano-intentions" of agents over time" does this mean evolutionary and molecular biology qualify as ID science?

Tuesday, December 2, 2008

The Simulation Argument and Research Potential


In 2002, Nick Bostrom proposed his version of the simulation argument. Anders Hammarström concluded with the following in his MA thesis:

The conclusion reached in this paper is that the argument is, at our current stage of technological development, in principle irrefutable. It all depends on whether or not consciousness can emerge from advanced computer simulations of the human brain, and the answer to this question is, unfortunately, out of our current reach.

David Chalmers in an entry on his blog makes the following (among other interesting) comments:

As for intelligent design, I'm on the record as saying that I can't rule out the hypothesis that we're living in a computer simulation, so I suppose that it follows that I can't rule out the hypothesis that our world is designed.

The simulation argument could thus provide a starting point for investigating whether our universe might be the result of mind, by looking for evidence that supports the simulation hypothesis.


What kind of simulation?

Memetic algorithms and pre-existing realities. An example.

Memetic Algorithms (MAs) are search techniques used to solve problems by mimicking molecular processes of evolution including selection, recombination, mutation and inheritance.

A few important aspects of MAs (Figure 1):
1. The fitness landscape needs to be finite.
2. The search space of the MA is limited to the fitness landscape.
3. There is at least one solution in the fitness landscape (Figure 2).
4. A fitness function determines the relationship between the fitness of the genotype (or phenotype) and the fitness landscape.
5. Selection is based on fitness.

Figure 1: A) Basic layout of memetic algorithms. A population of individuals is randomly seeded with regard to fitness (initialized). The individuals are randomly mutated and their fitness is measured. Individuals with optimal fitness are further mutated until convergence on a local optimum is reached. The process is carried out for the entire initialized population. The global optimum is selected from the various local optima. B) Fitness landscape with local optima (A, B and D) and a global optimum (C). In a memetic algorithm, the initial population of individuals is randomly seeded, and each individual can be viewed as any of the arrows indicated in the figure.

Autodock (a molecular docking program) employs a MA to try to predict the orientation of a ligand within a protein receptor. A docking run with Autodock can be characterized by the following:

1. Finite fitness landscape: The physical properties of the protein receptor (E.g. electrostatic properties, Van der Waals interactions, desolvation energies etc.). This can be characterized as the pre-existing fitness landscape.
2. Search space: Confined to the protein receptor.
3. At least one solution: The original crystallographic pose.
4. Fitness function: Estimated Free Energy of Binding pose. This is determined through a combination of various interactions including Van der Waals-, electrostatic-, desolvation-, hydrogen bond- and torsional free energy.
5. Selection (guiding function): Selection is based on fitness, i.e. The Estimated Free Energy of Binding pose.
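Points 4 and 5 can be pictured as a weighted sum of energy terms followed by picking the lowest total. The sketch below is an assumption-laden illustration: the class and function names are invented, the weights are placeholders set to 1.0 (not Autodock's calibrated coefficients), and the per-term energies are made-up numbers.

```python
from dataclasses import dataclass

@dataclass
class EnergyTerms:
    """Per-pose energy contributions (hypothetical values, kcal/mol)."""
    van_der_waals: float
    electrostatic: float
    desolvation: float
    hydrogen_bond: float
    torsional: float

# Placeholder weights; a real scoring function uses calibrated coefficients.
PLACEHOLDER_WEIGHTS = {
    "van_der_waals": 1.0,
    "electrostatic": 1.0,
    "desolvation": 1.0,
    "hydrogen_bond": 1.0,
    "torsional": 1.0,
}

def estimated_binding_energy(terms, weights=PLACEHOLDER_WEIGHTS):
    # Fitness function: weighted sum of the individual interaction terms.
    return sum(w * getattr(terms, name) for name, w in weights.items())

def select_best(poses):
    # Selection: the pose with the lowest (most favourable) energy wins.
    return min(poses, key=estimated_binding_energy)

pose_a = EnergyTerms(-5.0, -1.0, 0.5, -2.0, 0.5)
pose_b = EnergyTerms(-4.0, -0.5, 0.5, -1.5, 0.5)
best_pose = select_best([pose_a, pose_b])
print(estimated_binding_energy(best_pose))
```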

Using Autodock as an example, a docking simulation of Colchicine was run 4 times by docking Colchicine into the tubulin receptor. Each time the ligand is docked, 30 populations with 250 individuals (ligands) are randomly placed within the receptor and the position of each ligand is randomly "mutated", after which the Estimated Free Energy of the pose is measured. The position of each ligand is "mutated" until a local optimum of the Estimated Free Energy is reached. The local optima of each of the four docking runs were measured (results here) and in all four runs, the global optimum corresponded reasonably well to the crystallographic pose (RMSD<1.8). Two conclusions can be reached thus far:

1. The software can predict the best pose (biologically relevant) of a ligand in a protein with reasonable success.

2. Separate runs converge on similar local optima after random variation and selection in a pre-existing finite fitness landscape.
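The second conclusion can be illustrated with a toy memetic algorithm. This is only a sketch under stated assumptions: the one-dimensional landscape with four wells (the deepest at x = 7), the population size, the number of generations and the step sizes are all invented for the example and have nothing to do with the Colchicine runs. Each run uses a different random seed, refines every individual with a short local search, and then applies selection and mutation.

```python
import math
import random

# Hypothetical landscape: (centre, depth) of four wells; the deepest is at x = 7.
WELLS = [(2.0, 1.0), (5.0, 1.5), (7.0, 3.0), (9.0, 1.0)]

def energy(x):
    return -sum(d * math.exp(-((x - c) ** 2) / (2 * 0.4 ** 2)) for c, d in WELLS)

def clamp(x):
    # Finite search space: individuals cannot leave the interval [0, 10].
    return min(10.0, max(0.0, x))

def local_search(x, rng, steps=20, step_size=0.05):
    # Local refinement: accept only small moves that lower the energy.
    for _ in range(steps):
        candidate = clamp(x + rng.gauss(0.0, step_size))
        if energy(candidate) < energy(x):
            x = candidate
    return x

def memetic_run(seed, pop_size=50, generations=30, mutation_scale=0.5):
    rng = random.Random(seed)
    population = [rng.uniform(0.0, 10.0) for _ in range(pop_size)]  # random seeding
    for _ in range(generations):
        population = [local_search(x, rng) for x in population]
        population.sort(key=energy)                      # selection on fitness
        survivors = population[: pop_size // 2]
        children = [clamp(p + rng.gauss(0.0, mutation_scale)) for p in survivors]
        population = survivors + children
    return min(population, key=energy)

results = [memetic_run(seed) for seed in range(4)]  # four independent runs
print(results)
```

The individual trajectories differ from seed to seed, but the reported optimum of each run ends up in the same well, because the landscape was fixed before the search started.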

Are there parallels between the above mentioned docking simulation and the evolution of life?

Compare MAs to our current understanding of some aspects of the universe.

1. Finite fitness landscape: From Einstein's equation, E=mc^2, all matter was ultimately created out of energy, and is theoretically reducible to energy, and thus energy can be understood to be the ultimate foundation of all matter in this universe. Recent Quantum Teleportation experiments have shown that it is possible to accept that energy supervenes on information (informationalism). What are some of the aspects of information which energy supervenes on? Mass, spin and charge? What about elemental proto-experiences (PEs) as phenomenal aspects that are (superimposed) properties of elementary particles, as described in this paper (another paper)? Thus, the finite fitness landscape of the universe can be viewed as the elemental physical properties: mass, spin, charge and PEs. Granted, PEs are currently a philosophical construct and not testable entities. Whether they are scientifically testable remains to be seen.
2. Search space: Confined to this universe, and since the beginning of life on earth, confined to earth.
3. At least one solution: Consciousness is at least one solution. Us.
4. Fitness function: Standard evolutionary theory describes fitness as the capability of an individual and/or a population of a certain genotype to reproduce (self-replicate). Is this fitness function solely confined to self-replication prowess? Intelligence and agency also seem to play a role in organisms that do not self-replicate in high numbers (e.g. elephants [low] vs bacteria [high]). Self-replicating entities do not necessarily result in intelligent self-replicating entities, and intelligent self-replicating entities do not necessarily result in intelligent self-replicating agents. In order to differentiate between intelligence and agency, intelligence can be viewed as an ability to process information (e.g. genetics, proteomics, metabolomics) and agency can be viewed as the ability to willfully manipulate information. Can self-replication, intelligence and agency all be part of the fitness function?
5. Selection (guiding function): Natural selection and the ability of an organism to survive. Self-replication, intelligence and agency would thus play a part.


What about the evidence?

A. The Memetic Algorithms of life.
1) A reasonably optimal genetic code that seems to be optimized for evolution and random searches. It also maintains its own functional integrity.
2) Quality control systems are in place. These include DNA repair, protein folding and programmed cell death (also in unicellular organisms). Some of these systems are so efficient that they remove even functional mutated proteins from the population of proteins generated in the genome. Thus they serve to constrain evolution, preventing certain functional proteins from entering a population.
3) Variation inducers. These include cytosine deaminases, Low vs High fidelity polymerases, gene conversion and homologous recombination. The immune system harnesses the properties of the genetic code for antibody diversification. Protein folding is an exquisitely controlled process, but cells can tweak this process under periods of stress to introduce variation. For example:
Misfolded Proteins Accelerate Yeast Evolution
ScienceDaily (Nov. 24, 2008) — Under stress, yeast cells can unleash a remarkable mechanism based on protein-misfolding that gives them new characteristics without requiring genetic mutations.
The article goes on to suggest that this serves as a mechanism tailored for evolution. Irrespective of its origins, somehow, life utilizes evolution FOR evolution.

B. Convergence
As seen in the docking simulation, the solutions all converge on relatively similar local optima. Examples of molecular and structural convergence are abundant in nature. Also, abiogenesis spectacularly converged into a reasonably optimized genetic code (with a few derivatives) and life's memetic algorithms. Convergence in virtual simulations and in nature thus serves to point to similarities between them.

C. Quantum Physics and Consciousness
The Penrose-Hameroff orchestrated objective reduction (Orch OR) model provides a basis to connect consciousness with quantum mechanics. Philosophical models compatible with this view include Type-F monism (panprotopsychism) (Article) or Type-D Dualism. This view is also not incompatible with the Metaphysical Hypothesis discussed here (p10), and fits in nicely with the view that information is an irreducible property of nature. Also, the Cambrian explosion and the emergence of consciousness (and agency?) can theoretically be connected by viewing the ability of an organism to quantum mechanically interact with the irreducible information of the universe as an adaptive advantage.

D. The Biased Nature of Evolution
After running the docking simulations, the software seemed to have been biased towards a few local optima. Compare the developmental program to evolution. An interesting article that shows the parallels between evolution and development.

For development:
Primordial germ cells (PGC) are prevented from entering the somatic program and are demethylated (genome-wide erasure of existing epigenetic modifications). Then the gametes are imprinted (targeted DNA methylation) during gametogenesis, only to be demethylated again after fertilization. Then during development, DNA is methylated again, causing totipotential cells to become pluripotent. X-inactivation and reactivation (of the paternal gamete I think) also occurs. The whole process is governed by the genetic (and epigenetic?) program. During the unfolding of this somatic program, random variation and selection occur, ultimately leading to just a few endpoints, every time it is successful. The process is constrained (few end points) as a result of pre-existing information that is set up during the initiation of the process. All this is controlled by information in the genome.

For evolution:
There also seems to be only a few endpoints (small subset, limited variation) out of all the possible endpoints.
In the article:
An End to Endless Forms: Epistasis, Phenotype Distribution Bias, and Nonuniform Evolution
It is argued to be as a result of genetic instructions dating earlier in evolutionary time. (Preadaptations)

As in the case of the evolution of eyes, as soon as these sets of genes were formed (E.g. Pax genes), (through whatever mechanism), evolution seemed to have been biased to a few end points, and these few endpoints arose 40-60 times, independently, as a result of pre-existing (preadaptations) information in the case of eyes. To make it even more intriguing is the notion that the whole process is facilitated and under intrinsic control. Why? What other "biased" end points can there be? Nervous systems, smell, hearing? And why would evolution be biased, as in development, to only reach a few end points over and over?

To conclude, some evidence seems to point to parallels between our own designed simulated docking runs and the evolution of life. Humans seem to be teleological agents by nature. We plan, we create, we intentionally manipulate nature as a means to an end. Science by its very nature also seems teleological, as scientists plan and execute experiments in order to gain an understanding of the universe and ourselves, with the assumption and faith that we will be able to understand it. What to do?

Research Potential?

1) What other mechanisms exist that could bias evolution? Riboswitches? Biomolecular machines? Preadaptations?
2) Simulate a Front-loaded state and see what happens.
3) Can quantum mechanics and biological structures (e.g. tubulin) explain consciousness? What about elemental proto-experiences? Is it testable?
4) Why does the scientific endeavour seem to be biased toward teleological explanations for our existence, and why would one want to force non-teleological explanations?



Tuesday, November 25, 2008

The Optimality of the Genetic Code

Selected articles:
  1. Early Fixation of an Optimal Genetic Code
  2. Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape
  3. The genetic code is nearly optimal for allowing additional information within protein-coding sequences
  4. An extension of the coevolution theory of the origin of the genetic code
  5. Can the genetic code be mathematically described?
  6. On the Hypercube Structure of the Genetic Code
  7. Topological structure of the triplet genetic code
  8. A Neutral Origin for Error Minimization in the Genetic Code.
  9. Does codon bias have an evolutionary origin?
  10. A chemical toolkit for proteins — an expanded genetic code
  11. Evolution and multilevel optimization of the genetic code

Article 1

Thus, to begin, in the first article it was determined by the researchers that:
Quote:
The Best of All Possible Codes?
When the error value of the standard code is compared with the lowest error value of any code found in an extensive search of parameter space, results are somewhat more variable. Estimates based on PAM data for the restricted set of codes indicate that the canonical code achieves between 96% and 100% optimization relative to the best possible code configuration (fig. 2c ). If our definition of biosynthetic restrictions are a good approximation of the possible variation from which the canonical code emerged, then it appears at or very close to a global optimum for error minimization: the best of all possible codes.
No better codes out of a million biosynthetically restricted codes.
This conclusion might be misleading though (addressed here), as the paper states that the tested codes were from a biosynthetically restricted set based on the current hypothesis of the evolution of the genetic code from pre-biotic scenarios. When not viewed from this point of view, other, more optimized codes are possible.

The next article (nr 2) shows that:

Quote:
Thus, the standard genetic code appears to be a point on an evolutionary trajectory from a random point (code) about half the way to the summit of the local peak. The fitness landscape of code evolution appears to be extremely rugged, containing numerous peaks with a broad distribution of heights, and the standard code is relatively unremarkable, being located on the slope of a moderate-height peak.
This shows that, in an analysis that includes all possible codes (not only biosynthetically restricted codes), the genetic code is only partially optimal with regard to error minimization. It should be noted, though, that the analysis only included a subset of the possibly optimal features of the code (i.e. error minimization).

From article 3

The analysis above did not include other nearly optimal features of the genetic code including:
A) The actual code is far better than other possible codes in minimizing the number of amino acids incorporated until translation is interrupted after a frameshift error occurred.
B) The code is highly optimal for encoding arbitrary additional information, i.e., information other than the amino acid sequence in protein-coding sequences.

Thus, two more features for which the code is close to being optimal. What is interesting about these two optimal features is that they may facilitate evolution i.e. the code is primed for the future by being optimal in allowing future incorporation of additional information.

In article nr.4

The coevolution theory of the origin of the genetic code is discussed. The theory suggests that the genetic code is an imprint of the biosynthetic (biosynthetically restricted) relationships between amino acids.
A few interesting observations can be made:
Firstly, from the article.
Quote:
As will become clear in the following, I maintain that these amino acid-pre-tRNAs came directly from the biosynthetic pathways of the first six amino acids evolving along the biosynthetic pathways of energetic metabolism and that they were the first amino acids to be codified on these still evolving mRNAs.
It should be noted that other exotic amino acids are also used by a few other codes (derived from the original). E.g. selenocysteine is encoded in many archaea and vertebrates, and pyrrolysine is encoded in some archaea. Archaea, however, seem to be the most primitive organisms, thus these encoded amino acids must have been fixed early on.
Thus an interesting question can be applied to an "evolving" code as posited in the above quote:
Are these "still evolving" mRNAs still evolving? Or did they hit an inevitable global optimum?

Secondly, from the article:
Quote:
While Wong [9] highlighted the precursor-product relationships between amino acids and their crucial role in defining the organisation of the genetic code, Miseta [10] clearly identified that the non-amino acid molecules that were precursors of amino acids might have been able to play an important role in organising the genetic code. Miseta [10] suggested the idea of an intimate relationship between molecules, the intermediates of glucose degradation, as precursors of precursor amino acids, and the organisation of the genetic code. This observation is also analysed by Taylor and Coates [11] who showed the relationship between the glycolytic pathway, the citric acid cycle, the biosyntheses of amino acids and the genetic code (Fig. 1) and, in particular, they point out that (i) all the amino acids that are members of a biosynthetic family tend to have codons with the same first base (Fig. 1) and (ii) that the five amino acids codified by GNN codons are found in four biosynthetic pathways close to or at the beginning of the pathway head (Fig. 1)[11]. More recently, Davis [12,13] has provided evidence that tRNAs descending from a common ancestor were adaptors of amino acids synthesised by a common precursor and he also discusses the biosynthetic families of amino acids, suggesting their importance in genetic code origin.
Is it correct to assume that in the presence of the precursors of the standard genetic code (e.g. intermediates of glucose degradation and the citric acid cycle), the intimate relationship between these molecules resulted in the inevitable organization of the genetic code (global optimum of the system)?

Articles 5-7

These articles discuss fascinating mathematical representations of the genetic code.
In article 5, the question is asked:
Can the genetic code be mathematically described?

A few intriguing properties arose from the investigation, including:
Parity coding
Palindromic symmetry
Binary coding
Error-correction mechanism based on parity checking

The author concludes:
It remains striking, however, that different fundamental properties of the genetic code, such as degeneracy distribution, and also unexpected hidden properties, such as the palindromic symmetry and the parity marking of triplets presented here, reflect a strong mathematical order which is accurately described by means of one of the most elementary operations at the root of mathematics: number representation.


In article 6 a representation of the genetic code as a six–dimensional Boolean hypercube is proposed.
Abstract:
Quote:
It is assumed here that this structure is the result of the hierarchical order of the interaction energies of the bases in codon–anticodon recognition. The proposed structure demonstrates that in the genetic code there is a balance between conservatism and innovation. Comparing aligned positions in homologous protein sequences two different behaviors are found:
a)There are sites in which the different amino acids present may be explained by one or two “attractor nodes” (coding for the dominating amino acid(s)) and their one–bit neighbors in the codon hypercube, and
b) There are sites in which the amino acids present correspond to codons located in closed paths in the hypercube. The structure of the code facilitates evolution: the variation found at the variable positions of proteins do not corresponds to random jumps at the codon level, but to well defined regions of the hypercube.
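To make the hypercube picture concrete, here is a small sketch. The two-bit encoding of each base is an assumption chosen for illustration and is not necessarily the assignment used in the article. Each codon becomes a six-bit string (a vertex of the hypercube) and its "one-bit neighbors" are the six codons reached by flipping a single bit.

```python
from itertools import product

# Hypothetical two-bit encoding per base (the article's own assignment may differ).
BASE_BITS = {"C": "00", "U": "01", "G": "10", "A": "11"}
BITS_BASE = {bits: base for base, bits in BASE_BITS.items()}

def codon_to_bits(codon):
    return "".join(BASE_BITS[base] for base in codon)

def bits_to_codon(bits):
    return "".join(BITS_BASE[bits[i:i + 2]] for i in range(0, 6, 2))

def one_bit_neighbors(codon):
    """Codons whose six-bit label differs from this codon's in exactly one bit."""
    bits = codon_to_bits(codon)
    neighbors = []
    for i in range(6):
        flipped = bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]
        neighbors.append(bits_to_codon(flipped))
    return neighbors

ALL_CODONS = ["".join(p) for p in product("UCAG", repeat=3)]
print(codon_to_bits("CCC"), one_bit_neighbors("CCC"))
```

Note that under any two-bit encoding some single-base substitutions flip two bits at once (here C to A), so they are not edges of the hypercube.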


Article 8:

This article once again discusses the optimality of the code, and a few fascinating conclusions are made. For example:
Quote:
The genetic code has the remarkable property of error minimization, whereby the arrangement of amino acids to codons is highly efficient at reducing the deleterious effects of random point mutations and transcriptional and translational errors. Whether this property has been explicitly selected for is unclear. Here, three scenarios of genetic code evolution are examined, and their effects on error minimization assessed. First, a simple model of random stepwise addition of physicochemically similar amino acids to the code is demonstrated to result in substantial error minimization. Second, a model of random addition of physicochemically similar amino acids in a codon expansion scheme derived from the Ambiguity Reduction Model results in improved error minimization over the first model. Finally, a recently introduced 213 Model of genetic code evolution is examined by the random addition of physicochemically similar amino acids to a primordial core of four amino acids. Under certain conditions, 22% of the resulting codes produced according to the latter model possess equivalent or superior error minimization to the standard genetic code. These analyses demonstrate that a substantial proportion of error minimization is likely to have arisen neutrally, simply as a consequence of code expansion, facilitated by duplication of the genes encoding adaptor molecules and charging enzymes. This implies that selection is at best only partly responsible for the property of error minimization. These results caution against assuming that selection is responsible for every beneficial trait observed in living organisms.
Also from the article:
Quote:
The SGC (Standard Genetic Code) has an EM (Error Minimization) value (see Methods for calculation) of 60.7. Ten thousand random codes have an average EM value of 74.5, and only 0.03% of these have equal or greater optimality than the SGC. These calculations once again illustrate the remarkable ‘optimization’ of the genetic code for EM.
Thus, an important point is raised:
Quote:
The point should be made that explicit selection for EM seems to necessitate both the occurrence of codon reassignments and group selection to generate and select alternate codes. The proposal that explicit selection for the EM did not occur, and that EM arose neutrally from the addition of similar amino acids to similar codons, may be termed the ‘Nonadaptive Code’ Hypothesis, in contrast to the Adaptive Code Hypothesis. Finally, on a fundamental level, as a result of the analyses presented here, the presence of EM in the SGC may be used as evidence that enzymes, whether partially proteinaceous, RNA based, or based on some other macromolecule, were already extant during the evolution of the SGC.
The article cautions against blithely invoking natural selection as an explanation for the features of the genetic code.
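To get a feel for the comparison between the SGC and random codes quoted above, here is a minimal sketch in Python. It is not the article's EM calculation (that is defined in its Methods section, which is not reproduced here). As an assumption, it uses a simpler, commonly used proxy: the mean squared change in Woese's polar requirement over all single-base substitutions, with random codes made by shuffling the 20 amino acids among the existing synonymous codon blocks. The names `error_cost`, `random_code` and `compare` are made up for illustration, and the polar requirement values are as commonly tabulated, so treat the numbers as illustrative.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order; '*' marks stops.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Polar requirement per amino acid (assumed values, used here as a similarity scale).
POLAR = {
    "A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8, "Q": 8.6, "E": 12.5,
    "G": 7.9, "H": 8.4, "I": 4.9, "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0,
    "P": 6.6, "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6,
}


def error_cost(code):
    """Mean squared polar-requirement change over all single-base substitutions.

    Lower values mean point mutations tend to swap in more similar amino acids.
    """
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = code[mutant]
                if new_aa == "*":
                    continue  # substitutions to stop codons are ignored in this proxy
                total += (POLAR[aa] - POLAR[new_aa]) ** 2
                count += 1
    return total / count


def random_code(rng):
    """Shuffle the amino acids among the synonymous codon blocks; stops stay fixed."""
    aas = sorted(POLAR)
    shuffled = aas[:]
    rng.shuffle(shuffled)
    swap = dict(zip(aas, shuffled))
    return {codon: (aa if aa == "*" else swap[aa]) for codon, aa in CODE.items()}


def compare(n_codes=1000, seed=0):
    """Return (SGC cost, mean random cost, fraction of random codes at least as good)."""
    rng = random.Random(seed)
    sgc = error_cost(CODE)
    costs = [error_cost(random_code(rng)) for _ in range(n_codes)]
    as_good = sum(c <= sgc for c in costs) / n_codes
    return sgc, sum(costs) / n_codes, as_good


if __name__ == "__main__":
    sgc, mean_random, as_good = compare()
    print(f"SGC cost: {sgc:.2f}, mean random cost: {mean_random:.2f}, "
          f"fraction at least as good: {as_good:.4f}")
```

Note that this only measures how unusual the SGC is among shuffled codes. It says nothing about whether the property was selected for or arose neutrally, which is the question the article raises.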

Article 9:

This article discusses the functional integrity of the code and how the architecture of the code relates to it.
From the article:
Quote:
The results put the concept of "codon bias" into a novel perspective. The internal connectivity of codons indicates that all synonymous codons might be integrated parts of the Genetic Code with equal importance in maintaining its functional integrity.
Thus, the properties of the code allow it to maintain its own functional integrity.
Also from the article:
Quote:
The cumulative Codon Usage Frequency of any codon is strongly dependent on the cumulative Codon Usage Frequency of other codons belonging to the same species. The rules of this codon dependency are the same for all species and reflect WC base pair complementarity. This internal connectivity of codons indicates that all synonymous codons are integrated parts of the Genetic Code with equal importance in maintaining its functional integrity. The so-called codon bias is a bias caused by the protein-centric view of the genome.
The maintenance of the code's functional integrity thus does not depend on selection, but on internal variables (a feedback system). Again, this shows another form of optimality.

Article 10:

Fascinating research was conducted whereby a variety of unnatural amino acids with novel three- and four-base codons were selectively incorporated (engineered) into proteins, yielding viable organisms.

An intriguing question arises from this research. It is easy to imagine such amino acids (e.g. amino acids with photoaffinity) arising through chance and selection and then being incorporated into the standard code. Yet the code seems to have remained static. For billions of years after fixation, little evolution has happened in the code. Why?
Did it arrive at a global optimum in a pre-existing fitness landscape, with a pre-existing fitness function?


Finally, article 11:
Bollenbach et al. (2007) briefly describe a few of the optimal features (some described above) of the genetic code:
Evolution and multilevel optimization of the genetic code
Quote:
They (Itzkovitz and Alon) compared the actual genetic code with an ensemble of all other codes that are equally optimized with respect to mistranslation or mutation (for more on this statistical approach, see also Alff-Steinberger 1969; Haig and Hurst 1991; Freeland and Hurst 1998). Assuming that the usage frequencies of the different amino acids are fixed, while their codon assignments vary in the ensemble, they find that the actual code is far better than other possible codes in minimizing the number of amino acids incorporated until translation is interrupted after a frameshift error occurred. This new observation by Itzkovitz and Alon could therefore be seen as reviving the basis for Crick’s theory of a comma-less code, modified by the constraints imposed on the code by the need to be robust to other kinds of translation errors and mutations. Another possible interpretation of their result is that the amino acid usage has adjusted to reduce the effects of frameshift errors; alternative genetic codes would have had a different amino acid usage coadapted to them. It has been shown previously that amino acid usage is rather malleable, and, for example, influenced by GC content (Knight et al. 2001b).
Quote:
Itzkovitz and Alon suggest another, quite unanticipated, type of optimality: the code is highly optimal for encoding arbitrary additional information, i.e., information other than the amino acid sequence in protein-coding sequences. Optimality for encoding additional information is particularly important and relevant given the known signals contained in the nucleotide sequence of coding regions. These include RNA splicing signals, which are encoded in the nucleotide sequence together with the amino acid sequence of the prospective protein (Cartegni et al. 2002), as well as signals recognized by the translation apparatus.
They briefly proceed to mention how it could have evolved:

Quote:
(1) the code has evolved under selection pressure to optimize certain functions such as minimization of the impact of mutations (Sonneborn 1965) or translation errors (Woese 1965a);
Random mutation is a source of variability, yet selection pressure is believed to have selected for a system to put constraints on variability. Why?
Quote:
(2) the number of amino acids in the code has increased over evolutionary time according to evolution of the pathways for amino acid biosynthesis (Wong 1975)
Intriguing questions can arise from the above suggestions.
1) Why was selection so strong in removing the other variants with fewer codons?
2) Is there evidence of organisms using codes with only 5, 6, 9, 13, 18, etc. amino acids? And why isn't the code expanding to incorporate other codons, when it is not even difficult to envision it happening, given that it could contribute to both fitness AND variety (see article #10)?

The authors point this out:
Quote:
The discovery of variant codes (Barrell et al. 1979; Fox 1987; Knight et al. 2001a) made the connection between evolvability and universality even more puzzling. On one hand, they prove that the genetic codes can evolve; on the other hand, if they could easily evolve, why are all variations minor? It was recently proposed that extensive horizontal gene transfer during early evolution can account for both evolution toward optimality and the near universality of the genetic code (Vetsigian et al. 2006).
Part of the answer lies in the code's inherent capability of maintaining its own functional integrity, independent of natural selection (article #9). Also, we are cautioned against blithely invoking natural selection as an explanation for the properties of the code (article #8).

The authors conclude:
Quote:
As we learn more about the functions of the genetic code, it becomes ever clearer that the degeneracy in the genetic code is not exploited in such a way as to optimize one function, but rather to optimize a combination of several different functions simultaneously. Looking deeper into the structure of the code, we wonder what other remarkable properties it may bear. While our understanding of the genetic code has increased substantially over the last decades, it seems that exciting discoveries are waiting to be made.


The genetic code sure is interesting. Irrespective of its origin, the code seems to be optimized for evolution and to maintain its own functional integrity. Whatever the explanation for the origins of the code, whether intentional agency, only RV+NS, self-organization or a combination of these, the fact that these processes converged on a single, reasonably optimal code that is able to facilitate evolution makes it look like an inevitable result of the system. The system seems to be rigged and biased towards certain outcomes, similar to the evolution of life. Why?

Sunday, September 28, 2008

Memetic Algorithms, Convergence and Pre-existing Fitness Landscapes

Memetic Algorithms

Memetic Algorithms (MAs) are search techniques used to solve problems by mimicking molecular processes of evolution including selection, recombination, mutation and inheritance.

A few important aspects of MAs (Figure 1):

  • The fitness landscape needs to be finite.
  • The search space of the MA is limited to the fitness landscape.
  • There is at least one solution in the fitness landscape (Figure 2).
  • A fitness function determines the relationship between the fitness of the genotype (or phenotype) and the fitness landscape.
  • Selection is based on fitness.

Figure 1: Basic layout of memetic algorithms. A population of individuals is randomly seeded with regard to fitness (initialized). The individuals are randomly mutated and their fitness is measured. Individuals with optimal fitness are further mutated until convergence on a local optimum is reached. The process is carried out for the entire initialized population. The global optimum is selected from the various local optima.
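The layout in Figure 1 can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the one-dimensional landscape, its four peaks, and the names `fitness`, `local_search` and `run` are all made up, and the sketch leaves out recombination. It is not how Autodock itself is implemented.

```python
import math
import random

# Hypothetical 1-D fitness landscape: (centre, height) of four peaks.
# The tallest peak, at x = 6.5, plays the role of the global optimum.
PEAKS = [(2.0, 1.0), (4.0, 1.5), (6.5, 3.0), (9.0, 2.0)]
LOW, HIGH = 0.0, 10.0  # the finite search space


def fitness(x):
    """Fitness function: height of the landscape at position x."""
    return sum(h * math.exp(-((x - c) ** 2) / 0.5) for c, h in PEAKS)


def local_search(x, rng, steps=300, sigma=0.1):
    """Mutate one individual repeatedly, keeping only improvements (hill climbing)."""
    best, best_fit = x, fitness(x)
    for _ in range(steps):
        candidate = min(HIGH, max(LOW, best + rng.gauss(0.0, sigma)))
        candidate_fit = fitness(candidate)
        if candidate_fit > best_fit:  # selection is based on fitness
            best, best_fit = candidate, candidate_fit
    return best, best_fit


def run(pop_size=50, seed=0):
    """Seed a random population, refine every individual, pick the best local optimum."""
    rng = random.Random(seed)
    population = [rng.uniform(LOW, HIGH) for _ in range(pop_size)]
    local_optima = [local_search(x, rng) for x in population]
    global_optimum = max(local_optima, key=lambda pair: pair[1])
    return global_optimum, local_optima


if __name__ == "__main__":
    (best_x, best_fit), optima = run()
    print(f"best position: {best_x:.2f}, fitness: {best_fit:.2f}")
```

Calling `run` with different seeds starts from different random populations, which gives a rough way to see whether independent runs end up on the same peak.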


Figure 2: Fitness landscape with local optima (A, B and D) and a global optimum (C). In a memetic algorithm, the initial population of individuals is randomly seeded, and the individuals can be viewed as any of the arrows indicated in the figure.


Various molecular docking programs employ genetic algorithms to try to predict the orientation of a ligand within a protein receptor. Autodock employs an MA for this purpose. A good docking program is one that can reproduce an existing crystallographic pose with reasonable success. The Root Mean Square Deviation (RMSD) of a docked ligand compared to the crystallographic pose is generally used as an indicator. An RMSD value of less than 2 Å is considered a success. In the case of the Autodock software, the global optimum is supposed to correlate with the crystallographic pose (RMSD < 2 Å).
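For readers who have not met the RMSD before, here is a minimal Python sketch of the calculation. It assumes both poses list the same atoms in the same order and sit in the same receptor coordinate frame, so no superposition is done. The function names and the toy coordinates are made up for illustration, and symmetry-equivalent atoms are not handled.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two poses given as lists of (x, y, z) tuples."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def is_successful_pose(docked, crystal, cutoff=2.0):
    """Apply the 'RMSD less than 2' success criterion mentioned in the text."""
    return rmsd(docked, crystal) < cutoff


if __name__ == "__main__":
    # Toy coordinates (not real ligand data): the docked pose is shifted by 1 along z.
    crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    docked = [(x, y, z + 1.0) for x, y, z in crystal]
    print(rmsd(docked, crystal), is_successful_pose(docked, crystal))
```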

To illustrate, Colchicine binds to tubulin and interferes with tubulin dynamics by inhibiting tubulin polymerization. Colchicine binds at a position between the alpha and beta subunits of the tubulin dimer (Figures 3 and 4).



Figure 3: Colchicine binding site.


Figure 4: Colchicine binding cavity.


A docking run with Autodock can be characterized by the following:

Finite fitness landscape: The physical properties of the protein receptor (e.g. electrostatic properties, Van der Waals interactions and desolvation energies). This is the pre-existing fitness landscape.

Search space: Confined to the protein receptor.

At least one solution: Crystallographic pose.

Fitness function: Estimated free energy of binding of the pose. This is determined through a combination of various interactions including Van der Waals-, electrostatic-, desolvation-, hydrogen bond- and torsional free energy.

Selection (guiding function): Selection is based on fitness.


Using Autodock, Colchicine was "docked" 4 times into the tubulin receptor. Each time the ligand is docked, 30 populations with 250 individuals (ligands) are randomly placed within the receptor. The local optimum of each population is determined (blue bar graph). The results are shown in Figure 5.

Figure 5a: Run 1

Figure 5b: Run 2

Figure 5c: Run 3

Figure 5d: Run 4

All four runs converged on the same global optimum, which also corresponded reasonably well to the crystallographic pose (RMSD < 1.8 Å).

Is this process analogous to the evolution of life?


The Memetic Algorithms of life:
A) A genetic code that is optimized for random searches.
B) Quality control systems (DNA repair, protein quality, programmed cell death).
C) Variation inducers (Cytosine deaminases, Low vs High fidelity polymerases, gene conversion and homologous recombination).

Examples of convergence in the evolution of life:
Running MAs in pre-existing fitness landscapes results in convergence on various local optima, with the global optimum being the best of the local optima. Evolutionary history is filled with examples of convergence (local optima).

A) The spectacular convergence of abiogenesis into a universal optimized genetic code and life's memetic algorithms.
B) Structural convergence
Nice article showing various examples of convergent evolution.
C) Molecular convergence
Carbonic anhydrases
Prestin
More examples

Pre-existing fitness landscapes and the evolution of life:
The fitness of the docking pose of the ligand in the above example is dependent on the pre-existing properties of the receptor protein. These properties include:

Van der Waals energy
Electrostatic energy
Desolvation energy
Hydrogen bond energy
Torsional free energy
These are all combined to determine the fitness (binding energy) of the ligand.
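As a rough sketch of how the energy terms listed above could be combined into a single fitness value, consider the following Python snippet. The weights are placeholders set to 1.0 and the example numbers are invented. Autodock's real scoring function uses its own calibrated weights and functional forms, which are not reproduced here. The names `EnergyTerms`, `WEIGHTS`, `binding_energy` and `fitter_pose` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EnergyTerms:
    """Energy contributions for one docked pose (arbitrary example units)."""
    van_der_waals: float
    electrostatic: float
    desolvation: float
    hydrogen_bond: float
    torsional: float


# Placeholder weights (assumption): one per term, all 1.0 for illustration.
WEIGHTS = {
    "van_der_waals": 1.0,
    "electrostatic": 1.0,
    "desolvation": 1.0,
    "hydrogen_bond": 1.0,
    "torsional": 1.0,
}


def binding_energy(terms, weights=WEIGHTS):
    """Weighted sum of the energy terms; a lower (more negative) value means a fitter pose."""
    return sum(weight * getattr(terms, name) for name, weight in weights.items())


def fitter_pose(pose_a, pose_b):
    """Selection step: keep whichever pose has the lower estimated binding energy."""
    return pose_a if binding_energy(pose_a) <= binding_energy(pose_b) else pose_b


if __name__ == "__main__":
    # Invented numbers, only to show the calculation.
    pose_1 = EnergyTerms(-5.0, -1.0, 1.5, -2.0, 0.9)
    pose_2 = EnergyTerms(-3.0, -0.5, 1.0, -1.0, 0.9)
    print(binding_energy(pose_1), binding_energy(pose_2))
```

The point of the sketch is only that the landscape (the receptor's properties) fixes the value of each term for a given pose before any search begins.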

Figure 6: Convergence of local optima of Colchicine in the pre-existing fitness landscape of the tubulin protein receptor. Fitness (binding energy) is measured by Van der Waals-, Electrostatic-, Desolvation-, Hydrogen bond- and Torsional free energy. Replaying the docking run yields similar results every time.


Standard evolutionary theory describes fitness as the capability of an individual of a certain genotype to reproduce (self-replicate). What are the properties of the pre-existing fitness landscape of life that determine the fitness (self-replication) of life forms?

Should these properties include the following?

Reproduction success (self-replication)
Intelligence (Ability to process information - genetics, proteomics, metabolomics)
Agency (Ability to manipulate information)
Complexity (Emergence of complexity seems to be the first rule of evolution)


What are these properties composed of?
Perhaps elemental proto-experiences (PEs), phenomenal aspects that are (superimposed) properties of elementary particles, as described in this paper? Can this connect quantum physics, consciousness (article) and evolution?


A "docking" (replaying the tape of life) run with such a simulation can be characterized by the following:

Finite fitness landscape: The physical properties of the universe (mass, spin, charge and proto-experiences superimposed as elementary particles). This is the pre-existing fitness landscape.

Search space: Confined to the universe.

At least one solution: Self-replication.

Fitness function: Reproduction success. This is determined through a combination of various interactions including self-replication, intelligence, agency and emergence of complexity.

Selection (guiding function): Selection is based on fitness.


What would a "docking" run of life look like if we run it over and over with a pre-existing fitness landscape and universal memetic genetic algorithms (Figure 6)?

Figure 7: Convergence of local optima in a fitness landscape whereby fitness is measured by reproduction, intelligence, agency and complexity. If life's memetic algorithms are comparable to a "docking" run, it should yield similar local optima in pre-existing fitness landscapes every time the simulation is run.


Monday, September 8, 2008

Robustness and back-up systems

New Evidence On The Robustness Of Metabolic Networks

Biological systems are constantly evolving in ways that increase their fitness for survival amidst environmental fluctuations and internal errors. Now, in a study of cell metabolism, a Northwestern University research team has found new evidence that evolution has produced cell metabolisms that are especially well suited to handle potentially harmful changes like gene deletions and mutations.

You Can Be Replaced: Immune Cells Compensate For Defective DNA Repair Factor
Genetic instability can lead to multiple problems, including cell death and many forms of cancer. Therefore, it is absolutely critical for cells to have both the means to constantly survey genes for damage and the mechanisms to repair broken DNA. Currently, there are six well characterized classical non-homologous end-joining (C-NHEJ) factors that repair double strand breaks (DSBs) in mammalian cells. Lymphocytes, a type of immune cell, use a kind of genetic shuffling called variable, diversity, joining V(D)J recombination. This gene shuffling occurs during lymphocyte development and helps to produce diverse immune system cells that can recognize all sorts of different foreign substances, called antigens, that might pose a threat to the organism. Previous work in mice has shown that deficiency of C-NHEJ factors results in a severely compromised immune system, because of incomplete V(D)J recombination, along with increased sensitivity to cellular ionizing radiation (IR) and genomic instability.


Nice to know cell intelligence and evolution from a front-loaded state provide for robust systems with back-up. Preadaptation is good for the future.

Sunday, September 7, 2008

Front-loaded evolution

The idea of front-loaded evolution (FLE) has been around for a while. Mike Gene (pseudonym) is one of the main proponents of the hypothesis and fleshes out his idea in his book, The Design Matrix, and on his blog.
A few definitions from the blog:
1) The original front-loaded state had sufficient information that would bias evolutionary trajectories needed to evolve complex, multicellular organisms.
2) Front-loading assumes that life began with a consortium of different genomes that, as a communicating group, contained sufficient information needed to bias evolutionary trajectories needed to evolve complex, multi-cellular organisms.
3) Front-loading does NOT predict we should find genes that serve no apparent purpose until the new function (in this case multicellularity) arises.

Other people also have interesting ideas that are friendly to the idea of FLE. Michael Sherman in his article about the Universal Genome in the Origin of Metazoa proposes the following:
1) First, that a significant fraction of genetic information in lower taxons must be functionally useless but becomes useful in higher taxons.
2) Second, that one should be able to turn on in lower taxons some of the complex latent developmental programs, e.g., a program of eye development or antibody synthesis in sea urchin.
These propositions are in contrast to the third definition from Mike Gene's blog; however, they might not be mutually exclusive. Here is how:
We observe ultraconserved, ultraselected sequences with no apparent effect on the fitness of the organism. The four sequences that were knocked out in this study had no visible immediate effect on fitness in the mice. Interestingly, one of the sequences (uc467) is found in the reptile, Carolina anole. Use this site to BLAST the uc467 sequence in eukaryotes. It would be interesting to see what the function of this sequence is in the Carolina anole genome and whether deletion of the sequence will have any effect on fitness.
It is thus not inconceivable that some of the sequences in the proposed Universal Genome in the Origin of Metazoa might have no immediate effect on fitness, but still have a function by acting as a reservoir of genetic material on which variation-inducing mechanisms such as sequence duplication, somatic hypermutation, gene conversion and homologous recombination can act during periods of selection. Intracellular quality control mechanisms then act as selection mechanisms to keep the sequences intact, be it as a result of a redundant mechanism or an EAM mechanism.
Thus, from a FLE (and EAM?) perspective, ultraconserved sequences of DNA (and other sequences) that do not have any effect on immediate fitness might function as a reservoir of information for future adaptation, which the intelligent systems within cells can act upon and use during times of selection. In this way the two views might be reconciled: sequences with no effect on fitness still bias evolutionary trajectories because of the intelligent use of "functionless" sequences.

From a FLE and EAM perspective, instead of viewing cells as passive entities in which random mutations (and other mechanisms) introduce variety for no reason, and on which natural selection blindly acts with regard to fitness, why not view cells as active entities that search random space for solutions during times of selection pressure? The intrinsic quality control systems can also be seen to act as selection mechanisms that constrain the random search and thus bias its output.

Is this view tenable? Evidence for evolution from a front-loaded state?
1) The universal genetic code seems to be optimized for random searches:
  1. They (Itzkovitz and Alon) compared the actual genetic code with an ensemble of all other codes that are equally optimized with respect to mistranslation or mutation. (Bollenbach et al. (2007))
  2. Itzkovitz and Alon suggest another, quite unanticipated, type of optimality: the code is highly optimal for encoding arbitrary additional information, i.e., information other than the amino acid sequence in protein-coding sequences. (Bollenbach et al. (2007))
  3. The effect of cytosine deamination on a random pool of amino acids facilitates evolution. (Link)
  4. Cytosine deamination also does not result in any stop codon formation. (Link)
2) Variation inducers:
The optimal features of the genetic code allow it to be exploited to generate controlled variety. Take the immune system as an example:
Antibody diversification is crucial in limiting the frequency of environmentally acquired infections and thereby increasing the fitness of the organism. Initial diversification of antibodies is achieved by assembling variable (V), diversity (D) and joining (J) gene segments (V(D)J recombination) by non-homologous recombination. Further diversification is carried out by somatic hypermutation (SHM) and Class Switch Recombination. Central to the initiation of these diversification processes is the activation-induced cytosine deaminase (AID) protein. AID deaminates cytosine to uracil in single stranded DNA (ssDNA - arising during gene transcription) and is dependent on active gene transcription of the various antibody genes. The induced mutation is resolved by at least 4 pathways (Figure 4):
1) Copying of the base by high-fidelity polymerases during DNA replication.
2) Short-Patch Base Excision Repair (SP-BER) by uracil-DNA glycosylase removal and subsequent repair of the base.
3) Long-Patch Base Excision Repair (LP-BER)
4) Mismatch repair (MMR)

The activity and gene expression of AID is controlled. The type of error-repair pathway and the subsequent recruitment of various low-fidelity polymerases determine the type of mutations after the repair process and these also seem to be controlled. Current research focuses on the mechanisms of control of downstream repair pathways and why this system is selectively targeted to the small region of antibody genes.

Thus, the immune system exploits the optimal properties of the genetic code for the purpose of controlled variability. Is the system limited to vertebrates, or can similar systems be found in other organisms? Cytosine deaminases are found in bacteria as well. Error-prone (low-fidelity) repair systems are also present. Will we discover an active system in bacteria that exploits the properties of the genetic code for the purpose of controlled variability under selective pressure? Will RecA (an evolution gene) and LexA play a part?

Thus, variation inducers facilitate evolution by making use of random variation to generate variability.

3) Quality control systems (Intrinsic selection systems)
Cell division in all the domains of life is under extreme control in order to ensure daughter cells have sufficient machinery to repeat the process. Cell division is a highly regulated process with various positive- and negative-feedback systems. Quality control mechanisms during various stages of cell division monitor the fidelity of the process. Whenever the events during cell division lead to faulty cell division, another program is activated to remove the faulty cell from the population through programmed cell death (apoptosis, autophagy, metabolic catastrophe, necrosis etc.). Programmed cell death is a process found in multicellular as well as unicellular organisms and can be activated through a variety of signaling networks.

Protein quality control mechanisms are also found in uni- and multicellular organisms. Sometimes, even functional mutated proteins get removed from the population of proteins generated from the genome. This serves to constrain evolution, preventing certain functional proteins from entering a population.

Quality control systems would thus help to constrain (bias) evolution, putting it under intrinsic control.

What about the front-loaded state?
A) How far back into the history of life can we go to posit a front-loaded state?
Various trees of life exist; however, the revised tree of life from Doolittle is probably our best understanding at present (HT: Zachriel). Where on that tree of life did life start to harness random variation for controlled variability?

Figure 1: Doolittle tree of life with proposed front-loaded states

It is probably best to view the front-loaded state to be at position A.

B) What does the front-loaded state consist of?
Figure 2: Components of a front-loaded state

Components of the front-loaded state can be viewed as those components that are ubiquitous in the various life forms at present. These include:
Optimized genetic code
Cell division machinery (replisome)
Variation inducers (Cytosine deaminase)
Quality control mechanisms (chaperones, programmed cell death)



C) Does the front-loaded state have to have any metaphysical implications?
Not necessarily. A front-loaded state can still be viewed as the result of random variation and selection. All that is needed is a plausible route (or routes) to the perceived front-loaded state, an explanation of why natural selection during abiogenesis was strong enough to lead to only one front-loaded state and not the variety of other possibilities (Figure 2), and an explanation of why a front-loaded state is able to bias evolutionary trajectories.


All-in-all, the FLE and EAM hypotheses, together with the observations of biomolecular machines, convergent evolution, the possible connection between quantum physics and consciousness and the possibility that evolution is exploring pre-existing realities, provide a powerful teleological framework for looking at the evolution of life. And then there are the surprises waiting to be discovered in the sequenced genomes of primitive eukaryotes.