Tuesday, November 25, 2008

The Optimality of the Genetic Code

Selected articles:
  1. Early Fixation of an Optimal Genetic Code
  2. Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape
  3. The genetic code is nearly optimal for allowing additional information within protein-coding sequences
  4. An extension of the coevolution theory of the origin of the genetic code
  5. Can the genetic code be mathematically described?
  6. On the Hypercube Structure of the Genetic Code
  7. Topological structure of the triplet genetic code
  8. A Neutral Origin for Error Minimization in the Genetic Code.
  9. Does codon bias have an evolutionary origin?
  10. A chemical toolkit for proteins — an expanded genetic code
  11. Evolution and multilevel optimization of the genetic code

Article 1

To begin, the researchers in the first article found that:
Quote:
The Best of All Possible Codes?
When the error value of the standard code is compared with the lowest error value of any code found in an extensive search of parameter space, results are somewhat more variable. Estimates based on PAM data for the restricted set of codes indicate that the canonical code achieves between 96% and 100% optimization relative to the best possible code configuration (fig. 2c ). If our definition of biosynthetic restrictions are a good approximation of the possible variation from which the canonical code emerged, then it appears at or very close to a global optimum for error minimization: the best of all possible codes.
No better code was found among a million biosynthetically restricted codes.
This conclusion might be misleading, though (addressed here): the paper states that the tested codes were drawn from a biosynthetically restricted set, based on the current hypothesis of how the genetic code evolved from prebiotic scenarios. Without this restriction, other, more optimized codes are possible.
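The "96% to 100% optimization" figure compares the standard code's error value with both the average random code and the best code found in the search. A minimal sketch, assuming the commonly used "percentage minimization" form (mean minus code, divided by mean minus best); the paper's exact definition may differ, and the numbers below are hypothetical illustrations, not data from article 1:

```python
def percent_optimization(code_error, mean_random_error, best_error):
    """Share of the possible improvement (from the average random code to the
    best code found) that a code achieves; lower error values are better."""
    span = mean_random_error - best_error
    if span <= 0:
        raise ValueError("best_error must be lower than mean_random_error")
    return 100.0 * (mean_random_error - code_error) / span


# Hypothetical error values, for illustration only.
print(percent_optimization(code_error=6.2, mean_random_error=10.0, best_error=6.0))
```

With these made-up values the code reaches about 95%: it closes most, but not all, of the gap between an average random code and the best one found.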

The next article (no. 2) shows that:

Quote:
Thus, the standard genetic code appears to be a point on an evolutionary trajectory from a random point (code) about half the way to the summit of the local peak. The fitness landscape of code evolution appears to be extremely rugged, containing numerous peaks with a broad distribution of heights, and the standard code is relatively unremarkable, being located on the slope of a moderate-height peak.
This analysis, which included all possible codes (not only biosynthetically restricted ones), thus shows that the genetic code is only partially optimized with regard to error minimization. It should be noted, though, that the analysis covered only one of the code's possibly optimal features (error minimization).

From article 3

The analysis above did not include other nearly optimal features of the genetic code, including:
A) The actual code is far better than other possible codes in minimizing the number of amino acids incorporated until translation is interrupted after a frameshift error occurred.
B) The code is highly optimal for encoding arbitrary additional information, i.e., information other than the amino acid sequence in protein-coding sequences.
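Feature A) can be made concrete by reading a coding sequence in a shifted frame and counting codons until a stop codon (TAA, TAG or TGA) is reached. This is a minimal sketch of that count for a single sequence; it is not Itzkovitz and Alon's ensemble analysis, and the function name is hypothetical:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_offframe_stop(cds, shift=1):
    """Count codons read in a frame shifted by `shift` nucleotides until a
    stop codon is hit. Returns None if the sequence ends without a stop."""
    count = 0
    for i in range(shift, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return None


print(codons_before_offframe_stop("ATGCTAAGG", shift=1))
```

In the +1 frame, "ATGCTAAGG" reads TGC and then TAA, so one codon is translated before the stop. Averaging such counts over the +1 and +2 frames of many genes gives a rough figure for how quickly a frameshift is terminated.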

These are two more features for which the code is close to optimal. What is interesting about them is that they may facilitate evolution: by being optimal for carrying additional information, the code is primed for the future incorporation of such information.
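The room for additional information in B) comes from synonymous codons: wherever an amino acid has k codons, the choice among them can carry up to log2(k) bits without changing the protein. A minimal sketch of that upper bound under the standard code; the function name is hypothetical, and this is not the measure used in article 3:

```python
import math
from collections import Counter

# Standard genetic code (NCBI table 1), codons in TCAG order; '*' marks stops.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of synonymous codons per amino acid, e.g. 6 for L, S and R, 1 for M and W.
DEGENERACY = Counter(AA_STRING)


def spare_bits(protein):
    """Upper bound on the bits that synonymous codon choices can carry in a
    coding sequence for `protein` (one-letter amino acid codes)."""
    return sum(math.log2(DEGENERACY[aa]) for aa in protein)


print(spare_bits("MKLV"))
```

For "MKLV" this gives 0 + 1 + log2(6) + 2, roughly 5.6 bits: methionine offers no choice, while leucine's six codons offer the most.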

In article no. 4

The coevolution theory of the origin of the genetic code is discussed. The theory suggests that the genetic code is an imprint of the biosynthetic (biosynthetically restricted) relationships between amino acids.
A few interesting observations can be made:
Firstly, from the article.
Quote:
As will become clear in the following, I maintain that these amino acid-pre-tRNAs came directly from the biosynthetic pathways of the first six amino acids evolving along the biosynthetic pathways of energetic metabolism and that they were the first amino acids to be codified on these still evolving mRNAs.
It should be noted that other exotic amino acids are also used by a few other codes (derived from the original). For example, selenocysteine is encoded in all three domains of life, including vertebrates, and pyrrolysine in some methanogenic archaea and a few bacteria. Archaea, however, seem to be among the most primitive organisms, so these encoded amino acids must have been fixed early on.
This raises an interesting question about an "evolving" code, as posited in the quote above:
Are these "still evolving" mRNAs, still evolving? Or did it hit an inevitable global optimum?

Secondly, from the article:
Quote:
While Wong [9] highlighted the precursor-product relationships between amino acids and their crucial role in defining the organisation of the genetic code, Miseta [10] clearly identified that the non-amino acid molecules that were precursors of amino acids might have been able to play an important role in organising the genetic code. Miseta [10] suggested the idea of an intimate relationship between molecules, the intermediates of glucose degradation, as precursors of precursor amino acids, and the organisation of the genetic code. This observation is also analysed by Taylor and Coates [11] who showed the relationship between the glycolytic pathway, the citric acid cycle, the biosyntheses of amino acids and the genetic code (Fig. 1) and, in particular, they point out that (i) all the amino acids that are members of a biosynthetic family tend to have codons with the same first base (Fig. 1) and (ii) that the five amino acids codified by GNN codons are found in four biosynthetic pathways close to or at the beginning of the pathway head (Fig. 1)[11]. More recently, Davis [12,13] has provided evidence that tRNAs descending from a common ancestor were adaptors of amino acids synthesised by a common precursor and he also discusses the biosynthetic families of amino acids, suggesting their importance in genetic code origin.
Is it correct to assume that, in the presence of the precursors of the standard genetic code (e.g. intermediates of glucose degradation and the citric acid cycle), the intimate relationship between these molecules resulted in the inevitable organization of the genetic code (a global optimum of the system)?

Articles 5-7

These articles discuss fascinating mathematical representations of the genetic code.
In article 5, the question is asked:
Can the genetic code be mathematically described?

A few intriguing properties emerged from the investigation, including:
Parity coding
Palindromic symmetry
Binary coding
Error-correction mechanism based on parity checking
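Parity checking itself is easy to illustrate once codons are written as bits. A minimal sketch, assuming a hypothetical 2-bit encoding of the bases (article 5 uses its own number representation, which is not reproduced here): the parity bit is the number of 1-bits modulo 2, and any single flipped bit changes it.

```python
from itertools import product

# Hypothetical 2-bit encoding of the bases (an assumption, not article 5's scheme).
BIT_CODE = {"C": (0, 0), "T": (0, 1), "G": (1, 0), "A": (1, 1)}
ALL_CODONS = ["".join(c) for c in product("TCAG", repeat=3)]


def codon_bits(codon):
    """Concatenate the 2-bit codes of the three bases into a 6-bit tuple."""
    return tuple(bit for base in codon for bit in BIT_CODE[base])


def parity(bits):
    """0 if the number of 1-bits is even, 1 if it is odd."""
    return sum(bits) % 2


def single_bit_errors_detected(codon):
    """True if every single-bit flip of the codon's 6-bit word changes its parity."""
    bits = codon_bits(codon)
    p = parity(bits)
    for i in range(len(bits)):
        flipped = bits[:i] + (1 - bits[i],) + bits[i + 1:]
        if parity(flipped) == p:
            return False
    return True
```

A point mutation that flips both bits of a base (here C to A, or T to G) leaves the parity unchanged, so a single parity bit detects only some point mutations.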

The author concludes:
It remains striking, however, that different fundamental properties of the genetic code, such as degeneracy distribution, and also unexpected hidden properties, such as the palindromic symmetry and the parity marking of triplets presented here, reflect a strong mathematical order which is accurately described by means of one of the most elementary operations at the root of mathematics: number representation.


In article 6 a representation of the genetic code as a six–dimensional Boolean hypercube is proposed.
Abstract:
Quote:
It is assumed here that this structure is the result of the hierarchical order of the interaction energies of the bases in codon–anticodon recognition. The proposed structure demonstrates that in the genetic code there is a balance between conservatism and innovation. Comparing aligned positions in homologous protein sequences two different behaviors are found:
a)There are sites in which the different amino acids present may be explained by one or two “attractor nodes” (coding for the dominating amino acid(s)) and their one–bit neighbors in the codon hypercube, and
b) There are sites in which the amino acids present correspond to codons located in closed paths in the hypercube. The structure of the code facilitates evolution: the variation found at the variable positions of proteins do not corresponds to random jumps at the codon level, but to well defined regions of the hypercube.
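The hypercube picture treats each base as two bits, so a codon becomes a vertex of a six-dimensional Boolean cube and "one-bit neighbors" are codons at Hamming distance 1. A minimal sketch, assuming a hypothetical base-to-bit assignment (article 6 derives its own ordering from the interaction energies of the bases in codon-anticodon recognition, which is not reproduced here); function names are hypothetical:

```python
from itertools import product

# Hypothetical base-to-bit assignment (an assumption, not article 6's ordering).
BIT_CODE = {"G": (0, 0), "C": (0, 1), "A": (1, 0), "U": (1, 1)}
BIT_TO_BASE = {bits: base for base, bits in BIT_CODE.items()}


def to_vertex(codon):
    """Map a codon to a vertex of the 6-D Boolean hypercube."""
    return tuple(bit for base in codon for bit in BIT_CODE[base])


def to_codon(vertex):
    """Map a 6-bit vertex back to its codon."""
    return "".join(BIT_TO_BASE[vertex[i:i + 2]] for i in range(0, 6, 2))


def one_bit_neighbors(codon):
    """Codons whose vertices differ from `codon`'s in exactly one bit."""
    v = to_vertex(codon)
    return [to_codon(v[:i] + (1 - v[i],) + v[i + 1:]) for i in range(6)]


print(one_bit_neighbors("AUG"))
```

Each codon has six one-bit neighbors but nine single-point mutants, so under any such encoding each base is linked to only two of its three alternatives; which two depends on the bit assignment chosen.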


Article 8:

This article once again discusses the optimality of the code and reaches a few fascinating conclusions. For example:
Quote:
The genetic code has the remarkable property of error minimization, whereby the arrangement of amino acids to codons is highly efficient at reducing the deleterious effects of random point mutations and transcriptional and translational errors. Whether this property has been explicitly selected for is unclear. Here, three scenarios of genetic code evolution are examined, and their effects on error minimization assessed. First, a simple model of random stepwise addition of physicochemically similar amino acids to the code is demonstrated to result in substantial error minimization. Second, a model of random addition of physicochemically similar amino acids in a codon expansion scheme derived from the Ambiguity Reduction Model results in improved error minimization over the first model. Finally, a recently introduced 213 Model of genetic code evolution is examined by the random addition of physicochemically similar amino acids to a primordial core of four amino acids. Under certain conditions, 22% of the resulting codes produced according to the latter model possess equivalent or superior error minimization to the standard genetic code. These analyses demonstrate that a substantial proportion of error minimization is likely to have arisen neutrally, simply as a consequence of code expansion, facilitated by duplication of the genes encoding adaptor molecules and charging enzymes. This implies that selection is at best only partly responsible for the property of error minimization. These results caution against assuming that selection is responsible for every beneficial trait observed in living organisms.
Also from the article:
Quote:
The SGC (Standard Genetic Code) has an EM (Error Minimization) value (see Methods for calculation) of 60.7. Ten thousand random codes have an average EM value of 74.5, and only 0.03% of these have equal or greater optimality than the SGC. These calculations once again illustrate the remarkable ‘optimization’ of the genetic code for EM.
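The kind of comparison quoted above can be sketched as follows: score a code by the mean squared change in an amino acid property over all single-point mutations between sense codons, then count how many randomly shuffled codes score at least as well. This is a minimal sketch under stated assumptions: it uses Kyte-Doolittle hydropathy as the property and shuffles the 20 amino acids among the standard code's synonymous codon blocks with stop codons fixed. Article 8's own EM measure is described in its Methods and will give numbers different from the 60.7 and 74.5 quoted; all names are hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TCAG order; '*' marks stops.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]
STANDARD_CODE = dict(zip(CODONS, AA_STRING))

# Kyte-Doolittle hydropathy, used here as a stand-in property (assumption).
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def error_value(code):
    """Mean squared property change over all single-point mutations between
    sense codons (mutations to or from stop codons are skipped)."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                other = code[codon[:pos] + base + codon[pos + 1:]]
                if other == "*":
                    continue
                total += (HYDROPATHY[aa] - HYDROPATHY[other]) ** 2
                count += 1
    return total / count


def random_code(rng):
    """Shuffle amino acids among the SGC's codon blocks; stops stay fixed."""
    amino_acids = sorted(set(AA_STRING) - {"*"})
    shuffled = amino_acids[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(amino_acids, shuffled), **{"*": "*"})
    return {codon: mapping[aa] for codon, aa in STANDARD_CODE.items()}


def fraction_at_least_as_good(n_codes=10_000, seed=1):
    """Return the SGC's error value and the share of random codes scoring <= it."""
    rng = random.Random(seed)
    sgc = error_value(STANDARD_CODE)
    better = sum(error_value(random_code(rng)) <= sgc for _ in range(n_codes))
    return sgc, better / n_codes
```

Lower error values mean better error minimization, so "equal or greater optimality" corresponds to an error value no higher than the standard code's.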
Thus, an important point is raised:
Quote:
The point should be made that explicit selection for EM seems to necessitate both the occurrence of codon reassignments and group selection to generate and select alternate codes. The proposal that explicit selection for the EM did not occur, and that EM arose neutrally from the addition of similar amino acids to similar codons, may be termed the ‘Nonadaptive Code’ Hypothesis, in contrast to the Adaptive Code Hypothesis. Finally, on a fundamental level, as a result of the analyses presented here, the presence of EM in the SGC may be used as evidence that enzymes, whether partially proteinaceous, RNA based, or based on some other macromolecule, were already extant during the evolution of the SGC.
The article cautions against blithely invoking natural selection as an explanation for the features of the genetic code.

Article 9:

This article discusses the functional integrity of the code and how the code's architecture relates to it.
From the article:
Quote:
The results put the concept of "codon bias" into a novel perspective. The internal connectivity of codons indicates that all synonymous codons might be integrated parts of the Genetic Code with equal importance in maintaining its functional integrity.
Thus, the properties of the code allow it to maintain its own functional integrity.
Also from the article:
Quote:
The cumulative Codon Usage Frequency of any codon is strongly dependent on the cumulative Codon Usage Frequency of other codons belonging to the same species. The rules of this codon dependency are the same for all species and reflect WC base pair complementarity. This internal connectivity of codons indicates that all synonymous codons are integrated parts of the Genetic Code with equal importance in maintaining its functional integrity. The so-called codon bias is a bias caused by the protein-centric view of the genome.
The integrity of the code is thus maintained not by selection but by internal variables (a feedback system). This shows yet another form of optimality.
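Codon usage frequencies of the kind discussed in article 9 start from a simple tally of codons in an in-frame coding sequence. A minimal sketch, assuming that Watson-Crick complementarity between codons means pairing each codon with its reverse complement (article 9's cumulative measure and exact pairing rule may differ); all function names are hypothetical:

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def codon_usage(cds):
    """Relative frequency of each codon in an in-frame coding sequence
    (a trailing partial codon is ignored)."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def reverse_complement(codon):
    """Watson-Crick reverse complement of a DNA codon."""
    return codon.translate(COMPLEMENT)[::-1]


def complementary_pairs(usage):
    """Pair each codon's frequency with that of its reverse complement."""
    return {c: (f, usage.get(reverse_complement(c), 0.0)) for c, f in usage.items()}


print(complementary_pairs(codon_usage("ATGCATATG")))
```

Here ATG occurs twice and its reverse complement CAT once, so the pair for ATG is (2/3, 1/3).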

In article 10:

Fascinating research has shown that a variety of unnatural amino acids, encoded by novel three- and four-base codons, can be selectively incorporated (engineered) into proteins, yielding viable organisms.

An intriguing question arises from this research. It is easy to imagine such amino acids arising through chance and selection (e.g. amino acids with photoaffinity) and then being incorporated into the standard code. Yet the code seems to remain stagnant: for billions of years after fixation, little evolution happened in the code. Why?
Did it arrive at a global optimum in a pre-existing fitness landscape, with a pre-existing fitness function?


Finally, article 11:
Bollenbach et al. (2007) briefly describe a few of the optimal features of the genetic code (some of them described above):
Evolution and multilevel optimization of the genetic code
Quote:
They (Itzkovitz and Alon) compared the actual genetic code with an ensemble of all other codes that are equally optimized with respect to mistranslation or mutation (for more on this statistical approach, see also Alff-Steinberger 1969; Haig and Hurst 1991; Freeland and Hurst 1998). Assuming that the usage frequencies of the different amino acids are fixed, while their codon assignments vary in the ensemble, they find that the actual code is far better than other possible codes in minimizing the number of amino acids incorporated until translation is interrupted after a frameshift error occurred. This new observation by Itzkovitz and Alon could therefore be seen as reviving the basis for Crick’s theory of a comma-less code, modified by the constraints imposed on the code by the need to be robust to other kinds of translation errors and mutations. Another possible interpretation of their result is that the amino acid usage has adjusted to reduce the effects of frameshift errors; alternative genetic codes would have had a different amino acid usage coadapted to them. It has been shown previously that amino acid usage is rather malleable, and, for example, influenced by GC content (Knight et al. 2001b).
Quote:
Itzkovitz and Alon suggest another, quite unanticipated, type of optimality: the code is highly optimal for encoding arbitrary additional information, i.e., information other than the amino acid sequence in protein-coding sequences. Optimality for encoding additional information is particularly important and relevant given the known signals contained in the nucleotide sequence of coding regions. These include RNA splicing signals, which are encoded in the nucleotide sequence together with the amino acid sequence of the prospective protein (Cartegni et al. 2002), as well as signals recognized by the translation apparatus.
They briefly proceed to mention how it could have evolved:

Quote:
(1) the code has evolved under selection pressure to optimize certain functions such as minimization of the impact of mutations (Sonneborn 1965) or translation errors (Woese 1965a);
Random mutation is a source of variability, yet selection pressure is believed to have selected for a system that constrains variability. Why?
Quote:
(2) the number of amino acids in the code has increased over evolutionary time according to evolution of the pathways for amino acid biosynthesis (Wong 1975)
Intriguing questions arise from these suggestions:
1) Why was selection so strong in removing the other variants with fewer amino acids?
2) Is there evidence of organisms using codes with only 5, 6, 9, 13 or 18 amino acids? And why isn't the code expanding to incorporate other amino acids and codons, when this is not difficult to envision and could contribute to both fitness AND variety (see article #10)?

The authors point this out:
Quote:
The discovery of variant codes (Barrell et al. 1979; Fox 1987; Knight et al. 2001a) made the connection between evolvability and universality even more puzzling. On one hand, they prove that the genetic codes can evolve; on the other hand, if they could easily evolve, why are all variations minor? It was recently proposed that extensive horizontal gene transfer during early evolution can account for both evolution toward optimality and the near universality of the genetic code (Vetsigian et al. 2006).
Part of the answer lies in the code's inherent capability of maintaining its own functional integrity independently of natural selection (article #9). Article #8 also cautions against blithely invoking natural selection as an explanation for the properties of the code.

The authors conclude:
Quote:
As we learn more about the functions of the genetic code, it becomes ever clearer that the degeneracy in the genetic code is not exploited in such a way as to optimize one function, but rather to optimize a combination of several different functions simultaneously. Looking deeper into the structure of the code, we wonder what other remarkable properties it may bear. While our understanding of the genetic code has increased substantially over the last decades, it seems that exciting discoveries are waiting to be made.


The genetic code sure is interesting. Irrespective of its origin, the code seems to be optimized for evolution and to maintain its own functional integrity. Whatever the explanation for the origin of the code, whether intentional agency, random variation and natural selection (RV+NS) alone, self-organization or a combination of these, the fact that these processes converged on a single, reasonably optimal code that is able to facilitate evolution makes it look like an inevitable result of the system. The system seems to be rigged and biased towards certain outcomes, much like the evolution of life. Why?

Sunday, September 28, 2008

Memetic Algorithms, Convergence and Pre-existing Fitness Landscapes

Memetic Algorithms

Memetic Algorithms (MAs) are search techniques that solve problems by mimicking the processes of evolution, including selection, recombination, mutation and inheritance, combined with a local search (refinement) step applied to individual solutions.

A few important aspects of MAs (Figure 1):

  • The fitness landscape needs to be finite.
  • The search space of the MA is limited to the fitness landscape.
  • There is at least one solution in the fitness landscape (Figure 2).
  • A fitness function determines the relationship between the fitness of the genotype (or phenotype) and the fitness landscape.
  • Selection is based on fitness.

Figure 1: Basic layout of memetic algorithms. A population of individuals is seeded at random with respect to fitness (initialized). The individuals are randomly mutated and their fitness is measured. The individuals with the highest fitness are mutated further until they converge on a local optimum. The process is carried out for the entire initialized population, and the global optimum is selected from the various local optima.


Figure 2: Fitness landscape with local optima (A, B and D) and a global optimum (C). In a memetic algorithm, the individuals of the initial population are randomly seeded and can be viewed as any of the arrows indicated in the figure.
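The loop in Figures 1 and 2 can be sketched on a toy one-dimensional landscape. This is a minimal illustration, assuming a hypothetical fitness function x·sin(x) on the finite interval [0, 10], which has a local optimum near x ≈ 2.03 and a global optimum near x ≈ 7.98. It combines selection, recombination and mutation with hill-climbing refinement of each individual, which is what makes it memetic. It is not Autodock's algorithm, and all names and parameters are illustrative.

```python
import math
import random

LOWER, UPPER = 0.0, 10.0  # finite search space (assumption)


def fitness(x):
    """Toy rugged landscape with several local optima (hypothetical)."""
    return x * math.sin(x)


def clamp(x):
    """Keep a candidate inside the finite search space."""
    return min(UPPER, max(LOWER, x))


def local_search(x, step=0.05, iters=200):
    """Hill climbing: the individual refinement step of a memetic algorithm."""
    for _ in range(iters):
        best = max((clamp(x - step), x, clamp(x + step)), key=fitness)
        if best == x:
            step /= 2  # no neighbor improves, so refine the step size
        x = best
    return x


def memetic_search(pop_size=20, generations=30, sigma=0.5, seed=0):
    rng = random.Random(seed)
    # Random initialization, each individual refined to its local optimum.
    pop = [local_search(rng.uniform(LOWER, UPPER)) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fitter half survives as parents.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2                         # recombination
            child = clamp(child + rng.gauss(0, sigma))  # mutation
            children.append(local_search(child))        # local refinement
        pop = parents + children
    return max(pop, key=fitness)


print(memetic_search())
```

`memetic_search()` returns a value close to 7.98 for the default seed; rerunning with other seeds should land on the same optimum, which mirrors the repeated docking runs described below.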


Various molecular docking programs employ genetic algorithms to predict the orientation of a ligand within a protein receptor; Autodock employs an MA for this purpose. A good docking program is one that can reproduce an existing crystallographic pose with reasonable success. The root-mean-square deviation (RMSD) of a docked ligand compared to the crystallographic pose is generally used as an indicator, and an RMSD below 2 Å is considered a success. In the case of the Autodock software, the global optimum is expected to correspond to the crystallographic pose (RMSD < 2 Å).
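The RMSD criterion can be sketched directly. A minimal version, assuming both poses list the same atoms in the same order and are already in the receptor's coordinate frame (so no superposition is performed, and symmetry-equivalent atoms are not handled); the function names are hypothetical:

```python
import math


def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses given as matched lists
    of (x, y, z) atom coordinates in the same frame (no superposition)."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must have the same, non-zero number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pose_a, pose_b))
    return math.sqrt(squared / len(pose_a))


def is_successful_redock(docked, crystal, cutoff=2.0):
    """Common redocking success criterion: RMSD below `cutoff` (in Å)."""
    return rmsd(docked, crystal) < cutoff


print(rmsd([(0, 0, 0)], [(3, 4, 0)]))
```

A single atom displaced by (3, 4, 0) gives an RMSD of 5.0, which would fail the 2 Å criterion.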

As an illustrative example, Colchicine binds to tubulin and interferes with microtubule dynamics by inhibiting tubulin polymerization. Colchicine binds at the interface between the alpha and beta subunits of the tubulin dimer (Figures 3 and 4).



Figure 3: Colchicine binding site.


Figure 4: Colchicine binding cavity.


A docking run with Autodock can be characterized by the following:

Finite fitness landscape: The physical properties of the protein receptor (e.g. electrostatic properties, van der Waals interactions and desolvation energies): a pre-existing fitness landscape.

Search space: Confined to the protein receptor.

At least one solution: Crystallographic pose.

Fitness function: Estimated free energy of binding of the pose. This is determined through a combination of various interactions, including van der Waals, electrostatic, desolvation, hydrogen bond and torsional free energy terms.

Selection (guiding function): Selection is based on fitness.
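The fitness function listed above can be written schematically as a weighted sum of the energy terms, where a lower (more negative) estimated free energy means a fitter pose. This is a sketch assuming the additive, empirically weighted form typical of scoring functions such as Autodock's; the weights $w$ are calibration constants not given in this post, and modeling the torsional term as proportional to the number of rotatable bonds $N_{\text{tors}}$ is likewise an assumption:

```latex
\Delta G_{\text{bind}} \approx
    w_{\text{vdW}}\, E_{\text{vdW}}
  + w_{\text{elec}}\, E_{\text{elec}}
  + w_{\text{desolv}}\, E_{\text{desolv}}
  + w_{\text{hbond}}\, E_{\text{hbond}}
  + w_{\text{tors}}\, N_{\text{tors}}
```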


Using Autodock, Colchicine was "docked" into the tubulin receptor four times. Each time, 30 populations of 250 individuals (ligand poses) were randomly placed within the receptor, and the local optimum of each population was determined (blue bar graph). The results are shown in Figure 5.

Figure 5a: Run 1

Figure 5b: Run 2

Figure 5c: Run 3

Figure 5d: Run 4

All four runs converged on the same global optimum, which also corresponded reasonably well to the crystallographic pose (RMSD < 1.8 Å).
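One way to judge whether runs converged on the same optimum is to group the final poses by RMSD: poses within a tolerance of a lower-energy pose count as the same solution. Autodock reports a cluster analysis of this kind; the greedy version below is a simplified sketch, not Autodock's code, and it assumes each result is an (energy, coordinates) pair in the receptor frame, with lower energy being better. Function names are hypothetical.

```python
import math


def rmsd(a, b):
    """RMSD between matched coordinate lists in the same frame."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def cluster_poses(results, tolerance=2.0):
    """Greedy RMSD clustering of (energy, pose) results.

    Poses are visited from lowest energy upward; each joins the first cluster
    whose lowest-energy pose lies within `tolerance` (Å), otherwise it
    starts a new cluster.
    """
    clusters = []
    for energy, pose in sorted(results, key=lambda r: r[0]):
        for cluster in clusters:
            if rmsd(cluster[0][1], pose) <= tolerance:
                cluster.append((energy, pose))
                break
        else:
            clusters.append([(energy, pose)])
    return clusters
```

If the largest cluster also holds the lowest-energy pose, and repeated runs reproduce it, the search has converged in the sense used above.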

Is this process analogous to the evolution of life?


The Memetic Algorithms of life:
A) A genetic code that is optimized for random searches.
B) Quality control systems (DNA repair, protein quality, programmed cell death).
C) Variation inducers (Cytosine deaminases, Low vs High fidelity polymerases, gene conversion and homologous recombination).

Examples of convergence in the evolution of life:
Running MAs in pre-existing fitness landscapes results in the convergence of various local optima, with the global optimum being the best of the local optima. Evolutionary history is filled with examples of convergence (local optima).

A) The spectacular convergence of abiogenesis into a universal optimized genetic code and life's memetic algorithms.
B) Structural convergence
Nice article showing various examples of convergent evolution.
C) Molecular convergence
Carbonic anhydrases
Prestin
More examples

Pre-existing fitness landscapes and the evolution of life:
The fitness of the docking pose of the ligand in the above example is dependent on the pre-existing properties of the receptor protein. These properties include:

Van der Waals energy
Electrostatic energy
Desolvation energy
Hydrogen bond energy
Torsional free energy
These are all combined to determine the fitness (binding energy) of the ligand.

Figure 6: Convergence of local optima of Colchicine in the pre-existing fitness landscape of the tubulin protein receptor. Fitness (binding energy) is measured by van der Waals, electrostatic, desolvation, hydrogen bond and torsional free energy. Replaying the docking run yields similar results every time.


Standard evolutionary theory describes fitness as the capability of an individual of a certain genotype to reproduce (self-replicate). What are the properties of the pre-existing fitness landscape of life that determine the fitness (self-replication) of life forms?

Should these properties include the following?

Reproduction success (self-replication)
Intelligence (Ability to process information - genetics, proteomics, metabolomics)
Agency (Ability to manipulate information)
Complexity (Emergence of complexity seems to be the first rule of evolution)


What are these properties composed of?
Perhaps elemental proto-experiences (PEs), phenomenal aspects that are (superimposed) properties of elementary particles, as described in this paper? Could they connect quantum physics, consciousness (article) and evolution?


A "docking" run (replaying the tape of life) with such a simulation can be characterized by the following:

Finite fitness landscape: The physical properties of the universe (mass, spin, charge and proto-experiences superimposed as properties of elementary particles): the pre-existing fitness landscape.

Search space: Confined to the universe.

At least one solution: Self-replication.

Fitness function: Reproduction success. This is determined through a combination of various interactions including self-replication, intelligence, agency and emergence of complexity.

Selection (guiding function): Selection is based on fitness.


What would a "docking" run of life look like if we ran it over and over with a pre-existing fitness landscape and universal memetic genetic algorithms (Figure 7)?

Figure 7: Convergence of local optima in a fitness landscape whereby fitness is measured by reproduction, intelligence, agency and complexity. If life's memetic algorithms are comparable to a "docking" run, it should yield similar local optima in pre-existing fitness landscapes every time the simulation is run.


Monday, September 8, 2008

Robustness and back-up systems

New Evidence On The Robustness Of Metabolic Networks

Biological systems are constantly evolving in ways that increase their fitness for survival amidst environmental fluctuations and internal errors. Now, in a study of cell metabolism, a Northwestern University research team has found new evidence that evolution has produced cell metabolisms that are especially well suited to handle potentially harmful changes like gene deletions and mutations.

You Can Be Replaced: Immune Cells Compensate For Defective DNA Repair Factor
Genetic instability can lead to multiple problems, including cell death and many forms of cancer. Therefore, it is absolutely critical for cells to have both the means to constantly survey genes for damage and the mechanisms to repair broken DNA. Currently, there are six well characterized classical non-homologous end-joining (C-NHEJ) factors that repair double strand breaks (DSBs) in mammalian cells. Lymphocytes, a type of immune cell, use a kind of genetic shuffling called variable, diversity, joining V(D)J recombination. This gene shuffling occurs during lymphocyte development and helps to produce diverse immune system cells that can recognize all sorts of different foreign substances, called antigens, that might pose a threat to the organism. Previous work in mice has shown that deficiency of C-NHEJ factors results in a severely compromised immune system, because of incomplete V(D)J recombination, along with increased sensitivity to cellular ionizing radiation (IR) and genomic instability.


It is nice to know that cell intelligence and evolution from a front-loaded state provide robust systems with back-ups. Preadaptation is good for the future.

Saturday, August 16, 2008

Putting cytosine deamination to work

The effect of cytosine deamination on a random pool of amino acids, and how it might facilitate evolution, has been described. Only a handful of codons can become stop codons through a single cytosine deamination (CAA, CAG and CGA via C→T on the coding strand, and TGG via G→A when the template strand is deaminated). Bollenbach et al. (2007) briefly describe a few more optimal features of the genetic code, discussed in more detail by Itzkovitz and Alon (2007).
These include:
1) Quote:
They (Itzkovitz and Alon) compared the actual genetic code with an ensemble of all other codes that are equally optimized with respect to mistranslation or mutation (for more on this statistical approach, see also Alff-Steinberger 1969; Haig and Hurst 1991; Freeland and Hurst 1998). Assuming that the usage frequencies of the different amino acids are fixed, while their codon assignments vary in the ensemble, they find that the actual code is far better than other possible codes in minimizing the number of amino acids incorporated until translation is interrupted after a frameshift error occurred. This new observation by Itzkovitz and Alon could therefore be seen as reviving the basis for Crick’s theory of a comma-less code, modified by the constraints imposed on the code by the need to be robust to other kinds of translation errors and mutations. Another possible interpretation of their result is that the amino acid usage has adjusted to reduce the effects of frameshift errors; alternative genetic codes would have had a different amino acid usage coadapted to them. It has been shown previously that amino acid usage is rather malleable, and, for example, influenced by GC content (Knight et al. 2001b).
2) Quote:
Itzkovitz and Alon suggest another, quite unanticipated, type of optimality: the code is highly optimal for encoding arbitrary additional information, i.e., information other than the amino acid sequence in protein-coding sequences. Optimality for encoding additional information is particularly important and relevant given the known signals contained in the nucleotide sequence of coding regions. These include RNA splicing signals, which are encoded in the nucleotide sequence together with the amino acid sequence of the prospective protein (Cartegni et al. 2002), as well as signals recognized by the translation apparatus.
Bollenbach et al. (2007) also briefly mention how the code could have evolved:
1) Quote:
(1) the code has evolved under selection pressure to optimize certain functions such as minimization of the impact of mutations (Sonneborn 1965) or translation errors (Woese 1965a);
Random mutation is a source of variability, yet selection pressure is believed to have selected for a system that constrains variability. Why?

2) Quote:
(2) the number of amino acids in the code has increased over evolutionary time according to evolution of the pathways for amino acid biosynthesis (Wong 1975)
Why was selection so strong in removing the other variants with fewer amino acids? Is there evidence of organisms using codes with only 5, 6, 9, 13 or 18 amino acids? Bollenbach et al. (2007) also point out the following:
Quote:
The discovery of variant codes (Barrell et al. 1979; Fox 1987; Knight et al. 2001a) made the connection between evolvability and universality even more puzzling. On one hand, they prove that the genetic codes can evolve; on the other hand, if they could easily evolve, why are all variations minor? It was recently proposed that extensive horizontal gene transfer during early evolution can account for both evolution toward optimality and the near universality of the genetic code (Vetsigian et al. 2006).
3) Quote:
(3) direct chemical interactions between amino acids and short nucleic acid sequences originally led to corresponding assignments in the genetic code (Woese et al. 1966b).
Bollenbach et al. (2007) conclude with the following:
Quote:
As we learn more about the functions of the genetic code, it becomes ever clearer that the degeneracy in the genetic code is not exploited in such a way as to optimize one function, but rather to optimize a combination of several different functions simultaneously. Looking deeper into the structure of the code, we wonder what other remarkable properties it may bear. While our understanding of the genetic code has increased substantially over the last decades, it seems that exciting discoveries are waiting to be made.
The vertebrate immune system exploits these optimal features of the genetic code by "putting cytosine deamination to work". Antibody diversification is crucial in limiting the frequency of environmentally acquired infections and thereby increasing the fitness of the organism. Initial diversification of antibodies is achieved by assembling variable (V), diversity (D) and joining (J) gene segments (V(D)J recombination) by non-homologous recombination. Further diversification is carried out by somatic hypermutation (SHM) and class switch recombination. Central to the initiation of these diversification processes is the activation-induced cytidine deaminase (AID) protein. AID deaminates cytosine to uracil in single-stranded DNA (ssDNA, which arises during gene transcription), and its activity depends on active transcription of the various antibody genes. The induced mutation is resolved by at least four pathways (Figure 1):
1) Copying of the base by high-fidelity polymerases during DNA replication.
2) Short-Patch Base Excision Repair (SP-BER) by uracil-DNA glycosylase removal and subsequent repair of the base.
3) Long-Patch Base Excision Repair (LP-BER)
4) Mismatch repair (MMR)

Figure 1: Activation-induced cytosine deamination and the pathways involved in resolving the induced mutation. 1) Normal DNA replication across the uracil results in a C:G→T:A transition. 2) Successful SP-BER resolves the mutation; however, recruitment of error-prone translesion polymerases (e.g. REV1) results in transversions (C:G→G:C) and transitions. 3) LP-BER can also resolve the mutation; however, recruitment of low-fidelity polymerases (e.g. Pol η) also causes transition and transversion mutations. 4) MMR can also resolve the mutation; however, recruitment of low-fidelity polymerases through this pathway is a major cause of A:T transitions.
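The four resolution routes can be pictured as a lookup from repair pathway to possible outcomes. The Python sketch below is a toy under stated assumptions: the outcome labels follow the figure caption, but the weights attached to them are hypothetical placeholders, not measured mutation frequencies, and the names OUTCOMES, resolve and tally are invented for illustration.

```python
import random

# Toy map from repair pathway to the possible fates of an AID-induced U:G lesion.
# Outcome labels follow the figure caption; the weights are HYPOTHETICAL
# placeholders chosen for illustration, not measured frequencies.
OUTCOMES = {
    "replication": {"C:G->T:A transition": 1.0},
    "SP-BER": {
        "error-free (C:G restored)": 0.5,
        "C:G->G:C transversion (e.g. REV1)": 0.3,
        "C:G->T:A transition": 0.2,
    },
    "LP-BER": {
        "error-free (C:G restored)": 0.5,
        "transition (low-fidelity polymerase)": 0.25,
        "transversion (low-fidelity polymerase)": 0.25,
    },
    "MMR": {
        "error-free (C:G restored)": 0.5,
        "A:T transition (low-fidelity polymerase)": 0.5,
    },
}


def resolve(pathway, rng):
    """Sample one outcome for a lesion handled by the given pathway."""
    options = OUTCOMES[pathway]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def tally(pathway, n, seed=0):
    """Count outcomes over n independent lesions (seeded for reproducibility)."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n):
        outcome = resolve(pathway, rng)
        counts[outcome] = counts.get(outcome, 0) + 1
    return counts


if __name__ == "__main__":
    for name in OUTCOMES:
        print(name, tally(name, 1000))
```

Only the replication route is deterministic in this toy; every other route mixes faithful repair with mutagenic outcomes, which is the kind of controlled variability described below.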

AID causes somatic hypermutation, and its activity is normally restricted to certain regions of the antibody genes. When the system runs unchecked, mutations might be introduced into proto-oncogenes, possibly resulting in cancerous growth. The system is therefore tightly controlled (Figure 2): both AID gene expression and AID activity are regulated. The choice of error-repair pathway and the subsequent recruitment of various low-fidelity polymerases, which determine the type of mutation left after repair, also seem to be controlled. Current research focuses on the mechanisms that control the downstream repair pathways and on why this system is selectively targeted to a small region of the antibody genes.

Figure 2: Controlled variability of somatic hypermutation.

Thus, the immune system exploits the properties of the genetic code for the purpose of controlled variability. Is the system limited to vertebrates, or can similar systems be found in other organisms? Cytosine deaminases are found in bacteria as well, and error-prone repair systems are also present. Will we discover an active system in bacteria that exploits the properties of the genetic code for the purpose of controlled variability under selective pressure? Will RecA and LexA play a part?

References:
Peled JU, Kuang FL, Iglesias-Ussel MD, Roa S, Kalis SL, Goodman MF et al. The biochemistry of somatic hypermutation. Annu Rev Immunol. 2008;26:481-511.

Teng G, Papavasiliou FN. Immunoglobulin somatic hypermutation. Annu Rev Genet. 2007;41:107-20.

Goodman MF, Scharff MD, Romesberg FE. AID-initiated purposeful mutations in immunoglobulin genes. Adv Immunol. 2007;94:127-55.

Basu U, Chaudhuri J, Phan RT, Datta A, Alt FW. Regulation of activation induced deaminase via phosphorylation. Adv Exp Med Biol. 2007;596:129-37.

Tuesday, August 12, 2008

Cell cycle signaling network

DNA replication, DNA repair, cell division signaling and programmed cell death

The cell cycle is a highly regulated process that "takes micromanagement to the extreme". Various positive- and negative-feedback systems ensure that cells divide in a controlled manner. The process consists of a sequence of events by which a growing cell duplicates all its components and divides into two daughter cells, each with sufficient machinery to repeat the process. In eukaryotic cells, one round of cell division consists of two “gap” phases termed G1 and G2, an S-phase during which all DNA is duplicated, and an M-phase during which duplicated chromosomes are segregated and sister chromatids separate. During each phase, regulatory signaling pathways monitor the successful completion of events before the cell proceeds to the next phase. These regulatory pathways are commonly referred to as cell cycle checkpoints. Cell cycle checkpoints are activated in response to the following (Figure 1):
  • Cellular damage
  • Exogenous cellular stress signals
  • Lack of availability of nutrients, hormones and essential growth factors.

During the G1 phase many signals intervene to influence cell division and the deployment of a cell’s developmental program (Figure 1). Crucial "decisions" are made at the G1 restriction point, because once it is passed the commitment to replicate DNA and divide is irreversible until the next G1 phase. Failure to meet the required conditions means the cell does not commit to division. Signaling events converge on the phosphorylation status of the retinoblastoma protein (pRB) family (pRB, p107 and p130). Cyclin-dependent kinases (CDKs) play a crucial role in pRB phosphorylation, and their activity is in turn controlled by cell stress and growth-inhibitory signaling pathways. Sufficient phosphorylation (hyperphosphorylation) of pRB causes it to dissociate from the E2F family of transcription factors. The released E2F transcription factors drive transcription of genes required for DNA replication during the S-phase.
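The pRB/E2F logic, including the one-way nature of the restriction point, can be caricatured as a threshold switch with positive feedback. This Python sketch assumes that E2F target genes (for example cyclin E) feed back to raise CDK activity; the function name, thresholds and signal units are hypothetical, and the model says nothing about real kinetics.

```python
def restriction_point(growth, stress, threshold=1.0, feedback=1.0):
    """Toy pRB/E2F switch; returns whether E2F is free at each time step.

    growth, stress: equal-length sequences of signal levels (arbitrary units).
    threshold: CDK activity needed to hyperphosphorylate pRB (hypothetical).
    feedback: extra CDK activity contributed by E2F target genes such as
              cyclin E once E2F is free (hypothetical value).
    """
    e2f_free = False
    history = []
    for g, s in zip(growth, stress):
        # Growth signals raise CDK activity; stress/growth-inhibitory signals lower it.
        cdk = max(0.0, g - s)
        # Positive feedback: free E2F boosts its own upstream CDK activity.
        if e2f_free:
            cdk += feedback
        # Enough CDK activity hyperphosphorylates pRB, which then releases E2F.
        e2f_free = cdk >= threshold
        history.append(e2f_free)
    return history


# A transient growth pulse flips the switch, and it stays on afterwards.
print(restriction_point([0.5, 1.2, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]))
# [False, True, True, True]
```

The feedback term is what makes the decision stick: after the pulse, growth signals drop to zero, yet E2F stays free, mirroring the "irreversible until the next G1" behavior in a very crude way.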

Once the restriction point (G1/S transition checkpoint) is passed, DNA replication is initiated at multiple sites on the chromosomes, called origins of replication. The origin recognition complex (ORC) marks the position of replication origins in the genome and serves as the landing pad for the assembly of a multiprotein pre-replicative complex (pre-RC) at each origin, consisting of ORC, cell division cycle 6 (Cdc6), Cdc10-dependent transcript 1 (Cdt1) and the mini-chromosome maintenance (MCM) proteins. The MCM proteins are key participants in the mechanism that limits eukaryotic DNA replication to once per cell cycle, and their loading onto chromatin marks the final step of pre-RC formation. Further factors, including clamp-loaders, sliding clamps, additional helicase components and DNA polymerases, are then recruited to build the replisome; once it is assembled, the transition to DNA replication is irreversibly completed and the cell enters the S-phase.
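The once-per-cell-cycle rule can be read as a small state machine: origins may only be licensed in G1, and firing consumes the license. This is a hypothetical Python sketch (the Origin and ToyGenome classes and their methods are invented here) that ignores the molecular mechanisms, such as geminin binding to Cdt1, that enforce the G1-only rule in real cells.

```python
from dataclasses import dataclass


@dataclass
class Origin:
    licensed: bool = False  # MCM loaded, pre-RC complete
    fired: bool = False     # replication already started here this cycle


class ToyGenome:
    """Toy once-per-cycle replication control (hypothetical model)."""

    def __init__(self, n_origins):
        self.origins = [Origin() for _ in range(n_origins)]
        self.phase = "G1"

    def license(self):
        """Load MCM onto every unlicensed origin; only allowed in G1."""
        if self.phase != "G1":
            return 0
        newly = 0
        for origin in self.origins:
            if not origin.licensed:
                origin.licensed = True
                newly += 1
        return newly

    def enter_s_phase(self):
        self.phase = "S"

    def fire(self, i):
        """Start replication at origin i; succeeds only once per licensing."""
        origin = self.origins[i]
        if self.phase == "S" and origin.licensed:
            origin.licensed = False  # MCM travels away with the fork
            origin.fired = True
            return True
        return False

    def next_cycle(self):
        """Reset after division so the daughter cell can license again."""
        self.phase = "G1"
        for origin in self.origins:
            origin.licensed = False
            origin.fired = False


genome = ToyGenome(3)
genome.license()
genome.enter_s_phase()
print(genome.fire(0), genome.fire(0))  # True False: no re-firing in S
```

Because licensing is refused outside G1, an origin that has fired cannot be reloaded until the next cycle, which is the essence of the once-per-cycle control described above.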

After successful completion of DNA replication, the mitosis promoting factor (MPF) complex (Cdc2/cyclin B) forms and plays a crucial role in nuclear envelope breakdown, centrosome separation, spindle assembly, chromosome condensation and Golgi fragmentation during mitosis. Cells enter mitosis (G2/M transition) only after DNA replication has been completed.

When a cell is unable to resolve the conditions that activated a checkpoint, cell division is permanently halted and the cell either enters senescence or activates programmed cell death (Figure 1). Programmed cell death (particularly apoptosis) removes potentially hazardous cells from a population through their controlled destruction. Two main checkpoints operate during the cell cycle:

  1. The DNA structure checkpoint
  2. The spindle checkpoint

The DNA structure checkpoint operates at the G1/S transition, during the S-phase and at the G2/M transition (Figure 1). At the G1/S and G2/M transitions it ensures that DNA damage is minimal, while during the S-phase it also recognizes and deals with replication intermediates, stalled replication forks and unreplicated DNA. Whenever the criteria of a checkpoint are not met, the cell does not proceed to the next phase; instead, various signaling networks are activated to ensure that these criteria are met. DNA structure checkpoint signaling follows the same pattern during any phase of the cell cycle (Figure 1):

  • Detection: Sensor proteins include proliferating cell nuclear antigen (PCNA)-like and replication factor C (RFC)-like protein complexes (see Sliding clamps, clamp-loaders and helicases), which bind to damaged DNA to form a scaffold for downstream repair proteins. The Rad50/Mre11/NBS1 complex is also loaded onto damaged DNA sites and recruits downstream checkpoint and repair proteins.
  • Signal transduction: Activated sensor proteins in turn activate several signaling proteins, which activate DNA repair mechanisms and downstream effector proteins that control cell cycle checkpoint signal transduction and programmed cell death signaling. Examples include ataxia telangiectasia mutated (ATM), ataxia telangiectasia and Rad3 related (ATR), p53-binding protein 1 (53BP1), the topoisomerase-binding protein TopBP1, mediator of DNA damage checkpoint 1 (MDC1) and breast cancer 1 (BRCA1).
  • Effect: Downstream of the signal transducers are the effector serine/threonine protein kinases CHK1 and CHK2. The CHKs relay the DNA damage signal to the Cdc25 phosphatases (Cdc25A, Cdc25B and Cdc25C), which they inhibit, and to the tumor suppressor p53, which they stabilize. Cdc25A normally activates the CDK2 complexes that drive the G1/S transition and S-phase progression, so its inhibition keeps pRB hypophosphorylated and bound to E2F. Cdc25B and Cdc25C activate Cdc2/cyclin B at the G2/M transition by removing the inhibitory phosphates added by the Wee1 and Myt1 kinases, so their inhibition blocks entry into mitosis. Tumor suppressor p53 links DNA damage to programmed cell death.

Figure 1: Dynamic control of cell cycle events through cell signaling, checkpoints, nutrient availability and extracellular stress.
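The detection, signal transduction and effect steps can be strung together as a Boolean pipeline that feeds a repair-or-die loop. In this hypothetical Python sketch each protein group is reduced to on/off, repair removes a fixed number of lesions per round, and senescence is left out; the function names and all numbers are assumptions, not taken from the post.

```python
def signal_chain(damage):
    """Collapse detection -> transduction -> effect into Boolean steps.

    Returns True if CDK is active, i.e. the cell may proceed.
    """
    sensors_bound = damage > 0          # PCNA-like/RFC-like clamps, MRN complex
    atm_atr_active = sensors_bound      # signal transducers
    chk_active = atm_atr_active         # effector kinases CHK1/CHK2
    cdc25_active = not chk_active       # CHKs inhibit the Cdc25 phosphatases
    cdk_active = cdc25_active           # Cdc25 removes inhibitory CDK phosphates
    return cdk_active


def dna_structure_checkpoint(damage, repair_per_round=2, max_rounds=3):
    """Toy checkpoint loop; returns (outcome, rounds spent arrested).

    damage: number of lesions (hypothetical units).
    repair_per_round, max_rounds: hypothetical repair capacity and patience.
    """
    rounds = 0
    while not signal_chain(damage):
        if rounds == max_rounds:
            # Damage persists: p53 links the signal to programmed cell death.
            return "programmed cell death", rounds
        rounds += 1
        damage = max(0, damage - repair_per_round)  # repair while arrested
    return "proceed", rounds


print(dna_structure_checkpoint(3))   # ('proceed', 2)
print(dna_structure_checkpoint(10))  # ('programmed cell death', 3)
```

The same chain runs regardless of cell cycle phase, which matches the point that DNA structure checkpoint signaling follows one pattern throughout; only the CDK being held inactive would differ between phases.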

The spindle assembly checkpoint is a molecular system that ensures accurate segregation of mitotic chromosomes and functions during the M-phase of cell division. The spindle checkpoint depends on the activity of two systems.

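  1. The anaphase promoting complex/cyclosome (APC/C), a ubiquitin ligase that, together with its activator Cdc20 (APC/C-Cdc20) and later Cdh1 (APC/C-Cdh1), marks cyclin B, securin and other mitotic proteins for degradation
  2. The 26S proteasome, which degrades the ubiquitinated proteins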

How are these for provocative-sounding titles:
Voges D, Zwickl P, Baumeister W. The 26S proteasome: a molecular machine designed for controlled proteolysis. Annu Rev Biochem. 1999;68:1015-68.
Peters JM. The anaphase promoting complex/cyclosome: a machine designed to destroy. Nat Rev Mol Cell Biol. 2006 Sep;7(9):644-56.

Cyclin B and securin are ubiquitinated by APC/C-Cdc20 and then degraded by the 26S proteasome. APC/C-Cdc20 is held in check by the mitotic checkpoint complex (MCC) until all kinetochores are correctly attached to spindle microtubules. Degradation of securin releases separase, which cleaves the cohesin holding sister chromatids together and so allows chromosome segregation. The resulting fall in Cdc2/cyclin B activity then allows APC/C-Cdh1 to take over and keep cyclin levels low as the cell exits mitosis.
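The spindle assembly checkpoint amounts to a chain of inhibitions that is released only when every kinetochore is attached. This Python sketch follows the standard textbook picture (the MCC inhibits APC/C-Cdc20; securin inhibits separase) and reduces each component to on/off; the function name and the dictionary layout are hypothetical.

```python
def spindle_checkpoint(attached):
    """Toy spindle assembly checkpoint.

    attached: list of booleans, one per kinetochore
              (True = attached to spindle microtubules).
    Returns the on/off state of each component.
    """
    mcc_active = not all(attached)          # any unattached kinetochore keeps the MCC on
    apc_cdc20_active = not mcc_active       # the MCC inhibits APC/C-Cdc20
    securin_present = not apc_cdc20_active  # ubiquitinated securin goes to the proteasome
    cyclin_b_present = not apc_cdc20_active # cyclin B is destroyed the same way
    separase_active = not securin_present   # securin inhibits separase
    return {
        "MCC": mcc_active,
        "APC/C-Cdc20": apc_cdc20_active,
        "securin": securin_present,
        "cyclin B": cyclin_b_present,
        "anaphase": separase_active,        # separase cleaves cohesin
    }


print(spindle_checkpoint([True, True, False])["anaphase"])  # False: one kinetochore unattached
print(spindle_checkpoint([True, True, True])["anaphase"])   # True: checkpoint satisfied
```

A single unattached kinetochore is enough to keep the whole cascade off, which is why the checkpoint can guard accurate segregation of every chromosome.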

There is a considerable amount of cross-talk between DNA repair mechanisms, cell cycle signaling pathways, cell death pathways (autophagy, apoptosis, mitotic catastrophe etc.) and other cell stress signaling pathways. All these intricately interwoven pathways serve to ensure accurate cell division and the removal of faulty cells from a population through programmed cell death. Problems arise when one of the checkpoints or programmed cell death pathways becomes corrupted, allowing uncontrolled cell division in multicellular organisms. Cancer is one of the outcomes of abrogated cell death signaling and uncontrolled cell division. Programmed cell death is, however, not limited to multicellular organisms, as bacteria also contain the pathways necessary to self-destruct.

E.g.:
Engelberg-Kulka H, Amitai S, Kolodkin-Gal I, Hazan R. Bacterial programmed cell death and multicellular behavior in bacteria. PLoS Genet. 2006 Oct;2(10):e135.

Rice KC, Bayles KW. Molecular control of bacterial death and lysis. Microbiol Mol Biol Rev. 2008 Mar;72(1):85-109.


Saturday, July 26, 2008

Life's toolkits

Life has a genetic toolkit to build a wide variety of forms from just a few basic, simple and elegant body plans.

With this in mind, take a look at how stem cells become specialized.
Many Paths, Few Destinations: How Stem Cells Decide What They'll Become.
Quote:
How does a stem cell decide what specialized identity to adopt -- or simply to remain a stem cell? A new study suggests that the conventional view, which assumes that cells are "instructed" to progress along prescribed signaling pathways, is too simplistic. Instead, it supports the idea that cells differentiate through the collective behavior of multiple genes in a network that ultimately leads to just a few endpoints -- just as a marble on a hilltop can travel a nearly infinite number of downward paths, only to arrive in the same valley.
Quote:
The findings, published in the May 22 issue of Nature, give a glimpse into how that collective behavior works, and show that cell populations maintain a built-in variability that nature can harness for change under the right conditions. The findings also help explain why the process of differentiating stem cells into specific lineages in the laboratory has been highly inefficient.
Quote:
"Nature has created an incredibly elegant and simple way of creating variability, and maintaining it at a steady level, enabling cells to respond to changes in their environment in a systematic, controlled way," adds Chang, first author on the paper.
Quote:
The landscape analogy and collective "decision-making" are concepts unfamiliar to biologists, who have tended to focus on single genes acting in linear pathways. This made the work initially difficult to publish, notes Huang. "It's hard for biologists to move from thinking about single pathways to thinking about a landscape, which is the mathematical manifestation of the entirety of all the possible pathways," he says. "A single pathway is not a good way to understand a whole process. Our goal has been to understand the driving force behind it."
So stem cells have a built-in toolkit that maintains random variability, enabling them to respond to changes in their environment in a systematic and controlled way and ultimately leading to just a few endpoints. The toolkit harnesses random variation and selection to reach the same destination.
The stem cells are front-loaded (provided with a toolkit) to develop along a certain path while harnessing random variation and selection.
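The "marble on a hilltop" picture from the quoted study can be illustrated with a noisy downhill walk on a landscape with two valleys: many different trajectories, only two destinations. This Python sketch is not the model used in the Nature paper; the double-well landscape V(x) = x**4/4 - x**2/2, the noise level, the step size and the function names are all assumptions chosen for illustration.

```python
import random


def slope(x):
    # Derivative of the hypothetical landscape V(x) = x**4/4 - x**2/2,
    # whose two valleys (attractors) sit at x = -1 and x = +1.
    return x ** 3 - x


def roll(x0, rng, steps=2000, dt=0.01, noise=0.05):
    """Noisy downhill walk (Euler-Maruyama) from starting state x0."""
    x = x0
    for _ in range(steps):
        x += -slope(x) * dt + noise * rng.gauss(0.0, 1.0) * dt ** 0.5
    return x


def endpoints(starts, seed=1):
    """Round each final state to the nearest valley."""
    rng = random.Random(seed)
    return [round(roll(x0, rng)) for x0 in starts]


starts = [-2.0, -1.3, -0.6, -0.1, 0.1, 0.6, 1.3, 2.0]
print(endpoints(starts))  # every entry is -1 or 1
```

Every trajectory is different because of the noise term, yet each one settles in one of only two valleys, which is the "many paths, few destinations" idea in its simplest form.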


Key Regulator Of DNA Mutations Identified
Quote:
As a general rule, your DNA is not something you want rearranged. But there are exceptions – especially when it comes to fighting infections. Since the number of microbes in the world far surpasses the amount of human DNA dedicated to combat them, specialized cells in the immune system have adopted an ingenious, if potentially disastrous, strategy for making antibodies. These cells, called B lymphocytes, intentionally mutate their own DNA to ward off invaders they have never seen before.
B lymphocytes have a toolkit that regulates mutations for the purpose of generating antibodies. Thus, here we have another toolkit that harnesses random variation and selection to intentionally generate variety for the purpose of producing novel antibodies.

How many more toolkits that harness quantum randomness and selection to generate controlled variety will we discover?


Genetic toolkits in action:
New Evidence That Ancient Choanoflagellates' Form Evolutionary Link Between Single-celled And Multi-celled Organisms
Evolutionary Origin Of Mammalian Gene Regulation Is Over 150 Million Years Old
Marsupials And Humans Share Same Genetic Imprinting That Evolved 150 Million Years

Thursday, July 24, 2008

Intelligence

Intelligence is associated with a property of mind.
From wiki:
Intelligence
From the first sentence:
Quote:
Intelligence (also called intellect) is an umbrella term used to describe a property of the mind that encompasses many related abilities, such as the capacities to reason, to plan, to solve problems, to think abstractly, to comprehend ideas, to use language, and to learn.
Artificial intelligence
From this article a few essential traits of intelligence are considered:
1) Deduction, reasoning, problem solving
2) Knowledge representation
3) Planning
4) Learning
5) Natural language processing
6) Motion and manipulation
7) Perception
8) Social intelligence
9) Creativity
10) General intelligence

However, there is no universally accepted definition of intelligence.
So let's take what we do know about intelligence (the 10 criteria above) and compare the systems and machinery within cells to any intelligent AI system.

1) Deduction, reasoning, problem solving
Cells:
Deduction: No
Reasoning: No
Problem solving: Yes. E.g. (from Nature, Vol. 446, 12 April 2007: "Quantum path to photosynthesis")
Quote:
Elsewhere in this issue, Engel et al. (page 782) take a close look at how nature, in the form of the green sulphur bacterium Chlorobium tepidum, manages to transfer and trap light’s energy so effectively. The key might be a clever quantum computation built into the photosynthetic algorithm.
Quote:
The process is analogous to Grover’s algorithm in quantum computing, which has been proved to provide the fastest possible search of an unsorted information database.
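For readers unfamiliar with the algorithm named in the quote: Grover's search finds one marked item among N using roughly (π/4)√N oracle queries, whereas a classical linear search needs about N/2 on average. The Python sketch below simulates the amplitudes classically for a single marked item; it is not a model of photosynthesis, and the function name grover is hypothetical.

```python
import math


def grover(n_items, marked, iterations=None):
    """Classically simulate Grover's search with one marked index.

    Returns (iterations used, probability of measuring the marked item).
    """
    if iterations is None:
        # Near-optimal iteration count, about (pi/4) * sqrt(N).
        iterations = math.floor(math.pi / 4 * math.sqrt(n_items))
    amp = [1 / math.sqrt(n_items)] * n_items  # uniform superposition
    for _ in range(iterations):
        amp[marked] = -amp[marked]            # oracle: phase-flip the marked item
        mean = sum(amp) / n_items
        amp = [2 * mean - a for a in amp]     # diffusion: inversion about the mean
    return iterations, amp[marked] ** 2


its, p = grover(1024, marked=7)
print(its, round(p, 3))  # 25 oracle queries, versus about 512 for a classical scan
```

Each iteration rotates amplitude toward the marked item, so after about (π/4)√N rounds a measurement returns it with probability close to 1; running past that point would rotate the amplitude away again.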
And in the same issue: Evidence for wavelike energy transfer through quantum coherence in photosynthetic systems
Quote:
When viewed in this way, the system is essentially performing a single quantum computation, sensing many states simultaneously and selecting the correct answer, as indicated by the efficiency of the energy transfer.
Who knows what other kinds of quantum computing we will discover in organisms? Perhaps a clever quantum “trick” together with Coulombic interactions in the bifurcated electron transfer of bc1-like complexes through the Q-cycle? Microtubules, centrioles etc.?

AI:
Deduction: No
Reasoning: No
Problem solving: Yes. (not quantum mechanically)

2) Knowledge representation
Cells:
Default reasoning and the qualification problem: No?
Unconscious knowledge: Perhaps? Stored in any or all of the cellular codes?
The breadth of common sense knowledge: No.
AI:
Default reasoning and the qualification problem: No
Unconscious knowledge: Yes. The software contains the stored information
The breadth of common sense knowledge: No


3) Planning
Cells: Possibly yes!
Predictive Behavior Within Microbial Genetic Networks

Quote:
We question whether homeostasis alone adequately explains microbial responses to environmental stimuli, and explore the capacity of intra-cellular networks for predictive behavior in a fashion similar to metazoan nervous systems. We show that in silico biochemical networks, evolving randomly under precisely defined complex habitats, capture the dynamical, multidimensional structure of diverse environments by forming internal models that allow prediction of environmental change. We provide evidence for such anticipatory behavior by revealing striking correlations of Escherichia coli transcriptional responses to temperature and oxygen perturbations—precisely mirroring the co-variation of these parameters upon transitions between the outside world and the mammalian gastrointestinal-tract. We further show that these internal correlations reflect a true associative learning paradigm, since they show rapid decoupling upon exposure to novel environments.
Emphasis mine.

Microarray transcriptional profiling was employed to determine whether gene expression correlates with the observed global cellular state and physiological responses. And indeed it does.
The study found that anticipatory transcriptional reprogramming occurs in response to aerobic and anaerobic environmental changes, and that these anticipatory reprogramming events are the result of an “associative learning” paradigm. Is this an example of harnessing random variation and selection to allow predictive transcriptional reprogramming in response to environmental change, giving the illusion of foresight? Creativity?
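One way to picture an internal model that learns that temperature and oxygen co-vary, and then decouples them in a novel environment, is a running correlation with a forgetting rate. This Python sketch is a swapped-in statistical stand-in, not the evolved biochemical networks used in the study; the class RunningCorrelation, the forgetting rate, the gut/outside switching scheme and the noise level are all hypothetical.

```python
import math
import random


class RunningCorrelation:
    """Exponentially weighted correlation between two signals."""

    def __init__(self, rate=0.01):
        self.rate = rate  # forgetting rate (hypothetical): larger = faster decoupling
        self.mx = self.my = 0.0
        self.vx = self.vy = self.cxy = 0.0

    def update(self, x, y):
        a = self.rate
        self.mx += a * (x - self.mx)
        self.my += a * (y - self.my)
        dx, dy = x - self.mx, y - self.my
        self.vx += a * (dx * dx - self.vx)
        self.vy += a * (dy * dy - self.vy)
        self.cxy += a * (dx * dy - self.cxy)

    def value(self):
        if self.vx <= 0.0 or self.vy <= 0.0:
            return 0.0
        return self.cxy / math.sqrt(self.vx * self.vy)


def simulate(steps=2000, seed=0):
    rng = random.Random(seed)
    model = RunningCorrelation()
    # Familiar world: in the gut it is warm AND anoxic, outside cool AND aerobic.
    for _ in range(steps):
        in_gut = rng.random() < 0.5
        temp = (1.0 if in_gut else 0.0) + rng.gauss(0.0, 0.1)
        oxygen = (0.0 if in_gut else 1.0) + rng.gauss(0.0, 0.1)
        model.update(temp, oxygen)
    learned = model.value()
    # Novel environment: temperature and oxygen now vary independently.
    for _ in range(steps):
        temp = float(rng.random() < 0.5) + rng.gauss(0.0, 0.1)
        oxygen = float(rng.random() < 0.5) + rng.gauss(0.0, 0.1)
        model.update(temp, oxygen)
    return learned, model.value()


print(simulate())  # strongly negative first, near zero after decoupling
```

The forgetting rate sets the trade-off: a larger rate decouples faster in a new environment but gives a noisier estimate of the association in a familiar one.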

It should also be interesting to determine how big a part riboswitches play in this phenomenon.

AI: Yes, if programmed to.

4) Learning
Cells: Yes, see "planning".
AI: Yes, certain artificial neural networks are capable of this.

5) Natural language processing
Cells: Yes and no. Yes because cells are able to communicate and process information from themselves and other cells (autocrine, paracrine, endocrine etc). No, cells do not consciously talk to exchange concepts and ideas.
AI: Yes and no. Yes because certain programs can interpret human language and systems of various platforms can communicate (Linux to Mac etc). No, AI does not consciously talk to exchange concepts and ideas.

6) Motion and manipulation
Cells: Yes, and if tubulin and other structural components of cells do act as quantum computers, then motion and manipulation are directed, not stochastic, even in the simplest organisms.
Movement of organisms without a nervous system.
Also here:
Interesting site about cell intelligence and movement.
AI: Yes

7) Perception
Cells: Yes. Cells communicate with the environment through surface receptors and relay information through signal transduction, which in turn affects gene expression and protein activity, which in turn results in predictive cell responses. Information from the environment is also processed via multiple codes, e.g. the histone code, the ribosomal code and the standard genetic code.
AI: Yes

8) Social intelligence
Cells: Yes, even bacteria interact with other bacteria and can even mimic a multicellular organism through quorum sensing.
AI: Perhaps? AI neural networks?
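Quorum sensing is essentially a density-dependent threshold: each cell secretes a signaling molecule (an autoinducer), and target genes switch on once its concentration is high enough. A minimal Python sketch, assuming a well-mixed culture at steady state where production scales with cell density and the autoinducer decays at a fixed rate; the function names, rate constants and threshold are hypothetical.

```python
def autoinducer_level(density, production=1.0, decay=0.5):
    """Steady state of d[AI]/dt = production * density - decay * [AI]."""
    return production * density / decay


def quorum_reached(density, threshold=10.0, production=1.0, decay=0.5):
    """True when the autoinducer level crosses the (hypothetical) threshold."""
    return autoinducer_level(density, production, decay) >= threshold


for density in (1.0, 2.5, 5.0, 10.0):
    print(density, quorum_reached(density))
```

With these numbers the switch flips at a density of threshold × decay / production = 5, so no single cell "decides"; the population as a whole crosses the threshold together.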

9) Creativity
Cells: Perhaps? Harnessing random variation and selection to adapt?
AI: Perhaps? An example?

10) General intelligence
Cells: No (Only in humans so far)
AI: No


At present, even traditionally viewed simple cells outsmart our best efforts at AI.


I don't know where to put the following:
Cell's 'Quality Control' Mechanism Discovered
Is this an example of a non-passive selection system to remove mutated proteins from the population, even if the mutated proteins are functional? Perhaps a system that preserves a set of proteins? Unconscious knowledge (2)? Constrained creativity (9)?
This mechanism (system?) is not limited to eukaryotic cells. The ERdj5 enzyme operates in eukaryotes, while DnaJ is a homologous chaperone protein in bacteria that carries out virtually the same function. DnaJ-family proteins are also known as heat shock protein 40 (HSP40) proteins.
The DNAJ gene family.
DnaJ is also found in primitive eubacteria, indicating that the system was present VERY early on during evolution.
Eubacteria:
Quote:
Most eubacteria are gram positive, and they are generally less structurally complex than other bacteria.
Articles:
ERdj4 and ERdj5 Are Required for Endoplasmic Reticulum-associated Protein Degradation of Misfolded Surfactant Protein C
ERdj5 is required as a disulfide reductase for degradation of misfolded proteins in the ER.

Seems interesting nonetheless.