Tuesday, November 25, 2008

The Optimality of the Genetic Code

Selected articles:
  1. Early Fixation of an Optimal Genetic Code
  2. Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape
  3. The genetic code is nearly optimal for allowing additional information within protein-coding sequences
  4. An extension of the coevolution theory of the origin of the genetic code
  5. Can the genetic code be mathematically described?
  6. On the Hypercube Structure of the Genetic Code
  7. Topological structure of the triplet genetic code
  8. A Neutral Origin for Error Minimization in the Genetic Code
  9. Does codon bias have an evolutionary origin?
  10. A chemical toolkit for proteins — an expanded genetic code
  11. Evolution and multilevel optimization of the genetic code

Article 1

To begin, the researchers in the first article determined that:
The Best of All Possible Codes?
When the error value of the standard code is compared with the lowest error value of any code found in an extensive search of parameter space, results are somewhat more variable. Estimates based on PAM data for the restricted set of codes indicate that the canonical code achieves between 96% and 100% optimization relative to the best possible code configuration (fig. 2c ). If our definition of biosynthetic restrictions are a good approximation of the possible variation from which the canonical code emerged, then it appears at or very close to a global optimum for error minimization: the best of all possible codes.
In other words, no better code turned up among a million biosynthetically restricted codes.
This conclusion might be misleading, though (addressed here), as the paper states that the tested codes came from a biosynthetically restricted set based on the current hypothesis of how the genetic code evolved from pre-biotic scenarios. Without that restriction, other, more optimized codes are possible.

The next article (nr 2) shows that:

Thus, the standard genetic code appears to be a point on an evolutionary trajectory from a random point (code) about half the way to the summit of the local peak. The fitness landscape of code evolution appears to be extremely rugged, containing numerous peaks with a broad distribution of heights, and the standard code is relatively unremarkable, being located on the slope of a moderate-height peak.
Thus, an analysis that includes all possible codes (not only biosynthetically restricted codes) shows that the genetic code is only partially optimal with regard to error minimization. It should be noted, though, that this analysis covered only one of the possibly optimal features of the code (i.e. error minimization).

From article 3

The analysis above did not include other nearly optimal features of the genetic code including:
A) The actual code is far better than other possible codes in minimizing the number of amino acids incorporated until translation is interrupted after a frameshift error occurred.
B) The code is highly optimal for encoding arbitrary additional information, i.e., information other than the amino acid sequence in protein-coding sequences.

Thus, there are two more features for which the code is close to optimal. What is interesting about these two features is that they may facilitate evolution, i.e. the code is primed for the future by being optimal in allowing later incorporation of additional information.
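To make feature A a bit more concrete, here is a minimal Python sketch of what "amino acids incorporated until translation is interrupted" means. It is not the method used in the article: the function name `residues_until_stop` and the toy sequence are made up for illustration, and the only real data is the standard codon table (NCBI translation table 1).

```python
# Standard genetic code (NCBI translation table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def residues_until_stop(seq, start):
    """Count amino acids translated from position `start` before the first stop codon.

    Returns None when the sequence ends before a stop codon is reached.
    """
    count = 0
    for pos in range(start, len(seq) - 2, 3):
        if CODON_TABLE[seq[pos:pos + 3]] == "*":
            return count
        count += 1
    return None


# Hypothetical toy sequence, not taken from any of the articles.
toy = "ATGCTGACCGGTTAA"
in_frame = residues_until_stop(toy, 0)   # reads ATG CTG ACC GGT, then the stop TAA
plus_one = residues_until_stop(toy, 1)   # reads TGC, then hits the "hidden" stop TGA
plus_two = residues_until_stop(toy, 2)   # no stop codon in this frame
print(in_frame, plus_one, plus_two)
```

In this toy case the +1 frameshift runs into a stop codon after a single residue, which is the kind of early interruption the article says the actual code favors. A real comparison would average such counts over many genes and many alternative codes, which this sketch does not attempt.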

In article nr.4

The coevolution theory of the origin of the genetic code is discussed. The theory suggests that the genetic code is an imprint of the biosynthetic (biosynthetically restricted) relationships between amino acids.
A few interesting observations can be made:
Firstly, from the article:
As will become clear in the following, I maintain that these amino acid-pre-tRNAs came directly from the biosynthetic pathways of the first six amino acids evolving along the biosynthetic pathways of energetic metabolism and that they were the first amino acids to be codified on these still evolving mRNAs.
It should be noted that other exotic amino acids are also used by a few other codes (derived from the original). E.g. selenocysteine is encoded in many archaea and vertebrates, and pyrrolysine in some archaea. Archaea, however, seem to be the most primitive organisms, thus these encoded amino acids must have been fixed early on.
Thus an interesting question can be asked of an "evolving" code as posited in the above quote:
Are these "still evolving" mRNAs still evolving? Or did they hit an inevitable global optimum?

Secondly, from the article:
While Wong [9] highlighted the precursor-product relationships between amino acids and their crucial role in defining the organisation of the genetic code, Miseta [10] clearly identified that the non-amino acid molecules that were precursors of amino acids might have been able to play an important role in organising the genetic code. Miseta [10] suggested the idea of an intimate relationship between molecules, the intermediates of glucose degradation, as precursors of precursor amino acids, and the organisation of the genetic code. This observation is also analysed by Taylor and Coates [11] who showed the relationship between the glycolytic pathway, the citric acid cycle, the biosyntheses of amino acids and the genetic code (Fig. 1) and, in particular, they point out that (i) all the amino acids that are members of a biosynthetic family tend to have codons with the same first base (Fig. 1) and (ii) that the five amino acids codified by GNN codons are found in four biosynthetic pathways close to or at the beginning of the pathway head (Fig. 1)[11]. More recently, Davis [12,13] has provided evidence that tRNAs descending from a common ancestor were adaptors of amino acids synthesised by a common precursor and he also discusses the biosynthetic families of amino acids, suggesting their importance in genetic code origin.
Is it correct to assume that in the presence of the precursors of the standard genetic code (e.g. intermediates of glucose degradation and the citric acid cycle), the intimate relationship between these molecules resulted in the inevitable organization of the genetic code (global optimum of the system)?

Articles 5-7

These articles discuss fascinating mathematical representations of the genetic code.
In article 5, the question is asked:
Can the genetic code be mathematically described?

A few intriguing properties arose from the investigation, including:
Parity coding
Palindromic symmetry
Binary coding
Error-correction mechanism based on parity checking

The author concludes:
It remains striking, however, that different fundamental properties of the genetic code, such as degeneracy distribution, and also unexpected hidden properties, such as the palindromic symmetry and the parity marking of triplets presented here, reflect a strong mathematical order which is accurately described by means of one of the most elementary operations at the root of mathematics: number representation.

In article 6 a representation of the genetic code as a six–dimensional Boolean hypercube is proposed.
It is assumed here that this structure is the result of the hierarchical order of the interaction energies of the bases in codon–anticodon recognition. The proposed structure demonstrates that in the genetic code there is a balance between conservatism and innovation. Comparing aligned positions in homologous protein sequences two different behaviors are found:
a)There are sites in which the different amino acids present may be explained by one or two “attractor nodes” (coding for the dominating amino acid(s)) and their one–bit neighbors in the codon hypercube, and
b) There are sites in which the amino acids present correspond to codons located in closed paths in the hypercube. The structure of the code facilitates evolution: the variation found at the variable positions of proteins do not corresponds to random jumps at the codon level, but to well defined regions of the hypercube.
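As a rough illustration of the hypercube idea, the sketch below packs each codon into six bits (two per base), so that every codon becomes a vertex of a 6-cube and its "one-bit neighbors" are the six codons one bit-flip away. The particular bit assignment in `BASE_BITS` is my own assumption for the example (first bit purine/pyrimidine, second bit strong/weak pairing); the article's actual assignment may differ, and the function names are hypothetical.

```python
from itertools import product

# Assumed 2-bit encoding per base (hypothetical; the article's own assignment may differ):
#   first bit  = purine (1) vs pyrimidine (0)
#   second bit = G/C, three hydrogen bonds (1) vs A/U, two hydrogen bonds (0)
BASE_BITS = {"U": 0b00, "C": 0b01, "A": 0b10, "G": 0b11}
BITS_BASE = {bits: base for base, bits in BASE_BITS.items()}


def codon_to_vertex(codon):
    """Pack a codon into a 6-bit integer, i.e. a vertex of the 6-cube."""
    vertex = 0
    for base in codon:
        vertex = (vertex << 2) | BASE_BITS[base]
    return vertex


def vertex_to_codon(vertex):
    """Inverse of codon_to_vertex."""
    return "".join(BITS_BASE[(vertex >> shift) & 0b11] for shift in (4, 2, 0))


def one_bit_neighbors(codon):
    """Codons whose vertices differ from `codon` in exactly one bit."""
    vertex = codon_to_vertex(codon)
    return [vertex_to_codon(vertex ^ (1 << bit)) for bit in range(6)]


def hamming(codon_a, codon_b):
    """Number of differing bits between two codons (distance along cube edges)."""
    return bin(codon_to_vertex(codon_a) ^ codon_to_vertex(codon_b)).count("1")


all_codons = ["".join(p) for p in product("UCAG", repeat=3)]
neighbors_of_aug = one_bit_neighbors("AUG")
print(neighbors_of_aug)
```

Under this assumed encoding a one-bit step always changes a single base, but not every single-base change is a one-bit step (U to G, for instance, flips both bits), so which substitutions count as "neighbors" depends entirely on the encoding chosen.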

Article 8:

This article once again discusses the optimality of the code, and draws a few fascinating conclusions. For example:
The genetic code has the remarkable property of error minimization, whereby the arrangement of amino acids to codons is highly efficient at reducing the deleterious effects of random point mutations and transcriptional and translational errors. Whether this property has been explicitly selected for is unclear. Here, three scenarios of genetic code evolution are examined, and their effects on error minimization assessed. First, a simple model of random stepwise addition of physicochemically similar amino acids to the code is demonstrated to result in substantial error minimization. Second, a model of random addition of physicochemically similar amino acids in a codon expansion scheme derived from the Ambiguity Reduction Model results in improved error minimization over the first model. Finally, a recently introduced 213 Model of genetic code evolution is examined by the random addition of physicochemically similar amino acids to a primordial core of four amino acids. Under certain conditions, 22% of the resulting codes produced according to the latter model possess equivalent or superior error minimization to the standard genetic code. These analyses demonstrate that a substantial proportion of error minimization is likely to have arisen neutrally, simply as a consequence of code expansion, facilitated by duplication of the genes encoding adaptor molecules and charging enzymes. This implies that selection is at best only partly responsible for the property of error minimization. These results caution against assuming that selection is responsible for every beneficial trait observed in living organisms.
Also from the article:
The SGC (Standard Genetic Code) has an EM (Error Minimization) value (see Methods for calculation) of 60.7. Ten thousand random codes have an average EM value of 74.5, and only 0.03% of these have equal or greater optimality than the SGC. These calculations once again illustrate the remarkable ‘optimization’ of the genetic code for EM.
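The kind of comparison quoted above can be sketched in a few lines of Python. This is not the article's calculation and will not reproduce its numbers: as a stand-in measure I use the mean squared change in Kyte-Doolittle hydropathy over all single-base substitutions between sense codons, and random codes are made by shuffling the 20 amino acids among the existing synonymous codon blocks with the stop codons left in place. Both choices, and the names `em_value` and `random_code`, are assumptions made for illustration.

```python
import random

# Standard genetic code (NCBI translation table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Kyte-Doolittle hydropathy index, swapped in here for illustration only;
# the article uses its own measure (see its Methods).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def em_value(code):
    """Mean squared hydropathy change over all single-base substitutions
    between sense codons (lower means better error minimization)."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if mutant_aa == "*":
                    continue  # substitutions to stop codons are ignored in this sketch
                total += (HYDROPATHY[aa] - HYDROPATHY[mutant_aa]) ** 2
                count += 1
    return total / count


def random_code(rng):
    """Shuffle the 20 amino acids among the synonymous codon blocks; stops stay fixed."""
    originals = sorted(HYDROPATHY)
    shuffled = originals[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(originals, shuffled))
    mapping["*"] = "*"
    return {codon: mapping[aa] for codon, aa in STANDARD_CODE.items()}


rng = random.Random(0)  # fixed seed so the run is repeatable
standard_em = em_value(STANDARD_CODE)
random_ems = [em_value(random_code(rng)) for _ in range(1000)]
fraction_as_good = sum(em <= standard_em for em in random_ems) / len(random_ems)
print(standard_em, sum(random_ems) / len(random_ems), fraction_as_good)
```

The last line prints the standard code's value, the average over the random codes, and the fraction of random codes that do at least as well. How small that fraction comes out depends on the amino acid property and the ensemble of random codes used, which is exactly why the different articles report different degrees of optimality.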
Thus, an important point is raised:
The point should be made that explicit selection for EM seems to necessitate both the occurrence of codon reassignments and group selection to generate and select alternate codes. The proposal that explicit selection for the EM did not occur, and that EM arose neutrally from the addition of similar amino acids to similar codons, may be termed the ‘Nonadaptive Code’ Hypothesis, in contrast to the Adaptive Code Hypothesis. Finally, on a fundamental level, as a result of the analyses presented here, the presence of EM in the SGC may be used as evidence that enzymes, whether partially proteinaceous, RNA based, or based on some other macromolecule, were already extant during the evolution of the SGC.
The article cautions against blithely using natural selection as an explanation for the features of the genetic code.

Article 9:

This article discusses the functional integrity of the code and how the code's architecture relates to it.
From the article:
The results put the concept of "codon bias" into a novel perspective. The internal connectivity of codons indicates that all synonymous codons might be integrated parts of the Genetic Code with equal importance in maintaining its functional integrity.
Thus, the properties of the code allow it to maintain its own functional integrity.
Also from the article:
The cumulative Codon Usage Frequency of any codon is strongly dependent on the cumulative Codon Usage Frequency of other codons belonging to the same species. The rules of this codon dependency are the same for all species and reflect WC base pair complementarity. This internal connectivity of codons indicates that all synonymous codons are integrated parts of the Genetic Code with equal importance in maintaining its functional integrity. The so-called codon bias is a bias caused by the protein-centric view of the genome.
The maintenance of the integrity of the code is not dependent on selection, but on internal variables (a feedback system) that maintain functional integrity. This, again, shows another form of optimality.

In article 10:

Fascinating research was conducted whereby sundry unnatural amino acids, assigned to novel three- and four-base codons, have been selectively incorporated (engineered) into proteins, yielding viable organisms.

An intriguing question arises from this research. It is easy to imagine such amino acids (e.g. amino acids with photoaffinity) arising through chance and selection and then being incorporated into the standard code. Yet the code seems to remain stagnant. For billions of years after fixation, little evolution happened in the code. Why?
Did it arrive at a global optimum in a pre-existing fitness landscape, with a pre-existing fitness function?

Finally, article 11:
Bollenbach et al. (2007) briefly describe a few of the optimal features (some described above) of the genetic code:
Evolution and multilevel optimization of the genetic code
They (Itzkovitz and Alon) compared the actual genetic code with an ensemble of all other codes that are equally optimized with respect to mistranslation or mutation (for more on this statistical approach, see also Alff-Steinberger 1969; Haig and Hurst 1991; Freeland and Hurst 1998). Assuming that the usage frequencies of the different amino acids are fixed, while their codon assignments vary in the ensemble, they find that the actual code is far better than other possible codes in minimizing the number of amino acids incorporated until translation is interrupted after a frameshift error occurred. This new observation by Itzkovitz and Alon could therefore be seen as reviving the basis for Crick’s theory of a comma-less code, modified by the constraints imposed on the code by the need to be robust to other kinds of translation errors and mutations. Another possible interpretation of their result is that the amino acid usage has adjusted to reduce the effects of frameshift errors; alternative genetic codes would have had a different amino acid usage coadapted to them. It has been shown previously that amino acid usage is rather malleable, and, for example, influenced by GC content (Knight et al. 2001b).
Itzkovitz and Alon suggest another, quite unanticipated, type of optimality: the code is highly optimal for encoding arbitrary additional information, i.e., information other than the amino acid sequence in protein-coding sequences. Optimality for encoding additional information is particularly important and relevant given the known signals contained in the nucleotide sequence of coding regions. These include RNA splicing signals, which are encoded in the nucleotide sequence together with the amino acid sequence of the prospective protein (Cartegni et al. 2002), as well as signals recognized by the translation apparatus.
They briefly proceed to mention how it could have evolved:

(1) the code has evolved under selection pressure to optimize certain functions such as minimization of the impact of mutations (Sonneborn 1965) or translation errors (Woese 1965a); Random mutation is a source of variability, yet selection pressure is believed to have selected for a system to put constraints on variability. Why?
(2) the number of amino acids in the code has increased over evolutionary time according to evolution of the pathways for amino acid biosynthesis (Wong 1975)
Intriguing questions can arise from the above suggestions.
1) Why was selection so strong in removing the other variants with fewer codons?
2) Is there evidence of organisms using only 5, 6, 9, 13, 18 etc. amino acid codons? And why isn't the code expanding to incorporate other codons, when it is not even difficult to envision that happening, since it can contribute to fitness AND variety (see article #10)?

The authors point this out:
The discovery of variant codes (Barrell et al. 1979; Fox 1987; Knight et al. 2001a) made the connection between evolvability and universality even more puzzling. On one hand, they prove that the genetic codes can evolve; on the other hand, if they could easily evolve, why are all variations minor? It was recently proposed that extensive horizontal gene transfer during early evolution can account for both evolution toward optimality and the near universality of the genetic code (Vetsigian et al. 2006).
Part of the answer lies in the code's inherent capability of maintaining its own functional integrity, independent of natural selection (article #9). Also, article #8 cautions against blithely invoking natural selection as an explanation for the properties of the code.

The authors conclude:
As we learn more about the functions of the genetic code, it becomes ever clearer that the degeneracy in the genetic code is not exploited in such a way as to optimize one function, but rather to optimize a combination of several different functions simultaneously. Looking deeper into the structure of the code, we wonder what other remarkable properties it may bear. While our understanding of the genetic code has increased substantially over the last decades, it seems that exciting discoveries are waiting to be made.

The genetic code sure is interesting. Irrespective of its origin, the code seems to be optimized for evolution and to maintain its own functional integrity. Whatever the explanation for the origins of the code, whether intentional agency, only RV+NS (random variation plus natural selection), self-organization or a combination of these, the fact that these processes converged on a single, reasonably optimal code that is able to facilitate evolution makes it look like an inevitable result of the system. The system seems to be rigged and biased towards certain outcomes, similar to the evolution of life. Why?