SEPTEMBER 13, 1996






2.1.1) Self replicating RNA sequences.

i. Watson-Crick base pairing (A - U, C - G) is supposed to allow an arbitrary single stranded RNA sequence to act as a template and thereby to "line up" free nucleotides, A, U, C and G, as the building blocks of a new, complementary RNA sequence.

The lined-up nucleotides are then to form the proper 3'-5' phosphodiester bonds thereby creating the complementary single strand copy of the initial arbitrary RNA sequence.

Thereafter, the newly created complementary strand is to "melt" from the initial RNA sequence leaving two complementary single stranded RNA sequences.

Finally, each of the single stranded RNA sequences is to act as a template for the replication of further complementary RNA sequences.

ii. There are deep problems with the standard model:

a. It has never worked chemically.

In particular, the only example that has come close to working requires the initial strand to be made exclusively of G and C, with the composition of C > G. But, by Watson-Crick base pairing, the complement of this C > G parental strand has a daughter strand in which G > C. This daughter strand is incapable of acting as a parent to create a further daughter strand.

In addition, even when nucleotides are lined up on the template, they tend to form the thermodynamically favored 2' - 5' bond, which is inconsistent with further replicative function.
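The composition problem in (a) can be seen in a minimal sketch. The parent sequence below is a hypothetical example chosen only because it is made of G and C with C > G; it is not taken from any experiment.

```python
from collections import Counter

# Watson-Crick pairing for RNA.
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}

def complement_strand(template_5to3: str) -> str:
    """Return the complementary strand, also written 5'->3'."""
    return "".join(RNA_PAIR[base] for base in reversed(template_5to3))

# Hypothetical C-rich parent strand (C > G), for illustration only.
parent = "CCGCCCGC"
daughter = complement_strand(parent)

parent_counts = Counter(parent)      # C outnumbers G in the parent
daughter_counts = Counter(daughter)  # G outnumbers C in the daughter
```

Whatever C-rich parent is chosen, pairing swaps the counts of C and G, so the daughter is necessarily G-rich.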

iii. Currently favored replacements for the standard model include other kinds of RNA-like polymers, e.g., PRNA, which may replicate more readily. None has yet been shown to do so.

The second favored replacement theory notes that single stranded RNA can act catalytically - "ribozymes," and seeks to create a ribozyme able to act as a polymerase that can replicate any single stranded RNA sequence including both itself and its own template complement.

- Such an RNA polymerase would be a self-replicating molecule. It is unclear whether it could: a) Reproduce without incurring a runaway error catastrophe as its mutant variants competed with it for substrates creating ever less accurately replicated forms; b) whether it could evolve to novel RNA polymerases; c) whether it could "gather" about itself a coherent metabolism.


2.2.1) Success creating self-replicating molecular systems has been achieved using single stranded DNA hexamers and complementary trimers. Thus, the hexamer 3'CCCGGG5' can be used to line up two complementary trimers, 5'GGG3' and 5'CCC3'.

Once the trimers are held adjacent to one another on the hexamer, a 3'-5' bond can form between them, creating a second sequence which, in its own 3'-5' direction, is identical to the parental strand: 3'CCCGGG5'. Here the hexamer has acted as a template-specific surface catalyst to ligate the two trimers.

The two hexamers then can melt, and, since each is identical to the initial hexamer, each can serve to ligate two further trimers. Hence this system is autocatalytic.

Similar results have been obtained with tetramers ligating two dimers that, when ligated, constitute the tetramer.
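The reason the hexamer copies itself, rather than merely its complement, is that it is self-complementary. A minimal sketch of that check follows; the function names are my own, introduced for illustration.

```python
# Watson-Crick pairing for DNA (the hexamer/trimer system uses DNA).
DNA_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand_5to3: str) -> str:
    """Return the strand that pairs with `strand_5to3`, also written 5'->3'."""
    return "".join(DNA_PAIR[base] for base in reversed(strand_5to3))

def ligate(left_5to3: str, right_5to3: str) -> str:
    """Join the 3' end of the left oligomer to the 5' end of the right one."""
    return left_5to3 + right_5to3

# The template 3'CCCGGG5' reads 5'GGGCCC3' in the conventional direction.
template = "GGGCCC"

# Trimers 5'GGG3' and 5'CCC3', ligated in the order they sit on the template.
product = ligate("GGG", "CCC")

is_self_complementary = reverse_complement(template) == template
is_autocatalytic_copy = product == template
```

Both flags come out true: the ligated product is the same molecule as the template, so each round doubles the number of templates.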

2.2.2) More complex cross-catalytic sets have been made using single stranded DNA sequences, hexamers and trimers. Here the first hexamer ligates two trimers which, when ligated, are not identical to the parental hexamer. The new hexamer then ligates two further trimers into a third hexamer. This third hexamer can ligate two trimers that, when ligated, constitute the first hexamer.

This small system is collectively autocatalytic via the specific ligation reactions carried out by each hexamer on its two trimer substrates.

We will base our investigation of the origin of life on the emergence of COLLECTIVELY AUTOCATALYTIC SETS OF MOLECULES in sufficiently diverse mixtures of organic molecules.


2.3.1) Consider a set of different organic molecules, M, where we think of the members of M as SUBSTRATES of reactions.

i. In general, it is convenient to consider one substrate -- one product reactions -- such as an isomerization reaction, A --> A';

two substrates -- one product reactions -- such as a ligation reaction uniting two trimers into a hexamer, A + B --> C;

one substrate -- two product reactions -- such as cleavage of a hexamer into two trimers, C --> A + B;

and two substrate -- two product reactions -- A + B --> C + D.

ii. A reaction graph is a BIPARTITE GRAPH with two kinds of nodes, here * and 0. The * nodes represent substrate and product molecules. The 0 nodes represent the reactions. Arrows lead from substrate nodes, *,*...* to a specific reaction node, 0, and arrows lead from that reaction node, 0, to a set of one or more product molecules, *,*,....*.

Since reactions are, in principle, reversible, the directions of arrows serve only to distinguish the substrate molecules from the product molecules.

The actual direction in which the reaction proceeds depends upon its displacement from chemical equilibrium and will be towards equilibrium.

iii. Consider the set of molecules, M, taken as substrates. The reaction graph can then be constructed by drawing appropriate arrows from each subset of M that can serve as substrates of a reaction to a corresponding reaction node, and arrows from that reaction node to the set of product molecules.

Such a graph constitutes a REACTION GRAPH.
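As a sketch of how such a graph might be held in a program, the four reaction types in (i) can be written as records and turned into a list of arrows. The class and function names here are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reaction:
    """One reaction node (a "0" node) with its substrate and product molecules."""
    name: str
    substrates: tuple
    products: tuple

def build_reaction_graph(reactions):
    """Return (molecule_nodes, reaction_nodes, arrows) of the bipartite graph."""
    molecule_nodes = set()
    reaction_nodes = set()
    arrows = []  # (source, target) pairs
    for rxn in reactions:
        reaction_nodes.add(rxn.name)
        for s in rxn.substrates:
            molecule_nodes.add(s)
            arrows.append((s, rxn.name))  # substrate * --> reaction 0
        for p in rxn.products:
            molecule_nodes.add(p)
            arrows.append((rxn.name, p))  # reaction 0 --> product *
    return molecule_nodes, reaction_nodes, arrows

reactions = [
    Reaction("isomerization", ("A",), ("A'",)),
    Reaction("ligation", ("A", "B"), ("C",)),
    Reaction("cleavage", ("C",), ("A", "B")),
    Reaction("exchange", ("A", "B"), ("C", "D")),
]
molecules, reaction_nodes, arrows = build_reaction_graph(reactions)
```

Every arrow joins one molecule node to one reaction node and never two nodes of the same kind, which is what makes the graph bipartite.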


2.4.1) Subcritical behavior:

i. Consider a reaction graph in which the set M consists of a single molecular species, helium, neon, or another "inert" gas. The reaction graph consists of a single * node. Since the gas undergoes no chemical reactions, the entire reaction graph consists of this single * node and no 0 nodes.

ii. More generally, the set of organic molecules, M, might be such that the reaction graph starting with set M as substrates leads, via reactions, to a product set entirely contained within M. Thus, if the reactions from substrates to products were to occur from an initial state with only the M molecules present as substrates, no novel molecular species would be created by the reactions.

iii. Still more generally, the set of organic molecules, M, might be such that the reaction graph starting with the set M as substrates leads, via reactions, to a product set containing molecular species THAT ARE NOT already contained in the set M. Call the product set M'. Then, if reactions are reversible, or more generally begin with M and have not gone to completion, the result of the reactions beginning with the set M will be to create a new set that is the UNION, [M'], of M and M', where the Union is larger (that is, contains a higher diversity of molecules) than the set M.

iv. Now consider the further iteration of the process of writing down the reaction graph from the UNION OF M and M', [M']. This set of substrates may lead to still further novel molecular species not contained in the set [M']. Let the Union of these novel product molecules together with the set [M'] be the set [M''].

v. Over iterations expanding the successive Union set, [M], [M'],[M''],...... the sequence of Union sets may ultimately STOP ENLARGING. That is, for some iteration, "I," all successor iterations will map the Union set [M"I"] into itself.

vi. I will call SUBCRITICAL all behaviors in which the growth of the number of molecular species in the Union reaction graph, [M], [M'],.... STOPS, as described above in (i - v).

2.4.2) Supracritical Behavior

i. Again consider a "founder set" of molecules, M, and the union of that set with itself, [M]. Over iterations as described above, creating the successive Union sets [M], [M'], [M''], ....... it may be the case that GROWTH IN THE DIVERSITY OF MOLECULAR SPECIES IN THE UNION SET INCREASES WITHOUT BOUND.

ii. I will call SUPRACRITICAL the behavior in which the diversity of molecular species in the Union set of the reaction graph increases without bound over iterations of the writing down of the reaction graph. In principle, at the formal level of an abstract reaction graph, the molecular diversity in a supracritical reaction system could increase indefinitely.

--NB: The subcritical or supracritical "behavior" of the reaction graph does not yet tell us the behavior of an actual chemical reaction system in terms of the diversity and identity of species that will be created from an initial founder set, for those physical behaviors depend upon thermodynamic conditions, time scales, and so forth. However, the structure of the reaction graph and whether it be subcritical or supracritical sets an upper bound on the diversity and identity of the molecular species that can arise from a founder set, M.
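The iteration of Union sets can be sketched at the formal level with two toy "chemistries". Both are my own illustrative assumptions, not real chemistry: one is a small fixed table of reactions, the other lets any two polymers ligate end to end. Since unbounded growth cannot be observed in finite time, the sketch only reports whether growth stopped within an iteration budget.

```python
def expand(founders, one_step_products, max_iterations=10):
    """Iterate the Union set; return (final_set, sizes, stopped)."""
    current = set(founders)
    sizes = [len(current)]
    for _ in range(max_iterations):
        novel = one_step_products(current) - current
        if not novel:
            return current, sizes, True   # Union set maps into itself
        current |= novel
        sizes.append(len(current))
    return current, sizes, False          # still growing when budget ran out

def table_chemistry(rules):
    """Toy chemistry from a fixed list of (substrates, products) rules."""
    def one_step(molecules):
        out = set()
        for substrates, products in rules:
            if set(substrates) <= molecules:
                out |= set(products)
        return out
    return one_step

def ligation_chemistry(molecules):
    """Toy chemistry in which any two polymers (strings) can be ligated."""
    return {x + y for x in molecules for y in molecules}

rules = [(("A", "B"), ("C",)), (("C",), ("A", "B")), (("A", "C"), ("D",))]
sub_set, sub_sizes, sub_stopped = expand({"A", "B"}, table_chemistry(rules))
sup_set, sup_sizes, sup_stopped = expand({"a", "b"}, ligation_chemistry,
                                         max_iterations=3)
```

The table chemistry stops after two enlargements, the subcritical case. The ligation chemistry is still growing when the budget of three iterations runs out, which is consistent with supracritical behavior but does not prove it.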


2.5.1) The chemical reaction graphs defined above refer only to spontaneous, uncatalyzed reactions. We now consider the CATALYZED REACTION SUBGRAPH of the full reaction graph. Consider a set of catalysts, [C]. Let each member of [C] catalyze one or more reactions in the full reaction graph defined above. Represent such a catalytic connection by denoting the catalytic member of [C] as a molecular node, * , and by a BLUE arrow directed from that molecule node, *, to the reaction node, "0," representing the reaction that is catalyzed. Further, to denote the fact that the reaction is now catalyzed, rather than merely spontaneous, change the arrows from substrates to the reaction node and from the reaction node to the products from the color BLACK to the color RED. Thus, the blue arrows denote catalytic connections from molecules onto reactions, while the red arrows represent catalyzed reactions. (Ignore in this procedure any reactions the catalyst molecules may themselves undergo. I will modify this below.)

i. The CATALYZED REACTION SUBGRAPH of the full reaction graph consists in the set of RED Arrow connections from substrates via reactions to products. In addition, we can include the Red plus the Blue arrows to denote the Catalysis Graph of the full reaction graph.

2.5.2) Phase Transition in the Catalyzed Reaction Graph

i. Consider a finite or infinite reaction graph, and the red catalyzed reaction subgraph of that full graph. We may define connected components in the catalyzed reaction graph. A set of molecular nodes [*] and reaction nodes [0] are connected in the catalyzed reaction graph if a connected red undirected graph, (ignoring directions of arrows), connects the molecular nodes.

ii. Intuitively, a connected component leads from a set of substrates, via a sequence of catalyzed reactions, to a set of products which may or may not include the initial substrates.

iii. Toy Problem: Phase Transitions in Random Graphs. Results from random graph theory consider a set of N nodes connected at random by E edges. We define a component as above. Intuitively, a graph is a set of N buttons connected by a set of E "red" threads. A connected component is a connected set of buttons such that if one were lifted up, the rest would be lifted with it.

Erdős and Rényi, 1959, showed that, as the ratio of edges to nodes, E/N, increases past 0.5, the size of the largest component jumps from very small to a giant component containing most of the nodes in the system. This is a phase transition, which becomes sharp as N grows large. Intuitively, when the ratio of edges to nodes is small, only isolated nodes and small dyads or trees will exist. As more edges are randomly added, the small components will slowly grow larger. But when these become fairly large, a few more randomly added edges will connect these modest components into the giant component. Indeed, the phase transition occurs just when the number of ends of edges, 2E, equals the number of nodes, N.
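The buttons and threads picture is easy to simulate. The sketch below is a simplification under my own assumptions: threads are tied between randomly chosen buttons, so a thread may occasionally repeat or join a button to itself, and components are tracked with a union-find structure.

```python
import random

def largest_component_fraction(n_nodes, n_edges, seed=0):
    """Fraction of nodes in the largest component after adding random edges."""
    rng = random.Random(seed)
    parent = list(range(n_nodes))
    size = [1] * n_nodes

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _ in range(n_edges):
        a = find(rng.randrange(n_nodes))
        b = find(rng.randrange(n_nodes))
        if a != b:
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a          # merge the smaller component into the larger
            size[a] += size[b]
    return max(size[find(i)] for i in range(n_nodes)) / n_nodes

below = largest_component_fraction(2000, 500)    # E/N = 0.25
above = largest_component_fraction(2000, 4000)   # E/N = 2.0
```

Below the threshold the largest component holds only a small fraction of the buttons; well above it, nearly all of them are lifted together.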


Consider a finite region of a full reaction graph, including the founder set, [M]. Let each member of the set of catalysts, [C], have a fixed probability of catalyzing each reaction in the reaction graph. Add successive members of [C] to the reaction graph as molecular nodes, *, draw the corresponding blue arrows to the reaction nodes, "0," and color the corresponding reaction arrows RED. (Ignore the reactions that the catalysts themselves may undergo!)

As in the Toy Problem of random graphs, it is intuitively plausible that, as the number of catalyzed reactions (RED ARROWS) increases while the number of molecular nodes * among the substrate and product set in the reaction graph remains constant, then at some point a giant connected catalyzed reaction subgraph will "crystallize."

This crystallization of a giant connected set of catalyzed reactions is a phase transition in the catalyzed reaction subgraph.


i. A specific example of such a phase transition is given in the "Random Chemistry" Patent Application (Kauffman and Rebek 1993).

ii. We need a simple estimate of the statistical structure of chemical reaction graphs. Consider two decapeptides and the number of transpeptidation reactions they can undergo. Each has 9 internal peptide bonds, thus there are 81 different reactions cleaving some amino terminal amino acids from the first and uniting them with the carboxy terminal amino acids of the second, while also uniting the amino terminal part of the second with the carboxy terminal part of the first. While all 81 are similar transpeptidation reactions, all can yield different products of these two substrate two product reactions.

iii. 81 is greater than 1. Let us estimate that any pair of modestly complex organic molecules can undergo at least one two substrate - two product reaction. If so, then in a set of [M] organic molecules, the number of reactions among them scales as [M] squared.

iv. Let our set of catalysts, [C], be the set of human antibody molecules. It is known that the probability that a given randomly chosen antibody molecule can act as a catalyst (a catalytic antibody) to catalyze a given reaction is on the order of one in a billion, as described below.

v. Plot the logarithm of the diversity of antibody [C] molecules on the X axis of a Cartesian coordinate system and the logarithm of the diversity, [M] of the organic founder molecules on the Y axis. Consider a point corresponding to two organic molecules and one antibody molecule. The number of reactions is 2 squared, hence the expected number of these which are catalyzed is 4 x 1 / 1,000,000,000. Almost certainly, no reactions are catalyzed. Call this behavior SUBCRITICAL Behavior of the Catalyzed Reaction System.

vi. Supracritical behavior. Consider a point corresponding to 1000 organic molecules and 10,000,000 antibody molecules. The number of reactions among these molecules is 1000 squared. Hence the expected number of catalyzed reactions is 10^6 x 10^7 / 10^9 = 10,000.

vii. Since 10,000 reactions are catalyzed among the 1000 organic molecules, most of the products will be novel with respect to the founder set, [M]. The union of [M] and these novel molecules, [M'], is about 10,000 in diversity. Hence the number of reactions among this enlarged set is [M'] squared, or 10^8. Thus, at the next iteration, the expected number of catalyzed reactions is 10^8 x 10^7 / 10^9 = 1,000,000. Thus, the initial diversity, [M] = 1000, has exploded on the catalyzed reaction graph to 1 million different organic molecules. Call this SUPRACRITICAL BEHAVIOR OF THE CATALYZED REACTION SUBGRAPH.

viii. The random chemistry patent application, now in the public domain, goes on to describe means to identify molecular species in this exploding diversity that are useful, perhaps by binding hormone receptors or in other ways, then utilizing a version of "sib selection" to "throw away" all those catalyzed reaction pathways and founder molecules that DO NOT lead to the useful molecule. The result is the recovery of a set of substrates and catalysts catalyzing a sequence of reactions to a desired molecule which can then be identified and produced for practical purposes.

ix. In particular, the patent application considers the fact that, with a fixed founder set, [M], the concentrations of the later members of the exploding supracritical system will fall and do so in a non-isotropic way. Indeed, flux down catalyzed and spontaneous reaction pathways will be non-isotropic.

x. The subcritical - supracritical boundary. Consider the X Y coordinate system. A roughly hyperbolic curve separates the region near the origin, which exhibits subcritical behavior, from a region of high substrate, [M], diversity, high catalyst, [C], diversity, or both. In the supracritical domain, diversity explodes indefinitely. In the subcritical domain, diversity does not explode indefinitely and/or does not even increase.
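The arithmetic in (v) through (vii) can be checked with a few lines. The function names are hypothetical. Placing the boundary where the expected number of catalyzed reactions equals one is my own simplifying assumption for the sketch, not a result stated above.

```python
ONE_IN = 10**9  # probability of catalysis taken as 1 in a billion

def expected_catalyzed(m, c, one_in=ONE_IN):
    """Expected catalyzed reactions: (m squared reactions) x (c catalysts) x P."""
    return m * m * c / one_in

def critical_catalysts(m, one_in=ONE_IN):
    """Catalyst diversity at which one catalyzed reaction is expected (assumed boundary)."""
    return one_in / (m * m)

subcritical_point = expected_catalyzed(2, 1)          # two molecules, one antibody
first_round = expected_catalyzed(1000, 10**7)         # founder set of 1000
second_round = expected_catalyzed(10**4, 10**7)       # enlarged set of about 10,000
boundary_at_1000 = critical_catalysts(1000)           # catalysts needed when [M] = 1000
```

With [M] = 1000, the assumed boundary sits at about 1000 antibody species, far below the 10,000,000 used in (vi), which is why that point lies deep in the supracritical domain.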


i. As described in Origins of Order, consider a reaction graph such as that in the X Y coordinate system but modify the assumptions such that the molecules in the set [M] constituting substrates and products are now ALSO members of the set of catalysts, [C]. Then as [M] increases in diversity, the ratio of reactions and hence reaction nodes, (0) to molecular nodes, (*) increases as [M], for the number of reactions, [M] squared, divided by the number of kinds of molecules, [M] is [M].

ii. Assume a fixed probability that any molecule in [M] can catalyze any reaction in the reaction graph. For any fixed probability of catalysis, say P, when the number of reactions, [M] squared, reaches 1/P, each member of [M] is expected to catalyze a single reaction. At this point, a giant connected RED catalyzed reaction graph will crystallize.

iii. Thus, at a sufficient diversity, a collectively autocatalytic set of molecules that builds itself out of the founder set, [M], will emerge. This is a self-sustaining, indeed self amplifying, hence reproducing COLLECTIVE METABOLISM.

iv. Note that all cells are collectively autocatalytic. No molecule reproduces itself, including DNA.

v. On this view, life emerges as a phase transition in sufficiently complex reaction systems!

vi. Numerical studies show that such auto-catalytic systems can evolve without a genome. They do so by incorporating fluctuations in the spontaneous reaction graph "penumbra" surrounding the collectively autocatalytic set. Such fluctuations can bring forth molecular species that are catalyzed from the autocatalytic set and can abet the set itself. When this happens, old molecular species may be "ejected" from the set.

vii. Collectively autocatalytic sets can form ecosystems. Indeed, such systems can create an expanding diversity of niches where each "makes a living" due to its interactions with the others.

Mutualisms among such autocatalytic systems are even molecular versions of ECONOMIC systems. Thus, two such sets can "help" one another reproduce, but at metabolic cost to themselves. Here an advantage of trade emerges. In simple models of two pairs of reproducing RNA +/- strands, for some parameter conditions of help and cost, the two species are mutualists who jointly out-reproduce either single +/- RNA sequence pair.

In this simple economy, "utility" maps to "rate of reproduction." Increasing utility maps to increasing the rate of reproduction, hence Darwinian "R" selection. And "price" becomes the exchange ratio of "help" given metabolic cost.

Further, in this simple model, the Invisible Hand of natural selection acting on each RNA +/- pair alone can attain the Nash equilibrium ratio of help versus cost that maximizes the joint reproduction of the mutualists.

CAVEATS: THEORETICAL AND EXPERIMENTAL WORK IS REQUIRED. Theoretical work on subcritical and supracritical chemical systems and the emergence of collectively autocatalytic sets is still in an early stage, as are the experimental supports for the theory. Notably, as such systems form, concentrations of molecules distant from the "food set" will fall, and flow along reaction pathways will not be isotropic, hence the time-concentration profiles will be very heterogeneous. How these factors influence the time scale of catalyzed reactions, the attainment of catalytic closure at adequate concentrations, and so forth remains to be investigated in far more detail, even if early work with reasonable thermodynamic parameters supports the conclusions. Experimental work using mixtures of peptides, single stranded DNA, RNA and other polymers and small molecules is just getting underway in several laboratories. The search for the formation of larger polymers by smaller polymers catalyzing reactions among one another is one important first step to establish catalysis itself and possible supracritical behavior. Increases in the number of chemical species, as seen by mass spectrometry, on gels, or by HPLC, offer useful tests for supracritical behavior.


i. Consider a founder set, [M], and the subsequent supracritical behavior of the system. Now consider an arbitrary molecule that is NOT a member of [M]. Given the set of all possible chemical reactions sanctioned by quantum mechanics, we can ask whether or not a sequence of catalyzed and/or spontaneous reactions leads from the set [M] to the arbitrary molecule, X. It seems intuitively plausible that this can suffer the HALTING problem. That is, just as in a computation, given a program and input data, it is in general formally undecidable whether or not the program will halt, and in Gödel's sense, given a set of axioms and inference procedures, it can be formally undecidable whether or not a given statement in the language is derivable from the axioms via the inference procedures - - here too, it seems that we can regard molecules as statements and reactions as transformations in some "chemical grammar." If so, in general, it may be formally undecidable whether a specific molecule is "derivable" from the initial molecules by the reaction procedures.

ii. While the above general conjecture is uncertain, it is clear that chemical reaction systems can be made to carry out universal computation. Hence, such systems can behave in ways beset by the halting problem.


i. I now introduce a concept which will play an expanding role below: The Actual and the Adjacent Possible.

ii. Consider a supracritical reaction system. At any stage in the explosion of molecular diversity from the founder set, [M], define the ACTUAL as the set of molecular species ALREADY GENERATED IN THE UNION OF SETS [M], [M'],......Define the ADJACENT POSSIBLE AS THE NOVEL MOLECULES, NOT YET GENERATED, THAT CAN BE FORMED IN A SINGLE REACTION STEP FROM THE ACTUAL.

iii. The supracritical reaction system is indefinitely expanding from the Actual into the Adjacent Possible.

iv. Note that a real chemical potential exists from the Actual to the Adjacent Possible.

a. Consider any reaction from substrates in the Actual creating novel products in the Adjacent Possible.

b. Any such reaction has a change in entropy in going from the substrates to the products, and can be exothermic or endothermic (say at constant temperature and volume). For example, for a gas, the standard enthalpy can be defined as the concentration of the gas approaches zero. Hence the enthalpy of members of the Adjacent possible can be defined.

c. The balance of entropy and enthalpy changes governs the value of the EQUILIBRIUM CONSTANT of the reaction of the substrates and novel products.

d. However, the chemical potential of the reaction is governed by the displacement from equilibrium. But since the substrates in the Actual are present at finite concentrations while the products in the Adjacent Possible are present at zero concentrations, the reaction couple is displaced to the LEFT of equilibrium, and the chemical potential therefore lies in the direction CREATING the novel molecules in the Adjacent Possible.

e. Thus, there is a real chemical potential gradient from the Actual to the Adjacent Possible across EACH reaction couple from substrates in the Actual to the Adjacent Possible.

f. Further, as the supracritical reaction system expands, in this simple model, the ratio of VOLUME of the Adjacent Possible to the Actual can INCREASE at each step, [M] = 1000, [M'] = 10,000, [M''] = 1,000,000.

Thus, as the Actual expands into the Adjacent Possible, the Adjacent "phase space" of accessible chemical structures can ALWAYS be larger than those already created and in existence.


2.10.1) Noah's Vessel Experiment:

i. Take two of every species, normalized for mass.

ii. Place these in a Vessel - Noah's Vessel.

iii. Grind up all species, breaking all cell and organelle membranes.

iv. Monitor changes in total organic molecular diversity.

2.10.2) Expected results:

i. Rough number of different protein sequences in the Biosphere may be 10 to the 13th -- based on 50,000,000 species and 100,000 genes per species, plus mutant variants in each population. Let [C] = 10 to the 13th.

ii. Small organic molecule diversity of biosphere may be on the order of 10,000,000. Hence [M] = 10 million.

iii. Using the approximation that the number of reactions scales as the square of small molecule diversity, and the estimate that the probability that any protein has an epitope able to catalyze an arbitrary reaction is 1 in a billion, the expected number of catalyzed reactions is:

10 million squared x 10 to the 13th divided by 1 billion = 10 to the 18th! Each reaction available would be catalyzed by about 10,000 different proteins!

iv. Thus, the Biosphere appears to be wildly Supracritical.

v. Are cells supracritical?

a. Assume a cell with 100,000 genes, hence proteins. Assume 1000 small molecules in metabolism. Assume the proteins have evolved to catalyze desired reactions and to AVOID unwanted side reactions.

b. Add a novel organic molecule, Q, to cell. Q forms one member of a two substrate reaction with each of the 1000 metabolites. Hence addition of Q affords about 1000 new reactions.

c. Assume, as before, that any protein has a probability of 1 in a billion to catalyze an arbitrary reaction via some epitope.

d. Then the expected number of novel catalyzed reactions is 1000 x 100,000/ 1,000,000,000 = 0.1!

e. Since the expected number of catalyzed reactions is less than 1.0, the cell is SUBCRITICAL.

f. Were cells supracritical, any input molecule could lead to a cascade of molecular novelty, some of which would be lethal to the cell!

g. Best protection is for cell to remain subcritical during whole of evolution.

h. If so, there is an upper limit to the molecular diversity of any cell!
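The two estimates above, for the ground-up Biosphere and for a single cell, follow from the same formula with different inputs. The sketch below restates the arithmetic using the round numbers given in the text; the variable names are my own.

```python
ONE_IN = 10**9  # probability of catalysis taken as 1 in a billion

# Noah's Vessel: the whole Biosphere with membranes broken.
proteins = 10**13              # [C]
small_molecules = 10**7        # [M]
reactions = small_molecules ** 2
biosphere_catalyzed = reactions * proteins / ONE_IN
catalysts_per_reaction = biosphere_catalyzed / reactions

# A single cell receiving one novel molecule, Q.
cell_proteins = 100_000
cell_metabolites = 1000
new_reactions = cell_metabolites   # Q paired with each metabolite
cell_catalyzed = new_reactions * cell_proteins / ONE_IN

biosphere_is_supracritical = biosphere_catalyzed > 1.0
cell_is_subcritical = cell_catalyzed < 1.0
```

The same probability of catalysis gives opposite verdicts because the Vessel pools every protein with every small molecule, while the cell exposes Q to only its own proteins and metabolites.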


2.11.1) Biosphere as a whole in Noah's Vessel is Supracritical. But this requires destroying cell and organelle membranes which isolate molecular species within cells each of which is subcritical, thereby "preventing" explosion within single cells.

2.11.2) Consider another X Y coordinate system. On the X axis plot the logarithm of the number of different bacterial species. On the Y axis plot the logarithm of the exogenous diversity of added organic molecules, [M].

i. The proteins and other macromolecules in the bacteria can, like antibody molecules, serve as catalysts for novel reactions. As the diversity of bacterial species increases in a Local Metabolic Ecosystem, the total diversity of macromolecules increases.

ii. A novel organic molecule, when introduced, may disappear, may enter a cell and be sequestered, may enter a cell but interact with no molecular species there, may leak from the cell and eventually enter a new cell, OR it may interact within the cell with other metabolites and macromolecules to undergo a reaction to create a novel molecular species.

iii. Thus, some form of hyperbolic curve exists in the X Y coordinate system separating SUBCRITICAL METABOLIC COMMUNITIES FROM SUPRACRITICAL METABOLIC COMMUNITIES.

2.11.3) An evolutionary process of speciation and in-migration when the Community is subcritical, and extinctions when the community is supracritical, may drive such local metabolic communities towards the boundary between subcritical and supracritical behavior.

i. Suppose that the community is supracritical. Then the community generates an explosion of molecular diversity. Some of those molecular species are likely to be lethal to some of the bacterial species, causing them to go extinct locally. This REDUCES the species diversity in the community, tending to make it less supracritical. (One exogenous molecule at millimolar concentrations could yield a million different compounds at nanomolar concentrations - high enough to wreak havoc with receptor systems in cells.)

ii. Suppose the community is subcritical. Then in-migration or speciation can increase the species diversity in the metabolic community without launching an explosion of small molecules.

iii. Tentative conclusion: Local metabolic Communities may evolve towards the Boundary between Subcritical and Supracritical Behavior.

iv. At the boundary, the branching probability of producing novel molecules within a community given a single novel molecular species as input should be the critical value 1.0. This should give rise to power-law bursts of molecular novelty upon such perturbation. Most avalanches of novelty will produce 0 new molecules, some will produce 1 novel molecule, fewer will produce two novel molecular species, and so on.

If these small and large avalanches include harmful molecular species, they are likely to fail to be incorporated into the community. But if the avalanche of novelty produces useful new molecules within the community, the innovation may be selected by natural selection. Hence the metabolic diversity of the local community will expand, thereby gradually expanding the molecular diversity of the entire Biosphere.
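The expected distribution of avalanche sizes at the boundary can be sketched as a critical branching process. Several things here are my own modeling assumptions: each novel molecule gives rise to a Poisson-distributed number of further novel molecules with mean 1.0, avalanches are cut off at a cap, and the random seed is fixed so the run is repeatable.

```python
import math
import random
from collections import Counter

def poisson(rng, mean):
    """Draw a Poisson variate by multiplying uniforms (Knuth's method)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def avalanche_size(rng, branching_mean=1.0, cap=2000):
    """Count novel species produced after one novel input molecule."""
    active, total = 1, 0          # the input molecule itself is not counted
    while active and total < cap:
        offspring = poisson(rng, branching_mean)
        active += offspring - 1   # one molecule processed, its products queued
        total += offspring
    return total

def avalanche_histogram(trials=5000, seed=1):
    """Histogram of avalanche sizes over many independent perturbations."""
    rng = random.Random(seed)
    return Counter(avalanche_size(rng) for _ in range(trials))

hist = avalanche_histogram()
largest = max(hist)
```

In this sketch, avalanches of size 0 are the most common, size 1 less so, size 2 less still, while a few rare avalanches run to hundreds or more novel species, the heavy tail expected at a branching value of 1.0.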


As such, the biosphere may MAXIMIZE the rate at which the Molecular Actual Crosses into the Adjacent Molecular Possible! Were there less species diversity in the local communities, that diversity could increase. Were the species diversity locally to create a supracritical condition, some bacterial species would die back, slowing the rate of crossing from the Actual to the Adjacent Possible.

This tentative picture will be incorporated into a tentative working hypothesis about a general attractor achieved by coevolutionarily constructable systems of Autonomous Agents.

A tentative definition of Autonomous Agents is taken up in the next lecture.