multidimensional evolution

[Martin Taylor 940923 12:00]

As I mentioned yesterday, I was trying out a little simulation in HyperCard
on the effect of selection and minor mutation in evolution within a fitness
space. Last night I extended that to a multidimensional space, as I had
proposed, and did a small run. Here's what I did, and what happened.

The experiment is a simplification of the setup I originally suggested.
The fitness space is defined by one parameter, distance from a target.
If the distance is D, the probability that a genome survives to the
next generation is given by p + (1-p)/(1+D). A genome is represented by
the existence of one organism that carries it. Organisms may have offspring,
and if they do, the offspring carry the same genome. The genome dies when
a carrier organism fails to have offspring.

The process is like this. On initial setup, the experimenter chooses
the size of the population of organisms initially present. This size is retained
in all future generations--it represents resource availability limits.

For each generation, survival of the genome is first checked for each
organism, and some die. The remainder are mutated by changing the fitness
value of each gene by an amount chosen randomly from a rectangular distribution
+-0.1 D units. The population is then brought back to the fixed
limit value by giving extra children to organisms randomly chosen from
among the mutated survivors. Notice that every gene in an organism is
mutated each generation, which is quite unreal. It makes "life" noisy,
but it means that something happens in an overnight computing run in a
space of several dimensions.
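
The generation step described above can be sketched roughly as follows. This is a minimal Python sketch, not the original HyperCard script: the function names, the list-of-lists representation of the population, and the handling of total extinction are assumptions made for illustration. Distance is taken to be Euclidean, and the mutation is read as a uniform draw of +-0.1 per gene.

```python
import random

def survival_probability(distance, p=0.5):
    """Survival chance p + (1 - p) / (1 + D), as given in the text."""
    return p + (1.0 - p) / (1.0 + distance)

def distance_to(genome, target):
    """Euclidean distance from a genome to the target (assumed metric)."""
    return sum((g - t) ** 2 for g, t in zip(genome, target)) ** 0.5

def next_generation(population, target, p=0.5, step=0.1, rng=random):
    """One generation: cull, mutate every gene, refill to the fixed size."""
    size = len(population)
    # Survival of each genome is checked first, and some die.
    survivors = [g for g in population
                 if rng.random() < survival_probability(distance_to(g, target), p)]
    if not survivors:
        # Assumption: the text does not say what happens if everyone dies.
        return []
    # Every gene of every survivor is shifted by a uniform random amount.
    mutated = [[x + rng.uniform(-step, step) for x in g] for g in survivors]
    # Extra children are copies of randomly chosen mutated survivors.
    extras = [list(rng.choice(mutated)) for _ in range(size - len(mutated))]
    return mutated + extras
```

Calling next_generation repeatedly on a list of genomes, for example 50 copies of [3.0, 3.0] with the target at [0.0, 0.0], gives a monoclonal run in 2-D under these assumptions.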

The experimenter chooses a couple of other parameters for the experiment:
whether all the initial population have the same genome (occupy the same
location in the fitness space) or whether they are distributed randomly
around the space; the value of p in the survival formula (I have only
used 0.5); and the number of dimensions (independent genes that can mutate).

I've tried several runs in a 2-D space. What happens is always the same,
regardless of the parameter values, but the pattern is a bit different
if the initial population are all clones as compared to when they are
scattered around. In a randomly initialized population, after a very
few generations there are several distinct clumps (families, species)
of related organisms. These rapidly give way to a simple swarm centred
on the target fitness. In a population of clones, the population after
a few generations looks like the same kind of swarm, which very gradually
moves generation by generation toward the target. It takes a long time
for a monoclonal population to evolve to surround the target in this
experiment.

Last night I tried an 8-D run with a randomly initialized population
(random means scattered with a rectangular distribution in each gene over
the rectangular space of edge length 6, with the target at the centre).
The same clumping into families happened in the first few generations,
perhaps more markedly than in 2-D, but before long only one of the clumps
survived, at a substantial distance from the target, very much like one
of the swarms derived from a monoclonal initial population in the 2-D runs.
Over several hundred generations the swarm moved slowly toward the target,
but stalled at a distance of 3-4 units. In the morning, after almost 1000
generations, I found that all the organisms were distributed in the
hyperspherical shell between 3 and 4 units from the target, none closer.
I've since repeated a similar run for about 500 generations, with the same
result.

In retrospect, this result should not have been surprising, for two reasons.
One is simply that in a high-dimensional space, the volume of a shell goes
up fast as the radius increases. The shell between 1 and 2 units of distance
from the target has 255 times the volume of the central unit sphere. The
ratio between the volume of the shell from 2-3 units and the hypersphere
of radius 2 is 24.6. And so it goes. It is unlikely that a random move
of finite size will land within the smaller hypersphere when one is close
to the target. And the moves are non-trivial in this experiment, having
a standard deviation of about 0.245 distance units, on average orthogonal
to the direction toward the target. If the origin of the mutational move
is at 1 distance unit from the target, the expected distance of the end-point
of the move is about 1.03 distance units, an outward move.
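
The figures in this paragraph can be checked with a few lines of Python. The sketch assumes only that volume in an n-dimensional space scales as radius to the power n, and that a move orthogonal to the radius adds in quadrature to the starting distance. The second function gives a root-mean-square distance, used here as a stand-in for the "expected distance"; the function names are illustrative.

```python
import math

def shell_to_ball_ratio(inner, outer, dims):
    """Volume of the shell inner < r < outer divided by the ball of radius inner."""
    return (outer / inner) ** dims - 1.0

def rms_endpoint_distance(start, move_sd):
    """RMS distance from the target after an orthogonal move of RMS size move_sd."""
    return math.sqrt(start ** 2 + move_sd ** 2)

print(shell_to_ball_ratio(1, 2, 8))                  # 255.0
print(round(shell_to_ball_ratio(2, 3, 8), 1))        # 24.6
print(round(rms_endpoint_distance(1.0, 0.245), 2))   # 1.03
```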

And that brings up the second reason why the result should not have been
surprising: the toward-target movement of the population depends on the
differential survival of fitter and less fit genomes. If there is little
variety in the genome fitness, the movement will be slow. If it would
average less than the size of the outward move expected from a random
mutation, then the movement overall of the population would be outward.
The result is that the population becomes scattered around a shell surrounding
the target.

If this kind of result holds more generally than in the restricted
circumstances of the simulation, it suggests that populations of a species
will tend to have a variety of genome types, none truly optimum for the
current environment, but no matter how the environment changes (slowly),
some of the types would become closer to optimum, and the population variety
would track the movement of the environment, as a multidimensional shell
around the optimum. That's a speculation, but the reasons why the simulation
showed a shell seem to have validity beyond the conditions of the simulation.

E-coli types of approach to an optimum should be subject to similar effects
of high dimensionality. But they presumably would be faster.

Martin

[Martin Taylor 940926 13:20]

Bill Powers (940924.1550 MDT)

Bill,

Your descriptions of the fitness space and so forth seem correct in most
respects. This is to be expected, since I tried to make it correspond to
what I thought you intended in your own demo.

In your model you define a "fitness space" in which there is a distance
D from a target. The target is defined, apparently, as the position
0,0,...0 in this space. I infer that you compute D as

(Actually, the target is at x1,....xn, but that's an irrelevant difference,
as you note later)

sqrt(sum(g[n]^2)), where g[n] is the value of the n-th gene, with perhaps
a scaling factor.

(I'd say "location in fitness space of the gene that affects the n-th
variable relating to fitness" rather than "value of the n-th gene", but
your wording is shorter, if it is intended to have the same meaning.)

The implication is that to approach maximum fitness (D = 0), all the
gene values must approach zero. Is this correct?

All the gene locations must approach that of the target, yes.

If that is right, we can generalize by defining maximum fitness as a
condition in which the value of each gene approaches a value g*[n]. This
would define a position D* in fitness space as some position other than
at the origin. Then a fitness error could be defined as

e = sqrt(sum((g*[n] - g[n])^2))

Specifying all the g*[n] values is then equivalent to specifying the
target position in your fitness space, and we don't need the dummy
variable D. If all the g*[n] are zero, we have the original case.

Yes, that's actually what I have.
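
For concreteness, the fitness error e defined above can be written as a short Python function. This is only an illustration of the formula; the names fitness_error, genome and target are assumptions, with target holding the g*[n] values.

```python
import math

def fitness_error(genome, target):
    """e = sqrt(sum((g*[n] - g[n])^2)), with target holding the g*[n] values."""
    return math.sqrt(sum((t - g) ** 2 for g, t in zip(genome, target)))

# With every g*[n] equal to zero this reduces to the original distance D.
print(fitness_error([3.0, 4.0], [0.0, 0.0]))  # 5.0
```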

You can see where I am going.

I assume you are going to have the target drift around in the space. Me, too.

-------------

Second problem: survival

I assume that we're talking here about an organism that reproduces by
mitosis.

I make no assumption about how an organism reproduces.

This, of course, implies a doubling of the population for each
generation if there is 100% survival.

No, survival is of the genome, not of the organism. It matters not a whit
whether the survival into the next generation is by having a long-lived
parent or by having a parent that dies.

In your model, however, I don't see how you handle the production of
offspring, or how you assure that the steady-state average survival rate
is 50%. If each ancestor produces one offspring, the ancestor must
immediately die if the population is to be constant. It seems to me that
you make the survival rate an arbitrary function of the gene values,
unaffected by the population, and then simply make up the difference
between the number of survivors and an arbitrary total population by
adding copies of individual organisms selected at random. I don't think
that creates the same effect. You're losing a functional relationship.

You are correct on both counts: you describe correctly what I do, and you
note that it loses a functional relationship. Your variant proposal might
well be better. When I considered ways of dealing with the problem, I
thought about providing "real" resources and determining survival based
on the continued ability of an organism to acquire those resources. But I
was daunted by the computation required in a relatively slow computer (and
HyperCard isn't quick at the best of times). And then by doing this, we
get into the problem of how those resources reproduce, and predator-prey
relationships, and then we get into all those problems of chaos that plague
real ecologies. The questions that we started with would have a strong
tendency to get lost in all the other issues.

Remember, we started all this because of your objection to the metaphor of
nature as "designer of efficient organisms" through competitive evolution.
None of the subsequent discussion or simulation, real or proposed, has
really borne on this objection. The e-coli approach would produce the
same result as undirected natural selection, so far as this issue is concerned.
Nevertheless, the time spent on evolution has by no means been time wasted,
so far as I am concerned. Quite a bit of what I have learned in my simulation
can be applied directly to questions relating to reorganization within an
individual, particularly as it relates to the dimensionality of the random
effects. And much of the rest is interesting in its own right.

Without any mutations, we would find the population coming to some final
state with each genome being represented as a percentage of the total
determined mainly by reproduction rate. The genome determines the basic
reproduction rate and the environment determines the survival rate. So
as a check on the model we can start with a random distribution of a few
genomes and see if the final population comes to about the expected
distribution of genomes.

I can do that with what I have now, but I would be vastly surprised if the
end-point distribution is other than having all the population share the
same genome, which might not be the "fittest" of the original set, but
should be pretty close to it in fitness.

Finally, we can introduce the E. coli type of system in which the mean
interval between mutations is varied about the mean rate in a way
related to the rate of change of intrinsic genetic error (or perhaps e *
de/dt). The result should be a further shift of the population. We can
then start reducing the basic mutation rate, to see if there is a point
where with the control system the organisms evolve quickly to a final
state and without it they evolve more slowly or not at all.

OK, now I see your e-coli algorithm more clearly. I can put that in without
too much trouble. The only thing I am not convinced about is whether I
can easily implement your algorithm for population level.
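
To make the e-coli idea concrete, here is a minimal run-and-tumble sketch in Python. It is not Bill's algorithm, which is not spelled out in this message. It assumes one simple reading of it: a single point keeps changing in the same direction while its error is falling, and picks a fresh random direction (a "mutation") as soon as the error stops falling. The function names, the step size and the iteration count are all illustrative assumptions.

```python
import math
import random

def random_direction(dims, step, rng):
    """A vector of length step pointing in a uniformly random direction."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dims)]
    norm = math.sqrt(sum(x * x for x in v))
    return [step * x / norm for x in v]

def error(pos, target):
    """Euclidean distance from pos to target."""
    return math.sqrt(sum((t - x) ** 2 for x, t in zip(pos, target)))

def run_and_tumble(start, target, step=0.1, iterations=2000, rng=random):
    """Keep the current direction while error falls; tumble when it does not."""
    pos = list(start)
    direction = random_direction(len(pos), step, rng)
    err = error(pos, target)
    for _ in range(iterations):
        pos = [x + d for x, d in zip(pos, direction)]
        new_err = error(pos, target)
        if new_err >= err:
            # Error is no longer falling: choose a new random direction.
            direction = random_direction(len(pos), step, rng)
        err = new_err
    return pos, err
```

In this reading the interval between tumbles is long when the error is falling and short when it is not, which is the only sense in which the sketch ties mutation timing to the rate of change of error.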

Martin