[Martin Taylor 2018.05.23.09.24]
[Eetu Pikkarainen 2018-05-23_12:02:57 UTC]
Sorry to ask this simplistic question, but I have no references available at the moment. Of course I am not so interested in the bacteria themselves but in the hypothesized algorithm of reorganization.
If an e-coli turns towards a gradient (to smaller error), does that increase the probability that the next turn will be in the same direction? And the other way round: if it turns away from the gradient (and error grows), does that increase the probability that it will turn in some other direction? And further, does a bigger error make turning more probable and a smaller error less probable?
(is this all?)
Eetu
I can't say what is actually programmed in existing code for
reorganization, but here’s how I think it should work.
E-coli is a method of optimizing a set of parameters to maximize any
“fitness” criterion in a situation where the fitness changes
smoothly (or nearly so) with small changes in the parameter values.
It’s one of a large number of optimization algorithms that have
different requirements and are best for different types of fitness
landscapes.
The special problem addressed by the e-coli process (I won't call it
an algorithm because there may be different implementations) is that
the optimizer (the analogue of the bacterium) has no perception of
the direction of a gradient. What it “knows” is how to keep going in
some direction, and how to tumble so that it is going in some other
random direction. In some algorithms it may be able to change speed,
in others it may keep track of where it has been. But all of them
need the concept of a “direction” in a parameter space. So let’s see
what that might be.
A "parameter space" is defined by the set of parameters {m1, m2, m3,
…mn} that you are interested in. Each parameter is a “dimension”
of the space. In PCT, reorganization (no matter which version of PCT
you are working with) involves changing the connection weights in a
hierarchy or network of relationships. In HPCT those are often
thought of as being weights connecting the outputs at one level to
the reference inputs of another level, but they could equally well be
on the perceptual side, or on both at the same time. All that
matters for the e-coli process is that there is this set of numbers,
called a vector, that describes the actual values of the parameters
where the optimizer is at any moment.
When the optimizer is at {m1, m2, m3, ..., mn} the fitness measure
has a certain value that we can call “f”.
Now the optimizer moves, and is at {m1+∆m1, m2+∆m2, ..., mn+∆mn},
where the fitness is f+∆f. The set of numbers {∆m1, ∆m2, ∆m3, …,
∆mn} define a direction in the space, and ∆f represents the change
in fitness caused by this move. The distance moved in the space is
sqrt(∆m1² + ∆m2² + … + ∆mn²), or ∆m. The
optimizer is supposed to be able to perceive some property of ∆f,
perhaps its magnitude, perhaps only its sign, but it must at least
be able to tell whether ∆f is positive or negative.
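In code (Python here, with a made-up quadratic stand-in for the fitness function, since nothing above specifies one), a single move and the quantities it yields might look like this:

```python
import math

def fitness(m):
    # Hypothetical stand-in: any function that changes smoothly
    # (or nearly so) with small changes in m will do.
    return -sum(x * x for x in m)

m = [1.0, 2.0, -0.5]        # current parameter vector {m1, ..., mn}
dm = [0.1, -0.05, 0.2]      # a move: a direction (and distance) in the space

f_old = fitness(m)
m_new = [mi + di for mi, di in zip(m, dm)]
df = fitness(m_new) - f_old                 # ∆f, the change in fitness
step = math.sqrt(sum(d * d for d in dm))    # ∆m, the distance moved

improving = df > 0    # the one thing the optimizer must be able to tell
```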
Here again, different e-coli algorithms may differ, but in all of
them, the optimizer is more likely to tumble if ∆f is negative than
if it is positive. Some may make the probability of a tumble be a
monotonically decreasing function of ∆f, some may make the
probability depend on ∆f/∆m, and so forth. If there is no tumble the
optimizer keeps going in the same direction, perhaps at the same
speed by adding {∆m1, ∆m2, ∆m3, …, ∆mn}, perhaps at a changing
speed that could depend on the acceleration of ∆f, ∆²f/∆m².
If there is a tumble, a new set (vector) of ∆m’s is chosen, each
independently of the others, except that in some algorithms the new
∆m is set so that the new speed of movement is the same as the old
speed, in others it is set so that the new speed is some standard
speed, and in yet others (which I prefer) the new speed is whatever
happens when the individual ∆m values are chosen from the same
Gaussian (normal) distribution (assuming that the scales of the
"m"s have been normalized).
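To make that family concrete, here is a minimal sketch of one member of it, in the variant I described last (tumble whenever ∆f is not positive, plus a small floor probability so tumbling never stops entirely, with each new ∆m drawn from the same Gaussian). All the numeric choices and the test fitness surface are illustrative, not canonical:

```python
import random

def ecoli_optimize(fitness, m, steps=20000, sigma=0.05, p_floor=0.01, seed=1):
    """Sketch of one e-coli variant: keep moving in the current direction
    while fitness improves; when a move makes fitness worse (or with a
    small floor probability, so the tumble rate never reaches zero),
    tumble, drawing each component of the new step independently from
    the same Gaussian. Assumes the parameter scales are normalized."""
    rng = random.Random(seed)
    n = len(m)
    dm = [rng.gauss(0.0, sigma) for _ in range(n)]  # current direction/step
    f_old = fitness(m)
    for _ in range(steps):
        m = [mi + di for mi, di in zip(m, dm)]      # keep going
        f_new = fitness(m)
        if f_new <= f_old or rng.random() < p_floor:
            dm = [rng.gauss(0.0, sigma) for _ in range(n)]   # tumble
        f_old = f_new
    return m

# Try it on a smooth 3-parameter fitness surface whose single peak is at
# (1, 2, 3); the optimizer ends up hovering near that peak.
peak = ecoli_optimize(
    lambda m: -((m[0] - 1)**2 + (m[1] - 2)**2 + (m[2] - 3)**2),
    [0.0, 0.0, 0.0])
```

Note that the optimizer never looks at the gradient itself, only at whether its last move made things better or worse.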
One of the problems of e-coli optimization is that it can get caught
in local optima if the tumble rate goes too low. That is why the
rate of reorganization (i.e. the probability per second of a tumble)
should never go to zero, no matter how good the fitness. I ran into
that problem when fitting my 5-parameter models
(http://www.mmtaylor.net/PCT/CSG2005/CSG2005bFittingData.ppt).
A key problem for e-coli is one that Powers mentioned in Bruce
Abbott's quote [Bruce Abbott (2018.05.23.1015 EDT)]: "... becomes
very large, even the E. coli method is defeated. There seems to be
some number that is the maximum practical limit for this method."
This is true. I got a mathematician in our
laboratory to work this out and gave Bill the results. The reason
for the problem is that in a high-dimensional space any random
direction is highly likely to be almost exactly orthogonal to any
specific direction, and therefore very nearly tangent
to a hypersurface of constant fitness.
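This orthogonality effect is easy to check numerically. The expected |cos θ| between a random direction and any fixed direction falls off roughly as 1/sqrt(n); the little simulation below (my own illustration, not from the mathematician's analysis) just estimates it:

```python
import math
import random

def mean_abs_cos(n, trials=2000, seed=0):
    """Average |cos| of the angle between a random n-dimensional
    direction and a fixed 'uphill' direction. Values near 0 mean the
    random direction is nearly tangent to a surface of constant
    fitness."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        # By symmetry we can take the fixed direction to be (1, 0, ..., 0).
        total += abs(v[0]) / norm
    return total / trials

for n in (2, 5, 50, 500):
    print(n, round(mean_abs_cos(n), 3))
```

In 2 dimensions a random direction has a large component along the gradient on average; by 500 dimensions almost none of it does, which is why tumbling becomes so unproductive.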
The optimizer wants to find a direction close to a direction that
points to the place where f is maximum, and in a high-dimensional
space almost all of the random directions are no help at all. Many
tumbles are needed before one is found that leads appreciably
toward improved fitness. If the sign of the fitness change could be
determined with great accuracy after very small moves, this
problem would not be a big deal for moderate numbers of
parameters, but when the rate of fitness change is low and the
accuracy of measuring it is not great, it takes time to decide
whether to tumble, and the e-coli process becomes impracticably
slow.
Bill's answer in LCS III and in CSGnet and private communication
has been modularization, the question of how best to form the
modules being left for further research. It seems clear that some
kind of hierarchy of modules will be needed, so that e-coli
optimization is conducted within modules and then within
super-modules that optimize global properties within the modules,
and hypermodules beyond them, to as many levels as works best. In
a quite different context, Stuart Kauffman (“At Home in the
Universe”) found that modules of 6 dimensions were optimum, so
maybe there’s a hint there about this problem.
Anyway, there isn't one answer to your question about an algorithm
for the e-coli process in reorganization. There is whatever
algorithm people like Bill or Rick or Bruce Abbott have
programmed, and then there are all the other possible e-coli
algorithms. I have tried to indicate the general space of e-coli
optimization within which these algorithms live, so as to suggest
what e-coli is about, rather than simply giving you a cookbook
solution.
Martin