multidimensional e-coli results

[Martin Taylor 970209 13:40]

Some time ago, we were talking about the effectiveness of the e-coli
optimization technique in high dimensional spaces. The issue at hand is
that the probability of a step in any random direction bringing the
"cursor" (the e-coli) closer to the "target" (the food source) diminishes
as the dimensionality increases. If the step is less than twice the
distance to the target, then in 1-D the probability is always 0.5 regardless
of step size. If the step is infinitesimal, the probability is always 0.5
regardless of dimensionality. If the step is finite, the probability
approaches zero as the dimensionality increases.

That much, we knew. I asked a mathematician colleague to do better, and
find out how the step size and dimensionality affect the probability that
the step would bring the "cursor" closer to the "target". He produced an
expression, unfortunately not in closed form, but it was not easy to
evaluate numerically. So over the last few weeks he has been running
Monte-Carlo simulations. The following results are from those simulations,
with 200,000 trials per point, for phase 1 of the study. 221 combinations
of dimensionality and distance have been tested.

In phase 1, the step is of any magnitude less than some maximum M, with
uniform probability over the space of length and direction. What that
means in the 2-D case is that, if the start point is at the centre of a
disk with radius M, then the step will land anywhere in the disk with
equal probability. This does NOT mean that the steps are uniformly
distributed in length and then in direction. In high-dimension spaces,
almost all steps are of length very close to M.

In phase 2, to come shortly, all steps will be of the same length, M.

Sample results:

For dimensionality D, what is the closest to the target for which a
step of at most 1 unit will give a probability greater than 0.05 (or
0.20 or 0.30) of getting closer to the target?

These numbers are read off an ASCII plot and the decimal is only approximate,
especially for the higher values.

Dim Minimum distance
     for p>0.05 p>.20 p>.30
4 <1 <1 1.05
8 <1 1.2 1.7
12 1.05 1.6 2.3
16 1.2 1.9 2.7
20 1.35 2.1 3.0
40 2.0 3.0 4.4
60 2.5 3.6 5.6
128 3.8 5.5 8.0
256 4.8 7.7 11.5
512 7.2 10.5 15.0
1024 10.0 14.5 23
2048 14.0 22 >24 (maximum value tested)
4096 20.5 >24 >24
8096 >24 >24 >24

These values will, of course, be reduced when the dealing with only the
single (maximum) step size, since many of the steps are smaller than the
maximum (but in high dimensionality spaces, very few of the steps are
much less than the maximum).

What this means is that the e-coli technique will work, even in high
dimensional spaces, provided that the current position is not too close
to the optimum. For an example, in a space of 2048 dimensions, there
is a probability of at least 0.2 that a step of 1 unit will result in
an improvement if the current position is further than 22 units from the
optimum (and the space is uniform in respect of penalty for distance
from the optimum). To put this in perspective (so's to speak): at a point
22 units from the optimum, the distance from the optimum projected onto
a randomly selected direction will be just under 0.5 units. Plotted in
a 2-D graph, the point would look as if it were very close to the optimum.
Getting that close should not be difficult, using e-coli in a uniform
space. Lumpy high-dimensional spaces are a different matter :slight_smile:

I expect these results to change a little in phase 2 for the lower dimensional
spaces, but not much for those of higher dimensions. But it's hard to be
sure until the computations are done.

Martin