"Science" and reorganization

[Martin Taylor 960123 10:45]

And now for Something Completely Different.

Science (the journal) has recently published two articles that I think are
relevant to reorganization. One seems to illustrate in rats part of the
action component of a separate reoganization control system for which
control error is an "intrinsic variable", and the other seems to limit the
scope of simultaneous parallel reorganization.

(1) Tissue Plasminogen Activator Induction in Purkinje Neurons after
Cerebellar Motor Learning. N.W.Seeds, B.L.Williams and P.C. Bickford,
Science, 270 (22 Dec 1995) 1992-1994.

Rats were trained to run across a kind of bridge that consisted of rods
arranged to stick out alternately from the left and the right side rail,
like this:

   --- ---
  > >------------------------| |
  > > > > > > > > > >
  > > > >
  > >> > > > > > > >
  > >------------------------| |
   --- ---

Two weeks after they had learned to do this task, they were confronted with
a more difficult bridge, in which the pegs were irregularly placed, not
necessarily in alternation, on the two sides. The results refer to the
second training.

Apparently tPA mRNA (Tissue Plasminogen Activator messenger RNA--I'll just
call it tPA here) is associated with the development of synapses. Rats
rained on this runway-bridge showed elevated levels of tPA specifically in
the Purkinje cells in the cerebellum, as compared to rats not so trained.
(It is known that rats trained to do acrobatic tasks show "an increase in
the number of synapses per Purkinje cell in the molecular layer of the
cerebllar cortex"). This elevated level of tPA was evident at 1 and 4 hours
after the first training session, and persisted to a small degree for 24
hours. After the second training session (24 hours after the first) the
level was again elevated, but less restricted in its location. Simple
running on the same bridge covered by a plexiglass floor did not cause
any elevation in tPA. And there was no increase in tPA in the hippocampus,
which is thought to be associated with specific memory.

One possible interpretation of this is that the error associated with the
initial difficulty of traversing the bridge led to a reorganization episode.
The action output of a "reorganizing control system" caused the elevation of
tPA levels, which led to the development of new synapses in the cerebellum.
Whether these synapses are concentrated in any part of the motor control
system or are generally distributed across the levels, and whether they
are all excitatory or mixed excitatory and inhibitory, is not indicated
in the article. What does seem to be the case is that the tPA is concentrated
within a particular layer of the cerebellum but not in any particular
spatial location in the 2-D space of this layer. This suggest, but does
not demonstrate, that the reorganization was occuring primarily in a specific
level of the control hierarchy.

(2) Criticality and parallelism in Combinatorial Optimization. W.G.Macready,
A.G.Siapas and S.A.Kauffman, Science, 271 (5 Jan 1996), 56-59

This paper deals with the benefits or otherwise of using parallel processing
when optimizing a large population of interacting entities. They use
a couple of examples, of which "spin" as in "spin glass" is the more
developed.

The notion of "spin" is that a spin can be in one of two states,
call it "up" and "down". If a spin is opposite to the spin of its neighbours,
the energy of the whole system is greater than if the spins of all the
neighbours is the same. In a single level of a PCT hierarchy, one can
make the analogy that if two ECUs have non-orthogonal perceptual functions
OR non-orthogonal output functions, then each disturbs the other by its
actions and the total error in the level is greater than if there were
no such disturbance. This mutual disturbance is less if the actions
of the two have components in the same direction as a consequence of
a given environmental effect than if they have components in the opposite
direction. Two non-orthogonal ECUs can thus, to a first approximation,
be considered as having "spin directions" relative to each other.

An optimum hierarchy is one that minimizes mutual disturbance. The mutual
disturbance will be zero if all ECUs have orthogonal PIFs AND if their
actions generate no side effects (which assures that their output functions
are also orthogonal). But if there are more ECUs at a level than there
are output degrees of freedom, orthogonality is impossible, and if there
are any side effects, it doesn't matter that the PIFs are orthogonal, there
will still be mutual disturbance across the ECUs in a layer.

The "energy" of a spin system can be equated to the sum of squared error
in the layer of the PCT hierarchy, which will persistently be higher in
a layer with much mutual interaction of the "wrong" sign than in a layer
with optimized signs.

Now to the article.

In the "spin" example, a serial process consists of changing the spin of
one particle chosen at random, and seeing whether the system energy goes up
or down. Choose the lower energy configuration and repeat (this choice
can be probabilistic rather than deterministic, to prevent the system from
getting stuck in a poor local minimum). An N-parallel process consists of
changing N randomly selected spins at the same time and seeing whether the
system energy goes up or down.

Clearly, if it is guaranteed that none of the changed spins are "neighbours"
of each other, one step of an N-parallel process is equivalent to N steps
of a serial process, with the added benefit of smoothing the landscape so
that at least some local minima do not trap the optimization as readily
as they would in a serial process. But if two spins of the N are neighbours,
changing both could be the same as changing neither. As the number N
increases, the probability of two spins being neighbours also increases.
The same probability also increases if the number of "neighbours" increases.
(In the PCT sense, the number of neighbours is the number of ECUs whose
perceptions are influenced by the actions of any one ECU).

What the authors of the paper found is that with increasing N, the quality of
the solution improves slowly, and then very abruptly drops to a value about
what would be found by random juggling of the spins. The value of N at
which this step happens is a function of K, the number of neighbours whose
spins affect the total energy when any one spin flips. The greater the
number of neighbours, the smaller the critical value of N. They find
Nc = 5(sqrt(1+4/K) - 1)/2, where Nc is the proportion of all spins that
flip in any one "move", and K is the number of spins in a "neighbourhood".
There is a theoretical jsutification for the form of this curve, and it
fits the data very well.

The second example is the Travelling Salesman Problem (TSP). An efficient
algorithm for TSP is to take K randomly chosen cities and to permute the
order of these cities in the tour. For example, if in one proposed tour,
the salesman would visit cities 1, 2, 3 in that order, a permutation might
reorder them as 1, 3, 2. Whichever gave the shortest result would be
included in the next evaluation, which might permute the order of cities
6,10, and 15. In the parallel version, N of these groups of size K would
be permuted at once. Without going into detail, exactly the same result is
found--slightly improving solutions up to a critical value of N, and then
a step drop to a performance equivalent to random scrambling of the tour.

It seems to me that at least one aspect of reorganization has much in
common with these problems. That is reorganization by inverting the sign
of action of an ECU. If it is true that the situations in the article are
analogous to this kind of reorganization, then it argues that reorganization
must be local to be effective. Reorganization at random places within a
hierarchic layer will be fine up to a point, but if there is too much
parallelism of reorganization, then the whole ability to control will be
abruptly lost. On the other hand, reorganization that is too slow and
deliberate may eventually produce a result, but it will be neither as
good a result nor achieved as quickly as one in which a small degree
of parallelism occurs. The touchy question is how orthogonal are the
ECUs within a layer, because the more orthogonal they are, the more
the benefit of parallel reorganization.

None of which deals with the issue of topologically continuous reorganization,
but I would not be too surprised to find a similar phenomenon there.

Martin