[Martin Taylor 960123 10:45]

And now for Something Completely Different.

Science (the journal) has recently published two articles that I think are relevant to reorganization. One seems to illustrate in rats part of the action component of a separate reorganization control system for which control error is an "intrinsic variable", and the other seems to limit the scope of simultaneous parallel reorganization.

(1) Tissue Plasminogen Activator Induction in Purkinje Neurons after Cerebellar Motor Learning. N. W. Seeds, B. L. Williams and P. C. Bickford, Science, 270 (22 Dec 1995), 1992-1994.

Rats were trained to run across a kind of bridge that consisted of rods arranged to stick out alternately from the left and the right side rail, like this:

  left rail   -------------------------------
               |    |    |    |    |          <- pegs from the left rail
                 |    |    |    |    |        <- pegs from the right rail
  right rail  -------------------------------

Two weeks after they had learned to do this task, they were confronted with a more difficult bridge, in which the pegs were irregularly placed, not necessarily in alternation, on the two sides. The results refer to the second training.

Apparently tPA mRNA (Tissue Plasminogen Activator messenger RNA--I'll just call it tPA here) is associated with the development of synapses. Rats trained on this runway-bridge showed elevated levels of tPA specifically in the Purkinje cells in the cerebellum, as compared to rats not so trained. (It is known that rats trained to do acrobatic tasks show "an increase in the number of synapses per Purkinje cell in the molecular layer of the cerebellar cortex".) This elevated level of tPA was evident at 1 and 4 hours after the first training session, and persisted to a small degree for 24 hours. After the second training session (24 hours after the first) the level was again elevated, but less restricted in its location. Simple running on the same bridge covered by a plexiglass floor did not cause any elevation in tPA. And there was no increase in tPA in the hippocampus, which is thought to be associated with specific memory.

One possible interpretation of this is that the error associated with the initial difficulty of traversing the bridge led to a reorganization episode. The action output of a "reorganizing control system" caused the elevation of tPA levels, which led to the development of new synapses in the cerebellum. Whether these synapses are concentrated in any part of the motor control system or are generally distributed across the levels, and whether they are all excitatory or mixed excitatory and inhibitory, is not indicated in the article. What does seem to be the case is that the tPA is concentrated within a particular layer of the cerebellum but not in any particular spatial location in the 2-D space of this layer. This suggests, but does not demonstrate, that the reorganization was occurring primarily in a specific level of the control hierarchy.

(2) Criticality and Parallelism in Combinatorial Optimization. W. G. Macready, A. G. Siapas and S. A. Kauffman, Science, 271 (5 Jan 1996), 56-59.

This paper deals with the benefits or otherwise of using parallel processing when optimizing a large population of interacting entities. They use a couple of examples, of which "spin" as in "spin glass" is the more developed.

The notion of "spin" is that a spin can be in one of two states, call them "up" and "down". If a spin is opposite to the spins of its neighbours, the energy of the whole system is greater than if the spins of all the neighbours are the same. In a single level of a PCT hierarchy, one can make the analogy that if two ECUs have non-orthogonal perceptual functions OR non-orthogonal output functions, then each disturbs the other by its actions and the total error in the level is greater than if there were no such disturbance. This mutual disturbance is less if the actions of the two have components in the same direction as a consequence of a given environmental effect than if they have components in the opposite direction. Two non-orthogonal ECUs can thus, to a first approximation, be considered as having "spin directions" relative to each other.
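The "spin direction" analogy can be sketched numerically. In this illustrative fragment (the vectors and the helper names `dot` and `coupling` are my own, not from the paper or from any PCT source), the mutual disturbance between two ECUs is read off the dot product of their output weight vectors: zero for orthogonal ECUs, positive when their actions push the shared environment the same way, negative when they oppose each other.

```python
def dot(u, v):
    """Inner product of two equal-length weight vectors."""
    return sum(a * b for a, b in zip(u, v))

def coupling(ecu_a, ecu_b):
    """Sign of the interaction between two ECUs' output weight vectors:
    0 = orthogonal (no mutual disturbance), +1 = components in the same
    direction, -1 = components in opposite directions."""
    d = dot(ecu_a, ecu_b)
    if d > 0:
        return 1
    if d < 0:
        return -1
    return 0

# Three ECUs acting on a 3-degree-of-freedom environment (illustrative numbers):
a = [1.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0]   # orthogonal to a
c = [0.7, -0.7, 0.0]  # overlaps both a and b

print(coupling(a, b))  # -> 0  (orthogonal: no mutual disturbance)
print(coupling(a, c))  # -> 1  ("same spin" relative to a)
print(coupling(b, c))  # -> -1 ("opposite spin" relative to b)
```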

An optimum hierarchy is one that minimizes mutual disturbance. The mutual disturbance will be zero if all ECUs have orthogonal PIFs AND if their actions generate no side effects (which assures that their output functions are also orthogonal). But if there are more ECUs at a level than there are output degrees of freedom, orthogonality is impossible, and if there are any side effects, it doesn't matter that the PIFs are orthogonal, there will still be mutual disturbance across the ECUs in a layer.

The "energy" of a spin system can be equated to the sum of squared error in the layer of the PCT hierarchy, which will persistently be higher in a layer with much mutual interaction of the "wrong" sign than in a layer with optimized signs.

Now to the article.

In the "spin" example, a serial process consists of changing the spin of one particle chosen at random, and seeing whether the system energy goes up or down. Choose the lower energy configuration and repeat (this choice can be probabilistic rather than deterministic, to prevent the system from getting stuck in a poor local minimum). An N-parallel process consists of changing N randomly selected spins at the same time and seeing whether the system energy goes up or down.
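The serial and N-parallel moves can be sketched on a toy nearest-neighbour ring of spins. This is a much simpler model than the spin glasses in the paper; the greedy acceptance rule, ring size, move counts, and seed are all my own illustrative choices, not the authors' procedure.

```python
import random

def energy(spins):
    """Ring of +/-1 spins; energy is lower when neighbours agree."""
    n = len(spins)
    return -sum(spins[i] * spins[(i + 1) % n] for i in range(n))

def step(spins, n_flips):
    """Flip n_flips randomly chosen spins at once; keep the change
    only if the total energy does not rise (greedy acceptance)."""
    trial = spins[:]
    for i in random.sample(range(len(spins)), n_flips):
        trial[i] = -trial[i]
    return trial if energy(trial) <= energy(spins) else spins

random.seed(0)
spins = [random.choice([-1, 1]) for _ in range(60)]

serial = spins[:]
for _ in range(2000):        # serial process: one spin per move
    serial = step(serial, 1)

parallel = spins[:]
for _ in range(2000):        # heavily parallel: 30 spins per move
    parallel = step(parallel, 30)

print(energy(spins), energy(serial), energy(parallel))
```

With the flip count approaching half the ring, most proposed parallel moves either get rejected or partially undo each other, which is the flavour of the breakdown the paper quantifies.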

Clearly, if it is guaranteed that none of the changed spins are "neighbours" of each other, one step of an N-parallel process is equivalent to N steps of a serial process, with the added benefit of smoothing the landscape so that at least some local minima do not trap the optimization as readily as they would in a serial process. But if two spins of the N are neighbours, changing both could be the same as changing neither. As the number N increases, the probability of two spins being neighbours also increases. The same probability also increases if the number of "neighbours" increases. (In the PCT sense, the number of neighbours is the number of ECUs whose perceptions are influenced by the actions of any one ECU.)

What the authors of the paper found is that with increasing N, the quality of the solution improves slowly, and then very abruptly drops to a value about what would be found by random juggling of the spins. The value of N at which this step happens is a function of K, the number of neighbours whose spins affect the total energy when any one spin flips. The greater the number of neighbours, the smaller the critical value of N. They find Nc = 5(sqrt(1+4/K) - 1)/2, where Nc is the proportion of all spins that flip in any one "move", and K is the number of spins in a "neighbourhood". There is a theoretical justification for the form of this curve, and it fits the data very well.
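Taking the quoted formula at face value, a few lines of arithmetic show how the critical value falls as the neighbourhood grows (the function name and the sample values of K are mine, purely for illustration):

```python
from math import sqrt

def n_c(K):
    """Critical fraction of spins flipped per move, as quoted in the
    text: Nc = 5*(sqrt(1 + 4/K) - 1)/2.  Decreases as K grows."""
    return 5 * (sqrt(1 + 4 / K) - 1) / 2

for K in (2, 4, 8, 16):
    print(K, round(n_c(K), 3))  # Nc shrinks as the neighbourhood grows
```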

The second example is the Travelling Salesman Problem (TSP). An efficient algorithm for TSP is to take K randomly chosen cities and to permute the order of these cities in the tour. For example, if in one proposed tour the salesman would visit cities 1, 2, 3 in that order, a permutation might reorder them as 1, 3, 2. Whichever gave the shortest result would be included in the next evaluation, which might permute the order of cities 6, 10, and 15. In the parallel version, N of these groups of size K would be permuted at once. Without going into detail, exactly the same result is found--slightly improving solutions up to a critical value of N, and then a step drop to a performance equivalent to random scrambling of the tour.
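The serial K-city permutation move can be sketched as follows. The distance matrix, seed, and function names are mine, purely illustrative; one group of K cities is permuted per move, and the shortest arrangement (including the original) is kept.

```python
import random
from itertools import permutations

def tour_length(tour, dist):
    """Total length of a closed tour under distance matrix dist."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def k_permute_move(tour, dist, k=3):
    """Pick k tour positions at random, try every permutation of the
    cities in those positions, and keep the shortest resulting tour."""
    positions = sorted(random.sample(range(len(tour)), k))
    cities = [tour[p] for p in positions]
    best, best_len = tour, tour_length(tour, dist)
    for perm in permutations(cities):
        trial = tour[:]
        for p, c in zip(positions, perm):
            trial[p] = c
        length = tour_length(trial, dist)
        if length < best_len:
            best, best_len = trial, length
    return best

# Toy symmetric distance matrix for 6 hypothetical cities:
random.seed(1)
n = 6
dist = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        dist[i][j] = dist[j][i] = random.randint(1, 20)

tour = list(range(n))
for _ in range(50):
    tour = k_permute_move(tour, dist, k=3)
print(tour_length(tour, dist))
```

The N-parallel variant the paper studies would apply N such group permutations simultaneously before evaluating, which is where the interference between overlapping groups comes in.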

It seems to me that at least one aspect of reorganization has much in common with these problems: reorganization by inverting the sign of action of an ECU. If it is true that the situations in the article are analogous to this kind of reorganization, then it argues that reorganization must be local to be effective. Reorganization at random places within a hierarchic layer will be fine up to a point, but if there is too much parallelism of reorganization, then the whole ability to control will be abruptly lost. On the other hand, reorganization that is too slow and deliberate may eventually produce a result, but it will be neither as good a result nor achieved as quickly as one in which a small degree of parallelism occurs. The touchy question is how orthogonal the ECUs within a layer are, because the more orthogonal they are, the greater the benefit of parallel reorganization.

None of which deals with the issue of topologically continuous reorganization, but I would not be too surprised to find a similar phenomenon there.

Martin