
[Martin Taylor 970424 12:10]

[Hans Blom, 970424h]

>(Martin Taylor 970423 10:10)

>> ... I still suspect that there cannot ever be behavioural
>>experimental evidence to distinguish the two classes of structure.
>>Evidence probably has to come from physiology, not the control
>>properties. Maybe not, but so far it seems so.

>You could well be right: when optimally adjusted, the two "bare
>bones" PCT and MCT controllers show identical behavior. It is then
>only the program code (aka nerve connections) that distinguishes them.
>
>Yet it may be possible that a distinction can be discovered from
>non-ideal, e.g. "surprising," behavior. But then we have to remember
>that the comparison that we considered thus far -- between PCT and
>MCT -- does not exhaust the possibilities: there are more contenders,
>e.g. artificial neural nets and artificially evolved controllers.

I'm not clear what these "contenders" might be. An HPCT controller is
an artificial (or a real) neural net. It just isn't one of the configurations
usually studied. An artificially evolved controller? That's what
reorganization does, isn't it? The question wouldn't be whether the
reorganization involved genetic recombinations, but whether the evolved
structure would be something that is neither MCT nor PCT.

For the record, I think that there is a continuum of structural possibilities
with "pure" PCT at one end (no explicit models of anything anywhere), and
with "pure" MCT at the other (everything done by reference to one humungous
explicit model of the whole outer world so far experienced). I don't
believe either of these extremes, and if I judge rightly, neither do you
or Bill. Bill has repeatedly said that there may be models used at the
program level of a PCT structure (or, perhaps, "at the higher levels").
You seem to treat the MCT structure as involving special models for
different purposes, and to combine these into an organism presumably
involves some binding mechanism that would be hard to distinguish from
a hierarchy.

But none of that has to do with my comment, which I think you misinterpreted.

Long ago I posed a speculation that for every MCT controller there was
an HPCT controller with equivalent behaviour, and vice-versa. By "equivalent
behaviour" I meant that no matter what disturbances might be applied, the
same actions would be observed. So far, nobody has come up with either
a counter-example or a theoretical demonstration that the speculation is
wrong. A counter-example would be difficult to produce, because it would
always be subject to the criticism that there may exist an equivalent
structure of the other type, but the counter-example just failed to find it.

Martin

[Hans Blom, 970428c]

(Martin Taylor 970424 12:10)

>>But then we have to remember that the comparison that we considered
>>thus far -- between PCT and MCT -- does not exhaust the
>>possibilities: there are more contenders, e.g. artificial neural
>>nets and artificially evolved controllers.

>I'm not clear what these "contenders" might be. An HPCT controller
>is an artificial (or a real) neural net. It just isn't one of the
>configurations usually studied. An artificially evolved controller?
>That's what reorganization does, isn't it?

That's an exciting thought: HPCT controllers implement the holy grail
or the "grand unifying theory" that everyone searches for, with all
the desirable properties and none of the undesirable ones. Machine
intelligence indeed ;-). Besides control we would now have learning
properties like in neural nets (but where is the backpropagation or
whatever?), and invention of new building blocks like in genetic
algorithms (but what invents the new parts and how are they inserted
into the pre-existing scheme?).

Waiting for your results ;-).

Greetings,

Hans

[Martin Taylor 970428 12:15]

[Hans Blom, 970428c]

>(Martin Taylor 970424 12:10)

>>I'm not clear what these "contenders" might be. An HPCT controller
>>is an artificial (or a real) neural net. It just isn't one of the
>>configurations usually studied. An artificially evolved controller?
>>That's what reorganization does, isn't it?

>That's an exciting thought: HPCT controllers implement the holy grail
>or the "grand unifying theory" that everyone searches for, with all
>the desirable properties and none of the undesirable ones. Machine
>intelligence indeed ;-). Besides control we would now have learning
>properties like in neural nets (but where is the backpropagation or
>whatever?)

I have to go outside "standard" PCT here. Well, I don't really _have_ to,
but I will, because it's something I've wanted to try for a long time.

As for the "holy grail," I do think that HPCT represents the "grand
unifying theory" of biology (including psychology). There. I've declared
my faith, and claim my seat at the Round Table discussion. Now we can
all join in the search for the grail.

In a backprop MultiLayer Perceptron (MLP) there's an "error" variable that
is the discrepancy vector between the desired set of outputs and the
actual set of outputs. The weights are tuned according to their contributions
to the discrepancy. Each element of the error vector corresponds to one
of the outputs at the output layer. The corrections then propagate back
through the hidden layers to change _their_ weights. But if there are
N layers of M nodes each, there are N*(M^2) weights to be corrected, and
only a vector of dimension M to do the correcting. Nevertheless, the
MLP usually converges to a reasonably good configuration after a long time.
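
To make the counting concrete, here is a minimal numpy sketch -- purely
illustrative, with arbitrary sizes and a plain squared-error update; it
is not anyone's actual model. Note how the single M-dimensional output
discrepancy has to correct all N*(M^2) weights:

import numpy as np

N, M = 4, 8                                # layers and nodes per layer
rng = np.random.default_rng(0)
W = [rng.normal(0.0, 0.5, (M, M)) for _ in range(N)]  # ~N*(M^2) weights

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def train_step(x, target, lr=0.1):
    acts = [x]                             # forward pass, keeping activations
    for w in W:
        acts.append(sigmoid(w @ acts[-1]))
    # The only "teacher" is this M-dimensional output discrepancy.
    delta = (acts[-1] - target) * acts[-1] * (1.0 - acts[-1])
    # Backward pass: the same M numbers must correct all N*(M^2) weights.
    for i in reversed(range(N)):
        grad = np.outer(delta, acts[i])
        if i > 0:
            delta = (W[i].T @ delta) * acts[i] * (1.0 - acts[i])
        W[i] -= lr * grad

x, target = rng.random(M), rng.random(M)
for _ in range(5000):                      # "after a long time", indeed
    train_step(x, target)
out = x
for w in W:
    out = sigmoid(w @ out)
print("largest output discrepancy:", float(np.max(np.abs(out - target))))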

In an HPCT network, there is the same set of input interconnections, and
as many output interconnections (output of level N to reference of level
N-1). So there are 2*N*(M^2) weights to consider, which sounds worse.
But there are N*M error signals, which brings the dimension problem down
from N*M per "teacher variable" to 2*M per "teacher variable," where the
HPCT "teacher variable" is a function of the error in a particular Elementary
Control Unit (ECU).

Now see what is happening here. In the HPCT system, the teacher is the
system's adequacy in controlling in its actual environment, not some
external teacher who "knows best." The dimensionality of the search
space appears to be reduced from N*M to 2*M per learning variable, but
this is (initially) something of an illusion. Conflict will effectively
increase the search space, at least until the lower levels stabilize in
a more nearly orthogonal structure (as I discussed a couple of days ago,
and as Bill P has discussed many times over the years). When that happens,
the reduction in "learning space" becomes real, and learning can proceed
apace. This increase in learning speed doesn't happen in the MLP structure,
because the whole net has "responsibility" for the error in each element
of the output vector.
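
And here is a toy of the contrasting locality claim -- again entirely
illustrative, with e-coli-style random reorganization standing in for
whatever the real process might be; the class layout and all numbers are
assumptions of mine. Each ECU adjusts only its own ~2*M weights, steered
by its own local error rather than by a global teacher:

import numpy as np

M = 8
rng = np.random.default_rng(1)

class ECU:
    """One Elementary Control Unit: M input and M output weights."""
    def __init__(self):
        self.w_in = rng.normal(0.0, 0.5, M)    # perceptual input weights
        self.w_out = rng.normal(0.0, 0.5, M)   # output-to-reference weights
        self.step = rng.normal(0.0, 0.05, 2 * M)
        self.prev_err = np.inf

    def error(self, inputs, reference):
        return reference - self.w_in @ inputs  # local error, nothing global

    def reorganize(self, err):
        # E-coli style: keep drifting while the local error shrinks, tumble
        # to a new random direction when it grows. Only this ECU's 2*M
        # weights move (w_out is carried along just to make the count visible).
        if abs(err) >= abs(self.prev_err):
            self.step = rng.normal(0.0, 0.05, 2 * M)
        self.prev_err = err
        w = np.concatenate([self.w_in, self.w_out]) + self.step
        self.w_in, self.w_out = w[:M], w[M:]

ecu = ECU()
inputs, reference = rng.random(M), 1.0
for _ in range(200):
    ecu.reorganize(ecu.error(inputs, reference))
print("local error:", float(ecu.error(inputs, reference)))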

In the HPCT system, if the environment has too few degrees of freedom,
the structure will converge, but not to any unique configuration; it will
converge, like the MLP, to any structure that does the job of keeping
total absolute error low. In a simple environment, there are many such
structures. But if the complexity of the environment increases, up to
M*N degrees of freedom, the HPCT structure will still be able to adjust,
though the more complex the environment, the longer the learning will
take (as in real life:-).

>... and invention of new building blocks like in genetic
>algorithms (but what invents the new parts and how are they inserted
>into the pre-existing scheme?).

There are at least three possibilities, and probably many more. The three
obvious possibilities are (1) that new "parts" (i.e. ECUs) are generated at
random and are randomly linked to existing ones, (2) that new parts
are generated by some process such as a genetic algorithm in which the
properties and linkages of the new ECU are derived from those of the parents,
and (3) that the new ECU is randomly constructed but placed "algorithmically"
within an existing or new level of the HPCT structure.
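
For concreteness, here is a hedged sketch of the first two; the data
layout, sizes, and uniform-crossover rule are illustrative assumptions
only, not a worked-out proposal:

import numpy as np

M = 8
rng = np.random.default_rng(2)

def new_ecu_random(pool_size):
    # Possibility 1: random properties, randomly linked to existing ECUs.
    return {"w_in": rng.normal(0.0, 0.5, M),
            "w_out": rng.normal(0.0, 0.5, M),
            "links": rng.integers(0, pool_size, size=4)}

def new_ecu_genetic(a, b):
    # Possibility 2: properties and linkages inherited from two parent
    # ECUs by uniform crossover, as in a genetic algorithm.
    mask = rng.random(M) < 0.5
    return {"w_in": np.where(mask, a["w_in"], b["w_in"]),
            "w_out": np.where(mask, a["w_out"], b["w_out"]),
            "links": np.where(rng.random(4) < 0.5, a["links"], b["links"])}

hierarchy = [new_ecu_random(pool_size=3) for _ in range(3)]
hierarchy.append(new_ecu_genetic(hierarchy[0], hierarchy[1]))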

I have a prejudice in favour of possibilities 1 and 2. I speculate that
in case 1 the random linkages will soon be reorganized so that the new
ECU comes to be part of an existing or new level, "stray" linkages being
unlikely to be useful in the new ECU's attempts to control. In case 2,
the new ECU does something new with aspects of perceptions that have
proved to be useful. It's less radical than 1 in its impact on the
hierarchy, and has the same properties with regard to reconfiguration
by reorganization.

>Waiting for your results ;-).

Well, when I had money to put out in contracts, we did put out one with a
view to investigating these issues, but the money ran out and I retired,
so it isn't easy. The project was called "Little Baby." The idea was for
the LB to learn the syntax of a formal grammar when it could observe
only the waveform of three or four continuously varying "phonetic features."
We were able to show some learning at the lowest level, but that's as far
as it went. So you'll have to wait a while for further results. Someone
may take it up, I would guess more probably in a university than in an
industrial or government laboratory.

Martin

[Martin Taylor 970428 15:15]

[Martin Taylor 970428 12:15]

About generating new ECUs, I said:

>There are at least three possibilities, and probably many more. The three
>obvious possibilities are (1) that new "parts" (i.e. ECUs) are generated at
>random and are randomly linked to existing ones, ...

>I have a prejudice in favour of possibilities 1 and 2. I speculate that
>in case 1 the random linkages will soon be reorganized so that the new
>ECU comes to be part of an existing or new level, "stray" linkages being
>unlikely to be useful in the new ECU's attempts to control.

I should have added that possibility 1 also allows "self-control" or
"perception of own properties" to fall out as a matter of course. If the
connections are random, some of the inputs may link to the output signals
of other ECUs and some to the perceptual signals of other ECUs, and some
of the outputs may link to the reference inputs of other ECUs and some
to their perceptual inputs. If the result is non-control, or a diminution
of the control exercised by the existing hierarchy, most of these stray
links will be reorganized away. But some may not, and these could be in
two interesting configurations. One is the associative-categorical network
I've discussed before under the name "grand flip-flop". The other is a
kind of self-evaluative configuration in which the perception of one ECU
may be compared with its reference (as, of course, happens internally
within that ECU) by the perceptual input function of the new ECU, and
the output of the new ECU might affect some aspect of reorganization
around the "observed" ECU.
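
A throwaway fragment of that wiring idea (every name here is invented
for illustration): with random connection, an input tap is as likely to
reach another ECU's output or perceptual signal as anything else:

import random
random.seed(4)

existing = [{"id": i, "perceptual": 0.0, "output": 0.0} for i in range(5)]

def wire_new_ecu(pool, n_inputs=3):
    # Each input tap picks a random ECU and, with equal chance, either
    # its perceptual signal or its output signal -- so "perception of
    # own properties" needs no special mechanism.
    taps = [(random.choice(pool)["id"],
             random.choice(["perceptual", "output"]))
            for _ in range(n_inputs)]
    return {"id": len(pool), "input_taps": taps}

print(wire_new_ecu(existing)["input_taps"])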

More speculation, of course, but perhaps not uninteresting.

Martin

[Hans Blom, 970429h]

(Martin Taylor 970428 12:15)

>I have to go outside "standard" PCT here. Well, I don't really
>_have_ to, but I will, because it's something I've wanted to try for
>a long time.

You describe an exciting research proposal, Martin!

>As for the "holy grail," I do think that HPCT represents the "grand
>unifying theory" of biology (including psychology). There. I've
>declared my faith, and claim my seat at the Round Table discussion.
>Now we can all join in the search for the grail.

Why search for the grail if we have it in our hands? :-)

>In a backprop MultiLayer Perceptron (MLP) ...

There's a nice article in prof. Ljung's collection in Sweden about
the relationship (identicality?) between systems identification and
neural network techniques. It is, regrettably, written "the wrong
way": it introduces neural networks to people already familiar with
parameter estimation/model construction. It demonstrates that NNs are
"nothing but" the usual things, the only difference being a different
choice of basis functions.
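
The point can be made concrete in a few lines (my own toy example, not
taken from the article): fit the same data with two linear-in-parameters
models that differ only in their basis functions -- polynomials for
classical identification, fixed sigmoids for a one-hidden-layer net:

import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 50)
y = np.sin(3 * x) + 0.05 * rng.normal(size=x.size)

def fit(design):
    # Ordinary least squares for the linear-in-parameters weights.
    theta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ theta

poly = np.vander(x, 6)                                    # polynomial bases
centres = np.linspace(-1, 1, 6)
sigm = 1.0 / (1.0 + np.exp(-10 * (x[:, None] - centres)))  # sigmoid bases

for name, basis in [("polynomial", poly), ("sigmoid", sigm)]:
    resid = y - fit(basis)
    print(name, "RMS error:", float(np.sqrt(np.mean(resid ** 2))))

With both designs the least-squares machinery is identical; only the
columns of the design matrix change, which seems to be the article's point.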

>There are at least three possibilities [for inventing new building
>blocks], and probably many more. The three obvious possibilities are
>(1) that new "parts" (i.e. ECUs) are generated at random ...

I don't believe that the "at random" plays the major role, at least
not in an individual's lifetime. In evolution, sure. For an
individual it is far too wasteful. Optimally, the insertion ought to
take place at that spot where the overall control error could be
reduced most. Not reliably, maybe, but better than random. As soon as
such an "algorithm" gets invented, even crudely, evolution will take
care that it spreads rapidly, even though it might have only a slight
edge.

Maybe we do see this, however coarsely: new "random" (?) connections
seem to form preferentially in the neocortex, I think, and not in the
"older" structures. But is that all we can say about where new
connections preferentially form? Any neurologists listening?

Greetings,

Hans

[Martin Taylor 970429 12:30]

[Hans Blom, 970429h]

>There's a nice article in prof. Ljung's collection in Sweden about
>the relationship (identicality?) between systems identification and
>neural network techniques. It is, regrettably, written "the wrong
>way": it introduces neural networks to people already familiar with
>parameter estimation/model construction. It demonstrates that NNs are
>"nothing but" the usual things, the only difference being a different
>choice of basis functions.

How does one get at this article (or this collection)?

Incidentally, on the "nothing but" line, John Bridle (now at Dragon
Systems UK) was able to show that the Hidden Markov Model and the
Neural Network approaches to speech recognition could be mapped
into each other, as well. But one-to-one mapping doesn't really mean
one is "nothing but" the other. A change in viewpoint can make a
dramatic difference in how things are understood, and in the ease
of making predictions from them. Imagine an orchard, with the trees
planted on a regular lattice. If you just look at the orchard from
afar, at a random angle, you have all the data needed to see the lattice
structure, but it looks like an irregular jumble. But if you happen to
hit a lucky angle, then you see lines of trees with alleys between them
(the principle of crystal diffraction). You have the data available
in each case, so the orchard in one case is "nothing but" the orchard
in the other case, but boy, do they _look_ different.
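
The orchard picture is easy to check numerically; a throwaway
demonstration (my own construction, arbitrary angle and lattice size):

import numpy as np

# 400 "trees" on a 20x20 lattice, projected onto a viewing direction.
trees = np.array([(i, j) for i in range(20) for j in range(20)], float)
for name, angle in [("generic angle", 0.83), ("aligned angle", 0.0)]:
    direction = np.array([np.cos(angle), np.sin(angle)])
    proj = np.round(trees @ direction, 3)
    # Few distinct projected positions means visible rows with alleys.
    print(name, "->", len(np.unique(proj)), "distinct positions")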

Martin

[From Bruce Gregory (970429.1350 EST)]

(Martin Taylor 970429 12:30)

>A change in viewpoint can make a
>dramatic difference in how things are understood, and in the ease
>of making predictions from them.

This is the way I see many of your differences with Rick.
Personally, I find Rick's approach much more transparent, but
others may see the situation quite differently.

Bruce

[Martin Taylor 970429 14:25]

[Bruce Gregory (970429.1350 EST)]

>(Martin Taylor 970429 12:30)

>> A change in viewpoint can make a
>> dramatic difference in how things are understood, and in the ease
>> of making predictions from them.

>This is the way I see many of your differences with Rick.
>Personally, I find Rick's approach much more transparent, but
>others may see the situation quite differently.

I also find Rick's approach more transparent. I admire it, and have
told him so more than once. I don't advocate replacing one viewpoint
with another. Instead, I find that each viewpoint helps one to
understand things that might be problematic from a different
viewpoint. Rick often posts messages that clarify points for me, and
from time to time I compliment him on his clarity. It's only when he
starts pontificating on things that I think I understand a bit better
than he does that we get the confusing back-and-forth set of misunderstood
messages.

I'm glad Rick is there to keep our metaphorical feet on the ground of
practice. But I think that my approach provides a useful complement
(I would, wouldn't I). If Rick weren't there (or if Bill, heaven forbid,
decided he was no longer interested in the PCT community), I might
well find myself trying to post "practical" messages and the sort of
appeals to common sense that they make. I don't do it, because it's
already being well done, better than I could do.

At the back of my infrequently repeated attempts to discuss the information
theoretic viewpoint is the notion that it makes understanding the
hierarchy rather easier than is possible by going at one control loop
at a time with simulations or approximations by linear analysis (of
non-linear systems). By analogy, it's a bit like using thermodynamics
rather than elastic-atom ballistics for the study of gases. If you've
got one or two atoms, the thermodynamics isn't the best approach. If
you've got a lot, it may be.

But one has to get the basics across first.

Martin

[Hans Blom, 970501b]

(Martin Taylor 970429 12:30)

>>There's a nice article in prof. Ljung's collection in Sweden about
>>the relationship (identicality?) between systems identification and
>>neural network techniques. It is, regrettably, written "the wrong
>>way": it introduces neural networks to people already familiar with
>>parameter estimation/model construction. It demonstrates that NNs
>>are "nothing but" the usual things, the only difference being a
>>different choice of basis functions.

>How does one get at this article (or this collection)?

Go to http://ankeborg.isy.liu.se/cgi-bin/reports. Interesting
articles on neural nets are numbers 1373, 1622, 1650, 1750 and 1895.
The one I referred to was the first or second of these, I believe.

>... one-to-one mapping doesn't really mean one is "nothing but" the
>other. A change in viewpoint can make a dramatic difference in how
>things are understood, and in the ease of making predictions from
>them.

Thoroughly agreed!

Greetings,

Hans