HPCT as neural network

[Martin Taylor 960822 15:15]

Bruce Gregory (960822.1420 EDT)

an HPCT hierarchy is every bit
as much a neural net as is any MLP or Hopfield net or...

There seems to be some confusion (perhaps mine) between a
theory (HPCT) and a model that incorporates mechanisms
compatible with that theory (neural nets). The behavior of a
thermostat is understandable in terms of PCT, but it employs no
neural net mechanisms that I am aware of, although one could use
a neural net to model the workings of a thermostat.

I never said a PCT system was a neural network. I said an HPCT system was
a neural network.

A thermostat is a single control loop, not a hierarchy.

It is the hierarchy that _consists of_ a neural network (or two interconnected
ones, if you like). I'm taking "an HPCT system" or "an HPCT hierarchy"
to be the structure described by the theory called HPCT, not the theory
itself or a "model that incorporates mechanisms compatible with that
theory." I'm saying that the structure described by the theory called
HPCT _is_ a neural network, simply because it has all the characteristics
of a neural network described in any other neural network theory.

An HPCT hierarchy consists of a large number of relatively simple nodes
interconnected many to many through a set of linkages that have weights
that adapt (learn). HPCT is a restricted form of network in that the
nodes can be assigned to "layers" that can be labelled "higher" and "lower",
which is why I often use the MLP as a familiar way to describe the HPCT
perceptual (or output) structure to outsiders. Unlike the MLP, though,
the HPCT structure has no "hidden nodes." All nodes can contribute their
quality factor (typically a function of the time course of their error)
to the learning process.
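The structure described here -- simple nodes, many-to-many adjustable input weights, and a locally available quality factor derived from each node's own error history -- can be sketched roughly as follows. The class name, the ten-sample error window, and the mean-squared-error form of the quality factor are all illustrative assumptions of mine, not part of any HPCT specification:

```python
import random

class Node:
    """One node in an HPCT-like layered network: a weighted sum of
    lower-level signals, plus a local quality factor computed from
    the time course of the node's own error (assumed form)."""

    def __init__(self, n_inputs):
        self.weights = [random.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        self.error_history = []

    def perceive(self, lower_signals):
        # Perceptual function: weighted sum of lower-level signals.
        return sum(w * s for w, s in zip(self.weights, lower_signals))

    def record_error(self, reference, perception):
        # Error is the discrepancy between reference and perception.
        self.error_history.append(reference - perception)

    def quality(self):
        # Quality factor: negated mean squared error over a recent
        # window -- higher (closer to zero) means better control.
        recent = self.error_history[-10:] or [0.0]
        return -sum(e * e for e in recent) / len(recent)
```

Unlike an MLP's hidden units, every such node can report its quality factor directly to a learning process, which is the point of the contrast drawn above.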

Does this clarify?

Martin

[From Bruce Gregory (960822.1615 EDT)]

(Martin Taylor 960822 15:15)

It is the hierarchy that _consists of_ a neural network (or two interconnected
ones, if you like). I'm taking "an HPCT system" or "an HPCT hierarchy"
to be the structure described by the theory called HPCT, not the theory
itself or a "model that incorporates mechanisms compatible with that
theory." I'm saying that the structure described by the theory called
HPCT _is_ a neural network, simply because it has all the characteristics
of a neural network described in any other neural network theory.

An HPCT hierarchy consists of a large number of relatively simple nodes
interconnected many to many through a set of linkages that have weights
that adapt (learn).

Does this clarify?

Yes. I now understand what you mean by saying that the
structure _is_ a neural network, although it seems to me that
the structure does not "learn" in the way that a typical neural
network does. But perhaps my rudimentary understanding of
neural networks is letting me down.

Regards,

Bruce

[Martin Taylor 960822 17:00]

Bruce Gregory (960822.1615 EDT)

Yes. I now understand what you mean by saying that the
structure _is_ a neural network, although it seems to me that
the structure does not "learn" in the way that a typical neural
network does.

Different kinds of neural network learn in different ways. Some have to
have a trainer, some learn on their own. But whichever way they learn,
they have to use some kind of criterion as to whether they are getting
better or not. Some use the discrepancy between the outputs of some
"output" nodes and the output desired by a teacher. HPCT asserts that
the criterion variable is the success in controlling the intrinsic
variables. But one could imagine each node providing its own criterion--
success in its own control; this cannot be done with most kinds of neural
network, which is why I think that an HPCT system could be expected to
learn much faster than most neural networks.

Martin

[Hans Blom, 960823]

(Martin Taylor 960822 17:00)

Your explanation why you say that the HPCT structure is a neural
network makes excellent sense. The next step would be to introduce a
learning algorithm that makes a _controller_ out of the hierarchical
set of nodes. Thus far, tuning (or at least the design of tuning
rules for each node, as in Bill Powers' "artificial cerebellum") had
to be done by hand. That way we don't get far.

Different kinds of neural network learn in different ways. Some have
to have a trainer, some learn on their own.

To me, this is an artificial distinction. They ALL have a built-in
criterion function of what is "best". Whether it is matching the
values of some nodes to external values offered by a "teacher" or
matching according to an internal criterion hardly changes the basic
operation, which is to provide a best match. What all networks have
in common is that they have built-in knowledge of what is best and
learn autonomously.

But whichever way they learn, they have to use some kind of
criterion as to whether they are getting better or not.

Yes. That criterion is what MCT considers to be the highest level.

Some use the discrepancy between the outputs of some "output" nodes
and the output desired by a teacher. HPCT asserts that the criterion
variable is the success in controlling the intrinsic variables.

There are two approaches in neural nets. In the first, there is ONE
overall criterion of what is "best", a top level, so to speak. In that
case, you need to remove the plural. If not, it won't be clear (in
case of conflicts, which will undoubtedly arise) which intrinsic
variable ought to be controlled better. The second approach makes the
criteria local.

But one could imagine each node providing its own criterion --
success in its own control; this cannot be done with most kinds of
neural network, which is why I think that an HPCT system could be
expected to learn much faster than most neural networks.

Hebb's rule does exactly this. It distributes the "quality" over all
(computing) cells. This makes perfect biological sense, because all
(biological) cells have to survive in their local environment in
order to "service" the whole. Hebb's rule is nice and simple and easy
to implement. It might be a good first step to try out as an HPCT
learning mechanism. It doesn't always succeed, however, due to local
extrema (this is true for many other methods as well). It also tends
to be slow, possibly because optimization is so local. But the
hierarchical connections may serve the task of making optimization
more global.
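The locality Hans points to is easy to make concrete: a plain Hebbian update needs nothing beyond the node's own input and output. This is a textbook form of the rule, not code from the discussion; the learning rate is my assumption:

```python
def hebb_update(weights, pre_inputs, post_output, rate=0.1):
    """Hebb's rule: strengthen each weight in proportion to the
    coincidence of its presynaptic input and the node's own output.
    Every quantity used here is local to the node."""
    return [w + rate * x * post_output for w, x in zip(weights, pre_inputs)]
```

Note that no error or quality signal appears anywhere in the update, only locally available activity -- which is both its simplicity and, as the later exchange brings out, its limitation.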

This whole discussion is starting to get constructive! I like that!

Greetings,

Hans

[Martin Taylor 960823 12:10]

Hans Blom, 960823

(Martin Taylor 960822 17:00)

Your explanation why you say that the HPCT structure is a neural
network makes excellent sense.

Thank you.

The next step would be to introduce a
learning algorithm that makes a _controller_ out of the hierarchical
set of nodes.

That's called "reorganization", of which there are several forms (I have
previously identified 12 aspects of an HPCT network that might be affected
by reorganization).

Thus far, tuning (or at least the design of tuning
rules for each node, as in Bill Powers' "artificial cerebellum") had
to be done by hand.

Where do you get that idea? Certainly the mechanism for "tuning" is
different from the mechanism for control, in that it comes from outside
the control loop being tuned. If that's what you mean by "by hand", then
you are correct. If you are complaining that each individual hierarchy
doesn't have to learn _ab initio_ how to learn, then I think the complaint
misguided. It took billions of years of evolution for living systems to
learn well how to learn, and it's unreasonable to ask computer simulations
to repeat that process within the lifetime of one experimenter.

I think we do get far by simulating different kinds of reorganizing process,
the processes themselves being "designed by hand."

Different kinds of neural network learn in different ways. Some have
to have a trainer, some learn on their own.

To me, this is an artificial distinction. They ALL have a built-in
criterion function of what is "best".

That, to me, is an artificial combination of different things. Some networks
have a human that says "this is a good result, that is worse" and some
have to determine for themselves which result is better. The fact that
the net has a built-in place for the teacher to indicate that one result
is better is irrelevant to the question of whether the determination is
made within or outside the structure under consideration.

What all networks have
in common is that they have built-in knowledge of what is best and
learn autonomously.

It does make a difference whether the built-in knowledge is that they
look at a specific "teacher's input" or at the results of their own
machinations.

Some use the discrepancy between the outputs of some "output" nodes
and the output desired by a teacher. HPCT asserts that the criterion
variable is the success in controlling the intrinsic variables.

There are two approaches in neural nets. In the first, there is ONE
overall criterion of what is "best", a top level, so to speak. In that
case, you need to remove the plural.

I'm not clear where you would put the usual MLP back-propagation procedure
here. For a particular data pattern, the teacher says "I claim that is an
'A', and for an 'A' I want output node 23 to go high and all others to
go low; Now for the next data pattern I claim it is a 'K', and for K I
want outputs 2, 15, and 65 to go high and all others to go low." What is
the ONE criterion in this ongoing sequence of perhaps thousands of datasets?

If not, it won't be clear (in
case of conflicts, which will undoubtedly arise) which intrinsic
variable ought to be controlled better.

In the case of biological systems, it may well be that different intrinsic
variables have stronger effects on different parts of the hierarchy--either
vertically or horizontally distinguished, or both. And it may vary by time.

Conflicts, by the way, are entirely to be expected within HPCT. Reorganization
tends to reduce them on balance, because more conflict usually means less
control, and less control of perceptual variables often means less control
of intrinsic variables. Applying the reorganizing system to its own
operation is an interesting thought, which would tend to reduce the
conflicts there, if any, by enhancing the kind of partitioning mentioned
above.

The second approach makes the
criteria local.

But one could imagine each node providing its own criterion --
success in its own control; this cannot be done with most kinds of
neural network, which is why I think that an HPCT system could be
expected to learn much faster than most neural networks.

Hebb's rule does exactly this.

True enough, but it is inadequate. If it had been enough, neural networks
would have developed decades earlier than they did. It took the invention
of the back-propagation algorithm to bring neural networks into a reasonable
state. In a back propagation system, only the mismatches between the desired
and obtained outputs of nodes in the output layer serve as criteria. The
contributions to the mismatches of the nodes in the hidden layers are
computed from these. Since there are usually far more degrees of freedom
in the weights in the hidden layers than there are in the set of outputs,
the responsibilities of the nodes for the performance of the net are
grossly underspecified.
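In sketch form (activation-function derivatives omitted for brevity, and the function names are mine): only the output-layer mismatches are observed; hidden-layer responsibility is inferred from them.

```python
def output_deltas(desired, obtained):
    """The only directly observed criterion in back-propagation:
    the mismatch at the output layer."""
    return [d - o for d, o in zip(desired, obtained)]

def hidden_deltas(w_hidden_to_output, out_deltas):
    """A hidden node's 'responsibility' is never observed; it is
    inferred by propagating the few output mismatches back through
    the weights. With far more hidden weights than output nodes,
    this assignment of blame is underdetermined."""
    return [sum(w * d for w, d in zip(row, out_deltas))
            for row in w_hidden_to_output]
```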

Hebb's rule is indeed local, but it doesn't relate to quality--at least not
as I understand it. It just says "if I did something for this pattern, I'll
do more if I see this pattern again." In contrast, a local reorganizing
rule for a node (ECU) in a control hierarchy says something along the lines
of "if my control is good, I'll stick with what I've got; if not, I'll
change something about my structure."

Hebb's rule is nice and simple and easy
to implement. It might be a good first step to try out as an HPCT
learning mechanism.

I may be wrong, but I think you are a decade or three late with this
suggestion. (Bill P?)

When an ECU reorganizes locally, it may do so by continuous weight
modification or by radical surgery such as flipping the sign of a link
or by shifting what other ECUs it links to. Bill P has tried continuous
reorganization of the perceptual functions of a set of ECUs that in effect
solved a set of multivariate equations, using the "e-coli" method. The
idea is that if a particular weight change helps control, keep moving
the weight pattern in the same direction in N-space until it stops helping,
then choose a new direction _at random_.
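The e-coli method just described can be sketched as follows. The step size, iteration budget, and Gaussian choice of direction are my assumptions; the essential logic is only "keep going while it helps, otherwise tumble to a random new direction":

```python
import random

def ecoli_reorganize(weights, error_fn, step=0.05, max_iters=2000):
    """E-coli reorganization: move the weight pattern through N-space
    in one direction for as long as control improves (error_fn falls);
    when a step stops helping, pick a new direction at random."""
    n = len(weights)
    direction = [random.gauss(0.0, 1.0) for _ in range(n)]
    best = error_fn(weights)
    for _ in range(max_iters):
        trial = [w + step * d for w, d in zip(weights, direction)]
        e = error_fn(trial)
        if e < best:
            # The step helped: accept it and keep the same direction.
            weights, best = trial, e
        else:
            # The step stopped helping: "tumble" to a random direction.
            direction = [random.gauss(0.0, 1.0) for _ in range(n)]
    return weights, best
```

Note how the criterion here is a single scalar per control unit (its own error), not a gradient computed from somebody else's mismatch -- which is the contrast with back-propagation drawn in the next paragraph of the original message.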

I argue that a local reorganization criterion would enable the perceptual
hierarchy to learn to control in a particular environment much faster
than would an MLP type of back-propagation gradient search, because the
degrees of freedom for the criterion are the same as the degrees of
freedom for the thing being varied. But this would not result in a
hierarchy that necessarily controlled the intrinsic variables well. For
that, the reorganizing system needs a low degrees-of-freedom external
criterion set. It says to the perceptual control system "You may be
controlling well whatever it is you are controlling, but you are doing
nothing for our mutual well-being; go control something else." When
the side-effects of perceptual control are effective for intrinsic
variable control, then reorganization stops.

This whole discussion is starting to get constructive! I like that!

Good. Let's keep it that way.

Martin