[Martin Taylor 960304 17:20]

I've been having a sporadic private conversation with Hans Blom, in the

course of which he pointed out that the generic Kalman Filter model with

N inputs would presumably start with full connectivity (N^2 connections)

even though after learning it would be quite likely that many of the

connections would be found to have weights very near zero. However, it

would be nearly impossible to specify in advance which connections should

be left out.

Ignoring the control aspects of the situation, I treated the problem as

being to determine from incoming data x1,...,xn and the value of a function

F(x1,...,xn) what the form of that function is. The result seems to be

a description of the perceptual side of an HPCT hierarchy, along with an

explanation of why the hierarchy should not be expected to be composed

simply of linear perceptual functions. As follows (slightly edited):


----------------

In many cases, a good first approximation to F(x1, ..., xn) is

F = f1(x1) + f2(x2) + ... + fn(xn). To find an optimum solution of that kind

is easier than to deal with F itself. There are n connections rather than

(n^2)/2 + n, as there would be if quadratic interconnections were to be

permitted, or n + (n^2)/2 + ... + (n^n)/n! --> exp(n) in a full representation

of F. Of course, the fk have an undefined number of degrees of freedom

themselves, but again, a linear first approximation is often reasonable.
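The term counts above can be checked with a short sketch (n=10 is just an
illustration; each unordered pair is counted once, which the text approximates
as n^2/2, and the full count 2^n - 1 is what the text approximates as exp(n)):

```python
from math import comb

def additive_terms(n):
    # F ~= f1(x1) + ... + fn(xn): one function per input
    return n

def quadratic_terms(n):
    # linear terms plus one interaction term per unordered pair (i, j)
    return n + comb(n, 2)

def full_terms(n):
    # every nonempty subset of inputs gets its own interaction term
    return sum(comb(n, k) for k in range(1, n + 1))  # = 2**n - 1

print(additive_terms(10), quadratic_terms(10), full_terms(10))
# -> 10 55 1023
```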

In what follows, I'm treating the perception of the world, which includes

the output-to-input relations of control systems, but to do things properly

one ought to analyze the output function separately as well as together

with the input function. I'm ignoring that complication. What follows is

complex enough for one posting;-)

If F is treated as a "model" of the universe in the input of a control

system, then the fk can be treated as the perceptual input functions of

a one-level array of scalar elementary control units. But if they are,

then the on-line perceptual control action of the ECUs compensates pretty

well for any nonlinearities that do not affect the monotonicity of the

function. fk(x) ~= c*x works almost as well as fk(x) itself. So would

fk(x) ~= c*log(x), which is physiologically often reasonable. So, for the

purposes of moderately good (but non-optimal) control, the learning process

should be able to ignore the form of fk, and should concentrate on the

errors inherent in the modularization of F into sum(fk), ignoring the

interactions among the fk.
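A minimal simulation of the monotonicity point (gains, step sizes, and the
toy environment in which x simply follows the output are my own illustrative
choices, not from any particular model): an integrating ECU drives the error
to near zero whether its perceptual function is linear, rescaled, or
logarithmic, as long as it is monotone increasing.

```python
import math

def run_ecu(f, reference=2.0, steps=5000, gain=5.0, dt=0.01):
    """One scalar ECU: perception p = f(x), integrating output o,
    and a toy environment in which x simply tracks o."""
    x, o = 0.1, 0.1
    for _ in range(steps):
        error = reference - f(x)
        o += gain * error * dt   # integrate the error
        x = o                    # environment: x follows the output
    return abs(reference - f(x))

# Any monotone-increasing perceptual function is controlled well
for f in (lambda x: x, lambda x: 3.0 * x, lambda x: math.log(x)):
    assert run_ecu(f) < 1e-3
```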

There are two main sources of possible error. Firstly, each xk is a projection

from a space of unknown dimensionality and complexity onto a single axis,

and secondly there are unknown possibilities for interactions among the

dimensions. It may be, for example, that f10(x10) = 3*x10 when f35(x35) has

the value 21, but f10(x10) = -200*x10 when f35(x35) has the value 20 (Example,

putting something "on" a table when your hand is just above the table top

or when your hand is just below the table top. The behaviour of the world

when you let go of the object is quite different in the two cases.)

The learning solutions to the two sources of error are intermixed. If the

fk are linear, which, by the assumption above, is not a bad first step, then the

second approximation to F is to include terms in fi*fj. But the analytic

solution to this problem is to do a principal components rotation of the

space, eliminating the quadratic terms by redefining the xk within the

original space. In PCT terms, this means altering the perceptual input

functions of the single-level array of ECUs so that they tend toward

mutual orthogonality.
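For two inputs the rotation can be written out directly (the matrix entries
are illustrative numbers of mine; the angle formula is the standard
principal-axes result for a symmetric 2x2 matrix): rotating the coordinates
eliminates the quadratic cross term without changing the space spanned.

```python
import math

# Quadratic part of F as a symmetric matrix: F2(x) = x^T M x,
# with an interaction term M[0][1] linking x1 and x2
M = [[2.0, 0.8],
     [0.8, 1.0]]

# Principal-axes angle: tan(2*theta) = 2*M01 / (M00 - M11)
theta = 0.5 * math.atan2(2 * M[0][1], M[0][0] - M[1][1])
c, s = math.cos(theta), math.sin(theta)
R = [[c, -s],
     [s,  c]]
RT = [[c,  s],
      [-s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Mp = matmul(matmul(RT, M), R)   # M in the rotated (principal) coordinates

# The cross (interaction) term vanishes up to floating-point error,
# while the trace -- the "size" of the quadratic part -- is unchanged
print(abs(Mp[0][1]) < 1e-9)
```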

In fact, I argued as long ago as 1973 that developing data-based orthogonality

is exactly what peripheral sensory systems do, and I think other people have

made similar points, though I don't know what the present thinking is among

sensory physiologists. Bill Powers, similarly, has argued that the

reorganization process will inevitably lead to an orthogonalization of the

perceptual variables within a control level, because that is the situation

in which there is minimum conflict among the ECUs within the level, and

therefore it is the situation in which reorganization is slowest.

The process of orthogonalizing the first-level perceptual variables will

not eliminate all quadratic contributions to F(x1,...,xk,...,xn), but it will

eliminate the ones related to the linear components of fk(xk). Those that

remain will have no greater a sum than in the initial arbitrary partitioning

of the universe into the xk, and probably will be a great deal smaller. But

they will exist. Let us call them "intrinsic" quadratic interactions that are

inherent in the environment.

The set of {xk} spans the perceptible universe. Rotating that set into a

principal components configuration does not increase or decrease the

dimensionality of the space being observed, but it reduces the interactions

among the terms, reducing the interference of one scalar control on another

if the fk are taken to be the perceptual functions of individual ECUs. The

"intrinsic" interactions represent interference between individual ECUs

in the one-level array. Since the linear components have been orthogonalized

in this one-level array, such individual interferences can only be handled

by controlling fk*fj directly -- or more properly, by controlling the two-way

interactions, quadratic or not, fkj=f(fk, fj).

In a monolithic model, the interactions can be introduced by modifying

F ~= f1(x1) + ... + fn(xn) to add terms in fij(fi(xi), fj(xj)) for

those i, j, that have appreciable intrinsic interactions. Using scalar

control systems, the interaction term fij _could_ be introduced by

adding a new scalar control system in parallel with the initial array,

but such an added control system would conflict directly with the fi

and fj control units already existing. To avoid such a conflict, the new

control system would have to supplant fi and fj, and be vector rather than

scalar valued. It would have as inputs both xi and xj, with two outputs

and a control function that incorporated fi, fj, and fij in one monolithic

vector function. And even that approach would not work if fi had an intrinsic

interaction with fk as well as with fj. The fik control unit and the fij

control unit could not both comfortably supplant fi without conflict.

In a system of scalar control units, the intrinsic interactions are

more naturally handled, by controlling fij directly, treating fij as

the function of fi and fj that it is. In other words, the fij control

is a scalar control unit whose inputs are the results of functions fi

and fj, and whose output uses the outputs of the fi and fj control units

to affect xi and xj, rather than conflicting with the fi and fj controls.

Using this approach, it does not matter if fi has an additional intrinsic

interaction with fk. The fij and the fik control units act independently,

and neither "sees" nor acts upon the outer world directly. Their "seeing"

and acting are through the existing maximally orthogonalized first-level

controls on xi, xj, and xk.
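A sketch of that arrangement (the gains, the product interaction p1*p2, and
the toy environment are my own choices for illustration): the second-level
ECU perceives the interaction of the two first-level perceptions and acts
only by adjusting the first-level references, never on x1 or x2 directly.

```python
def two_level(R2=6.0, steps=4000, dt=0.01, g1=10.0, g2=0.5):
    """Level 1: two scalar ECUs bring x1 -> r1 and x2 -> r2.
    Level 2: one ECU perceives the interaction p12 = p1 * p2 and
    corrects its error only through the level-1 references."""
    x1 = x2 = 1.0        # environmental variables
    r1 = r2 = 1.0        # level-1 references, set by level 2
    for _ in range(steps):
        p1, p2 = x1, x2              # level-1 perceptions
        x1 += g1 * (r1 - p1) * dt    # level-1 outputs act on the world
        x2 += g1 * (r2 - p2) * dt
        e12 = R2 - p1 * p2           # level-2 error in the interaction
        r1 += g2 * e12 * dt          # level-2 output: adjust references
        r2 += g2 * e12 * dt
    return x1 * x2

print(abs(two_level() - 6.0) < 1e-3)   # p12 brought to its reference
# -> True
```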

Used in conjunction with the first level array, this second-level array

provides a second approximation to the original and true (unknowable)

function F.

Of course, this second-level array of scalar control units may itself

prove to be non-orthogonal, and therefore may experience conflict within

the layer. But if we treat each fij as a new function gn of a new set of

variables yn, where yn is the output of an arbitrary one of the original

first-level functions fk, we can see the second level by itself as

representing an approximation to a quite new function G(x1,..,xn), where

G represents the failure of the first approximation F ~= f1 + ... + fn.

We have created a linear approximation G ~= g1 + ... + gn, where the gk

represent the quadratic intrinsic interactions that could not be handled

by principal components rotation of the original xk space.
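In miniature, the residue argument looks like this (the "true" F with one
intrinsic interaction is a toy of my own choosing): the additive first
approximation leaves a residual, and that residual is exactly the interaction
term that a second-level function of the first-level signals can perceive.

```python
def F(x1, x2):
    # stand-in for the true environment function, with one
    # intrinsic interaction the additive form cannot capture
    return x1 + x2 + 0.3 * x1 * x2

def level1(x1, x2):
    # first approximation: F ~= f1(x1) + f2(x2)
    return x1 + x2

def level2(x1, x2):
    # g perceives the interaction of the level-1 signals
    return 0.3 * x1 * x2

for x1, x2 in [(0.5, 1.0), (2.0, -1.0), (1.5, 3.0)]:
    residual = F(x1, x2) - level1(x1, x2)   # what level 1 misses
    assert abs(residual - level2(x1, x2)) < 1e-12
print("level 2 accounts for the residual G")
```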

It is natural to continue this process. At level G, the functions gk can

be "naturally" rotated into a principal components configuration that

minimizes their conflict, but this rotation will probably leave still some

error in the approximation to the original F that is a true (unknowable)

representation of the environment spanned by the sensory variables xk.

It seems to me that this process, carried to n levels, would produce an

exact representation of the unknowable F, and would do it in a learnable

fashion. How long it would take, though, would be a strong function of

n, in fact an exponential function of n. The development of level k would

depend largely on the stability and accuracy of level k-1. There would be no

point in developing complex structures that control for interactions among

elements of a level that has not been at least approximately orthogonalized,

and the initial orthogonalization takes some time. The time to develop

each level will be longer than the time to develop the previous one, in

what is presumably a self-similar process level by level. I assume that

much of the work at the lower levels has been done by evolution, and is

pretty well fixed in an individual organism, but it doesn't matter in

principle whether this is so.

The difference between this approach and the Kalman Filter approach is

that the K-F starts with all exp(n) (or at least n^2) connections and then

learns which ones to ignore, whereas the development of a set of scalar

layers starts with n units and expands when there are interconnections

that should not be ignored. Increments on each layer can be done locally,

and conflicts between individual functions within a layer can be reduced

by orthogonalization, either analytically (if you are an omnipotent

designer) or through perceptual reorganization (if you are an ordinary

organism growing toward maturity).

Martin