More on templates

[From Bill Powers (940607.0930 MDT)]

I want to add something to the templates discussion. Suppose that at
some level, the perception of lateral positions in the field of view
gets reported as perceptual signals: x1 and y1 or x2 and y2. The
magnitudes of the signals correspond to the positions of points on
the retina, or in a neural map in the brain.

One way we describe relative position is to say "the hat is three
feet left of the orange and four feet above it." If x1,y1 is the
position of the hat and x2,y2 the position of the orange, then the
amount of "leftness" can be computed as x1 - x2, and the amount of
"aboveness" as y1 - y2. This would require two input functions that
carry out subtraction of the position signals to yield a leftness
signal and an aboveness signal. A third function might compute (x1 +
x2)/2 or (y1 + y2)/2, to yield the perception meant by "midway
between the hat and the orange" (in x or y). A more complex
computation of sqrt[(x1-x2)^2 + (y1-y2)^2] would yield the perception
meant by "the distance between the hat and the orange."
Incidentally, the considerable increase in complexity of computation
required to see separations in terms of radius and angle is a good
reason for saying that perceptions most naturally are in terms of
linear dimensions.
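The four input functions amount to plain arithmetic on position signals. Here is a minimal Python sketch of them. The function names are hypothetical. It follows the text's convention that leftness is x1 - x2, so x is taken to increase toward the left. Note how the distance function needs squares and a square root, where the others need only a subtraction or an average.

```python
import math

# Position signals are plain numbers. Sign convention follows the text:
# leftness = x1 - x2, aboveness = y1 - y2 (hypothetical function names).
def leftness(x1, x2):
    return x1 - x2

def aboveness(y1, y2):
    return y1 - y2

def midway(a, b):
    """Point midway between two positions along one axis (x or y)."""
    return (a + b) / 2

def distance(x1, y1, x2, y2):
    """sqrt[(x1-x2)^2 + (y1-y2)^2]: a costlier computation than the others."""
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

# Hat three units "left" of the orange and four units above it.
hat = (3.0, 4.0)
orange = (0.0, 0.0)
print(leftness(hat[0], orange[0]))   # 3.0
print(aboveness(hat[1], orange[1]))  # 4.0
print(midway(hat[0], orange[0]))     # 1.5
print(distance(*hat, *orange))       # 5.0
```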

An aside:

The fact that neural signals can't change sign has a curious
consequence in perceptions of such kinds. An elementary neural
subtractor is created by connecting one signal to an inhibitory
input of a neuron and the other to an excitatory input. This works
fine as long as the excitatory signal is larger than the inhibitory
one. So if x1 is the excitatory signal and x2 is the inhibitory one,
we can compute x1 - x2 (the output of the neural cell) quite easily.
That output signal then becomes the meaning of "leftness." But if x1
becomes less than x2, that computation ceases to work: excitation
is less than inhibition, and no matter how much less it is, the
output is zero. There is zero leftness.
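The elementary subtractor can be modeled in one line. This is a minimal sketch. It assumes the cell's output is simply excitation minus inhibition, clipped at zero so that it can never go negative (a half-wave rectifier). Real neurons are not this simple, and the function name is hypothetical.

```python
def subtractor_neuron(excitatory, inhibitory):
    """Output of a cell with one excitatory and one inhibitory input.

    Firing rates can't go negative, so the difference is clipped at zero.
    """
    return max(0.0, excitatory - inhibitory)

print(subtractor_neuron(5.0, 2.0))  # 3.0: x1 > x2, some leftness
print(subtractor_neuron(2.0, 5.0))  # 0.0: x1 < x2, zero leftness
print(subtractor_neuron(2.0, 9.0))  # 0.0: no matter how much less x1 is
```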

Therefore we need a second neuron with the excitatory and inhibitory
connections reversed in order to perceive the amount by which x1 is
less than x2: the amount of rightness. This means that there is a
physically separate signal representing all cases of x1 < x2. We end
up, consequently, speaking separately of leftness and rightness,
aboveness and belowness, and many other such perceptions. The amount
of each one is indicated by a signal that is larger than zero. This
is completely superfluous, because mathematically all we need is d =
x1 - x2, where d is a signal that is positive for leftward positions
and negative for rightward positions, one single signal. But neural
signals can't go negative. Therefore we have different names for
physically different perceptual signals representing positive and
negative values of what is really the same variable: push and pull.
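The push-pull arrangement can be sketched by pairing two rectified cells with their connections reversed. At most one of the pair is ever nonzero, and subtracting one from the other recovers the single signed signal d = x1 - x2. This is a minimal Python sketch with hypothetical helper names. The subtraction in signed_difference is there only to show that no information is lost. The nervous system, on the text's account, carries the two halves as separate signals.

```python
def subtractor_neuron(excitatory, inhibitory):
    # Rectified difference: a neural signal can't go below zero.
    return max(0.0, excitatory - inhibitory)

def opponent_pair(x1, x2):
    """Two cells with excitatory and inhibitory connections reversed."""
    leftness = subtractor_neuron(x1, x2)   # excitatory x1, inhibitory x2
    rightness = subtractor_neuron(x2, x1)  # excitatory x2, inhibitory x1
    return leftness, rightness

def signed_difference(x1, x2):
    # The one signed variable d = x1 - x2 that the pair jointly represents.
    leftness, rightness = opponent_pair(x1, x2)
    return leftness - rightness

for x1, x2 in [(5.0, 2.0), (2.0, 5.0), (3.0, 3.0)]:
    print(opponent_pair(x1, x2), signed_difference(x1, x2))
# (3.0, 0.0) 3.0
# (0.0, 3.0) -3.0
# (0.0, 0.0) 0.0
```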

End of aside.

These four signals -- leftness, aboveness, midway, and distance --
are computed from relationships among the position signals, and some
of them are invariant with respect to movements of the hat and the
orange as a unit on the retina or in the map. Leftness and aboveness
are unchanged by translations (for the mathematically deprived, a
"translation" is simply a bodily movement in x or y), and distance is
unchanged by both translations and rotations; the midway signal,
being itself a position, moves along with the pair. If the actual
leftness or aboveness changes, the corresponding signals change
accordingly, but in a way that does not depend on where the pair
lies in the field. If you move your fixation point, the leftness,
aboveness, and distance signals remain unchanged; if you tilt your
head, the distance signal remains unchanged. The state of each such
spatial relationship is represented by one simple signal. There is
no "derotation", no "compensation for the tilt": the distance signal
is computed in such a way as to be unaffected by the tilt, just as
the difference signals are computed so as to be unaffected by
translation.
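A quick numerical check of the invariance, as a minimal Python sketch. It moves the hat and orange as a unit and recomputes the signals. First it translates them, as when the fixation point moves. Then it rotates them about the origin, as when the head tilts. The translation of (7, -2) and the 30-degree tilt are arbitrary choices, and the helper names are hypothetical.

```python
import math

def translate(p, dx, dy):
    return (p[0] + dx, p[1] + dy)

def rotate(p, theta):
    """Rotate point p about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def leftness(p, q):
    return p[0] - q[0]

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

hat, orange = (3.0, 4.0), (0.0, 0.0)

# Move the fixation point: the pair translates as a unit.
hat_t, orange_t = translate(hat, 7.0, -2.0), translate(orange, 7.0, -2.0)
print(leftness(hat_t, orange_t), distance(hat_t, orange_t))  # 3.0 5.0

# Tilt the head: the pair rotates as a unit; distance is unaffected.
theta = math.radians(30)
hat_r, orange_r = rotate(hat, theta), rotate(orange, theta)
print(round(distance(hat_r, orange_r), 9))  # 5.0
```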

A more complex computation can perceive rotation independently of
separation and location, using trigonometric inverses to produce an
angle-signal. The angle signal would be invariant with respect to
translations on the retina or changes in the distance between the
two objects.
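The angle signal can be sketched with the two-argument arctangent, which recovers direction from the two difference signals. This is a minimal Python sketch. It assumes the angle is measured in degrees from the x axis, using math.atan2's usual convention. The function name is hypothetical. Translating the pair, or doubling the separation, leaves the result the same.

```python
import math

def angle_signal(p, q):
    """Direction of p as seen from q, in degrees, from the difference signals."""
    return math.degrees(math.atan2(p[1] - q[1], p[0] - q[0]))

print(angle_signal((3.0, 3.0), (0.0, 0.0)))    # the original pair
print(angle_signal((13.0, 8.0), (10.0, 5.0)))  # translated as a unit: same angle
print(angle_signal((6.0, 6.0), (0.0, 0.0)))    # separation doubled: same angle
```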

This is how analog computation works. No templates are involved;
only computations based on magnitudes of signals that represent, but
do not look like, physical measurements.

The template idea arises, I think, from a confusion of levels of
perception. If you pick up an (empty) cup and watch it while you
tilt it, at the sensation level you experience a changing perceptual
field; the sensations change, some getting larger and some getting
smaller. So it seems natural to think of doing pattern recognition
and extraction of perceptions like angle and position by doing
point-by-point comparisons of brightnesses in the visual field. If
you have a store of templates in your head in the form of patterns
of brightnesses, you can simply compare the visual field with each
template in turn until you find an exact match. If you have a
template for "cup tilted 20 degrees to the left," that template then
identifies the angle of rotation of the cup, as well as identifying
the cup itself.
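For contrast, here is the template scheme reduced to a toy, as a minimal Python sketch. A 3x3 "visual field" of brightnesses is compared point by point with each stored pattern. The closest one, measured by summed squared brightness differences, is taken as the match rather than waiting for an exact one. The patterns and names are hypothetical. The winning template only labels one orientation of one bar. A bar of another length, position, or brightness would need still more stored patterns.

```python
def mismatch(field, template):
    """Point-by-point comparison: sum of squared brightness differences."""
    return sum((f - t) ** 2
               for frow, trow in zip(field, template)
               for f, t in zip(frow, trow))

def best_template(field, templates):
    """Compare the field with each stored template; return the closest label."""
    return min(templates, key=lambda label: mismatch(field, templates[label]))

templates = {
    "vertical bar":     [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "falling diagonal": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    "rising diagonal":  [[0, 0, 1], [0, 1, 0], [1, 0, 0]],
}
field = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
print(best_template(field, templates))  # rising diagonal
```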

Of course this means that you have to have templates for different
sizes and shapes of cups as well as the same cup held at different
distances, and templates for cups of various colors and brightnesses
held up against backgrounds of assorted colors and brightnesses, and
for each situation, templates for cups held in different rotational
orientations in three dimensions and lit from different directions.
The number of templates involved, even allowing for sparse sampling
and interpolation, would be staggering. And that is just for cups,
not for the hands holding the cups or for other objects.

When you hold up a cup and rotate it, at the sensation level you
experience a very detailed and changing visual field. But in
_addition_ you experience other things: the speed of rotation is
directly sensed, even though it is not a sensation. The angle of
rotation is sensed; the size of the cup is sensed, the shape is
sensed; the curvature of the handle is sensed; the distance across
the mouth is sensed. None of those things is a sensation. In fact,
none of those things is visible to the eye in the same way that the
brightness, color, shadings, and edges of the cup are visible. You
can't "look at" the size of the cup -- that sort of perception isn't
to be found in the visual field in the same way that brightness is.
The curvature of the handle is not to be seen anywhere on the

These extra impressions are higher levels of perception, existing as
signals which are, directly, the impressions we get of the amount of
size, curvature, distance, angle, and movement. Those perceptions
have no brightness, color, shading, and so on, which is why we can't
really "see" them. Each of these higher-level signals simply
indicates an amount of something: an amount of bigness, an amount of
motion, an amount of angleness. You can have only more or less of
each one. As you turn the cup, the relative amounts of the different
signals change, so you get the characteristic sense of a real object
being turned to different angles. The motion signal is present only
while the angle, distance, or size of the cup is changing, and is
zero when all these things are constant.
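The behavior of the motion signal can be sketched as a rate of change. This minimal Python sketch assumes the motion signal is the magnitude of the change in the angle signal per unit time, taken as a finite difference over sampled values. abs() keeps it non-negative, like any single neural signal. The sample angles and the 0.5 s interval are made up for illustration.

```python
def motion_signal(samples, dt):
    """Magnitude of change per unit time between successive samples."""
    return [abs(b - a) / dt for a, b in zip(samples, samples[1:])]

# Angle of the cup in degrees, sampled every 0.5 s: turning, then held still.
angles = [0.0, 5.0, 10.0, 15.0, 15.0, 15.0]
print(motion_signal(angles, 0.5))  # [10.0, 10.0, 10.0, 0.0, 0.0]
```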

But all those impressions are added to the experience of the
sensations of the cup, the distribution of brightnesses of different
kinds (colors) at points in the visual field. The levels of
perception are collapsed into a single experiential field, a fact
that makes subjective separation of the levels so difficult. And
this is what leads us to think of templates, because there is
nothing to see in the actual field of perception, at the level of
sensations, but the brightnesses at each point.

The Perceptron of Frank Rosenblatt was the first small step for
mankind away from the template idea. While the Perceptron worked
only with on-off signals, the signals it produced were not
generated by point-for-point matching with a reference pattern, but
by computing on the magnitudes of point-signals to produce single
signals that directly represented the presence of a given pattern or
attribute of the intensity distribution.
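A single Perceptron-style unit can be sketched in a few lines. This is a minimal Python sketch with hand-picked, hypothetical weights and threshold. Rosenblatt's rule for learning the weights is omitted. The unit computes a weighted sum over the magnitudes of the point-signals and turns on when the sum exceeds the threshold. It responds to an attribute, "brighter on the left than on the right," rather than to one stored pattern. Quite different intensity distributions therefore produce the same on signal.

```python
def perceptron_unit(intensities, weights, threshold):
    """One on-off output computed from the magnitudes of many point-signals."""
    total = sum(w * x for w, x in zip(weights, intensities))
    return 1 if total > threshold else 0

# Six point-signals in a row; the weights favor brightness on the left half.
weights = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
print(perceptron_unit([0.9, 0.8, 0.7, 0.1, 0.2, 0.0], weights, 0.5))  # 1
print(perceptron_unit([1.0, 0.0, 0.6, 0.0, 0.0, 0.1], weights, 0.5))  # 1
print(perceptron_unit([0.1, 0.2, 0.0, 0.9, 0.8, 0.7], weights, 0.5))  # 0
```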

Then, after the (figurative) assassination of Rosenblatt, came the multi-layer
Perceptron, which introduced intermediate levels of abstraction. The
intermediate levels of signals were not identified as perceptions in
their own right, but the output on-off signals could now stand,
individually, for even more complex patterns, such as patterns with
holes in them. And following that came the neural networks based on
-- almost -- analog computations, in which the signals could at
least potentially indicate not only what pattern was present, but
how much of it. The digital heritage of research in this field still
has a strong influence, but the adaptive neural net with back-
propagation (crude reorganization) is now beginning to approach the
sort of operation I have always imagined to be necessary to account
for the levels of non-template-driven perception.
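To illustrate what an intermediate level adds, here is a hand-wired two-level net of threshold units computing exclusive-or. XOR is a standard example of a pattern that no single-layer Perceptron can detect, because it is not linearly separable. This is a minimal Python sketch. The weights are set by hand rather than learned by back-propagation, and XOR stands in for the more complex patterns the text mentions. The two intermediate signals happen to have a meaning of their own here: "at least one input on" and "both inputs on."

```python
def step(total, threshold):
    return 1 if total > threshold else 0

def xor_net(a, b):
    # Intermediate level: two on-off signals computed from the inputs.
    h_or = step(a + b, 0.5)    # on if at least one input is on
    h_and = step(a + b, 1.5)   # on only if both inputs are on
    # Output level: on when "at least one" holds but "both" does not.
    return step(h_or - h_and, 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```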

The next step in neural net research, I think, will be to realize
that the intermediate levels in a multilevelled system contain
signals having a meaning of their own in relation to the outside
world, and that the complete model of a perceptual neural network
(if it is to be human-like) has to have specific levels corresponding to the
levels at which we naturally see and operate on the world:
Intensities, sensations, configurations, ... or whatever they
finally turn out to be. Right now the levels are arbitrary and are
naturally not recognized as having meaning. But that will change
when someone starts using types of computations at the intermediate
levels beyond simple weighted summation.

So self-reorganizing neural networks are the ultimate answer to the
template error. And, of course, the ultimate justification of


Best to all,

Bill P.