[Martin Taylor 960329 12:00]
Bill Powers (950327.0630 MST)
These statements are somewhat artificial, in that the words are chosen
to have theoretical significance rather than being actual descriptions
of the process and experience. When I drive on icy roads, it's not the
unpredictability or lack of control that I imagine, it's trying to go
around a curve at my present speed and sliding -- quite predictably --
off the road.
You have a habit that is a useful debating trick, but unhelpful in getting
to an understanding of situations: you change the presuppositions and then
announce that the conclusions are invalid for the newly presupposed
situation. It happens a lot, and makes things difficult when one is
attempting to work through what may be a language misunderstanding or may
be a real difference of interpretation in which one or the other party could
possibly be wrong.
Yes, if you see ice on a road and are going fast enough that you would slide
off at a curve, your imagined perception of the future has little uncertainty.
Unfortunately, that wasn't the situation proposed. What was proposed is that
you are driving along and see ahead of you a snow-blown section of road.
For the benefit of those not accustomed to winter driving, what this usually
means is that there is a possibility of underlying ice, on which there is
very little traction. There is also the possibility of underlying bare
road surface, on which the traction is almost normal, and there are likely
to be patches that are really bare, often along the track of the wheels of
most cars.
When you encounter such a stretch ahead of you, you simply don't
know what will happen when you turn the steering wheel. The car may react
normally, or it may not. You may be disturbed by icy bumps, or you may not.
It is prudent to slow down, not because you "know" you will slide off the
road if you don't (normally you won't if the road is straight and the
camber not too severe, even if there is a patch of glare ice), but because
you don't know whether you will be able to compensate for the effects of
an ice-bump or a cross-wind, or even a change in your own reference
perception for where you want to be within your lane. At a slower speed,
control is possible despite the foreseen unpredictability of the environmental
feedback path (steering wheel to perceived track deviation), whereas at a
higher speed the changes in the environmental feedback path make control
much more difficult: you start to slide, the steering making no difference;
then you hit a patch of good traction while your front wheels are off-centre
and the car turns abruptly; then you slide some more, sideways... You
control for a perception of future controllability. (Of course, as with any
controlled perception, you may not be able to bring this one to its
reference value, but you try to).
I'm simply controlling a perceived outcome as usual, real or imagined.
With that, I agree. But what is it the outcome of? In the originally
postulated situation, the imagined perceived outcome is that I will be
able to control the car's direction, and at a lower level the imagined
perceived outcome is that the car stays in its lane where I want it.
Actually, what I really imagine seems even less like planning: I imagine
something like a feeling of the tires "gripping" the road, with ice
considerably reducing my sense of the connection between tires and road.
This is much like Hans Blom's model-based control, but without the
calculations of variance that he uses.
That "without the calculations of variance" is a quite important difference,
isn't it? Whether you are talking about his analytic theoretical structure,
or the real-world situation originally proposed.
I might classify sliding off the road as an example of
"unpredictability" or "uncertainty," except that what really bothers me
about going too fast is the clearly predicted outcome. Also, this kind
of supposed unpredictability is not at all like the kind that arises
from a loose steering linkage, which really would make me uncertain
about how the car will behave when I turn the steering wheel.
Well, perhaps you might present an analysis of the control you might do
if you foresaw a near-future loosening of the steering linkage, about
which you might (or might not) be able to do something. What would you
do about it, and why?
For my part, I think that getting onto this kind of snow-blown road (what
our road reports call "centre-bare to snow-covered") is quite like driving
with an erratically loose steering linkage about which I can do nothing.
Are we talking about a conclusion drawn by an observer of a behaving
system, or about an actual process carried out in a behaving system?
An actual process--hypothetical, of course, since we can't state as a fact
that perceptual control is a hierarchic process.
------------------
By "pattern" I mean something like { 1, 5, 23, 6, 15, 2, -17 }, a
set of numbers that represent the values of a variable at many
points. A pattern doesn't need a pattern recognizer.
An interesting statement. What I see in the brackets are seven numbers.
Yes, that's the pattern.
How do you know there's a pattern there without perceiving it?
I perceive it. It is { 1, 5, 23, 6, 15, 2, -17 }.
I have
written down another seven numbers: can you tell me what the pattern in
them is? Or would you need to perceive them before telling me?
Of course I don't have a telepathic perceptual input medium, so you would
have to tell me. Here are some more patterns, all different:
1, 4, 15, 6, -2
156, 198, 23, -143
4, 8, 5, 15, 20, 97
143, -176, 34, -29 , 2
....
These are patterns that might be the values of five input sensors. They are
a small subset of the possible patterns within the range of these sensors,
which apparently can provide outputs at least over a range of about +-200.
If that is actually the available range, there are, at integer resolution,
400^5 possible patterns of values that could be observed from these 5 sensors.
The statement was exactly that the reproduction is _of the values
at the sensor inputs_....
It seems to be very hard for you to believe that I mean this. I
don't know why that should be. I do mean it.
I do find it hard to believe you can mean this, for a real system. If
you're just talking about a mathematical system I can believe it,
Well, that's an advance. We can progress from there. I used a mathematically
exact system as a simplified illustration, as I believe I said on each
occasion I introduced it.
In the mathematically exact system, every input value can be reconstructed
_exactly_ from the coded values of a far smaller number of items than
there are of inputs.
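To make that concrete, here is a tiny sketch (the particular encoding is
invented for illustration here, not the exact example discussed earlier):
suppose five inputs are guaranteed always to have the form (a, a, a, b, b).
Then two coded values reconstruct all five inputs exactly.

    # Sketch only: a hypothetical "mathematically exact" system in which the
    # five inputs are guaranteed to have the form (a, a, a, b, b).  Two coded
    # values then reconstruct all five input values exactly.
    def encode(inputs):
        return (inputs[0], inputs[3])      # keep one value from each run

    def decode(code):
        a, b = code
        return [a, a, a, b, b]

    inputs = [7, 7, 7, -2, -2]
    assert decode(encode(inputs)) == inputs    # exact reconstruction

Two numbers instead of five, and nothing whatever is lost, because the inputs
were guaranteed to be redundant in that particular way.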
Now let's think more about "real" (actually, unreal, but somewhat relaxed
from the previous example). Let's take our five sensors whose possible
patterns I illustrated above, and list a few patterns that they might output
in a "real" hypothetical example:
3.1, 3.1, 1.6, 1.6, 1.6
-45.21, -45.21, -45.21, 27.07, 27.07
178, -97, -97, -97, -97
-143, -143, -143, -143, 56
35.3, 35.3, 35.3, 35.3, 35.3
...
The values are meant to be read as lying on a "real" continuum, even though
they have to be written down as discrete numerals.
Do you notice anything particular about these patterns? They are constructed
so that each of them consists of two contiguous runs, and within each run
all the values are the same. The patterns seem to be "structured". If that
result, with the same structure, were to hold over many observations,
our "super-physiologist" might say to himself that it is silly to record
all the values. Instead, he might record the location of the break, and
the values on the left and the right of the break, as follows:
2, 3.1, 1.6
3, -45.21, 27.07
1, 178, -97
4, -143, 56
0, 35.3, 35.3
...
Now he uses only three numbers, and one of those is always a single digit,
whereas the others are as precise as the original measurements. (There's
always a limit to the precision of a measurement).
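Here is a sketch of that three-number code for the example patterns above
(Python, with the break recorded as the length of the left-hand run and 0
meaning no break, as in the list):

    # The super-physiologist's three-number code: (break position, left value,
    # right value).  Break position 0 means the whole pattern is one run.
    def encode(pattern):
        for i in range(1, len(pattern)):
            if pattern[i] != pattern[i - 1]:
                return (i, pattern[0], pattern[i])     # run of length i on the left
        return (0, pattern[0], pattern[0])             # no break: all values equal

    def decode(code, n=5):
        brk, left, right = code
        if brk == 0:
            return [left] * n
        return [left] * brk + [right] * (n - brk)

    p = [-45.21, -45.21, -45.21, 27.07, 27.07]
    assert encode(p) == (3, -45.21, 27.07)
    assert decode(encode(p)) == p                      # perfect reconstruction

The code works only so long as every pattern really does consist of one or two
constant runs; the paragraphs below deal with what happens when that
assumption fails.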
How might such artificial patterns, or something like them, arise in "realer"
life? Imagine that these are five sensor outputs, measuring the brightness of
a strip of a scene. In the scene are patches of constant brightness, and
sometimes the boundary between two patches crosses the strip (only once
in the examples does it fail to do so). Our super-physiologist knows nothing
of such patches, and I mention it only to suggest an ordinary phenomenon
that might lead to this kind of number relationship.
In such a relaxed (but still "mathematical") hypothetical environment, not
all of the 40000^5 possible patterns will occur (assuming the measurement
resolution is 2 decimals and the range is +-200 as before). Of course, any
one of them _could_ occur, based on the structure of the sensor system,
which is known to the super-physiologist who built the sensors and knows
there is no necessary relation among their values. But as he continues to
observe more and more patterns as they occur, he decides that there are
only 5 * 40000^2 patterns that he is ever likely to see, rather than
40000^5. That's a drastic reduction.
If he is sure enough that the other patterns will never happen, he may
resort to the three number code, as suggested above, saving a lot of storage
space (almost 60%). But one of the other "irregular" patterns might happen,
and he doesn't want to miss it, so he might use for it a separate code,
say a "-1" followed by the actual sensor values. If the irregular value
never happens, he hasn't lost much by providing for it, and if it happens
but seldom, he's still made a big gain in storage space. If all the values
turn out to be irregular, he's lost space by using 6 numbers rather than
5 in his storage. But that's what you pay for attempts at efficiency.
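A sketch of that escape-coded version (again Python; the one-in-a-hundred
figure below is an invented number, used only to show the accounting):

    # Escape-coded version: regular patterns (at most one break) get the
    # three-number code; anything else gets a -1 flag followed by the raw values.
    def encode_with_escape(pattern):
        changes = [i for i in range(1, len(pattern)) if pattern[i] != pattern[i - 1]]
        if len(changes) == 0:
            return [0, pattern[0], pattern[0]]
        if len(changes) == 1:
            i = changes[0]
            return [i, pattern[0], pattern[i]]
        return [-1] + list(pattern)                    # irregular: store everything

    def decode_with_escape(code, n=5):
        if code[0] == -1:
            return list(code[1:])
        brk, left, right = code
        return [left] * n if brk == 0 else [left] * brk + [right] * (n - brk)

    assert decode_with_escape(encode_with_escape([178, -97, -97, -97, -97])) == [178, -97, -97, -97, -97]
    assert decode_with_escape(encode_with_escape([4, 8, 5, 15, 20])) == [4, 8, 5, 15, 20]

Regular patterns cost 3 numbers, irregular ones 6. If, say, one pattern in a
hundred turns out to be irregular, the average cost is 0.99*3 + 0.01*6 = 3.03
numbers per observation instead of 5; if every pattern is irregular, the cost
is 6, the penalty mentioned above.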
However the later sensor patterns turn out, our super-physiologist
can always find the value of any sensor on any observation, and if it
turns out that the later value patterns are structured like the ones he
earlier observed, he has saved 60% of his storage space. In real life,
the savings will be many orders of magnitude greater, because the number
of sensors will be far more than 5 and patches of more-or-less constant
brightness are bigger.
Those words "more or less" are critical when we talk about "exact"
representation. No sensor is exact in giving the same result when the
same real-world condition occurs--at least we would never know if one were.
There is a limit to the repeatability and the resolution of any sensor,
so we cannot interact through perceptual control with an _exact_ real
world. The real world is what it is, but what we know of it is limited
by the repeatability and resolution of our sensor systems. This leads
to some interesting issues, analytically, physiologically, and practically.
The set of numbers above, for a real system, would never repeat: on a
second occurrence of "the same" pattern we might have {1.02, 4.87, 22,
6.15, 15.5, 2.1, and -12} -- and the pattern recognizer would report
that the same pattern as before was present.
(The "pattern as before" was { 1, 5, 23, 6, 15, 2, -17 }). Whether a
"pattern recognizer" would report the same pattern as before depends on
the recognizer. A system dedicated to the exact reproduction of the data
values would not. Let's consider an example pattern recognizer, and for
lack of a better, let's consider the recognizer {1, 1, 1, 1, 1, 1, 1}.
(A "pattern recognizer" is a device that looks for the magnitude of its
unique pattern in the input data pattern). The output of this pattern
recognizer if the original pattern was input would be +40, and for the
second occurrence it would be +37.6. So it wouldn't give the same result
for the two inputs.
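In Python, with "looks for the magnitude of its unique pattern" read as an
inner product (my reading of the phrase, made explicit):

    # A "pattern recognizer" read as an inner product of its weight vector
    # with the input data pattern.
    def recognize(weights, pattern):
        return sum(w * p for w, p in zip(weights, pattern))

    all_ones = [1, 1, 1, 1, 1, 1, 1]
    first    = [1, 5, 23, 6, 15, 2, -17]
    second   = [1.02, 4.87, 22, 6.15, 15.5, 2.1, -12]
    print(recognize(all_ones, first))      # 35
    print(recognize(all_ones, second))     # 39.64 (up to floating-point rounding)

Different inputs, different outputs, even from this crudest of recognizers.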
Now we consider many "pattern recognizers" (seven would be enough for this
particular set of input sensors). We could have the recognizer (1, 1, 1,
0, -1, -1, -1), and the recognizer (1, -1, 1, -1, 1, -1, 1) ... for
example. Looking at the outputs of all of them would be enough to permit
reconstruction of the input data set to within the precision of the sensor
system (on the assumption that the outputs of the pattern recognizers are
available to the same resolution). In other words, the reconstruction would
be indistinguishable from the original.
Next, we look at the outputs of these "pattern recognizers" and notice that
some of them never change their outputs enough to be resolvably different,
simply because the sets of input patterns that actually happen are all
indistinguishable to those particular recognizers. Why bother with having
those recognizers? Let's get rid of them, because we can reconstitute the
original data just as precisely without them as with them. We can do so if
the input data patterns are redundant and if the set of pattern recognizers
is chosen so as to take advantage of the redundancy.
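Here is a sketch of those two steps together. The particular weight vectors
are invented only so as to be linearly independent (the all-ones recognizer
plus adjacent-difference recognizers); any seven independent recognizers
would serve.

    # Seven linearly independent "recognizers" let us reconstruct all seven
    # input values from the seven recognizer outputs.  If the inputs are
    # redundant, some outputs never vary resolvably and can be replaced by
    # constants without hurting the reconstruction.
    import numpy as np

    R = np.array([
        [1,  1,  1,  1,  1,  1,  1],     # the all-ones recognizer above
        [1, -1,  0,  0,  0,  0,  0],     # adjacent-difference recognizers
        [0,  1, -1,  0,  0,  0,  0],     # (chosen only for independence)
        [0,  0,  1, -1,  0,  0,  0],
        [0,  0,  0,  1, -1,  0,  0],
        [0,  0,  0,  0,  1, -1,  0],
        [0,  0,  0,  0,  0,  1, -1],
    ], dtype=float)

    x = np.array([1, 5, 23, 6, 15, 2, -17], dtype=float)
    outputs = R @ x                          # the seven recognizer outputs
    reconstruction = np.linalg.solve(R, outputs)
    assert np.allclose(reconstruction, x)    # inputs recovered exactly

    # If, over all the patterns that actually occur, one of these outputs is
    # effectively constant, it can be dropped and replaced by its constant
    # value when reconstructing -- that is how redundancy in the inputs lets
    # us get away with fewer recognizers than sensors.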
The objective of the wasp-waisted perceptron is to find a set of recognizers
that allows the idealization of the previous paragraph to be as closely
approximated as possible. There is a criterion for "closeness," perhaps
a mean-square error, perhaps something else. Learning brings the representation
to a state where the values at the waist give as good an approximate
reproduction as possible, within that criterion. If there are enough values
at the waist, the reproduction will be indistinguishable from the original.
And, given the structure of the everyday world, "enough values" is in
practice a lot fewer than the number of input values.
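For the purely linear case there is a closed-form version of this that makes
the idea concrete without any learning at all: the best mean-square linear
"waist" is given by the leading principal components of the observed
patterns. This is not the wasp-waisted perceptron itself, only the linear
analogue of what it is trying to learn, and the data below are invented to
be redundant.

    # Linear sketch of the "waist": project redundant 5-sensor patterns onto
    # their two leading principal components and reconstruct from those two
    # numbers.  (Not the perceptron itself; just the linear analogue.)
    import numpy as np

    rng = np.random.default_rng(0)
    underlying = rng.normal(size=(1000, 2))          # 2 real degrees of freedom
    mixing = rng.normal(size=(2, 5))
    data = underlying @ mixing + 0.01 * rng.normal(size=(1000, 5))   # 5 sensors

    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    waist_axes = vt[:2]                              # the 2-value waist
    codes = centered @ waist_axes.T                  # 2 numbers per pattern
    reconstruction = codes @ waist_axes + data.mean(axis=0)

    print(np.max(np.abs(reconstruction - data)))     # on the order of the sensor noise

Two numbers at the waist, five at the sensors, and the reproduction is
indistinguishable from the original to within the precision of the
(simulated) sensors.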
There's another problem. Suppose your system uses monocular vision.
Clearly, there can be many states of the three-dimensional world that
will lead to the same two-dimensional view.
So what? All that the monocular view can use is the sensors in one eye.
Redundancies in the values there can be used to encode that view, and one
cannot ask for more. The one eye (at a given moment) has nothing to do
with 3-D views.
However, when you add the view from the second eye, what it sees is highly
similar to what the first eye sees. The joint view is much more redundant
than either monocular view. There are twice as many sensors, but only a
little that distinguishes the pattern in one eye from that in the other.
If there is a light-dark boundary _here_ in one eye, there will almost
certainly be a light-dark boundary near _here_ in the other. And that
eye-to-eye redundancy is what allows us to extract a 3-D view from two
2-D views (without the redundancy, we would only have two 2-D views, not
one 3-D view).
No matter how redundant the
inputs may seem to the two-dimensional input function, there is no way
to reproduce the inputs to this perceptual function exactly (or even
approximately) on the basis of the two-dimensional representation.
I take it that "this" perceptual function is the 3-D one into which the
two 2-D views feed. If so, it's quite irrelevant to how the 2-D views
might be represented individually. The redundancy of the 2-D view
has to do with the sensors that contribute to it. It knows nothing
of the 3-D world. Certainly its representation is not sufficient in itself
to reconstruct the 3-D view, and I can't imagine why you introduced it
into the discussion.
If
the stripes at the input are actually at different distances from the
input function, the reproduction of the _appearance_ of the stripes can
never recapture the distances of the stripes, no matter how compact the
representation.
So? Nobody is talking about the "real world" in which this distance may
mean something. When we talk of the 2-D view, we are representing the 2-D
view, no more, no less.
When you claim that the wasp-waisted perceptron can be used as the basis
for recreating the _inputs_ to the perceptron, aren't you tacitly
assuming that the dimensionality of the environment is less than or
equal to that of the set of input sensors?
It's the other way around. If, as happens to be true in a lot of real cases,
the wasp-waisted perceptron provides at its outputs a high-fidelity
reproduction of its inputs, then it is true that the dimensionality of
the environment is less than that of its sensors. Building a wwp shows
faith that this will turn out to be the case. Success is a justification
for the faith.
Another way of saying "redundant" is to say that the true dimensionality
of the data is less than the dimensionality of its current description.
In the visual system, the dimensionality of the sensor array is about 10^8,
whereas in the optic nerve it is around 10^6. Since we survive pretty well
on that, it is probably fair to say that the dimensionality of the visual
world is, for practical purposes, less than 10^6.
------------------
There must be some conditions that you're assuming which aren't being
mentioned.
I think you are looking for complexity in a simple situation. Seeking
complexity where it doesn't exist is often at the root of misunderstandings.
I remember how confusing it was for me when I first encountered PCT, and
could not bring myself to believe that the "control systems" you were
talking of were ordinary, simple-minded, engineering control systems.
It is as difficult to clear the mind of complexity as it is to build the
complexity in the first place.
Try thinking first of the accuracy that a good control system attains in
the face of random disturbances--say 1%. Then ask yourself whether an
internal representation that allowed for reconstruction of the input data
to within 0.01% would make a detectable difference in the quality of control.
Then ask how important is the difference between "exact" reproduction and
"indistinguishably different" reproduction.
Martin