redundancy; bye for a while

[From Bill Powers (960319.1900 MST)]

Look for comet Hyakutake over the next week of evenings in the Northern
Hemisphere. It rises tonight low in the SE at around 10 PM local time
and is climbing, day by day, toward the Big Dipper (Ursa Major). It's an
easy naked-eye object when it's high enough, and great in binoculars.


----------------------------------------------------------------------
Martin Taylor 960319 10:00 --

     Let's at least give it a bit more of a shot. I realize you are
     going "on tour" and won't be able to respond immediately, but here
     goes, anyway.

OK, I'm all packed, so it's worth half an hour.

     This super-physiologist would very soon find that the outputs of
     the different rods and cones were _not_ independent. If he looked
     at two cones near each other--call them A and B, he would find that
     sometimes they gave wildly different outputs, but most of the time
     their outputs were quite similar. He would find this to be true for
     every pair of cones, the more so the closer the pair.

Why is this so? Because objects usually subtend enough of an angle that
most adjacent cones are similarly illuminated.
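
This claim is easy to check in a toy simulation. The sketch below is purely illustrative (the piecewise-constant one-dimensional "world", the sensor count, and the noise level are all arbitrary assumptions): it generates scans of a world of uniform-brightness objects and compares the correlation between neighbouring sensors with the correlation between widely separated ones.

```python
import random

random.seed(0)

def world_scan(n_sensors=500, n_objects=20, noise=0.05):
    """One 'retinal' scan: piecewise-constant brightness (objects) plus
    a little independent sensor noise."""
    edges = sorted(random.sample(range(1, n_sensors), n_objects - 1))
    scan, pos = [], 0
    for edge in edges + [n_sensors]:
        level = random.random()
        scan += [level + random.gauss(0, noise) for _ in range(edge - pos)]
        pos = edge
    return scan

def correlation(xs, ys):
    """Pearson correlation, computed with the standard library only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

near_a, near_b, far_a, far_b = [], [], [], []
for _ in range(200):
    s = world_scan()
    near_a += s[:-1];  near_b += s[1:]     # neighbouring sensors
    far_a += s[:-100]; far_b += s[100:]    # sensors 100 apart

near_corr = correlation(near_a, near_b)
far_corr = correlation(far_a, far_b)
print("neighbouring sensors:", round(near_corr, 2))
print("distant sensors:     ", round(far_corr, 2))
```

Because the objects span many sensors, neighbours agree almost all the time, while sensors far apart fall in different objects and are nearly uncorrelated.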

BUT: suppose you want to discriminate not only the kind of object, but
its position, with as much resolution as the eye permits. If the object
moves by one pixel, whatever the size of a pixel, the difference in
position must be detectable. This means that the position-locating
system needs to be able to discriminate an on-off transition accurate to
one pixel _anywhere on the retina_. Therefore the optical signal must
preserve the _independent_ brightnesses of all the pixels. The fact that
these brightnesses are redundant for some purposes does not mean they
are redundant for all purposes. Redundancy is not an objective fact that
inheres in the inputs.

     So our super-physiologist would soon come to the conclusion that
     although the cone outputs were capable of taking on mutually
     independent values, the world is arranged so that they don't. That
     is another way of saying that the patterns of outputs from the
     retinal cones are "redundant."

To say "they don't" is to exaggerate, if I understand the nature of your
example. When the information from the retina is being used to
discriminate one kind of object from another, the retinal information is
highly redundant, even after the convergence to the optic nerve. But
when it's being used to discriminate edge positions, it is far less
redundant.

There are no objects at that level.

     Correct. Our super-physiologist doesn't say there are. But what
     he does say is that neighbour cones A, B, C, D,... X often have
     much the same value, and that when they don't, the differences are
     almost always between a small number of subgroups of differing
     near-uniform values. Since he observes this to be true, he
     recognizes that he can save on storage for his data by not
     recording the output of every cone individually. Instead, he can
     record the overall average output for all the cones and rods,
     describe the changing shapes of the regions of more or less uniform
     output level, and record the deviation from average of the level of
     each subregion. To get more detail, he might divide each subregion
     further and repeat the strategy, or he might note that there are
     gradations of output across a subregion and note the spatial
     derivative within the subregions, or....or...

This would be a very bad design, because our superphysiologist has
forgotten that it's not just "differences" between regions that count;
it's _where those differences occur_. He's forgotten to allow for cases
in which the full detail is needed. The designer may not have thought of
them, but it's not hard to show that they exist. If you use this
approach, you will have to forego the ability to perceive positions
accurately -- unless your successive subdivisions take you all the way
to the pixel level again.

     The optic nerve is serving the same purpose as the super-
     physiologist's data-collection machinery. And it does not transmit
     the independent output of each rod and cone.

And for that reason, the information it does transmit could not be used
to reconstruct the inputs at the level of 100 million rods and cones.

You're considering only fluctuations in time, and object-sized regions
in which the exact placement of boundaries is immaterial. This means
you're ignoring all the kinds of perceptions in which exact placement is
the whole point. And you're also ignoring the fact that recognition of
any pattern is, strictly speaking, an illusion: real objects are never
exactly the same. You can duplicate the _output_ of a pattern-recognizer
because the recognizer will ignore considerable differences in the input
data sets, even if those differences are neurally represented.

     He will be wrong in his reconstruction of the original sensor data
     if the world changes so that a "never happens" configuration
     actually occurs. And so will we, if the world gets into a kind of
     configuration our ancestors never experienced.

But this is my point. If the inputs were actually redundant, we would be
unable to detect a new pattern. Yet we detect new patterns all the time.
You can look at two toy balls and perceive them as the same
configuration. But all you have to do is look at a lower level, and you
will see that one ball has imperfections that the other doesn't have,
that the colors aren't exactly alike, that the outlines are not equally
circular, and that there are shadings on one that are different from the
other. All that information, fully discriminated, is there in the lower-
level signals. But when we're just interested in configurations, it's
ignored. We could reconstruct the exact sense of configuration if we
wanted to, but only if we filter the details through a configuration
recognizer. Even after this perfect reproduction had been achieved, we
could look at the sensations and intensities and see that we had not
actually duplicated the input set, not even nearly.

     It's the other way round--rather like the difference between PCT
     and S-R. Because the intensity patterns are redundant, therefore it
     is possible _usefully_ to perceive objects and configurations.

You're still saying that the _patterns_ are redundant, which I agree to
-- but the _intensities_ are not redundant. The intensities vary in many
ways which our pattern-recognizers ignore. The patterns are there only
because there's a perceptual function to extract them and represent them
as signals, while ignoring variations in the intensities that are not so
great as to cause a different pattern to be perceived.

     In the example I used, the perceptron was trained on black-and-
     white stripes of various widths and orientations. Its code at the
     waist was assumed to record the width and orientation of the
     stripes and nothing else (two units having continuous output values
     would do it). At the output, it will produce stripes and nothing
     else, no matter what the input. If the input continues to be
     stripes, the output will be exact, no matter how many input sensors
     there are.

You're thinking of mathematically perfect stripes. If you were using
real stripes with a real perceptron, you would find that the photocell
signals never reproduced exactly the same intensity signals, and that
the stripe pattern itself differed in dirtiness, fading, exact
placement, orientation, and so on. Yet the perceptron would still
recognize and reproduce "the same" stripes. The actual input signals are
not redundant because they are never reproduced exactly. Each signal
_does_ vary to some extent independently of all the others. But the
perceptron ignores those small variations, creating an illusion of
redundancy. All it has to do is produce a pattern of stripes that it can
recognize as the same. It does not have to recreate the actual
intensities.

Aren't you assuming that if the pattern-representing signal is
duplicated, the intensities must have been duplicated as well?
-----------------------------------------------------------------------
I am really enjoying the spontaneous interchanges among our educators!
See you all in 5 days.
-----------------------------------------------------------------------
Best to all,

Bill P.

[Martin Taylor 960321 14:00]

Bill Powers (960319.1900 MST)

Our power went off and we had no access to e-mail for a while. I realize
you are away and won't see this until it is buried in a swamp of other
messages, but I'm sending it now anyway.

You have said that you are not interested in finding out about the concept
of redundancy, which is fair enough. It was only at my request that you
considered one more posting. I tried to provide for you some simplified
analogies to make it easier for you to see the underlying concepts, clear
of the complexities faced by real perceiving systems. You chose to nitpick
on the details of the analogies. When I postulate that IF the inputs have
certain characteristics, THEN certain things are possible, then rather than
considering the implications, you choose to say that the inputs wouldn't
have those characteristics. That's no way to try to discover the concepts
the simplifications are trying to expose.

Be that as it may, I persist, in the hope that others may be interested,
even if you are not.

    This super-physiologist would very soon find that the outputs of
    the different rods and cones were _not_ independent. If he looked
    at two cones near each other--call them A and B, he would find that
    sometimes they gave wildly different outputs, but most of the time
    their outputs were quite similar. He would find this to be true for
    every pair of cones, the more so the closer the pair.

    Why is this so? Because objects usually subtend enough of an angle that
    most adjacent cones are similarly illuminated.

Sure, that's presumably why the sensor patterns are redundant. But neither
we perceivers nor the postulated super-physiologist KNOWS that there are
objects out there, objects that constrain the independence of the sensor
outputs. Our postulated super-physiologist knows nothing about what is
being observed by the sensor array he is measuring. He knows only that
certain kinds of pattern occur more often than others, and that he can
take advantage of this fact to reduce his storage requirements very
drastically.

    BUT: suppose you want to discriminate not only the kind of object, but
    its position, with as much resolution as the eye permits.

That's what our super-physiologist wants to do, in effect. He wants to be
sure of being able to recover the value of every sensor output at all times,
to within the resolution of his measuring apparatus. If there are objects
out there, he can find them in his data just as readily as he could in the
original sensor array.

    This means that the position-locating
    system needs to be able to discriminate an on-off transition accurate to
    one pixel _anywhere on the retina_. Therefore the optical signal must
    preserve the _independent_ brightnesses of all the pixels.

Yep. No problem.

    The fact that
    these brightnesses are redundant for some purposes does not mean they
    are redundant for all purposes. Redundancy is not an objective fact that
    inheres in the inputs.

This is the point at which we begin to diverge. Redundancy does not inhere
in the inputs, but in the patterns that actually occur at the inputs. In
the situation we are discussing, the input sensors must be _able_ to respond
independently, and it is an empirical question as to whether what they
sense turns out to be independent across the different sensors. We have
a hidden postulate that what they sense is based on the existence of external
"objects". You chose to make that postulate overt. I don't mind if you
explain the cause of the redundancy. I'm only trying to explain its nature.

My postulated superphysiologist finds that neighbouring sensors give nearly
the same output most of the time, but not always. They _can_ respond
independently, but usually they don't. As you say, I'm thinking that the
sensor array is looking at a world that contains objects, though I didn't
say so, since the super-physiologist doesn't know it.

    When the information from the retina is being used to
    discriminate one kind of object from another, the retinal information is
    highly redundant, even after the convergence to the optic nerve. But
    when it's being used to discriminate edge positions, it is far less
    redundant.

How can it be differently redundant depending on what you want to do with it?

Anyway, one presumes that the edge positions are part of the record that the
super-physiologist uses to allow himself to reconstruct the values of the
outputs of every sensor. At least I said so in my original presentation:

                                             Instead, he can
    record the overall average output for all the cones and rods,
    describe the changing shapes of the regions of more or less uniform
    output level, and record the deviation from average of the level of
    each subregion. To get more detail, he might divide each subregion
    further and repeat the strategy, or he might note that there are
    gradations of output across a subregion and note the spatial
    derivative within the subregions, or....or...

    This would be a very bad design, because our superphysiologist has
    forgotten that it's not just "differences" between regions that count;
    it's _where those differences occur_. He's forgotten to allow for cases
    in which the full detail is needed.

No, he hasn't. He has made explicit provision for it. How can one describe
a region without noting where it is? I suppose it's possible to throw
away such information, but in a situation that is _explicitly_ designed
to allow complete recovery of every individual sensor output, why would
the superphysiologist be so perverse? And more to the point, why would
you have assumed him to be?

Let's be hypothetically numeric here, and postulate that the visual field
contains 1000 objects (and don't quibble that it might be 10,000--that kind
of difference doesn't matter). Each of these objects can be described by
its boundary, which might be representable within less than one pixel by
a combination of lines and quadratic curves, say 100 parameters (to be
extremely generous). Across each object, there may be a gradient of brightness,
say 6 parameters, being again generous. The sensor brightnesses are then
all describable within the resolution of the measuring instrument by
about 106,000 parameters (1000 objects x 106 each) instead of the original
100,000,000 (times, of course,
whatever is necessary to represent the value of a single brightness in each
case).
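
In one dimension the bookkeeping can be made concrete. The sketch below is a hypothetical construction (run-length regions stand in for the boundary-and-gradient description, and all the numbers are arbitrary): it records a piecewise-constant scan as one (start, level) pair per uniform region, then recovers the value of every individual sensor exactly.

```python
import random

def encode_regions(scan):
    """Record one (start_index, level) pair per uniform region
    instead of one value per sensor."""
    regions = [(0, scan[0])]
    for i in range(1, len(scan)):
        if scan[i] != scan[i - 1]:
            regions.append((i, scan[i]))
    return regions

def decode_regions(regions, n_sensors):
    """Recover every individual sensor value from the region description."""
    scan = []
    for j, (start, level) in enumerate(regions):
        end = regions[j + 1][0] if j + 1 < len(regions) else n_sensors
        scan += [level] * (end - start)
    return scan

random.seed(1)
n = 10_000
edges = [0] + sorted(random.sample(range(1, n), 39))   # 40 "objects"
scan = []
for j, start in enumerate(edges):
    end = edges[j + 1] if j + 1 < len(edges) else n
    scan += [round(random.random(), 3)] * (end - start)

code = encode_regions(scan)
assert decode_regions(code, n) == scan                 # lossless recovery
print(len(scan), "sensor values ->", 2 * len(code), "stored numbers")
```

The saving comes entirely from the redundancy of the input patterns: if every sensor varied independently, the region list would be as long as the scan itself.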

    You're considering only fluctuations in time, and object-sized regions
    in which the exact placement of boundaries is immaterial.

False.

    And you're also ignoring the fact that recognition of
    any pattern is, strictly speaking, an illusion: real objects are never
    exactly the same.

We know nothing of "real objects". We know only of sensor outputs. We are
concerned only with patterns that occur in the sensor output array. Anything
to do with "real objects" is an inference (by the superphysiologist)
or a perception (in ordinary organisms). The superphysiologist is concerned
only that he loses no data regarding the value of the output of sensor
number 1,397,254 and all the other sensors. Our perceiving systems are
more tolerant, and in many cases work just as well with approximate data.

    He will be wrong in his reconstruction of the original sensor data
    if the world changes so that a "never happens" configuration
    actually occurs. And so will we, if the world gets into a kind of
    configuration our ancestors never experienced.

    But this is my point. If the inputs were actually redundant, we would be
    unable to detect a new pattern. Yet we detect new patterns all the time.

There are no absolutes about this. The input patterns are redundant if any
of them have a higher probability of occurrence than any of the other
possibilities. You are talking about how redundancy is _used_, not about
whether it exists, even though your language suggests you are talking
about whether it exists.

IF our super-physiologist uses ALL the redundancy in the input patterns
he observed when he was looking for ways to reduce his storage requirements,
he won't be able to see new patterns. He won't correctly reproduce the
data for every sensor, if the world has produced a pattern outside his
experience during the learning period. But it's highly unlikely that he
would be able to use all the redundancy, and if he does not, then certain
kinds of novel pattern will still be OK. What those kinds might be will
depend on how he has used the redundancy. For example, if he has chosen
to record the data in terms of regions and gradients, any pattern that
can be described in those terms will be recordable without data loss. The
superphysiologist will pay a penalty when he records the new pattern, in
that it might be quite complex to record it that way.

    All that information, fully discriminated, is there in the lower-
    level signals. But when we're just interested in configurations, it's
    ignored.

Yes. You hit the nail on the head when you say "But when we're just
interested in X." As soon as you say that, you are saying "But when we
don't care if we discard information..." It is tautological to say that
we lose information when we discard information. And it has nothing to
do with a discussion of coding/recording/perceiving redundant patterns
without data loss.

    You're still saying that the _patterns_ are redundant, which I agree to
    -- but the _intensities_ are not redundant.

Only patterns can ever be redundant. A single number cannot ever be redundant.
The concept does not apply. It applies only to the relations between possible
and actual patterns. Those patterns might be across time or across space.
It doesn't matter. It could be the values of one sensor measured every
millisecond or every year, or it could be the values of a clump of seven
sensors at the same moment, or anything else. But it cannot be the value
of one sensor at one moment.
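
This distinction can be demonstrated with a toy source (my construction, not Martin's; the 90% agreement figure is arbitrary): a single sensor's value is just a number, but a pair of sensors has a distribution of joint patterns, and redundancy is the shortfall of that distribution's entropy from the maximum.

```python
import math
import random
from collections import Counter

random.seed(2)

def entropy_bits(patterns):
    """Empirical entropy (bits) of a list of hashable patterns."""
    counts = Counter(patterns)
    total = len(patterns)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Two binary sensors. Redundancy cannot apply to either value alone; it
# appears only in how the JOINT patterns are spread over the possibilities.
independent = [(random.randint(0, 1), random.randint(0, 1))
               for _ in range(100_000)]
correlated = []
for _ in range(100_000):
    a = random.randint(0, 1)
    b = a if random.random() < 0.9 else 1 - a   # neighbours usually agree
    correlated.append((a, b))

h_ind = entropy_bits(independent)   # all four patterns about equally likely
h_cor = entropy_bits(correlated)    # some patterns much more probable
print("independent pair:", round(h_ind, 2), "bits of a possible 2")
print("correlated pair: ", round(h_cor, 2), "bits ->",
      round(1 - h_cor / 2, 2), "redundant")
```

Both pairs _can_ take all four values; only the probabilities differ, and that difference is the redundancy.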

You shouldn't confuse the notion of "pattern" with the output of a
pattern recognizer. A pattern is whatever set of values happens to occur.
It's a set of numbers, not a single value out of a recognizer. The set
could be an input to a pattern recognizer, but never its output.

    In the example I used, the perceptron was trained on black-and-
    white stripes of various widths and orientations. Its code at the
    waist was assumed to record the width and orientation of the
    stripes and nothing else (two units having continuous output values
    would do it). At the output, it will produce stripes and nothing
    else, no matter what the input. If the input continues to be
    stripes, the output will be exact, no matter how many input sensors
    there are.

    You're thinking of mathematically perfect stripes.

No. I'm _postulating_ mathematically perfect stripes so that you can see
how the notion of redundancy works in a super-simple situation. But rather
than saying "Oh, I see that, but how does it work when the situation is
less precisely defined?", you say:

    If you were using
    real stripes with a real perceptron, you would find that the photocell
    signals never reproduced exactly the same intensity signals, and that
    the stripe pattern itself differed in dirtiness, fading, exact
    placement, orientation, and so on.

Of course. And for exact reproduction those aspects would have to be in
some way represented at the wasp-waist of the perceptron. Not a big deal,
especially if they could be represented in the terms you mention. (And I
did already include orientation and exact placement.)

    The actual input signals are
    not redundant because they are never reproduced exactly.

You see--that's the kind of statement you can make _only_ if you choose
not to consider the concept behind the examples. It's a statement that
strikes me as quite extraordinary, coming from someone like yourself.
I take it that you consider that written English is not redundant?
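
The written-English point goes back to Shannon, and a first-order version of it can be checked with any sample passage (the text below is arbitrary, and single-letter frequencies capture only part of English's redundancy, which Shannon estimated at well over 50% when longer contexts are included):

```python
import math
from collections import Counter

text = ("the patterns of outputs from the retinal cones are redundant "
        "because objects usually subtend enough of an angle that most "
        "adjacent cones are similarly illuminated")

counts = Counter(text)
n = len(text)
h = -sum((c / n) * math.log2(c / n) for c in counts.values())
h_max = math.log2(27)   # 26 letters plus space, if all were equally likely
print(f"first-order entropy: {h:.2f} bits/char, maximum {h_max:.2f}")
print(f"redundancy from letter frequencies alone: {1 - h / h_max:.0%}")
```

Every character _could_ be any of the 27 symbols; the redundancy lies in the fact that they are not equally probable.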

    All it has to do is produce a pattern of stripes that it can
    recognize as the same. It does not have to recreate the actual
    intensities.

The statement was exactly that the reproduction is _of the values at the
sensor inputs_. There is no claim that the device reproduces stripes. The
claim is that _so long as the input is stripes_ the output will reproduce the
input _values_ precisely. When the input is not stripes, the output will be
something different from the input, most probably matching whatever pattern
of stripes would have been "closest" to the actual input.
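
A one-dimensional sketch of this claim (a hypothetical stand-in for the perceptron: brute-force search replaces learned weights, and width/phase replace width/orientation): as long as the input is a stripe pattern, a two-number "waist" code reproduces every sensor value exactly, while any other input comes back as the nearest stripe pattern.

```python
def stripes(n, width, phase):
    """1-D black/white stripe pattern; two numbers generate the whole family."""
    return [((i + phase) // width) % 2 for i in range(n)]

def encode(pattern, n):
    """The 'waist': compress the input to just (width, phase) by brute-force
    search for the closest stripe pattern (a stand-in for training)."""
    return min(
        ((w, p) for w in range(1, n) for p in range(2 * w)),
        key=lambda wp: sum(a != b for a, b in zip(pattern, stripes(n, *wp))),
    )

def decode(code, n):
    """Regenerate all n sensor values from the two-number code."""
    return stripes(n, *code)

n = 60

# So long as the input IS stripes, every sensor value is reproduced exactly:
for width, phase in [(3, 0), (7, 5), (12, 11)]:
    original = stripes(n, width, phase)
    assert decode(encode(original, n), n) == original

# A non-stripe input comes back as whatever stripe pattern is closest:
blob = [0] * 5 + [1] * 8 + [0] * 3 + [1] * 20 + [0] * 24
print("input: ", "".join(map(str, blob)))
print("output:", "".join(map(str, decode(encode(blob, n), n))))
```

The 60 input values are squeezed through two numbers, yet nothing is lost for any input the code's pattern family covers; loss appears only for inputs outside that family.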


---------------------
The point at issue is not really a point about exact reproduction. It's
a point about what perceptual functions make sense to include in a control
hierarchy that works in a real world. The control hierarchy works because
its actions have a somewhat consistent effect on its perceptions. If those
perceptions are values of functions that relate to frequent patterns that
the sensors provide, control will work better than if the perceptions are
values of functions relating to patterns that almost never occur. In the
latter case, the values of the perceptions will be near zero, almost
regardless of the control actions. The perceptions won't occur, and
reorganization will not allow those perceptions to be controlled. The
perceptual functions that persist are those for which actions can, and
probably will, make a difference. And that depends on the perceptual
functions taking advantage of the redundancies of their inputs at _all_
levels of the hierarchy.

Since the simplest possible form of the hierarchy involves the perceptual
functions being laid out _exactly_ as in an MLP, any higher perceptual
function has (as Shannon said) a neural network learning array as a component.

The wasp-waist "replicator" network is an existence proof (in the sense
that such networks really exist and function as they should) that low
dimensional coding (i.e. the perception of objects, relations, configurations
and so forth) can work without losing _much_ information about the
high-dimensional world in which they function. It is the redundancy
afforded by such things as:

    Because objects usually subtend enough of an angle that
    most adjacent cones are similarly illuminated.

that makes this possible.

---------------------

Going way, way back, we seem to have a conceptual impasse on the notion
that if something is not known _exactly_, it is not known at all. I hope
this isn't at the root of your unwillingness to consider seriously the
concept of redundancy. You are happy enough with control that reduces the
effects of disturbance by a factor of 100. Why not with prediction or
reproduction that similarly leaves 1% of the error that would otherwise
have been there?

Martin