Tracking at the configuration level

[Richard Kennaway (970716,1221 BST)]

According to the discussion in B:CT of configuration perceptions (and
perceptions at all other levels, for that matter), perceptions are
constructed from the bottom-level sense data by multi-level perceptron-like
networks, the perceptions of each level being combined in various ways to
produce the perceptions of the next level. They are controlled by the
output functions which propagate down to the reference signals of the level
below, and at the bottom to muscle actions.
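The arrangement described above can be sketched as a minimal two-level simulation. Higher-level perceptions are functions of lower-level ones, and higher-level outputs become lower-level reference signals. This is an illustration under stated assumptions, not a model taken from B:CT. The identity lower perceptual functions, the difference used as the higher perceptual function, the gains, and the function name `simulate_hierarchy` are all hypothetical choices.

```python
def simulate_hierarchy(r_high, disturbance=(0.0, 0.0), steps=2000, dt=0.01,
                       k_low=50.0, k_high=5.0):
    """Two lower control systems whose references are set by one higher system.

    Hypothetical sketch: returns the higher-level perception at the end of the run.
    """
    o_low = [0.0, 0.0]  # lower output functions (integrators)
    o_high = 0.0        # higher output function (integrator)
    for _ in range(steps):
        # Environment: each variable is a lower output plus a constant disturbance.
        x = [o_low[i] + disturbance[i] for i in range(2)]
        p_low = x                      # lower perceptual functions: identity (assumed)
        p_high = p_low[0] - p_low[1]   # higher perceptual function: a difference (assumed)
        o_high += k_high * (r_high - p_high) * dt
        # The higher output is distributed as the lower-level reference signals.
        r_low = [0.5 * o_high, -0.5 * o_high]
        for i in range(2):
            o_low[i] += k_low * (r_low[i] - p_low[i]) * dt
    x = [o_low[i] + disturbance[i] for i in range(2)]
    return x[0] - x[1]
```

Calling `simulate_hierarchy(2.0, disturbance=(0.7, -0.3))` should return a value very close to 2.0. The lower systems cancel the disturbances, and the higher system brings its perception to its reference only by adjusting the lower references, never by acting on the environment directly.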

I am wondering how, within this theory, one can account for the following
phenomenon. I can display on my computer a rendering of a rotating
semi-transparent cube. Due to a Necker-like effect, I can perceive the
cube as it "really" is, or inverted back-to-front and rotating in the
opposite direction (and distorted, since it is displayed in perspective).
I can switch from either perception to the other at will, but each
perception, once obtained, tends to persist.

My muscles appear to be doing little more than keeping the screen in front
of my eyes. When I make the switch of perception, I notice myself
attending (and moving my eyes) to a small part of the cube, and "trying" to
see it e.g. not as the front face rotating to the right, but as the back
face rotating to the left.

It is not clear to me how to account for these phenomena in terms of the
mechanisms described in B:CT.

-- Richard Kennaway, jrk@sys.uea.ac.uk, http://www.sys.uea.ac.uk/~jrk/
   School of Information Systems, Univ. of East Anglia, Norwich, U.K.

[Hans Blom, 970716b]

(Richard Kennaway (970716,1221 BST))

I can display on my computer a rendering of a rotating semi-
transparent cube. Due to a Necker-like effect, I can perceive the
cube as it "really" is, or inverted back-to-front and rotating in
the opposite direction (and distorted, since it is displayed in
perspective).

Actually, you present only a two-dimensional picture (projection) of
a three-dimensional object on the screen. So it is impossible to
perceive the cube as it "really" is: a full dimension is missing. The
best you can do is to _induce_ what it "really" is. Induction,
however, is a fallible process; it is "going beyond the data".

I can switch from either perception to the other at will, but each
perception, once obtained, tends to persist.

Our perceptions have the basic property that they try to make sense
of what we see -- in order to guide us in how we act. It is very
important for us to know what we're dealing with even though we may
only see part of it. To quote psychologist Vroon: "A vague shimmering
yellow blur may be perceived as a banana by a hungry traveler". Once
we have attributed some (most likely) meaning to the perception, that
meaning tends to stick, and we act on it. Interpretations have value
only if they are more or less definite, not if there is no
convergence in "meaning": we would not be able to act on conflicting
hunches.

That you can switch "at will" means, I think, that you already had
the a priori information that there is no "preferred" meaning. A
naive observer would readily converge to one interpretation, and may
even be quite surprised when she suddenly sees "something else".

When I make the switch of perception, I notice myself attending (and
moving my eyes) to a small part of the cube, and "trying" to see it
e.g. not as the front face rotating to the right, but as the back
face rotating to the left.

Attention is a term for the fact that some of an object's features
are more important than others in the process of recognition or
assigning of meaning. How attention is directed is quite mysterious.
Although we can consciously direct our attention (to a small degree),
most of this process seems to proceed at levels that are not
consciously accessible. The exception is unanticipated and very
energetic perceptions: a loud noise or an intense light flash forces us
to attend to it at once.

It is not clear to me how to account for these phenomena in terms of
the mechanisms described in B:CT.

B:CT talks about a level where "categorization" takes place. This is,
I think, akin to recognition or attribution of meaning. I call this
process "model building" or "model maintenance": given a set of
perceptions, induce what they most likely tell us about (changes in)
the "world". With the implied assumption that the better the model,
the better the quality of control.

My two cents ;-).

Greetings,

Hans

[From Bill Powers (970716.-0613 MDT)]

Richard Kennaway (970716,1221 BST)--

According to the discussion in B:CT of configuration perceptions (and
perceptions at all other levels, for that matter), perceptions are
constructed from the bottom-level sense data by multi-level
perceptron-like networks, the perceptions of each level being combined in
various ways to produce the perceptions of the next level. They are
controlled by the output functions which propagate down to the reference
signals of the level below, and at the bottom to muscle actions.

That's the right general picture. However, the "perceptron-like" aspect is
only a potential explanation (due mainly to Martin Taylor, although it's
mentioned in B:CP) for how the required perceptual functions might come
into being. I am far less concerned with the genesis of the perceptual
functions than with what their nature might be. The basic problem is to
account for the world that we experience, and for the relationships among
its parts that are not due to properties of the environment.

For example, why is it that a perception of a configuration seems to depend
on the presence of perceptions that are not themselves configurations? I
refer to such things as colors, edges, textures, and gradients, which are
themselves clearly perceptions of a different kind. In a similar way, how
is it that a sense of motion can derive from a series of configuration
perceptions? How do we get a sense of one unitary event from a pattern of
movements and configurations and sensations and intensities -- an event
like a spoken word or the "bounce" of a ball?

For me, the primary observation is that experience seems to be made of
different types of perceptions, some of which are functions of others. The
next question would be what the nature of the functions is for each
transition from one kind of perception to the next. The next level of
question is the extent to which these functions are hard-wired, versus
modifiable through learning (or, a subject to which I have devoted almost
no attention, modifiable through the actions of higher systems). And far
down the line is the question of how, if these perceptual functions are
learned, the learning process works. The perceptron proposal is only one
possibility, and any such proposal must meet the constraint that it produce
a world of perceptual experience structured like the one we do actually
experience.

I am wondering how, within this theory, one can account for the following
phenomenon. I can display on my computer a rendering of a rotating
semi-transparent cube. Due to a Necker-like effect, I can perceive the
cube as it "really" is, or inverted back-to-front and rotating in the
opposite direction (and distorted, since it is displayed in perspective).
I can switch from either perception to the other at will, but each
perception, once obtained, tends to persist.

There is a small class of phenomena like this, in which the visual
presentation is ambiguous. Figure-ground reversals are similar, in that
either of two perceptual interpretations is possible, yet they seem to be
mutually exclusive. These effects seem to occur when normal depth
information is missing -- when there's a flat display of what would
normally be a three-dimensional scene, so binocular vision is no help.

One way to approach this problem would be to ask how the perceptual systems
might work when 3-D information _is_ available. If you see a pair of faces
looking at each other in three dimensions, you would be very unlikely to
notice that the space between them has the shape of a vase, because the
background between them is clearly in a different plane; the faces are
distinguished from the background -- literally, the "back" ground -- by
depth information.

The Pathfinder web page has published some 3-D pictures of Martian scenes
that can be viewed through red-blue spectacles. I happened to have a pair
obtained from Sky and Telescope some time ago, and they work very well.
What's interesting is that the depth information thus obtained is really
quite coarse and fuzzed-out, yet it adds just the depth signal that is
needed to make the objects appear to be at different distances. With that
hint, I noticed that when I take my glasses off, I still get good depth
information even though my left eye is far more myopic than my right eye,
so there's no possibility of getting sharp registration of the two images
for objects at any distance. Depth signals are clearly generated separately
from lateral position signals.

When looking at a flat scene with both eyes, the binocular depth signal is
missing -- or, considering the effects you're talking about, if it is
present it has to be imagined (B:CP, p. 222). The ambiguous scenes allow of
imagining two different depth signals, which create opposite impressions.
Since a scalar depth signal can have only one value at a time, the
alternative interpretations are mutually exclusive; you can imagine one
element of the rotating cube as near or as far, but not as both near and
far. The depth signal associated with each area of a scene can have only
one magnitude at a time: it's a scalar. With normal binocular vision, each
element would provide perceptions of lateral position and depth which fix
the element unambiguously in space, but when you're imagining the missing
information, you have a choice.
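The claim that a flat display leaves the depth signal undetermined can be checked geometrically. The sketch below assumes orthographic projection. The display in the original example used perspective, which is why the inverted cube looks distorted; under perspective the two views are no longer exactly identical. It shows that a cube rotating one way and its depth-mirrored twin rotating the other way put every vertex at the same lateral position on the screen, while each vertex has the opposite depth. The helper names are hypothetical.

```python
import math

# Vertices of a cube centred on the origin (hypothetical example object).
CUBE = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]

def rotate_y(p, angle):
    """Rotate point p about the vertical (y) axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def project(p):
    """Orthographic projection (assumed): keep lateral position, discard depth."""
    return (p[0], p[1])

def two_interpretations(points, omega, t):
    """Return the 'real' cube at time t and a depth-inverted cube spinning the other way."""
    real = [rotate_y(p, omega * t) for p in points]
    inverted = [rotate_y((x, y, -z), -omega * t) for (x, y, z) in points]
    return real, inverted

if __name__ == "__main__":
    real, inverted = two_interpretations(CUBE, omega=1.0, t=0.7)
    for a, b in zip(real, inverted):
        print(project(a), project(b), "depths:", round(a[2], 3), round(b[2], 3))
```

Because the screen keeps only the first two coordinates, nothing in the image fixes the sign of the depth coordinate; that sign is the choice described above as being supplied by imagination.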

The Great Randy, a magician I knew for some years and who was enthusiastic
about PCT, showed me a series of optical effects he had constructed, much
like the "magic" pictures with repeating patterns that turn into 3-D scenes
when viewed with the eyes converged or diverged by the right amount.
Randy's scenes were geometrical patterns, including one that was the
mother of all staircase/cube illusions. When viewed with the eyes diverged,
the entire frame was filled with ambiguous connected rectangles sharing
edges and vertices, perhaps 50 of them, all forming inside or outside
corners and planes. The most interesting aspect of it was that as one
scanned over it, various parts of it would pop inward or outward as the
local "depth hypothesis" spontaneously changed. Somehow Randy had managed
to build ambiguity into this figure even though there was binocular depth
information available. I never did figure it out, and Randy couldn't
explain it.

Another example of Randy's extreme ingenuity was a picture constructed like
a repeating pattern of that "impossible figure" with three tines, in which
the solid bars making it up were connected at the wrong corners so when you
followed along any one bar, it would suddenly have to be interpreted as
pointing in a different direction (like the Escher drawings). In Randy's
drawing there were dozens of these things, all interconnected; it was
exhausting to look at for very long. Some kind of inner effort is obviously
needed to supply the missing information, or correct conflicting
information, and one can feel the expenditure of energy needed to exert it.

None of this, of course, tells us what the perceptual functions are that
produce these signals, but one can sense some strong hints. Where there is
missing information, we obviously supply it from inside, and the "effort of
will" obviously does something to alter this information. We decide what we
want to see, and via imagination the missing information is altered until
we see what we want to see.

It is not clear to me how to account for these phenomena in terms of the
mechanisms described in B:CT.

I've tried to indicate how imagination might fit into the picture, but all
this is still very speculative. If we knew how objects were perceived, so
they retain their identity as they move among, in front of, and behind
other objects, even while changing in size, orientation, and illumination,
we would be much further along.

It's interesting to set up a control-system experiment using one of these
ambiguous figures and a mouse to affect it. A simple one that I have tried
is a circle rotating around one diameter. If you see the circle as rotating
one way, you can control its angle, but if your interpretation flips, the
feedback becomes positive and the control system runs away for a moment
before you can switch back or reverse your output function. I think that
embedding these perceptual effects into control experiments is a good way
to investigate them; for one thing, you can easily detect a change in the
perception inside another person! The action gives it away.
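The mouse experiment with a circle rotating about a diameter can be mimicked in a few lines. In this minimal sketch, the controller perceives the displayed angle with a sign of +1 when its interpretation matches the display and -1 after a flip. This is one simple way, assumed here, to represent the reversal of apparent rotation. All names, gains, and the disturbance value are hypothetical. The same integrating output then produces negative feedback before the flip and positive feedback, and runaway, after it.

```python
def mouse_tracking(flip_at=None, steps=400, dt=0.01, gain=5.0,
                   reference=0.5, disturbance=0.2):
    """Simulate controlling a displayed angle whose perceived sign may flip.

    Hypothetical sketch: returns the displayed angle at every step.
    """
    output = 0.0   # mouse position, the controller's output
    sign = 1.0     # +1: interpretation matches display; -1: flipped
    history = []
    for n in range(steps):
        if flip_at is not None and n == flip_at:
            sign = -1.0  # perceptual flip: apparent rotation reverses
        angle = output + disturbance  # displayed angle
        perceived = sign * angle
        error = reference - perceived
        output += gain * error * dt  # integrating output function
        history.append(angle)
    return history

if __name__ == "__main__":
    steady = mouse_tracking()
    flipped = mouse_tracking(flip_at=200)
    print("no flip, final angle:", round(steady[-1], 4))
    print("flip at step 200, final angle:", round(flipped[-1], 1))
```

Without a flip the angle settles at the reference; with a flip at step 200 it grows without bound. That abrupt change in behaviour is what lets an observer detect a change of perception in another person.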

Best,

Bill P.

[Hans Blom, 970716e]

(Bill Powers (970716.-0613 MDT))

There is a small class of phenomena like this, in which the visual
presentation is ambiguous.

Ambiguous? Or incomplete? If the latter, it may be a _large_ class.
Given the right context (set of expectations), it may include _all_
visual presentations. Anyway, a nice one can be found at

  http://web.wwnorton.com/norton/figures/fig0102.gif

or, if you're lazy, simply do an Alta Vista search for fig0102.gif.

Greetings,

Hans

[From Bill Powers (970716.1210 MDT)]

Hans Blom, 970716e--

There is a small class of phenomena like this, in which the visual
presentation is ambiguous.

Ambiguous? Or incomplete? If the latter, it may be a _large_ class.
Given the right context (set of expectations), it may include _all_
visual presentations.

Ambiguous, because incomplete in that one dimension is missing altogether.

Anyway, a nice one can be found at

http://web.wwnorton.com/norton/figures/fig0102.gif

I'll look it up.

Best,

Bill P.

Richard Kennaway (970717.1302 BST)

[From Bill Powers (970716.-0613 MDT)]
How do we get a sense of one unitary event from a pattern of
movements and configurations and sensations and intensities -- an event
like a spoken word or the "bounce" of a ball?

That's what I was thinking about in my first message in this thread. It
looks as if a perception such as that of a cube rotating either way does
not "passively" arise from a linear chain of computation, but is actively
sought out. Another example: a piece of text may be too small to read if
you know nothing about it, but if you already have some knowledge of the
sort of thing it's likely to be saying, the words become decipherable.

"Actively sought out" means control systems. Could there be control
systems which control for obtaining a perception at a given level, and
maintaining such a perception, once obtained? Well, without a concrete
proposal, I'm just talking speculative fluff. I'll shut up until I have
one.

-- Richard Kennaway, jrk@sys.uea.ac.uk, http://www.sys.uea.ac.uk/~jrk/
   School of Information Systems, Univ. of East Anglia, Norwich, U.K.

[From Bruce Gregory (970717.1015 EDT)]

Richard Kennaway (970717.1302 BST)

"Actively sought out" means control systems. Could there be control
systems which control for obtaining a perception at a given level, and
maintaining such a perception, once obtained?

The general answer to your question is obviously yes. Most
control systems work in exactly this way. You doubtless mean
something more.

Bruce

Richard Kennaway (970717.1714 BST)

[From Bruce Gregory (970717.1015 EDT)]

Richard Kennaway (970717.1302 BST)

"Actively sought out" means control systems. Could there be control
systems which control for obtaining a perception at a given level, and
maintaining such a perception, once obtained?

The general answer to your question is obviously yes. Most
control systems work in exactly this way. You doubtless mean
something more.

Yes, sorry about the brain fade there. What I had in mind was control
systems where the loop from output back to input lies entirely within the
nervous system, instead of propagating down to the bottom level and out to
the muscles, because switching from one view of an ambiguous figure to
another doesn't involve (much) muscular action.

I say "much", because when I do this there always seems to be at least some
small twitch of the eyes. Has anyone studied the perception of ambiguous
figures when the image is stabilised on the retina?

It's just occurred to me that the imagination connection is one way of
completing the loop, but I haven't thought how that might relate to this.
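The idea that the imagination connection could close the loop inside the nervous system can be illustrated with a toy example. This is an assumption-laden illustration, not a model from B:CP or B:CT. A higher system sets the reference for a lower-level depth signal. With a flat display the sensed depth is stuck at zero, a hypothetical stand-in for "no binocular depth information", so the higher system cannot reach its reference through the environment. Switching the lower level into imagination mode routes its reference signal straight back as its perceptual signal. The higher system then controls its perception with no action at all, and the sign of its reference picks "near" or "far". The function and parameter names are hypothetical.

```python
def depth_control(r_high, imagine, steps=500, dt=0.01, gain=10.0,
                  sensed_depth=0.0):
    """Return the higher-level perception after `steps` iterations.

    Hypothetical sketch.
    imagine=False: the lower depth perception comes from the (flat) display.
    imagine=True:  the lower reference is fed back as the lower perception.
    """
    o_high = 0.0
    p_high = 0.0
    for _ in range(steps):
        r_low = o_high  # higher output serves as the lower-level reference
        if imagine:
            p_low = r_low  # imagination connection: loop closed internally
        else:
            p_low = sensed_depth  # flat display: depth does not respond to output
        p_high = p_low  # higher perceptual function: identity (assumed)
        o_high += gain * (r_high - p_high) * dt
    return p_high
```

In the non-imagining case the output keeps winding up while the perception stays at zero. The sketch leaves open how a real system would switch between the two modes, which is the part still to be worked out.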

-- Richard Kennaway, jrk@sys.uea.ac.uk, http://www.sys.uea.ac.uk/~jrk/
   School of Information Systems, Univ. of East Anglia, Norwich, U.K.

[Martin Taylor 970717 12:15]

[From Bill Powers (970716.1210 MDT)]

Hans Blom, 970716e--

There is a small class of phenomena like this, in which the visual
presentation is ambiguous.

Ambiguous? Or incomplete? If the latter, it may be a _large_ class.
Given the right context (set of expectations), it may include _all_
visual presentations.

Ambiguous, because incomplete in that one dimension is missing altogether.

Not so. The illusion Richard Kennaway originally introduced can be seen
quite strongly in a real 3-D scene--a rotary water sprinkler is a good
example. In the 1960s some colleagues and I did some studies on this
kind of effect in various perceptual dimensions (visual, auditory,
moving, static, symbolic, analogue...) and found that it is _very_
difficult to set up a situation which would not flip into different
perceptions if it was observed steadily for long enough. Most flip
into a wide variety of things.

One of our examples was a disk. Around the rim of this disk, several
long screws were placed. The disk was rotated, and viewed edge on from
reasonably close, using normal viewing but with a fixed head position.
At a particular moment it might look like this:

     > > > > >> >
    -|---|-|--|--||--|-

As it rotated, the viewer most probably saw it initially as rotating
as a whole, one way or the other (not necessarily the way it was physically
rotating), but after a while, s/he might see the end pair as bouncing off
a rotating central group, or as any of a large number of other motion
configurations. The switches were always quite abrupt--no double perceptions
or transitions between perceptual modes.

We were able to find one configuration in which subjects seemed to see
only two different things. We took a piece of Plasticene, and dented it
by hitting it repeatedly all over with a ping-pong ball. This 3-D
surface was viewed directly, binocularly, and was seen by everybody
as flipping between a dented surface and a bubbly (frothy?) surface.
No ambiguity, no transition time, just precise, quick, flips. We were
able to show from the timings of the flips that the phenomenon has
nothing to do with habituation, by the way.

Bottom line. It's _hard_, very hard, to find a visual (and probably an
auditory) sensory pattern that will not change abruptly from one perception
to another if it is observed statically for long enough.

Martin