[From Bill Powers (960920.0900 MDT)]
Rupert Young (960920 1100 BST)--
>I am interested in how and why we attend, by eye movements (for
>example) to elements of the environment ...
You didn't say "how we respond, by eye movements" so you already have 37
points in your favor.
>I haven't seen any of the demos yet, are there any for UNIX?
Sorry, everything is for a PC. There was a period when people told me that
my demos would become wildly popular if I wrote them in C, so I did, and
nobody actually wanted to work in C. So I went back to Pascal, which is what
all of the present work is done in. Your best bet is to find a friend with a
PC and run the programs there.
>My work involves trying to get a robot (robot arm with camera) to locate
>(foveate) targets by a feedback process whereby the current scene is
>compared with a memory model (goal) and the difference is minimised by
>moving the sensor to bring the target into a position (viewpoint) that
>corresponds to the viewpoint in which the model was learned.
You will be particularly interested in the two Little Man models, Arm
versions 1 and 2. These models simulate a little man with one arm having
three degrees of freedom (two at shoulder, one at elbow). The little man
reaches out to track a user-movable target with the tip of the "finger," in
three dimensions. Version two includes muscle properties, stretch and tendon
reflex control systems, physical dynamics of the arm, and a visual tracking
system using binocular vision.
The vision system includes everything up to presenting images on the two
retinas (with a sort of ray-tracing to represent target and finger
positions). But then I simply assert that there are perceptual signals
standing for the various positions, without working out HOW they are
generated, and the rest is done with those signals. Each eye "foveates" the
target; the two eyes are turned to fuse the target images. The result is a
doubled image of the fingertip, indicating the relative depth disparity of
target and fingertip. This provides the signal for controlling in radius.
It appears that you're working on the "how" part of this model. If your
project succeeds, you should be able to substitute your vision model for my
sketched-in one, with the result that we would have the Grand Master Model
of visual pointing control.
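The depth-control idea described above (fuse the target, then use the doubled fingertip image as the signal for controlling in radius) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Little Man code: the function names, the 0.065 m eye separation, the gain, and the restriction to points straight ahead of the eyes are all hypothetical choices made for the example.

```python
import math

EYE_SEPARATION = 0.065  # metres; assumed value, for illustration only


def vergence(distance, eye_separation=EYE_SEPARATION):
    """Angle (radians) between the two lines of sight to a point straight ahead."""
    return 2.0 * math.atan(eye_separation / (2.0 * distance))


def disparity(finger_distance, target_distance, eye_separation=EYE_SEPARATION):
    """Depth disparity of the fingertip relative to the fused target.

    Positive when the fingertip is nearer than the target, negative when
    it is farther away, zero when both are at the same distance.
    """
    return (vergence(finger_distance, eye_separation)
            - vergence(target_distance, eye_separation))


def reach(target_distance=0.5, finger_distance=0.2,
          gain=20.0, dt=0.01, steps=3000):
    """Integrating controller: the reference is zero disparity."""
    for _ in range(steps):
        d = disparity(finger_distance, target_distance)
        # Fingertip too near (d > 0): extend. Too far (d < 0): retract.
        finger_distance += gain * d * dt
    return finger_distance
```

Calling `reach()` moves the fingertip distance from 0.2 toward the target distance of 0.5. The controller never needs the distances themselves, only the disparity signal.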
>An object is represented by a hierarchy of features. Features at each
>level of the hierarchy represent an increasingly complex combination of
>features from the preceding levels. Associated with each level is a
>difference measure (error signal?) between model and data. (At the moment
>I am combining the measures, though am considering giving priority to
This is exactly the organization of the perceptual side of the HPCT model.
If you add control loops at each level, you will have a means for
_controlling_ features at one level by _varying_ features at a lower level.
This includes the relationships for passive recognition that you're working
on, but also provides for adjusting the lower-level world so it matches a
higher-level reference condition. So a singer can adjust the pitch of a note
to sing in harmony with a pitch someone else is singing.
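A minimal two-level sketch of the singer example, in Python, may make the arrangement concrete. Everything specific here is an assumption made for illustration: pitches are expressed as semitone numbers, the perceived "interval" is simply the difference between the two pitches, both levels use integrating output functions, and the gains and the disturbance are arbitrary. The point is only that the higher system acts by setting the reference for the lower one, not by producing effort itself.

```python
def simulate(other_pitch=60.0, interval_ref=4.0, disturbance=-1.5,
             steps=2000, dt=0.01):
    """Two-level control: harmony (upper) controls by varying pitch (lower)."""
    pitch_ref = 0.0  # output of the upper level = reference for the lower level
    effort = 0.0     # output of the lower level (hypothetical vocal effort)
    own_pitch = interval = 0.0
    for _ in range(steps):
        # Environment: the sung pitch depends on effort plus a disturbance.
        own_pitch = effort + disturbance

        # Upper level: perceive the interval and compare it with the
        # reference interval. The error adjusts the pitch reference.
        interval = own_pitch - other_pitch
        pitch_ref += 5.0 * (interval_ref - interval) * dt

        # Lower level: perceive own pitch and compare it with the
        # reference supplied from above. The error adjusts the effort.
        effort += 50.0 * (pitch_ref - own_pitch) * dt
    return own_pitch, interval
```

With the defaults, `simulate()` settles with the sung pitch a major third (4 semitones) above the other singer's pitch of 60, whatever constant disturbance is applied.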
Others will answer your questions, too; I'll just tackle two of them very
briefly.
>b) PCT and HPCT seem similar to some of the recent approaches in AI and
>A-Life (eg Brooks' subsumption architecture), where there is an emphasis on
>perception for action. What is different about PCT?
In Brooks' architecture, as I understood it the last time I looked a couple
of years ago, there is no truly hierarchical structure. Each new control
system at a higher level seems to start from scratch, duplicating low-level
functions of other high-level systems. In HPCT, there is maximum economy
of systems, in that a single low-level system is used by ALL high-level
systems that entail its services. I don't know if this is still correct, or
what other differences there may be. Also, it doesn't seem that Brooks'
robots are clearly designed with control of perception in mind. But he's
done some admirably clever things.
>c) If I'm correct (from what I've read) a high-level goal is achieved (with
>HPCT) by behaving in such a way that the error signal between a mental
>model and current reality is reduced. Let's suppose that the goal is simply
>to look at an object in the room which has been seen before but is not
>currently in the field of view. How are we able to represent the object
>position in memory such that turning one's head in a particular direction
>will reduce the error signal even though the object is not yet directly in
To paraphrase Mark Twain, I am happy to say that I can answer your question
immediately: I don't know.