[From Rick Marken (961126.1330)]
Bruce Abbott (961126.1215 EST)--
Yes, it [the reward model] does have a number of gaps -- I've only begun to
develop it. It requires that I suggest specific mechanisms, such as one to
activate the "replay" of the perceptual memory for recent acts -- which
would act as the reference variables for reproducing the remembered acts via
lower control systems.
I can hardly wait to see it up and "running"
Me:
Your model assumes that learning leads to the particular act or act stream
(perception of the lever in a particular state) that produces reward...
Under most circumstances, organisms can control a reward only by varying the
reference state to which the "act" perception is brought.
Bruce:
That is how the act is voluntarily produced.
But that doesn't explain how the animal learns to control. If you ever
actually develop a working version of this model, you will find that the
model, as you describe it, cannot possibly learn to control -- a big failing
in a model of learning;-)
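To make concrete what I mean by varying the reference state to which the
"act" perception is brought, here is a minimal two-level control sketch in
Python. It is only an illustration -- the gains, the slowing, and the
lever-to-reward relationship are numbers I made up; it is not your model and
not a claim about the rat. The upper loop controls the perceived reward by
continuously varying the reference for a lower loop that controls the "act"
perception (lever position), while a disturbance pushes on the lever:

import random

# Two-level PCT sketch: all constants below are assumed, for illustration only.
dt = 0.1
k_act = 5.0        # lower-loop output gain (assumed)
k_reward = 2.0     # upper-loop output gain (assumed)

lever = 0.0        # environmental variable moved by the lower loop's output
reward = 0.0       # environmental variable that (here) leakily follows the lever
act_ref = 0.0      # reference for the "act" perception, set by the upper loop
reward_ref = 1.0   # the reward level the organism "wants"
act_out = 0.0

for t in range(2000):
    disturbance = 0.5 * random.uniform(-1, 1)   # pushes on the lever

    # Lower loop: control the perceived lever position relative to act_ref
    act_error = act_ref - lever
    act_out += k_act * act_error * dt
    lever = act_out + disturbance

    # Environment (assumed): reward leakily follows the lever position
    reward += (lever - reward) * dt

    # Upper loop: control perceived reward by varying the lower reference
    reward_error = reward_ref - reward
    act_ref += k_reward * reward_error * dt

print(f"final reward = {reward:.2f} (reference was {reward_ref})")

The point is that the lower reference is not a fixed, replayed memory; the
upper loop keeps varying it so that the reward perception stays near its
reference despite the disturbance. Replaying a fixed remembered reference
would reproduce the act, but it would not, by itself, control the reward.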
I'd like you to try the following and give me your thoughts:
Demonstration 1
Move your arm so that your fingertip traces out a circle in the air.
Now that you've done it, can you remember what it looked like? Felt like?
Can you reproduce the act?
Yes.
If so, how?
Got me. I think it has something to do with my ability to select remembered
reference specifications for perceptions.
Demonstration 2
Watch someone else perform some simple act (e.g., picking up a coffee cup).
Now reproduce that act (i.e., do it yourself). Were you able to?
Yes.
How did you do that?
Again, it was probably by selection of remembered reference specifications
but, in this case, the selection was based on what I _imagined_ the
perceptual consequences of controlling these perceptions would be _for
another person_, one who would be in the same position watching me as I was
when watching the person I am now trying to imitate.
I think these demonstrations indirectly suggest just how impossible it would
be for a rat to know which perceptions (acts) to repeat after receiving a
reward. In these demonstrations I knew, in advance, which perception I was to
repeat. The rat has no idea when it is going to get a reward, so it has no
idea which perception(s) to repeat when it finally does get one.
A reward occurs at some point during hundreds (thousands?) of continuous
parallel streams of controlled and uncontrolled perceptions: the organism is
continuously perceiving and controlling many visual, kinesthetic, visceral,
auditory, etc. perceptions before and during the occurrence of the reward.
Your model has the rat selecting, from all these perceptions (controlled and
uncontrolled), one or a "series" of several perceptions to repeat. The chance
that the organism will repeat just _the_ one perception (or series of
perceptions) that actually has anything to do with the occurrence of the
reward is ridiculously small.
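Just as a back-of-envelope illustration (the numbers below are made up, not
measurements): if there are N concurrent perceptual streams around the time
of the reward, and the "act" that actually produced it is a particular
ordered series of k of them, the chance of hitting that series by blind
selection is roughly one in N^k:

# Assumed numbers, purely to illustrate the selection problem.
N = 500                  # assumed count of concurrent perceptual streams
for k in (1, 2, 3, 5):   # length of the "series" of perceptions to repeat
    print(f"series of length {k}: roughly 1 chance in {N**k:,}")

Even with these made-up numbers, the odds collapse as soon as the relevant
"act" involves more than a single perception.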
I think this model of yours is based (surprise) on an old S-R proclivity to
perceive the world only at the event level. There are stimulus events (lights
going on or off, levers appearing or disappearing), response events (presses,
licks, pecks) and reinforcing events (food pellets, shocks). It's a nice,
discrete world where there is a place for everything and everything is in its
place. It's a world that exists at only one level of the 9+ level PCT
hierarchy.
When you eventually decide to leave the dark side and go all the way with the
brilliance of PCT, you'll realize that the world we experience is a world of
many different _types_ of continuously changing perceptual _variables_. It's
really a very nice world -- a world without the ugly stain of behavior
modification;-)
Give my regards to Jabba the Hutt
Best
Han Solo