[From Bruce Abbott (961126.1215 EST)]
Rick Marken (961125.1600) --
Bruce Abbott (961125.1540 EST)
Actions are movements, acts are perceived consequences of movements. An act
might be a circle, drawn in the air by the tip of my finger as a result of a
certain sequence of actions (muscle contractions)...The act is a perception
arising from behavior.
A particular act may be intended or unintended.
This is an interesting new concept -- an unintended perceived consequence of
actions. Ok. So learning (according to your model) involves changing
perceived consequences of action from being unintended to being intended.
This means developing a reference for some state of these perceptions.
Hey, an "interesting new concept." Now we're getting somewhere! Yes,
exactly: this means developing a reference for some state of these perceptions.
When the rat first depressed the lever, that act was probably a side-effect
of activity related to controlling some other variable. When that act is
followed by the delivery of a food pellet, the rat tends to repeat the
series of acts that immediately preceded that delivery.
Interesting. So the rat has some memory of all the perceived consequences of
its actions -- intended and unintended -- that happened "prior" to the
reward.
That's right, Rick: I am assuming that the rat remembers what it was doing when
the pellet appeared. You're doing great!
And then, for some reason, after the reward the rat "tends" to repeat
a "series of acts" that immediately preceded that delivery.
The number of perceived consequences of actions (acts) that precede (by how
much?) a reward is very large -- probably equivalent to the number of
afferent neurons in the NS. Which acts or series of acts does the rat _tend_
to repeat?
Before continuing, I want to make it clear that I include the sensory
feedback produced by those movements (including tactile and proprioceptive
ones) among the "perceived consequences" of actions. Which acts does the
rat tend to repeat? Excellent question. I would hypothesize that these are
the ones that the rat was doing just prior to the appearance of the pellet.
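To make that a bit more concrete, here is a toy Python sketch of the memory
assumption -- purely my own illustration, with an arbitrary window size and
invented act names, not a claim about the actual mechanism:

from collections import deque

# Keep a short rolling record of recently perceived acts (including the
# tactile and proprioceptive feedback they produce). When the pellet
# appears, snapshot that record as the sequence to be repeated.

RECENT_WINDOW = 5           # how far back the memory reaches (assumed)

recent_acts = deque(maxlen=RECENT_WINDOW)
remembered_sequence = None  # filled in when a pellet is delivered

def perceive_act(act):
    """Record each perceived consequence of action as it occurs."""
    recent_acts.append(act)

def pellet_delivered():
    """Snapshot the acts that immediately preceded the pellet."""
    global remembered_sequence
    remembered_sequence = list(recent_acts)

# Example: some exploratory acts, then the lever press and a pellet.
for act in ["sniff corner", "rear up", "paw on lever", "press down"]:
    perceive_act(act)
pellet_delivered()
print(remembered_sequence)  # the acts the rat would tend to repeat

How far back that window really reaches is, of course, exactly the open
question your "(by how much?)" points at.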
Why does the rat "tend" to repeat them?
I used the word "tend" because the rat is not always observed to repeat
these acts -- quite possibly because other control systems (e.g., those
producing exploratory behavior) sometimes assert themselves just when this
repetition of the acts should be getting underway. For theoretical
purposes, we can assume that these acts _will_ be repeated, provided that
no conflicts with other systems get in the way.
How strong is the tendency
to repeat them? If a reward occurs _while_ the rat is repeating these acts,
does the rat stop repeating the acts and then restart again? If so, at what
point?
Great questions. I believe the proposal would say that if a reward occurs
_while_ the rat is repeating these acts, they would be cut short; now
that the pellet is available, the portion of the sequence beginning with
approaching the pellet and ending with its consumption would commence. When
the pellet was gone, the rat would then return to repeating what it had
been doing when the pellet appeared.
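In toy form, that sequencing logic might look like this -- again just an
illustrative Python sketch with invented names, not the proposed mechanism
itself:

# Replaying the remembered acts is cut short when the pellet appears, the
# approach-and-consume portion runs, and then the replay starts over.

def run_trial(remembered_acts, pellet_appears_at):
    performed = []
    i = 0
    pellet_delivered = False
    while i < len(remembered_acts):
        if not pellet_delivered and i == pellet_appears_at:
            performed += ["approach pellet", "consume pellet"]
            pellet_delivered = True
            i = 0   # pellet gone: repeat the remembered acts from the start
            continue
        performed.append(remembered_acts[i])
        i += 1
    return performed

# Example: the pellet arrives just before the third remembered act.
print(run_trial(["rear up", "paw on lever", "press down"], pellet_appears_at=2))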
How does it know about the beginning and end of "acts" and "rewards"?
The end of the acts is defined by pellet delivery. How the beginning would
be defined is a matter for research, but it is no doubt related to the
duration over which the memory of the acts prior to pellet delivery
persists. Appearance of the pellet is what the currently active control
system is "looking for." The appearance and disappearance of the pellet
are simply matters of perception.
Your model has more loose ends than I originally suspected.
Yes, it does have a number of gaps -- I've only begun to develop it. It
requires that I suggest specific mechanisms, such as one to activate the
"replay" of the perceptual memory for recent acts -- which would act as the
reference variables for reproducing the remembered acts via lower control
systems.
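To show the sort of thing I have in mind, here is a rough Python sketch of
that replay mechanism -- remembered perceptions fed back in as reference
signals to a simple lower-level loop. The gain, the feedback constant, and
the function names are all invented for illustration only:

# Each remembered perceptual value becomes, in turn, the reference signal
# for a lower-level proportional control loop, which acts on the
# environment until its perception matches that reference.

def control_step(reference, perception, gain=2.0):
    """One iteration of a simple proportional control loop."""
    error = reference - perception
    return gain * error          # output that acts on the environment

def replay_as_references(remembered_perceptions, initial_perception,
                         feedback_gain=0.1):
    """Replay a stored perceptual sequence as a reference trajectory."""
    perception = initial_perception
    reproduced = []
    for reference in remembered_perceptions:
        for _ in range(20):      # let the lower loop settle on each value
            output = control_step(reference, perception)
            perception += feedback_gain * output  # environment closes the loop
        reproduced.append(perception)
    return reproduced

# Example: replaying the remembered "paw moving onto the lever" perceptions.
remembered = [0.2, 0.5, 0.9, 1.0]   # arbitrary illustrative values
print(replay_as_references(remembered, initial_perception=0.0))

The point of the sketch is only that nothing new is needed at the bottom of
the hierarchy; the novelty is in where the reference values come from.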
What was a side-effect of activity directed toward other goals
What does this mean? The side-effect (non-goal directed) activity was
directed toward goals all along?
No. An example may clarify: Depressing the lever occurred not because the
rat wanted to press the lever, but because, e.g., it wanted to rear up and
sniff the upper part of the chamber. It placed its paw against the lever
and, pushing down in order to raise itself up, tripped the switch as a
side-effect.
now becomes the goal itself -- the rat intends to repeat that perceived
pattern of activity, i.e., what it was _doing_ on the previous occasion just
before the pellet was delivered.
OK. So the rat ends up repeating (for some unspecified reason and by some
unspecified means) one perceived consequence of its actions -- say, the
perception of the lever in a particular state. But what if putting the
lever in a particular state was not the repeated act? Suppose the
rat had repeated the tongue movement perception it produced just before
the reward. How does the rat select new perceptual consequences to repeat?
Good question. It could try something else at random, though I doubt this
would be the case. More likely it would return to exploring the chamber
("looking for food"), which may eventually lead to another tripping of the
lever. This is probably the default mode of what might be called the "food
acquisition" control system.
Your model assumes that learning leads to the particular act or act stream
(perception of the lever in a particular state) that produces reward. But it
is often the case that quite different acts -- ones that have never before
been followed by reward -- must be produced in order to produce the reward.
Yes, and my proposal allows those previously unrewarded acts to occur.
The "acts" you describe are actually perceptual _variables_.
Yes, but they are very special ones: they are under the animal's control.
Under most
circumstances, organisms can control a reward only by varying the reference
state to which the "act" perception is brought.
That is how the act is voluntarily produced.
This can be demonstrated in a
tracking task where subjects must (and DO) learn how to vary their reference
for the kinesthetic perception of handle position in order to get the reward
of having the cursor stay in the intended location.
Yes, it's called "trial and error" learning. But these are not random
changes; one tries slight variations, looking for improvement in terms of
minimal cursor movement, reduction in overshoot, etc. What goes on is, I
think, more sophisticated than E. coli-type random reorganization.
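To illustrate the contrast, here is a crude Python sketch -- my own toy
comparison, not anyone's published model -- of adjusting the handle
reference by E. coli-style reorganization (keep changing in the same
direction while error falls, tumble to a random new direction when it
doesn't) versus retaining only those small variations that actually reduce
the error:

import random

def tracking_error(reference, target=5.0):
    """Invented stand-in for the cursor error given a handle reference."""
    return (reference - target) ** 2

def ecoli_reorganize(reference=0.0, steps=200, step_size=0.2):
    direction = random.choice([-1.0, 1.0])
    error = tracking_error(reference)
    for _ in range(steps):
        reference += direction * step_size
        new_error = tracking_error(reference)
        if new_error >= error:                 # no improvement: "tumble"
            direction = random.choice([-1.0, 1.0])
        error = new_error
    return reference

def directed_variation(reference=0.0, steps=200, step_size=0.2):
    error = tracking_error(reference)
    for _ in range(steps):
        trial = reference + random.choice([-1.0, 1.0]) * step_size
        trial_error = tracking_error(trial)
        if trial_error < error:                # keep only improvements
            reference, error = trial, trial_error
    return reference

print(ecoli_reorganize(), directed_variation())

In one dimension both settle quickly; my doubt is about what happens when
there are thousands of such parameters, which is where I suspect something
more directed is needed.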
This is basic hierarchical control theory. I suggest that you carefully
reread my "Spreadsheet hierarchy..." paper (and play with the spreadsheet
model a bit) to see how it works.
I know how it works, I've played with the model, and it's a nice
demonstration. But I doubt that it works efficiently enough in the enormous
parameter space in which living organisms exist. Something vastly more
efficient is required there than random reorganization of parameters, in my
opinion, and my little story, vague as it is in some important details, at
least provides a starting point on the way toward an alternative.
I'd like you to try the following and give me your thoughts:
Demonstration 1
Move your arm so that your finger tip traces out a circle in the air.
Now that you've done it, can you remember what it looked like? Felt like?
Can you reproduce the act? If so, how?
Demonstration 2
Watch someone else perform some simple act (e.g., picking up a coffee cup).
Now reproduce that act (i.e., do it yourself). Were you able to? How did
you do that?
Regards,
Bruce