# A Problem for Reinforcement Theory?

[From Rick Marken (950621.1410)]

The following is data from the R. coli experiment. It shows a running average
of cursor position over time.
x = random
o = reinforcement
z = subject

[Plot: running average of cursor position (y, vertical axis) against Time (horizontal axis), with the start and target positions marked on the vertical axis. The x trace stays at the start level for the whole run. The o trace drops to the target level and stays there. The z trace follows the o trace at first, then shifts to a level below the target and stays there.]

The x's are the average cursor positions that result from random responding
(probability of response is .5 at each time instant); the cursor stays near
the starting position (start), which is about 75 pixels above the target
position (target).

The o's are the average cursor positions that result from running a
reinforcement model; this model is a pure learning model (no extinction) so
P(R|S+) is 0 and P(R|S-) quickly goes to 1.0. The model responds when and
only when the cursor moves away from the target. The model quickly moves the
cursor to the target position and keeps it there.
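
As a rough illustration of the difference between random responding and the asymptotic reinforcement model, here is a minimal Python sketch. The task mechanics are assumptions, not taken from this post: the cursor moves one pixel per time step along one dimension, and a press replaces its direction with a new one chosen at random. The names `run` and `average_trace`, the run length, and the number of runs averaged are hypothetical. Only the press probabilities and the 75-pixel start offset come from the description above.

```python
import random

START, TARGET = 75, 0   # start is about 75 pixels above the target (from the post)

def run(p_press_away, p_press_toward, steps=300, rng=None):
    """Simulate one run and return the list of cursor positions."""
    rng = rng or random.Random()
    y = START
    d = rng.choice([-1, 1])              # current direction of cursor movement
    trace = []
    for _ in range(steps):
        # S- : the next step would increase the distance to the target
        # S+ : the next step would decrease it
        moving_away = abs(y + d - TARGET) > abs(y - TARGET)
        p = p_press_away if moving_away else p_press_toward
        if rng.random() < p:
            # ASSUMPTION: a press picks a new direction at random
            d = rng.choice([-1, 1])
        y += d
        trace.append(y)
    return trace

def average_trace(p_press_away, p_press_toward, runs=50, steps=300, seed=0):
    """Average cursor position at each time step over several runs."""
    rng = random.Random(seed)
    traces = [run(p_press_away, p_press_toward, steps, rng) for _ in range(runs)]
    return [sum(column) / runs for column in zip(*traces)]

random_avg = average_trace(0.5, 0.5)   # random responding: P(R) = .5 at every instant
model_avg = average_trace(1.0, 0.0)    # learned model: P(R|S-) = 1, P(R|S+) = 0
print("final average position, random responding:", round(random_avg[-1], 1))
print("final average position, reinforcement model:", round(model_avg[-1], 1))
```

Under these assumptions the random responder's average stays near the start position, while the model's average ends up at the target and remains there.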

The z's are the average cursor positions that result from running a human
subject. Like the reinforcement model, the subject quickly moves the cursor
to the target (the z and o lines actually overlap through the first half of
the graph). But during the middle of the run the cursor quickly moves to a
new position and stays there. This behavior of the subject is in marked
contrast to that of the reinforcement model.

Here's a question for EAB experts: How do I change the reinforcement model to
make it behave like the subject?

The reinforcement model was very simple. I just incremented P(R|S-) according
to:

$$P_{n+1}(R \mid S-) = P_n(R \mid S-) + 0.8\,\bigl(1 - P_n(R \mid S-)\bigr)$$

when R occurred in the presence of S- and the result was a reinforcement
(movement toward the target). P(R|S-) was NOT decremented when a press in
the presence of S- was not reinforced. P(R|S+) remained at 0; the model never
pressed when the cursor was moving toward the target (S+).
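
To see how fast this increment rule drives P(R|S-) toward 1.0, here is a minimal Python sketch that applies it repeatedly. The starting value of 0.5 is an assumption (the post does not give the initial P(R|S-)), and the function name `reinforce` is hypothetical.

```python
def reinforce(p, rate=0.8):
    """One reinforced press in the presence of S-: p <- p + rate * (1 - p)."""
    return p + rate * (1.0 - p)

p = 0.5            # ASSUMPTION: initial P(R|S-); not stated in the post
history = [p]
for _ in range(5):  # five reinforced presses; unreinforced presses change nothing
    p = reinforce(p)
    history.append(p)

print([round(value, 4) for value in history])
```

Each reinforced press shrinks the remaining gap to 1.0 by a factor of 0.2, so after n reinforced presses the gap is 0.2^n times the initial gap. A handful of reinforcements is therefore enough to bring P(R|S-) very close to 1.0, which fits the model's rapid move to the target.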

What seems to be happening with the subject is that, in the middle of the
run, P(R|S-) suddenly changes; the probability of a press when the cursor is
moving away from the target, P(R|S-), becomes much less than 1.0. And
P(R|S+), the probability of pressing when the cursor is moving toward the
target, suddenly becomes much greater than 0.

How does reinforcement theory explain this change in P(R|S-) and P(R|S+)?

Best

Rick