1200 dollar E. coli

[Martin Taylor 950626 17:15]

From Rick Marken (950624...)

Rick, I'm afraid I must be quite obtuse. Obviously I don't understand
reinforcement theory as well as Bruce, and you say:

The mistake I made on the net was
to invite reinforcement theorists to explain the results of that experiment.
This was a mistake because I understand reinforcement theory better than the
reinforcement theorists.

So take this as a naive comment/question.

I would have thought that making the cursor move faster toward, or less
rapidly away from, the target would be reinforcing, and making it move
faster away, or less rapidly toward, would be negative reinforcement.

If that is the case, then seeing the cursor moving away from the target,
and pushing the button, would lead to reinforcement for button pushing
more than it would lead to negative reinforcement, and if the cursor was
seen moving toward, pushing the button would lead to negative reinforcement
more often than positive. Both these ratios would depend on the angle:

            100 |                           x
                >                       x
        percent |                  x
    positive 50 |              x
  reinforcement |         x
                >     x
              0 |x
                 ----------------------------
                 0            90          180
                    degrees away from toward
                        target direction

The subject is heavily punished for pushing the button when the cursor is
moving toward the target, heavily rewarded when it is moving directly away
from the target, and punished and rewarded equally when it is moving at
right angles to the direction to the target. Wouldn't this lead to the
observed effects? The consequence of a button push when the cursor is
seen moving to the target is usually bad, and of a button push when the
cursor is seen moving away is usually good. So, selection by consequences
would seem to yield more pushing when the cursor is moving away, and less
when it is moving toward, the opposite of what you say it predicts.
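
Martin's ratio argument can be checked numerically. The following is only a minimal sketch, assuming that a press replaces the current direction with a new angle drawn uniformly from 0 to 180 degrees, and that "positive" means the new angle is closer to the target direction than the old one. The function name, trial count, and seed are hypothetical and do not come from the original experiment.

```python
import random

def percent_positive(angle_before, trials=20000, seed=0):
    """Percent of presses after which the cursor heads more nearly toward the target.

    Assumption: the post-press angle is uniform on [0, 180] degrees and does
    not depend on angle_before (0 = straight toward, 180 = straight away).
    """
    rng = random.Random(seed)
    # Count the presses whose new angle is smaller (better) than the old one.
    better = sum(rng.uniform(0.0, 180.0) < angle_before for _ in range(trials))
    return 100.0 * better / trials

for angle in (0, 45, 90, 135, 180):
    print(angle, round(percent_positive(angle), 1))
```

Under these assumptions the printed percentages should rise roughly linearly with the pre-press angle, which is the shape of the plot above.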

I'm afraid I don't understand your analysis, and this naive approach may
not be what "reinforcement theorists" would say. But it is what I would
think of first, if someone asked me to describe a reinforcement-based
description of the e-coli experiment.

I haven't observed too carefully the regions devoid of angel footprints,
but I suspect this might be one of them.

Martin

[From Rick Marken (950626.1600)]

Martin Taylor (950626 17:15) --

            100 |                           x
                >                       x
        percent |                  x
    positive 50 |              x
  reinforcement |         x
                >     x
              0 |x
                 ----------------------------
                 0            90          180
                    degrees away from toward
                        target direction

The subject is heavily punished for pushing the button when the cursor is
moving toward the target, heavily rewarded when it is moving directly away
from the target, and punished and rewarded equally when it is moving at
right angles to the direction to the target.

Yes. That is what the empirical data show; in my graph the y axis was the
empirically derived measure of the probability of a response following the
consequence on the x axis.

Wouldn't this lead to the observed effects?

It seems like it, but no.

The consequence of a button push when the cursor is seen moving to the
target is usually bad, and of a button push when the cursor is seen moving
away is usually good.

This is what fooled the reviewers. In fact, every consequence (from 0 to 180)
is equally likely regardless of the current direction of movement relative to
the target. So over all trials, the expected "percent positive reinforcement"
following a press (regardless of the direction of movement before the press)
is 50.
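
The distinction drawn here, between the direction that follows a press and the change in direction, can be illustrated with a small simulation. This is a sketch under stated assumptions, not the program used in the experiment: the post-press angle is drawn uniformly from 0 to 180 degrees independently of the pre-press angle, and the pre-press angle is also taken to be uniform (a simplification, since a person controlling the cursor would spend more time heading toward the target). The function name is hypothetical.

```python
import random

def press_outcomes(trials=40000, seed=1):
    """Return (mean post-press angle when heading toward the target,
    mean post-press angle when heading away, overall percent of presses
    that improved the direction)."""
    rng = random.Random(seed)
    after_toward, after_away = [], []  # post-press angles by pre-press direction
    improved = 0
    for _ in range(trials):
        before = rng.uniform(0.0, 180.0)  # assumption: uniform pre-press angle
        after = rng.uniform(0.0, 180.0)   # assumption: independent of `before`
        (after_toward if before < 90.0 else after_away).append(after)
        improved += after < before
    mean_toward = sum(after_toward) / len(after_toward)
    mean_away = sum(after_away) / len(after_away)
    return mean_toward, mean_away, 100.0 * improved / trials

print(press_outcomes())
```

If the assumptions hold, the two mean post-press angles should both come out near 90 degrees, and the overall percentage of improving presses near 50, even though the chance of improvement for any single press depends on the angle before it.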

So, selection by consequences would seem to yield more pushing when the
cursor is moving away, and less when it is moving toward, the opposite of
what you say it predicts.

That is ONLY true if the consequence of a press is defined as a CHANGE in
direction. In that case, what you say is true. But CHANGE in direction after
a press is not random in the sense that some changes (improvements) are more
likely when you are going away from the target and other changes
(deteriorations) are more likely when you are going toward the target.

I'm afraid I don't understand your analysis, and this naive approach may
not be what "reinforcement theorists" would say.

No. It IS what reinforcement theorists would say. That's how you can tell
that they don't know their... never mind;-)

But it is what I would think of first, if someone asked me to describe a
reinforcement-based description of the e-coli experiment.

If you actually write the program that implements the reinforcement model you
would see that the verbal analysis does not match the way the model actually
behaves. That's why the reviewers never got it.

The reinforcement model works if you define reinforcement as the CHANGE in
direction of cursor movement -- and assign positive reinforcing values to
changes toward the target and negative or zero values to changes away from
the target. This model works because the consequences of responding (changes
in direction) are no longer (uniform) random. It is possible to make these
change consequences uniform random by biasing the direction of movement
following a press, the bias being based on the direction of movement before
the press. When you do this, the reinforcement model behaves randomly (of
course); random consequences select random responses. People in the same
situation (consequences of presses are now random changes) do not produce
random responses; they respond, as necessary, to keep the cursor on target.
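
The contrast between the two conditions can be sketched with a toy learner. This is not Rick's program, which is not shown in the thread. It assumes a simple law-of-effect rule in which the probability of pressing in each state ("toward" or "away") tracks the fraction of earlier presses in that state that were followed by an improvement in direction. The biased condition is modelled only by its stated effect: improvement becomes a fair coin regardless of the direction before the press. All names, the clamping limits, and the trial count are hypothetical.

```python
import random

def run_learner(biased, trials=20000, seed=2):
    """Return the learned press probabilities (toward, away)."""
    rng = random.Random(seed)
    presses = {"toward": 0, "away": 0}
    improvements = {"toward": 0, "away": 0}

    def press_probability(state):
        if presses[state] == 0:
            return 0.5  # no experience yet: press half the time
        p = improvements[state] / presses[state]
        return min(0.95, max(0.05, p))  # clamp so the learner keeps sampling

    for _ in range(trials):
        before = rng.uniform(0.0, 180.0)  # assumption: uniform pre-press angle
        state = "toward" if before < 90.0 else "away"
        if rng.random() >= press_probability(state):
            continue  # no press on this trial
        if biased:
            # Assumed effect of the bias: improvement is a fair coin.
            improved = rng.random() < 0.5
        else:
            # New direction uniform on [0, 180]; improvement if it is smaller.
            improved = rng.uniform(0.0, 180.0) < before
        presses[state] += 1
        improvements[state] += improved
    return press_probability("toward"), press_probability("away")

print("unbiased:", run_learner(biased=False))
print("biased:  ", run_learner(biased=True))
```

In the unbiased condition this learner should come to press more when moving away than when moving toward the target. In the biased condition the two probabilities should stay close together, so pressing no longer depends on direction. The sketch says nothing about how people behave in that condition.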

Best

Rick