Testing models; more PCT considerations re TRT

[From Bill Powers (941130.1600 MST)]

Tom Bourbon (941130.0833) --

The altered environments provide tests for _both_ models, not just the
TRT model. Under the new conditions, either model, or both of them, can
succeed, or fail.

Before Bruce Abbott pops his cork, I hasten to point out that the Law of
Effect model was supposed to illustrate _learning_, not just the
performance of going up a gradient. The learning was in the form of
changes in the probability per unit time of a tumble for dNut positive
or negative (separately). The control models did not exhibit any
learning (that Bruce would accept as real learning), so the same aspect
of the two models was not being tested.

Bruce's model contains a control model, but the form is a little
different from the one I proposed. If dNut is positive, the delay before
the next tumble is determined by PTS+, which tends toward zero (producing
maximum delay), but PTS+ is not affected by the error signal. If dNut is
negative, the delay is determined by PTS-, which tends toward 1.0 (zero
delay) and is also not affected by the error signal. The error signal
itself is just the condition of dNut being greater or less than zero, by
any amount however small. Its only effect is to switch between the two
probabilities as appropriate for positive and negative dNut.
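The switching rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Bruce's actual program. The Python names (tumble_probability, tumbles, mean_delay) are hypothetical, PTS+ and PTS- are treated as per-time-step tumble probabilities, and dNut == 0 is lumped in with the negative case because the description above does not say what happens there.

```python
import math
import random

def tumble_probability(d_nut: float, pts_plus: float, pts_minus: float) -> float:
    """Choose the per-step tumble probability from the sign of dNut alone.

    Only the sign matters: a change of 1e-9 and a change of 100 select the
    same probability. dNut == 0 is treated as 'negative' (an assumption).
    """
    return pts_plus if d_nut > 0 else pts_minus

def tumbles(d_nut: float, pts_plus: float, pts_minus: float,
            rng: random.Random) -> bool:
    """One simulated time step: True if the bacterium tumbles on this step."""
    return rng.random() < tumble_probability(d_nut, pts_plus, pts_minus)

def mean_delay(p: float) -> float:
    """Expected number of steps until a tumble (geometric distribution).

    As p goes to 0 the delay grows without limit ("maximum delay");
    at p = 1.0 the tumble happens on the very next step.
    """
    return math.inf if p == 0 else 1.0 / p
```

With PTS+ = 0.05 and PTS- = 0.9, for example, the expected run before a tumble is 20 steps while going up the gradient and about 1.1 steps while going down. The error signal never enters as a magnitude; it only picks which of the two numbers is used.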


---------------------------------------
An addendum to this morning's post about models of learning.

One thing I meant to go into, but kept getting sidetracked from, was the
fact that there is normally no one behavior that will produce a given
consequence. This fact doesn't show up in the special situations that
are commonly studied, because in these situations _there are no
disturbances_. It is only when the environmental situation is completely
constant that there can be any "discriminative stimulus" that predicts
the "right behavior" for producing a given consequence.

This is really a branch off the remark about Skinner's idea of the
operant as a class of behaviors with a common consequence. In fact, we
generally name behaviors by the most striking consequence they produce,
which means that if different behaviors are required to produce the same
consequence, we will not notice the difference. We use the same name for
all the different actions: opening the puzzle box, pressing the bar,
running the maze. No matter what the cat does to open the puzzle box, we
call it "opening the puzzle box" or an "escape response," as if this
were just one unitary act.

There are really three ideas here. First, disturbances normally require
that _different_ actions be performed to achieve the _same_ consequence.
Second, when different actions are performed but the behavior is named
for the consequence that they produce, actions that are actually
different are reported as having been the same. And third, different
actions are required because in a normal environment, independent causes
-- disturbances -- contribute to the consequence, so it is impossible to
repeat a consequence except by varying the action.
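The arithmetic behind the first and third points can be made concrete. The sketch below assumes the simplest possible environment, one in which the consequence is just the sum of the action and an independent disturbance; the target value and the disturbance values are made up for illustration.

```python
def required_action(target: float, disturbance: float) -> float:
    """Action needed so that action + disturbance equals the target.

    Assumes a hypothetical additive environment:
    consequence = action + disturbance.
    """
    return target - disturbance

target = 10.0
disturbances = [0.0, 3.0, -2.0, 5.5]  # hypothetical independent causes, one per occasion
actions = [required_action(target, d) for d in disturbances]
# actions -> [10.0, 7.0, 12.0, 4.5]: the same consequence, four different actions
```

Named by its consequence, every one of these would be reported as the same behavior, even though no two of the actions are alike.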

And maybe fourth: behavior is typically reported qualitatively: the cat
escaped from the puzzle box, or it did not. But what makes behavior work
are quantitative relationships between the actions produced by the
organism and their effects on the environment. To escape the puzzle box,
the cat might have to rotate a latch by 90 degrees. But for the kind of
latch I have in mind, a crossbar pivoted in the center, rotating it
either 110 degrees or 80 degrees will not work: the rotation must be
close to 90 degrees. What matters is not _that_ the latch was rotated,
but _how much_ it was rotated. So what the cat learns is to produce a
quantitative change in an environmental variable, the angle of the
latch's crossbar.

However, the experimenter who is not thinking quantitatively may not
notice this. The result may be reported as if either the action is
effective in opening the door, or it is not. There can thus seem to be
discriminative stimuli that signal this behavior in a logical way, when
in fact a mere logical (on-off) signal can't discriminate between
amounts of effect on the environment caused by actions.
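A small sketch shows what an on-off record throws away. The latch model here is hypothetical: it opens only when the crossbar ends up within an assumed tolerance of 5 degrees of 90 degrees (the example above only says that 80 and 110 degrees fail), and the attempted angles are invented.

```python
def latch_opens(angle_deg: float, tolerance_deg: float = 5.0) -> bool:
    """Hypothetical crossbar latch: opens only if rotated close to 90 degrees."""
    return abs(angle_deg - 90.0) <= tolerance_deg

attempts = [80.0, 88.0, 110.0, 91.5]  # hypothetical rotations on four attempts
binary_record = [latch_opens(a) for a in attempts]
# binary_record -> [False, True, False, True]
```

The binary record scores 80 and 110 degrees identically as failures, although one rotation fell short and the other went too far. Nothing in an "opened / did not open" signal says which way, or by how much, the next rotation has to change.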

These are somewhat esoteric matters. The main point, however, is the
effect of disturbances when they are allowed to happen.

A variable-ratio schedule can be represented as a fixed-ratio schedule
plus a disturbance. The PCT-predicted relationship of behavior to the
disturbance, however, is not easily seen, because the random variable is
considerably too random; any ratio can be followed immediately by any
other ratio within the possible range. This would be like a tracking
experiment in which the cursor was disturbed every 1/60 second, jumping
instantaneously from any position to any other position in a blur too
fast to follow. The bandwidth of the disturbance is too wide to reveal
the control system at work.
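The decomposition of a variable-ratio schedule into a fixed ratio plus a disturbance can be written out directly. This is a sketch under assumptions: the base ratio of 25, the spread of 10 and the uniform integer draws are all hypothetical, chosen only to show that an independently redrawn disturbance lets any ratio follow any other.

```python
import random

FIXED_RATIO = 25  # hypothetical base ratio

def variable_ratio(n: int, spread: int, rng: random.Random):
    """A VR schedule written as fixed ratio + disturbance.

    The disturbance is redrawn independently for every ratio, so any value in
    [FIXED_RATIO - spread, FIXED_RATIO + spread] can follow any other.
    """
    disturbances = [rng.randint(-spread, spread) for _ in range(n)]
    ratios = [FIXED_RATIO + d for d in disturbances]
    return ratios, disturbances

def largest_jump(ratios):
    """Biggest change between two consecutive ratios."""
    return max(abs(b - a) for a, b in zip(ratios, ratios[1:]))

ratios, disturbances = variable_ratio(200, 10, random.Random(1))
print(largest_jump(ratios))  # with independent draws, usually near the full 2 * spread
```

In tracking terms this is the cursor jumping anywhere on every frame: the disturbance has as much power at the fastest changes as at the slowest.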

To apply the disturbance in a way that would let us see the operation of
the control system, the ratio in a variable-ratio schedule should be
changed in small increments: for example, 20, 23, 25, 27, 23, 19, 16,
18, 24, 30, 29 ... and so on -- unpredictably, but not rapidly. Under
these conditions we should see the behavior rate changing to compensate
for the change in ratio.
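What "behavior rate changing to compensate" would look like can be sketched with a simple control model run over the example sequence of ratios given above. The model is an assumption, not a fitted one: perceived reinforcement rate is behavior rate divided by the ratio, the reference of 2.0 reinforcements per unit time and the gain of 5.0 are hypothetical, and the output is a discrete integrator that changes behavior rate in proportion to the error.

```python
RATIOS = [20, 23, 25, 27, 23, 19, 16, 18, 24, 30, 29]  # the example sequence above

def track_ratio_schedule(ratios, r_ref=2.0, gain=5.0, steps_per_ratio=50):
    """Integrating controller for perceived reinforcement rate (hypothetical).

    Assumes reinforcement rate = behavior rate / ratio. Behavior rate b is
    raised or lowered in proportion to the error between the reference
    rate r_ref and the perceived rate. Returns the settled behavior rate at
    the end of each ratio. The loop is stable while gain / ratio < 2.
    """
    b = 0.0
    settled = []
    for ratio in ratios:
        for _ in range(steps_per_ratio):
            perceived = b / ratio
            b += gain * (r_ref - perceived)
        settled.append(b)
    return settled

for ratio, b in zip(RATIOS, track_ratio_schedule(RATIOS)):
    print(ratio, round(b, 2))
```

Each settled behavior rate comes out very close to r_ref times the current ratio, so the behavior rate rises and falls with the schedule while the perceived reinforcement rate stays near its reference. With ratios between 16 and 30, a gain of 5.0 keeps gain / ratio at or below about 0.31, well inside the stable range for this discrete loop.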

When disturbances are present, we can find that a given consequence is
repeated by means of _any amount of the relevant behavior within its
possible range_. In this sort of situation, it is obviously futile to provide
a discriminative stimulus that indicates what behavior is needed to
produce the consequence. The amount of behavior that achieves the
consequence on one trial will prevent its achievement on the next. The
mere fact that one amount of behavior produced the consequence on one
trial is no indication of the amount that will be needed to achieve it
on the next trial.
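The claim that last trial's successful amount is no guide to the next one can be checked with the same hypothetical additive environment (consequence = action + disturbance). Each trial below reuses exactly the amount of action that would have produced the target on the previous trial; the function name and the numbers are illustrative only.

```python
def previous_amount_outcomes(target, disturbances):
    """Each trial reuses the amount of action that worked on the previous trial.

    Hypothetical additive environment: consequence = action + disturbance.
    Returns the consequence actually produced on trials 2..n.
    """
    produced = []
    for prev_d, d in zip(disturbances, disturbances[1:]):
        action = target - prev_d      # the amount that achieved the target last time
        produced.append(action + d)   # what that same amount yields this time
    return produced

print(previous_amount_outcomes(10.0, [0.0, 3.0, -2.0, 5.5]))  # [13.0, 5.0, 17.5]
print(previous_amount_outcomes(10.0, [2.0, 2.0, 2.0]))        # [10.0, 10.0]
```

The miss on each trial is exactly the change in the disturbance since the previous trial. Only when the disturbance stays constant, as in the second call, does repeating the old amount work.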

The Law of Effect makes sense only in those cases where the effects of
actions are regular consequences -- where a given action will always
have the same effect. Such cases do exist, of course. But the model we
need must also be able to take care of cases where this regularity is
not seen, because we can demonstrate many cases, the majority in fact,
in which only variable behavior can possibly create a repeatable
consequence.

If we have a model that can account for the more difficult case, it will
automatically handle the case without disturbances. But the reverse is not
true: the Law of Effect, which can handle behavior with regular
consequences, can't handle the case in which there are unpredictable
disturbances. Unless we want to claim that the basic organization of the
nervous system changes depending on whether disturbances happen to be
present, we must go with the model that handles the wider range of
cases.
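The asymmetry can be illustrated with one error-driven loop run unchanged in both kinds of environment. This is a sketch, not a model of any real organism: it assumes the same hypothetical additive environment, an integrating output with an assumed gain of 0.5, and a fixed number of iterations per trial, with the action carried over from one trial to the next.

```python
def run_trials(target, disturbances, gain=0.5, steps=40):
    """One control organization, used unchanged on every trial (hypothetical).

    consequence = action + disturbance; on each iteration the action changes
    in proportion to the error between target and consequence. The action
    left at the end of one trial is the starting point for the next.
    """
    action, actions, consequences = 0.0, [], []
    for d in disturbances:
        for _ in range(steps):
            action += gain * (target - (action + d))
        actions.append(action)
        consequences.append(action + d)
    return actions, consequences

# No disturbance: the action settles to one value and looks like a fixed "response".
print(run_trials(10.0, [0.0, 0.0, 0.0]))
# Unpredictable disturbance: the action varies, the consequence is still repeated.
print(run_trials(10.0, [0.0, 3.0, -2.0, 5.5]))
```

Without disturbances the loop produces the same action every time, which is all a regular-consequence account needs to explain. With disturbances the same organization produces a different action on every trial and the same consequence, which a rule that repeats a fixed action cannot do.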
-----------------------------------------------------------------------
Best,

Bill P.