The Model, the Law of Effect, and Galileo

[From Bill Powers (941122.0730 MST)]

Bruce Abbott --

Let's cut to the chase:

If NutSave > 0 then { If S+ present during tumble then ...

Here you are defining NutSave as S+.

I have already pointed out that in DoTumble, NutSave is actually set to
the value of dNut _prior_ to the tumble, even though your comment says
it is dNut _after_ the tumble. So S+ is defined as the value of dNut
just prior to the previous tumble. It's a good thing you wrote the
program as you did instead of as you said. DeltaNutRate is dNut -
NutSave, where dNut is the first new value of dNut after a tumble (when
JustTumbled is true). If NutSave were also the first new value of dNut
immediately after a tumble, DeltaNutRate would always be zero.

In the section that actually implements the tumbles, you say

if dNut > 0 then { S+ present; tumble probability determined by S+ }

Note that now the _present_ value of dNut is said to be S+. If you had
not changed the definition of S+ within the same iteration, you would
have written

if NutSave > 0 then { S+ present ...

and the program would have ceased to work correctly.

In my rearrangement of your code for changing the probabilities, we have
the following simplified logic:

If NutSave > 0 then
  if DeltaNutRate > 0 then increase(PTS+) else decrease(PTS+);
If NutSave <= 0 then
  if DeltaNutRate <= 0 then increase(PTS-) else decrease(PTS-);
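
The same update logic as a minimal Python sketch (the original is
Pascal-style pseudocode; the step size and the clamping of
probabilities to [0,1] are my assumptions, standing in for the
program's increase/decrease routines):

```python
def update_probabilities(nut_save, delta_nut_rate,
                         p_ts_plus, p_ts_minus, step=0.05):
    # nut_save: value of dNut just before the previous tumble (NutSave)
    # delta_nut_rate: dNut after the tumble minus nut_save (DeltaNutRate)
    if nut_save > 0:                       # "S+ present" branch
        if delta_nut_rate > 0:
            p_ts_plus = min(1.0, p_ts_plus + step)    # increase(PTS+)
        else:
            p_ts_plus = max(0.0, p_ts_plus - step)    # decrease(PTS+)
    else:                                  # "S- present" branch
        if delta_nut_rate <= 0:
            p_ts_minus = min(1.0, p_ts_minus + step)  # increase(PTS-)
        else:
            p_ts_minus = max(0.0, p_ts_minus - step)  # decrease(PTS-)
    return p_ts_plus, p_ts_minus
```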

As you have pointed out, when NutSave > 0, it is most likely that
DeltaNutRate will be negative. Therefore a correct description of the
situation is that an S+ is converted by ensuing reinforcement to a
_decrease_ in the probability of the behavior. That is, if a tumble
results in an _increase_ of dNut relative to an already positive
previous value, that increase must be treated as a punishment.

Indeed, that is what your code says. When DeltaNutRate is greater than
zero and the previous value of dNut was greater than zero, we must
increase the probability of the behavior, a tumble, which in turn
shortens the delay before the next tumble. The positive dNut is rejected
by an early tumble. Fortunately for the model, a positive NutSave is
most often followed by a less positive value of dNut, resulting in an
increase in the delay time for positive values of dNut.


-------------------------------------
You have offered a program which does in fact produce the right
behavior. But you have not considered the question of whether your
explanation is unique, or if not unique then best. There are several
other models that will create the same results, only faster. The
simplest one is

if dNut > 0 then decrease(PTS+); (toward 0)
if dNut <= 0 then increase(PTS-); (toward 1)

This modification will produce "learning" just as your program does, but
much faster and without using any knowledge about previous values of
dNut. Aside from the goal of making the model fit the language of
operant conditioning, what motive could we have for preferring your
model over the alternate one, which works even better? Yours is
computationally more complex, and results in slower learning with at
least some probability of failure. The modification is very simple and
also infallible.
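
The alternative two-line rule as the same kind of sketch (again with
an assumed step size and clamping), to make the comparison concrete:
it consults only the current dNut, never NutSave or DeltaNutRate:

```python
def simple_update(d_nut, p_ts_plus, p_ts_minus, step=0.05):
    # No memory of previous dNut values is needed.
    if d_nut > 0:
        p_ts_plus = max(0.0, p_ts_plus - step)    # drive PTS+ toward 0
    else:
        p_ts_minus = min(1.0, p_ts_minus + step)  # drive PTS- toward 1
    return p_ts_plus, p_ts_minus
```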

The simplest model of all is the control model, which uses no
information about previous values of dNut AND requires no computation of
probabilities.
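
A minimal sketch of that control model, under illustrative assumptions
of my own (nutrient concentration taken as minus the distance to a
source at the origin; one unit of travel per iteration): the cell
tumbles the moment dNut stops improving and otherwise keeps swimming.

```python
import math
import random

def run_ecoli(steps=2000, seed=1):
    # Returns (initial distance to source, final distance to source).
    rng = random.Random(seed)
    x, y = 50.0, 50.0
    heading = rng.uniform(0.0, 2.0 * math.pi)
    start = math.hypot(x, y)
    prev_nut = -start                  # nutrient = -distance (assumption)
    for _ in range(steps):
        x += math.cos(heading)
        y += math.sin(heading)
        nut = -math.hypot(x, y)
        if nut - prev_nut <= 0:        # dNut not improving: tumble now
            heading = rng.uniform(0.0, 2.0 * math.pi)
        prev_nut = nut
    return start, math.hypot(x, y)
```

No stored S+ or S-, no probability adjustments; the cell nevertheless
ends up far closer to the source than where it started.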
------------------------------------

>Where on Earth do you get these ideas? Dormative principle? I don't
>think I can name a single person doing work in the experimental
>analysis of behavior who would subscribe to your conception of THEIR
>conception of reinforcement. You are, as I and, apparently, quite a
>number of others have tried to tell you, constructing a straw-person to
>demolish.

Of course they wouldn't. They think their approach is perfectly
reasonable; otherwise they wouldn't use it. I'm not saying they're
stupid. But from my outsider's point of view, they are saying that the
reason a food pellet alters behavior rates is that the food pellet
contains the ability to reinforce behavior, which means it affects a
probability of producing the behavior, which means it alters the
behavior rate. This probability is itself unobservable: all we observe
are changes in mean behavior rates (or inter-response intervals), not
changes in the presumed probabilities that bring them about or the
imagined effects of the food pellet on those probabilities.

So the parallel to Bateson's "dormitive principle" (someone has
mentioned that the real source was Voltaire) is quite exact.
Reinforcement alters behavior rates because the events said to be
reinforcing contain some unknown property that alters a probability and
altering that probability alters the behavior rate. What is that
property? It is the property of altering the probability of emitting a
behavior. This is an exact example of a dormitive principle: the
explanation of why an effect results from a cause is that there is
something about the cause capable of producing the effect. In short, the
explanation is completely devoid of content. The statement boils down to
"there is a relationship between mean behavior rates and mean rates of
delivery of food pellets because there is a relationship between mean
behavior rates and mean rates of delivery of food pellets."

This is the mode of explanation of mysterious phenomena that was used
before Galileo. It is prescientific. Before you object that EAB people
would not be so dumb as to employ a prescientific mode of explanation,
let me remind you that the people who lived before Galileo were no less
intelligent than those who lived afterward; evolution doesn't work that
fast. The only difference between the pre- and post-Galilean modes of
explanation was that after Galileo we began demanding to know the
mechanisms behind apparent cause-effect relationships. Our brains didn't
get any better; only our conceptions did.
---------------------------------
>... how else is reinforcement to exert effects on the organism if not
>through mechanisms inside the organism?

The concept of a cause "exerting" an effect is a pre-Galilean notion,
despite the fact that this concept dominates the behavioral sciences. It
is the same kind of concept as "impetus": a gun imparts impetus to a
projectile, and the projectile does not fall until the impetus is
exhausted.

You do not find hard scientists such as physicists talking about causes
exerting effects (outside quantum mechanics, where there has been a
regression). Causes do not send out little particles that impinge on
other events where they force certain effects to appear. Instead, we now
speak of functional relationships among variables. This becomes
especially important in speaking of systems where there are closed loops
of causation, as there are in operant conditioning. A pellet of food is
simultaneously a consequence of the organism's outputs and an input to
the organism's perceptions. It is neither cause nor effect; it is a
variable embedded in a closed loop of causation.

When a model for the organism is proposed, the result is a system of two
equations, one describing the organism and the other describing the
apparatus and schedule. No matter what model is used for the organism,
the solution shows that the rate of delivery of food pellets is a
dependent variable, just as is the behavior rate. The independent
variables are disturbances originating independently in the environment,
and any offset added inside the organism to the perception of the food
pellets (in control theory, a reference signal). On a plot of behavior
rate versus food delivery rate, there is one curve that describes the
schedule, and a second curve that describes the organism. These curves
must cross if the outcome is to be determinate. The point of crossing
tells us both what the final food delivery rate will be and what the
final behavior rate will be. Neither one is determined independently.
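
A worked numeric sketch of that crossing point, with equations and
numbers of my own choosing for illustration (a ratio schedule giving
r = b / ratio, and an organism line b = gain * (ref - r) from a simple
control model):

```python
def equilibrium(ratio=20.0, gain=50.0, ref=2.0):
    # Schedule curve:  r = b / ratio         (food rate vs. behavior rate)
    # Organism curve:  b = gain * (ref - r)  (control of perceived food rate)
    # Substituting one into the other and solving for b:
    b = gain * ref / (1.0 + gain / ratio)   # behavior rate at the crossing
    r = b / ratio                           # food delivery rate there
    return b, r
```

Both b and r come out as dependent variables; changing ref, the
organism's reference level, moves the crossing point and hence both
rates at once.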

>Shall we do without cause and effect?

Yes, it's definitely time to stop trying to explain behavior in terms of
cause and effect. Each component of a behavioral loop can be described
as a cause-effect relationship (output dependent on input), but when
such components are connected into a closed loop, the same concept can
no longer be correctly applied to the result. Trying to explain circular
causation in terms of ordinary cause and effect leads only to confusion
and infinite regress. Picking out any one variable in such a loop and
assigning it the role of a cause is simply a conceptual error.

>The mechanisms specified by traditional reinforcement theory are
>admittedly vague (association by contiguity somehow increases response
>probability) and thus difficult to relate to specific physiological
>mechanisms. However, my representation of learning in ECOLI4a is
>precisely as specified by the law of effect: alteration of response
>probabilities as a function of experience. The physical mechanism is
>unknown, so its input-output relationships are modeled instead (i.e.,
>functional relationship).

What is wrong with this approach is contained in "alteration of response
probabilities as a function of experience." This omits the fact that
while response rates, said to be caused by an intervening probability,
are a function of experience, experience is at the same time a function
of the response rates. The situation is closed-loop, but your
description is not. Your statement presents only half of the situation,
the part of the loop from the experience to the response rate. That is
not enough to determine what will happen.

Also, I must point out again (and will continue to do so until you
acknowledge it) that the "experience" alone -- that is, the rate of
reinforcement -- is not sufficient to determine the response rate. You
must also specify the amount of that experience that is wanted: the
reference level, which is determined by the organism independently of
the rate of reinforcement. If the reinforcement rate is equal to or
greater than the reference level, there will be no responses at all. If
the reference signal changes, the response rate that goes with a given
rate of reinforcement will change. The nature of the reinforcing event
or object has no control over how much reinforcement is wanted. What is
reinforcing to one organism at one time is not reinforcing at another
time, or to a different organism. This alone tells us that the quality
of reinforcingness does not reside in the event called the
reinforcement.

>Horse puckey. Increases in dNut following a tumble increase the
>probability of a tumble and decreases in dNut following a tumble
>decrease the probability of a tumble.

_HORSE PUCKEY_? I am momentarily distracted.

But what you just said doesn't fit your model. Your model says that an
increase in dNut following a tumble (i.e., DeltaNutRate) can either
increase or decrease the probability of a tumble, depending on what the
previous value of dNut (before the tumble) was. Perhaps "The model
literally EMBODIES the law of effect, the very core of
reinforcement theory." But if so you did not describe the law of effect.
Or if the law of effect fits your description, then the model does NOT
embody the law of effect.

>>We can do completely without the calculation of the change in dNut
>>across a tumble. We need no probability calculations.

>Sure. But such a model will not LEARN appropriate behavior to control
>its perceptions.

I have shown a modification of the model which WILL "learn" appropriate
behavior in the same sense that yours does, and which does not use the
change in probability across a tumble, DeltaNutRate.

>This is what your model of human performance on the e. coli simulation
>did--it skipped over the very rapid learning phase the participants
>passed through and then modeled the terminal performance. You then
>papered over this omission with phrases like "learning was almost
>nonexistent." You have yet to produce a model that does what your
>human participants did.

Only because it would be so trivial, given what we can measure. We could
say that if the error signal is positive, the gain increases gradually
toward some limit from zero, and if negative it decreases toward zero.
This does not involve any probability calculations. That is an ad-hoc
model that would reproduce the effect your program generates, at a rate
depending on how we define "gradually" (as you do with your
"LearnRate"). I doubt that we could fit an actual learning rate to the
data for one person, because learning takes place too fast and there is
too much randomness due to the tumbles.
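
That ad-hoc rule could be sketched as follows (the learning rate, the
gain ceiling, and the exponential form of "gradually" are all
assumptions of mine, playing the role of Abbott's "LearnRate"):

```python
def adapt_gain(gain, error, learn_rate=0.01, max_gain=10.0):
    # Positive error: gain rises gradually toward max_gain.
    # Negative (or zero) error: gain decays gradually toward zero.
    if error > 0:
        gain += learn_rate * (max_gain - gain)
    else:
        gain -= learn_rate * gain
    return gain
```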

>ECOLI4a may not model the physical organism correctly, but I believe it
>captures the correct functional relationships that the physical
>mechanisms underlying the learning process in real organisms establish.

It proposes ONE possible mechanism, but it is a very elaborate mechanism
and others would work equally well or better. And I am not convinced
that your model actually does embody the law of effect as you state it.
You will have to be much more precise about your definitions before I
will believe that.
-----------------------------------------------------------------------
Best,

Bill P.