[From Bill Powers (941029.1730 MDT)]
Bruce Abbott (941029.1600 EST) --
You cite Thorndike's comment:
By a satisfying state of affairs is meant one which the animal does
nothing to avoid, often doing such things as attain and preserve
it. By a discomforting or annoying state of affairs is meant one
which the animal commonly avoids and abandons. (Thorndike, 1911)
This is the approach by which scientists have attempted to characterize
behavior without trying to guess what's going on inside the organism
(i.e., without modeling). It's vague enough to fit just about any
theory. The problem with it shows up immediately: if some "states of
affairs" are discomforting or annoying, then we have to try to say what
is discomforting or annoying about them, in the same terms in which the
behavior is expressed: what the external observer can see. So we end up
with stimuli that are reinforcing, that have value, that have salience,
and so forth -- mysterious nonphysical properties of the physical world.
This approach pays lip-service to physical causality, but in a way that
attributes to the physical environment powers that actually belong
inside of organisms.
It's the organism, not the environment, that decides what is to be
reinforcing, what stimuli are to be meaningful, what states of affairs
are to be sought or avoided. The attempt to make it seem that the
properties of the environment cause behavior has simply made psychology
look suspect to anyone in the physical sciences. You say that M&M's are
reinforcing? OK, here are some M&M's -- show me where and what the
reinforcing quality is.
------------------------------
RE: the reinforcement theory challenge.
You've neatly changed the challenge:
repeat
  inc(Clock);
  if Going_up_gradient then continue;
  if Going_down_gradient then tumble;
until Clock > EndSimulation;
Turning this sketch into a runnable model requires filling in some
details. If you add sensors to detect the time rate of change of
concentration that results from the direction of swimming, you have the
input function; if you compare that input signal with a reference signal
(for positive time rate of change) you have a comparator and an error
signal, and if you convert the magnitude of the error to an effect on
the current timing prior to the next tumble, you have the output
function. This is exactly the E. coli model, a control system that
controls the sensed time rate of change of concentration (of an
attractant). To model behavior relative to a repellent, you have to set
the reference signal a little negative and reverse the effect on the
time delay. What you have modeled is a control system; only a control
system can do what your simulation sketch describes.
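To make this concrete, here is a minimal sketch of that reading of the model, translated into Python rather than Pascal. Everything in it -- the gradient function, the reference value, the gain, and the nominal delay -- is an illustrative assumption of mine, not a parameter from any published E. coli model; the point is only the structure: input function (sensed dC/dt), comparator (reference minus perception), and output function (error adjusting the delay before the next tumble).

```python
import math
import random

def concentration(x, y):
    """Attractant concentration: an assumed gradient rising toward the origin."""
    return 100.0 - math.hypot(x, y)

def simulate(steps=2000, seed=1):
    random.seed(seed)
    x = y = 50.0
    heading = random.uniform(0.0, 2.0 * math.pi)
    reference = 0.1      # small positive reference for sensed dC/dt (attractant case)
    gain = 5.0           # assumed gain: how strongly error shortens the delay
    base_delay = 10.0    # assumed nominal number of steps between tumbles
    prev_c = concentration(x, y)
    timer = 0.0
    for _ in range(steps):
        x += math.cos(heading)
        y += math.sin(heading)
        c = concentration(x, y)
        rate = c - prev_c            # input function: sensed time rate of change
        prev_c = c
        error = reference - rate     # comparator: reference minus perception
        timer += 1.0
        # Output function: error lengthens or shortens the effective delay,
        # so swimming up the gradient (rate above reference) postpones the tumble.
        if timer >= base_delay - gain * error:
            heading = random.uniform(0.0, 2.0 * math.pi)  # tumble: new random heading
            timer = 0.0
    return concentration(x, y)
```

Run over a few thousand steps, the simulated bacterium drifts up the gradient even though each tumble points it in a completely random direction, which is the whole trick of the E. coli method.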
So now your problem is to start with this model and show that it fits
the definition of reinforcement. Skinner was challenged on his statement
that behavior is controlled by its consequences, because taken literally
it has a future event controlling an event that came before it. Skinner
explained very carefully that of course that was not what he
meant. He meant that the consequence produced by a present behavior had
an incremental effect on the probability of producing that same behavior
in the future. The consequence did not affect the same behavior that
produced it.
The "future" in this case means a time _after_ the reinforcement has
occurred. Obviously there can be no effect on the current behavior by
its consequence, the reinforcement that has not occurred yet and may or
may not occur. Only after the behavior has taken place (swim for some
length of time and then tumble) and the consequence occurs can the
consequence be known (end up swimming up or down the gradient, and thus
experiencing a positive or negative time rate of change of
concentration). Incidentally, Koshland established, by perfusion
experiments, that it is indeed the time-derivative of concentration that
E. coli senses and that governs the delay in its tumbling in a smooth
analog way. He also showed that the reference level for sensed time rate
of change of concentration varies among bacteria, going all the way from
a never-tumbling mutant to an always-tumbling mutant.
I claim that to produce a true reinforcement model, you have to show
that a _consequence_ of the current behavioral variables' setting
modifies the _next_ behavior, the next setting of that behavioral
variable. The only possible behavioral variable here is the delay prior
to the next tumble. I model this as a timer that counts up toward
a trigger level, with the trigger level being adjusted upward or
downward by the error signal. You could also have a fixed trigger level
and vary the size of increments of the timer according to the size of
the error signal. Of course that adjustment has no effect until the
timer reaches the trigger level, producing a tumble and resetting
itself. There is no visible consequence of the changes in trigger level
until after the tumble takes place.
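The two timer schemes in the last paragraph can be sketched side by side; this is my own illustration in Python, with the trigger level, gain, and the particular increment formula in the second version all assumed for the sake of the example (the increment is chosen so the two versions agree when the error is constant).

```python
def tumbles_variable_trigger(errors, base_trigger=10.0, gain=5.0):
    """Timer counts up by 1 each step; the error signal moves the trigger level."""
    timer, tumbles = 0.0, []
    for step, e in enumerate(errors):
        timer += 1.0
        # The adjusted trigger has no visible effect until the timer reaches it.
        if timer >= base_trigger + gain * e:
            tumbles.append(step)   # tumble, then reset the timer
            timer = 0.0
    return tumbles

def tumbles_variable_increment(errors, trigger=10.0, gain=5.0):
    """Fixed trigger level; the error signal scales the timer's increments instead."""
    timer, tumbles = 0.0, []
    for step, e in enumerate(errors):
        # Assumed increment formula: larger error -> smaller steps -> later tumble.
        timer += trigger / (trigger + gain * e)
        if timer >= trigger:
            tumbles.append(step)
            timer = 0.0
    return tumbles
```

With zero error both versions tumble every ten steps; with a sustained positive error both stretch the interval to twenty, and in neither version does the adjustment show up in behavior until the tumble actually fires.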
I think you'd better give some deeper consideration to this problem. If
"reinforcement" has any technical meaning, we have to use that technical
meaning to answer this question. If it just means anything that the
organism likes and manages to get for itself, then there isn't any
theoretical issue; just loose talk.
------------------------------------------
RE: Turbo Pascal Graphics --
Your suggestion about the BGI files is a good one (although I don't
think that was Tom's problem). Unfortunately, my Pascal runs on drive
E:, so each person would have to modify that const expression anyway. I
suggest trying
  const BGIDIR = '\tp\bgi';
which will be relative to the drive you're on. Also, you have to be sure
there _is_ a BGI directory beneath the \TP directory, with the BGI files
in it. My old version of Pascal didn't have one.
I'll put that in future versions and see how it works.
----------------------------------------------------------------------
Best,
Bill P.