[From Bill Powers (970901,0400 MDT)]
Bruce Abbott (970831.2010 EST)--
There are some loose ends in this post that I failed to deal with properly.
I feel that the exchange is going off the track, with qualitative and
approximate arguments beginning to creep in. The informal meanings of words
are beginning to play a larger role, especially terms that have customarily
been used in behaviorism to assert a direction of causality (like "maintain"
and "support") and to which I object for that reason _as interpretations_. If
we are trying to construct a precise description of phenomena, we must
stick with denotative and quantitative words as much as possible and
mathematics where we can.
At the point where the rails began to give way, we had established that the
reinforcer was to be defined as an observable variable just prior, in the
loop, to the CV, with the CV being a function of it. The CV itself
(nutrient level in the running example) is a hypothetical variable, not
observable in any ordinary behavioral experiment. When we talk about the
CV, or the reference signal, or the error signal, we are talking about a
model, not about observations.
I will go through the predictions that can be derived from the PCT model.
The particular CV we have chosen can be related to food intake in the
following way:
    d(CV)/dt = k1*R - L

where R = reinforcement rate (reinforcements per unit time)
      L = loss rate per unit time.

The constant k1 absorbs the weight of the pellets and their nutritive value
per gram.
If the loss rate is proportional to the level of CV (L = k2*CV), a
reasonable first approximation, this equation leads to the steady-state
relationship (where CV has become constant, and therefore d(CV)/dt = 0),
    k1*Rss = k2*CVss
The value of CVss at that point will be
    CVss = (k1/k2)*Rss
This tells us that the steady-state value of the CV is simply proportional
to the steady-state reinforcement rate. Thus the steady-state reinforcement
rate is a measure of the steady-state CV. The Test applied to reinforcement
rate (when the loop is closed) will yield the same results as the Test
applied to the CV.
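As a check, the nutrient-level equation can be integrated numerically. The
sketch below (all parameter values are my own illustrative assumptions, not
values from this exchange) uses Euler integration with a constant
reinforcement rate R and confirms that CV settles at (k1/k2)*R:

```python
# Sketch: Euler integration of d(CV)/dt = k1*R - k2*CV with a constant
# reinforcement rate R. All parameter values here are illustrative
# assumptions.

k1, k2 = 0.5, 0.1     # nutritive gain per pellet, loss-rate constant
R = 2.0               # constant reinforcement rate
dt = 0.01             # integration step
cv = 0.0              # nutrient level, starting empty

for _ in range(100_000):              # run well past the time constant 1/k2
    cv += dt * (k1 * R - k2 * cv)

print(cv, (k1 / k2) * R)    # CV settles at (k1/k2)*R
```

The final value matches the steady-state formula to within the integration
error, whatever particular constants are chosen.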
When the appropriate establishing operation has been carried out, the CV
will be at a level CVmin lower than the reference level CV' (which we
assume constant). The error will be (CV' - CVmin). CV will be declining
slowly at a rate k2*CV, since R is zero (the contingency has not yet been
enabled).
We assume that the behavior rate B is proportional to the error:
    B = gain*(CV' - CV)
Note that at this point, just before the contingency has been turned on,
the behavior rate is at its maximum: Bmax = gain*(CV' - CVmin). This, of
course, is reasonable only if we are talking about an existing control
system, not about the search phase or the learning phase.
The contingency is turned on when CV has declined to the value that we
designate as CVmin.
To complete the system equations we must specify how R depends on B. The
reinforcement rate R is some function f of the behavior rate B:
R = f(B)
For our purposes now, we can simply assume a proportionality factor k3:
R = k3*B
The initial dynamic response of the system will be such that the error
declines from its initial value to a final value, CV increases from CVmin
to some steady-state value CVss, and the behavior declines from its initial
rate Bmax to some steady-state value Bss. All these changes will follow
negative exponential courses with a time constant of k1/(loop gain). Note
that loop gain is not "gain" in the equations below.
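Before solving for the steady state, the whole loop can be simulated
directly. This is a sketch with assumed constants (k1, k2, k3, gain, and CV'
are all illustrative), integrating the three system equations by Euler's
method from the deprived starting level CVmin:

```python
# Sketch of the whole loop with assumed constants:
#   B = gain*(CV' - CV)        comparator + output
#   R = k3*B                   contingency
#   d(CV)/dt = k1*R - k2*CV    nutrient dynamics

k1, k2, k3, gain = 0.5, 0.1, 1.0, 4.0
cv_ref = 10.0                 # reference level CV'
cv = cv_min = 2.0             # CV after the establishing operation
dt = 0.001

for _ in range(200_000):
    b = gain * (cv_ref - cv)          # behavior rate from error
    r = k3 * b                        # reinforcement rate from behavior
    cv += dt * (k1 * r - k2 * cv)     # nutrient level update

b_ss = gain * (cv_ref - cv)
print(cv, b_ss)    # CV has risen toward CVss, B has declined toward Bss
```

The trajectories are the negative exponentials described above, and the
final values agree with the simultaneous solution that follows.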
We can solve for the steady-state values of the variables by solving the
steady-state system equations simultaneously:
    CVss = (k1/k2)*Rss
    Rss  = k3*Bss
    Bss  = gain*(CV' - CVss)
Solving for the behavior rate, we have
               gain * CV'
    Bss = -------------------
           1 + gain*k1*k3/k2
This looks different from our usual solutions because the _loop_ gain is
not all concentrated in the output function.
Note that k3 appears only in the denominator. This means that as the
contingency ratio decreases, the behavior rate will decrease, with the
other constants remaining the same, even if the output gain is very high.
Contingency ratios (as in FR schedules) correspond to 1/k3.
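A quick numeric check of this point (constants assumed for illustration):
even with a large output gain, Bss falls as k3 rises, that is, as the
FR-like contingency ratio 1/k3 shrinks:

```python
# Numeric check with assumed constants: Bss decreases as k3 increases,
# even at high output gain.

k1, k2, gain, cv_ref = 0.5, 0.1, 100.0, 10.0   # illustrative values

def b_ss(k3):
    return gain * cv_ref / (1 + gain * k1 * k3 / k2)

for k3 in (0.1, 0.5, 1.0):          # contingency ratios 10, 2, 1
    print(k3, b_ss(k3))             # behavior rate drops as k3 grows
```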
Note also that R, the reinforcement rate, has disappeared from the
equation. This shows that R is not an independent variable. In fact,
because we have no disturbance, the only independent variable is the
reference level CV'.
We can solve the same set of equations for the steady-state rate of
reinforcement:
              k3*gain*CV'
    Rss = -------------------
           1 + k3*gain*k1/k2
Now the behavior rate has disappeared, showing that Rss depends only on the
reference level. Behavior rate is not an independent variable, either.
So we have the classical case of apparent causality being created when two
variables depend on a single third variable: Bss depends on CV' alone, and
Rss depends on CV' alone. There is, of course, a relationship between Rss
and Bss: it is given by
Rss = k3*Bss
This is one of the initial system equations. If the solutions for Rss and
Bss above are solved together to eliminate CV', exactly the same result
will be found (the hard way).
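The same fact can be verified numerically (constants again assumed for
illustration): compute Rss and Bss from their closed forms and check that
Rss = k3*Bss holds identically:

```python
# Illustrative constants (assumed, not from the exchange).
k1, k2, k3, gain, cv_ref = 0.5, 0.1, 1.0, 4.0, 10.0

# Closed-form steady states derived above.
b_ss = gain * cv_ref / (1 + gain * k1 * k3 / k2)
r_ss = k3 * gain * cv_ref / (1 + k3 * gain * k1 / k2)

print(r_ss, k3 * b_ss)   # the two agree: Rss = k3*Bss
```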
But that is the ONLY relation between Rss and Bss. Neither one can be
changed without a change in the other; the relation Rss = k3*Bss must
always remain true. This relation is just a description of the
environmental feedback connection, and is not a property of the organism.
The only causal relation between Rss and Bss is the effect of Bss on Rss.
When the system as a whole is nonlinear, and when the contingency is made
more complex, the above equations will in general not be solvable by
analytic means. However, they can be solved in simulation (if any solution
exists) and the major results will remain exactly the same. Rss and Bss
will each be a function of CV'. Rss will be determined by Bss, through some
function Rss = f(Bss), and that will be the ONLY causal relation between
them. The only _independent_ variable will still be CV'.
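As one sketch of such a simulation, take a hypothetical nonlinear
contingency R = Rmax*B/(B + c), an interval-schedule-like saturation (my
example, not part of the exchange; all constants assumed). No analytic
solution is attempted; the steady state is simply found by running the loop:

```python
# Sketch: the loop with a hypothetical nonlinear contingency
# R = Rmax*B/(B + c). Steady state is found by simulation, not algebra.
# All constants are illustrative assumptions.

k1, k2, gain, cv_ref = 0.5, 0.1, 4.0, 10.0
r_max, c = 3.0, 1.0            # shape of the assumed contingency
cv, dt = 2.0, 0.001

for _ in range(500_000):
    b = gain * (cv_ref - cv)
    b = max(b, 0.0)                       # behavior rate cannot go negative
    r = r_max * b / (b + c)               # nonlinear contingency
    cv += dt * (k1 * r - k2 * cv)

b_ss = gain * (cv_ref - cv)
r_ss = r_max * b_ss / (b_ss + c)
print(b_ss, r_ss)    # steady state reached by simulation
```

As claimed, Bss and Rss still settle to values fixed by CV' alone, and the
only causal link between them is the contingency function itself.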
If anything "maintains" anything else, it is the setting of CV' that
maintains both Rss and Bss.
After we have finished the comparison of EAB terminology to the above
analysis, we can turn to the other two situations: search and learning. The
search phase involves trying different kinds of behaviors in different
places until some reinforcement is obtained. Then the learning phase
involves increasing the skill with which a specific behavior is produced in
a specific place to control the CV. This latter phase has the primary
result of raising the output "gain" factor.
Best,
Bill P.