[From Bill Powers (970901,0400 MDT)]

Bruce Abbott (970831.2010 EST)--

There are some loose ends in this post that I failed to deal with
properly. I feel that the exchange is going off the track, with
qualitative and approximate arguments beginning to creep in. The
informal meanings of words are beginning to play a larger role,
especially terms that have customarily been used in behaviorism to
assert a direction of causality (like "maintain" and "support") and to
which I object for that reason _as interpretations_. If we are trying
to construct a precise description of phenomena, we must stick with
denotative and quantitative words as much as possible, and with
mathematics where we can.

At the point where the rails began to give way, we had established
that the reinforcer was to be defined as an observable variable just
prior, in the loop, to the CV, with the CV being a function of it. The
CV itself (nutrient level in the running example) is a hypothetical
variable, not observable in any ordinary behavioral experiment. When
we talk about the CV, or the reference signal, or the error signal, we
are talking about a model, not about observations.

I will go through the predictions that can be derived from the PCT
model. The particular CV we have chosen can be related to food intake
in the following way:

d(CV)/dt = k1*R - L

where R = reinforcement rate (reinforcements per unit time)
      L = loss per unit time.

The constant k1 absorbs the weight of the pellets and their nutritive
value per gram.

If the loss rate is proportional to the level of CV (L = k2*CV), a
reasonable first approximation, this equation leads to the
steady-state relationship (where CV has become constant, and therefore
d(CV)/dt = 0),

k1*Rss = k2*CVss.

The value of CVss at that point will be

CVss = (k1/k2)*Rss

This tells us that the steady-state value of the CV is simply
proportional to the steady-state reinforcement rate. Thus the
steady-state reinforcement rate is a measure of the steady-state CV.
The Test applied to reinforcement rate (when the loop is closed) will
yield the same results as the Test applied to the CV.
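This steady state is easy to verify numerically. The sketch below (my
illustration, with arbitrary parameter values) integrates d(CV)/dt =
k1*R - k2*CV by Euler's method with R held constant:

```python
# Euler integration of d(CV)/dt = k1*R - k2*CV with constant R.
# All numeric values here are illustrative assumptions.
k1, k2 = 0.5, 0.1   # nutrient per reinforcement; proportional loss rate
R = 2.0             # reinforcements per unit time, held constant
dt = 0.01           # integration step

cv = 0.0
for _ in range(20000):              # 200 time units: transients die out
    cv += dt * (k1 * R - k2 * cv)

print(round(cv, 3))                 # settles at (k1/k2)*R = 10.0
```

The simulated CV comes to rest exactly where k1*R = k2*CV, as the
algebra predicts.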

When the appropriate establishing operation has been carried out, the
CV will be at a level CVmin, below the reference level CV' (which we
assume constant). The error will be (CV' - CVmin). CV will be
declining slowly at a rate k2*CV, since R is zero (the contingency has
not yet been enabled).

We assume that the behavior rate B is proportional to the error:

B = gain*(CV' - CV)

Note that at this point, just before the contingency has been turned
on, the behavior rate is at its maximum: Bmax = gain*(CV' - CVmin).
This, of course, is reasonable only if we are talking about an
existing control system, not about the search phase or the learning
phase.

The contingency is turned on when CV has declined to the value that we
designate as CVmin.

To complete the system equations we must specify how R depends on B.
The reinforcement rate R is some function f of the behavior rate B:

R = f(B)

For our purposes now, we can simply assume a proportionality factor k3:

R = k3*B

The initial dynamic response of the system will be such that the error
declines from its initial value to a final value, CV increases from
CVmin to some steady-state value CVss, and the behavior declines from
its initial rate Bmax to some steady-state value Bss. All these
changes will follow negative exponential courses with a time constant
of k1/(loop gain). Note that loop gain is not "gain" in the equations
below.
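These dynamics can be sketched by closing the loop in simulation,
using the three relations already given: B = gain*(CV' - CV), R =
k3*B, and d(CV)/dt = k1*R - k2*CV. The parameter values below are
arbitrary assumptions of mine, chosen only for illustration:

```python
# Closed-loop simulation starting from CV = CVmin; all numeric values
# are illustrative assumptions.
k1, k2, k3, gain = 0.5, 0.1, 1.0, 4.0
cv_ref = 10.0                    # reference level CV'
cv = 2.0                         # CVmin when the contingency is enabled
dt = 0.01

for _ in range(3000):            # 30 time units
    b = gain * (cv_ref - cv)     # behavior rate driven by error
    r = k3 * b                   # contingency: reinforcement rate
    cv += dt * (k1 * r - k2 * cv)

# CV settles where k1*k3*gain*(CV' - CV) = k2*CV:
cvss = k1 * k3 * gain * cv_ref / (k2 + k1 * k3 * gain)
print(round(cv, 3), round(cvss, 3))   # simulated vs predicted CVss
```

CV rises from CVmin toward CVss along a negative exponential, and B
falls from Bmax toward Bss along the same curve (with opposite sign).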

We can solve for the steady-state values of the variables by solving
the steady-state system equations simultaneously:

CVss = (k1/k2)*Rss

Rss = k3*Bss

Bss = gain*(CV' - CVss)

Solving for the behavior rate, we have

          gain * CV'
Bss = ------------------
      1 + gain*k1*k3/k2

This looks different from our usual solutions because the _loop_ gain
is not all concentrated in the output function.
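As a cross-check on the algebra, the sketch below substitutes the
steady-state equations into one another using exact rational
arithmetic and compares the result with the closed form for Bss. The
particular parameter values are arbitrary:

```python
from fractions import Fraction as F

# Arbitrary rational parameter values, chosen only for an exact check.
k1, k2, k3, gain = F(1, 2), F(1, 10), F(1), F(4)
cv_ref = F(10)                           # reference level CV'

# Substitute Rss = k3*Bss and Bss = gain*(CV' - CVss) into
# k1*Rss = k2*CVss, solve for CVss, then back out Bss:
cvss = k1 * k3 * gain * cv_ref / (k2 + k1 * k3 * gain)
bss_direct = gain * (cv_ref - cvss)

# Closed form for Bss:
bss_formula = gain * cv_ref / (1 + gain * k1 * k3 / k2)
print(bss_direct == bss_formula)         # exact equality: True
```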

Note that k3 appears only in the denominator. This means that as the
contingency ratio decreases, the behavior rate will decrease, with the
other constants remaining the same, even if the output gain is very
high. Contingency ratios (as in FR schedules) correspond to 1/k3.
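A small numerical sketch of this point, under my assumption (for
illustration only) that an FR-n schedule corresponds to k3 = 1/n, with
arbitrary values for the other constants:

```python
# Illustrative assumption: an FR-n schedule gives k3 = 1/n (n presses
# per pellet); the other parameter values are arbitrary examples.
k1, k2, gain, cv_ref = 0.5, 0.1, 4.0, 10.0

bss_by_ratio = []
for n in (8, 4, 2, 1):                  # decreasing contingency ratio
    k3 = 1.0 / n
    bss = gain * cv_ref / (1 + gain * k1 * k3 / k2)
    bss_by_ratio.append(bss)
    print(n, round(bss, 3))             # Bss falls as the ratio falls
```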

Note also that R, the reinforcement rate, has disappeared from the
equation. This shows that R is not an independent variable. In fact,
because we have no disturbance, the only independent variable is the
reference level CV'.

We can solve the same set of equations for the steady-state rate of
reinforcement:

         k3*gain*CV'
Rss = ------------------
      1 + k3*gain*k1/k2

Now the behavior rate has disappeared, showing that Rss depends only
on the reference level. Behavior rate is not an independent variable,
either.

So we have the classical case of apparent causality being created when
two variables depend on a single third variable: Bss depends on CV'
alone, and Rss depends on CV' alone. There is, of course, a
relationship between Rss and Bss: it is given by

Rss = k3*Bss

This is one of the initial system equations. If the solutions for Rss
and Bss above are solved together to eliminate CV', exactly the same
result will be found (the hard way).

But that is the ONLY relation between Rss and Bss. Neither one can be
changed without a change in the other; the relation Rss = k3*Bss must
always remain true. This relation is just a description of the
environmental feedback connection, and is not a property of the
organism. The only causal relation between Rss and Bss is the effect
of Bss on Rss.

When the system as a whole is nonlinear, and when the contingency is
made more complex, the above equations will in general not be solvable
by analytic means. However, they can be solved in simulation (if any
solution exists) and the major results will remain exactly the same.
Rss and Bss will each be a function of CV'. Rss will be determined by
Bss, through some function Rss = f(Bss), and that will be the ONLY
causal relation between them. The only _independent_ variable will
still be CV'.

If anything "maintains" anything else, it is the setting of CV' that
maintains both Rss and Bss.

After we have finished the comparison of EAB terminology to the above
analysis, we can turn to the other two situations: search and
learning. The search phase involves trying different kinds of
behaviors in different places until some reinforcement is obtained.
Then the learning phase involves increasing the skill with which a
specific behavior is produced in a specific place to control the CV.
This latter phase has the primary result of raising the output "gain"
factor.

Best,

Bill P.