[From Samuel Saunders (951222:14:46:54 EST)]

The following presentation is "borrowed" in part from James Greeno and

Frank Restle. It illustrates one class of reinforcement approach, and

affinities can be seen in the approaches of Premack, Allison, and Timberlake,

among others. This is an incomplete presentation, but it may help to

further the discussion. This presentation does not address 'acquisition'.

I will present acquisition of a bar press according to W.K. Estes in a few

days to illustrate reinforcement approaches to 'acquisition'.

Let us assign a value to each behavior possible in some situation. We can

then express the probability of some behavior Ai in terms of the values:

P(Ai) = v(Ai) / SUMx v(Ax).
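As a minimal sketch of this choice rule (the behavior names and value numbers below are hypothetical, not from the text):

```python
# Choice rule: the probability of behavior Ai is its value divided by
# the sum of the values of all behaviors possible in the situation.
def choice_probabilities(values):
    total = sum(values.values())
    return {a: v / total for a, v in values.items()}

# Illustrative values for three behaviors (hypothetical numbers).
v = {"eat": 6.0, "groom": 3.0, "explore": 1.0}
p = choice_probabilities(v)
# p["eat"] = 6/10 = 0.6; the probabilities sum to 1 by construction.
```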

When opportunity to engage in some behavior is denied for some time

(deprivation), the behavior will have an increased probability when it is

first made available again. Data from Skinner (1938) The Behavior of

Organisms (p.348) show that a rat deprived of food except in an apparatus

ate pellets, after a brief initial burst, at a rate of 1 pellet a minute.

When eating was interrupted by making pellets unavailable for a 13 min

period 25 min into the session, rate of pellet consumption went to about 3

per minute for several minutes before returning to the 1 per minute rate.

(This is cited to show two kinds of deprivation: deprivation of eating for a

long period raises the probability of eating, and 'relative deprivation' by

interfering with the resulting rate produces an additional increment).

Evidence can be marshalled that this is typical of behavior in general, that

deprivation produces an increment in the behavior when opportunity to

perform it is first introduced. This can be treated as a change in the

relevant v(Ax), as long as the Ax are independent. In some cases (eating

and drinking, for instance) there are interactions, so that deprivation of

one response can affect the probability of another response

directly (for example, water deprivation decreases eating).

When opportunity to engage in a behavior is made contingent on another

behavior, a base response B must be performed in order to perform a

contingent response C, so, if B is much less likely than C, B must be

performed more than it would without the contingency in order for C to be

performed as much as it would without the contingency. The result is a

compromise. Letting v0(Ax) indicate the pre-contingency value of Ax and

v1(Ax) the value of Ax with the contingency, these considerations may be

expressed:

Reinforcement:

v1(B) = b * v0(B) + c * v1(C)

Deprivation:

v1(C) = v0(C) + h(1-t)

where h is a constant and t is proportion of the time C is

available.

The vx(Ax) cannot be measured directly, but only relative to a context that

offers other alternative behaviors. If the pre-contingency and contingency

conditions are presented in a constant context, the other Ax can be

combined into one term, say M, and the above can be expressed _re_ a

constant v(M):

Reinforcement:

v1(B)/v(M) = b * v0(B)/v(M) + c * v1(C)/v(M)

Deprivation:

v1(C)/v(M) = v0(C)/v(M) + h/v(M) * (1-t)

setting h' = h/v(M)

v1(C)/v(M) = v0(C)/v(M) + h'(1-t)
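The two relative-form equations can be written out directly (a sketch; the parameter values b, c, and h' would have to be estimated from data, and all quantities are expressed relative to v(M)):

```python
def reinforcement(v0_B_rel, v1_C_rel, b, c):
    # v1(B)/v(M) = b * v0(B)/v(M) + c * v1(C)/v(M)
    return b * v0_B_rel + c * v1_C_rel

def deprivation(v0_C_rel, h_prime, t):
    # v1(C)/v(M) = v0(C)/v(M) + h'(1 - t),
    # where t is the proportion of time C is available.
    return v0_C_rel + h_prime * (1.0 - t)

# Illustrative use (hypothetical parameter values): with the contingent
# response available half the time, its value rises above baseline.
v1_C = deprivation(v0_C_rel=1.0, h_prime=2.0, t=0.5)   # 1.0 + 2.0*0.5 = 2.0
v1_B = reinforcement(v0_B_rel=0.0, v1_C_rel=v1_C, b=0.5, c=0.25)  # 0.5
```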

Let us consider the proposed Marken experiment. The contingent response is

viewing pictures for a fixed length of time p, while the base response is

pressing the mouse button, which must be pressed n times to view a picture.

Then for a session of length d in which a total of s presses occur, the

value t can be calculated

t = (s/n * p) / d .
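For concreteness, the calculation of t can be sketched as follows (the session numbers are hypothetical):

```python
def available_proportion(s, n, p, d):
    # t = (s/n * p) / d: the fraction of the session during which the
    # contingent response C (picture viewing) is available, given s total
    # presses, n presses per picture, viewing time p, and session length d.
    return (s / n) * p / d

# Hypothetical session: 200 presses on FR 10, 5 s per picture, 600 s session.
t = available_proportion(s=200, n=10, p=5.0, d=600.0)
# t = (20 * 5) / 600 = 100/600, about 0.167
```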

Since button pressing is not very likely in the absence of the

contingency, we may set v0(B)/v(M) = 0. Then

v1(B)/v(M) = c * v1(C)/v(M)

Substituting the deprivation equation into the above:

v1(B)/v(M) = c [ v0(C)/v(M) + h' (1-t)] .

Some considerations:

If we vary the picture time by occasionally presenting the picture longer,

but not so much that we cause changes in b or c, then t will increase, so

(1 - t) decreases, and P1(B) = v1(B)/(v1(B) + v(M)) will decrease.
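This prediction can be checked numerically. Because t itself depends on how often B is performed, the sketch below solves for a self-consistent press probability by fixed-point iteration, under the simplification v0(B)/v(M) = 0; all parameter values (r_max, v0_C, h_prime, c) are hypothetical, and quantities are expressed relative to v(M) = 1:

```python
def predicted_press_prob(p, n=10, d=600.0, r_max=1.0,
                         v0_C=1.0, h_prime=2.0, c=0.5, iters=200):
    """Fixed-point solution for P1(B) in the contingency model.

    p: picture-viewing time per reinforcement; n: presses per picture (FR);
    d: session length; r_max: maximum press rate (an added assumption
    linking probability to press count). All values relative to v(M) = 1.
    """
    P = 0.5  # initial guess for press probability
    for _ in range(iters):
        s = r_max * P * d              # total presses at this probability
        t = min((s / n) * p / d, 1.0)  # proportion of time C is available
        v1_B = c * (v0_C + h_prime * (1.0 - t))  # with v0(B)/v(M) = 0
        P = v1_B / (v1_B + 1.0)        # probability relative to context M
    return P

# Longer picture time -> larger t -> smaller (1 - t) -> lower P1(B):
# predicted_press_prob(2.0) exceeds predicted_press_prob(8.0).
```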

The value in the contingent condition will be higher for a response with

0 initial probability than for a neutral response.

In the literature, b varies as a function of FR value when FRs have been

used with non-neutral base responses. A quick and by no means definitive

examination suggests that b ~= log(n) / k , where k is fixed at least for a

particular experiment, may apply. This could reflect something like

'perceived numerosity'. If v0(B) is not 0 but < 0, then an increased ratio

would produce an increased subtractive term in the reinforcement equation.

Negative v0 could not be observed directly, of course, but might be

measured in a situation that includes a B1 (presumed 0 or negative), a B2

(measured v0 > 0), and C. Then providing the contingent relationship for

B2, a negative v0(B2) could be seen from a v1(B2) less than predicted as

above.

This all appears to have a hidden control model inside. The v0

particularly suggest reference levels. Timberlake has gone part way toward

a control interpretation, expressing things in terms of "behavioral set

points". From there, it is not a very big step to move to control of

perception, and perceptual reference. I wonder if that step has been

blocked in part by a lack of insight into a method for determining the

perceptual function and perceptual set point (the TEST)?

I would like to spend my time working on PCT models, and maybe even

experiments, rather than trying to put reinforcement thinking into

reasonable form for comparison and contrast. I volunteered to do this,

however, so I will continue to do it. I think it is important to represent

the reinforcement view as accurately as possible, but I don't assert that

view myself, so please try to keep comments on a scientific rather than

personal level.

//----------------------------------------------------------------------------

//Samuel Spence Saunders, Ph.D.