exchange theory???; what E. coli stuff was about

[From Bill Powers (941216.1410 MST)]

Dennis Delprato (941214) --

RE: punishment and coercion

I just ran across "Is Punishment Effective? Coercive
Strategies in Social Exchange" (L. D. Molm,
Social Psychology Quarterly, 1994, v. 57, pp. 75-
94).

...

"These results
refute the classical exchange theorists' arguments that
punishment is ineffective and leads to retaliation...."

and

"more frequent punishment for nonexchange increased the
partner's reward exchange without increasing retaliation
or negative affect."

What is "exchange theory"?

My main question is, are either the classical arguments or the
refutation based on facts? If I say "Punishing your child will be
ineffective and will lead to retaliation," what percentage of the time
will this prediction prove false? If someone else says, "No, punishing
your child will lead to greater reward exchange (whatever that is) and
will not result in the child's retaliating," what percentage of the time
will that prediction prove false? Since you have seen the article,
perhaps you can deduce from the data presented what these percentages
would be.

···

----------------------------------------------------------------------
CHUCK TUCKER (941214)--

I do not understand what was being done with the various simulations of
E. coli. My question is: Does the simulation based on TRT replicate
(or reproduce) the movements of E. coli as well as the simulation
based on PCT?

The short answer is yes, the two models reproduced the behavior equally
well in terms of its general appearance (progress toward the source of
the chemical concentration, random-looking zig-zags). If we had real E.
coli data, perhaps some discernible differences would have shown up in
how well the real behavior was reproduced. But we weren't looking at the
model to see how well it fit the data; we were looking at alternative
mechanisms that might have been involved.

What was somewhat confusing about the whole interchange was that Bruce
Abbott was using the E. coli model to introduce the subject of learning.
He was not claiming that a real E. coli learns to behave as it does, but
only using the original model as a base for building a hypothetical
learning process on top of it. This is especially confusing because the
normal type of E. coli behavior already resembles a learning process, a
process of trial and error, due to the random effects of the tumbles. So
Bruce's model was adding a probabilistic process to vary the interval
between behavioral acts which themselves had probabilistic effects.
Keeping these two random processes conceptually separated was pretty
difficult.

In the original model, E. coli was said to contain a perceptual signal
proportional to the time rate of change of nutrient concentration (which
Bruce called "dNut"). Our original model calculated the angle of travel
of E. coli relative to the direction of the gradient, and computed dNut
as the speed of travel (constant) times the cosine of that angle, times
the concentration gradient at that distance from the source. Bruce did
the calculation more simply: he just subtracted the old value of
concentration from the new value on every iteration (during which the
bacterium moved) and obtained dNut directly. In either case, the model
ended up containing a variable representing the time rate of change of
concentration as sensed by the bacterium. So this difference between the
models made no difference at all.
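
To make that equivalence concrete, here is a minimal sketch in Python
(my own notation, not code from either model), assuming a hypothetical
radially symmetric concentration field with the source at the origin:

    import math

    def concentration(x, y):
        # Hypothetical field: concentration falls off with distance
        # from a source at the origin.
        return 1000.0 / (1.0 + math.hypot(x, y))

    def dnut_geometric(x, y, heading, speed):
        # Original model: speed times the cosine of the angle between
        # the heading and the gradient, times the gradient magnitude.
        d = math.hypot(x, y)
        grad_mag = 1000.0 / (1.0 + d) ** 2   # |dC/dd| for the field above
        to_source = math.atan2(-y, -x)       # the gradient points at the source
        return speed * math.cos(heading - to_source) * grad_mag

    def dnut_difference(x, y, heading, speed):
        # Bruce's method: new concentration minus old, after one step.
        new = concentration(x + speed * math.cos(heading),
                            y + speed * math.sin(heading))
        return new - concentration(x, y)

    # For small steps the two values agree closely:
    print(dnut_geometric(10.0, 5.0, 2.0, 0.01))
    print(dnut_difference(10.0, 5.0, 2.0, 0.01))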

Bruce's way uses these steps (sketched in code after the list):

          1. Calculate present concentration from the position of
             E. coli in the concentration gradient; save the value.
          2. Move E. coli a fixed distance in the direction it is
             going (constant speed).
          3. Calculate the new concentration.
          4. Subtract the old concentration from the new concentration
             to get dNut.
          5. Save the new concentration as the old value.
          6. Execute other steps in the model that occur during a
             single iteration (including a possible random change
             in direction).
          7. Go to step 2.
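
As a minimal Python sketch of that loop (the names and the placeholder
tumble rule are mine; the actual tumble rule is what the rest of this
discussion is about):

    import math, random

    def run(concentration, steps=1000, speed=1.0, tumble_prob=0.05):
        x, y = 50.0, 50.0
        heading = random.uniform(0.0, 2.0 * math.pi)
        old_c = concentration(x, y)            # step 1: save present value
        for _ in range(steps):
            x += speed * math.cos(heading)     # step 2: move a fixed distance
            y += speed * math.sin(heading)
            new_c = concentration(x, y)        # step 3: new concentration
            dnut = new_c - old_c               # step 4: dNut
            old_c = new_c                      # step 5: save new as old
            if random.random() < tumble_prob:  # step 6: possible random tumble
                heading = random.uniform(0.0, 2.0 * math.pi)
        return x, y                            # step 7 is the loop itself

    # e.g.: run(concentration), using the field sketched earlier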

The original model then compared the sensed value of dNut with a
reference value to produce an error signal (reference minus perception).
The error signal determined the size of the increment added to a timing
variable; when that timing variable exceeded a limit, a tumble occurred
and the timing variable was reset to zero. Many iterations could occur
before the limit was reached, so many iterations of swimming in the same
direction might occur. If the error were large, the timing variable
would increase in large steps and reach the limit in a short time (few
iterations). So a large error caused a tumble to occur after only a
short delay and a short distance traveled. A small error meant a small
increment being added to the timer on every iteration; thus the next
tumble would not occur until after a large number of iterations, during
which E. coli would move, one step at a time, through a relatively large
distance.
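
A minimal sketch of that timing mechanism, assuming particular values
for the reference, gain, and limit (the post does not give them, and the
clamp keeping the increment non-negative is also my assumption):

    def tumble_due(dnut, state, reference=0.01, gain=50.0, limit=100.0):
        # One iteration of the original model's output function.
        error = reference - dnut               # reference minus perception
        state["timer"] += max(0.0, gain * error)
        if state["timer"] > limit:             # limit reached: tumble now
            state["timer"] = 0.0               # and reset the timer
            return True
        return False

    state = {"timer": 0.0}
    # A large error adds large increments, so the limit is reached after
    # few iterations (short delay); a small error means many iterations
    # pass, and E. coli travels a long way, before the next tumble.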

In Bruce's model, the perceptual signal dNut was compared with a
reference value set to zero, and the error signal was a logical value,
either 0 or 1. If the value of dNut was negative (less than 0), the
error signal was 1; otherwise it was 0. This meant that even the tiniest
negative value of dNut, the smallest number representable in floating
point notation, would lead to an error signal of 1 unit, while a value
of exactly zero would lead to an error signal of 0. This made the
effective gain of the comparator nearly infinite, for an extremely tiny
variation in dNut could change the error signal from 0 to 1.

The output function in Bruce's model did not use a timer; it used a
probability calculation to determine the interval between tumbles. If a
number k (between 0 and 1) represents the probability that a tumble will
occur on any one iteration, whether or not a tumble occurs can be
determined by generating a random number uniformly distributed between 0
and 1 and comparing it with k. The random number comes from the library
function "random", which returns a value between 0 and 1. Suppose k is
0.01. The logical statement "random < k" will then
be true, on the average, only once in 100 times. If it is true, a tumble
is initiated during that iteration. If false, no tumble occurs. This
comparison is done on each iteration, so on the average there will be
1/k iterations before the next tumble occurs.

Thus _increasing_ the probability k has the effect of _shortening_ the
delay before the next tumble, on the average.
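
In Python, the comparator and the probabilistic output might be sketched
like this (the function names are mine):

    import random

    def error_signal(dnut):
        # Bruce's comparator: reference is 0, and the error is logical --
        # 1 for any negative dNut, however tiny; otherwise 0.
        return 1 if dnut < 0.0 else 0

    def tumble_this_iteration(k):
        # A tumble occurs with probability k on each iteration, so on
        # average 1/k iterations pass before the next tumble; raising k
        # shortens the expected delay.
        return random.random() < k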

In Bruce's original attempts to use this way of determining the tumble
interval, the model made k depend not only on the current value of the
error signal, but on past values as well (to imitate learning from
experience). Various attempts to get this model to work were frustrated
by the lack of regularity in E. coli's tumbles, and the fact that each
new tumble produced a direction independent of the previous one. The
concept Bruce was trying to develop depended on experience with the
effects of past behavior having an effect on current behavior, so that
the progress of E. coli up the gradient could be explained as operant
conditioning. However, using only a single value of k meant that any
carryover of the effects of previous error signals into the present
behavior led to a value of k that was incorrect (for best performance)
for half of the tumbles. So one by one, these early attempts were shown,
by running the models, not to work -- not to produce the desired result.

The actual requirement to make the model work is that the delay between
tumbles must be short for negative values of dNut (movement down the
gradient) and long for positive values (movement up the gradient).
Furthermore, the change from a short delay to a long one and vice versa
must take place _immediately_ upon a change in dNut from positive to
negative or negative to positive. Any averaging of k over several
tumbles would slow this change and make the delay be wrong for the
direction of travel.

Bruce's final solution, which does work, was accomplished by using two
values of k, one when the direction of travel was up the gradient and
the other when the direction was down the gradient. The value of dNut,
the rate of change of nutrient concentration, was treated both as a
discriminative stimulus ("S+" and "S-") and as the current stimulus. One
value of k, used when the current value of dNut was greater than the
reference level (0), was called "Probability of a tumble given S+",
which I abbreviated as PTS+. The other value of k, used when the current
value of dNut was less than the reference level, was PTS-.

The "reinforcing" effect was determined by whether S (that is, dNut)
just after a tumble was more or less favorable than just before the
tumble. This effect was used to change the two probabilities -- to
"reward" them by increasing them, or "punish" them by decreasing them.
But any method that would systematically decrease PTS+ and increase
PTS- would have led to the same effect.

Separating the probability calculations into two parts means that when
the current value of dNut changes from greater than zero to less than
zero, the probability being used switches _instantly_ to the value
associated with the sign of dNut. If the two probabilities are
appropriately large and small, the result will be that the delay before
the next tumble will immediately switch to the proper value, long or
short. This gets rid of lingering effects from previous tumbles.
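
Sketched in Python (pts_plus and pts_minus stand for PTS+ and PTS-; the
treatment of dNut exactly equal to zero is my guess):

    import random

    def tumble_this_iteration(dnut, pts_plus, pts_minus):
        # The sign of dNut alone selects which probability governs the
        # tumble decision, so the expected delay switches instantly
        # whenever dNut changes sign.
        k = pts_plus if dnut >= 0.0 else pts_minus
        return random.random() < k

    # With pts_plus small and pts_minus large, tumbles are long delayed
    # while moving up the gradient and come quickly while moving down.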

What Bruce wanted to do (to illustrate learning by the Law of Effect or
the three-term contingency) was to start with the values of these two
probabilities at 50%, and show that his model would gradually decrease
PTS+ (reducing the probability of a tumble when moving the right way)
and increase PTS- (increasing the probability of a tumble when moving
the wrong way). So E. coli, not initially knowing when to delay a tumble
and when to do another one right away, would gradually change its
criterion for tumbling until the interval was always correct for the
direction of travel (as represented by dNut being positive or negative).
With the initial values of PTS+ and PTS- set to 0.5, E. coli would
tumble at the same intervals whether going up or down the gradient. It
would then just execute a random walk and make no progress either up or
down the gradient. As the probabilities gradually changed, it would
begin delaying its tumbles more and more when going up the gradient, and
hastening them when going down the gradient. Then it would start to
exhibit gradient-climbing behavior.

Since the probabilities were limited to be greater than zero and less
than 1, PTS+ would gradually approach zero and PTS- would approach 1.
This would mean that whenever dNut was greater than
zero there would be no tumble, ever, and when less than zero a tumble
would occur immediately. This becomes exactly equivalent to the original
PCT model when the output gain is set to "infinity" (a very large
number) and the reference signal is set to zero.
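
Under my reading of the above, the learning step might be sketched as
follows (the step size and the bounds keeping the probabilities strictly
between 0 and 1 are assumptions; the post specifies neither):

    def update_after_tumble(dnut_before, dnut_after, pts_plus, pts_minus,
                            step=0.01, lo=0.001, hi=0.999):
        # The sign of dNut just before the tumble tells which probability
        # (PTS+ or PTS-) produced it; whether dNut became more favorable
        # afterward decides reward (increase) or punishment (decrease).
        delta = step if dnut_after > dnut_before else -step
        if dnut_before >= 0.0:
            pts_plus = min(hi, max(lo, pts_plus + delta))
        else:
            pts_minus = min(hi, max(lo, pts_minus + delta))
        return pts_plus, pts_minus

    # Starting from pts_plus = pts_minus = 0.5 (a pure random walk), these
    # updates drive pts_plus toward 0 and pts_minus toward 1 -- the point
    # at which the model matches the original PCT model at infinite gain.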
-------------------------------
One very nice lesson can be learned from this exchange. We have to
remember that Bruce never made any claim that E. coli had to learn its
method of locomotion. The real E. coli is born organized to behave
essentially as the original PCT model describes it, without learning.
Bruce said that a number of times, although his disclaimer was rather
persistently ignored. What Bruce was saying was that IF E. coli
contained an organization like the successful one he finally proposed,
it would be able to learn the right relationship between the experienced
value of dNut and the delay until the next tumble.

What Bruce did was to DESIGN an E. coli system that would learn in
accordance with the Law of Effect. We can be pretty sure that E. coli
does NOT learn to behave as it does, whether by the Law of Effect or any
other principle, so it is clear from the start that Bruce's model is a
model of an imaginary E. coli, one that does not (as far as we have
evidence) exist in nature.

This is educational, because it tells us that we can model systems that
correspond to no real system, and study their behavior. We can even
model a system that behaves just like the real system, but does so by
means of a mechanism that exists in no real organism. In short, a model
that behaves correctly can be wrong. If Bruce had claimed that E. coli
swims up gradients of attractants because it learns to do so according
to the Law of Effect or operant conditioning, then he would have been
wrong even if his model ended up behaving (after running a while) just
like the real E. coli.

So how can we distinguish right models from wrong models, if simply
behaving like a real system is not an infallible criterion?

Sometimes we simply can't decide. Two models with different internal
organization (not reducible to each other through any series of
mathematical equivalences) may behave identically. In that case we
simply have to carry both of them along until we can find some basis for
preferring one over the other. The most obvious basis would be to
dissect the real system, trace its internal functions in detail, and see
which model matches the actual organization the best (if either).

But dissection is not very useful if we can't understand what we find
inside an organism. In most real cases, the best we can do is try to
find variations on the experimental conditions that will cause one model
to fail to behave properly but not the other. This is the most practical
way to approach behavioral modeling.

In Tom Bourbon's and my paper, "Models and Their Worlds," we presented
several changes in experimental conditions that seemed to us to rule out
first the top-down, command-driven "cognitive" model and then the
externally-driven S-R model, while the same PCT model, with no changes
in parameters, continued to predict the real behavior correctly and
accurately. This approach led to cries of outrage from some quarters,
and claims that we had misrepresented the rival models. We had asked
that readers submit alternative models that would better represent the
two other positions if our representation was thought incorrect, but so
far the cries of outrage are all that have actually come forth
(sufficient, however, to prevent publication in a refereed journal, the
outraged cries having come from the referees).

Nevertheless, this approach to modeling seems to me the right one. Every
model that is defined well enough to generate simulated behavior will do
something, by some means. But very few models produce behavior that is
like that of a real system, and among those that do produce similar
behavior, very few do it by the same mechanisms as in the real system.
There may be some usefulness in exploring models for their own sake,
just to see the kinds of behaviors that can be produced and the sorts of
mechanisms that could produce them, but if the aim is to model real
systems in order to understand them, we simply have to use methods that
eliminate models that behave wrong or behave by the wrong means -- if we
can find them.

It seems to me that this systematic and rigorous approach to modeling is
the best way to settle arguments that are based mainly on loose rhetoric
and subjective definitions, as most arguments in psychology are. And
what Bruce and I and Rick were doing in the E. coli discussions was
exactly that: first trying to develop models that behave correctly
according to some principle being tested, and then trying to find
variations in the conditions that would eliminate all but one model. We
saw two or three models eliminated, and two that remained. But one of
the surviving models applied to imaginary behavior and failed under a
fairly simple change of conditions, leaving us with only one plausible
explanation for the way the real E. coli behaves.
-----------------------------------------------------------------------
Best,

Bill P.