Reinforcement theory vs reorganization theory

[From Bill Powers (941208.2015 MST)]

Bruce Abbott (941208.1000 EST) --

Let's concentrate on a single point, the difference between
reinforcement theory and reorganization theory. You said to Rick Marken,

In other words, because string-pulling leads to the consequence of
affecting an intrinsic variable, it gets selected and "connected," to
use your term.

Try it this way: The cat reorganizes, resulting in a change of the
variables it is controlling and the way it controls them. It tries to
use the new organization to control its environment. If the attempt
fails, the cat continues to reorganize, and sets up a new organization,
and tries it again. It keeps doing that until at last it succeeds --
that is, until the consequence it wanted is brought about by the
behavior. To oversimplify, the result is that no new organization is
tried, and the organization that brought about the desired end remains
in effect.

In order for an effective behavior to occur, it must already have been
"selected and connected" to produce a real result. So the selection and
connection must occur _before_ the consequence occurs, not after.

What's missing from the TRT model is any explanation for why a behavior
that produces one consequence is modified by that consequence so as to
create a _different_ consequence. Where does this _difference_ in
behavior come from? What is it that aims the changes in behavior and the
subsequent changes in consequence toward producing a particular
consequence, especially when that consequence is not currently being
produced at all?

Reorganization theory answers these questions compactly and without
invoking any mysterious nonphysical effects. It simply says that the
organization of behavior keeps changing until the motivation for
changing it goes away. That motivation is, of course, the difference
between the consequence that the organism wants to experience and the
one it does experience. Each reorganization creates a control process
with certain properties, a control system that manipulates the world to
produce a specific perceived effect (like pulling a string or making a
scratching sound on the cage). If doing that results in the consequence
that is wanted, then no further reorganization takes place. Period. That
is all that is required.
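
The logic of that paragraph can be put in a few lines of simulation code. This is a sketch I am adding for illustration; the names and numbers are arbitrary and not from Powers's text. The whole "organization" is reduced to a single gain parameter, and reorganization is a blind random change of that parameter which stops only when the error driving it goes away.

```python
import random

def control_error(gain, reference=10.0, steps=50):
    """Run a simple control loop and return the final absolute error."""
    perception = 0.0
    for _ in range(steps):
        error = reference - perception
        perception += gain * error      # the environment feeds output back
    return abs(reference - perception)

random.seed(0)
gain = random.uniform(-1.0, 1.0)        # some initial organization
for _ in range(1000):
    if control_error(gain) < 0.01:      # the wanted consequence occurs:
        break                           # no further reorganization. Period.
    gain = random.uniform(-1.0, 1.0)    # error persists: reorganize at random
```

Nothing in the loop "selects" a good gain because it is good; bad gains are simply replaced, and the first gain that removes the error is the one that remains in effect.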

Reinforcement theory and the law of effect only _seem_ to offer an
explanation. Basically, they beg the question. Look again at your
description directed to Rick:

In other words, because string-pulling leads to the consequence of
affecting an intrinsic variable, it gets selected and "connected," to
use your term.

To say that string-pulling gets "selected and connected" is only to say
that string-pulling occurs again. String-pulling occurs, and has the
desired result, and occurs again. That is all you have said. All the
other words in the description hint at a mechanism, but do not say what
it is. They only assert that there must be a mechanism that gives a
particular consequence an effect that makes the same behavior tend to
occur again.

But what simpler mechanism could there be than the organization that
produced the string-pulling the first time? If the cat was organized in
such a way that it pulled the string once, what simpler explanation can
there be than saying that if it pulls the string again, the same
organization must be present? In other words, it pulled the string the
second time because its organization did NOT change.

That is the essence of the reorganization explanation. Instead of trying
to explain why a specific behavior is produced that has a specific
consequence, reorganization theory explains why behavioral organization
_changes_, and the circumstances under which it ceases to change. If the
organization of behavior comes to some final form, it is not because
anything wanted that final form to exist. It is only because every other
form failed to correct completely the error or errors driving the
process of change.

···

--------------------------------------------
All this, of course, is in addition to what I spoke about in my last
post. What is learned is an organization of behavior, not a specific
correspondence between an action and its consequences. This becomes
obvious when we bring into the picture disturbances, both additive and
in the form of changes in the feedback function, as well as changes in
reference signals. Now consistent consequences can be created only by
variable behaviors, and the only kind of organization that can produce
that effect is a perceptual control system.
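
A minimal numerical sketch of that last point (my own construction, with arbitrary numbers, not anything from the post): an integrating control system settles against a constant additive disturbance, and opposite disturbances demand different outputs even though the perceived consequence ends up the same in both cases.

```python
def settle(disturbance, reference=10.0, steps=100, slowing=0.5):
    """Let a one-level integrating control loop settle against a
    constant disturbance; return (final output, final perception)."""
    output = 0.0
    perception = 0.0
    for _ in range(steps):
        perception = output + disturbance   # feedback function plus disturbance
        error = reference - perception
        output += slowing * error           # integrating output function
    return output, perception

out_a, perc_a = settle(disturbance=+4.0)
out_b, perc_b = settle(disturbance=-4.0)
# perc_a and perc_b both settle at the reference (10.0), while out_a and
# out_b differ by about 8: consistent consequence, variable behavior.
```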
-----------------------------------------------------------------------
Best,

Bill P.

[Martin Taylor 941209 1210]

Bill Powers and Bruce Abbott (many postings)

I've been skimming through the last couple of months of postings, with some
fascination and no small degree of awe. As you may imagine, I haven't
been able to grasp the details of the various arguments and models, but
I do have one impression that might be worth sharing. Perhaps it is
totally off-the-wall and screwy, but here goes, regardless.

Bill argues, I think persuasively and independently of even the core theory
of PCT, that the situation called "reinforcement" cannot lead directly
to the strengthening of whatever it is that leads to certain actions in
certain situations. Bruce argues, I think persuasively (and has models
to show) that it LOOKS as if this is happening. How can they both be
right, as I think they are--at first glance?

My off-the-wall notion is that there is not only a difference in viewpoint
(S-R vs PCT), but also a difference between microscopic and macroscopic
views, analogous to the difference between a molecular dynamic view of
heat and a bulk view (so many calories per kg of fuel, for example).

In a perceptual control UNIT (a single ECU with one scalar perceptual signal
and one scalar output signal) there CANNOT be different acts to bring
about the same perceptual consequence. All there can be is a greater
or lesser degree of the same act. In a perceptual control HIERARCHY,
the single act of any higher unit is implemented in many different ways
at lower levels, depending on perceptual contexts that are totally
unknown to the higher unit (which has only the single scalar value to
work with). As soon as you get to multiple hierarchic levels in which
one higher output contributes to many lower reference signals, you get
the often quoted result: many means to a single end.
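
That result can be shown with a toy two-level hierarchy (again my own sketch, with made-up gains): one higher unit controls the sum of two lower perceptions by sending its single scalar output as the reference for both lower units. Different disturbances then recruit different lower-level actions toward the same higher-level end.

```python
def run(dist1, dist2, steps=400):
    """Two-level control: one higher unit, two lower units sharing its output."""
    r_high = 10.0                             # higher-level reference
    o_high = out1 = out2 = 0.0
    for _ in range(steps):
        p1 = out1 + dist1                     # lower perceptions, disturbed
        p2 = out2 + dist2
        o_high += 0.1 * (r_high - (p1 + p2))  # higher unit: one scalar output
        out1 += 0.5 * (o_high - p1)           # lower units take o_high as
        out2 += 0.5 * (o_high - p2)           # their shared reference signal
    return p1 + p2, out1, out2

high_a, a1, a2 = run(dist1=3.0, dist2=0.0)
high_b, b1, b2 = run(dist1=0.0, dist2=3.0)
# high_a and high_b both settle at 10.0, but a1 differs from b1:
# many means to a single end.
```

The higher unit never "knows" which lower action is doing the work; it works only with its one scalar value, as described above.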

Reorganization affects the interrelation between levels (ignoring for now
any changes that might occur in the perceptual or output functions). In
that sense, learning consists of finding ANY way to act at lower levels so
as to make the feedback loop of the higher level have a negative loop gain.
There may be many ways that lower levels could be connected so that this
desired end--perceptual control at the high level--is achieved, but only
one way needs to be found; Hans Blom finds a satisfactory bicycle route,
though later it is found not to be the best, or the cat finds a pattern
of movement that brushes this and pushes that, though many of these acts
are not actually needed. The initially successful reorganization provides
ONE way to bring the upper-level feedback loop into a state of negative
loop gain. One is all that is needed; as far as intrinsic error dependent
on that perception is concerned, reorganization should stop.

This is fine, so long as the environment is sufficiently stable that this
ONE kind of lower-level action always works. Environments are not usually
so kind. In the cat's puzzle box, though, the environment is usually
particularly stable, so the visible actions become pretty stereotyped. It
LOOKS as if the "reinforcer" is "increasing the strength" of the "actions
in response to the stimulus." But it isn't, when viewed microscopically.

Microscopically, the cat is "doing" many things besides "escaping from the
box" (which it may not be "doing" at all, in the PCT sense, having perhaps
no perception of "escape"). Microscopically, there are many uncontrolled
perceptions, and so long as the cat does not escape and "get the reinforcer"
(reduce the intrinsic error), reorganization continues. Actions that
occur while the cat is not escaping may well, randomly, be disconnected
from the output of the higher-level ECU perceiving, say, hunger. Actions
that occur in conjunction with escaping and getting the food will have
less likelihood of being disconnected, because reorganization stops shortly
thereafter. While the cat is not escaping, new acts are being randomly
reconnected and disconnected, so that after a few successful escapes,
the ones left connected are those that have occurred in conjunction with
prior escapes, and others that occurred without escape on previous trials
have a good chance of having been disconnected. Proportionately, the
escape-adjacent acts form a larger part of all the acts performed by the
cat in the box, and it LOOKS as if they have been "strengthened." And so,
in a macroscopic view, they have. In a microscopic view, the other possible
lower-level references have been disconnected from that particular higher-level
control's output.
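
A toy version of that microscopic story (my model and numbers, not Martin's): while the cat fails to escape, lower-level acts are randomly connected and disconnected; the moment escape reduces the intrinsic error, reorganization stops, and whatever acts happened to be connected then simply persist.

```python
import random

random.seed(2)
ACTS = list(range(20))
ESCAPE_ACT = 0                                # hypothetical act that opens the box
connected = set(random.sample(ACTS[1:], 8))   # start unable to escape

failed_trials = 0
while ESCAPE_ACT not in connected:            # intrinsic error persists
    connected ^= {random.choice(ACTS)}        # random connection/disconnection
    failed_trials += 1

frozen = frozenset(connected)                 # escape: reorganization stops here
# Later trials repeat exactly the acts in `frozen`. Macroscopically they
# LOOK "strengthened"; microscopically, the alternative connections simply
# stopped being swapped in and out.
```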

There is a certain quantity of heat available in a litre of fuel, whether
you look at it as a component of the fuel ("caloric", wasn't it?) or as
the potential to make atoms and molecules vibrate. But you get a much
wider range of applicability of the ideas if you take the microscopic
view. Lots of times, the caloric view gives the wrong answer, as compared
to the few times it gives the right answer. Likewise with "reinforcement."

As I said, maybe this is totally screwy. If so, I plead jet-lag. If not,
I'll claim responsibility later.

Anyway, the whole discussion has been fascinating to skim through.

Martin