Taylor on PCT and SR, and more

[From Bill Powers (940409.0600 MDT)]

Martin Taylor (940408.1110) --

RE: not the same p

To see the problem, remember that f(r-p) might be (and often
is) an integrator, perhaps a leaky one, but nevertheless a
function that depends on past values of p as well as present
(almost--it can never be really the present) values. The
function f is a time function. For that matter, so is g(o),
which I take to be the feedback function, the way you used it.

You're confusing three meanings of "not the same p." One is that in
a real system, the p that results from o + d occurs at a slightly
later time, so p(t) = o(t-tau) + d(t-tau). Another is that due to
our iterative computer program, an additional delay dt is
introduced, where dt is the physical time represented by one
iteration: thus

p(t) = o(t - dt - tau1) + d(t - dt - tau2)

And the third, the one of which you speak, is that the total value
of p may be a time integral accumulated over the entire time of the
run:

p(t) = INTEGRAL[o(t - dt - tau1) + d(t - dt - tau2)].

However, what Rick wrote was not incorrect, either, in its use of p:

p = o + d

o = f(r-p)

When you substitute the p in the first equation into the second
equation, it is literally the same p (at the same place and time)
that is represented in both equations -- otherwise the substitution
would be invalid:

p(t) = o(t-dt - tau1) + d(t-dt-tau2)

o(t+dt) = f(r - p(t))

where the function f is whatever defines the output function. If you
have an integrator being simulated digitally, the second equation is

o(t+dt) = o(t) + f(r - p(t))

An integration does not give past values of the error an effect on
the present value of the output. It simply adds the present effect
of the error to the present value of the output to create a new
value of the output. The observer must use a consistent reference-
point for time.
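
A minimal sketch of the digital integration described above, in Python. It is an illustration only: the transport lags tau1 and tau2 are set to zero, the output function is assumed to be a simple gain on the error, and the names simulate_integrator_loop, gain and history are hypothetical, not taken from any program mentioned in this thread.

```python
def simulate_integrator_loop(disturbance, r=0.0, gain=0.5):
    """Iterate p(t) = o(t) + d(t), then o(t+dt) = o(t) + gain * (r - p(t))."""
    o = 0.0
    history = []
    for d in disturbance:
        p = o + d                  # perception built from the output already present
        history.append((o, p))
        o = o + gain * (r - p)     # new output = present output + present error effect
    return history


# A constant disturbance of 1.0 held for 20 iterations (assumed test input).
steps = simulate_integrator_loop([1.0] * 20)
final_o, final_p = steps[-1]
```

Each pass through the loop reads only the present p and the present o. The accumulated history is carried in o itself, so no past value of p is looked up when the new output is computed.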

---------------------------------------------------
RE: why fuss about information in perception about disturbance

(1) a fear that if it happens to be true that information about
the disturbance appears in perception, then FROM A PROPAGANDA
POINT OF VIEW, PCT becomes harder to distinguish from S-R
approaches to psychology.

In S-R psychology, a stimulus is a variable that can be manipulated
independently of the behavior of an organism. Outside psychophysics,
the question of how a stimulus affects the nervous system never
comes up: a stimulus is defined sufficiently if manipulating the
physical situation embodying it causes a correlated change in
behavior.

The PCT definition of disturbance was chosen in part to provide a
mapping of PCT onto S-R phenomena. While there is no counterpart of
a controlled variable in S-R theory, there is a counterpart of the
disturbance: it is what is called the stimulus. In S-R psychology it
is tacitly assumed that if there is an observed effect of a stimulus
in producing a response, the manipulated variable must have directly
affected the senses of the organism, causing neural activity which
in turn caused the response. Counter to that conception, PCT offers
the explanation that in general a stimulus is simply a physical
variable linked in some way, through physical laws, to a variable
that an organism is controlling. Manipulating the stimulus tends to
alter the controlled variable, and the action observed as a response
has the effect of applying an equal and opposite influence to the
same controlled variable. So the action of the system is actually
_preventing_ the so-called stimulus from having any significant
effect on what the organism is sensing and controlling. Furthermore,
any manipulation of the environment that produced the same tendency
to change the controlled variable in the same way would result in
the same action preventing the change.

At the time you and your colleagues entered this discussion, we had
defined "disturbance" in one particular way: a disturbance is an
independent physical variable distinct from the controlled variable,
linked to the controlled variable through some function expressing
physical laws. For example, the brightness of a light bulb expressed
in foot-candles or watts disturbs the controlled illumination of the
retina through an inverse-square law. The disturbance is measured in
units of brightness or wattage of the light bulb. This made
"disturbance" equivalent to "stimulus."

You and your colleagues, however, immediately substituted another
definition: a disturbance is a change in the controlled variable.
All of your arguments were designed to prove that given a change in
the controlled variable, information about that change would appear
in the perceptual signal. I attempted to say on many occasions that
this is not what we meant by a disturbance; that perceptions
contained no information about disturbances as we used the term, and
by their nature could not. A control system had no way of knowing
what physical processes lay between a cause of a disturbance and its
ultimate effect as a change in the controlled variable. It had no
way of knowing how many disturbing causes were acting at the same
time, or through what individual paths. This was important in
differentiating PCT from S-R theory, because if what the organism
was actually reacting to, a change in its own controlled variable,
could not be traced to any unique event in its environment, it made
no sense to say that behavior was "caused" by "stimuli." To use a
different definition of disturbance would only tend to suggest the
opposite of what we were trying to explain.

But you brushed our definition aside, pointing out that the actual
cause of "the disturbance" didn't matter, and that we could speak of
the net _effect_ of the disturbance without mentioning causes -- and
indeed, call these net effects the disturbance. That rendered our
argument irrelevant to S-R theory because it completely ignored what
S-R theory calls a stimulus. So we objected.

This objection did us no good, however, because you had reduced the
problem to one with which you were familiar and were determined that
those were the terms in which everyone would have to deal with the
question of information in the perception about the disturbance. I
attempted many times to point out that you had trivialized the
problem by changing a basic and important definition, but you simply
insisted that your definition was the right one. Even when you said
that OF COURSE you didn't believe that the perception contained
information about the CAUSE of the disturbance, you were insisting
on your definition, in which the disturbance is the EFFECT on the
controlled variable, and rejecting ours, in which the disturbance is
precisely the cause.

In desperation, and not really realizing what you were doing, I then
said OK, let's call the disturbance the net influence acting
proximally on the controlled variable, and forget about distal
causes. The perceptual signal _still_ doesn't contain information
about the disturbance, even the proximal disturbance. The reason is
that the controlled variable is something being affected by that
proximal cause through some physical law, as the position of an
object is affected by forces through laws of inertia. The situation
is exactly as if we were talking about a distal variable and the
laws through which its state could affect the state of the
controlled variable. The "proximal variable" or "net force" argument
was just a smoke-screen.

This argument got nowhere with you. I could not get across that if
the perceptual signal represented the _position_ of an object, it
did not also represent the _mass_ of the object or the _force_
applied to that object. It represented only the position, the
controlled variable. If the perceptual signal did not represent the
net force at all, how could there be information in the perceptual
signal about the net force? The same "information" would appear for
a large force applied to a large mass, or a small force applied to a
small mass.
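
The force-and-mass point can be checked with a short Python sketch. It assumes a body starting at rest under a constant net force and a simple step-by-step integration; position_trace and the particular numbers are hypothetical choices for illustration.

```python
def position_trace(force, mass, dt=0.01, steps=200):
    """Position of a body starting at rest under a constant net force."""
    x, v = 0.0, 0.0
    trace = []
    for _ in range(steps):
        v += (force / mass) * dt   # acceleration depends only on the ratio force/mass
        x += v * dt
        trace.append(x)
    return trace


heavy = position_trace(force=100.0, mass=50.0)   # large force, large mass
light = position_trace(force=2.0, mass=1.0)      # small force, small mass
other = position_trace(force=3.0, mass=1.0)      # a different ratio, for contrast
```

Here heavy and light are the same list of positions, so a signal that represents position alone cannot tell the two cases apart.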

In fact, I tried to show, the ONLY information that exists in the
perceptual signal is information about the controlled variable
itself: the single dimension of variation of the environment that is
under control. Of course if THAT is what the information in the
perception is about, the controlled variable itself, then there is
nothing to discuss. Nobody would claim that the perceptual signal
did not represent the controlled variable and do so quite
accurately.

This argument had absolutely no effect. It was as if I hadn't
uttered a word. It was then that I began to lean toward Rick's point
of view: it simply made no sense to you that a control system could
control a variable without knowing ANYTHING about what was causing
that variable to change. Somehow, some usable information had to be
getting into the control system from outside it that would tell the
control system what it had to do.

Yet this is the fundamental theorem of PCT: a basic control system
can control an environmental variable by means that are completely
independent of what is causing that variable to change. The
operation of a control system is completely explained by knowing the
properties of the functions in the loop and how they are connected.
A control system is defined by functions and connections, not by the
behavior of signals or disturbances. It is totally indifferent to
the causes of changes in the controlled variable, to their
waveforms, spectral distributions, or probability distributions. It
makes no adjustments at all based on changes in those properties of
a disturbance. All of its behavior is based on the behavior of the
controlled variable, and on nothing else. And the WAY it behaves is
determined by the fixed physical properties of its components.

If a known disturbance is acting through a known path, we can
calculate its correlation with the controlled variable. There will
be some correlation because control is imperfect. But the better the
control, the lower the correlation will be, not because of
information flow or any such explanation, but because the best
control system can sense the smallest deviations of its controlled
variable from the reference level and act to keep them from getting
larger -- using ONLY the information in the controlled variable. It
does not matter why the controlled variable changes, because the
system itself senses that variable and acts directly on it. Nothing
else is required.
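
The relation between quality of control and correlation can be sketched in Python. This is an illustration under assumptions, not the model used in the experiments discussed here: the loop is p = o + d with an integrating output, the disturbance is an arbitrary sum of two sine waves, and the names run_loop and pearson and the two gains are hypothetical.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def run_loop(k, dt=0.01, steps=10000, r=0.0):
    """Return (disturbance, perception) for an integrating loop with gain k."""
    o = 0.0
    ds, ps = [], []
    for i in range(steps):
        t = i * dt
        # Assumed disturbance waveform: two slow sine components.
        d = math.sin(2 * math.pi * 0.1 * t) + 0.5 * math.sin(2 * math.pi * 0.23 * t + 1.0)
        p = o + d
        ds.append(d)
        ps.append(p)
        o += k * (r - p) * dt      # the loop acts only on the sensed variable
    return ds, ps


corr_weak = pearson(*run_loop(k=0.1))     # poor control: p follows d closely
corr_strong = pearson(*run_loop(k=50.0))  # good control: p stays near r
```

With these settings the weak loop leaves p highly correlated with d, while the strong loop leaves very little correlation, and at no point does the loop use anything except p.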

A control system can operate successfully in an environment where a
hundred simultaneous causes are acting on its controlled variable in
completely unpredictable ways, through a thousand unknown physical
linkages. When we introduce a large known disturbance into this
process, we see some correlation with the behavior of the system,
but in general this correlation will be low. It will be low because
we don't have any _a priori_ way of knowing what the system is
controlling, or relative to what reference level, so we have no way
of knowing what our "stimuli" are disturbing or what side-effect of
the action we see is actually critical for maintaining control. The
most important thing to understand about a control system is that
only the controlled variable matters to it. Our external
manipulations are, in themselves, of no concern at all, and of all
the effects of the actions of the system that we observe, only those
that help to control the controlled variable are of any concern to
the controlling system.

Beyond this I don't know what to say. I think that to characterize
our arguments as propaganda intended to emphasize differences with
S-R theory that really don't exist is not only tactless, but
indicative of failure to understand why PCT is constructed as it is.
Not that all of us have any complaints coming when the subject is
tactlessness.

Your arguments about information theory would be more interesting if
they were based on work you had actually done. But most of what you
say is based on intuitions, which in turn seem to be guided by
principles at variance with PCT. It's simply not very convincing to
be told that such-and-such an analysis COULD be done, or that such-
and-such a result WOULD be found -- if the work of deriving it were
actually done. Your track record with respect to predictions so far
is not exactly outstanding. This could be taken as evidence that
there is something wrong with the world-view that is guiding your
deductions. For it is clearly a world-view, and not analysis of real
data, on which your predictions have been based.
----------------------------
RE: nonlinear systems

Bill P and others know very well how it works from a signal
processing viewpoint, but that viewpoint runs into difficulties
with non-linear systems. Simulations and demonstrations
provide much insight into how these more difficult systems
actually work, and under what conditions they run into
problems, but these insights are hard to derive or to represent
analytically.

The Little Man contains some marked nonlinearities; in "Spadework" I
developed an analysis for handling quasistatic nonlinear systems,
and also demonstrated a model matching real behavior when the
external feedback function was a cubic with a reversal of slope in
the middle. I have not exactly ignored the subject of
nonlinearities.

As to "analytic" representations, they are generally not as
informative as simulations in real situations. What usually happens
is that the analysis simply can't be done using the properties of
the actual system; nobody knows how to solve the equations. So what
is done is to propose another system that bears some resemblance to
the real one, but which can be handled analytically. The imaginary
system is analyzed, and its properties are then assumed to apply to
the real system. The room for error is quite large, at least as
large as it is in a simulation that can use the actual relationships
found in the real system without having to linearize or otherwise
idealize them. If you want the truth about how a system behaves,
simulate it.
-----------------------------------

Information analysis provides a different viewpoint that in no
way conflicts with the signal-processing viewpoint, and that
can be used for complex hierarchies of nonlinear control
systems.

When do we get to see this actually demonstrated?

It asks and answers questions about the systems that may be a
little different from those asked and answered by simulation or
signal-processing analysis.

I haven't seen it answer any questions yet.

But it should never provide a different answer when the two
approaches deal with the same question.

Maybe it shouldn't, but how do you know it doesn't?

In spite of this, informational analysis has been repeatedly
cast into an antagonistic role, for reasons that I suspect have
to do with issues (1) and (2).

Since your protests are based on what IT could hypothetically do,
and not on demonstration that the antagonisms are unjustified, they
lose some of their force.

I have explained why issue (1) arises. It arises from your
insistence on using a non-PCT definition of the disturbance.

Issue (2) arises for the same reason as well as others. The only way
to reconstruct the disturbance from the reference signal is to use
an analytical model and know everything about the system and its
environment except the value of one variable. That is not only a
trivial demonstration, but it makes no use whatsoever of information
theory.

My intent was to demonstrate that if the perceptual signal
passed ALL the information about the disturbance, as
demonstrated by the fact that it could be used to replicate the
disturbance, then it was certain that SOME of the information
about the disturbance was passed through the perceptual signal.

How can you say that a signal representing the difference between
two variables passes "ALL" the information about either of them? If
the two variables are always equal, it passes no information at all
about them, because it is always zero.

And my objection was that you used a great deal more information
than could possibly have been passed into the perceptual signal:
namely, the form of the function connecting the disturbance to the
controlled variable, the physical properties of the controlled
variable that turn proximal effects of the disturbance into changes
in the controlled quantity, and the forms of all the functions in
the control system as well as the setting of the reference signal.
It was all this information, in addition to knowing the perceptual
signal, that enabled you to reconstruct the disturbing variable. If
even one part of that other information had been missing, knowing
the behavior of the perceptual signal would have done you no good at
all. And even that reconstruction was posited on knowing that only
one disturbance was acting.
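
A small Python sketch of what such a reconstruction needs. It assumes the simplest loop, p = o + d with an integrating output of known gain and a known reference; run_loop, reconstruct_disturbance and the sample values are hypothetical and only illustrate the argument.

```python
def run_loop(disturbance, gain, r=0.0):
    """Perceptual signal for p = o + d, o(t+1) = o(t) + gain * (r - p(t))."""
    o, ps = 0.0, []
    for d in disturbance:
        p = o + d
        ps.append(p)
        o += gain * (r - p)
    return ps


def reconstruct_disturbance(ps, assumed_gain, r=0.0):
    """Replay an assumed output function on p to get o, then take d = p - o."""
    o, ds = 0.0, []
    for p in ps:
        ds.append(p - o)
        o += assumed_gain * (r - p)
    return ds


true_d = [0.0, 1.0, 1.0, 2.0, -1.0, 0.5, 0.5, 3.0]
ps = run_loop(true_d, gain=0.5)

good = reconstruct_disturbance(ps, assumed_gain=0.5)   # everything else known
bad = reconstruct_disturbance(ps, assumed_gain=0.2)    # one property wrong

worst_good = max(abs(a - b) for a, b in zip(good, true_d))
worst_bad = max(abs(a - b) for a, b in zip(bad, true_d))
```

The same perceptual record yields the true disturbance only when the output function, the reference setting and the additive form of the environment are all supplied from outside; change one assumed property and the reconstruction goes wrong.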

Furthermore, you made no use whatsoever of "Information" in the
perceptual signal: i.e., the calculated reduction in uncertainty in
d given p. You have not established that that special mathematical
definition of the term "Information" has anything to do with the
analytical solution of the system equations. You could just as well
call the results of this calculation a measure of "flenishness."
When we did this calculation with real data, we got a number like
1.2. What does "1.2" tell us about the nature or behavior of the
disturbance? Nothing. Yet that was the amount of flenishness in the
first derivative of the perception "about" the first derivative of
the disturbance. Even "correlation" would have a clearer meaning --
and we know that the existence of a correlation is not what makes a
control system work. In fact, the greater the correlation of p with
d, the worse the control system is working. That's actually true of
flenishness, too. So how come the control system works worse when it
gets more "Information" about the disturbance?

I'm sorry, Martin, but so far all I have seen from information
theory is a mathematical game that starts in the middle of the air
and ends in the same place. I can learn to play that game on an
elementary level, but having played it I find that I don't know any
more about control systems than I did before. The only real
application I have been able to imagine, studying the difference
between the expected behavior of a model and the actual behavior of
a person, you have rejected. You won't settle for less than having
information theory explain the whole thing, which as far as I can
see it can't do. Maybe your only choice, other than giving up on
information theory, is to do what you claim can be done.
-------------------------------------
RE: Big picture

If you then add an integrating output function, still with no
delays, the prediction comes to within about 5 percent of the
actual behavior, so now we have accounted for 95% of what we
observe.

As soon as you have put in the integrating function, you HAVE
the delay, and the algebraic equations that ignore time are no
longer valid.

An integrator is not a transport lag: the value of the integral
begins to rise the instant the input becomes nonzero, with no delay.
The output of a transport lag only begins to rise a fixed time after
the input begins to rise, and all frequencies are passed through
with equal amplitude. A pure integrator does not create "infinite
delay." It just makes the rate of change of output a constant for a
constant input.
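
The difference between the two elements can be seen in a discrete Python sketch. It is only an approximation of the continuous case: "the instant" becomes "the same iteration", and the function names and the step input are hypothetical.

```python
def integrator_response(inputs, dt=1.0):
    """Running integral: the output changes as soon as the input is nonzero."""
    total, out = 0.0, []
    for x in inputs:
        total += x * dt
        out.append(total)
    return out


def transport_lag_response(inputs, lag_steps):
    """Pure delay: the output is the input as it was lag_steps iterations ago."""
    return [inputs[i - lag_steps] if i >= lag_steps else 0.0
            for i in range(len(inputs))]


step = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]   # input becomes nonzero at index 2
integrated = integrator_response(step)
lagged = transport_lag_response(step, lag_steps=3)
```

The integrator output starts rising at index 2, the same iteration in which the input steps, and then changes at a constant rate. The lag output stays at zero until index 5 and then reproduces the input at full amplitude.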

The algebraic equations remain valid as the limiting case for a
zero-frequency or constant disturbance. They are the steady-state
solutions of the system's differential equations.

In the expression that you deride as "having fun" because it
uses convolution instead of algebraic multiplication, the
function f(t) is unity for all t greater than zero.

How was I to know you were using a trick convolution? In general,
the values of f(tau) in a convolution g(t-tau)*f(tau) start large
and fall off to zero after some finite value of tau. The "smear"
depends on the form of f(tau). The form of convolution you suggest
is just a pure integrator, anyway (the integral of an impulse is a
step function).

Putting in a single time-lag raises that to perhaps 97%, and at
that point I tend to start losing interest.

That's equivalent to setting f(t) = 1 for t>(time lag),
removing the influence of more recent perceptual signals.

Nonsense. Look at the program. The output of the time-lag is simply
the input d iterations ago. This is an f(tau) transfer function with
a single '1' at a certain delay, and zeros elsewhere. This is a pure
lag: the output follows the input as of tau iterations in the past.
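
In convolution terms, the two transfer functions under discussion can be sketched in Python as follows. The convolve helper, the sample signal and the two-iteration delay are hypothetical; this is not the program referred to above.

```python
def convolve(signal, kernel):
    """Causal discrete convolution: out[t] = sum of kernel[tau] * signal[t - tau]."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for tau, weight in enumerate(kernel):
            if t - tau >= 0:
                acc += weight * signal[t - tau]
        out.append(acc)
    return out


signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
lag_kernel = [0.0, 0.0, 1.0]              # a single 1 at a delay of two iterations
integrator_kernel = [1.0] * len(signal)   # unity for every tau from 0 upward

lagged = convolve(signal, lag_kernel)
integrated = convolve(signal, integrator_kernel)
lag_then_integrate = convolve(lagged, integrator_kernel)
```

The lag kernel simply shifts the input by two iterations. Feeding the lagged signal to the integrator gives the same running sum as before, two iterations later, so every input value still contributes once it has passed through the lag.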

You say that removing these values improves the prediction by a
factor of two or thereabouts.

I'm not removing these values; the output of the lag still gets
integrated by the output function. The output is simply the integral
of a delayed value of the input.

That's fairly dramatic evidence that the p on the two sides of
the equation really MUST be different, if getting rid of recent
values improves prediction so much.

It doesn't. It improves the prediction by about 2 or 3 percent. It
reduces the _error_ in the prediction by close to half, but that
isn't what matters. What matters is that the modeled handle and
controlled variables, which were already behaving almost exactly
like the real ones, now behave a little more like them -- not that
you'd notice the difference. That's the BIG picture.

(b) Whether a time delay is important depends on how fast the
disturbance changes, as you yourself have often pointed out.

No, that isn't what I pointed out. It depends on the value of the
integration factor in the loop. The optimum value of integration
factor is set so that a step-disturbance is exactly corrected in one
time-delay around the loop. This has nothing to do with the waveform
of the disturbance. If you make the integration factor more than
twice the optimum, the control system will spontaneously oscillate.
Between optimum and maximum, the effect will be to create overshoots
for any sudden changes of disturbance (or anything). If you give it
too small a value, the effect will be to make control more sluggish than it
needs to be. These properties exist regardless of the waveform or
bandwidth of the disturbance. The stability of the loop doesn't
depend on what disturbance is applied to it, in a linear system. In
a nonlinear system, the parameters are adjusted for stability in the
worst case, and then the waveform of the disturbance STILL doesn't
make any difference in the stability of the system.
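
These regimes show up in a very small Python sketch. It assumes a linear loop p = o + d in which the only delay is one iteration around the loop, so the optimum integration factor is 1; step_response and the particular factors are hypothetical choices for illustration.

```python
def step_response(k, steps=12, r=0.0, d=1.0):
    """Perceptual signal after a step disturbance, with one iteration of loop delay."""
    o, ps = 0.0, []
    for _ in range(steps):
        p = o + d
        ps.append(p)
        o += k * (r - p)   # integration factor k applied to the present error
    return ps


optimum = step_response(k=1.0)     # corrected in exactly one iteration
overshoot = step_response(k=1.5)   # between optimum and twice optimum
unstable = step_response(k=2.5)    # more than twice optimum
sluggish = step_response(k=0.2)    # too small a factor
```

In this sketch the error is multiplied by (1 - k) on every iteration, whatever the disturbance waveform: it vanishes at k = 1, alternates in sign while shrinking for k between 1 and 2, alternates while growing for k above 2, and decays slowly without overshoot for small k.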

And you can't get much more smeared in time than a single
integrator, so it is unlikely that you will make much
difference to the result by adding other smearing functions--
though it might matter where in the loop you put them, before or
after the comparator.

Try putting two integrators into the control system, and see if it
doesn't make much difference. You'll create a nice sine-wave
oscillator. I'm afraid that "smearing" is not an adequate term for
describing the effects of integrators or time lags.
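
A Python sketch of the two-integrator case, under stated assumptions: the loop is p = o + d with no damping or leakage, the disturbance is a unit step, and the second integrator is updated with the freshly updated first one (a semi-implicit Euler step, chosen here so that the numerical solution neither grows nor decays on its own). The function name and parameter values are hypothetical.

```python
def double_integrator_loop(k=1.0, dt=0.01, steps=10000, r=0.0, d=1.0):
    """Loop whose output function is two integrators in series, with no damping."""
    rate, o, ps = 0.0, 0.0, []
    for _ in range(steps):
        p = o + d
        ps.append(p)
        rate += k * (r - p) * dt   # first integrator accumulates the error
        o += rate * dt             # second integrator accumulates the first
    return ps


ps = double_integrator_loop()
zero_crossings = sum(1 for a, b in zip(ps, ps[1:]) if a * b < 0)
late_peak = max(abs(p) for p in ps[-2500:])
```

Instead of settling at the reference, the perceptual signal keeps swinging through it with undiminished amplitude: the step disturbance is never corrected, only converted into a sustained oscillation.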

You might improve prediction somewhat by reducing the effective
average delay, altering the pure integrator to something that
took less account of long-past perceptual signals, such as a
leaky integrator with a reasonable time-constant.

Might gain another 1/2 percent. I've tried this and couldn't see
much difference (transport lag plus integration constant plus
leakage constant -- three parameters to optimize). How do _you_ know
it would make any difference?

When you plot the maximum error full-scale, you can no longer
judge whether it's an important amount of error or
insignificant.

But evolution can.

Who the &&$%@#? cares what evolution "can" do? God could do it too.
The point is, YOU can't do it. So plotting errors without any scale
reference is a waste of time. You could be staying awake nights over
an error that is 0.1% of the reference signal.

One aspect of notation that is very misleading is its
complexity. A complex expression can be more wrong than a
simple one, if you've chosen the simple one carefully.

True, the key word being "carefully," and with the addition of
"keeping always in mind the effects of the simplification you
have chosen."

Yes, this means doing it the hard way first, satisfying yourself
that it's a waste of time, and then finding a simplification that
still tells most of the truth. That's how I do it. But I admit,
sometimes I pick the simple way if it gives the right answers, when
I can't understand the hard way.

Yah' eh t'eh to Ben Whorf.

'Fraid I missed this one.

Yah' eh t'eh just means "Hi." In Navajo -- didn't you say Whorf
worked with the Navajo?

That's enough for a day, especially since there was a power dropout
and I lost most of the first try (SAVE, dummy!).
---------------------------------------------------------------
Best,

Bill P.

<Martin Taylor 940411 11:11> Armistice time, if not day.

Bill Powers (940409.0600 MDT)

I must say that Bill's 520-line posting took me aback more than somewhat.
Over and over, I found myself saying "Why on earth did he say that?
What's he going on about?" I do not see myself or my postings reflected
in a great deal of the stuff that he includes (of which Rick says "I love
it when you get angry" or some such). I have to conclude that Bill is angry
about something to which he has attached me. Since I don't know what that
is, I can't respond properly.

But I'm not angry. Just disappointed. And I am puzzled that very often
when I post something that supports one of Bill's statements but from a
different viewpoint, Bill responds with a strong contradiction of what
I say. This particular response seems to have been occasioned by my
support of Bill's point that the way we notate the mathematics of a process
has a strong effect on the way we think about the process. It's a point
Bill has often made, and one I think is at the heart of many apparent
conflicts in science. So why this long and somewhat intemperate posting?

========================
Part 1: On whether one can legitimately treat as the same the notation "p"
in the two equations:

p = o + d
o = f(r-p)

You can, of course, if your objective is only to make valid the substitution
of the first p in the second equation, but this is only playing games with
mathematics--a pastime Bill suggests I play.

The output is not a function of the current value of the perceptual signal
and the reference signal, except in the special case that it is so defined.
If the output function is an integrator, it can take ANY value, given only
the current values of p and r. But in the first equation, p is the current
value of the perceptual signal. So, if you really want to write the
equations so that "p" is the same in both, and nevertheless have the
equations represent a control system, you have to change "f." You have
to write

o = f((r-p current), (r-p over all past r and p))

Then the question is how important is current "p" in determining the value
of "o". And the answer, if there is ANY transport lag, is "not at all." All
of the current value of o is determined by past values of r and p.

Maybe it would be easier if I repeat with graphs what I was trying to say.

<Martin Taylor 940408 17:50> following <Bill Powers (940408.0945 MDT)>

The point I originally tried to make <Martin Taylor 940407 19:15>
was that what happens at any function in the loop at any moment depends
not only on the current input to that function (which is always a past
output of some other function) but also possibly on past values of its
input. The operation is always a convolution. In particular, if the
output is an integrator, ALL past inputs to it contribute equally to the
current value of its output.

A proper notation involves the operation of convolution at each stage,
showing how the effects are distributed over time:

Not o = Ge, but o(t) = G(tg) X e(t-tg) where "X" represents the convolution
  operator. (I apologise for the bastard notation, but I want to show the
  "tg" time reversal of the convolution explicitly, and putting in integral
  signs is too clumsy even for this posting.)

Bill complained about this, and stated:

The simple algebraic equations (which actually represent steady-state
solutions of differential equations) give a prediction of real
behavior that is within about 10 percent of the actual behavior,
when the constants are adjusted.

Graphically, the contribution of p(t) to the output function in this case
is an impulse function:

contribution          past perceptual     |  future perceptual
of perceptual       signals are not used  |  signals are not used
signal to present                         |
output signal      _______________________|______________________
                      past perc        present      future perc
                                         perc
                             Time--->

This way of looking at it assumes that the output is affected by the
present perceptual signal and only the present perceptual signal (with,
of course, the present reference signal, which I ignore throughout this
exercise because the same time functions apply to it as to the perceptual
signal if there is a simple subtracting comparator to generate the error
signal).

This arrangement, according to Bill P, accounts for "90 percent of the
problem." (I'm not sure whether this is 90% of the variance or correlation
of 90% between handle and model).

The next stage, according to Bill is

If you then add an integrating output function, still with no
delays, the prediction comes to within about 5 percent of the actual
behavior, so now we have accounted for 95% of what we observe.

Now the graph becomes:

                   -----------------------
contribution                              |
of perceptual         past perceptual     |  future perceptual
signal to present   signals are all used  |  signals are not used
output signal                             |
                   _______________________|______________________
                   past                present            future

This equalization of contributions from all past values of the perceptual
signal is better than taking only the present value. It halves the error.

But the error can be halved again, nearly, by ignoring the present value
and very recent values of the perceptual signal:

Putting in a single time-lag raises that to perhaps 97%, and at that
point I tend to start losing interest.

                   ----------------------- ///////////  future perceptual
contribution          past perceptual     |///////////  signals are not used
of perceptual       signals older than    |///////////  and neither are
signal to present   the lag time are      |///////<-----recent ones up to
output signal       used in computing "o" |///////////  the lag time
                   _______________________|___________.__________________
                   past                            present        future

I don't think I've been seduced into believing anything. I've taken
lags into consideration and have decided that they make very little
difference in the way real control systems work. The real systems
are so designed that they don't.

The real systems are so designed that they allow the organism using them
to propagate its genes. No more, no less. What this means is that if it
is useful to control a perceptual signal, evolution has provided us with
lags short enough that we CAN control against most disturbances that we
will encounter to that signal. If lags are too long, the control system
is likely to be unstable, and it will not be able in any case to control
against fast-changing disturbances.

Anyway, what all this says is that for REAL control systems, and for models
that provide results that fit their actions well, you cannot use the
simplification of pretending that

p = o + d
o = f(r-p)

describes the relationships of the signals around the loop if p is the
same in both equations and f is a point-time (impulse) function. You have
to treat the values as time-varying waveforms, and allow for convolution
with non-impulse functions (which can include time-lags).

======================
On RE: why fuss about information in perception about disturbance

I tried to put forward three possibilities as to why it causes such
consternation whenever I try to develop the constructs of information
theory in control systems. And it does. I have never been able to get
to the stage of really illustrating the development, because always I
am stopped as soon as I describe the initial concepts. It has seemed
to me pointless to go to stage 2 when stage 1 is still in dispute. That
is why I have been engaging Bill P. in discussion on the topic, rather
than doing it publicly. It is much easier to develop something that
appears complex in cooperation with one person than it is within a
maelstrom of conflicting misunderstandings. But even with one person
success in arriving at a common viewpoint is not assured.

Anyway, I am astounded that you could interpret my posting in such a way
as to generate the following:

to characterize
our arguments as propaganda intended to emphasize differences with
S-R theory that really don't exist is not only tactless, but
indicative of failure to understand why PCT is constructed as it is.

I said that I had seen in recent postings, mainly by Rick:

(1) a fear that if it happens to be true that information about the
disturbance appears in perception, then FROM A PROPAGANDA POINT OF VIEW,
PCT becomes harder to distinguish from S-R approaches to psychology.
Since most psychologists subscribe wholly or partially to the idea that
stimulus (perhaps via cognition) determines response, any such muddling
would be harmful to the cause of getting people to understand and use
PCT. I am most sympathetic to this issue, because I do think it important
that people see the difference. At the same time, the PCT I would like
to propagate is a technically clean version, not one that has been
smudged in areas that are awkward from a propaganda view.

That is in no way close to suggesting that you emphasize differences that
don't really exist. Quite the opposite. It emphasizes the importance
of helping people to see the differences that are real.

The point with which I am in sympathy is that people who don't understand
PCT might be inclined not to see the differences that do exist, if they
take a superficial look at information analyses. Consider--even after
all this time, Rick can't see the difference between my approach to
control using information theory and an S-R approach. If that is so,
then how much more difficult would it be for someone NOT acquainted with
control theory?

I think it is very important for psychologists to understand the differences
between PCT and conventional views. There are enough so-called "feedback"
theories that make the essential differences hard to see, and I have no
wish to contribute to that confusion from another direction. I can see that
developing what I think to be a correct information-theoretic approach
might lead to muddying the very real distinctions, and I appreciate what
I see as a legitimate fear.

That's VERY different from what you seem to have been responding to.

On the word "disturbance."

You spend a long time castigating me for not using "disturbance" in a
specific way. But the way I use it coincides with the wordings that
we worked out over a long discussion period. It is the output of
the disturbance function in your diagram of 940405.1400.

This is a very common initial misunderstanding about what we mean by
"the disturbance." Here is a diagram:

                      ref signal
                          |
                --->|COMPARATOR|--->
               ^                    |
             input               output
            function            function
               ^                    |
               |      feedback      v
          controlled<------------ output
           variable   function   variable
               ^
               |
          disturbance
            function
               ^
                <-----------disturbing variable

As we agreed, there is no reason to confound the terms "disturbance"
and "disturbing variable." The disturbing variable is out in the
wilderness somewhere, unobservable by the control system. The
"disturbance" is the effect of the disturbing variable on the CEV.
No more, no less. And that may not be the way you used the term before
I joined CSG-L, but you agreed that this usage made a valuable distinction.

Now (940409.0600) you are annoyed because I don't use "disturbance" as
a synonym for "disturbing variable." You spend pages on why now it is
a good thing to have two words for one unimportant aspect of the situation
and no word for an important aspect. And the source of your annoyance seems
to be that our prior discussion pointed out the usefulness of making such a
distinction.

In fact, I tried to show, the ONLY information that exists in the
perceptual signal is information about the controlled variable
itself: the single dimension of variation of the environment that is
under control. Of course if THAT is what the information in the
perception is about, the controlled variable itself, then there is
nothing to discuss. Nobody would claim that the perceptual signal
did not represent the controlled variable and do so quite
accurately.

This argument had absolutely no effect. It was as if I hadn't
uttered a word.

Because it was and is orthogonal to anything I have been trying to
discuss. Orthogonal forces have no effect on a perceptual variable.

Look: if A(t) = B(t) + C(t), the value of A(t0) does not determine B(t0)
or C(t0), but it sure changes your bet about whether B(t0) is likely
to have had a high or a low value. The more you know about B(t0), the
more you therefore know about C(t0) if you have a value of A(t0). We
(Allan and I) STARTED from the position that the only observable was the
CEV. We reiterated time and time again, then as now, that the only
observable was the CEV. Why should your argument that this is the case
have any effect? You preach to the converted. And you totally ignore
all of the arguments that start from the position you think you are trying
to push me into.
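
A minimal sketch of the A = B + C point, under an assumption that is mine and not part of the argument above: B and C are independent zero-mean Gaussian variables with known variances. The function names are illustrative. Observing A does not fix C, but it does shrink the uncertainty about C, and it shrinks it more the less uncertain B is:

```python
import math

def residual_variance(var_b, var_c):
    """Variance of C that remains after observing A = B + C.

    Assumes B and C are independent zero-mean Gaussians (my assumption).
    """
    return var_b * var_c / (var_b + var_c)

def information_bits(var_b, var_c):
    """Reduction in uncertainty about C from observing A, in bits."""
    return 0.5 * math.log2(var_c / residual_variance(var_b, var_c))
```

For example, with var_b = var_c = 1 the remaining variance of C is 0.5 and the observation of A carries 0.5 bit about C. Reducing var_b to 0.01 leaves almost no uncertainty about C once A is known.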

According to PCT, p(t) = P(f(o(t)) + h(d(t))), where p is the perceptual
signal, P the perceptual function, f the feedback function, o the output,
h the disturbance function, and d the disturbing variable. This is an
example of a function of the form X = F(S+N). It is wrong to say of such
a function that X _inherently_ is independent of S or of N. It is even
wrong factually of ANY non-degenerate function F (one that changes its
value when its arguments change). And in the control environment, you have
shown numerically that the perceptual signal is NOT independent of the
disturbance signal h(d(t)). So listen to your own argument, and don't
accuse me of not listening to it.

It was then that I began to lean toward Rick's point
of view: it simply made no sense to you that a control system could
control a variable without knowing ANYTHING about what was causing
that variable to change. Somehow, some usable information had to be
getting into the control system from outside it that would tell the
control system what it had to do.

It is really distressing that after all this time, none of my arguments
have had any effect on your understanding of what I have been trying to
do. OF COURSE the control system doesn't need to know what is CAUSING
the perceptual variable to change. But if you want to know how well
the control system will work under various conditions, you'd better
find out how changes in those conditions affect its results. Those
"conditions" are the characteristics of the world in which the control
system operates, namely the disturbance signal, the reference signal
and the feedback function. Those are its world, and they may change.

Judging from past experience it won't do any good, but let's see
where there is agreement or otherwise about your next paragraph:

Yet this is the fundamental theorem of PCT: a basic control system
can control an environmental variable by means that is completely
independent of what is causing that variable to change.

Agreed. That's basic for all the lower levels of control.

However, I'd raise a little red flag here when it comes to the Program
Level. For all levels below that (and possibly above) I agree entirely,
and always have. The reason I wonder (not disagree) about the Program
level is that its perceptions have the form "if X then Y else Z" which
seem always to draw me to think of the program level as CHOOSING which
of two output connections to use. (That's why I see "program level"
when Bob Clark writes "DME").

That means that a program-level control system may be controlling in a
way that is (locally) not independent of what is causing the variable
to change. Maybe I view the program level differently than you do. I'm
certainly somewhat hazy about it.

The
operation of a control system is completely explained by knowing the
properties of the functions in the loop and how they are connected.

Right. Except that I wouldn't use the word "explained," which relates
to other people's understanding. I'd prefer "constrained" or "determined."

A control system is defined by functions and connections, not by the
behavior of signals or disturbances.

Right. The defined control system exists in a world defined (for it) by the
behaviour of signals and disturbances, but is not itself defined by them.

It is totally indifferent to
the causes of changes in the controlled variable, to their
waveforms, spectral distributions, or probability distributions.

"It" is, but the results of its actions--its success or otherwise in
its control task--are not. The results of a particular control system's
behaviour depend critically on those things to which it is totally
indifferent, the independently generated signals to which it is exposed
(though _their_ "causes" are always irrelevant).

It
makes no adjustments at all based on changes in those properties of
a disturbance.

Adaptive control systems do, but fixed ones don't. A minor quibble, but
a possible element of reorganization.

All of its behavior is based on the behavior of the
controlled variable, and on nothing else.

Right.

And the WAY it behaves is
determined by the fixed physical properties of its components.

Right.
---------------
Think a bit more closely about what you say in the following, and see if
you don't want to soften it a little:

A control system can operate successfully in an environment where a
hundred simultaneous causes are acting on its controlled variable in
completely unpredictable ways, through a thousand unknown physical
linkages.

Do you really believe "completely unpredictable" in that? Will it work
if the control system has a loop delay of L, and the disturbance might go
from +R to -R and back in time L/2? (R = control range) That's one
possibility incorporated in "completely unpredictable." I think you
mean "provided that they act in moderately unpredictable ways." And
you can define "moderately unpredictable" precisely, in a linear system
by thinking of variances and spectra, and in a non-linear or even a
symbolic system by thinking of uncertainties and information rates.
Uncertainties and information rates more or less are interconvertible
with variances and spectra in linear systems, which is why I've tried
to deal largely with the latter until we get _those_ results cleared up.
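
A minimal sketch of why "completely unpredictable" is too strong, assuming an integrating control loop with a fixed transport lag and a sinusoidal disturbance standing in for the +R to -R swing. The function names, the gain and the lag are illustrative choices of mine. The ratio returned is the RMS of the perceptual signal (reference zero) to the RMS of the disturbance, so a value near 0 means good control and a value above 1 means the loop makes matters worse:

```python
import math

def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def error_ratio(period, lag=5, gain=0.1, steps=4000):
    """RMS(p) / RMS(d) for a sinusoidal disturbance of the given period.

    Loop (my assumptions): p[t] = o[t] + d[t], reference 0,
    o[t+1] = o[t] - gain * p[t - lag].
    """
    output = 0.0
    p_hist, d_hist = [], []
    for t in range(steps):
        d = math.sin(2.0 * math.pi * t / period)
        p = output + d
        p_hist.append(p)
        d_hist.append(d)
        delayed_p = p_hist[t - lag] if t >= lag else 0.0
        output -= gain * delayed_p
    half = steps // 2  # discard the start-up transient
    return rms(p_hist[half:]) / rms(d_hist[half:])
```

Calling error_ratio(1000) gives a small ratio (the slow disturbance is well countered), while error_ratio(20) gives a ratio above 1: with a period only a few times the loop delay, the same control system amplifies the disturbance instead of opposing it.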

The
most important thing to understand about a control system is that
only the controlled variable matters to it. Our external
manipulations are, in themselves, of no concern at all, and of all
the effects of the actions of the system that we observe, only those
that help to control the controlled variable are of any concern to
the controlling system.

This is another of those always agreed statements.

Your arguments about information theory would be more interesting if
they were based on work you had actually done.

Well, I tried that first. I gave you a theorem, which you said was
nonsense _a priori_ because it talked about information. You didn't try
simulations to see whether I might have been right. You decided I was
wrong because the underlying ideas didn't fit your conceptual foundations.
So I started to try to work on those foundations, so that you could see
what I was talking about when I did demonstrate real results. But so far,
it is clear that we don't even have a common language. That's why I keep
trying to probe deeper, to find out just where the common basis for
discussion sits. As far as the straightforward operation of control
systems goes, I have no problem. I can talk with you on that quite
comfortably. But I can't when it comes to anything dealing with uncertainty,
and I haven't been able to find out why.

I had hoped that we would jointly work out developments. You are a much
better and quicker programmer than me. We would get on much quicker with
analyzing real data if you could understand what I try to get across.
Failing that, I have to try to program myself, with neither the time
nor the expertise to do it well and soon.

If we could get on with analyzing the real data, as I am attempting to do,
you wouldn't make comments like:

For it is clearly a world-view, and not analysis of real
data, on which your predictions have been based.

We have lots of data. We both want it analyzed. I can take months or years,
or we can take weeks or days.

----------------------
RE: nonlinear systems

Bill P and others know very well how it works from a signal
processing viewpoint, but that viewpoint runs into difficulties
with non-linear systems. Simulations and demonstrations
provide much insight into how these more difficult systems
actually work, and under what conditions they run into
problems, but these insights are hard to derive or to represent
analytically.

The Little Man contains some marked nonlinearities; in "Spadework" I
developed an analysis for handling quasistatic nonlinear systems,
and also demonstrated a model matching real behavior when the
external feedback function was a cubic with a reversal of slope in
the middle. I have not exactly ignored the subject of
nonlinearities.

I think if you read what you quoted of what I wrote, I said "simulations
and demonstrations provide much insight into how these more difficult
systems work." Is that not a compliment to you on your simulations and
demonstrations? Did I imply that you have ignored the subject? Hardly!!

As to "analytic" representations, they are generally not as
informative as simulations in real situations. What usually happens
is that the analysis simply can't be done using the properties of
the actual system; nobody knows how to solve the equations.

Yes, that was indeed my point.

If you want the truth about how a system behaves, simulate it.

Yes. And if you want the truth about how a CLASS of systems behaves,
or how A system behaves under a CLASS of environments, you can do that
for each member of the variable class, or you can try to find an approach
that parameterizes the situation usefully. And that doesn't always mean
linearizing it. You can't linearize when the feedback function contains
bifurcations, for example, which is the case in most real-world situations.
Simulations with Monte-Carlo settings for the class parameters sometimes
may be the only way to go.
-------------------

Information analysis provides a different viewpoint that in no
way conflicts with the signal-processing viewpoint, and that
can be used for complex hierarchies of nonlinear control
systems.

When do we get to see this actually demonstrated?

When we get a mutual understanding of what we are talking about, rather
than long passionate diatribes about how wrong the whole notion HAS to be.

------------------

I have explained why issue (1) arises. It arises from your
insistence on using a non-PCT definition of the disturbance.

I think I have shown that this statement is false.

Issue (2) arises for the same reason as well as others. The only way
to reconstruct the disturbance from the reference signal is to use
an analytical model and know everything about the system and its
environment except the value of one variable. That is not only a
trivial demonstration, but it makes no use whatsoever of information
theory.

You see? Even with my renewed attempt at a really careful presentation
of the reason for the reconstruction demonstration, and my renewed
presentation of why the argument is important, you STILL make this totally
irrelevant comment. Here's another attempt. I've lost track of how
many different ways I've tried to get this across, and doubtless this
one won't be the last.

Questions that have to be asked:

(1) Does knowledge of the form of the output function convey any
information about any value of the disturbance waveform?

(2) Does knowledge of the form of the feedback function convey any
information about any value of the disturbance waveform?

(3) Does knowledge of any parameter of the output function convey any
information about any value of the disturbance waveform?

(4) Does knowledge of any parameter of the feedback function convey any
information about any value of the disturbance waveform?

I think (I sure hope) you would answer all these questions "No."

(5) Does knowledge of the perceptual signal, the output function form
and parameters, and the feedback function form and parameters convey
any information about the disturbance waveform?

If the answer to (5) is "Yes" then, because the answers to (1) to (4)
are "No," the answer to (6) HAS TO BE "Yes."

(6) Does knowledge of the perceptual signal waveform convey any information
about the disturbance waveform?

And my objection was that you used a great deal more information
than could possibly have been passed into the perceptual signal:
namely, the form of the function connecting the disturbance to the
controlled variable, the physical properties of the controlled
variable that turn proximal effects of the disturbance into changes
in the controlled quantity, and the forms of all the functions in
the control system as well as the setting of the reference signal.
It was all this information, in addition to knowing the perceptual
signal, that enabled you to reconstruct the disturbing variable. If
even one part of that other information had been missing, knowing
the behavior of the perceptual signal would have done you no good at
all.

It was shown that under the conditions of question (5), the disturbance
signal could be reconstructed exactly, so the answer to (5) is "Yes."
Since no part of that information comes from knowledge of the output
and feedback functions, it all has to come through the perceptual signal.
It didn't (and doesn't) matter in the slightest what else one had to
know, PROVIDED that the extra stuff conveyed no information about the
signal in question--the disturbance signal. No static parameter could
possibly convey information about a time-varying waveform.
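
A minimal sketch of the reconstruction under the conditions of question (5), assuming a unit feedback function and an integrating output function with a known gain. The function names and the random test disturbance are illustrative and are not the program Allan and I used. The only time-varying quantity handed to the reconstruction is the perceptual signal; gain and reference are static parameters:

```python
import random

def make_disturbance(n, seed=0):
    """Arbitrary test waveform (hypothetical, uniform random values)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def run_loop(disturbance, gain=0.2, reference=0.0):
    """Return the perceptual signal p[t] = o[t] + d[t] for an integrating loop."""
    output, percept = 0.0, []
    for d in disturbance:
        p = output + d
        percept.append(p)
        output += gain * (reference - p)
    return percept

def reconstruct_disturbance(percept, gain=0.2, reference=0.0):
    """Recover d[t] from p[t] alone, given the fixed loop functions."""
    output, recovered = 0.0, []
    for p in percept:
        recovered.append(p - output)      # d[t] = p[t] - o[t]
        output += gain * (reference - p)  # same output function as the loop
    return recovered
```

Running reconstruct_disturbance(run_loop(d)) returns d to within floating-point rounding, for any waveform d.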

And even that reconstruction was posited on knowing that only
one disturbance was acting.

You knew then and know now very well how we were using "disturbance."
(See above).

If you remember, Allan and I thought the demonstration too trivial to
even bother performing, but Rick (and I think you) insisted that our
intuition was wrong, and that the reconstruction couldn't work, so it
clearly was not trivial to everybody. Trivial or not, it works, and
the implications cannot be rationally controverted, so far as I can see.
So long as the output function and the feedback function are fixed, ALL
the information in the disturbance is retrievable from the perceptual
signal AND IS RETRIEVED by the actual output and feedback functions
during the normal operation of the control system.

How can you say that a signal representing the difference between
two variables passes "ALL" the information about either of them? If
the two variables are always equal, it passes no information at all
about them, because it is always zero.

This, if you follow through the operations of the control system, is
equivalent to saying 0/0 gives an undefined result. Of course it does,
but if 0/0 is the limiting case of F(x)/G(x) as x goes to infinity (or
some other limit), then the result is often taken to be the value of
that limit. Sin(x)/x is undefined when x = 0, but has a limit of 1 as
x approaches zero.

I think my answer to your comment is either "so what--real control systems
can't make the output equal to the disturbance so the point doesn't apply"
or "what are you getting at?"

Furthermore, you made no use whatsoever of "Information" in the
perceptual signal: i.e., the calculated reduction in uncertainty in
d given p.

Allan gave you all that. Anyway the answer is that I(d|p) = U(d). In
other words, the calculated reduction is the prior uncertainty about d.

Even "correlation" would have a clearer meaning --
and we know that the existence of a correlation is not what makes a
control system work. In fact, the greater the correlation of p with
d, the worse the control system is working. That's actually true of
flenishness, too. So how come the control system works worse when it
gets more "Information" about the disturbance?

It doesn't GET more information about the disturbance when it works worse.
It GETs less. What is more is the relation between present values,
whether you measure it as correlation or as information. The faster
and more accurate the use of the information by the output, in countering
the disturbance, the less you see in the perceptual signal. This is basic,
isn't it? I don't know why you get so hung up on this point. You always
say it backwards, it seems to me.

I'm sorry, Martin, but so far all I have seen from information
theory is a mathematical game that starts in the middle of the air
and ends in the same place.

Well, I'm sorry, too, because I've tried in a variety of different ways
to ground your balloon, but every time we get close to catching a landing
line, you give it another blast of hot air, and away we go again. The
ground is there, even if you keep thinking I'm trying to haul you down
into a low-lying fog.

------------------
RE: Big picture

As soon as you have put in the integrating function, you HAVE
the delay, and the algebraic equations that ignore time are no
longer valid.

An integrator is not a transport lag: the value of the integral
begins to rise the instant the input becomes nonzero, with no delay.
The output of a transport lag only begins to rise a fixed time after
the input begins to rise, and all frequencies are passed through
with equal amplitude. A pure integrator does not create "infinite
delay." It just makes the rate of change of output a constant for a
constant input.

There's a terminological confusion involving the words "delay" and "lag"
and I can never remember which is conventionally which. However, that
doesn't account for your self-contradictory last two sentences, since
you have used "lag" to mean a finite transport time, so "delay" presumably
means delay of effect. If the output of the integrator rises at a
constant rate for a fixed input, the average delay goes to infinity
as the duration of the integration goes to infinity. At any moment,
the average delay of effect on the output of the input at any earlier
moment is half the time since the input at that earlier moment. So the
average delay of the effect of the first moment of the input after
the move to the constant value is half the length of the constant
portion of the step.
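
A minimal sketch of the "average delay" calculation, treating the output function as a weighting kernel over the ages of past inputs. The function names are illustrative, and the leaky integrator is included only for comparison, with a retention factor chosen arbitrarily:

```python
def mean_age(kernel):
    """Weighted mean age of the inputs contributing to the present output.

    kernel[k] is the weight given to the input that arrived k steps ago.
    """
    total = sum(kernel)
    return sum(age * weight for age, weight in enumerate(kernel)) / total

def pure_integrator_kernel(n):
    """All n past inputs count equally."""
    return [1.0] * n

def leaky_integrator_kernel(n, retain):
    """Each step back in time is weighted by a further factor `retain` (< 1)."""
    return [retain ** age for age in range(n)]
```

For the pure integrator the mean age is half the integration time (50 steps for a 101-step run, 500 for a 1001-step run), so it grows without limit. For a leaky integrator with retain = 0.9 it levels off near 9 steps however long the run.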

The algebraic equations remain valid as the limiting case for a
zero-frequency or constant disturbance. They are the steady-state
solutions of the system's differential equations.

That, I'll go along with.

In the expression that you deride as "having fun" because it
uses convolution instead of algebraic multiplication, the
function f(t) is unity for all t greater than zero.

How was I to know you were using a trick convolution?

What was the trick? I tried to point out that ALL the stages around
the loop involve convolutions. I am aware of no trick. Convolution is
convolution, and I know you understand it well.

In general,
the values of f(tau) in a convolution g(t-tau)*f(tau) start large
and fall off to zero after some finite value of tau. The "smear"
depends on the form of f(tau). The form of convolution you suggest
is just a pure integrator, anyway (the integral of an impulse is a
step function).

Yes. That's why I pointed out that by introducing an integrator, you had
reached the maximum possible smearing of the effect of any particular
moment of the perceptual signal. I didn't suggest using an integrator.
You did. I just used it in the convolution.

Putting in a single time-lag raises that to perhaps 97%, and
at that point I tend to start losing interest.

That's equivalent to setting f(t) = 1 for t>(time lag),
removing the influence of more recent perceptual signals.

Nonsense. Look at the program. The output of the time-lag is simply
the input d iterations ago. This is an f(tau) transfer function with
a single '1' at a certain delay, and zeros elsewhere. This is a pure
lag: the output follows the input as of tau iterations in the past.

You said that you were putting in the lag WITH an integrator, not
substituting the lag for the integrator.

I'm a bit lost in what comes next:

You say that removing these values improves the prediction by a
factor of two or thereabouts.

["These values" being the most recent values of the perceptual signal].

I'm not removing these values; the output of the lag still gets
integrated by the output function. The output is simply the integral
of a delayed value of the input.

If that isn't removing the most recent values of the input, I don't
know what it is. If you delay them, they haven't got there yet to
be used. So I don't know what you mean.
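
A minimal sketch of what I mean, assuming discrete time and zero input before the start of the run; the function names are illustrative. A pure lag followed by an integrator gives the same output as a single convolution whose kernel is zero for ages below the lag and one from the lag onward:

```python
def convolve(signal, kernel):
    """out[t] = sum over k of kernel[k] * signal[t - k], for k <= t."""
    out = []
    for t in range(len(signal)):
        reach = min(t + 1, len(kernel))
        out.append(sum(kernel[k] * signal[t - k] for k in range(reach)))
    return out

def lag_then_integrate(signal, lag):
    """Delay the signal by `lag` steps, then accumulate it."""
    delayed = [signal[t - lag] if t >= lag else 0.0 for t in range(len(signal))]
    out, total = [], 0.0
    for value in delayed:
        total += value
        out.append(total)
    return out

def delayed_step_kernel(length, lag):
    """Weight 0 for inputs more recent than the lag, weight 1 from the lag on."""
    return [0.0 if age < lag else 1.0 for age in range(length)]
```

The two routes agree sample for sample, which is the sense in which the most recent values of the perceptual signal make no contribution to the present output.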

That's fairly dramatic evidence that the p on the two sides of
the equation really MUST be different, if getting rid of recent
values improves prediction so much.

It doesn't. It improves the prediction by about 2 or 3 percent. It
reduces the _error_ in the prediction by close to half, but that
isn't what matters.

Most people would think that is precisely what matters. If dozens of
differently parameterized control systems can get within 5%, but most
fail at 2%, doesn't this tell you something about the relative likelihood
of the different models being right? To me, that's what matters, not the
fact that control systems control. How much more evidence of that do you
want?

What matters is that the modeled handle and
controlled variables, which were already behaving almost exactly
like the real ones, now behave a little more like them -- not that
you'd notice the difference. That's the BIG picture.

(See below for comment on this point).

(b) Whether a time delay is important depends on how fast the
disturbance changes, as you yourself have often pointed out.

No, that isn't what I pointed out. It depends on the value of the
integration factor in the loop.

For a fixed control system, you have indeed often pointed out that
fast disturbance changes can't be countered if the loop delay is too
long--in both private and public messages and in print. Not that it
matters, since it is a point well known to any control engineer.

Try putting two integrators into the control system, and see if it
doesn't make much difference. You'll create a nice sine-wave
oscillator. I'm afraid that "smearing" is not an adequate term for
describing the effects of integrators or time lags.

Of course it isn't adequate. It was never supposed to be. All I was
trying to show was that you can't use the same value of "p" in the
two equations we started with. There's no phase information in the
simple "smearing" account. It only deals with how important any past
value is in accounting for present values. In an integrator, all past
values are equally important. In real systems, they probably aren't.
And if you want to know how the system really will perform, you'd better
worry about relative phase.
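
A minimal sketch of the two-integrator case, assuming a unit feedback function, a reference of zero and a constant disturbance. The function name and the gain are illustrative, and the update order (rate first, then output) is my choice so that the discrete loop neither grows nor decays artificially:

```python
def double_integrator_loop(steps, gain=0.01, disturbance=1.0):
    """Perceptual signal of a loop whose output function is two integrators."""
    rate, output = 0.0, 0.0
    percept = []
    for _ in range(steps):
        p = output + disturbance
        percept.append(p)
        rate += gain * (0.0 - p)  # first integrator: accumulates the error
        output += rate            # second integrator: accumulates the rate
    return percept
```

With these values the perceptual signal never settles: it swings between roughly +1 and -1 with a period of about 63 iterations. The amplitude-only "smearing" account has nothing to say about that, because the oscillation is a matter of phase.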

You might improve prediction somewhat by reducing the effective
average delay, altering the pure integrator to something that
took less account of long-past perceptual signals, such as a
leaky integrator with a reasonable time-constant.

Might gain another 1/2 percent. I've tried this and couldn't see
much difference (transport lag plus integration constant plus
leakage constant -- three parameters to optimize). How do _you_ know
it would make any difference?

"You might" doesn't seem to me like a claim of knowledge--at least it
wasn't intended that way. It was a suggestion to try it out, which you
have done, apparently with some but not much of the suggested improvement.

When you plot the maximum error full-scale, you can no longer
judge whether it's an important amount of error or
insignificant.

But evolution can.

Who the &&$%@#? cares what evolution "can" do? God could do it too.

I should have thought it the MOST important thing to care about. Most
of life is played out in an environment containing competition. Better
controllers win, in an otherwise equal competition. But not at the
cost of depleting their resources. So it is very important to control
at the right level of error.

Take formal games as an analogy. It makes a very great deal of difference
whether a baseball player can perceive the location of a ball to within
a meter, a centimeter, or a millimeter vertically--a few million dollars
difference, actually. But anyone can have the same range of heights over
which they can see that a ball exists. However, it makes no difference
to the ballplayer as to whether he can see a micron's difference, and
he can't. If 100 m is a reasonable extreme range for the possible perceived
height of a ball, the important thing to the ballplayer is whether he
controls at .01% or .001%. And he couldn't care less about whether he
could see the ball at 100 m high. Nobody hits it that high. He cares
about that millimeter or two in the middle of the cylinder of the bat.

The point is, YOU can't do it. So plotting errors without any scale
reference is a waste of time. You could be staying awake nights over
an error that is 0.1% of the reference signal.

And I might be dying because I didn't correct that error.

(I started this at 11:11, and it is now almost 18:00. I hope the day
hasn't been totally wasted, because even with 2 weeks between sleep
study runs, there isn't time for much of this.)

Martin