Info theory; nonlinear models

[From Bill Powers (921221.1500)]

Martin Taylor (921221.1200)

Martin, our discussion of information theory and PCT seems to be
flying apart into very strange pieces. I don't follow your
reasoning about information flow or channel capacity in a control
system at all. If you want me to understand, you're going to have
to do a lot more specific spelling-out of what you mean.

In my last post and your answer the following exchange occurred:

You:

The central theme of PCT is that a perception in an ECS should
be maintained as close as possible to a reference value. In
other words, the information provided by the perception, given
knowledge of the reference, should be as low as possible.

Me:

I think you'd better take [that one] back to the drawing board.
The reference in no way predicts the perception by its mere
existence.

You seem to be taking the position of an external observer who
has one probe on the reference signal and another on the
perceptual signal. Knowing that a good control system is acting,
the observer knows that the perceptual signal will track the
reference signal closely, and so is predicted by the reference
signal. I understand this to imply that the perceptual signal
adds little information to what this external observer is already
getting from the reference signal. The same could be said the
other way around: observing the perceptual signal, the observer
knows essentially what the reference signal is doing, and so the
reference signal adds little information to what the perceptual
signal is already supplying.
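
The external observer's bookkeeping can be made concrete. The sketch below is a hypothetical illustration, not anything Martin proposed. It assumes the reference and perceptual samples are jointly Gaussian, with the perceptual signal equal to the reference plus a small independent tracking error. Under that assumption the mutual information is -0.5 log2(1 - rho^2) bits per sample, where rho is the correlation, and the number is the same whichever signal the observer uses to predict the other.

```python
import math
import random


def gaussian_mutual_information_bits(x, y):
    """Mutual information in bits per sample, assuming x and y are jointly Gaussian."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    rho = sxy / math.sqrt(sxx * syy)
    return -0.5 * math.log2(1.0 - rho * rho)


rng = random.Random(0)
# Hypothetical signals: reference varies with SD 10, tracking error has SD 1.
reference = [rng.gauss(0.0, 10.0) for _ in range(20000)]
perception = [r + rng.gauss(0.0, 1.0) for r in reference]

i_rp = gaussian_mutual_information_bits(reference, perception)
i_pr = gaussian_mutual_information_bits(perception, reference)
print(round(i_rp, 2), round(i_pr, 2))  # same value either way round
```

Swapping the arguments leaves the result unchanged, which is the symmetry described above.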

But the receiver of the information in either case is external to
the behaving system. What does that external receiver's
information input have to do with the properties of the system
being observed? Why should it make a difference in the behaving
system if the external observer uses the reference signal to
predict the perceptual signal, or the perceptual signal to
predict the reference signal? Does the information being carried
in a channel depend on what the external observer is paying
attention to?

If the reference signal and the perceptual signal are both
varying in a pattern that requires a bandwidth of, say, 2 Hz,
doesn't this mean that both signals are carrying information at a
rate corresponding to that bandwidth?
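
One standard way to attach a number to "a rate corresponding to that bandwidth" is the Shannon-Hartley formula, C = B log2(1 + S/N), the maximum rate of a channel of bandwidth B with additive white Gaussian noise. The sketch below applies it to the 2 Hz case. The signal-to-noise ratios are illustrative assumptions; nothing here says what the noise level in a neural signal actually is.

```python
import math


def shannon_hartley_bits_per_second(bandwidth_hz, snr):
    """Capacity of a band-limited channel with additive white Gaussian noise.

    snr is the signal-to-noise power ratio (a plain ratio, not dB).
    """
    return bandwidth_hz * math.log2(1.0 + snr)


for snr_db in (0, 10, 20, 30):
    snr = 10.0 ** (snr_db / 10.0)
    rate = shannon_hartley_bits_per_second(2.0, snr)
    print(f"2 Hz, {snr_db:2d} dB SNR: {rate:5.2f} bits/s")
```

The bandwidth sets the scale, but by this formula the same 2 Hz signal can carry anywhere from a couple of bits per second to tens of bits per second, depending on how much noise accompanies it.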

If you want me to understand this, you're going to have to take
it slow and simple. I'm not following you. Today's post just made
the whole thing more baffling to me.
----------------------------------------------------------------
Martin Taylor (various posts) and
Tom Bourbon (921221.1015) --

Tom, I think that the meaning of your challenge isn't completely
clear to Martin: that is, what you think of as a demonstration
and what he thinks of as one are very different. What you (and I)
want is a program, or at least the design of a simulation that we
could program and run on a computer, which would generate
behavior that can be compared with real behavior. What Martin
seems to think of as a demonstration is showing that a specific
behavior is an instance of a more general class of behavioral
phenomena.

We have to be very careful here not to ignore Martin's complaint,
that it is as hard to get PCTers to listen to information theory
as it is to get conventional journals to listen to PCT. Perhaps
in learning how to understand what Martin is trying to say, we
can also learn something about why we have difficulties in
getting mainstream psychologists to listen to us. I recommend
patience here, and not leaping to conclusions.

Martin, the difference that Tom is talking about, I believe, is
between a descriptive model and a generative model. A descriptive
model provides a general picture of which a specific behavior is
only one example. A generative model actually generates
(simulated) behavior for direct point-by-point comparison with
real behavior. So conceptually, the arrangement from most to
least detail is

   generative model ==> observed behavior ==> descriptive model

I think the different relationships of the two kinds of model to
observed behavior are the source of many of our mutual
difficulties. The generative model is a proposed system design;
it connects components with physical properties (mathematically
represented, but close to the component level) into a system that
behaves as it must according to the design. If the system design
is successful, it will behave like the real system: that is, its
variables will change through time as the same variables do in
the real system.
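
To make "generative" concrete, here is a minimal sketch of the kind of program Tom has in mind. It assumes a single-level compensatory tracking loop: the cursor is handle output plus disturbance, the perceptual signal is cursor position, and the output function integrates the error after a transport lag. The gain, lag, sampling interval and sine-wave disturbance are illustrative assumptions, not fitted values.

```python
import math


def simulate_tracking(disturbance, reference, gain=5.0, lag_s=0.1, dt=1 / 60):
    """Generate handle and cursor positions from a one-level control model.

    Assumed structure (illustrative only): cursor = handle + disturbance,
    perception = cursor, error = reference - perception, and the handle
    moves at a rate proportional to the error from lag_s seconds earlier.
    """
    lag = round(lag_s / dt)
    handle = 0.0
    handles, cursors, errors = [], [], []
    for t in range(len(disturbance)):
        cursor = handle + disturbance[t]   # environment: the controlled quantity
        perception = cursor                # input function with unit gain
        error = reference[t] - perception  # comparator
        handles.append(handle)
        cursors.append(cursor)
        errors.append(error)
        delayed = errors[t - lag] if t >= lag else 0.0
        handle += gain * delayed * dt      # integrating output function
    return handles, cursors


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


dt = 1 / 60
n = 60 * 60                                # one minute at 60 samples per second
disturbance = [50.0 * math.sin(2 * math.pi * 0.2 * t * dt) for t in range(n)]
target = [0.0] * n                         # reference: keep the cursor on the target
handles, cursors = simulate_tracking(disturbance, target)
tracking_error = [c - r for c, r in zip(cursors, target)]
print(f"RMS disturbance {rms(disturbance):.1f}, RMS tracking error {rms(tracking_error):.1f}")
```

Point-by-point comparison would then mean running the same disturbance table through the model and computing, for example, the RMS difference between the model's handle positions and a participant's recorded ones (not shown here).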

The descriptive model, on the other hand, is a generalization
drawn from classes of behaviors. It attempts to extract general
principles and laws from the details of behavior. It looks for
truths about behavior that are more general than any specific
behavior.

If these truths are true of observed behavior, they are also true
of the behavior of a generative model that can mimic observed
behavior. That is, for example, if Ashby's law of Requisite
Variety can be shown to encompass certain control behaviors, then
it will also encompass the behavior of a successful simulation of
those behaviors.
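
As an illustration of a descriptive law that holds whatever mechanism produces the behavior, here is a brute-force check of Ashby's law of requisite variety in its simplest table form: the number of distinct outcomes can be no smaller than the number of distinct disturbances divided by the number of distinct responses available to the regulator. The outcome table, outcome = (d + r) mod n, is a hypothetical example chosen so that each response shifts the outcome differently.

```python
from itertools import product
from math import ceil


def min_outcome_variety(n_disturbances, n_responses):
    """Fewest distinct outcomes any regulator policy can achieve.

    Hypothetical outcome table: outcome = (d + r) mod n_disturbances.
    A policy assigns one response r to each disturbance d; every possible
    policy is tried.
    """
    best = n_disturbances
    for policy in product(range(n_responses), repeat=n_disturbances):
        outcomes = {(d + r) % n_disturbances for d, r in enumerate(policy)}
        best = min(best, len(outcomes))
    return best


for responses in range(1, 7):
    achieved = min_outcome_variety(6, responses)
    bound = ceil(6 / responses)
    print(f"{responses} responses: best regulator leaves {achieved} outcomes (law: >= {bound})")
```

The search finds which policies do best; it says nothing about how a physical regulator would come to use one.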

I think our problems arise when we try to make one kind of model
work in place of the other. The concept of information is a
generalization, not an explanation. If we begin with the
phenomenon of messages passing between behaving systems, we can
show that those messages carry a certain amount of technically
defined information, dependent on what the receiver wants from
the message. But this tells us nothing about HOW those messages
are generated and received. We can't use information theory to
provide a system design, a generative model, because it is on the
wrong end of the scale of abstraction. Neither can we use the
generative model to provide an analysis of information flow; the
generative model handles physical signals and quantities, and its
specifications say nothing about information.

The PCT model is fundamentally a generative model. As such it is
only partly successful. It will become more successful as we
become able to simulate more and more complex behaviors, thus
showing that the structure of the model is plausible. What we may
guess about higher levels of organization is largely irrelevant
now to the modeling process.

The applications of information theory that are relevant depend
to some extent on the model that is assumed. Given an assumed
model, its behavior or hypothesized behavior can be found
consistent with information theory, and perhaps information
theory will be able to explain why some designs work better than
others. But information theory can only specify requirements --
for example, adequate bandwidth, or decreasing bandwidth at
higher levels. It can't supply the system design at the
generative-model level that will meet those requirements. It
can't specify what information is needed in the generative model,
or how signal paths should be arranged, or what functions should
be applied to the signals.

In the end, the generative model will explain behavior, while
descriptive models show that the behavior thus explained and the
structure of the successful model are consistent with general
laws.

-----------------------------------------------------------
Greg Williams (921221) --

What would a PCT model look like for a step tracking task?
Would it be the same as the model used so successfully for
continuous tracking? Would its parameters change with the
amplitude of the step and/or the speed of the step's rise
(actually a ramp, in this case)?

I'm working on an experimental setup (for David Goldstein) that
will partly answer this question. By using a little control
system, the program adjusts the difficulty of a task (by varying
the speed with which a table of disturbance values is scanned)
until a specific amount of RMS tracking error is produced by the
participant. This amount of error is then maintained quite well
in a subsequent one-minute tracking task. The purpose is to
measure parameters of control at standard levels of tracking
error, and also to monitor long-term changes in tracking skill.
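
The difficulty-adjusting loop can be sketched as follows. This is not the program described here. The participant is replaced by a hypothetical simulated tracker (an integrating, lagged loop of the kind used to fit continuous tracking), the disturbance table is a single sine cycle, and the adjustment rule (multiply the scan speed by the square root of target/measured RMS error after each one-minute trial) is an assumption chosen for simplicity.

```python
import math

TABLE_SIZE = 600
# Hypothetical disturbance table: one cycle of a sine wave, amplitude 100.
DISTURBANCE_TABLE = [100.0 * math.sin(2 * math.pi * i / TABLE_SIZE)
                     for i in range(TABLE_SIZE)]


def run_trial(speed, gain=5.0, lag_s=0.1, dt=1 / 60, seconds=60.0):
    """RMS tracking error of a simulated tracker when the table is scanned
    at `speed` entries per sample (higher speed = harder task)."""
    lag = round(lag_s / dt)
    n = round(seconds / dt)
    handle, position, total_sq = 0.0, 0.0, 0.0
    errors = []
    for t in range(n):
        d = DISTURBANCE_TABLE[int(position) % TABLE_SIZE]
        position += speed
        error = -(handle + d)          # reference 0: keep the cursor on the target
        errors.append(error)
        total_sq += error * error
        delayed = errors[t - lag] if t >= lag else 0.0
        handle += gain * delayed * dt  # integrating output with transport lag
    return math.sqrt(total_sq / n)


def adapt_speed(target_rms, speed=1.0, trials=25):
    """Outer 'little control system': nudge scan speed toward the target RMS error."""
    for _ in range(trials):
        measured = run_trial(speed)
        speed *= math.sqrt(target_rms / measured)
    return speed


speed = adapt_speed(20.0)
print(f"scan speed {speed:.2f} entries/sample gives RMS error {run_trial(speed):.2f}")
```

In this simulation RMS error rises steadily with scan speed, so the rule settles well within the 25 trials allowed; the square root keeps each correction moderate.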

In experimenting on myself, I find that as the amount of mean
tracking error increases from one run to another, the integration
factor in a best-fit model decreases. This is a crude way of
measuring nonlinearity in the overall system response. With large
mean errors, the slope of the output curve flattens out, as we
would expect on neurological grounds (signal saturation).

It may be possible to estimate this nonlinearity and build it
into the output function of the model. Then step-disturbances
could be tried to see if the model approximates real behavior
better than the linear model does.
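
A minimal sketch of building the nonlinearity into the output function, assuming (hypothetically) a hyperbolic-tangent saturation: for small errors the output rate is gain times error, as in the linear model, but it levels off at a maximum rate for large errors. The effective integration factor, output rate divided by error, then falls as the error grows, which is the direction of the change seen in the best-fit models.

```python
import math


def linear_output_rate(error, gain=5.0):
    """Output rate of the linear integrating output function."""
    return gain * error


def saturating_output_rate(error, gain=5.0, max_rate=50.0):
    """Hypothetical saturating output function: slope `gain` near zero error,
    approaching +/- max_rate for large errors."""
    return max_rate * math.tanh(gain * error / max_rate)


def effective_gain(error, **kwargs):
    """Integration factor a linear fit would see at this error size (error != 0)."""
    return saturating_output_rate(error, **kwargs) / error


for e in (0.1, 1.0, 5.0, 10.0, 20.0, 50.0):
    print(f"error {e:5.1f}: linear rate {linear_output_rate(e):6.1f}, "
          f"saturating rate {saturating_output_rate(e):5.1f}, "
          f"effective gain {effective_gain(e):.2f}")
```

In a simulation, the linear rate would simply be replaced by the saturating one, and both versions run against the same step disturbances.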

The chief difference between continuous and stepped disturbances,
which I have looked at, is that the stepped disturbances entail a
transport lag much longer than the lag used to fit continuous
behavior (twice as long). This could be due to nonlinearity in
the output function, or (more likely, I think) to higher-level
control systems being involved. After a relatively long period of
no disturbance, a sudden step disturbance seems to surprise the
system, so it doesn't start tracking right away. If you begin
even a continuous-disturbance run with a few seconds of zero
disturbance, there is a longish lag, 250 milliseconds or so,
before tracking actually starts after the first significant
amount of disturbance appears. But once it has started, the lag
drops to 100 milliseconds or so. So perhaps the 250 milliseconds
includes the normal 100 milliseconds, plus another 150 for a
higher-level system to turn the tracking system on.
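
The lag explanation above can be sketched as a model under stated assumptions: a 10 ms sampling interval, a 100 ms loop lag, and a hypothetical higher-level "switch" that, once the lower loop's delayed error first becomes nonzero, waits a further 150 ms before turning the integrating output on (and then leaves it on). Measuring how long the output takes to start moving after each step gives 250 ms for the first step and 100 ms for a later one.

```python
def simulate_steps(steps, n=600, lag=10, switch_delay=15, gain=0.05):
    """Tracking with an assumed clock of 1 sample = 10 ms.

    steps: list of (sample_index, amplitude) step disturbances.
    lag: loop transport lag in samples (10 = 100 ms).
    switch_delay: extra samples (15 = 150 ms) a hypothetical higher-level
    system waits, after the error first reaches it, before enabling output.
    """
    disturbance = [0.0] * n
    for start, amplitude in steps:
        for t in range(start, n):
            disturbance[t] += amplitude
    output = [0.0] * n
    error = [0.0] * n
    enabled_at = None
    for t in range(n):
        previous = output[t - 1] if t > 0 else 0.0
        error[t] = 0.0 - (previous + disturbance[t])  # reference 0; cursor = output + disturbance
        seen = error[t - lag] if t >= lag else 0.0     # error after the transport lag
        if enabled_at is None and seen != 0.0:
            enabled_at = t + switch_delay              # higher level turns tracking on later
        if enabled_at is not None and t >= enabled_at:
            output[t] = previous + gain * seen         # integrating output function
        else:
            output[t] = previous
    return output


def onset_ms(output, step_index, ms_per_sample=10, tolerance=1e-3):
    """Time from a step until the output has visibly moved."""
    baseline = output[step_index - 1]
    for t in range(step_index, len(output)):
        if abs(output[t] - baseline) > tolerance:
            return (t - step_index) * ms_per_sample
    return None


out = simulate_steps([(100, 1.0), (400, 1.0)])
print(onset_ms(out, 100), onset_ms(out, 400))
```

Whether the extra 150 ms really belongs to a separate higher-level system is the hypothesis in the text; the sketch only shows that the two delays add in this arrangement.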

Should be easy for somebody who can predict for One whole
minute, right?

Yeah. When are you going to do it?
-------------------------------------------------------------
Best to all,

Bill P.