Information; controlling effects

[From Bill Powers (930622.0730 MDT)]

Martin Taylor (930621.1630 EDT) --

I'll simply repeat what I have said several times before. The
only argument (initially) was that it was inconsistent to say
two things at the same time:

(1) that the output signal exactly mirrors the disturbance
signal (a claim made very strongly by Rick and you in many
postings), and

(2) there is no information about the disturbance in the
perceptual signal (a claim made even more strongly,
particularly by Rick).

These arguments are indeed inconsistent; when they are stated
without qualification they contradict each other. The cause of
the difficulty, for which both Rick and I are responsible, is
that these qualitative statements are derived from a quantitative
analysis in which a parameter, G/(1+G) where G is loop gain, is
allowed to go to the mathematical limit of 1 to yield an
approximate solution (i.e., G goes to infinity). Furthermore,
even the approximate solution depends on an assumption that had
been stated but tends to get left behind: that the bandwidth of
the disturbance is far less than the cutoff frequency of the
various components in the control loop. Strictly speaking, the
above two statements are true only of a system with infinite loop
gain presented with a constant disturbance.
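
For concreteness, here is a minimal sketch of that quantitative
point (a Python illustration, assuming unity input and feedback
functions, which the text above does not specify). Solving the
static loop equations p = o + d, e = r - p, o = G*e gives
p = (G*r + d)/(1 + G) and o = G*(r - d)/(1 + G):

    # Static loop with loop gain G, zero reference, unit constant disturbance
    for G in [1.0, 10.0, 100.0, 1e6]:
        r, d = 0.0, 1.0
        p = (G * r + d) / (1 + G)          # perceptual signal
        o = G * (r - d) / (1 + G)          # output signal
        print(f"G={G:>10}:  p={p:.6f}  o={o:.6f}")

As G grows, o approaches -d (the output "mirrors" the disturbance)
while p approaches zero (no trace of the disturbance); both
statements hold exactly only in that limit.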

The conditions you assumed in showing that these statements are
inconsistent violate, to the maximum degree possible, the
assumptions that led to our approximations. Instead
of using a constant disturbance, you used a step-function to show
that the disturbance must have an effect on the perceptual
signal. A step function contains an instantaneous transition,
which is guaranteed to be far outside the bandwidth of any
physical control system.

Our statements and your counterstatements depend on adopting
assumptions that go to opposite extremes and then making
qualitative statements about the consequences. If the arguments
then proceed on that qualitative basis, as they have done, the
only possible result is a barrage of attacks and denials. In
fact, your argument is unable to handle the case of a constant
disturbance, while ours is unable to handle the case of a high-
frequency disturbance. There is no way to reconcile conclusions
based on such extreme opposite assumptions.


------------------------------------
To say that there is NO information about the disturbance in the
perceptual signal, whether one defines information formally or
informally, is clearly wrong. After a step function, the change
in the perceptual signal represents the change in the disturbance
at least for some short time after the step. The degree of
resemblance depends on the rise-time of the input function and
the time-scale on which we observe the signal (the rise-time of
the measuring instrument).

If there were no feedback connection, the perceptual signal would
simply rise to a new value and stay there, the new value being
proportional to the disturbance (in a linear system) and
accurately representing the magnitude of the disturbance.
However, with the loop closed and an integrator in the output
function, the initial value of the perceptual signal will exist
only for an instant, after which the signal will decline
exponentially to zero (we assume zero reference signal). The
speed of the exponential decline will be determined by the
constant multiplier in the output integration (or by the
amplification of the error signal prior to the integration). This
exponential decline of the perceptual signal clearly does not
represent what the step-disturbance is doing; the disturbance
remains constant while the perceptual signal disappears,
returning eventually to the same value it had prior to the step.
After the step has occurred, therefore, the perceptual signal
rapidly loses its similarity to the disturbance.
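
A time-domain sketch of this step response (a Python illustration,
assuming the simplest possible loop, p = o + d with an integrating
output; the values of k and dt are made up, not from this
discussion):

    k, dt, r = 5.0, 0.001, 0.0
    o = 0.0                                 # integrator output
    for i in range(3000):
        t = i * dt
        d = 1.0 if t >= 0.5 else 0.0        # step disturbance at t = 0.5
        p = o + d                           # perceptual signal
        o += k * (r - p) * dt               # integrating output function
        if i % 500 == 0:
            print(f"t={t:.2f}  d={d:.1f}  p={p:+.4f}  o={o:+.4f}")

The perceptual signal jumps to the step amplitude and then decays
as exp(-kt) toward zero, while o settles at -d, as described.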

This is a time-domain representation of the difference between
our positions. Yours is based on the state of affairs immediately
after occurrence of the step. Ours is based on the state of
affairs long after the step. In your case, the change in the
perceptual signal is a reasonably faithful rendition of the
amplitude of the step and is essentially unaffected by the
reference signal. In our case, the perceptual signal bears no
relationship to the amplitude of the disturbance and represents
_only_ the setting of the reference signal.

So whether the perceptual signal in any sense contains
information about the state of the disturbing variable depends on
when, in relation to the time of occurrence of the step, you
observe the perceptual signal. The longer you wait to observe the
perceptual signal, the less able you are to deduce the state of
the disturbance from it.
-------------------------------------------
All of this suggests a way to apply information theory that has a
point of contact with PCT. Instead of using a step-function, use
an impulse disturbance. With an impulse disturbance and no
feedback connection, the perceptual signal will rise and fall in
a way typical of the impulse response of the input function. The
error signal, which we can assume to be instantaneously computed,
follows that waveform. With the loop open in the environmental
path, the integrator output will show a transition from an
initial steady level to a new steady level.
When the loop is closed, the perceptual signal will show the
impulse-response of the closed-loop system.

Now applying superposition (this is a linear system), the dynamic
response of the system can be built up by representing any
arbitrary disturbing waveform as a series of impulses of
appropriate amplitude and sign, and superimposing the smooth
responses to the successive impulses. Therefore if you can
analyse the behavior of the loop for a single impulse
disturbance, you can sum over the set of overlapping impulse
responses to get the continuous behavior of the system. This is a
well-known method called convolution, with which I'm sure you are
at least acquainted. It bears a strong resemblance to your
approach that assumes sampling, but it straddles the pure
"sampling" concept and the pure continuous-variable concept.
While the input is treated as a purely discrete sample, the
output is a continuous waveform. The waveform of the perceptual
signal after an impulse disturbance of the integrating control
system would be of the form exp(-kt)*[1 - exp(-kt)]: the product
of a rising and a falling exponential. It seems to me that this
would provide a much smoother transition from the world of
strictly discrete calculations to the kinds of continuous
analysis we use in PCT. And I would surely understand it better
than I understand Shannon. An added advantage is that I already
have a nonanalytical method for directly deriving the impulse
response of a real person's control system from the continuous
record of performance (assuming linearity).
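
A sketch of the convolution method (a Python illustration, assuming
an instantaneous input function, so the rising-exponential factor
mentioned above is absent; parameters are arbitrary):

    import numpy as np

    k, dt, n = 5.0, 0.001, 4000

    def run(d):                              # loop: p = o + d, do/dt = -k*p
        o, p = 0.0, np.zeros(n)
        for i in range(n):
            p[i] = o + d[i]
            o += -k * p[i] * dt
        return p

    imp = np.zeros(n); imp[0] = 1.0 / dt     # unit-area impulse
    h = run(imp)                             # closed-loop impulse response
    d = np.sin(2 * np.pi * 0.8 * np.arange(n) * dt)   # arbitrary disturbance
    diff = np.abs(run(d) - np.convolve(d, h)[:n] * dt).max()
    print("max difference:", diff)           # ~0: superposition holds

Because the loop is linear, the directly simulated response and
the convolution of the disturbance with the impulse response agree
to numerical precision.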
---------------------------------------------------------------

How is it possible for you to take my discussion of the
time-dependent increase in resolution in observations of low-
bandwidth signals, and come up with:

If the noise is 10% RMS of the range,
Martin's interpretation would be that there are only 10
possible values of the perceptual signal with a range of 10
units, 0 through 9.

Easy. You said that the equivalent of r (in D/r) for a continuous
system with noise would be the RMS noise level. If the RMS noise
level is 10% of D, then D/r is 10, and the probability of any one
measure would be r/D or 1/10. In the discrete case, D/r
represents the number of possible values of d. Thus there are D/r
= 10 possible values of d (or some signal) in the analog case,
too, if you're computing probability the same way.
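
Spelled out numerically (a sketch using the figures in the quoted
example; the second line anticipates the (D+r)/r point raised
below):

    import math
    D, r = 100.0, 10.0                 # RMS noise r is 10% of range D
    print(D / r, "values =", math.log2(D / r), "bits")          # 10, 3.32 bits
    print((D + r) / r, "values =", math.log2((D + r) / r), "bits")  # 11, 3.46 bits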

That violates high-school electricity, let alone what a
graduate engineer would think ...

Yes, but because of what I know about electricity, not because of
what I know about information theory. I can already arrive at a
correct conclusion from electricity theory; I'm trying to see how
you arrive at it from information theory. So far, the information
theory I know seems too elementary to come up with the right
conclusion. No doubt I'm applying the calculations incorrectly.

You assert that I gave you D/r as a measure of resolution,
ignoring that I pointed out that for continuous signals (D+r)/r
is more appropriate ...

So there are 11 values instead of 10 when the RMS value of r is
10% of D. Isn't that rather a quibble?
----------------------------------

(Oh, I forgot. Psychophysical results are not to be mentioned in
the same breath as PCT, because they might say something about
perceptual resolution, which we know to be infinite, don't we?)

Now, now. I understand resolution to be the least possible
difference between two readings. How do you understand it?
---------------------------------------------------------------
The power of a sine wave is independent of its frequency. It is

(1/T) (integral from t=0 to t=T of sin^2(ft) dt)
(omitting the odd 2 pi factor)

You're right. Sorry. This means I can take the inverse Fourier
transform of a set of equal amplitudes in an artificial transform
and get the required waveform, right? I can make it random just
by picking random phase-pairs for the real and imaginary parts.
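
A sketch of that construction (a Python illustration using numpy's
real-FFT conventions; the seed and length are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1024
    amp = np.ones(n // 2 + 1)                      # equal amplitude in every bin
    phase = rng.uniform(0, 2 * np.pi, n // 2 + 1)  # random phase per bin
    spectrum = amp * np.exp(1j * phase)            # random real/imaginary pairs
    spectrum[0] = 0.0                              # drop the DC term
    x = np.fft.irfft(spectrum, n)                  # flat-spectrum random waveform
    print("mean:", x.mean(), " rms:", x.std())

Zeroing the bins above some cutoff would band-limit the result in
the same way.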
--------------------------------------------------------------

The reason is that the Gaussian variation provides the maximum
uncertainty for a given power.

I'm getting confused here. The disturbing variable is an
arbitrary waveform. If we add a Gaussian noise to it, we will
still have just an arbitrary waveform, won't we? I thought the
place to add the noise would be in the input function, so we have

                            ^ perceptual signal = CEV + noise
                            |
                        input funct
                           |  |
     dist --------------CEV -- noise generator

Is there any point in putting noise into the environment? I can
see using a Gaussian distribution for the noise generator, but
why for the disturbance?

The output of the system has to come near mirroring the waveform
of the disturbance. The disturbance waveform is just whatever it
is. If you make the disturbance itself Gaussian, the output then
has to be nearly the same waveform, with the same Gaussian
distribution, if control is to be near-perfect. I think maybe the
above diagram is really what you're getting at; it can be
implemented easily with Simcon.
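
For what it's worth, a sketch of that diagram in Python rather
than Simcon (the gain, noise level, and disturbance waveform are
illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(2)
    k, dt, r = 5.0, 0.001, 0.0
    o, cev_log = 0.0, []
    for i in range(5000):
        d = np.sin(2 * np.pi * 0.3 * i * dt)   # arbitrary disturbance waveform
        cev = o + d                            # controlled environmental variable
        p = cev + rng.normal(0.0, 0.05)        # perceptual signal = CEV + noise
        o += k * (r - p) * dt                  # integrating output function
        cev_log.append(cev)
    print("RMS deviation of CEV from r:", float(np.std(cev_log)))

The loop holds p near r, so the CEV itself is held near r only to
within the sensor noise.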
--------------------------------------------------------------

If you do not have the Gaussian
variation, the signal can be translated into another signal of
lower power but the same resolution, that can convey the same
information (equally, it can be converted into one of lower
bandwidth but the same power per unit frequency or one of
shorter time but the same power per unit time).

Have you been following my discussion with Randall? If you
translate a variable with a range of 8 units and unity resolution
into a signal with a range of 4 units _and unity resolution_, you
will lose information, won't you?

Also, isn't the resolution you have to use the resolution with
which you express the Gaussian distribution? If you use a 1-sigma
resolution, you won't end up with a Gaussian distribution around
the mean. You'll end up with a noise that changes amplitude in 1-
sigma jumps.

---------------------------------------------------------------
Hans Blom, (930621)--

All right, we are getting some apparent disagreements out of the
way. There are some left, but maybe they will go away, too.

you notice how each time I stress the 'optimal' usage
of what is perceived...

It's hard to define "optimal" in a massively parallel hierarchy
where a jillion different goals are being satisfied all at once
-- and some are in conflict. There's no evolutionary reason for
optimal control, unless in "optimal" you include quitting the
improvement of a control system when it's just good enough to
assure survival. My view is that living control systems have just
barely enough "quality" to enable them to survive to the extent
that they do, and not a bit more. An engineer who has to satisfy
only a few performance criteria in an artificial system can hone
the behavior of the system to a very high degree of precision,
speed, efficiency, and so forth. I don't think that happens in
many living control systems.

I am talking in terms of the response of a control system to,
say, a step change of a reference level or of a disturbance. As
you know, the response might be overly damped or oscillatory.
By a high quality response I mean one that is approximately
critically damped.

OK.

I do not believe in a monolithic "I", and neither do your
models.

Good.

The only way to make sense of self-reflexive ideas is to treat
a person as if that person were solid, like a potato: only the
whole person perceives and acts.

Demonstrably false. In split brain patients, one hand of the
patient may caress his wife while the other hand hits her.

Woops. I was saying that the only way you can talk about a self
describing the SAME self (that's what I meant by self-reflexive)
is to assume just a single self. Otherwise you're talking about
one subsystem describing another one -- and specifically not the
subsystem doing the describing. Von Foerster's idea of
"recursive" consciousness is what I was talking about, where the
recursion involves just one system doing things to itself. His
model is mathematical recursion: sin(sin(sin(...x))), in which
the very same function is applied to its own value, recursively.
That's what I'm arguing against.

Maybe David Goldstein will some day favor us with a description
of his application of PCT "levels" to a multiple-personality
patient.
---------------------

Loop gain, in PCT, is not "internal to the device."

In a hifi amp, it is. That was my example.

That depends on where you draw the system-environment line. The
last active component in a hifi amplifier is the output power
stage. You can draw the line immediately after that, for the
output side. On the input side, the feedback from the transformer
secondary (if used) or from the voice coil voltage, is the
sensory input from the environment. The music-signal input
voltage is the reference signal in a hifi amplifier; the whole
system makes the feedback from the output equal to the reference
signal, and keeps it matching with a very wide bandwidth. This is
no different from the way you would describe such an amplifier,
but it's rearranged to show the parallels with the PCT model.

This separation is always important in detailed control-system
design, but especially so when the effector is coupled to the
controlled quantity loosely or through complex intervening
processes. Then we clearly would expect the effector output to
be changing far more than the controlled quantity is changing.

That would be bad in any control system, especially if the load
can vary.

On the contrary, it's a necessity if you want good control. When
you design a regulated power supply, the voltage sensor does not
measure the output right at the terminals of the power supply.
Instead, the sensing lines are moved out to where the load will
be applied, sometimes several feet away. The reason is that
controlling the immediate output from the filter capacitor will
allow a voltage drop along the line, and the voltage will not be
well regulated at the load. Moving the sensor to the load makes
THAT voltage remain constant, while the output at the capacitor
varies up and down to compensate for varying losses in the
connecting wires.

Likewise, if you want to control shaft speed, you don't put the
tachometer right where the motor is attached to the shaft. If you
do that, the shaft will "wind up" and the speed at the other end,
which can be meters away, will fluctuate up and down as the load
varies. You put the tach at the other end, and let the motor end
advance and retard in phase as loads come and go.

You don't put the bimetallic element of a thermostat in the duct
leaving the furnace. You don't put the sensor for a nuclear
reactor's coolant flow on the motor that drives the valve, or on
the button that turns the motor on (unless you're the designer of
the Three Mile Island coolant control system). You put the sensor
where the controlled variable is, and make sure it is sensing
exactly the thing you want controlled. Sensing the physical
output of the effector simply puts that output under control; it
doesn't put the downstream consequences of that output under
control.

Engineers who build control systems know this. They just
redefine "output" to mean whatever variable the sensor is
attached to, regardless of how far it is from the effector.

An optimal control system needs to know as exactly as possible
what the effects of its OUTPUTS are.

You put the emphasis on the wrong word: that should read

An optimal control system needs to know as exactly as possible
what the EFFECTS of its outputs are.

It doesn't give a damn what its OUTPUTS are. The outputs
automatically adjust to make sure that the EFFECTS are those that
are wanted. That's what PCT is about: controlling EFFECTS, not
controlling OUTPUTS.

This can be done by SENSING THE ACTIONS OF THE EFFECTORS (by
sensors in or near the effectors) and OBSERVING THE REACTION OF
THE OUTSIDE WORLD to the actions of the effectors (in any
sensory modality).

If that's how you always design control systems, you're missing a
good bet. If you can observe the reaction of the outside world,
you already have a sensor for the reaction. Put that sensor in a
feedback loop and you can control the reaction directly, out
there at the end of the causal chain. You don't really want to
control the actions of the effectors, do you? What you want to
control is the effect they're having. Sometimes you can't sense
the effect in time, but usually you can. When you can sense the
effect, the result will always be better. When you can't, you're
not going to get great control by any means -- you can't correct
changes in the result due to unforeseen disturbances acting
directly and independently on it, downstream from your carefully-
controlled effector outputs. If you sense the result directly,
letting the effector outputs vary as required, you can counteract
any disturbances, even those that can't be sensed before they
happen.

Note that this directly solves the adaptation (learning)
problem as well: correlating the effector's action with the
world's reaction 'identifies' ('systems identification') the
world.

But it's not as good a solution as detecting the world's
reaction, comparing that against the reaction you want, and
converting the difference into the direction of output that will
make the difference smaller. Decidedly smaller. When you can do
it that way, you don't need any correlations.

Our effectors do in fact have built-in and/or built-on sensors,
as you know. In my opinion, they have this additional function
above the one that you describe, the comparator function.

Yes, they do. That allows us to control sensed force and very
roughly, velocity and position of a limb. The comparator
functions for these loops are the spinal motor neurons. But
higher loops use these controlled effectors as outputs in systems
that sense the EFFECTS of the motor outputs and use higher-level
comparators. Nothing runs open loop in an organism, with only a
few trivial exceptions. Every consequence of action is directly
sensed and controlled, and consequences of those consequences are
ALSO sensed and controlled. Nothing is left to chance.

When necessary, living control systems at some levels can run
open-loop for short periods of time. If the lights go out, you
can keep walking across the room, maintaining balance
kinesthetically and approximating the actions required to get to
the light switch. But you won't hit the light switch on the first
try, and if you have to walk too far you'll start running into
things. The trouble with open-loop behavior is that it works only
in an absolutely constant environment, which is hard to find.
Open-loop systems have to extrapolate on the basis of the last
known situation; there's no way they can deal with the effects of
unpredictable disturbances (unless you build them like
locomotives). And they are terrible at producing extended
sequences of action in which the starting point for one action is
the ending point of the previous one. The only way to make that
work is with stepper motors and guiding rails -- and strict
attention to preventing disturbances.

When we see the controlled variable separated from the
effector output, we can much more easily understand that the
visible behavior of an organism is really just its actuator
output.

Not when the actuator is loosely coupled to its environment
(through some layers of skin tissue, for example). An analogy
is a battery with a large internal resistance. Its 'visible'
voltage greatly depends on its load.

In the hierarchy, the "actuator" for higher-level systems
consists of lower-level control systems. Thus what we call
actions are generally actually controlled inputs of lower
systems, and so don't show the loose coupling that is actually
there. Only at the lowest level are the actions hidden, inside
the muscles. However, when we're looking at one "action" that is
really a controlled variable, we miss the variable that is being
controlled by the action, so unless you understand control you'll
still miss the real controlled variable. In truth, practically
everything we call "behavior" is actually a controlled outcome of
some lower-level action. We name behavior by the outcome, not by
the action that produces it. If we see a person varying the
position of a pencil held in a hand, we don't name the behavior
as "moving a pencil against varying frictional resistances by
altering muscle tensions." We say that the person is "writing",
because what is produced by these actions is seen to be writing.
Yet in the next breath we can treat the "writing" as an action,
and say "He's writing a letter apologizing to his sister." Of
course he isn't (at another level): he's writing words, folding
paper, licking a stamp, and so forth.
---------------------------------------------------------
I'm going to leave power gain for another time.

Best to all,

Bill P.

[Martin Taylor 930623 15:00]
(Bill Powers 930622.0730)

Your time-domain representation of the way the control system works is
exactly the way I see it and have seen it since before the beginning
of the information theory argument. No problems. If you remember, I
used a colloquial phrase to which you objected, about the information
about the disturbance in the perceptual signal being "bled off" over
time into the output signal, so that after a while there was almost
none left in the perceptual signal, if the perception was effectively
being controlled.

Instead
of using a constant disturbance, you used a step-function to show
that the disturbance must have an effect on the perceptual
signal. A step function contains an instantaneous transition,
which is guaranteed to be far outside the bandwidth of any
physical control system.

When the disturbance moves very slowly compared with the control bandwidth,
the phase lag is very small, so there is at any time very little information
that has not been "bled off." But if you integrate that "very little"
error over time, it turns out to be exactly what is required to create
the output signal (assuming a simple integrator output function). The
changing disturbance, however slowly it changes, does create an error
that is corrected over time (exponentially as you say given the simple
output function). In my "silver tongued" posting, to which you refer,
I argued that the same function applied as the magnitude of the step
approached zero, in the classic way that differentials are taught as the
infinitesimal end-point of differences. There are no peculiarities in
this system about the approach to zero, and the sum of step function
effects approaches the integral of continuous effects. I don't think
we are talking about different extreme conditions in any way that
alters the argument.
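
A numerical check of that claim (a Python sketch, assuming the
simple integrator output function; parameters are illustrative):
the accumulated error is identically o/k, however small the
instantaneous error stays.

    import math
    k, dt = 5.0, 0.001
    o, e_sum = 0.0, 0.0
    for i in range(20000):
        d = 0.5 * math.sin(2 * math.pi * 0.05 * i * dt)  # slow disturbance
        p = o + d                                        # r = 0
        e = -p
        e_sum += e * dt                                  # accumulate the tiny errors
        o += k * e * dt                                  # integrator output
    print("o =", o, "  k * integral(e) =", k * e_sum)    # equal by construction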


=====================

How is it possible for you to take my discussion of the
time-dependent increase in resolution in observations of low-
bandwidth signals, and come up with:

If the noise is 10% RMS of the range,
Martin's interpretation would be that there are only 10
possible values of the perceptual signal with a range of 10
units, 0 through 9.

Easy. You said that the equivalent of r (in D/r) for a continuous
system with noise would be the RMS noise level. If the RMS noise
level is 10% of D, then D/r is 10, and the probability of any one
measure would be r/D or 1/10. In the discrete case, D/r
represents the number of possible values of d. Thus there are D/r
= 10 possible values of d (or some signal) in the analog case,
too, if you're computing probability the same way.

I still don't understand this. 3.01572 is different from 3.01583,
even if the RMS error of measurement is 300. The readings differ,
even if they don't usefully differ when referred to the thing being
measured. There is no finite number of possible readings when we are
dealing with continuous systems. There is a finite amount of information
that you can get from a reading about the thing being measured. That's
quite a different kettle of fish.

I understand resolution to be the least possible
difference between two readings. How do you understand it?

You are an old radio astronomer. I take the resolution of a telescope
to be roughly the distance at which two point-spread functions meet at
their 3 dB points. I've forgotten the name, but isn't the point spread
function called the Airy disc or something like that? The centre points
of the star images can be anywhere, but another star closer than the
diameter of the Airy disc cannot be resolved. That's what I mean by
resolution. (Sorry if I have the name totally screwed up; the concept
is what counts).

The resolution is referred to the star field, not to the film grain.

------------

I think your diagram looks OK for the added noise. The control system
can see the value of the CEV perturbed by the added noise, but can affect
only the value of the CEV. What it will presumably do is to affect the
CEV in such a way that the perceptual signal matches the reference, and
therefore in such a way that the CEV does not have the desired value.
The real-world effectiveness of control depends on the reliability with
which the perceptual signal reflects the value of the CEV (you might
drive off the cliff if you couldn't tell where you were on the road
to better than 20 metres). So if you want to control the CEV's value
(as seen by a neutral and infinitely precise observer) to within +-X,
you had better be sure that your perceptual signal is reliably different
when the actual value of the CEV changes by 2X. You can do that, whatever
the RMS noise level, by averaging over a long enough time.
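
A sketch of the averaging claim (Python, with made-up numbers):
the RMS noise on the mean of N independent samples falls as
1/sqrt(N).

    import numpy as np

    rng = np.random.default_rng(3)
    true_value, rms = 1.0, 0.5
    for n in [1, 100, 10000]:
        means = [rng.normal(true_value, rms, n).mean() for _ in range(1000)]
        print(f"N={n:>6}: RMS of the averaged reading = {np.std(means):.4f}")

So a change of 2X in the CEV becomes reliably visible whatever the
single-sample noise level, given enough averaging time.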

Have you been following my discussion with Randall? If you
translate a variable with a range of 8 units and unity resolution
into a signal with a range of 4 units _and unity resolution_, you
will lose information, won't you?

Yes, if it is quantized resolution you are talking about, or if you
are talking about the same observation conditions when determining
the value of the resolution.

=========================

I appreciate your problems with too much irrelevant mail. You put
information theory among the "Yes, but..." class of stuff that wastes
your time. I think of it as neither orthogonal nor competitive, but
as a foundational aspect of PCT that has been ignored to date.

The discussion above does not use the word "information" with or without
a capital "I" (unless I slipped up), but it is mostly about the
importance of information. So is the example that Tom Bourbon kindly
reposted to the net, about the characteristic differences between a
one-level simple ECS and human behaviour in the sawtooth tracking
task. Whether these characteristic differences depend on a world
model that forms part of an always active imagination loop, or depend
on shifting reference signals (of a form I do not understand) from
above (as Tom suggests), they represent information used by the human,
derived from the redundancies found in the history of the task, that
helps the human track.

A one-level ECS that tracks as well (or badly) as the human on average,
with its parameters fitted to correlate 0.997 with the human, does not
overshoot when the sawtooth motion of the target stops at the centre
point, and does not reverse direction when the target stops at a peak
of the sawtooth. The human does, however, because of expectations about
the future movement of the target, wherever those expectations are stored.
The model ECS uses only information from the sensory input. The human
uses that in conjunction with information from the history of the movement,
and takes longer than the model to respond to changes in the parameters
of the movement. The model would not care if the movement changed from
sawtooth to random. But would the same parameter settings of the model
provide the optimum fit both to the human tracking the sawtooth and to the
same human making the transition to tracking a randomly moving target
of the same maximum slew rate?
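
For concreteness, a one-level sketch of the behaviour in question
(a Python illustration; the gain and timing are made up, not the
parameters fitted to the human data):

    k, dt = 8.0, 0.01

    def tri(t):                            # triangle target, period 2, range -1..1
        x = t % 2.0
        return 2 * x - 1 if x < 1 else 3 - 2 * x

    o = 0.0                                # cursor position (the model's output)
    for i in range(800):
        t = i * dt
        target = tri(min(t, 6.5))          # target freezes at the centre at t = 6.5
        o += k * (target - o) * dt         # control the perceived target-cursor gap
        if 6.3 < t < 7.0 and i % 10 == 0:
            print(f"t={t:.2f}  target={target:+.3f}  cursor={o:+.3f}")

When the target stops, this model simply settles onto it, with no
overshoot and no reversal, because it uses only current sensory
input and carries no expectations.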

==========
Answering Hans Blom (related to the previous section)

Note that this directly solves the adaptation (learning)
problem as well: correlating the effector's action with the
world's reaction 'identifies' ('systems identification') the
world.

But it's not as good a solution as detecting the world's
reaction, comparing that against the reaction you want, and
converting the difference into the direction of output that will
make the difference smaller. Decidedly smaller. When you can do
it that way, you don't need any correlations.

But you get the best of both worlds by using the correlations, because
so long as the world remains consistent in its reactions to output,
you can reduce the phase lag of control. The penalty is increased
error when the effect of output on the CEV changes.

Martin

From Tom Bourbon (930623.1619)

[Martin Taylor 930623 15:00]
(Bill Powers 930622.0730)

..

I appreciate your problems with too much irrelevant mail. You put
information theory among the "Yes, but..." class of stuff that wastes
your time. I think of it as neither orthogonal nor competitive, but
as a foundational aspect of PCT that has been ignored to date.

The discussion above does not use the word "information" with or without
a capital "I" (unless I slipped up), but it is mostly about the
importance of information. So is the example that Tom Bourbon kindly
reposted to the net, about the characteristic differences between a
one-level simple ECS and human behaviour in the sawtooth tracking
task. Whether these characteristic differences depend on a world
model that forms part of an always active imagination loop, or depend
on shifting reference signals (of a form I do not understand) from
above (as Tom suggests), they represent information used by the human,
derived from the redundancies found in the history of the task, that
helps the human track.

A one-level ECS that tracks as well (or badly) as the human on average,
with its parameters fitted to correlate 0.997 with the human, does not
overshoot when the sawtooth motion of the target stops at the centre
point, and does not reverse direction when the target stops at a peak
of the sawtooth. The human does, however, because of expectations about
the future movement of the target, wherever those expectations are stored.
The model ECS uses only information from the sensory input. The human
uses that in conjunction with information from the history of the movement,
and takes longer than the model to respond to changes in the parameters
of the movement. The model would not care if the movement changed from
sawtooth to random. But would the same parameter settings of the model
provide the optimum fit both to the human tracking the sawtooth and to the
same human making the transition to tracking a randomly moving target
of the same maximum slew rate?

Martin, apparently I did not (I will not yet say cannot) urge you strongly
enough to examine the chapter by Rick and Bill. The point was that a
two-level PCT model duplicated the "overshoots" that occurred when a person,
who was performing a simple tracking task, encountered various alterations
of the conditions in the task -- sudden reversal of the direction the mouse
moved the cursor, a signal to change reference perceptions for where the
cursor should be relative to the target, and a couple of others. The point
was that the models, and presumably the persons, "kept on tracking" using
the old reference signal. What they did for a few hundred milliseconds was
"wrong," relative to the new condition, but exactly "right," relative to
the old reference signal.

Your remarks in the following section of your post lead me to believe you
really should read that chapter:

...Whether these characteristic differences depend on a world
model that forms part of an always active imagination loop, or depend
on shifting reference signals (of a form I do not understand) from
above (as Tom suggests), they represent information used by the human,
derived from the redundancies found in the history of the task, that
helps the human track.

The effect occurred, for the 2-level model, as a direct result of its
hierarchical connections, with error from level 2 as the reference signal
for level 1. There were no world models, or imagination loops, or
redundancies found in the history of the task, or stored expectations about
the future movement of the target. Not that those things never
occur or are never important; but in the tracking task, a simple bare-wire
two-level PCT model reproduced phenomena that you believe require those
things.
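
A bare-wire sketch of the hierarchy Tom describes (a Python
illustration; the gains and the condition change are assumptions,
not the chapter's fitted model): level-2 error, integrated, serves
as the level-1 reference signal.

    k1, k2, dt = 20.0, 4.0, 0.01
    r2, o1, o2 = 0.0, 0.0, 0.0             # r2: desired cursor-target gap of zero
    for i in range(800):
        t = i * dt
        target = 0.5 if t > 2.0 else -0.5  # sudden change of condition
        cursor = o1                        # level-1 output positions the cursor
        o2 += k2 * (r2 - (cursor - target)) * dt   # level-2 loop ...
        r1 = o2                            # ... its output is level 1's reference
        o1 += k1 * (r1 - cursor) * dt      # level-1 loop moves the cursor
        if 1.9 < t < 2.6 and i % 10 == 0:
            print(f"t={t:.2f}  target={target:+.2f}  cursor={cursor:+.3f}  r1={r1:+.3f}")

Just after the change, level 1 keeps tracking the old reference
while r1 catches up: briefly "wrong" for the new condition but
exactly "right" for the old one.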

Please, read the chapter.

Until later,
  Tom Bourbon