Uncertainty, Information and control with transport lag (was ...part I: U, I, and M)

[Martin Taylor 2013.02.01.16.43]

[Martin Taylor 2013.03.01.13.10]

Probably. Maybe you can think of easier wording. Something
along the lines of “If it takes some time for the effect of
a control action to arrive at the CEV, the disturbance value
may have changed. If the disturbance is changing only
slowly, control could still be good, but if it is likely to
change much during that lag time, control will not be very
successful.”?

AM:

Yes, something like that.

Is there a mathematical formulation with frequencies?

As in: if frequency of d is larger than 1/timelag, then
control will not be good.

Yes.

But it is not trivial if you don't have the right background. I
append to this message a brief skim of that background. It is long,
but very short compared to the length of courses required to teach
it properly. Even a brief flyover may seem to be a long trek.

The quick takeaway from it is about bandwidth, not frequency as
such. If the effective disturbance bandwidth is Weff and the
lag is greater than 1/(2*Weff), no control at all is possible, and
attempts to control will add to the variability of the CEV. (The
“effective bandwidth” concept is discussed in the “brief skim”
appended to this message.) For zero lag, the limit on control for a
control system with an integrator output function is set by the gain
rate of the integrator, since that determines the available bit rate
for control. I do not discuss that any further in this message. For
lags between zero and 1/(2*Weff), control is poorer the longer the lag. I
copy here a few final paragraphs from the end of the “brief skim”.
They may be all you need, as most of the long derivation is
concerned with the interplay between uncertainties of frequency and
time representation.

...

If the loop transport lag is L seconds, then at time t0, Eo, the
effect of the output at the CEV, cannot be derived from any values
of the CEV later than time t0-L. Since then, the CEV has been
influenced by all the changes in the disturbance over the subsequent
L seconds. The net result of those changes is uncertainty that is
proportional to the time since the last observation:

U_L(X) = min(2*Weff*L*Umax, Umax), where X represents the possible
values of the CEV and U_L(X) is its uncertainty L seconds after X
was last observed.

As a function of lag L, this is a linear rise from zero to Umax at a
rate 2*Weff*Umax bits/second. The actual value of Umax depends on the
resolution of the observation if the variation of the CEV is
continuous, but not if it is discrete.

![Re Negative feedback too simpl4.jpg|367x237](upload://wHj11pfvYINPJ88rD5AWToqBuaD.jpeg)

U_L(X) is the minimum possible uncertainty of the CEV for a loop
with transport lag L, given Eo and its history (EoHist). Since the
global uncertainty of the disturbance is defined as Umax, the
maximum value of the mutual information M between EoHist and d (the
disturbance waveform) is

Mmax(d:EoHist) = Umax - min(2*Weff*L*Umax, Umax)

The uncertainty-based measure of control quality is the difference
between the uncertainty of the CEV without (Umax) and with control
(min(2*Weff*L*Umax, Umax)). That difference just happens (no coincidence
if you understood the “long tutorial”) to be Mmax(d:EoHist).
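
These relations are simple enough to sketch in code. A minimal Python sketch (the function names are mine, not from the message), assuming Weff, L and Umax are known:

```python
def lag_uncertainty(weff, lag, umax):
    """U_L(X) = min(2*Weff*L*Umax, Umax): the minimum uncertainty (bits)
    of the CEV L seconds after it was last observable, growing linearly
    at 2*Weff*Umax bits/second until it saturates at Umax."""
    return min(2.0 * weff * lag * umax, umax)


def control_quality(weff, lag, umax):
    """Mmax(d:EoHist) = Umax - U_L(X): the uncertainty-based measure of
    control quality, which reaches zero once the lag hits 1/(2*Weff)."""
    return umax - lag_uncertainty(weff, lag, umax)
```

For example, with Weff = 2 Hz and Umax = 8 bits, a lag of 0.25 s = 1/(2*Weff) drives `control_quality` to zero: no control is possible.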

···
On Fri, Mar 1, 2013 at 7:17 PM, Martin Taylor <mmt-csg@mmtaylor.net> wrote:

---

Now for something completely different :-)
MT: Looking only at a
single control system, there’s no way r could covary with d.
But if r were to be derived from the output of another
control system that observed d directly, it could. I’m not
sure what such a connection would do for control, though.

AM:
Perhaps not observing d, but using memory to approximate it.

No. The value of r comes from somewhere else, and is independent of
anything that happened with d, unless that value went through some
other pathways and control systems that eventually contributed to
the reference value of the system we are looking at. But in the
general case we don’t know of any such pathways, so we have to say
that variation in r is independent of variation in d. In words: “The
reference value is the control system’s purpose; the disturbance
tends to keep it from fulfilling its purpose; to the extent that it
can counter the disturbance, the output action allows the control
system to achieve its purpose.” Wording it like that should make
clear that there isn’t any relation between reference and
disturbance values.

Martin

====================================

---------Appendix----------

What follows is the (long) brief skim of the background to the
uncertainty calculation. It assumes that you have understood most of
the tutorial that started this thread [Martin Taylor
2013.01.01.12.40]. I am quite likely to include much of what
follows, or a paraphrase of it, in Part 2 of the tutorial.

------------------

Consider something that varies continuously over time. I treat the continuous
case because it is more difficult than the discrete case, which is
fairly easy. Call the “something” X and its value at time t, x(t).
The Fourier spectrum of X can be symbolized X(f).

If you look only once at X, at moment t0, you will know the value
x(t0), but you won’t know anything else about how X varies. You will
know nothing about the value of X at any other moment, even
infinitesimally soon after t0. However, if you know that the
spectrum X(f) is zero when f > W – nothing happens to x(t)
faster than the speed of a wave fluctuating at speed W cycles/second
– you can say that x(t) won’t change much in a time short compared
to 1/W seconds, the length of one cycle of the highest speed
component of the spectrum of X.

If you believe that last paragraph, you forgot something. If the
power P in a waveform is very high, the component at frequency W may
change hugely in even a tiny fraction of one cycle. So, to say
something about the limits, we have to know something about how much
power is in the waveform. In analyses like this, power is defined as
one of two things; the average of the square of the signal amplitude
x(t), or the average of the square of the signal spectral values
X(f). There is a theorem called Parseval’s Theorem that shows that
the two definitions give the same answer.

At this point, we know that if we are to be able to say anything
about X based on having observed only x(t0), we have to be able to
put limits on both W and P (the average power). If we had observed a
very large number of values of x(t), we would be able to estimate W
and P approximately from those observations. The more observations
we have, the better we can estimate these limits, and not only the
limits. With several observations, we can also begin to estimate the
power averaged over ever smaller portions of time or frequency. Of
course, if we don’t take observations of x(t) often enough, we will
not be able to estimate the average power over very small intervals
of time. Nor will we be able to distinguish different frequencies
that have more than one cycle between sample moments of x(t).
(Actually the limit is half a cycle, but it's easier to visualise
why it is true for more than one cycle.)
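
The point about indistinguishable frequencies can be illustrated numerically. In this small Python sketch (the parameters are mine, chosen for illustration), two sinusoids whose frequencies differ by exactly the sampling rate produce identical samples, so no set of observations at those moments can tell them apart:

```python
import math

fs = 10.0                      # sampling rate, samples per second
f_low, f_high = 3.0, 13.0      # f_high - f_low == fs: an alias pair

# Sample both sinusoids at the same moments n/fs.
low = [math.sin(2 * math.pi * f_low * n / fs) for n in range(50)]
high = [math.sin(2 * math.pi * f_high * n / fs) for n in range(50)]

# The two frequencies are indistinguishable from these samples alone.
assert all(abs(a - b) < 1e-9 for a, b in zip(low, high))
```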

At this point, you may have recognized another problem. If the
waveform X was actually created by adding up a finite number of
sinusoids in the way the Fourier analysis suggests, then it repeats
its pattern indefinitely over time, and its values for all time will
be known exactly as soon as we have gathered enough samples (values
of x(t) for different moments in time) to compute all its spectral
components. So, if we are interested in randomly varying signals
whose future cannot, by definition, be predicted exactly, something
is wrong.

What is wrong is not the analysis, but the presuppositions. If X
varies randomly, it means that something new is continually being
added into the mix as time goes on. That contradicts the idea that X
was created by prespecifying a finite number of sinusoids to be
added together. We may compute the spectrum X(f) correctly from the
measures x(t), but as time goes on we measure new values of x(t).
These change the results when we compute X(f). X(f) is valid only
within the confines of the time span over which we have collected
the samples x(t). Nevertheless, if we have a lot of samples of the
history of X, the addition of one new one will be unlikely to change
X(f) very much.

How much the result changes depends on whether we keep every
measure of x(t) no matter how old, keep just the most recent N measures, or
allow the weight of a measure to decline gracefully as it gets older
and older. All of these procedures are defensible, but which one is
preferred depends on what you know about X apart from your measures
of x(t).

Usually, individual observations are not infinitely precise, and
that’s where the intermediate “graceful aging” approach is useful.
It allows you to take advantage of the older x(t) observations to
improve your estimate of the signal statistics while guarding
against possible changes in the statistical properties of X over
time, if those changes are slow compared to the fluctuations of the
X waveform itself.

So far, we haven't mentioned lag at all, but never fear, we are
getting there. Just not immediately. First we have to deal with the
fact that the entire course of the waveform within the sampling
range of time can be recovered from the values of 2WT+1 equally
spaced samples (see “Nyquist-Shannon sampling theorem” in
Wikipedia). Shannon also proved that the number 2WT+1 holds even if
the sampling moments are not equally spaced. At least, that is true
when the values of X(f) do not change over time.

But now we have to take note of yet another problem: the sinusoids
that compose X(f) extend to infinite time in both directions. They
cannot change. That being the case, where does randomness come in?
Obviously, we cannot use simple Fourier transform analysis properly
if we are dealing with a randomly varying waveform. We have run into
an extreme case of a more general fact; the more precisely you
specify the frequency description of a waveform, the less precisely
can you specify the time when the representation is relevant, and
vice versa. If you have an exact Fourier spectrum, it is able to
give you an exact representation of the waveform over some interval,
but you cannot know where in infinite time past and future that
interval might lie. Conversely, though we haven’t provided an
example of this, if you know exactly the time interval in which your
waveform lies, you cannot know exactly its spectrum (see
“Uncertainty principle” in Wikipedia and scroll to “Signal
Processing”, or try “Gabor Limit” which scrolls you there directly).

As described above, the Gabor Limit deals only in the extreme cases
in which you know exactly either the time interval or the frequency
bandwidth W. But what if you know one of them approximately? Can you
then know the other approximately? Yes, you can, and there is an
expression called the Hirschman limit that provides the limit to how
well you can know it in terms of Shannon uncertainty. I’m not going
to go into the math, because I’m not sure I understand it properly,
but you can find it under “Entropic Uncertainty” in Wikipedia. The
limit is

U(|x(t)|^2) + U(|X(f)|^2) >= log(e/2)

(the uncertainty of the power spectrum and the squares of the time
samples cannot sum to less than log(e/2) where the base of the log
is the same as is used in computing the uncertainty. If the base is
2, the limit is log2(e) - 1 ≈ 0.44 bits when the waveform statistics
are Gaussian – meaning that both the distributions of time samples
and of spectral values have a Gaussian distribution). The lower
limit of the sum of the two uncertainties is higher if the
distributions are not Gaussian, but it cannot be lower.
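
As a quick numeric check of the base-2 figure quoted above (a one-liner, nothing more):

```python
import math

# log2(e/2) = log2(e) - 1: the Hirschman lower bound in bits
limit_bits = math.log2(math.e / 2)
print(round(limit_bits, 2))  # 0.44
```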

In most circumstances the uncertainty of X will be much higher by
itself than the Hirschman limit, because to get this low requires
that each individual sample of x(t) be separately identified. If,
for example, we know only the sum or integral of all the prior
values of x(t), the uncertainty U(|x(t)|^2) is very large, which
means that under those conditions, the Hirschman limit imposes no
constraint at all on our knowledge of the power spectrum of X.

Now we are getting closer to the question of lag. I told you it was
non-trivial, even though your one-liner description is conceptually
correct.

The reason for going into the above uncertainty relations was to
reinforce the idea that one cannot be accurate in both of two
complementary views of the situation. There are all sorts of
different transforms of waveforms – wavelets is a current favourite
– but in all cases, if you have precision in the time domain, you
lose it in the transform domain, and vice-versa.

Let's consider a continuing waveform for which we use only a finite
running set of time samples x(t). By “finite running set” I mean
that if the time now is t0, we use only the N samples from t0-k-N to
t0-k. We know we can’t limit the region of time precisely without
needing to consider an infinite range of frequency, and we can’t
limit the frequency band precisely without considering infinite
time, but we can come pretty close to limiting both of them, as we
know from the small size of the Hirschman limit. So we won’t go too
far wrong by pretending we can limit time to an interval T and
frequency to a bandwidth W, so long as we keep in mind that we are
approximating and that there are conditions in which the
approximation matters.

As mentioned above, the Nyquist-Shannon theorem proves that we can
describe a waveform exactly over its entire course during an
interval T if we know its values at intervals of no more than 1/(2W)
seconds – or, less strictly, at an average spacing of 1/(2W) seconds
over the interval T.
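
A sketch of the reconstruction itself, assuming ideal sinc (Whittaker–Shannon) interpolation and a record long enough that edge effects can be ignored (all names and parameters here are mine, for illustration):

```python
import math

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, fs, t):
    """Whittaker-Shannon interpolation: rebuild x(t) at an arbitrary
    time t from samples taken every 1/fs seconds.  Exact for signals
    bandlimited below fs/2 and an infinite record; approximate here,
    because the record is finite."""
    return sum(s * sinc(t * fs - n) for n, s in enumerate(samples))

# A 1 Hz sinusoid sampled at 10 samples/s (five times the Nyquist rate).
fs = 10.0
samples = [math.sin(2 * math.pi * n / fs) for n in range(400)]

# Interpolate midway between two samples, near the middle of the record.
t = 20.05
approx = reconstruct(samples, fs, t)
exact = math.sin(2 * math.pi * t)
assert abs(approx - exact) < 1e-2
```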

At this point we should discuss the concept of "effective
bandwidth". So far, everything concerned with bandwidth W has
implicitly assumed that the contribution to the uncertainty is equal
for any frequency within W and zero for any frequency outside W. But
this clearly is not true. Suppose that the spectrum consisted of one
frequency that contained almost all the power of the waveform plus
low levels of any other frequency within the band W. The waveform
would look very like a sinusoid of that main frequency, slightly
changing in amplitude from peak to peak and slowly changing in phase
from cycle to cycle. If we knew its amplitude and phase at the time
of some observation, we would be able to predict its track quite a
long way into the future (and back into the past as well). Our
uncertainty about the value of the waveform at time 1/2W seconds
into the future would be almost zero. So W, the band that contains
all the signal energy, is clearly not the “effective” bandwidth of
the signal if the contributions to uncertainty from the different
frequencies are unequal.

Shannon considered this problem as one of filtering, as a conceptual
device to link the effective bandwidth of a shaped signal to that of
the white noise for which the 2W-samples-per-second rule applies. If a
filter has a certain passband shape defined by |Y(f)|^2, and a white
noise of bandwidth W is fed into the input of the filter, the
uncertainty of the output of the filter is that of the input
multiplied by exp((1/W)*integral_over_W(log(|Y(f)|^2) df)). The
“effective bandwidth” of a signal (Weff) is its actual bandwidth W
multiplied by that factor.

Shannon provides an explicit result of this expression for
several spectral shapes and shows how to use these canonical shapes
to approximate others if an explicit form is not known. In 1949,
Shannon did not have access to the computing power now available; to
compute this formula numerically is now a simple matter if the power
spectrum of the signal is well defined.
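
A numerical sketch of that computation (this is my own reading of the expression, with the 1/W averaging inside the exponential so that a flat unit-gain band gives Weff = W; treat it as illustrative, not as Shannon's exact formulation):

```python
import math

def effective_bandwidth(w, gain2, n=100000):
    """Estimate Weff for a filter with power gain |Y(f)|^2 = gain2(f)
    over the band [0, W], using the geometric-mean (entropy-power)
    factor exp((1/W) * integral of log(gain2) df), evaluated by a
    simple midpoint rule."""
    df = w / n
    integral = sum(math.log(gain2((k + 0.5) * df)) * df for k in range(n))
    return w * math.exp(integral / w)

# A flat unit-gain band is its own effective bandwidth ...
flat = effective_bandwidth(10.0, lambda f: 1.0)
# ... while concentrating the power at low frequencies shrinks Weff.
shaped = effective_bandwidth(10.0, lambda f: math.exp(-f))
assert abs(flat - 10.0) < 1e-6
assert shaped < 1.0
```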

Samples taken 1/(2*Weff) seconds apart are informationally independent,
so if we look at the waveform beyond 1/(2*Weff) seconds after the end
of interval T, we will be able to determine nothing about the value
of X beyond what we would know by examining its gross statistics. So
all our enquiry about the effect of lag must be concerned with lags
less than 1/(2*Weff). If the transport lag around a control loop
exceeds 1/(2*Weff), where the waveform in question is the disturbance
input, there is no way that the output could consistently reduce the
error.

Still not considering control, we are concerned with the change of
uncertainty about a waveform as time passes since the most recent
observation, but before 1/(2*Weff) seconds have passed. At the time of
the last observation, the uncertainty is set by the resolution of the
observation. When measured in units of the observation resolution,
the uncertainty at the moment of observation is zero. After 1/(2*Weff)
seconds the uncertainty is set by the global statistics of X and the
resolution of observation. The magnitude of the uncertainty after
1/(2*Weff) seconds depends on the distribution of probability over the
different possibilities for X, but if that distribution is Gaussian,
the global uncertainty is

Umax = log( (Standard Deviation of global probability distribution) / (Standard Deviation of observation distribution) )
Define w = (time since last observation)/Tc, where Tc = 1/(2*Weff). Then
the uncertainty after delay w is w*Umax(X), and the waveform can be said
to have an uncertainty generation rate of 2*Weff*Umax bits/second. Tc in
the figure is 1/(2*Weff).
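
Putting the Gaussian Umax formula and the linear growth together in a short Python sketch (the variable names are mine; this assumes the Gaussian case described above):

```python
import math

def umax_bits(sd_global, sd_observation):
    """Global uncertainty of the waveform value, in bits: the log of
    the ratio of the global SD to the observation-resolution SD."""
    return math.log2(sd_global / sd_observation)

def uncertainty_after(t, weff, sd_global, sd_observation):
    """Uncertainty t seconds after an observation: grows linearly at
    2*Weff*Umax bits/second and saturates at Umax at Tc = 1/(2*Weff)."""
    umax = umax_bits(sd_global, sd_observation)
    return min(2.0 * weff * t * umax, umax)
```

With sd_global = 8, sd_observation = 1 (so Umax = 3 bits) and Weff = 1 Hz, the uncertainty is 0.75 bits after 0.125 s and saturates at 3 bits from Tc = 0.5 s onward.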

[Figure: growth of uncertainty with time since the last observation; Tc = 1/(2*Weff).]

We can follow a similar analysis for discrete variables, even ones
with internal syntactic relations. I discussed a lot of that in the
long tutorial on information and uncertainty [Martin Taylor
2013.01.01.12.40]. The end result is the same. Provided that the
sampling moments are independent of the values of the observations,
the uncertainty grows linearly from zero at the moment of the last
observation to its maximum value at some future time determined by
the statistics of the signal. However, as with any of these general
statements, there can be individual instances in which the
uncertainty might even decline after growing for a while. For
example, if X is a sequence of characters sampled from English text,
and the observation is that the last three characters in a string
are space-t-h, it is highly probable that the next character will be
e or i. But it is rare for a random observation to have been taken
at exactly the moment when space-t-h happened to be the most recent
character sequence. Such cases are averaged out by cases such as
e-d-space, where the next letter is anyone’s guess.

Finally, returning to control: if the loop transport lag is L
seconds, then at time t0, Eo (the influence of the output at the
CEV) cannot be derived from any values of the CEV later than time
t0-L. Since then, the CEV has been influenced by all the changes in
the disturbance over the subsequent L seconds. The net result of
those changes is uncertainty that is proportional to the time since
the last observation:

U_L(X) = min(2*Weff*L*Umax, Umax), where X represents the possible
values of the CEV.

U_L(X) is the minimum possible uncertainty of the CEV for a loop
with transport lag L, given Eo and its history (EoHist). Since the
global uncertainty of the disturbance is defined as Umax, the
maximum value of the mutual information between EoHist and d (the
disturbance waveform) is

Mmax(d:EoHist) = Umax - min(2*Weff*L*Umax, Umax)

The uncertainty-based measure of control quality is the difference
between the uncertainty of the CEV without (Umax) and with control
(min(2*Weff*L*Umax, Umax)). That difference just happens (no
coincidence if you remember the long tutorial) to be Mmax(d:EoHist).
The actual value of Umax depends on the resolution of the
observation if the variation of the CEV is continuous, but not if it
is discrete.

--------End Appendix--------

MT: Looking only at a
single control system, there’s no way r could covary with d.
But if r were to be derived from the output of another
control system that observed d directly, it could. I’m not
sure what such a connection would do for control, though.

AM:
Perhaps not observing d, but using memory to approximate it.

[Martin Taylor 2013.02.01.16.43]

AM:

Yes, something like that.

Is there a mathematical formulation with frequencies?

As in: if frequency of d is larger than 1/timelag, then
control will not be good.

Yes.

But it is not trivial if you don't have the right background. I
append to this message a brief skim of that background. It is long,
but very short compared to the length of courses required to teach
it properly. Even a brief flyover may seem to be a long trek.

[…]

AM:

Well, I’ve read it, but I can’t say I understood it. Googling unknown terms didn’t help much.

Is this right: if we want to analyse a control system using IT, we need to know Weff (the bandwidth), L (the timelag) and Umax (the maximum random part of the disturbance)?

If yes, can we measure all of those in biological control systems and their environments, or do we measure something else and calculate the remaining parts?