[Martin Taylor 2013.02.01.16.43]

[Martin Taylor 2013.03.01.13.10]

MT: Probably. Maybe you can think of easier wording. Something along the lines of “If it takes some time for the effect of a control action to arrive at the CEV, the disturbance value may have changed. If the disturbance is changing only slowly, control could still be good, but if it is likely to change much during that lag time, control will not be very successful.”?

AM: Yes, something like that. Is there a mathematical formulation with frequencies? As in: if the frequency of d is greater than 1/timelag, then control will not be good.

MT: Yes.

But it is not trivial if you don't have the right background. I append to this message a brief skim of that background. It is long, but very short compared to the length of courses required to teach it properly. Even a brief flyover may seem to be a long trek.

The quick takeaway from it is about bandwidth, not frequency as such. If the effective disturbance bandwidth is Weff, then if the lag is greater than 1/(2*Weff), no control at all is possible, and attempts to control will add to the variability of the CEV. (The “effective bandwidth” concept is discussed in the “brief skim” appended to this message.) For zero lag, the limit on control for a control system with an integrator output function is set by the gain rate of the integrator, since that determines the available bit rate for control. I do not discuss that any further in this message. For lags between zero and 1/(2*Weff), control is less the longer the lag. I copy here a few final paragraphs from the end of the “brief skim”. They may be all you need, as most of the long derivation is concerned with the interplay between uncertainties of frequency and time representation.

...

If the loop transport lag is L seconds, then at time t0, Eo, the effect of the output at the CEV, cannot be derived from any values of the CEV later than time t0-L. Since then, the CEV has been influenced by all the changes in the disturbance over the subsequent L seconds. The net result of those changes is uncertainty that is proportional to the time since the last observation:

```
U_L(X) = min(2*Weff*L*Umax, Umax)
```

where X represents the possible values of the CEV and U_L(X) is its uncertainty L seconds after X was last observed.

As a function of lag L, this is a linear rise from zero to Umax at a rate 2*Weff*Umax bits/second. The actual value of Umax depends on the resolution of the observation if the variation of the CEV is continuous, but not if it is discrete.

![Re Negative feedback too simpl4.jpg|367x237](upload://wHj11pfvYINPJ88rD5AWToqBuaD.jpeg)

U_L(X) is the minimum possible uncertainty of the CEV for a loop with transport lag L, given Eo and its history (EoHist). Since the global uncertainty of the disturbance is defined as Umax, the maximum value of the mutual information M between EoHist and d (the disturbance waveform) is

```
Mmax(d:EoHist) = Umax - min(2*Weff*L*Umax, Umax)
```

The uncertainty-based measure of control quality is the difference between the uncertainty of the CEV without control (Umax) and with control (min(2*Weff*L*Umax, Umax)). That difference just happens (no coincidence if you understood the “long tutorial”) to be Mmax(d:EoHist).
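As a sanity check on these formulas, here is a small Python sketch that tabulates the control-quality measure Mmax(d:EoHist) against lag L. The values of Weff and Umax are invented for illustration only, not taken from any real system.

```python
def control_quality(L, Weff, Umax):
    """Mmax(d:EoHist) = Umax - min(2*Weff*L*Umax, Umax).

    L    : loop transport lag, seconds
    Weff : effective disturbance bandwidth, Hz (illustrative)
    Umax : global uncertainty of the CEV, bits (illustrative)
    """
    residual = min(2 * Weff * L * Umax, Umax)  # U_L(X): uncertainty remaining after lag L
    return Umax - residual                     # bits of disturbance countered

Weff, Umax = 2.0, 8.0   # invented values; here 1/(2*Weff) = 0.25 s
for L in (0.0, 0.05, 0.125, 0.25, 0.5):
    print(L, control_quality(L, Weff, Umax))
# quality falls linearly with lag and reaches zero at L = 1/(2*Weff)
```

Running it shows the linear decline described above: full quality at zero lag, and no control at all once the lag reaches 1/(2*Weff).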

## ···

On Fri, Mar 1, 2013 at 7:17 PM, Martin Taylor <mmt-csg@mmtaylor.net> wrote:

---

Now for something completely different :-)

MT: Looking only at a single control system, there’s no way r could covary with d. But if r were to be derived from the output of another control system that observed d directly, it could. I’m not sure what such a connection would do for control, though.

AM: Perhaps not observing d, but using memory to approximate it.

MT: No. The value of r comes from somewhere else, and is independent of anything that happened with d, unless that value went through some other pathways and control systems that eventually contributed to the reference value of the system we are looking at. But in the general case we don’t know of any such pathways, so we have to say that variation in r is independent of variation in d. In words: “The reference value is the control system’s purpose; the disturbance tends to keep it from fulfilling its purpose; to the extent that it can counter the disturbance, the output action allows the control system to achieve its purpose.” Wording it like that should make clear that there isn’t any relation between reference and disturbance values.

Martin

====================================

---------Appendix----------

What follows is the (long) brief skim of the background to the uncertainty calculation. It assumes that you have understood most of the tutorial that started this thread [Martin Taylor 2013.01.01.12.40]. I am quite likely to include much of what follows, or a paraphrase of it, in Part 2 of the tutorial.

------------------

Let's forget about control loops for now, and think about observing something that varies continuously over time. I treat the continuous case because it is more difficult than the discrete case, which is fairly easy. Call the “something” X and its value at time t, x(t). The Fourier spectrum of X can be symbolized X(f).

If you look only once at X, at moment t0, you will know the value x(t0), but you won’t know anything else about how X varies. You will know nothing about the value of X at any other moment, even infinitesimally soon after t0. However, if you know that the spectrum X(f) is zero when f > W (nothing happens to x(t) faster than a wave fluctuating at W cycles/second), you can say that x(t) won’t change much in a time short compared to 1/W seconds, the length of one cycle of the highest-speed component of the spectrum of X.

If you believe that last paragraph, you forgot something. If the power P in a waveform is very high, the component at frequency W may change hugely in even a tiny fraction of one cycle. So, to say something about the limits, we have to know something about how much power is in the waveform. In analyses like this, power is defined as one of two things: the average of the square of the signal amplitude x(t), or the average of the square of the signal spectral values X(f). There is a theorem called Parseval’s Theorem that shows that the two definitions give the same answer.
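Parseval’s Theorem is easy to verify numerically for a sampled waveform. In the sketch below, the division by the number of samples reflects numpy’s FFT normalization convention, not anything in the theorem itself.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)     # a sampled waveform x(t)
X = np.fft.fft(x)                 # its discrete spectrum X(f)

power_time = np.mean(x**2)                    # average of the squared amplitudes
power_freq = np.mean(np.abs(X)**2) / len(x)   # same average from the spectral values

print(power_time, power_freq)   # the two definitions agree to machine precision
```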

At this point, we know that if we are to be able to say anything about X based on having observed only x(t0), we have to be able to put limits on both W and P (the average power). If we had observed a very large number of values of x(t), we would be able to estimate W and P approximately from those observations. The more observations we have, the better we can estimate these limits, and not only the limits. With several observations, we can also begin to estimate the power averaged over ever smaller portions of time or frequency. Of course, if we don’t take observations of x(t) often enough, we will not be able to estimate the average power over very small intervals of time. Nor will we be able to distinguish different frequencies that have more than one cycle between sample moments of x(t). (Actually the limit is half a cycle, but it’s easier to visualise why it is true for more than one cycle.)

At this point, you may have recognized another problem. If the waveform X was actually created by adding up a finite number of sinusoids in the way the Fourier analysis suggests, then it repeats its pattern indefinitely over time, and its values for all time will be known exactly as soon as we have gathered enough samples (values of x(t) for different moments in time) to compute all its spectral components. So, if we are interested in randomly varying signals whose future cannot, by definition, be predicted exactly, something is wrong.

What is wrong is not the analysis, but the presuppositions. If X varies randomly, it means that something new is continually being added into the mix as time goes on. That contradicts the idea that X was created by prespecifying a finite number of sinusoids to be added together. We may compute the spectrum X(f) correctly from the measures x(t), but as time goes on we measure new values of x(t). These change the results when we compute X(f). X(f) is valid only within the confines of the time span over which we have collected the samples x(t). Nevertheless, if we have a lot of samples of the history of X, the addition of one new one will be unlikely to change X(f) very much.

How much the result changes depends on whether we keep all the old measures of x(t), just keep the most recent N measures, or allow the weight of a measure to decline gracefully as it gets older and older. All of these procedures are defensible, but which one is preferred depends on what you know about X apart from your measures of x(t).

Usually, individual observations are not infinitely precise, and that’s where the intermediate “graceful aging” approach is useful. It allows you to take advantage of the older x(t) observations to improve your estimate of the signal statistics while guarding against possible changes in the statistical properties of X over time, if those changes are slow compared to the fluctuations of the X waveform itself.
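A minimal sketch of that “graceful aging” option, assuming a simple exponential decay of the weight given to older measures (the decay constant here is arbitrary, chosen only to illustrate the idea):

```python
def aged_power_estimate(samples, decay=0.99):
    """Running estimate of mean power, with each older sample's weight
    reduced by `decay` per step (graceful aging of old observations)."""
    est = 0.0
    weight = 0.0
    for x in samples:
        est = decay * est + x * x      # decayed sum of squared amplitudes
        weight = decay * weight + 1.0  # decayed count, used for normalization
    return est / weight

# For a constant-amplitude signal the estimate recovers the power exactly,
# while for a signal whose statistics drift, recent samples dominate.
print(aged_power_estimate([2.0] * 100))   # → 4.0
```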

So far, we haven't mentioned lag at all, but never fear, we are getting there. Just not immediately. First we have to deal with the fact that the entire course of the waveform within the sampling range of time can be recovered from the values of 2WT+1 equally spaced samples (see “Nyquist-Shannon sampling theorem” in Wikipedia). Shannon also proved that the number 2WT+1 holds even if the sampling moments are not equally spaced. At least, that is true when the values of X(f) do not change over time.
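A small demonstration of that recovery, assuming equally spaced samples and Whittaker-Shannon (sinc) interpolation; the bandwidth, test frequencies, and window length below are arbitrary choices, and a finite window of samples stands in for the theorem's infinite one.

```python
import numpy as np

W = 4.0                      # bandwidth in Hz: no component above W
fs = 2 * W                   # Nyquist rate: 2W samples per second
n = np.arange(-200, 201)     # sample indices (long window approximates infinite time)

def signal(t):
    # A band-limited test signal: components at 1, 2.5 and 3.5 Hz (all < W)
    return (np.sin(2*np.pi*1.0*t) + 0.5*np.cos(2*np.pi*2.5*t)
            + 0.2*np.sin(2*np.pi*3.5*t))

samples = signal(n / fs)

def reconstruct(t):
    # Whittaker-Shannon interpolation: samples weighted by shifted sinc functions
    return np.sum(samples * np.sinc(fs * t - n))

t = 0.123                          # an arbitrary moment between sample points
print(signal(t), reconstruct(t))   # the two values agree closely
```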

But now we have to take note of yet another problem: the sinusoids that compose X(f) extend to infinite time in both directions. They cannot change. That being the case, where does randomness come in? Obviously, we cannot use simple Fourier transform analysis properly if we are dealing with a randomly varying waveform. We have run into an extreme case of a more general fact: the more precisely you specify the frequency description of a waveform, the less precisely you can specify the time when the representation is relevant, and vice versa. If you have an exact Fourier spectrum, it is able to give you an exact representation of the waveform over some interval, but you cannot know where in infinite time past and future that interval might lie. Conversely, though we haven’t provided an example of this, if you know exactly the time interval in which your waveform lies, you cannot know exactly its spectrum (see “Uncertainty principle” in Wikipedia and scroll to “Signal Processing”, or try “Gabor Limit”, which scrolls you there directly).

As described above, the Gabor Limit deals only in the extreme cases in which you know exactly either the time interval or the frequency bandwidth W. But what if you know one of them approximately? Can you then know the other approximately? Yes, you can, and there is an expression called the Hirschman limit that provides the limit to how well you can know it in terms of Shannon uncertainty. I’m not going to go into the math, because I’m not sure I understand it properly, but you can find it under “Entropic Uncertainty” in Wikipedia. The limit is

```
U(|x(t)|^2) + U(|X(f)|^2) >= log(e/2)
```

(The uncertainty of the power spectrum and the squares of the time samples cannot sum to less than log(e/2), where the base of the log is the same as is used in computing the uncertainty. If the base is 2, the limit is log2(e) - 1 = 0.44 bits when the waveform statistics are Gaussian, meaning that both the distributions of time samples and of spectral values have a Gaussian distribution.) The lower limit of the sum of the two uncertainties is higher if the distributions are not Gaussian, but it cannot be lower.

In most circumstances the uncertainty of X will be much higher by itself than the Hirschman limit, because to get this low requires that each individual sample of x(t) be separately identified. If, for example, we know only the sum or integral of all the prior values of x(t), the uncertainty U(|x(t)|^2) is very large, which means that under those conditions, the Hirschman limit imposes no constraint at all on our knowledge of the power spectrum of X.

Now we are getting closer to the question of lag. I told you it was non-trivial, even though your one-liner description is conceptually correct.

The reason for going into the above uncertainty relations was to reinforce the idea that one cannot be accurate in both of two complementary views of the situation. There are all sorts of different transforms of waveforms (wavelets is a current favourite), but in all cases, if you have precision in the time domain, you lose it in the transform domain, and vice versa.

Let's consider a continuing waveform for which we use only a finite running set of time samples x(t). By “finite running set” I mean that if the time now is t0, we use only the N samples from t0-k-N to t0-k. We know we can’t limit the region of time precisely without needing to consider an infinite range of frequency, and we can’t limit the frequency band precisely without considering infinite time, but we can come pretty close to limiting both of them, as we know from the small size of the Hirschman limit. So we won’t go too far wrong by pretending we can limit time to an interval T and frequency to a bandwidth W, so long as we keep in mind that we are approximating and that there are conditions in which the approximation matters.

As mentioned above, the Nyquist-Shannon theorem proves that we can describe a waveform exactly over its entire course during an interval T if we know its values at intervals of no more than 1/2W seconds, or, even less restrictively, at an average interval of 1/2W seconds over the interval T.

At this point we should discuss the concept of “effective bandwidth”. So far, everything concerned with bandwidth W has implicitly assumed that the contribution to the uncertainty is equal for any frequency within W and zero for any frequency outside W. But this clearly is not true. Suppose that the spectrum consisted of one frequency that contained almost all the power of the waveform plus low levels of any other frequency within the band W. The waveform would look very like a sinusoid of that main frequency, slightly changing in amplitude from peak to peak and slowly changing in phase from cycle to cycle. If we knew its amplitude and phase at the time of some observation, we would be able to predict its track quite a long way into the future (and back into the past as well). Our uncertainty about the value of the waveform at time 1/2W seconds into the future would be almost zero. So W, the band that contains all the signal energy, is clearly not the “effective” bandwidth of the signal if the contributions to uncertainty from the different frequencies are unequal.

Shannon considered this problem as one of filtering, as a conceptual device to link the effective bandwidth of a signal to the bandwidth of the white noise for which the 2W samples per second applies. If a filter has a certain passband shape defined by |Y(f)|^2, and a white noise of bandwidth W is fed into the input of the filter, the uncertainty of the output of the filter is that of the input scaled by the factor exp((1/W)*integral_over_W(log(|Y(f)|^2) df)). The “effective bandwidth” of a signal (Weff) is its actual bandwidth W multiplied by this factor, so that a spectrum concentrating its power in a narrow part of the band has a Weff much smaller than W.

Shannon provides an explicit result of this expression for several spectral shapes and shows how to use these canonical shapes to approximate others if an explicit form is not known. In 1949, Shannon did not have access to the computing power now available; to compute this formula numerically is now a simple matter if the power spectrum of the signal is well defined.
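A sketch of such a numerical computation, under two assumptions of mine: |Y(f)|^2 is normalized to a peak of 1, and Weff is W multiplied by the exponential factor (the direction that makes Weff smaller than W for a concentrated spectrum, as the narrowband example above requires). The spectral shapes are invented for illustration.

```python
import numpy as np

def effective_bandwidth(f, Y2, W):
    """Weff = W * exp( (1/W) * integral over the band of log(|Y(f)|^2) ),
    with |Y(f)|^2 first normalized to a peak of 1 (an assumption made here
    so that a flat spectrum gives Weff = W)."""
    Y2 = Y2 / Y2.max()                        # normalize peak gain to 1
    df = f[1] - f[0]                          # uniform grid spacing
    avg_log = np.sum(np.log(Y2)) * df / W     # crude Riemann approximation of the integral
    return W * np.exp(avg_log)

W = 10.0
f = np.linspace(0.01, W, 2000)

flat = np.ones_like(f)                         # white over the whole band
peaked = 1.0 / (1.0 + ((f - 3.0) / 0.1)**2)    # power concentrated near 3 Hz

print(effective_bandwidth(f, flat, W))    # = W: a flat spectrum keeps its full bandwidth
print(effective_bandwidth(f, peaked, W))  # << W: a narrowband signal
```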

Samples taken 1/2Weff seconds apart are informationally independent, so if we look at the waveform beyond 1/2Weff seconds after the end of interval T, we will be able to determine nothing about the value of X beyond what we would know by examining its gross statistics. So all our enquiry about the effect of lag must be concerned with lags less than 1/2Weff. If the transport lag around a control loop exceeds 1/2Weff, where the waveform in question is the disturbance input, there is no way that the output could consistently reduce the error.

Still not considering control, we are concerned with the change of uncertainty about a waveform as time passes since the most recent observation, but before 1/2Weff. At the time of the last observation, the uncertainty is set by the resolution of the observation. When measured in units of the observation resolution, the uncertainty at the moment of observation is zero. After 1/2Weff seconds the uncertainty is set by the global statistics of X and the resolution of observation. The magnitude of the uncertainty after 1/2Weff depends on the distribution of probability over the different possibilities for X, but if that distribution is Gaussian, the global uncertainty is

```
Umax = log( (Standard Deviation of global probability distribution)
            / (Standard Deviation of observation distribution) )
```

Define w = (time since last observation) * 2Weff. Then the uncertainty after delay w is w*Umax, and the waveform can be said to have an uncertainty generation rate of 2Weff*Umax bits/second. Tc in the figure is 1/2Weff.
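In code, with invented standard deviations and bandwidth (illustrative numbers only, not from any measured system):

```python
import math

def umax_bits(sd_global, sd_observation):
    """Umax = log2( SD of global distribution / SD of observation distribution ):
    the uncertainty, in bits, removed by a single observation."""
    return math.log2(sd_global / sd_observation)

def uncertainty_after(t, Weff, Umax):
    """Linear growth at 2*Weff*Umax bits/second, saturating at Umax
    once t reaches Tc = 1/(2*Weff)."""
    return min(2 * Weff * t * Umax, Umax)

Umax = umax_bits(16.0, 1.0)                # = 4 bits for these invented SDs
print(uncertainty_after(0.1, 1.0, Umax))   # partway up the linear ramp
print(uncertainty_after(1.0, 1.0, Umax))   # saturated at Umax
```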

(Figure: uncertainty grows linearly from zero at the last observation to Umax at Tc = 1/2Weff.)

We can follow a similar analysis for discrete variables, even ones with internal syntactic relations. I discussed a lot of that in the long tutorial on information and uncertainty [Martin Taylor 2013.01.01.12.40]. The end result is the same. Provided that the sampling moments are independent of the values of the observations, the uncertainty grows linearly from zero at the moment of the last observation to its maximum value at some future time determined by the statistics of the signal. However, as with any of these general statements, there can be individual instances in which the uncertainty might even decline after growing for a while. For example, if X is a sequence of characters sampled from English text, and the observation is that the last three characters in a string are space-t-h, it is highly probable that the next character will be e or i. But it is rare for a random observation to have been taken at exactly the moment when space-t-h happened to be the most recent character sequence. Such cases are averaged out by cases such as e-d-space, where the next letter is anyone’s guess.
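That averaging-out can be illustrated with toy next-character distributions; the probabilities below are invented for illustration, not measured from any corpus.

```python
import math

def entropy_bits(probs):
    """Shannon uncertainty of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Invented next-character distributions for two contexts:
after_th = [0.6, 0.3, 0.05, 0.05]   # after space-t-h: 'e' and 'i' dominate
after_ed_space = [0.1] * 10         # after e-d-space: anyone's guess

h_th = entropy_bits(after_th)
h_ed = entropy_bits(after_ed_space)
print(h_th, h_ed)   # the predictable context carries much lower uncertainty
```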

Finally again talking about control, if the loop transport lag is L seconds, then at time t0, Eo (the influence of the output at the CEV) cannot be derived from any values of the CEV later than time t0-L. Since then, the CEV has been influenced by all the changes in the disturbance over the subsequent L seconds. The net result of those changes is uncertainty that is proportional to the time since the last observation:

```
U_L(X) = min(2*Weff*L*Umax, Umax)
```

where X represents the possible values of the CEV.

U_L(X) is the minimum possible uncertainty of the CEV for a loop with transport lag L, given Eo and its history (EoHist). Since the global uncertainty of the disturbance is defined as Umax, the maximum value of the mutual information between EoHist and d (the disturbance waveform) is

```
Mmax(d:EoHist) = Umax - min(2*Weff*L*Umax, Umax)
```

The uncertainty-based measure of control quality is the difference between the uncertainty of the CEV without control (Umax) and with control (min(2*Weff*L*Umax, Umax)). That difference just happens (no coincidence if you remember the long tutorial) to be Mmax(d:EoHist). The actual value of Umax depends on the resolution of the observation if the variation of the CEV is continuous, but not if it is discrete.

--------End Appendix--------
