Trying to grasp Kalman Filters; adaptive models

[From Bill Powers (960704.1900 MDT)]

RE: Hans Blom's adaptive model.

I did a search on "Kalman Filter" on WWW today and got 27,000 hits. I
guess this idea is pretty popular. Just before the Fort Lewis computer
went on an unscheduled vacation, I was looking at a paper by a guy in
Chapel Hill (UNC department of computer science) purporting to be an
"introduction" to Kalman Filters. The mathematical notation is beyond my
comprehension, but putting the words together with what I could grasp, I
think I got a better picture of how the Kalman filter works, at least in
the application Hans describes. I hope you will tell me how far off the
track I am, Hans.

The method seems to involve adjusting the parameters of a function to
make the value of the function match some particular value or time
function. In doing this, an error measure is calculated, and then, using
the form of the model, the partial derivative of the error with respect
to each parameter is calculated. Then, using the partial derivatives
found in this way, each parameter is adjusted by a small amount
(proportional to the partial) in the direction that would tend to
decrease the error measure, and the value of the function and of all the
partial derivatives is computed again. If the steps by which the
parameters are adjusted are not too large, this procedure will finally
converge to some stable set of parameters and a minimum in the error
measure.
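
For concreteness, here is how I picture that procedure in a little
Python sketch. The linear model form, the step size, and all the names
are my own illustrative assumptions; this shows the adjust-along-the-
partials idea, not the actual Kalman equations.

  import numpy as np

  # Fit parameters (a, b) of y = a*u + b to observed data by gradient
  # descent on the squared error: adjust each parameter by a small
  # amount proportional to its partial derivative.
  rng = np.random.default_rng(0)
  u = rng.uniform(-1, 1, 200)
  y = 2.0 * u + 0.5 + 0.05 * rng.standard_normal(200)  # "real" data

  a, b = 0.0, 0.0   # initial parameter guesses
  step = 0.1        # steps small enough that the iteration converges

  for _ in range(500):
      err = (a * u + b) - y          # model output minus observation
      grad_a = 2 * np.mean(err * u)  # partial of mean squared error wrt a
      grad_b = 2 * np.mean(err)      # partial wrt b
      a -= step * grad_a             # move against the gradient
      b -= step * grad_b             # likewise for the offset

  print(a, b)  # settles near a = 2.0, b = 0.5, a minimum of the error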

Along with this basic process, there seems to be some use made of
uncertainty measures. As near as I can figure, they are used to adjust
the amounts of change in the parameters on each step. I suppose this is
done so the convergence won't be unduly affected by noise, or perhaps so
that noise fluctuations won't be treated as systematic errors.

Is this anywhere near right?

--------------------------------
The model proposed by Hans, I now realize, is not really a closed-loop
model as we understand it in PCT. Here is a very simple diagram of how
it works.

                              | Ref signal
  ^                           |
  | modeled                   v
  | perceptual      ----> Inverse of
  | signal         /      function f'
  |   adj. params /           |
  |       /                   |
  |    Model of               |
  x'<--- environment, <-------|
  |    function f'            |
  |       ^                   |
  v (adj. | params)           |
   \      |                   |
  Extended Kal. Filter        | system output signal,
   /                          | u
  ^  perceptual signal        |
  |                           |
  |                           |
  |      Real                 |
  x <--- environment <--------|
         function, f
        ("the plant")

For this model to operate, the form of the real environment function
must be known, and it must exist inside the controlling system both in
the forward (normal) form and in the inverse form. The function f
describes not only the environmental "plant" but the properties of any
transducers and actuators that convert the signal u into a physical
effect on the plant. If, for example, the actuator is a human arm, and
the plant is some device operated by positioning the arm, the function
f' must include the kinematics and dynamics of the arm itself as well as
the plant. The inverse function must include the inverse kinematics and
dynamics of the arm and also the inverse of the plant function. There is
nothing in this model that can generate f' or its inverse; both of these
functions must be given before adaptation can begin. Adaptation consists
of adjusting the parameters of f', and somehow also adjusting the same
parameters in the inverse form of f' to the same values.

The Kalman Filter algorithm adjusts the parameters of the model function
f', so that when the real environment and the model are driven in
parallel by the output signal u, the modeled perceptual signal x'
behaves the same as the actual perceptual signal x. I am assuming here
that the real perceptual signal and x are the same thing. If they are
not, then the model must also include the properties of the perceptual
input function just prior to x'.

The model f' is not used in the actual control process. Its only purpose
is to provide a way for the Kalman Filter algorithm to determine the
parameters of the external function f. As those parameters change during
the adaptation process, they are copied into the inverse model that is
driven by the reference signal. It is the reference signal operating
open-loop through the inverse of the model function that actually
generates the output signal u. There is no comparator or closed loop in
this part of the model.
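
To make this organization concrete, here is a minimal sketch with the
simplest plant I can think of, x = k*u with an unknown gain k. The
model f' is x' = k_est*u, its inverse is u = ref/k_est, and a plain
normalized gradient step stands in for the full Kalman update;
everything here is my illustrative assumption, not Hans's program.

  # The reference drives the INVERSE model open-loop; the Kalman-like
  # step adjusts k_est so that x' tracks x. The inverse uses the same
  # k_est, so "copying the parameters into the inverse" is automatic.
  k_true = 3.0   # real environment function f (unknown to the system)
  k_est = 1.0    # initial parameter of the internal model f'
  gain = 0.2     # adaptation rate

  for t in range(100):
      ref = 10.0
      u = ref / k_est      # output generated open-loop from the reference
      x = k_true * u       # real perceptual signal
      x_model = k_est * u  # modeled perceptual signal x'
      k_est += gain * (x - x_model) * u / (u * u + 1e-9)

  print(k_est)  # approaches 3.0; thereafter u = ref/k_est would keep
                # producing appropriate output even if x were lost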

The only proposed advantage of this organization over a simple negative
feedback control system (other than its adaptive properties) is that it
can continue to operate in a disturbance-free environment after the
perceptual signal x is lost. Variations in the reference signal will
continue to be converted to changes of output that would be appropriate
if the environment did not change. The modeled perceptual signal x'
would continue to vary as dictated by the changing reference signal.

To achieve the same effect with an ordinary PCT control system is not
easy. Of course very little effort has been devoted to trying to achieve
this effect -- there are certainly not 27,000 references to papers on that
subject.

Both models would require a special detector to report the loss of the
feedback signal, so that the absent input signal would not be treated as
a real input value of zero. This detector would have to actuate some
device that disables the normal operation of the system: for the
adaptive Kalman Filter model, it would have to disable the adaptation
process. The adjustment required in the PCT system might be complex,
although there may be ways of organizing the system so that only one
signal need be clamped to zero. Systems with integrating outputs will
hold their outputs when the error goes to zero. In a multileveled
system, the highest derivative might be held constant, or might be
blindly set by a reference signal.

There are clearly some advantages to the Kalman Filter approach, in
terms of achieving continued operation after loss of feedback
information. However, the most critical question in relation to models
of living systems is to what extent living systems have this capacity.
Obviously, if control processes by living systems degenerate when
feedback is lost, a model that continues to operate properly would be
the wrong model. The correct model would show the same degeneration of
control (if any) that the real system shows. If the purpose of a
control-system model is to imitate a living system, finding the "best"
control organization is not the goal; finding the best organization
would be appropriate in industrial design, but not in modeling
organisms.

Also, there are alternatives to the architecture of the adaptive Kalman
filter model. It is possible that rather than the same control process
continuing when feedback is lost, some alternative control process is
started which uses different feedback information to achieve the same
end. An example is entering information onto a computer screen. When a
familiar sequence is created on the screen over and over -- a login
process is a common example -- the system is reliable enough that it is
not necessary to observe the echoed characters during typing, although
normally one does so. If the system response is slow, one can type the
correct letters by watching the keyboard instead of the screen, or for
touch typists, by feeling the correct keystrokes. Errors of typing can
be detected by visually or tactilely observing which keys have been
depressed; errors are essentially never introduced between the keypress
and the character that appears on the screen. If the typing is done
correctly, eventually the expected sequence of letters will appear on
the screen.

There are many cases where we adjust things not by watching the variable
to be adjusted but by watching something on which that variable reliably
depends. While one could adjust the temperature of a room just by
feeling the air temperature and moving the setting lever on a
thermostat, the more common way is to set the lever to a calibrated
mark. Only if this resulted in a long-term felt temperature error would
one use the larger control loop, and even then one would probably just
change the target number to which the lever is set. It's quicker to
perform a calibrated action (which is itself an example of perceptual
control) and rely on the constancy of nature to produce the desired
result. I think we do this whenever it is possible and whenever precise
control is not critical; we use the lowest level of control that will
actually produce the desired higher-level result, near enough. Naturally
if the environment changes, the higher-level loop has to be brought into
real-time action, to reset the target value of the calibrated action.
And of course in many situations there is no calibrated action that will
make control easier. These are situations where unpredictable
disturbances are frequent, and in which the effects of actions on the
environment are not reproducible exactly enough.

This kind of possibility narrows the range of situations in which true
adaptation is required. And of course it does not call for any control
system to go on operating normally when feedback information to that
system is lost. One simply changes control systems -- or, realistically,
control simply fails.

I am convinced that for some kinds of behavior, probably at the higher
levels, a model similar to the one Hans proposes might be appropriate.
The greatest disadvantage of this model, however, is that it requires
the nervous system to know the form of an environmental feedback
function, and to be able to compute its exact inverse, or at least one
valid inverse. This kind of knowledge is hard to account for in lower-
level systems -- it's hard to imagine the midbrain or the brainstem
computing the inverse of a complicated function, and hard to explain how
the internal model acquires the correct basic form which is necessary
before parameter adjustments can achieve anything.

At higher levels, where cognitive functions are to be expected, complex
operations are more to be expected (although it's still embarrassing not
to be able to give even a hint as to how they are carried out). However,
at lower levels of control behavior, the evidence is very strong that
interruption of feedback pathways grossly distorts behavior and often
makes it impossible. Few people can thread a needle or drive a car
blindfolded. The adaptive model that continues to produce organized
behavior in the absence of feedback would simply be the wrong model for
these kinds of behavior.

The Kalman Filter approach seems to be a general method for
systematically adjusting the parameters of a function to achieve a
desired input-output relationship. It could be applied in many ways
other than the particular way it is used in Hans' model. Also, there are
other methods for doing the same adjustments, involving different and
often much simpler algorithms. My artificial cerebellum model does the
same kind of thing, but uses no symbolic calculations at all (it's
basically an analog computation). The author of that paper I mentioned
above said that the basic principle is a principle of feedback control.
It's the same basic idea as "backward propagation." You set a desired
outcome, compare the actual outcome with it, and use the error as a
basis for adjusting something about the process that leads to the
outcome. There are probably many methods for doing this, of which the
Kalman approach is one. I expect that we have not yet found the simplest
of such processes, or the kinds that would be most amenable to
implementation in a nervous system.
-----------------------------------------------------------------------
Best,

Bill P.

[Hans Blom, 960709b]

(Bill Powers (960704.1900 MDT))

I did a search on "Kalman Filter" on WWW today and got 27,000 hits.

Yes, the idea is pretty popular. It is one of the standard tools
nowadays.

"introduction" to Kalman Filters. The mathematical notation is beyond my
comprehension, but putting the words together with what I could grasp, I
think I got a better picture of how the Kalman filter works, at least in
the application Hans describes. I hope you will tell me how far off the
track I am, Hans.

The text you found was perhaps not as introductory as you needed.
Some of your remarks show that it discusses certain extras that
expand on the basic theory. Overall, you understand the approach,
but your reluctance to use (or inexperience with) probabilities may
make it _seem_ beyond your comprehension.

The method seems to involve adjusting the parameters of a function to
make the value of the function match some particular value or time
function. In doing this, an error measure is calculated, and then, using
the form of the model, the partial derivative of the error with respect
to each parameter is calculated.

Correct. Basically, the method is a "real time" curve fit. You may be
acquainted with the least squares method of fitting a best straight
line through a number of points, resulting in two parameters: a slope
and an offset. The standard method, however, applies only after all
points have been collected. Its disadvantage is that no intermediate
results are available. This can be modified: make best parameter
estimates available after EACH new point is collected (measured). You
could use the standard formula again and again, but an iterative
(some call it recursive) formula exists that provides a CORRECTION to
the earlier results as soon as a new point becomes available.

The correction formula is relatively easy to derive (you might try to
derive it yourself). Start with the basic formula, which gives you
(o, s | N), short for "offset and slope given N points". Now
do the same for N+1 points: you get (o, s | N+1). The "correction
formula" is to derive a function

     (o, s | N+1) = (o, s | N) + ...

Hidden (or, upon closer inspection, not so hidden) in the correction
term are what can be interpreted as a number of variances that tell
us how good the estimates of o and s are. The standard approach is to
isolate these into separate formulas, which sort of gives them a life
of their own. This means that after every new observation not only
the best (in the least squares sense) parameter estimates o and s are
available, but also their accuracies.

You can generalize the method to other "models" besides a straight
line; this does not change the basic approach or the formulas.
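
As a sketch, here is one rendering of that iterative formula in
Python: the textbook recursive least-squares update for a straight
line, with symbols of my own choosing.

  import numpy as np

  # Recursive least squares for y = o + s*x, corrected after EACH point.
  # theta = [o, s]; P holds the variances ("accuracies") of the estimates.
  theta = np.zeros(2)    # initial guess for offset and slope
  P = np.eye(2) * 1e6    # huge initial variance: we know nothing yet

  def rls_update(theta, P, x, y):
      phi = np.array([1.0, x])          # regressor for the new point
      y_pred = phi @ theta              # prediction from (o, s | N)
      K = (P @ phi) / (1.0 + phi @ P @ phi)  # how much this point is worth
      theta = theta + K * (y - y_pred)  # (o, s | N+1) = (o, s | N) + ...
      P = P - np.outer(K, phi @ P)      # variances shrink with each point
      return theta, P

  rng = np.random.default_rng(1)
  for _ in range(100):
      x = rng.uniform(0, 10)
      y = 1.5 + 0.7 * x + 0.1 * rng.standard_normal()
      theta, P = rls_update(theta, P, x, y)

  print(theta)  # approaches [1.5, 0.7]; np.diag(P) gives their accuracies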

                           Then, using the partial derivatives
found in this way, each parameter is adjusted by a small amount
(proportional to the partial) in the direction that would tend to
decrease the error measure, and the value of the function and of all the
partial derivatives is computed again. If the steps by which the
parameters are adjusted are not too large, this procedure will finally
converge to some stable set of parameters and a minimum in the error
measure.

There seems to be a refinement here that goes beyond the basics. Such
a refinement might be required when the function that must be fitted
is very nonlinear (taking small steps will then prevent overshooting
the top of the hill) or when it is important that the parameter
estimates are more stable than they would otherwise be (small steps
introduce something like low pass filtering).

Along with this basic process, there seems to be some use made of
uncertainty measures. As near as I can figure, they are used to adjust
the amounts of change in the parameters on each step. I suppose this is
done so the convergence won't be unduly affected by noise, or perhaps so
that noise fluctuations won't be treated as systematic errors.

No, the explanation is different: they are weighting functions that
tell how much additional information will be obtained from a new
measurement. That depends on how much information is available
already. Let me give you a simple example. In the computed average of
N numbers, each number contributes 1/N'th of the result; of N+1
numbers, 1/(N+1)'th of the result. Each successive measurement is
thus "worth less", and the correction due to successive measurements
thus will decrease. The uncertainty measures remember the (relative)
worth of the existing information.
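
In code, the whole example is a two-line correction formula:

  # The new average is the old average plus a correction whose weight
  # 1/(N+1) shrinks as information accumulates -- the role played by
  # the uncertainty measures in the filter.
  def update_mean(mean, n, x_new):
      return mean + (x_new - mean) / (n + 1), n + 1

  mean, n = 0.0, 0
  for x in [4.0, 6.0, 5.0, 7.0]:
      mean, n = update_mean(mean, n, x)
  print(mean)  # 5.5, identical to (4 + 6 + 5 + 7) / 4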

Is this anywhere near right?

Partly. You need to see the relation with simple least squares curve
fitting. This relation was probably assumed known and thus only
implicit in the "introduction" that you read.

The model proposed by Hans, I now realize, is not really a closed-loop
model as we understand it in PCT.

Yes, we've had that discussion in the past. I remember telling you,
sometime in the remote past, that my home heating system does not
have a thermostat. It is a feedforward system that adjusts the
temperature of the circulating water to the measured outside
temperature. (That such a system works as well as it does may depend
on the fact that in the part of Europe where I live weather changes
are not extreme and houses are well insulated.) To get
the system functioning well, it had to be tuned. In effect, the
appropriate slope and offset of the "operating curve" had to be
found. I had to do that manually, but this procedure could have been
automated, at the cost of an extra sensor and extra "intelligence" in
the controller/adjuster.

Now assume that this adjustment takes place in real time, so that the
operating curve is always well tuned. At the lowest level, we still
have a feedforward control system, but one level up is the (control?)
system that continually adjusts the operating curve. This is, in
short, how Kalman Filter based control systems work. As you see, this
is different from the PCT paradigm.

For this model to operate, the form of the real environment function
must be known

Here, "known" must be interpreted in the same sense as we "know" a
straight line from the computed offset and slope fitted through a
collection of points. So it is better to say that it must have been
"modelled" than that it must be "known".

                                                  The function f
describes not only the environmental "plant" but the properties of any
transducers and actuators that convert the signal u into a physical
effect on the plant.

Correct. It is not knowledge of what is "out there" that is
important, but the (perceptual) results of our actions. It doesn't
matter, then, where we draw the line between inside and outside. At
the neuromuscular interface? At the muscle-skin interface? At the
skin-world interface? It does not matter.

                                                       There is
nothing in this model that can generate f' or its inverse; both of these
functions must be given before adaptation can begin. Adaptation consists
of adjusting the parameters of f', and somehow also adjusting the same
parameters in the inverse form of f' to the same values.

In the basic approach, that is correct. We fit a straight line
because we initially assume that this will work. Remember that in a
real time curve fit we start out with only very few observations.
There are, however, indicators that tell us that a straight line is
not a good model. The single best indicator is the prediction that
the old knowledge offers. Given the prediction variance, which can be
extracted from (o, s | N), we can predict that the next point must
lie "close to" the line, say within 2 or 3 standard deviations. If
this is frequently enough (or on average) not true, a better model
must be used.
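
As a sketch (the threshold and the failure-rate criterion below are
assumptions of mine, not fixed parts of the method):

  import math

  # Flag an observation that falls outside k standard deviations of
  # the model's prediction; 3-sigma events should be rare, so frequent
  # failures indicate that a better model form is needed.
  def prediction_ok(y_obs, y_pred, pred_var, k=3.0):
      return abs(y_obs - y_pred) <= k * math.sqrt(pred_var)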

This goes far beyond basics. Now, we need to design a good test that
indicates when the prediction is incorrect; if discrepancies are
large, this is easy, if not, then not. We also need a method to
choose a better model. And we also need a method that "translates"
the old knowledge (model parameter values) into new knowledge, in
terms of the new model; without such a method, all old knowledge
would be lost. Needless to say, all of this can be automated, but
things get very complex.

The only proposed advantage of this organization over a simple negative
feedback control system (other than its adaptive properties) is that it
can continue to operate in a disturbance-free environment after the
perceptual signal x is lost.

Let's speculate a bit about what an internal model in a human could be
used for, apart from moment-to-moment control. Basically, what is
available is a description of _what can be done_: imagine a pattern
of actions and observe, again in imagination, what the outcome will
be in terms of perceptions. In short, an internal model allows
planning: imagine action(t) for t=0 to t=T, and you will, within the
accuracy of the model, observe perception(t) for t=0 to t=T.

That is not to say that plans cannot fail, of course; but if they
do, you can assume that you have an inaccurate model which needs to
be tuned better. And
its prediction failure is, in itself, an indicator that further
tuning is required. The latter is fully in line with what the Kalman
Filter formulas show. If the prediction is exact, the model
parameters remain unchanged (but variances are decreased; we are more
certain); if the prediction deviates from the observation, the model
parameters are made to change. Inspect the basic formulas and see...

Planning is, of course, not different from control in the absence of
perceptions. It is different only in that it has to take place at a
far faster rate than real time. The model is there; is a mechanism
available that can run it in "fast forward"? My perceptions tell me
that this is so, although the above description of _how_ it is done
is extremely crude.

To achieve the same effect with an ordinary PCT control system is not
easy. Of course very little effort has been devoted to trying to achieve
this effect -- there are certain not 27,000 references to papers on that
subject.

As long as there is no uncertainty in a model, I doubt that you can
achieve the same effect. On the other hand, IF you achieve similar
effects (as in your "artificial cerebellum"), it must be through
mechanisms which come to grips with uncertainty, whether you see it
in that light or not. One aspect of uncertainty is that all results
are preliminary; they are not fixed but can be changed when new
information becomes available.

Both models would require a special detector to report the loss of the
feedback signal, so that the absent input signal would not be treated as
a real input value of zero.

The principle of this mechanism is easy: prediction fails. In my
blood pressure controller, for instance, knowledge is built in about
the possible values of the "features" of one period of the signal.
For instance, the maximum value (systolic arterial pressure) must lie
between certain values (is the value in the normal range?), and the
same is true for the _difference_ between the maxima of successive
periods (is the value in the stable range?). These tests are applied
to about 10 such features (maximum, minimum, mean, period, several
slopes, several differences). If the features of a certain period fail
those tests, the signal is considered invalid. In that case,
"feedforward" control takes over. For one minute at most; after
that, an
alarm sounds and manual "control" must take over. Since manual
control is impossible as well, this is actually "shifting the blame";
the control system cannot fail, but the clinician can. In practice,
the measurement problem is usually solved pretty rapidly...
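
Schematically, the tests look like this; the feature names and limits
below are placeholders, not the actual clinical values.

  # Validity tests on the features of one signal period.
  NORMAL_RANGES = {
      "systolic_max": (40.0, 250.0),   # mmHg, assumed plausibility band
      "diastolic_min": (20.0, 150.0),
      "period": (0.3, 2.0),            # seconds per beat
  }
  STABLE_MAX_DIFF = {"systolic_max": 30.0}  # max change between periods

  def signal_valid(features, prev_features):
      for name, (lo, hi) in NORMAL_RANGES.items():
          if not lo <= features[name] <= hi:
              return False             # not in the normal range
      for name, max_diff in STABLE_MAX_DIFF.items():
          if abs(features[name] - prev_features[name]) > max_diff:
              return False             # not in the stable range
      return True                      # period accepted as valid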

                      This detector would have to actuate some
device that disables the normal operation of the system: for the
adaptive Kalman Filter model, it would have to disable the adaptation
process.

Right!

There are clearly some advantages to the Kalman Filter approach, in
terms of achieving continued operation after loss of feedback
information. However, the most critical question in relation to models
of living systems is to what extent living systems have this capacity.

Do you know of another mechanism that "explains" planning? You
might, of course, claim that planning does not exist. But that does
not solve the problem.

Obviously, if control processes by living systems degenerate when
feedback is lost, a model that continues to operate properly would be
the wrong model.

Some time ago, I proposed a test: have the feedback signal go away
occasionally in a tracking experiment and observe the subject's
behavior. What happens? A) The subject stops, complaining that he
cannot track when no feedback is available. B) The cursor remains at
a constant position as long as there is no feedback. C) The cursor
follows a non-constant trajectory that is some sort of extrapolation
of its previous path. D) ...

Only A would indicate that no internal model exists. B would occur
with the simplest model thinkable: a zero-order hold. C might be a
first- or higher-order hold.
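
B and C correspond to simple extrapolation rules; a toy sketch, with
assumed sampling and values:

  # What the model-driven cursor could do during a feedback blackout.
  def zero_order_hold(history):
      return history[-1]               # outcome B: hold the last value

  def first_order_hold(history, dt=1.0):
      slope = (history[-1] - history[-2]) / dt
      return history[-1] + slope * dt  # outcome C: extrapolate the slope

  track = [0.0, 0.5, 1.0, 1.5]         # positions before the blackout
  print(zero_order_hold(track))        # 1.5
  print(first_order_hold(track))       # 2.0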

          The correct model would show the same degeneration of
control (if any) that the real system shows. If the purpose of a
control-system model is to imitate a living system, finding the "best"
control organization is not the goal; finding the best organization
would be appropriate in industrial design, but not in modeling
organisms.

That is true. On the other hand, control _theory_ might shed some
light on what is possible and what not, and therefore suggest some of
the tests that might be performed. Bruce Gregory describes this
aspect of the value of a theory extremely well.

I am convinced that in some kind of behaviors, probably at the higher
levels, a model similar to the one Hans proposes might be appropriate.
The greatest disadvantage of this model, however, is that it requires
the nervous system to know the form of an environmental feedback
function, and to be able to compute its exact inverse, or at least one
valid inverse.

This is a misunderstanding. In the first place, "smooth" functions can
be well approximated by only a few terms of their series expansion,
and many of the "laws of nature" appear to be smooth. That means that
a simple model can usually provide a pretty realistic approximation of
reality. Second, exactness is not required, precisely because control
goes on: errors will be corrected upon their perception. Control --
the continuous availability of feedback information -- allows simple
approximations to be effective.
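
A toy illustration, with an assumed plant: approximate y = sin(u) by
the first term of its series expansion, y = u, and let the ongoing
correction absorb the approximation error.

  import math

  target = 0.4
  u = target                 # inverse of the crude model y = u
  for _ in range(20):
      y = math.sin(u)        # the real, smooth plant
      u += (target - y)      # perceived error corrects the crude inverse

  print(u, math.sin(u))      # sin(u) has converged to 0.4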

         This kind of knowledge is hard to account for in lower-
level systems -- it's hard to imagine the midbrain or the brainstem
computing the inverse of a complicated function

Hardly more difficult than imagining the brain to compute a
multiplication or an integral...

                                    , and hard to explain how
the internal model acquires the correct basic form which is necessary
before parameter adjustments can achieve anything.

The "correct basic form" could (see above) be an extreme simplific-
ation. And we _do_ start with some innate mechanisms!

Few people can thread a needle or drive a car blindfolded.

But _all_ people who drive a car occasionally take their eyes off the
road without causing an immediate accident.

often much simpler algorithms. My artificial cerebellum model does the
same kind of thing, but uses no symbolic calculations at all (it's
basically an analog computation).

There is an analog equivalent of the Kalman Filter. Instead of
difference equations it uses differential equations. And there are,
indeed, many simpler algorithms. Generally, however, they can be
shown to be (suboptimal) simplifications of the Kalman Filter
approach. As so often, some compromise between complexity and
theoretical correctness or maximum speed of convergence has to be
found. You lean toward simplicity. I lean toward maximum speed of
convergence. We need much more research before we will know what
"choice" biology has made for us humans.

So all in all, Bill, I suggest that you go back to basics: fitting a
straight line through sequentially measured x, y (or x, t) values
and, in doing so, rediscover Kalman Filtering for yourself. Doing so
will show you the basic mechanism in all its simplicity. All else is
embellishment of this basic method.

Greetings,

Hans