Martin's Model

[From Bill Powers (2007.05.21.1425 MDT)]

Martin Taylor (2007.05.21) --

Looking over the diagram for "control17o.vi", I believe I see that you were advancing the phase of the target perception by an adjustable fraction of the input delay time, so the controller was tracking the target as if it were some distance ahead of where it actually is, although still with some perceptual delay.

The target signal seems to come from "Reference array", but the way it's dealt with after that is still beyond me. At some point you seem to be calculating Vt = 0.9*Vt + 0.1*(dVt/dt), which supposedly produces something called "smoothed derivative," which in turn seems to be identical with Vt. That is put into an "Array of past target derivatives" which becomes an input to an adjustable prediction lag. The lagged smoothed target derivative then is subtracted from the lagged output velocity (output V) (where I take it that the "output" is actually the cursor position), and that produces the "chase error" after subtraction from the "chase reference value" which comes from the position error signal.

At this point I have lost track -- it seems that the higher order velocity signal is being used to adjust the reference for the lower order velocity controller, whereas I would have expected the position error to determine the velocity reference. But I could easily be misreading.

I hope you realize that you're proposing an extremely complex model here, unless you have some much simpler way for these computations to be carried out. Everything that is in the main rectangle of this block diagram represents part of the model, and amounts to a claim that the corresponding computation is going on in the real system. By implication, each independent block calls for at least one parameter, whether given an arbitrary value or made adjustable. I understand that some functions require rather indirect ways of carrying them out due to limitations of the simulation language, but even discounting those there are parameters unaccounted for. For example, you have an overall lag parameter, but it seems to adjust not only the perceptual input lags, but the lags for derivative signals for both "input" and target, as well as affecting the relative prediction lag. Is there some fundamental reason why those lags have to be the same? And the smoothing method uses arbitrary constants of 0.9 and 0.1 (1.0 - 0.9), which should by rights be adjustable since nothing says they have to have those particular values -- in two different places.

One of the pitfalls of modeling is the problem of proving that the system you have modeled actually works in the way you say it works. Hidden or covert loops can invalidate what one says on the basis of looking at just part of the system, and when the design gets complicated enough it's really not possible to be sure that what one says is happening is really happening. To be sure I understand a model, I must either keep it extremely simple, or actually go through the chore of setting up and solving the system equations, which I would certainly not like to have to do with control17o.vi.

I hope the models you ended up using are simpler than this one!

Best.

Bill P.

[Martin Taylor 2007.05.21.17.31]

[From Bill Powers (2007.05.21.1425 MDT)]

Martin Taylor (2007.05.21) --

Looking over the diagram for "control17o.vi",

I'll have to go over it myself to know whether you are seeing it correctly (or at least as I intended it to be, which, as with any software, may be different). I won't do that until I know which models I actually used. I'm glad you are looking so carefully at the one model. If it's one I used, we are ahead of the game, and if it isn't, at least you are getting the hang of how they were built.

However, as I remember...

I believe I see that you were advancing the phase of the target perception by an adjustable fraction of the input delay time, so the controller was tracking the target as if it were some distance ahead of where it actually is, although still with some perceptual delay.

The target signal seems to come from "Reference array",

That seems right, though I'd balk at the term "phase". It's a time advance. The "Reference Array" is indeed the target. It's what the cursor would do if tracking were perfect, but only in two of the six experimental conditions does that actual variable appear to the subject. In the other four, it must be computed by the subject from two values that appear on the screen.

This is an aspect of the model that I know to be an unfortunate approximation. Initially I had hoped to be able to fit models that treated the two different variable inputs separately and accounted for their combination, but that led to too many optimizing variables, and as my professor in graduate school said: "with four variables, you can fit an elephant; with five you can make its trunk wag". I use five as it is :-) I conceptualized the combinatory process as adding to the perceptual delay and left it at that.

Remember, the objective of the exercise was to try to determine whether tolerance thresholds matter, and whether prediction matters, as well as to track the change of parameter values over the period of sleep loss. This latter was, a priori, not likely to show as much as in the 1994 study, as this time there was only one sleepless night, whereas in 1994 there were two, and all the major effects showed up on and after the second. Anyway, that's why five parameters for the model fits. I gave myself five as a limit, setting the others (to which you refer) in a particular model either for conceptual reasons or after a brief (few days) trial to find a value somewhere not too far from optimum.

For now...

For example, you have an overall lag parameter, but it seems to adjust not only the perceptual input lags, but the lags for derivative signals for both "input" and target, as well as affecting the relative prediction lag. Is there some fundamental reason why those lags have to be the same?

No, if this model does that, it's a property of that model. Others don't. If it's true for this model, then it was done so as to free a parameter variation for optimization in a different place. I can't answer the question for this particular one until I see whether the model was one I used, and I can't justify it even then without comparing this model to the others that were used.

And the smoothing method uses arbitrary constants of 0.9 and 0.1 (1.0 - 0.9), which should by rights be adjustable since nothing says they have to have those particular values -- in two different places.

0.9 is indeed arbitrary. It allows the value of the derivative to be not badly affected by the fact that sample values and sample moments are discrete, but is well within the bandwidth requirements of the target movement. In other words, it's an effective approximation. 0.8 gives slightly jerkier results, 0.95 gets closer than I would like to the band limit. I stuck to 0.9 everywhere.
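The update Bill quotes (Vt = 0.9*Vt + 0.1*dVt) is a first-order exponential smoother of the discrete derivative. A minimal sketch (Python, my names; not the LabVIEW original):

```python
def smoothed_derivative(samples, dt=1.0, alpha=0.9):
    """Exponentially smoothed discrete derivative:
    v <- alpha*v + (1 - alpha)*(dx/dt)."""
    v = 0.0
    out = []
    for prev, cur in zip(samples, samples[1:]):
        raw = (cur - prev) / dt               # raw sample-to-sample derivative
        v = alpha * v + (1.0 - alpha) * raw   # 0.9 trades sampling jitter against band limit
        out.append(v)
    return out
```

With alpha = 0.9 the effective averaging window is about ten samples; 0.8 smooths less (jerkier), 0.95 smooths more (nearer the band limit), matching Martin's description.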

I'll make a better response than this, perhaps Wednesday or Thursday.

Martin

[Martin Taylor 2007.05.21.17.44]

[From Bill Powers (2007.05.21.1425 MDT)]

One of the pitfalls of modeling is the problem of proving that the system you have modeled actually works in the way you say it works. Hidden or covert loops can invalidate what one says on the basis of looking at just part of the system, and when the design gets complicated enough it's really not possible to be sure that what one says is happening is really happening. To be sure I understand a model, I must either keep it extremely simple, or actually go through the chore of setting up and solving the system equations, which I would certainly not like to have to do with control17o.vi.

You are absolutely right, and indeed you are following the train of thought I used in doing this years-long simulation analysis.

My approach, at least in intention, was to try a variety of different plausible models, for two reasons: (1) if one of them fitted most of the data appreciably better than the others, then it might form a basis for further studies; and (2) if more than one of them gave good fits and agreed with each other in the need for such things as tolerance and/or prediction, then it would suggest that those might be elements of the real behaviour being modelled.

In the very early stages, I had planned to use non-linear gain and/or perceptual functions such as power law or exponential/logarithmic. In an excess of enthusiasm, I reported some of those results to CSGnet, but I soon realized that the data were nowhere near good enough to make the very subtle discriminations they imply (you can make a very good approximation to the effect of a slightly curved gain function by changing the slope of a linear one, if you are dealing with noisy real data). So I gave up on that kind of parameter variation. But it doesn't mean humans have linear perceptual and gain functions.

The trap, as I see it, is that (for example) if the true gain function is a power function with power, say, 1.2, a linear function with a threshold gives a better approximation than does a linear function without a threshold. One can be misled into thinking that tolerance thresholds are real. There are other such traps, but this one seems not to apply. In various models I did try that kind of threshold, with a gain function like (positive half only):


         *
        *
       *

_________

None of the models with that kind of gain function fitted well, as I remember. From memory, the models that did fit better used a function that had zero gain up to some threshold, and thereafter was exactly what the gain would have been without the threshold.

         *
        *
       *
      *
     |

___________________

I'm not saying that humans have this kind of a gain function, but it is (post-hoc) plausible. One lets some controlled perception be, until the error exceeds one's tolerance, and when it does, one acts reasonably strongly to bring the error back within tolerance bounds. The other kind of function would suggest that when the error goes beyond tolerance, one acts very gently to bring it just into bounds. For multiplexing our limited degrees of freedom for output, the second graph seems more useful (and a hysteretic function would be even more plausible).
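The two sketched gain shapes can be written as functions. This is a hypothetical rendering (names, slope handling, and symmetric treatment of negative errors are my assumptions):

```python
def gain_soft_threshold(err, tol, slope):
    """First figure: output rises from zero starting at the tolerance
    boundary, so just-out-of-bounds errors are corrected gently."""
    if abs(err) <= tol:
        return 0.0
    return slope * (err - tol) if err > 0 else slope * (err + tol)

def gain_deadband(err, tol, slope):
    """Second figure: zero inside the tolerance band, then exactly the
    unthresholded linear gain, so corrections start strongly."""
    return 0.0 if abs(err) <= tol else slope * err
```

Just past the threshold the first function returns nearly zero while the second jumps to slope*threshold, which is the difference between "acting gently" and "acting reasonably strongly" in the paragraph above.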

Anyway. I think we are thinking on parallel tracks.

Martin

[Martin Taylor 2007.05.23.23.52]

I had planned to keep this thread private between Bill and me,
but Rick asked me to keep it on CSGnet. So, at the risk of annoying
others who may not have broadband connections, I do so.

I’ve checked which models were used for the full test of the
sleepy teams study data. You have one of them (17o). The others were
17q, 18b and 18c. 18c proved uniformly worse than the other three, so
I’m ignoring it. Here are brief descriptions of what the other three
are [supposed to be] doing.

CtlModel17q.jpg

CtlModel18b.jpg


Model Descriptions (not including 18c)

The word "(parameter)" means a variable used in optimizing the
performance of the model separately for each data run, of which there
were 42 for each subject.

18b
  The target position is perceived with a lag (parameter)
  The target velocity is perceived with a "predictor lag" (parameter)
    and used to compute the target's present position
  The computed target position is compared with the current output value
  The thresholded (parameter) error (integrated x gain) is used as
    reference value for output velocity
  The final parameter is gain of the velocity control loop

17o
  The target position is perceived with a lag (parameter)
  The lagged target position is compared with the current output
  The position error is thresholded and used (integrated x gain) as a
    "chase" reference value
  The target velocity is perceived with a different lag (parameter
    multiplies position lag)
  The lagged target velocity is augmented by a predicted change
  The forecast target velocity is compared with the current output
    velocity to produce a current "chase rate"
  The chase error is not thresholded and is used (integrated x gain) as
    the output that generates an output position value.

17o tends to be better than 18b, but less so for male subjects in the
"hard" conditions in which two targets are varying and the sum has to
be tracked.

17q
  Same as 17o, except that if the position error exceeds the threshold,
  the switch affects the chase rate control rather than the position
  error control. If the position error is less than the threshold, the
  chase rate is maintained at its most recent level.
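The 17o sequence can be sketched as a single simulation step. The explicit Euler integration, variable names, and sign conventions here are my assumptions, not the actual LabVIEW code:

```python
def step_17o(target_pos_lagged, target_vel_forecast,
             out_pos, out_vel, chase_ref,
             tol, k_pos, k_vel, dt):
    """One step of the 17o structure as described in the text."""
    pos_err = target_pos_lagged - out_pos
    if abs(pos_err) > tol:                      # position error is thresholded...
        chase_ref += k_pos * pos_err * dt       # ...then integrated x gain -> chase reference
    chase_rate = out_vel - target_vel_forecast  # forecast velocity vs. output velocity
    chase_err = chase_ref - chase_rate
    out_vel += k_vel * chase_err * dt           # chase error (not thresholded), integrated x gain
    out_pos += out_vel * dt                     # output velocity generates output position
    return out_pos, out_vel, chase_ref
```

The point of the sketch is the two-level structure: position error feeds the chase reference through a threshold, while the velocity loop below it is never thresholded.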


Here are the Labview programs for 17q and 18b (screenshots from
ROBOLAB)
http://www.lego.com/eng/education/mindstorms/home.asp?pagename=robolab.

18b
  The target position is perceived with a lag (parameter)
  The target velocity is perceived with a "predictor lag" (parameter)
    and used to compute the target's present position
  The computed target position is compared with the current output value

[From Bill Powers (2007.05.24.0240 MDT)]

Martin Taylor 2007.05.23.23.52 --

Problem 1:

If the target velocity is perceived with a lag, it is not the current
target velocity that is perceived but the velocity that existed
“lag” seconds ago. To compute the present target position you
must extrapolate the lagged velocity to present time, then take the
average of present and past velocities, and multiply that by the duration
of the position lag. This actually introduces acceleration since it
requires information about velocity changes. If you assume constant
velocity over the lag time, the model will be a little less
accurate.
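The two extrapolation schemes Bill contrasts might be sketched like this (Python; the names are mine, and the estimate of present velocity is assumed to be supplied by some separate velocity extrapolation):

```python
def extrapolate_position(pos_lagged, vel_lagged, vel_now_est, lag):
    """Average the past and (extrapolated) present velocities over the
    lag; using the velocity change implicitly brings in acceleration."""
    avg_vel = 0.5 * (vel_lagged + vel_now_est)
    return pos_lagged + avg_vel * lag

def extrapolate_constant_velocity(pos_lagged, vel_lagged, lag):
    """Simpler assumption: velocity constant over the lag (a little less
    accurate when the target accelerates)."""
    return pos_lagged + vel_lagged * lag
```

For a target moving at constant acceleration, the averaged-velocity form is exact while the constant-velocity form lags behind, which is the accuracy difference the paragraph describes.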

However you handle this, the control system must be provided with a means
for sensing the duration of these lags, so they can be used in the
perceptual computations. This is different from including the lag in a
model of the system, because now the control system itself is given the
ability to use the duration of the lag in its perceptual computations in
order to make the prediction. In our other models, lags are present but
the system does not know what they are or even that they exist. In my
models, the system can’t compensate for the lag because it has no
information that there is a lag. In your model, it evidently does
have a way of obtaining this information. Or that information is
simply given to the model by the programmer.

Problem 2:

Your use of the term “output” for the cursor position conforms
with current engineering practice, but not with PCT, because in general
the cursor position does not correspond to the actual output of the
system, the hand or mouse position. One test of the model once it is
matched to performance is to introduce a separate direct disturbance of
cursor position, adding to the influence of mouse position. As soon as
that disturbance is brought in, the cursor position differs from the
mouse position by some randomly-determined amount. The model should still
predict performance about as well as before (and in my experience,
does).

Problem 3:

This is an older problem. My question about these data has always been
how well the initial model matched a baseline for performance. Can you
give some numbers for what the tracking errors were, as a percentage of
the range of target movements, and what the difference between model
“output” and real “output” (prediction error) was, in
the same terms? Needed are these numbers before the experimental
manipulation, and after it.

For comparison, a typical medium-difficulty result obtained with my
tracking program and model is 7% tracking error, and 3% prediction error,
both as RMS values of error divided by peak-to-peak target range over a
single one-minute run. These of course would be compared with your
baseline results for well-practiced subjects. In general the prediction
error I obtain is about half the tracking error at all levels of
difficulty but the easiest, amounts that are two or three standard
deviations apart.
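The normalization Bill describes (RMS difference divided by peak-to-peak target range) can be sketched as follows; the function name is mine:

```python
import math

def pct_rms_error(trace_a, trace_b, target):
    """RMS difference between two traces, as a percentage of the
    peak-to-peak range of the target trace."""
    n = len(trace_a)
    rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(trace_a, trace_b)) / n)
    return 100.0 * rms / (max(target) - min(target))

# tracking error: cursor vs. target;  prediction (fit) error: model output vs. real output
```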

Best,

Bill P.

[Martin Taylor 2007.05.24.08.57]

[From Bill Powers (2007.05.24.0240 MDT)]

Martin Taylor 2007.05.23.23.52 --

18b The target position is perceived with a lag (parameter)
The target velocity is perceived with a "predictor lag" (parameter) and used to compute the target's present position
The computed target position is compared with the current output value

Problem 1:
If the target velocity is perceived with a lag, it is not the current target velocity that is perceived but the velocity that existed "lag" seconds ago. To compute the present target position you must extrapolate the lagged velocity to present time, then take the average of present and past velocities, and multiply that by the duration of the position lag. This actually introduces acceleration since it requires information about velocity changes. If you assume constant velocity over the lag time, the model will be a little less accurate.

Will control less well, perhaps. But without comparing it to data, you can't say it's less accurate.

However you handle this, the control system must be provided with a means for sensing the duration of these lags, so they can be used in the perceptual computations. This is different from including the lag in a model of the system, because now the control system itself is given the ability to use the duration of the lag in its perceptual computations in order to make the prediction. In our other models, lags are present but the system does not know what they are or even that they exist. In my models, the system can't compensate for the lag because it has no information that there is a lag. In your model, it evidently does have a way of obtaining this information. Or that information is simply given to the model by the programmer.

You can make an easy theoretical case for arguing that any real-world control system must have a way of adjusting itself to compensate for the lags that exist between the initiation of output and the effect on the perceptual signal. The mechanism of that adjustment might be by hill-climbing alteration of parameter values, it might be by random reorganization, it might be by the use of internal control loops that perceive the precision of the externally acting control loop. Whatever the mechanism, organisms that compensate for the real-world lags are likely to outcompete evolutionarily ones that don't.
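One of the adjustment mechanisms Martin names, hill-climbing alteration of a parameter value, can be illustrated in a few lines. This is purely a sketch of the idea (the loss function and all names are mine), not any mechanism from the models under discussion:

```python
import random

def hill_climb(loss, param, step=0.1, iters=500, seed=0):
    """Perturb a parameter randomly and keep any change that reduces a
    loss (e.g. RMS tracking error) -- one conceivable way a controller
    could come to compensate for a real-world lag."""
    rng = random.Random(seed)
    best = loss(param)
    for _ in range(iters):
        trial = param + rng.uniform(-step, step)
        val = loss(trial)
        if val < best:
            param, best = trial, val
    return param
```

Random reorganization and internal precision-perceiving loops, the other mechanisms mentioned, would differ in how the trial changes are generated and evaluated, but share this keep-what-works structure.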

Problem 2:
Your use of the term "output" for the cursor position conforms with current engineering practice, but not with PCT, because in general the cursor position does not correspond to the actual output of the system, the hand or mouse position. One test of the model once it is matched to performance is to introduce a separate direct disturbance of cursor position, adding to the influence of mouse position. As soon as that disturbance is brought in, the cursor position differs from the mouse position by some randomly-determined amount. The model should still predict performance about as well as before (and in my experience, does).

The original experimental plan did have conditions that included disturbances to the cursor position, and the software contained a parameter to set its magnitude. However, the allotted experimental time didn't allow all the conditions I wanted to test, and particularly in the "hard" condition with two variables that had to be added mentally, adding any reasonable disturbance to the cursor made the task too hard for our humans (including me). So it wound up being a simple pursuit tracking task.

Problem 3:
This is an older problem. My question about these data has always been how well the initial model matched a baseline for performance. Can you give some numbers for what the tracking errors were, as a percentage of the range of target movements, and what the difference between model "output" and real "output" (prediction error) was, in the same terms? Needed are these numbers before the experimental manipulation, and after it.

The "experimental manipulation" is depriving the people of sleep. Of course we have data from early trials, and from after they had a refresher sleep at the end. If that's what you want, I can send all the Excel spreadsheets of the trends in the goodness of fit index.

Here we get into the discussion on fitting models to data, which I said earlier I don't want to get into before I go away, because the discussion (I predict) will take more time than I have available. What I will say, as I have said many times before, is that RMS error is a very poor way of describing the fit, especially when you are comparing the ways different models miss when matched to human data. You must, at the very least, compare the model's tracking error, the human's tracking error, and the model-to-human error in a triangular fashion. It's absolutely useless, if the human has a 3% error, to say that the model fits within 3%. It could do that by tracking perfectly, using mechanisms quite different from the mechanisms used by the human. Your 7% tracking error and 3% prediction error gives two sides of the triangle, so that's better. But what would be the measure of a more human-similar model? Would a model that gave 9% tracking error and 2.5% prediction error be better? The contours of the "equally likely to be human-like" performance values in this space are not at all clear to me.
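The triangular comparison Martin asks for is easy to state in code (a sketch, names mine):

```python
import math

def rms(a, b):
    """Root-mean-square difference of two equal-length traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def fit_triangle(target, human, model):
    """The three sides to compare.  One side alone can mislead: a model
    that tracks the target perfectly shows a model-to-human error equal
    to the human's own tracking error while sharing none of the human's
    mechanisms."""
    return {
        "human_tracking": rms(human, target),
        "model_tracking": rms(model, target),
        "model_to_human": rms(model, human),
    }
```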

Add to that the issue of microsleeps and times when the subject is obviously not tracking (e.g. rapidly moving the mouse full range from side to side), and you have a distinct problem with RMS. I used a goodness of fit index that I can describe with a LabView program, and that I can justify, though I do not think it is as good as I would like. It attempts to get around the problem of incorporating non-tracking periods without falsifying the results for the periods when the subject was trying to track -- perhaps not very successfully. I don't think it's possible to make a clear separation, but I think it necessary to try to make as good a distinction as possible.

Martin

What I will say, as I have said
many times before, is that RMS error is a very poor way of describing the
fit, especially when you are comparing the ways different models miss
when matched to human data. You must, at the very least, compare the
model’s tracking error, the human’s tracking error, and the
model-to-human error in a triangular fashion. It’s absolutely useless, if
the human has a 3% error, to say that the model fits within 3%. It could
do that by tracking perfectly, using mechanisms quite different from the
mechanisms used by the human. Your 7% tracking error and 3% prediction
error gives two sides of the triangle, so that’s better. But what would
be the measure of a more human-similar model? Would a model that gave 9%
tracking error and 2.5% prediction error be better? The contours of the
“equally likely to be human-like” performance values in this
space are not at all clear to me.
[From Bill Powers (2007.05.24.0905 MDT)]

Martin Taylor 2007.05.24.08.57 –

Here is the result of a run that I just finished doing and
analyzing:

The black trace shows the fit error, the red trace is the model’s
mouse/cursor movement, and the blue trace is the real person’s
mouse/cursor movement. The model’s parameters are shown at the top. This
view shows the entire one minute of the run (the program allows expanding
the trace to see it in the full time resolution of one pixel per 60th of
a second).

The tracking error (obtained from the raw run data) is 7.4% and the fit
error (obtained from the data plotted above) is 3.3%, both being an RMS
difference divided by the peak-to-peak range of target position
variations. The RMS tracking error measures the difference between the
real cursor positions and the real target positions, so it shows how well
the task was being performed. The RMS fit error measures how well the
model’s mouse positions match the real mouse positions. Since the model
fits the real behavior with an error of about half the difference between
the real cursor and the target, this shows that the model is not simply a
perfect controller. I think RMS error is an excellent measure of the task
performance and of the fit of the model to the real behavior. Of
course the eyeball view of the comparison of model and real behavior
shows that this fit is just what it seems: very good.

Can you show a similar plot of the performance of your model in
comparison with a subject’s performance?

This is what I would call a baseline record, showing how one person
performs when given plenty of practice, and when rested and free of
unusual problems. The model fits well in part because learning has
reached an asymptote and the properties of the tracking system are not
changing during the one-minute run, so a single set of parameters works
for the entire run.

Starting with the above data, we could then look at the effects of
various experimental manipulations such as sleep deprivation, drugs,
visual impairment, physical loads, and so on. Presumably performance
would start changing under various treatments, and the best-fit model
parameters would start to change.

As you can see, there is some suggestion of regularity in the black
trace, showing that my performance has a high-frequency component that (I
know) is lacking in the model. This probably reflects the “essential
tremor” that appears when I perform manual tasks, an inherited
condition. By examining the difference data it would be possible to
determine whether there are other systematic components that might be
explained by adding details to the model. However, since even under this
relatively high degree of difficulty that results in over 7% tracking
error (which is a lot), roughly 93% of the variance is accounted for,
there is not much room for improving the model by adding details. The
accuracy with which any remaining parameters could be measured would be
very low. We would quickly run into the “fine slicing” problems
described by Phil Runkel in “Casting Nets…”.

Best,

Bill P.

[Martin Taylor 2007.05.24.17.19]

[From Bill Powers (2007.05.24.0905 MDT)]

I think RMS error is an excellent measure of the task performance and of the fit of the model to the real behavior.

I know you do, which is why I expect the technical discussion on this matter to take more time and effort than I have available. We've started such discussions before, but never come to a resolution -- at least not one that satisfied both of us.

Can you show a similar plot of the performance of your model in comparison with a subject's performance?

I think I might be able to. I'll have to see. If it turns out to take much effort, I won't, at least until I get home again. I didn't create plots like that for my study. I made lots of different kinds of plots to help me guess where the models might be improved. Maybe there was one like that. If so, it should run under RoboLab and then I could take a screenshot.

This is what I would call a baseline record, showing how one person performs when given plenty of practice, and when rested and free of unusual problems. The model fits well in part because learning has reached an asymptote and the properties of the tracking system are not changing during the one-minute run, so a single set of parameters works for the entire run.

I don't have that luxury! I have trackers doing an unfamiliar task that they continue to learn over the course of the study, and who quite obviously don't even try to track for some parts of some runs.

However, since even under this relatively high degree of difficulty that results in over 7% tracking error (which is a lot), roughly 93% of the variance is accounted for, there is not much room for improving the model by adding details.

That is the statement with which I really take issue! But not now. When I get back in July.

The accuracy with which any remaining parameters could be measured would be very low. We would quickly run into the "fine slicing" problems described by Phil Runkel in "Casting Nets...".

Yes, this is a very real danger. It's very easy to fall into that trap. The issue is to do what one legitimately can. The same problem -- over-use of data -- arises in all sorts of areas of pattern recognition (which this really is).

Martin

However, since even under this
relatively high degree of difficulty that results in over 7% tracking
error (which is a lot), roughly 93% of the variance is accounted for,
there is not much room for improving the model by adding
details.

That is the statement with which I really take issue! But not now. When I
get back in July.
[From Bill Powers (2007.05.25.0635 MDT)]

Martin Taylor 2007.05.24.17.19 –

I misspoke. The tracking error is 7+% but the model matches the real data
with 3.5%, so that's about 96.5% of the variance accounted for, though
the calculation isn't really correct in its use of the term
variance.
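The distinction Bill concedes here can be made concrete. A sketch (my formulation, not his program) contrasting the quick figure with a proper variance-accounted-for computation:

```python
def naive_pct_accounted(pct_fit_error):
    """The quick figure: 100 minus the percent fit error."""
    return 100.0 - pct_fit_error

def variance_accounted_for(data, model):
    """The statistically proper figure:
    100 * (1 - var(residual) / var(data))."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    resid = sum((x - y) ** 2 for x, y in zip(data, model)) / n
    return 100.0 * (1.0 - resid / var)
```

The two figures differ because the quick one subtracts an error that is normalized to peak-to-peak range, while variance accounted for squares residuals relative to the data's own variance.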

As to the possibility that different models would fit the same data, of
course that’s true as a possibility, but you actually have to come up
with a different model that fits the data just as well to turn that into
a probability. Adding a first derivative of target movement to an
existing model might give a very slight improvement, but one has to ask
whether it’s reliable enough to be used as a model of an individual. I
already have accepted that people do some predicting, but they do a heck
of a lot more controlling without predicting, and most of the control we
see is not supplied by the predictions even when they’re occurring.

In the data I posted, adding prediction couldn’t improve the model much
since it’s already predicting behavior almost perfectly. Actually, I
think we’ll get more improvement by taking out the systematic
oscillations in a fairly narrow frequency band, although that wouldn’t
improve our ability to predict the behavior (the oscillations aren’t
systematically related to anything external). Also, there’s some evidence
of slip-stick friction in the mouse, and that could be modeled too to get
maybe another tiny bit of improvement. By that time the residual system
noise would have grown relative to the remaining errors to the point
where it would overwhelm any further systematic improvements that might
be tried. It's already very large.

The main thing to remember is that the tracking error we see sets the
limits on how much prediction we can put into the model. We can perhaps
make the model fit the data slightly better by adding target prediction,
but we mustn’t make the model track better than the real person did. And
we’re within 3.5% of doing that even without prediction, with a good
fraction of the remaining error being unrelated to target movements. How
much improvement could we expect from adding prediction?

Come to think of it, perhaps one fruit of this discussion is an idea now
taking shape here about how to deal with the residuals. Maybe you know
of more powerful methods, but it strikes me that simply calculating the
correlation between the prediction errors and the target or cursor
movements would tell us whether there is still some way to improve the
fit of the model to the data. We could try leading and lagging
correlations, and correlations of derivatives, for a start. Also, we
could take Fourier transforms to find periodicities of the kind I’ve
referred to. These ways of looking for regularities in the residuals
might then give us some ideas of how to modify the model, or else tell us
that the unsystematic noise accounts for most of the remaining
error.
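The residual checks Bill proposes -- leading and lagging correlations, and a Fourier look for periodicities -- might be sketched in plain Python as follows (all names mine; in practice numpy would do this faster):

```python
import cmath
import math

def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_correlations(resid, signal, max_lag):
    """Correlate fit residuals against target (or cursor) movement at
    leading and lagging shifts; a peak away from zero lag suggests a
    timing regularity the model misses."""
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = resid[lag:], signal[:len(signal) - lag]
        else:
            a, b = resid[:lag], signal[-lag:]
        out[lag] = correlation(a, b)
    return out

def dft_magnitudes(resid):
    """Naive DFT magnitude spectrum of the residuals, for spotting
    narrow-band periodicities such as a tremor component."""
    n = len(resid)
    return [abs(sum(r * cmath.exp(-2j * math.pi * k * t / n)
                    for t, r in enumerate(resid)))
            for k in range(n // 2 + 1)]
```

If both the lagged correlations and the spectrum show nothing systematic, that would support the conclusion that unsystematic noise accounts for most of the remaining error.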

From a larger viewpoint, however, one has to ask how much return on
investment we can expect from trying to squeeze the last percentage point
out of the data, when so much territory remains uninvestigated even to a
precision of 10%.

It’s now 7:06 AM here, and at 8:30 I pick up my daughter Allie and
granddaughter Sarah to drive to Durango, where tomorrow my grandson Derek
graduates from high school. I'll not be on the net, as far as I know
now, until Monday, so farewell for now.

Best,

Bill P.
