sleep study; Minitab results

[From Bill Powers (950117.1215 MST)]

Martin Taylor (950116.1320) --

     (2) Theoretical: what should be expected when the subjects are not
     "at their best"? Clearly at all times they are controlling far
     more perceptions than just the one in the experiment, but unless
     they relinquish control of the experimental perception entirely, a
     good model of that control should still be a good model, albeit
     with altered parameter values. Do you think that it would be
     better to try to fit the run with parameter values that change
     second by second, or to simply assume that the model doesn't fit
     except where it does, and say that the data can be separated into
     two classes: "bad data" and "data that fit the model"?

The model we use makes the assumption that the parameters of the human
control system and the reference signal are constant during a one-minute
run.

With a well-practiced subject who understands the nature of the variable
that is to be controlled, these assumptions appear to be valid, because
a model with constant parameters and reference signal fits the data very
accurately in every such case. This is not tested in the current set of
data because there were no well-practiced subjects.

Fitting the model with parameter values that change second by second
will produce an excellent, but trivial fit: in the limit, we would
simply calculate a new parameter value for every iteration, and the fit
would be perfect, but meaningless. A few years ago Ray Pavloski made
this same mistake, calculating the "k" factor from sets of three or four
points throughout the run. He was getting model-to-real correlations of
0.9999. Of course.
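
For anyone who wants to see the standard procedure spelled out, here is
a rough sketch in Python of the kind of fit involved (not the analysis
program I actually use; the data layout, sampling rate, and delay value
are assumptions):

import numpy as np

def simulate(k, disturbance, reference=0.0, delay=8, dt=1.0/60.0):
    # Integrating-output model with one transport delay:
    #   cursor = handle + disturbance
    #   d(handle)/dt = k * (reference - delayed cursor)
    n = len(disturbance)
    handle = np.zeros(n)
    cursor = np.zeros(n)
    for t in range(1, n):
        perceived = cursor[t - delay] if t >= delay else 0.0
        handle[t] = handle[t - 1] + k * (reference - perceived) * dt
        cursor[t] = handle[t] + disturbance[t]
    return handle

def fit_constant_k(disturbance, real_handle):
    # One constant k for the whole one-minute run, chosen by grid search
    # to minimize the RMS difference between model and real handle traces.
    k_values = np.linspace(0.5, 50.0, 200)
    rms = [np.sqrt(np.mean((simulate(k, disturbance) - real_handle) ** 2))
           for k in k_values]
    return k_values[int(np.argmin(rms))]

Refit k over every few samples instead of over the whole run and the
"fit" becomes perfect by construction, which is exactly what happened in
the case above.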

I think the criterion for saying that the model fits the data is that
the correlation is high. 0.95 seems a reasonable cutoff. I have now re-
analyzed the raw data from the first 7 of your 42 disks, for the 101 to
161 series, and have extracted the header information into separate
files. I'm now working on some analysis programs to give us an idea of
how the experiments went. The first cut simply shows, for each
condition, the percentage of trials in which the model-data correlation
was less than 0.95, over all runs (regardless of the subject's
condition). It produced this matrix for a total of 2409 runs, or about
400 per task (excluding the F task, which was a two-variable task and
was not analyzed):

   PERCENTAGE OF TRIALS WITH CORRELATIONS < 0.95

                          task
   dist      A        B        C        D        E
     1     18.52    60.00    54.66    61.49    60.90
     2     16.05    21.79    20.00    25.93    18.75
     3     50.62    33.75    70.51    40.74    44.30
     4     62.82    93.83    74.07    63.75    78.05
     5     55.00   100.00    64.20    69.41    78.75
     6     60.00    54.66    61.49    60.90   100.00

As can be seen, for disturbance patterns 4, 5, and 6, a large majority of
runs failed the criterion. This shows that the subjects did not behave
in a way that can be described accurately (by our criterion) by a
control model with an integrating output and a single delay, with
constant parameters and reference signal, during a one-minute run.
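
(The tally behind that matrix is nothing fancy; in Python it amounts to
something like the following, where the run-record fields are my
assumptions about how the header files are organized.)

from collections import defaultdict

def percent_failing(runs, cutoff=0.95):
    # runs: records with "dist", "task", and "corr" (model-data correlation)
    counts = defaultdict(lambda: [0, 0])      # (dist, task) -> [failed, total]
    for run in runs:
        cell = counts[(run["dist"], run["task"])]
        cell[1] += 1
        if run["corr"] < cutoff:
            cell[0] += 1
    return {key: 100.0 * failed / total
            for key, (failed, total) in counts.items()}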

For the runs showing 0.95 or better correlations (100% minus the above
entries), there are instances of both good and bad tracking (large and
small tracking errors). For these cases the model was reproducing runs
with bad tracking as well as runs with good tracking, with a correlation
of 0.95 or better.

One indication of a well-learned task is a small RMS tracking error
relative to the peak-to-peak amplitude of the disturbance. The following
table shows the average tracking error for each task-disturbance
combination. These numbers are an average over one entire week and over
all subjects, so they contain both low and high tracking errors.

                AVERAGE TRACKING ERROR
        (fraction of peak-to-peak disturbance)

                          task
   dist      A        B        C        D        E
     1      0.03     0.22     0.22     0.21     0.25
     2      0.03     0.03     0.04     0.10     0.03
     3      0.15     0.10     0.25     0.13     0.13
     4      0.10     0.18     0.14     0.14     0.17
     5      0.06     0.15     0.10     0.15     0.15
     6      0.22     0.22     0.21     0.25     0.39

At a tracking error of 0.1, the subject feels that the task is extremely
difficult. For higher amounts of tracking error, the subject feels that
control is being lost. As we can see above, only for disturbance pattern
2 is there reasonably good control, on the average, over all tasks. The
pattern of tracking error is somewhat similar to the pattern for
correlations, above.
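
For reference, the tracking-error number in the table is the RMS error
taken relative to the peak-to-peak amplitude of the disturbance; a
sketch of that calculation (my shorthand, with assumed array names):

import numpy as np

def tracking_error(cursor, target, disturbance):
    # RMS tracking error as a fraction of the peak-to-peak disturbance.
    # For a compensatory task the target is simply zero.
    rms_error = np.sqrt(np.mean((cursor - target) ** 2))
    peak_to_peak = disturbance.max() - disturbance.min()
    return rms_error / peak_to_peak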


---------------------------------
In a properly-done study, all subjects would be run to asymptote on all
tasks prior to introducing changes in the conditions. Failure to reach
asymptote (that is, large variances in performance) would indicate,
theoretically, that the errors are large enough so that reorganization
never stops. When that happens, the difficulty of the most difficult
disturbances should be reduced until the variance settles down. When
those conditions are achieved, the model should fit essentially all data
with a correlation of 0.95 or higher, for both easy and difficult
disturbances (although not with exactly the same parameters), prior to
starting the manipulations of conditions.

When a manipulation of conditions causes a drop in the correlation of
the model with behavior, the reason is probably that the parameters or
the reference signals are changing rapidly enough so they can't be
considered constant during a whole one-minute run. This can be checked
by doing first-half, last-half analyses -- or in many cases by eyeball
inspection of the data for obvious anomalies.
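
A sketch of the first-half, last-half check, with the fitting routine
left as a parameter (fit_run stands for whatever fit is in use, e.g. the
constant-k grid search sketched earlier):

def split_half_fit(disturbance, real_handle, fit_run):
    # Fit the same model separately to each half of the run.  A large
    # difference between the two fitted values suggests the parameters
    # cannot be treated as constant over the whole run.
    mid = len(real_handle) // 2
    k_first = fit_run(disturbance[:mid], real_handle[:mid])
    k_last = fit_run(disturbance[mid:], real_handle[mid:])
    return k_first, k_last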
-----------------------------------
I hope you will give a little more thought to

     Clearly at all times they are controlling far more perceptions than
     just the one in the experiment, but unless they relinquish control
     of the experimental perception entirely, a good model of that
     control should still be a good model, albeit with altered parameter
     values.

This is not true at all. If reorganization is still going on, we can't
expect to get good fits of any model to the behavior. If the subject's
reference signals are changing during a run (they are always being set
by higher systems), likewise. And if the subject is falling asleep or
playing around with the mouse, the whole control process will start
changing more or less at random.

This is not an either-or situation. Subjects show a continuum of degrees
of control, depending on what else is going on and on their
physiological state. A model that represents running behavior perfectly
is not going to work for swimming or high-jumping or sleeping.

There is nothing that a model "should" do: the point is to find out how
well the model works, and when it fails, why it fails. In many of the
runs, it is obvious from the data traces why the model has failed: the
person is going in and out of control during the run, so no model with
constant parameters could possibly reproduce those changes. The whole
person is much more complex than the one simple control system we try to
analyze; when conditions are such as to upset the whole person, or to
make the task appear impossible, we should not be surprised that the
system we are looking at changes. Perhaps some day we will have a multi-
leveled model of sufficient complexity to explain some of those changes.
But that time is not now.
-------------------------------
I will continue with my analysis, using the data where the model does
fit the behavior and looking for regular changes of parameters and
reference signals that are related to the drug-taking periods and the
increasing sleeplessness. Since the experiment was poorly designed from
the start, finding anything of significance will be difficult, but I
will give it a good try.
---------------------------------
     There must be approaches to a control model that does fit
     consistently well, with parameter changes, and I would have thought
     that you guys would have found some that you could tell me, over
     all the years you've been doing these experiments.

We have not done much of anything with multi-leveled models. Parameters
do not change all by themselves; they are changed by something: by the
operation of some other system that adjusts the parameters of the system
we are modeling.

I had hoped to get enough data from this experiment to explore a model
in which the k factor changed with magnitude of error, so that a simple
nonlinearity in the output might make a single model with constant
parameters apply over a range of degrees of difficulty. But there is
simply not enough data from which to develop such a model. With only two
degrees of difficulty, the higher being much too hard, and with
important conditions starting to change after only four runs or so,
there is simply no way to use the data as I would have wished. It wasn't
my experiment, of course, but I'm still disappointed.
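
For what it is worth, the kind of model I had in mind looks something
like this (a sketch only; the linear dependence of gain on error
magnitude is just an illustration, not a claim about the right form):

import numpy as np

def simulate_nonlinear(k0, k1, disturbance, reference=0.0, delay=8, dt=1.0/60.0):
    # Same integrating-output model as before, except that the output gain
    # grows with the magnitude of the current error, so one parameter pair
    # (k0, k1) might cover both easy and hard disturbances.
    n = len(disturbance)
    handle = np.zeros(n)
    cursor = np.zeros(n)
    for t in range(1, n):
        error = reference - (cursor[t - delay] if t >= delay else 0.0)
        gain = k0 + k1 * abs(error)
        handle[t] = handle[t - 1] + gain * error * dt
        cursor[t] = handle[t] + disturbance[t]
    return handle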

If we want to be able to explain more complex behavior, we have to
establish and test accurate models of simple behaviors first. This needs
a much more systematic approach than in the sleep study. The sleep study
was an example of Big, Expensive, Sloppy Science. What we need is small,
economical, careful science.

As Bill Leach pointed out, if you want to understand any system you have
to study it in a normal condition. You don't learn much about VCRs if
the only ones you have ever seen are broken.
----------------------------------------------------------------------
Bruce Abbott (950116.1730 EST) --

What a magnificent panorama (or should I say zoo?) of statistical
analysis. I think you're going to be ready for anything that your
colleagues can throw at you.

I second Rick's motion that you do the kind of analysis that tries to
sort out "contributions to variance." Obviously, the two non-controlled
cursors are going to account for very nearly all of the variance in
handle position (or vice versa), with the controlled cursor contributing
essentially nothing. I've maintained that standard practices are
perfectly designed to reject controlled variables from consideration,
and you've illustrated this perfectly. But it would be nice to have the
numbers -- say a factor analysis (being statistically deprived, I hope
I'm not talking gibberish).
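
What I have in mind could be as plain as ordinary multiple regression
rather than a real factor analysis; a rough sketch (the array names are
assumptions):

import numpy as np

def r_squared(predictors, y):
    # Proportion of variance in y accounted for by a least-squares fit
    # to the given predictor columns (plus an intercept).
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return 1.0 - residuals.var() / y.var()

def cursor_contributions(handle, cursors):
    # cursors: the three cursor traces.  Compare R-squared using all of
    # them with R-squared when each one is left out; the drop is that
    # cursor's "contribution" to the variance of handle position.
    full = r_squared(cursors, handle)
    drops = [full - r_squared([c for j, c in enumerate(cursors) if j != i],
                              handle)
             for i in range(len(cursors))]
    return full, drops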

The question of how to handle the disturbances is still up in the air.
Since controlled variables are normally not recognized, the effects of
disturbances on them are also not normally recognized (especially since
they have hardly any effect). Anyway, most causes of perturbations are
not visible even to the experimenter. So what should we do? Leave the
disturbances hidden? Or present them as part of the situation to be
explained? One compromise that is realistic would be simply to present
the disturbance numbers without mentioning how they affect the cursors.
In that case, the analysts would almost certainly conclude that the
disturbances are the stimuli (even if they can't be perceived), and that
the cursors on the screen don't have anything to do with the behavior
(or at least that the controlled one doesn't).

One way to make it almost impossible to guess what the connection is
would be to use five or six disturbance tables, and link them to the
three cursors by various paths -- for example, the effective disturbance
of C1 could be 0.5*D1 + 1.3*D3 - 1.0*D4, and so forth.
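
In code the linkage is just a small mixing matrix; a sketch (the weights
for the second and third cursors are made up for illustration):

import numpy as np

def effective_disturbances(tables, weights):
    # tables: array of shape (n_samples, n_tables), one column per
    # disturbance table.  weights: one row of coefficients per cursor.
    # Each cursor's effective disturbance is a weighted sum of the tables.
    return tables @ np.asarray(weights).T

# C1 disturbed by 0.5*D1 + 1.3*D3 - 1.0*D4 (D2 and D5 unused), with
# made-up mixes for C2 and C3:
weights = [[0.5, 0.0, 1.3, -1.0, 0.0],
           [0.0, 1.0, 0.0,  0.0, 0.7],
           [0.8, 0.0, 0.0,  0.5, 0.0]]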

But no, let's keep it simple. As I said, anyone who comes up with the
right solution is a recruit anyway.

RE: the residual plot.

I don't know what Minitab defines as "control." The residuals you show
are only about +/- 10 pixels (if that's what the numbers mean). To
evaluate the degree of control, you need to compare the residuals with
the ones obtained when the output is not working, or with an
independently-generated random output. Under the hypothesis of no
control, you would expect the output action to add in quadrature to the
disturbance. This would lead to a certain expected RMS amplitude of the
cursor movement. The hypothesis of no control is disproven by showing
that the actual cursor movement is significantly less than it would have
been without control. See the "stability factor" in my "Spadework"
article.
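
A sketch of that comparison (roughly the idea behind the stability
factor; see the article for the exact definition):

import numpy as np

def control_ratio(cursor, disturbance, handle):
    # If handle output were unrelated to the disturbance, the two would
    # add in quadrature, so the expected cursor RMS without control is
    # sqrt(rms_d**2 + rms_o**2).  A ratio much less than 1 means the
    # actual cursor variation is far smaller than that expectation.
    rms = lambda x: np.sqrt(np.mean((x - np.mean(x)) ** 2))
    expected = np.sqrt(rms(disturbance) ** 2 + rms(handle) ** 2)
    return rms(cursor) / expected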

RE: Normal quantile plot.

This is an important measure of the residuals. If the model were leaving
out some important systematic aspect of the behavior, we would expect
some systematic pattern of errors. But if, as you conclude, the
distribution of residuals follows the normal curve of error, it's
reasonable to say that the control system is being limited by random
errors, and refinements of the model are not likely to produce much
improvement. Basically, this plot says we're about as close as we're
going to get in modeling this simple kind of behavior. Nice to know.
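
The same check can be run on any set of residuals; a sketch (assuming
scipy and matplotlib are available):

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def residual_quantile_plot(real_handle, model_handle):
    # A roughly straight line on the normal quantile plot says the
    # leftover error looks like random noise rather than some systematic
    # behavior the model has missed.
    residuals = np.asarray(real_handle) - np.asarray(model_handle)
    stats.probplot(residuals, dist="norm", plot=plt)
    plt.title("Normal quantile plot of model residuals")
    plt.show()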

     The PCT analysis on these first 1800 observations yielded the same
     cursor-handle correlations as reported by Minitab. The model fit
     gave k = 0.0991 with an RMS error of 4.0 pixels and a correlation
     between model and data of 0.9978.

... And that's nice to know, too!
-----------------------------------------------------------------------
Got to go babysit grandchildren -- more later.

Best to all,

Bill P.