sleep study

[From Bill Powers (960923.1500 MDT)]

Martin Taylor 960920 17:00 --

Task     pursuit     compensatory  disk-on-circle  pendulum    Number-at-50
Drug     cor   Ang   cor   Ang     cor   Ang       cor   Ang   cor   Ang
Plac     0.95  0.95  0.87  0.68    0.93  0.79      0.68  0.82  0.93  0.79
Amph     0.96  0.95  0.88  0.67    0.94  0.79      0.70  0.78  0.94  0.79
Modaf    0.96  0.95  0.88  0.70    0.93  0.82      0.67  0.80  0.93  0.82

If my interpretation of the criteria is anywhere near right, what this
says is that the model is about as good as one could hope for in simple
pursuit tracking. (By the way, these results are only for the G and U
disturbances.) However, for the other tasks, even when the correlation is
pretty good, there might nevertheless be some structural differences
between the model and what the human actually does. This is particularly
true of the pendulum task, where the correlation is poor ...

I finally found a version of viewdata that I hadn't modified, and did a run
with the "pendulum" experiment. In fact, I did three runs:

Model vs real correlation: 0.9814 0.9882 0.9784
Real vs disturbance corr: -0.9584 -0.9660 -0.8906

The disturbance difficulty was 4 for the first two and 6 for the last one.

These numbers are displayed in the upper right corner of the viewdata
screen. The "real vs disturbance" correlation would be -1.0 if tracking were
perfect. So this correlation gives an idea of the quality of the control
under the disturbance being used. The "Model vs real" correlation shows how
closely the model matched the real behavior.
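In code form, the two statistics amount to nothing more than Pearson
correlations over the logged run. A minimal sketch, not the original
viewdata code; the array names are hypothetical:

import numpy as np

def pearson(a, b):
    # plain Pearson correlation between two equal-length series
    return float(np.corrcoef(a, b)[0, 1])

# Given per-frame arrays logged during a run (hypothetical names):
#   handle:       the subject's handle positions
#   model_handle: the fitted model's handle positions for the same run
#   disturbance:  the disturbance waveform applied during the run
# the two on-screen numbers would be:
#   pearson(model_handle, handle)   # "Model vs real correlation"
#   pearson(handle, disturbance)    # "Real vs disturbance corr";
#                                   # -1.0 if the handle mirrored the
#                                   # disturbance exactly (perfect control)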

These numbers are similar to your measures, in that they show the model
matching the real behavior better than the real behavior matches that of a
perfect control system. So this is not just "generic control" -- the model
specifically fits the imperfect control of the real person, me.

Note the difference between the fit of the model to the performance of a
properly practiced subject and the fit to the performance of your subjects.
The model fit my behavior with a correlation of 0.98; it fit the performance
of your subjects with a correlation of 0.70 at best. I expect that the
second figure, the degree of opposition of the behavior to the disturbance,
was much lower for your subjects, too. A correlation of 0.98 implies an error
of fit of about 7% of the peak-to-peak excursion of the target. A
correlation of 0.70 implies an error of fit of about plus or minus 36% of
the peak-to-peak excursion (that is, plus or minus 72% of the maximum
deviation from the mean). Your subjects were hardly controlling at all on
this task.
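To make the arithmetic checkable: a linear fit with correlation r leaves a
fraction sqrt(1 - r^2) of the rms variation unexplained, and turning that
into a percentage of the peak-to-peak excursion requires an assumption about
the waveform. A rough numeric check under a sinusoidal assumption (one
plausible reading of the scaling, not a statement of the exact convention
used above):

import math

for r in (0.98, 0.70):
    unexplained = math.sqrt(1.0 - r * r)  # unexplained fraction of rms variation
    # For a roughly sinusoidal excursion of amplitude A:
    #   rms = A / sqrt(2),  peak-to-peak = 2 * A,
    # so rms error as a fraction of peak-to-peak is unexplained / (2*sqrt(2)).
    pct_pp = 100.0 * unexplained / (2.0 * math.sqrt(2.0))
    print(f"r = {r:.2f}: unexplained rms fraction {unexplained:.2f}, "
          f"about {pct_pp:.0f}% of peak-to-peak")

# r = 0.98 gives about 7% of peak-to-peak, matching the figure above.
# r = 0.70 gives about 25% on this convention; the 36% figure corresponds
# instead to reading sqrt(1 - r^2) = 0.71 directly as a fraction of the
# maximum deviation A (plus or minus ~72% of A, i.e. ~36% of peak-to-peak).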

I explained before the experiments began that the pendulum task was not what
it seemed. It is actually a simple compensatory tracking task. What makes it
seem difficult is that both the pendulum and the moving dot are moved in a
sine wave from side to side, the SAME sine wave. If there were no
disturbance they would swing exactly together. The disturbance simply makes
the pendulum move left or right of the position of the dot. The regular
swinging motion was a distraction, not a disturbance of the controlled variable.
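A minimal sketch of that display geometry (the constants and names are
invented for illustration, not taken from the experiment code):

import math

A = 100.0      # swing amplitude, in screen units (invented)
PERIOD = 4.0   # seconds per full swing (invented)

def frame_positions(t, disturbance, handle):
    # Both objects are driven by the SAME sine wave.
    swing = A * math.sin(2.0 * math.pi * t / PERIOD)
    dot_x = swing
    pendulum_x = swing + disturbance + handle
    return dot_x, pendulum_x

# The controlled variable is pendulum_x - dot_x = disturbance + handle:
# the shared swing cancels out, leaving a plain compensatory task. The
# visible swinging is a distraction, not a disturbance of that variable.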

If the subjects had all been trained to criterion, they would all have
understood the nature of this task, and would have performed as well as I
did (I haven't done these experiments for over a year, and the figures above
are for my first three trials).

On simple pursuit tracking, the model usually fits my behavior with a
correlation of 0.98 or better with a fairly difficult disturbance. That
implies about an 8% rms error between model and data. The correlation of
0.95 or 0.96 that you report seems hardly any lower, but it implies an error
of 10 to 12%. And your subjects were young men in their 20s and 30s, I
presume, while I am 70. If they had learned the task well, they should have
outperformed me and their behavior should have been more consistent than
mine. Particularly on the easier disturbance, the model fits to their
behavior on this task should have been 0.99+.

I don't really need to run all these tasks again. I always get model fits in
the high 0.90s, well above 0.95 even with a disturbance of 6, except for the
numbers task (which I find difficult). My k factor varies only by about 5%,
if that, for each task (although it is somewhat different on different
tasks). The model always fits my behavior better than my behavior matches
the disturbance. I am a practiced subject, always working near asymptote. If
you were to test me under conditions of sleeplessness, I suspect that my k
factor would change, and that the change would show up very clearly, because
the factor is so repeatable under normal conditions.
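For concreteness, here is a sketch of the kind of model and fit being
talked about: the standard PCT tracking model, in which handle velocity is
proportional (the k factor) to the delayed error. This is an illustrative
one-parameter version, not the actual fitting program:

import numpy as np

DT = 1.0 / 60.0   # assumed frame interval, seconds

def simulate(k, delay, target, disturbance):
    # Integrate handle velocity = k * error, with a transport lag of
    # 'delay' frames; compensatory display: cursor = handle + disturbance.
    n = len(disturbance)
    handle = np.zeros(n)
    for i in range(1, n):
        j = max(0, i - delay)
        error = target[j] - (handle[j] + disturbance[j])
        handle[i] = handle[i - 1] + k * error * DT
    return handle

def fit_k(real_handle, target, disturbance, delay=8):
    # Crude grid search for the k that minimizes rms model-vs-real error.
    best_k, best_rms = None, float("inf")
    for k in np.linspace(1.0, 20.0, 191):
        model = simulate(k, delay, target, disturbance)
        rms = float(np.sqrt(np.mean((model - real_handle) ** 2)))
        if rms < best_rms:
            best_k, best_rms = k, rms
    return best_k, best_rms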

That is the entire point of my critiques of the experiment as it was done.
My vision was to obtain stable performance from all subjects which the model
could match very closely, with errors of only perhaps 5 to 8 percent on the
difficult disturbances, and less on the easier ones. I have repeatedly shown
that this is easily possible for most people, given enough practice. This
would have meant that a change in the parameters for a SINGLE SUBJECT on a
SINGLE TASK of 5 to 10 percent would have been meaningful, so we could see
that some subjects begin to control worse sooner than others do, and that
drugs affect some of them more than others.
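A toy illustration of why that repeatability matters (all numbers invented):

# Baseline k values for a hypothetical well-practiced subject on one task:
baseline_k = [10.1, 9.8, 10.0, 10.2, 9.9]
mean_k = sum(baseline_k) / len(baseline_k)
spread = max(baseline_k) - min(baseline_k)   # ~4% of the mean here
new_k = 9.0                                  # a 10% drop on one deprived run
print(f"baseline {mean_k:.2f}, run-to-run spread {spread:.2f}; new run {new_k:.2f}")
# The drop exceeds the entire baseline range, so it is meaningful for this
# SINGLE SUBJECT on this SINGLE TASK, with no group averaging needed.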

I think you will agree that as matters worked out, this vision was doomed
from the start. By clever uses of statistics, averaging over runs and
conditions and subjects, you may well be able to discern the shadows of
regularities in the data, but they will be about the group, not about the
individual. They will be, in short, pretty much what we have come to expect
from psychological experiments.

I don't blame you for what I see as a failure of the project, and I don't
blame you for attempting to find information in the data that were obtained.
I just can't get very interested in the outcome any more.

Best,

Bill P.


---------------------------------------------------------------------------

[Martin Taylor 960926 13:45]

Bill Powers (960923.1500 MDT)

I finally found a version of viewdata that I hadn't modified, and did a run
with the "pendulum" experiment. In fact, I did three runs:

Model vs real correlation: 0.9814 0.9882 0.9784
Real vs disturbance corr: -0.9584 -0.9660 -0.8906

...

These numbers are similar to your measures, in that they show the model
matching the real behavior better than the real behavior matches that of a
perfect control system.

...

So, could you use your data to determine the "angle" criterion values for
these runs? That's what I was asking, and if your correlations for the
pendulum task are so much better than those for the sleep study subjects,
the results would be that much more valuable.

I explained before the experiments began that the pendulum task was not what
it seemed. It is actually a simple compensatory tracking task. What makes it
seem difficult is that both the pendulum and the moving dot are moved in a
sine wave from side to side, the SAME sine wave. If there were no
disturbance they would swing exactly together. The disturbance simply makes
the pendulum move left or right of the position of the dot. The regular
swinging motion was a distraction, not a disturbance of the controlled variable.

And quite a distraction it is, too. I'm sure you are right that the subjects
were not aware of what they would best have been trying to control, and
were controlling through some perceptual signal other than the simple
positional difference. That says only that a model simulating their control
as a loop operating on the positional difference will fail to correlate with
their actual performance. A model that simulates what they were actually
controlling might fit better.

On simple pursuit tracking, the model usually fits my behavior with a
correlation of 0.98 or better with a fairly difficult disturbance. That
implies about an 8% rms error between model and data. The correlation of
0.95 or 0.96 that you report seems hardly any lower, but it implies an error
of 10 to 12%.

Perhaps I shouldn't use a simple linear average, because the runs with
correlations around 0.99 are swamped by one or two "nasty" runs from the
depths of sleep deprivation. But for simplicity I did use linear
averaging.
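One standard alternative, for what it's worth, is to average through
Fisher's z-transform, which keeps a few near-0.99 runs from being swamped
by an occasional bad one (sample values invented):

import math

def fisher_mean(rs):
    zs = [math.atanh(r) for r in rs]       # r -> z
    return math.tanh(sum(zs) / len(zs))    # mean z -> back to r

runs = [0.99, 0.99, 0.98, 0.60]            # hypothetical run correlations
print("linear mean:", sum(runs) / len(runs))          # 0.89
print("Fisher mean:", round(fisher_mean(runs), 3))    # ~0.97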

However, the _key_ point is that the angle criterion dissociates from the
correlation criterion, and that the different tasks give clusters at different
places in the correlation vs angle plot. Now this is one case in which
it is perfectly legitimate to plot the results from a single tracking run
or even part of a run, and that is what I asked if you would mind doing,
since you have both the code to run the tasks and the code to run these
same analyses, AND are practiced at these particular control tasks. If
you don't have the three-parameter model fitting code, I'll send it to you.
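The plot itself is trivial to produce once the two criteria are in hand;
a sketch with placeholder numbers (not sleep-study data):

import matplotlib.pyplot as plt

# Hypothetical (correlation, angle) pairs, one per run, keyed by task:
runs = {
    "pursuit":      [(0.95, 0.95), (0.96, 0.94)],
    "compensatory": [(0.88, 0.67), (0.87, 0.69)],
    "pendulum":     [(0.69, 0.78), (0.70, 0.80)],
}
for task, points in runs.items():
    xs, ys = zip(*points)
    plt.scatter(xs, ys, label=task)
plt.xlabel("correlation criterion")
plt.ylabel("angle criterion")
plt.legend()
plt.show()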

I don't really need to run all these tasks again.

No, you don't, if you have the data available to compute the angle criterion
from tracks previously run. What I'd like to know is whether YOU, personally,
a well-trained and motivated tracker, get the same kind of dissociation
of the two criteria that the sleep subjects all seem to get (I have looked
only at a subset, but all the ones I have looked at do).

By clever uses of statistics, averaging over runs and
conditions and subjects, you may well be able to discern the shadows of
regularities in the data, but they will be about the group, not about the
individual.

Not true for averages within the individual.

I don't blame you for what I see as a failure of the project, and I don't
blame you for attempting to find information in the data that were obtained.
I just can't get very interested in the outcome any more.

This is, to me, a rather different issue. Whether you are interested in
the outcome of the sleep study in regard to the effects of sleep deprivation
and drug type is irrelevant to this question. What I am seeing in the sleep
results is now a pointer for further research, specifically a way of looking
to see whether well-fitting models (by the correlation criterion) are well
structured in addition to fitting well. A model that gives a correlation
fit of 0.995 could still have an angle fit of zero (unlikely, but
theoretically possible), and one with a correlation fit of 0.70 could
have an angle fit of 1.0 (again unlikely but possible).

In the sleep data, it seems that pursuit tracking has high correlation
_and_ angle fits, whereas compensatory tracking has a high correlation fit
and a low angle fit. Disk-on-circle and number-at-50 both fall in an
intermediate position. Is this true for you on these tasks?

Martin

[From Bill Powers (960927.0950 MDT)]

Martin Taylor 960926 13:45 --

So, could you use your data to determine the "angle" criterion values for
these runs? That's what I was asking, and if your correlations for the
pendulum task are so much better than those for the sleep study subjects,
the results would be that much more valuable.

I'll think about it; don't know when. As to trying more experimental runs
and applying your analysis, why don't you just do it? You can become
proficient at these tasks with practice, and answer your own questions. I
really have a lot to do, and have already put in a couple of hundred hours
on this project.

Best,

Bill P.

[Martin Taylor 960927 13:40]

Bill Powers (960927.0950 MDT)

So, could you use your data to determine the "angle" criterion values for
these runs? That's what I was asking, and if your correlations for the
pendulum task are so much better than those for the sleep study subjects,
the results would be that much more valuable.

I'll think about it; don't know when.

As to trying more experimental runs
and applying your analysis, why don't you just do it?

Not so easy. The experimental computers aren't available. I could probably
rewrite the programs to run on a Mac, or I might be able to persuade
someone to let me use their PC (but remember I'm not officially here any
more, so it would have to be out of the goodness of their heart). But as
you know, I'm not a great programmer, and I've never tried doing
time-sensitive stuff on a Mac. So it would be a big hassle.

More to the point, you made a statement that when you, Bill P., tracked
with the pendulum task yesterday, you got a model fit of 98% correlation
or thereabouts, which is much better than I ever got. You _did_ these
tracking runs, and you have the data and the analysis routines that would
let you compute the angle criterion for these very runs. All you have to
do is report it. Doing that should add perhaps half an hour to the
hundreds you have put in on the study.

The half hour you might put in would help to answer a significant
question about PCT, not about the sleep study.

The question is: Does the angle criterion, when used in conjunction with
the correlation criterion, allow some insight into whether a particular
proposed simulation model is structured the way that the human's control
is structured for a particular task?

To repeat: in the sleep study, the approximate average correlations and
angle criteria for the five tasks for the amphetamine group, whose
results varied least over time, were (in order of correlation average):

task                    corr  angle
pursuit tracking:       0.95  0.95
disk on circle:         0.94  0.79
number at 50:           0.92  0.78
compensatory tracking:  0.88  0.67
pendulum:               0.69  0.78

So, for this group there are three tasks with about the same angle criterion
(0.78) but quite different correlation criteria, three with about the same
correlation criterion (the top three) but quite different angle criteria,
and one that is moderate to low on both. My question is whether this
relationship holds true for an individual practiced subject on these tasks,
motivated, knowledgeable, and without the stress of sleep deprivation.

I'm trying to do this check on the individual sleep study subjects, minimizing
the averaging, but it's not easy with the data at hand. I'll let you know
when I have dug it out--if I can.

Martin