[Martin Taylor 2010.01.06.00.16]
[From Bill Powers (2010.01.05.0905 MST)]
Martin Taylor 2010.01.04.23.14 –
BP: The figures don’t come out very well.
I hope the PDF I attached to a message to Rick a few minutes ago comes
out better.
BP earlier: Could you remind me of how you showed that the accuracy of fit to human performance was increased?
MT:

I optimized the fit between human and model tracking by varying k, d, and z in this model. The parameter “z” represents the relative reliance on prediction as opposed to direct observation of the current position. If the optimum consistently has z > 0, then the fit is better with prediction than without. This was the case, and moreover, after a sleepless night or two the value of z tended to be greater in the placebo group, but not (or at least not demonstrably) in the amphetamine or modafinil group. I showed the graph as a Christmas present to you all [Martin Taylor 2009.12.25.10.46]. The value of d did not change reliably across drugs or across degree of sleep loss, so the increased relative reliance on prediction was not a compensation for increasing loop delay.
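The fitting procedure described above can be sketched roughly as follows. Only the three parameter names come from the text (k for gain, d for loop delay, z for the relative reliance on prediction); the loop structure, the linear extrapolation used as the prediction, and the brute-force grid search are my assumptions for illustration, not the study's actual code.

```python
import numpy as np

def simulate_tracking(target, k, d, z):
    """Sketch of a one-loop tracking model with a prediction term.

    k : output gain; d : loop delay in samples; z : relative weight on
    the extrapolated target position versus the currently observed one.
    Everything beyond those three named parameters is illustrative.
    """
    n = len(target)
    cursor = np.zeros(n)
    for t in range(1, n):
        # Predicted target: linear extrapolation from the latest velocity.
        vel = target[t] - target[t - 1]
        perceived = (1 - z) * target[t] + z * (target[t] + vel * d)
        # The error driving the output is based on a delayed cursor view.
        t_del = max(t - int(d), 0)
        err = perceived - cursor[t_del]
        cursor[t] = cursor[t - 1] + k * err
    return cursor

def fit_parameters(target, human, grid_k, grid_d, grid_z):
    """Brute-force search for the (k, d, z) triple that minimizes the
    RMS deviation between the model track and the human track."""
    best_params, best_rms = None, np.inf
    for k in grid_k:
        for d in grid_d:
            for z in grid_z:
                model = simulate_tracking(target, k, d, z)
                rms = np.sqrt(np.mean((model - human) ** 2))
                if rms < best_rms:
                    best_params, best_rms = (k, d, z), rms
    return best_params, best_rms
```

If the optimum consistently lands at z > 0, the prediction term is earning its keep, which is the comparison the text describes.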
BP: I still don’t see any reports on how well the model fit the subjects’ performance. In fact, you say at one point:
“The mean-square error provides a theory-independent view of the
results, but is of secondary interest for the present report. Of
greater
interest is the fitting of the control model to the peculiarities of
the
individual tracks, as the sleep deprivation period progressed under the
different drug conditions.”
I strongly disagree with this choice. By not showing the RMS tracking errors or differences between model and real behavior, you concealed information that I consider very important: how poorly most of the subjects tracked most of the time (though without adequate baseline data that would be hard to prove).
I know where you are coming from, but I disagree with you. I’ll tell
you why, and perhaps you will agree with me. But before doing so, I
will say that had this been done in the last few years, the data would
properly have been deposited in an on-line “supporting material”
repository. In 1995, such things did not exist, so far as I know. I
strongly believe that the actual tracking errors would not have been
appropriate for inclusion in the actual paper, no matter how much space
I had been allowed.
Remember that there were six very different tasks, so right from the
start the RMS errors were not commensurate. Of course, that would not
affect the relative fit of the model to the human as opposed to the
target, but at the time I had not developed the triangulation approach
we discussed last winter (was it so long ago?). Compounding that issue
was the fact that there were three different difficulty levels for each
of two kinds of disturbance (smoothly varying and stepped), so the
quality of the tracking varied all over the place. The actual quality
of the track was not of interest to the study, which was related to the
relative effects of the drugs on different aspects of performance. (I
was involved in a quite distinct experiment in the same study, which
dealt with the performance of subjects who could not see one another,
one trying to guide the other to draw a route on a map containing
landmarks when the guide’s map differed in some landmark details from
the student’s map – for example, the guide’s map might have two
“watertower” landmarks when the student’s had only one.)
Since the tracking error was both incommensurate across tasks and
appreciably different across task difficulty levels, the actual RMS
values would have no meaning in respect of the objectives of the study.
But changes in the model-fitting parameters would do, and those were
most reasonably fitted by normalizing out intersubject differences and
differences due to tracking difficulty and task type.
The “microsleeps” were a subject of contention between us at the time the data were coming in. I felt that they made the tracking model invalid, since periods of tracking were being averaged together with periods of no tracking, rendering the model meaningless.
No, they were not. At least they were not if the criterion for assessing
a period of time as a microsleep was anywhere close to being
reasonable. Periods judged to be microsleep were omitted from the model
fit assessment, along with (as I remember) a short “guard” interval
around the microsleep period.
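The exclusion Martin describes can be sketched as a simple mask over the sample stream. The (start, end) representation of microsleep episodes and the guard width are assumptions here, not the study's actual values.

```python
import numpy as np

def fit_mask(n_samples, microsleeps, guard=10):
    """Boolean mask marking the samples that enter the model-fit error.

    `microsleeps` is a list of (start, end) sample indices judged to be
    microsleep episodes; each is excluded together with a `guard`
    interval on both sides, as described in the text.  The guard width
    of 10 samples is an illustrative assumption.
    """
    mask = np.ones(n_samples, dtype=bool)
    for start, end in microsleeps:
        lo = max(start - guard, 0)
        hi = min(end + guard, n_samples)
        mask[lo:hi] = False
    return mask

def masked_rms(model, human, mask):
    """RMS model-human deviation computed over awake samples only."""
    diff = (model - human)[mask]
    return np.sqrt(np.mean(diff ** 2))
```

With such a mask, a non-tracking episode cannot inflate (or dilute) the fit statistic, which is the point at issue.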
Compounding that error, as I see it, is the averaging of
values of a best-fit parameter over 11 or 12 subjects.
I didn’t do that, so far as I remember. I think I averaged after
normalization. It wouldn’t make a lot of difference either way, if the
parameter values were not too widely different or if their
distributions were reasonably symmetric about their means.
I notice that in figures 3 and 4 you present the best-fit position-gain and delay parameters normalized to 1, but in the rest of the plots you present the velocity loop gain normalized to 0 – that is, with the mean value subtracted instead of computing the ratio.
The gain and delay values are necessarily positive, and the relevant
feature is their scale as a multiplier. The velocity parameter is a
ratio which can be positive or negative. It also is scaled, but there
is a definite neutral point at zero, as opposed to the arbitrary choice
of 1.0 as the neutral point in normalizing the gain and delay
parameters. The relative scale in the picture doesn’t signify anything.
What does signify is that before the first sleepless night the ratios
are scattered around zero (no consistent use of prediction) whereas
later they are all positive, slightly so for the two drug conditions,
but much more so in the placebo condition, and more after the second
sleepless night than after the first. You can get an idea of the
reliability of the measures by comparing the successive points along
each curve, though I admit that you can’t see whether one task or
another is more conducive to the use of prediction.
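The two normalization conventions being debated can be stated compactly. This is a minimal sketch of the distinction as Martin describes it, not the paper's actual analysis code:

```python
import numpy as np

def normalize_ratio(values):
    """Gain and delay are necessarily positive and act as scale
    factors, so normalize a subject's series by its mean: the
    neutral point becomes 1.0 and values are multiples of it."""
    values = np.asarray(values, dtype=float)
    return values / values.mean()

def normalize_offset(values):
    """The velocity (prediction) parameter has a definite neutral
    point at zero and can be negative, so subtract the mean instead:
    deviations keep their sign, and scatter around zero means no
    consistent use of prediction."""
    values = np.asarray(values, dtype=float)
    return values - values.mean()
```

The choice of 1.0 as the neutral point for the ratio-normalized parameters is, as the text says, arbitrary; the zero point of the velocity parameter is not.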
The plot of the velocity parameter is scaled so that the range of variation in fig. 5 looks about the same as in fig. 3 – but since there is no indication of what the actual mean is, it’s impossible to tell whether the variations in optimized velocity gain are huge or tiny relative to the mean. If the mean is 1, they are huge. If the mean is 50, they are tiny.
Those numbers are really meaningless, I think. The important point is
that after the end of the first sleepless night, 17 of 18 cases show a
positive value, and the 18th is zero.
I take it that the figure in your post, reproduced above, is the missing fig. 2 in the paper.
Yes, it’s that figure, but why do you say Figure 2 is missing? It’s
labelled in the figure caption, in my copy.
So your parameter z simply adjusts the amount of target movement that is added directly to mouse movement before the delay, a simple case of feedforward since it effectively skips the comparator and output function.
It doesn’t actually skip the comparator function. It enters the
comparator through its addition to the reference. That agrees with my
subjective impression of what is going on. One deliberately targets the
cursor to a point further along the track than the current target
position. It simply feels the way the diagram is drawn.
Mathematically, of course, the result is the same as adding to the
target movement.
This makes the z-loop a positive feedback loop with a delay in it. Onlookers, please check if this is right.
Yes, it’s obviously right, since the velocity component is entered into
the positive side of the comparator, and there’s no sign inversion
anywhere else in the loop. I hadn’t noted that before. I wonder how the
90 degree phase shifts involved in taking the derivative affect the
stability of this? I imagine you have analyzed it or simulated it?
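Martin's stability question can be probed with a toy simulation. In this sketch the extrapolated velocity is taken from the target track, an assumption on my part: under that reading the z term involves only the target, not the cursor, so stability is governed mainly by the gain and delay of the main negative loop. Whether the prediction path truly closes a positive loop depends on details of the paper's Figure 2 not reproduced here.

```python
import numpy as np

def track_with_prediction(target, k=0.2, delay=5, z=0.0):
    """Toy simulation of the loop as described in the text: the
    extrapolated velocity term enters on the positive (reference)
    side of the comparator.  Taking the velocity from the target
    track, and everything beyond k, delay, and z, is illustrative."""
    n = len(target)
    cursor = np.zeros(n)
    for t in range(1, n):
        vel = target[t] - target[t - 1]
        reference = target[t] + z * vel * delay  # prediction added to reference
        t_del = max(t - delay, 0)
        error = reference - cursor[t_del]        # delayed view of the cursor
        cursor[t] = cursor[t - 1] + k * error
    return cursor

def bounded(cursor, limit=10.0):
    """Crude stability check: did the track stay within `limit`?"""
    return bool(np.all(np.abs(cursor) < limit))
```

With k = 0.2 and delay = 5 this delayed-integrator loop is comfortably stable, and moderate z values do not destabilize it in this sketch; pushing the per-step gain much higher (e.g. k = 2.0) makes the delayed loop oscillate unboundedly regardless of z.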
Your note at the bottom of the first page, reporting that “Powers … generously declined participation in the authorship of this paper” is seriously misleading; my refusal to participate may have been expressed in a generous way (I don’t remember) but you know that the reason was my opinion that the study was seriously flawed and I didn’t want my name attached to it.
Well, our memories do differ. I remember your declining authorship as a
very generous gesture. You had indeed raised objections of the kind you
raise here, but I don’t remember them as being anywhere near as
definitive as you now make them out to be.
The lack of practicing to asymptotic performance means that we have no idea how rapidly the performance declined at the beginning of sleep deprivation. For all we know, your measures could apply to a performance that is 90% degraded soon after the start of extended sleeplessness.
Isn’t it a lot more probable that the subjects continued to learn
during the course of the study, and that this learning would act in
opposition to the observed performance deterioration?
MT: I gave Rick full credit in the publication, if that’s what you mean. Or maybe you mean that I take responsibility for demonstrating that it does work as Rick said it did.
BP: Neither one. I meant that no matter whose model you use, your use implies that you have checked out the model for inconsistencies, errors, or other problems, so any problems with the model become yours just as much as the author’s.
Fine, I agree with that, but it makes me even more puzzled as to why
you brought this up in the first place.
I realize that I’m being somewhat curmudgeonly about this paper and the
study behind it. But I think your paper presents an overly rosy view of
the results and omits information that a reader should be given whether
or not it’s favorable to your conclusions.
I don’t think it omits anything relevant to the conclusions I drew.
What it omits are data that might be of interest for other reasons,
such as for what tasks and difficulty levels might prediction be more
likely to be used. The results of the study are concerned with whether
the two drugs (the then standard amphetamine and the experimental
modafinil, which was in regular use by the French army) have similar or
different effects on reducing the ill effects of sleep loss. From my
two little studies, it seems that there isn’t much difference between
the drugs in tracking, but both differ from the placebo in that under
the drugs people are more attentive to the actual target position
relative to their expectation of its position, and that in the dialogue study subjects tended toward what one might call “risky” behaviour (accepting their prior assumptions about things and not asking for corroboration) under modafinil, but not under amphetamine.
Indeed, after this study, all the experimenters agreed that we would
not want to fly or drive with someone who was being kept awake by
modafinil. As I can personally attest, it makes one feel alive and
alert despite sleep loss, but I can give a real-world example of why
it’s not a good idea to use it when you have to make decisions. I was
at a meeting in Paris, having flown two days previously to avoid jet
lag. One of my colleagues flew (sleepless) overnight, and arrived at
the meeting direct from the airport. He was falling asleep at the
meeting table, and I happened to have a modafinil tablet which I gave
him. He woke up and felt fine, and he became alert and coherent, but
whenever a volunteer was needed for a job, he volunteered. It took him
a little while to realize how much extra work he had let himself in
for, and let the other members of the group avoid!
Martin