Triangulation Statistics, tracking data

[Martin Taylor 2009.01.24.10.46]

[From Bill Powers (2009.01.23.1034 MST)]

Martin Taylor 2009.01.22.17.40 --

MMT requested:

We need one more column: model-target % of max.

I added that figure and did 5 runs:

```
                                     {FIT ERROR}   {TRACKING ERRORS}
Diff  delay   gain  damping    ref    model-real   tar-model   tar-real
     (60ths)                (pixels)  (% of max)  (% of max)  (% of max)
 1      3     14.2   0.000    -1.2      0.666%       0.344%      0.827%
 2      8      8.6   0.000    -2.0      1.287%       1.356%      1.920%
 3      8      6.2   0.004    -2.0      1.277%       2.581%      2.970%
 4      8      6.2   0.100     2.0      1.837%       3.136%      4.065%
 5      9      5.6   0.120     0.0      1.966%       3.316%      3.919%
```

Note the monotonic rise in damping, again, as difficulty increases.
Gain also falls.

Working from the fit and tracking error percentages you
provided, I tried the triangulation test that I mentioned earlier. The
hypothesis in this case is that the model is a perfect fit to the
person’s control mechanisms and its parameters, except for “random”
noise in the person’s output (due to all sorts of possible effects
inside and outside the person).

The consequence of this variability would be to make the model-person
deviation (fit error) orthogonal to the model-target deviation. Why?
Because the underlying person-target track – the one the person’s
control system would do if not for the “extraneous” (of unknown causes)
variation – would be identical to the model track, and in a
high-dimensional system (the number of independent samples along the
track defines the dimensionality) any two independent sets of samples
are with high probability nearly uncorrelated. The correlation between
any two vectors is the cosine of the angle between them, so
uncorrelated vectors lie at right angles to each other, no matter what
the dimensionality of the space in which they are described…
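As a quick numerical illustration of that point (a sketch in Python; nothing here comes from the original exchange, and the sample count of 144 anticipates the independent-sample estimate further on):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 144                                   # number of effectively independent samples
a = rng.standard_normal(n)                # two independent "tracks"
b = rng.standard_normal(n)
a -= a.mean()
b -= b.mean()
r = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))     # correlation = cos(angle)
print(round(r, 3), round(np.degrees(np.arccos(r)), 1))    # near 0, near 90 degrees
```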

What I did was to take the percentages to be distances in a space of
“fit”. The “space of fit” (my own term) for three vectors is a plane
defined by the ends of the vectors of samples in the high-dimensional
sample space. Then I used the distances defined by the percentages to
draw triangles that represent the locations of the
target, model, and real tracks, in the planar “space of fit”, and look
at the angle Target-Model-Real. The angles, in order of tracking
difficulty, were 105.4, 93.1, 94.6, 106.7, and 92.2. They are all
fairly close to 90 degrees, but all are greater than 90 degrees. Here’s
what the triangles look like. The numbers represent the “difficulty”
index. The “Model-Target” base is scaled to be the same for each
example.

                                       ![T-M-R_triangles.jpg|250x297](upload://w27AE4d9FUc98d6MwHf8fZpZEiZ.jpeg)
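As a sketch of the arithmetic (Python purely for illustration; the function name is mine, not anything from the original exchange), the angle at the Model vertex follows from the three error percentages by the law of cosines, and it reproduces the angles quoted above:

```python
from math import acos, degrees

def angle_tmr(model_real, tar_model, tar_real):
    """Angle at the Model vertex of the Target-Model-Real triangle,
    computed from the three pairwise error distances (law of cosines)."""
    cos_m = (tar_model**2 + model_real**2 - tar_real**2) / (2 * tar_model * model_real)
    return degrees(acos(cos_m))

# The five runs, using the error percentages from the table above:
# (model-real, tar-model, tar-real)
runs = [
    (0.666, 0.344, 0.827),   # difficulty 1
    (1.287, 1.356, 1.920),   # difficulty 2
    (1.277, 2.581, 2.970),   # difficulty 3
    (1.837, 3.136, 4.065),   # difficulty 4
    (1.966, 3.316, 3.919),   # difficulty 5
]
for i, (mr, tm, tr) in enumerate(runs, start=1):
    print(i, round(angle_tmr(mr, tm, tr), 1))
# -> roughly 105.4, 93.1, 94.6, 106.7, 92.2 degrees
```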

What do these triangles tell us? The most obvious is that for the
easiest and second easiest examples (labelled 1 and 2 in the figure)
the model is a rather better tracker than is the human, and is somewhat
better for the more difficult examples. But is this because the model
is wrong or because the human’s performance is noisy? Either is
possible. The model parameters are the best fit to the human data, not
to the target. In fact, the target is completely ignored in setting
those parameters. So why should the model track the target better than
the human whose data are used in the fitting? One might naively expect
the model to be better and worse about equally often.

A second obvious thing is that the Real point for Difficulty 1 is much
farther from both Model and Target than is that for Difficulty 2, which
in turn is somewhat farther than those for 3, 4, and 5, which are much
alike. Remember, these are
scaled results, but if the difference between the Model and the human
is due to noise variation in the human, it seems that proportionately
the noise is greater for the easy task and fairly consistent for the
three more difficult ones. I won’t pursue this here, but will consider
the implications of the angle Target-Model-Real. That will take quite
long enough :slight_smile:

Going back to what I said above, let’s start a bit of Bayesian analysis
to see whether we can specify some hypotheses to compare for these
data. We don’t have tracks to work with, so “D” in the hypothesis tests
is the set of fit and tracking errors (these are fits to the target
track, so we might as well call them all “fit errors”). Initially, at
least, I will use just one function of the three fit errors, the angle
T-M-R (the bottom left in the triangles). That’s a simple scalar value,
which makes things relatively easy. I will call it “d”, a diminished
version of D.

For the Bayesian analysis, we need at least one hypothesis, and
preferably more. In this case, I will use just one hypothesis H, and
develop something that functions like a confidence interval for d in
P(d|H). The hypothesis I will work with is “The Model accurately
represents the human control mechanism, apart from some random
variability in either the human or the mechanical control device”. This
hypothesis implies that the angle T-M-R is 90 degrees, but does not
specify the distribution of angle TMR to be expected in runs of 1
minute. For that, we must provide some conditionals, which are
assumptions that we assert for the purpose of the analysis.

For our purposes here, I think it seems reasonable to use a conditional
that the correlations among the samples are the commonly assumed linear
Gaussian ellipses, which implies that we just need to know how many
independent samples there are in the 1 minute run, because that will
tell us how much variation to expect if the true correlation is zero.
When the correlation is near zero, the variance of the correlation is
close to 1/N, where N is the number of samples. To get the number of
independent samples, we compute the autocorrelation function of the
error sample sequence (I used Excel). Not having the track data for the
runs in question, I used the data Bill P provided for an earlier run,
but since the same person did all the runs, and the result is used only
to estimate the variance of the angle TMR, I think this is probably
justifiable. The result looks like this:

[Figure: autocorrelation of the error sample sequence — “Loop delay and limits on contro2.jpg”]

The correlation falls to zero near 25 samples, and there are 3600
samples in the entire run. This means that there are about 144
independent samples over the whole run. Now, working with our fallback
conditional of linearity and Gaussian distributions, the expected
distribution of the correlations in the data is nearly normal, with a
standard deviation of 1/sqrt(N), where N is the number of samples.
Remembering that correlation is the cosine of the angle between
vectors, that translates approximately into a standard deviation in the
angle TMR of about 5 degrees, from which we can compute P(d|H) for each
of the angles found above.
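A minimal sketch of that calculation, assuming the raw error sequence were available (the function and variable names are placeholders, not anything from TrackAnalyse; since the track data are not included here, the figures quoted above are plugged in directly):

```python
import numpy as np

def first_zero_lag(x):
    """Lag at which the autocorrelation of x first drops to (or below) zero:
    a rough estimate of the spacing between effectively independent samples."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    full = np.correlate(x, x, mode="full")
    ac = full[full.size // 2:] / full[full.size // 2]   # normalized, lags 0..n-1
    below = np.where(ac <= 0)[0]
    return int(below[0]) if below.size else len(ac)

# From the earlier run: autocorrelation falls to zero near lag 25, 3600 samples.
n_samples, zero_lag = 3600, 25
n_independent = n_samples // zero_lag        # ~144 independent samples
sd_r = 1 / np.sqrt(n_independent)            # sd of a near-zero correlation
sd_angle = np.degrees(sd_r)                  # near 90 deg, a correlation r shifts
                                             # the angle by about r radians
print(n_independent, round(sd_r, 3), round(sd_angle, 1))   # 144, 0.083, 4.8
```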

Since the possible values of d are continuous (in principle, the angle
could take on any value from zero to 180 degrees), P(d|H) is a
probability density rather than a probability, and it is probably more
useful to compute the equivalent of a confidence interval, using the
actual angles as limits. We ask “Given the hypothesis H (that the Model
accurately represents the human control mechanism, apart from
some random variability in either the human or the mechanical control
device) and the conditionals mentioned above, how probable is it that d
would be as deviant from 90 degrees as the value we found?” In other
words, we integrate P(possible values of d|H) over the tails of the
normal distribution beyond the deviations we found, which were 15.4,
3.1, 4.6, 16.7 and 2.2 degrees.

Reading by eye off a graph of the cumulative normal distribution, I get
the following values of P(d more deviant than found, given H):

Difficulty 1 ~0.006

Difficulty 2 ~0.55

Difficulty 3 ~0.32

Difficulty 4 ~0.005

Difficulty 5 ~0.65
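Those values can be checked directly against the normal tail areas; here is a sketch assuming the ~5 degree standard deviation derived above (scipy is used only for the cumulative normal that was read off a graph). The exact areas come out close to the eyeballed figures for difficulties 2, 3, and 5, and somewhat smaller for 1 and 4; the qualitative picture is the same either way.

```python
from scipy.stats import norm

sd = 5.0                                    # assumed sd of angle TMR under H, degrees
deviations = [15.4, 3.1, 4.6, 16.7, 2.2]    # |angle TMR - 90| for difficulties 1-5

for diff, dev in enumerate(deviations, start=1):
    p = 2 * norm.sf(dev / sd)               # both tails beyond the observed deviation
    print(f"Difficulty {diff}: P(more deviant | H) ~ {p:.3f}")
```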

Note that these probabilities are not significance levels. There is no
“null hypothesis” unless “H” can be considered to be one. They refer to
the data actually obtained, but they do use the probability that more
deviant data, which were not obtained, could have been obtained. These
probabilities are derived from H. They are not very precise, but should
be in the ballpark.

Looking at these probabilities, I personally would be quite happy to
accept H in the cases of difficulty 2, 3, and 5, but would be tempted
to look further in cases 1 and 4, were it not for one other pair of
observations. That pair consists of (1) that all the angles TMR are
greater than 90 degrees, and (2) that the autocorrelation function goes
negative for lags greater than 25 samples. The latter suggests the
possibility that the person may have a slight tendency toward
oscillation near 1.6 Hz, and the former that the person may perhaps
tend to deviate from the target more than the Model does, even allowing
for the effect of random variation. Both suggest that the structural
match between model and person might be slightly improvable.

Can we justify any of this speculation with the data at hand? Let’s try
a Bayesian analysis that compares two hypotheses for each Difficulty
level. H1 is just the same H that we used previously, whereas H2 is a
hypothesis that the underlying angle TMR is 95 degrees rather than 90
degrees, for all of the difficulty levels. We have no
control-theoretical justification for H2, but it is interesting to see
whether this notion might be worth pursuing further.

For this test, we can use the probability density values rather than
their integrals. This is philosophically cleaner, because there is no
reliance on imagined possible data. We use only the data observed. The
probability density values are read off the normal Gaussian curve.
These curves are usually drawn so that their integral is 1.0, which
puts the peak at about 0.4. The P values are relative, so scaling does
not matter. They are not probabilities, but relative probability
densities.

The angular deviations from H1 and H2 are:

```
Difficulty   H1     H2    P(d|H1)  P(d|H2)  P(d|H2)/P(d|H1)
    1       15.4   10.4    0.002    0.05        25
    2        3.1    1.9    0.3      0.35         1.2
    3        4.6    0.4    0.27     0.4          1.5
    4       16.7   11.7    0.002    0.04        20
    5        2.2    2.8    0.36     0.34         0.94

        Combined P(H2|D)/P(H1|D) ~= 850/1
```

If these were the only possibilities (angle TMR = 90 degrees or 95
degrees, the same for every case), the data would point pretty strongly
to a preference for 95 degrees. In practice, two things argue against
so restricting the possibilities to be considered. One is that 95
degrees was chosen completely arbitrarily, and the other is that one
might introduce a new hypothesis H3 that said there is something
different about runs 1 and 4. The first can be handled by allowing the
hypothesised angle TMR for all the cases to be an arbitrary value,
meaning that we test a continuum of hypotheses of the form “Angle TMR
is x degrees” for any x in an interesting range, say 85 to 100 degrees.
The result would be a curve of values of P(d|Hx) normalized to some
specific value of x, such as 90 degrees, or normalized so as to make
the maximum 1.0. The second could be handled similarly, but instead of
doing this, one would probably prefer to get other data to look for
possible differences among the five cases.
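A sketch of the density-ratio calculation, again assuming the 5 degree standard deviation: using exact normal densities rather than values read off the curve, the individual ratios for difficulties 1 and 4 come out nearer 13 and 17 than 25 and 20, and the combined ratio lands in the few-hundreds-to-one range rather than ~850, but the conclusion in favour of H2 is the same.

```python
import numpy as np
from scipy.stats import norm

sd = 5.0                                     # assumed sd of angle TMR, degrees
angles = [105.4, 93.1, 94.6, 106.7, 92.2]    # observed angle TMR, difficulties 1-5
h1, h2 = 90.0, 95.0                          # hypothesized underlying angles

ratios = [norm.pdf(a, loc=h2, scale=sd) / norm.pdf(a, loc=h1, scale=sd)
          for a in angles]
print([round(r, 2) for r in ratios])         # per-run likelihood ratios P(d|H2)/P(d|H1)
print(round(np.prod(ratios)))                # combined ratio, assuming independent runs
                                             # and equal priors; a few hundred to one
```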

This has been pretty long-winded, but I wanted to suggest how even
quite sparse data can at least suggest plausible inferences about
control systems. In this case, the data suggest that the Model is
plausibly accurate apart from human variability, but even so, there may
well be a consistent deviation of human from model performance, perhaps
related to a slight tendency toward a human oscillation, perhaps to
some other characteristic that makes the best-fit model track the
target a little better than the human does, even allowing for human
noise variability.

Because in another message I said I would post this before going to
bed, I’m going to go against my better judgment, and do so even though
the time is about 1:20 am. I hope I don’t wake up knowing that I’ve
made a glaring error, as I did when I posted my message about Rick’s
demo at a similar time of night.

Martin

[Martin Taylor 2009.01.25.11.00]

[Martin Taylor 2009.01.24.10.46]

Because in another message I said I would post this before going to
bed, I’m going to go against my better judgment, and do so even though
the time is about 1:20 am. I hope I don’t wake up knowing that I’ve
made a glaring error, as I did when I posted my message about Rick’s
demo at a similar time of night.

This isn’t an error as such, but something I should have thought of
when writing the referenced message.

The most obvious reason (obvious this morning) why the model should
track better than the human is that the model doesn’t have lapses in
attention. The human is unlikely to control equally well throughout the
1 minute tracking task. Any momentary lapse, or a shift in posture, is
likely to cause at least a few samples of deviation from the track in
addition to the deviations caused by “normal variability” or “random
noise”. When the model is fitted, the best fit parameters apply very
largely to the part of the track when the person is tracking well, so
these extra deviations are not necessarily orthogonal to the model in
the way the random noise deviations are. Hence the angle TMR may be
greater than 90 degrees without implying that there is any necessary
structural deficiency in the model. The “deficiency” may be simply that
no model can predict momentary distractions or postural changes of the
human tracker.

If the above hypothesis holds, it predicts that there will be short
periods in the track data during which the human’s divergence is
improbable given the distribution found over the majority of the track.
If no such periods are found, this hypothesis should be discarded; or
if periods of increased divergence are correlated with specific
characteristic regions of the target track, then this hypothesis
becomes of low likelihood compared to a hypothesis that there is a
structural deficiency in the Model.
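One way such a screen could be set up, purely as an illustrative sketch once the full track data are available (none of these names come from TrackAnalyse, and the window length and threshold are arbitrary choices):

```python
import numpy as np

def improbable_windows(residual, win=30, z_thresh=3.0):
    """Flag short stretches where the human-minus-model residual is much
    larger than the bulk of the run would predict (a rough lapse screen)."""
    residual = np.asarray(residual, dtype=float)
    # Robust scale estimate from the whole run (median absolute deviation).
    mad = np.median(np.abs(residual - np.median(residual))) * 1.4826
    # Running RMS of the residual over windows of length 'win'.
    rms = np.sqrt(np.convolve(residual**2, np.ones(win) / win, mode="valid"))
    return np.where(rms > z_thresh * mad)[0]   # start indices of suspect windows
```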

Martin

[From Bill Powers (2009.01.25.1106 MST)]

Martin Taylor 2009.01.25.11.00 –

The most obvious reason (obvious
this morning) why the model should track better than the human is that
the model doesn’t have lapses in attention.

Speaking as the control system in question here, I can say that there
were no significant lapses in attention during the run you saw. One of
the main contributors to my errors is what is called an “essential
tremor,” an oscillation that looks a bit like Parkinson’s, but
isn’t. This is missing from the model, of course, which partly accounts
for the fact that the model controls better than I do even when matched
as well as possible to my behavior. It doesn’t match the tremor component
because there would be no way to reproduce spontaneous tremors with the
same phases and amplitudes.

Another contributor is a slight tendency to overshoot and oscillate when
controlling against rapid disturbances. It may be related to the
essential tremor, or could simply be an underdamping of a higher-level
control system. I’ve been trying to think of a way to reproduce that one,
since unlike the tremor it would be synchronized to the disturbance
variables so the model could behave like the subject in more detail.
Furthermore I have not experimented with nonlinearities, which can
introduce higher-frequency components as well. As you can see from the
plots, the unmodeled variations are mostly at higher frequencies.

You now have my revised version that prints out the model’s handle
behavior (along with the target behavior), so perhaps you can generate
some data of your own. I assume you don’t have the same tremor, so a
comparison might be interesting.

Bruce Abbott is going to work with me (when his other obligations permit)
to expand TrackAnalyse into a more generally useful experimental program.
We will make sure all data, real and modeled, are recorded in a form
readable by other programs (space-delimited ASCII). Also, we will
introduce other kinds of controlled variables, or perhaps devise links to
other programs that can be written independently to allow user expansion
of the kinds of variables tested.

In the meantime, I’ll try to come up with a version that provides all the
data, real and model.

Best,

Bill P.

[Martin Taylor 2009.01.25.13.53]

[From Bill Powers (2009.01.25.1106 MST)]

Martin Taylor 2009.01.25.11.00 –

The most obvious reason
(obvious
this morning) why the model should track better than the human is that
the model doesn’t have lapses in attention.

Speaking as the control system in question here, I can say that there
were no significant lapses in attention during the run you saw. One of
the main contributors to my errors is what is called an “essential
tremor,” an oscillation that looks a bit like Parkinson’s, but
isn’t.

The question arises as to what was different between the runs at
difficulty 1 and 4 as compared to runs 2,3, and 5. It’s unlikely to
have been very obvious at the time, or you probably would have
discarded those runs. A “lapse in attention” might be as small as a
momentary look at the “time on track” display.

This is missing from the model, of course, which partly
accounts
for the fact that the model controls better than I do even when matched
as well as possible to my behavior. It doesn’t match the tremor
component
because there would be no way to reproduce spontaneous tremors with the
same phases and amplitudes.

That’s true if they are independent and frequent damped tremors
initiated by some event outside the modelled control loop, but if they
are phase-locked to each other, or if they are initiated by some
condition in the target trace, such as …

Another contributor is a slight tendency to overshoot and oscillate
when
controlling against rapid disturbances.

then it might be possible to model them. The slight overshoot of the
autocorrelation function suggests the possibility that there might be
at least some phase-locking.

It may be related to the
essential tremor, or could simply be an underdamping of a higher-level
control system.

I’ve observed the same thing in many of my fitted models, so I think it
is probably a more general phenomenon.

I’ve been trying to think of a way to reproduce that one,
since unlike the tremor it would be synchronized to the disturbance
variables so the model could behave like the subject in more detail.
Furthermore I have not experimented with nonlinearities, which can
introduce higher-frequency components as well. As you can see from the
plots, the unmodeled variations are mostly at higher frequencies.

Yes.

You now have my revised version that prints out the model’s handle
behavior (along with the target behavior), so perhaps you can generate
some data of your own. I assume you don’t have the same tremor, so a
comparison might be interesting.

I have run one test run with the new version, but I haven’t analyzed it
yet. I was working on the triangulation exercise I posted last night.

Incidentally, this morning I found why I had been unable to use the
normal distribution function in Excel properly. It was late-night
stupidity. So I have done what I described in last night’s message. I
said:

---------quote--------

The angular deviations from H1 and H2 are:

```
Difficulty   H1     H2    P(d|H1)  P(d|H2)  P(d|H2)/P(d|H1)
    1       15.4   10.4    0.002    0.05        25
    2        3.1    1.9    0.3      0.35         1.2
    3        4.6    0.4    0.27     0.4          1.5
    4       16.7   11.7    0.002    0.04        20
    5        2.2    2.8    0.36     0.34         0.94

        Combined P(H2|D)/P(H1|D) ~= 850/1
```

If these were the only possibilities (angle TMR = 90 degrees or 95
degrees, the same for every case), the data would point pretty strongly
to a preference for 95 degrees. In practice, two things argue against
so restricting the possibilities to be considered. One is that 95
degrees was chosen completely arbitrarily, and the other is that one
might introduce a new hypothesis H3 that said there is something
different about runs 1 and 4. The first can be handled by allowing the
hypothesised angle TMR for all the cases to be an arbitrary value,
meaning that we test a continuum of hypotheses of the form “Angle TMR
is x degrees” for any x in an interesting range, say 85 to 100 degrees.
The result would be a curve of values of P(d|Hx) normalized to some
specific value of x, such as 90 degrees, or normalized so as to make
the maximum 1.0.

----------end quote-----------

Here are the figures for two continuous sets of hypotheses: the first
set includes the conditional “all runs have the same angle TMR”, while
the second includes the conditional “runs for difficulty 2, 3, and 5
have the same TMR, but there is something different about runs 1 and 4,
so they are discarded”.

Assuming that the prior probabilities are uniformly distributed across
all TMR angles within the range, these are the relative likelihoods of
the hypotheses given the data (the percentage fit errors). If you
assume the conditional “all runs have the same angle TMR” then the most
likely value for that TMR is about 98.3 degrees, but anything between
about 94 and 103 is plausible (meaning I wouldn’t bet 10 to 1 against
any of them). That’s the Bayesian equivalent of a confidence interval.

However, if you use the conditional that only runs 2, 3, and 5 are
useable, then the most likely value of TMR is about 93.2 degrees, and
anything between about 87 and 99 degrees is plausible. Using this
latter conditional is akin to cheating, unless you do other runs and
see whether you frequently get splits like these (you make hypotheses
that say this will happen, and test their relative likelihoods), and
whether runs under difficulties 1 and 4 are usually like the others or
are usually distinct from them.
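A sketch of that calculation under the two conditionals just described (uniform prior over x, 5 degree sd as before; the grid is extended a little past 100 degrees so the peak for the all-runs case is not cut off, and all names here are illustrative):

```python
import numpy as np
from scipy.stats import norm

sd = 5.0
angles = np.array([105.4, 93.1, 94.6, 106.7, 92.2])   # observed angle TMR, difficulties 1-5

def likelihood_curve(obs, x_grid):
    """Relative likelihood of 'underlying angle TMR = x', assuming independent
    runs, Gaussian variation with sd = 5 degrees, and a uniform prior on x."""
    like = np.array([norm.pdf(obs, loc=x, scale=sd).prod() for x in x_grid])
    return like / like.max()                # normalized so the maximum is 1.0

x = np.arange(85.0, 105.01, 0.1)
for label, obs in [("all runs", angles), ("runs 2,3,5", angles[[1, 2, 4]])]:
    curve = likelihood_curve(obs, x)
    plausible = x[curve > 0.1]              # "wouldn't bet 10 to 1 against"
    print(label, round(x[curve.argmax()], 1),
          round(plausible.min(), 1), round(plausible.max(), 1))
# roughly: all runs  -> peak near 98.4, plausible range about 94-103;
#          runs 2,3,5 -> peak near 93.3, plausible range about 87-99
```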

Bruce Abbott is going to work with me (when his other obligations
permit)
to expand TrackAnalyse into a more generally useful experimental
program.
We will make sure all data, real and modeled, are recorded in a form
readable by other programs (space-delimited ASCII). Also, we will
introduce other kinds of controlled variables, or perhaps devise links
to
other programs that can be written independently to allow user
expansion
of the kinds of variables tested.

That will be good. It would be nice to have the kind of multi-level
tracking I did for the sleep study (a “cognitive” track and a “visual”
track, for example, like my pursuit tracking of a numeric display
versus the kind of track in “Track Analyse”). By the way, one problem I
have in performing the TrackAnalyse track is that the mouse must move
back and forth (toward and away from me). If it could move left and
right I think I would track more accurately for physical reasons.

In the meantime, I’ll try to come up with a version that provides all
the
data, real and model.

Great.

Martin


[From Bill Powers (2009.01.25.1851 MST)]

Martin Taylor 2009.01.25.13.53 –

The question arises as to what
was different between the runs at difficulty 1 and 4 as compared to runs
2,3, and 5. It’s unlikely to have been very obvious at the time, or you
probably would have discarded those runs. A “lapse in
attention” might be as small as a momentary look at the “time
on track” display.

You could be right. I’ll make a new set of runs and give you all the data
for them including the model’s behavior, plus jpegs of the plots. I don’t
have those runs any more so can’t look at them.

Best,

Bill P.

[Martin Taylor 2009.01.26.01.00]

[From Bill Powers (2009.01.25.1851 MST)]

Martin Taylor 2009.01.25.13.53 --

The question arises as to what was different between the runs at difficulty 1 and 4 as compared to runs 2,3, and 5. It's unlikely to have been very obvious at the time, or you probably would have discarded those runs. A "lapse in attention" might be as small as a momentary look at the "time on track" display.

You could be right. I'll make a new set of runs and give you all the data for them including the model's behavior, plus jpegs of the plots. I don't have those runs any more so can't look at them.

I haven't done any systematic runs with the new version of TrackAnalyze, but I have done two with difficulty level 3 (both with disturbance 1), and got angles TMR = 90.36 and 89.69 degrees for the two runs. That seems to suggest the model fits me pretty well. I will have to try other difficulty levels. But I want also to keep trying tracks using the same disturbance, to check the angle between my successive runs.

Martin