Fitting models to data

[Martin Taylor 2004.11.18.17.51]

This message combines two threads I have brought up from time to time
over the years. One is the set of tracking experiments on the effect
of sleep deprivation on tracking perceptions of differing kind and
complexity. The other is the degree to which one can assert that a
particular control model fits real data.

I'll address the second issue first. Often, on CSGnet, it is claimed
that the validity of a model fit in a tracking task is attested by
the high correlation usually observed between the real data and the
data simulated by the model. These correlations are often in the high
0.9's. In my view, such correlations may indicate no more than that
both the model and the subject are acting as control systems capable
of countering disturbances in the task at hand. If the subject
controls well, the subject controls well, and the disturbance
excursions substantially exceed the noise, high correlations between
subject and model are inevitable. But that doesn't imply that the
model is the right model.

In 1994, I was involved in an experiment in which people worked for
three days and two nights continuously except for a 10 or 15 minute
break every 2 hours in which they could chat, eat, or whatever, but
not sleep. There were quite a few different tasks, including some
tracking tasks programmed by Bill P. In that study, there were hints
of a counterintuitive result, that the sleep loss had less effect on
complex tasks than on simple ones.

In 2002, another experiment was run. This time it was over a 26-hour
span, meaning one sleepless night. I devised a tracking experiment
that incorporated three different levels of perceptual complexity and
two modes, one involving perception of visual magnitude, the other of
numerical magnitude. Each of 32 subjects was asked to do 42 separate
50-second tracks. So I have a lot of data. The question is to model
it, something that has been a problem for several reasons.

The issue I want to address now is model-fitting. The model I fit to
the data from the 1994 study was a simple "classical PCT" control
model with the addition of the "Marken prediction" element that makes
the reference signal become not the target, but the target advanced
by adding an amount proportional to the target velocity. It showed
that the drugs affected the importance the subjects
gave to prediction, but that effect showed up mainly in the second
sleepless night.

This time, because of the experimental variation in complexity of the
perceptual situation, I thought I would try a variety of different
models that varied in different structural aspects, and that included
either or both of threshold effects and non-linear gain functions.
The question then was to determine which model fitted the data best.
That involves optimizing the set of parameters that define things
like gain, loop delay, predictor gain, predictor delay, and threshold
or exponent. This optimization itself presents a problem in defining
the criterion for a good fit, because the model is properly tested
only when its parameters are optimized.

There are all sorts of possible criteria for a good fit. The simplest
probably is RMS error. But in a sleep-loss situation, that's no good,
because subjects sometimes (often) go into a kind of microsleep
condition, in which they cease tracking for a few seconds, or else
they may wildly oscillate the cursor for a few seconds, as if they
were shivering through it. The RMS measure weights heavily these
times of large error, and that means that the "best fit model" means
the one that matches those bad times best. Sometimes I found that the
best fit was a model that simply didn't track at all!

So I needed another criterion. I can go into what I developed on
another occasion if anyone is interested (I'm not all that happy with
it, but it works well enough), but here it may suffice to mention the
principles that define a good fit in this context. Firstly, what is
important is not that it tracks well, but that when it doesn't, its
errors look like the errors made by the human subject. So we work
with the _relative_ deviations between model and target, model and
subject, and subject and target. It is important that the model is
closer to the subject than to the target, so that ratio must be part
of the fitting criterion. Secondly, the ratio itself can be
misleading, because you can get a good ratio when the model's track
doesn't look very much like either the target or the subject's track.
We include the actual deviation in the criterion. Thirdly, we try to
downweight stretches of data when the human seems not to be tracking.
Eventually we wind up with an algorithm for determining a "fitness
index" for a model with a particular set of parameter values
simulating one track.
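To make those three principles concrete, here is a minimal Python
sketch of such a fitness index (lower is better). The window length,
the down-weighting factor, and the way the ratio and the raw deviation
are combined are illustrative assumptions, not the algorithm actually
used for the study.

import numpy as np

def fitness_index(model, subject, target, window=50, quiet_frac=0.05):
    # model, subject, target: equal-length arrays of positions for one track.
    model, subject, target = (np.asarray(a, float) for a in (model, subject, target))

    # Principle 3: down-weight stretches where the subject seems not to be
    # tracking -- here, windows in which the subject's cursor barely moves.
    w = np.ones(len(subject))
    for start in range(0, len(subject) - window, window):
        if np.ptp(subject[start:start + window]) < quiet_frac * np.ptp(target):
            w[start:start + window] = 0.1          # assumed down-weight

    def wrms(a, b):                                # weighted RMS deviation
        return np.sqrt(np.sum(w * (a - b) ** 2) / np.sum(w))

    d_ms = wrms(model, subject)                    # model vs subject
    d_mt = wrms(model, target)                     # model vs target
    # Principle 1: the model should lie closer to the subject than to the target.
    ratio = d_ms / (d_mt + 1e-12)
    # Principle 2: the ratio alone can mislead, so include the raw deviation too.
    return ratio + d_ms / np.ptp(target)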

Having a criterion allows one to look for optimum parameter values
for a model. I am using a modified e-coli method, not knowing any
computationally efficient way of finding an optimum in a surface of
unknown roughness. Here's the actual method: I choose an arbitrary
set of parameter values that allows the model to track the target.
From that point in 5-parameter space I choose an arbitrary direction
and find the best fit parameter set along that line. This provides a
new starting point.

Now here's the modification to the e-coli method. From this new
point, I move a specified distance in an arbitrary direction, to
generate the starting point for a new line in a new arbitrary
direction along which the best fit is again found. The distance moved
to this new starting point declines exponentially over the course of
the optimization. It's quite big at the start, and very small at the
end. The original concept was to try to avoid local minima (in other
contexts, the same idea is called "simulated annealing"), but it
turns out to have another benefit when the contours of equal fitness
are non-spherical in the 5-space. It helps the discovered optimum
move down a narrow valley that would otherwise be easily missed.

If the best fit along the new line is better than the previous best
fit, the step to the new starting point is taken from there.
Otherwise we revert to the earlier optimum and make the step from
there before testing in the next arbitrary direction. And so we go,
each move starting from whatever set of parameter values has so far
provided the best fit. After N trial lines (I'm using 120) whatever
set of parameter values has provided the best fit is chosen to
represent what the model is capable of for this track. I do 10 such
optimizations for each track to test the reliability.
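For readers who want the whole procedure in one place, here is a
minimal sketch of that modified e-coli search. Only the overall scheme
follows the description above (random line searches, a random restart
step that shrinks exponentially, always stepping from the best point
found so far); the line-search granularity, initial step size, and
decay constant are assumptions.

import numpy as np

def ecoli_fit(fitness, x0, n_lines=120, step0=1.0, decay=0.97,
              span=2.0, n_probe=41, rng=None):
    # fitness(x) -> smaller is better; x0 = starting parameter vector.
    rng = np.random.default_rng() if rng is None else rng
    best_x = np.asarray(x0, float)
    best_f = fitness(best_x)
    start, step = best_x.copy(), step0

    for _ in range(n_lines):
        # A line through the current start point, in a random direction.
        d = rng.normal(size=best_x.size)
        d /= np.linalg.norm(d)
        ts = np.linspace(-span, span, n_probe)
        cands = [start + t * d for t in ts]
        fs = [fitness(c) for c in cands]
        i = int(np.argmin(fs))
        if fs[i] < best_f:                 # keep the new optimum if better
            best_x, best_f = cands[i], fs[i]
        # Either way, the next line starts a random step away from the best
        # point found so far; the step length decays exponentially.
        kick = rng.normal(size=best_x.size)
        kick /= np.linalg.norm(kick)
        start = best_x + step * kick
        step *= decay
    return best_x, best_f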

All that technical detail may be more than you want to know, but if
you have got this far, you know that for each track I have 10 sets of
parameter values, each of which purports to be the best set for that
model fitting that track. Obviously, if the fitness index for one set
is better than for another set, the former is preferred. But that's
not why I did this. What I wanted was to test how variation in one
parameter affected the others for parameter sets near the optimum for
the model, because this affects the interpretation of the results.

What I found was fairly critical. Looking just at the pairwise
correlations among the 10 sets of parameters for one model simulating
one subject track, we find several quite high correlations (some in
the high 0.8s). This means that it is hard to tell whether (for
example) the subject is using high gain for target position or has a
high predictor gain, if those two are highly negatively correlated
across the parameter sets.
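As an illustration of that diagnostic, the check amounts to stacking
the 10 near-optimum parameter sets as rows and inspecting the
off-diagonal correlations. The parameter names and the synthetic
numbers below are placeholders, not values from the experiment.

import numpy as np

names = ["gain", "loop_delay", "pred_gain", "pred_delay", "threshold"]
rng = np.random.default_rng(1)
gain = 6.0 + rng.normal(0, 0.8, 10)
optima = np.column_stack([                       # 10 near-optimum sets (synthetic)
    gain,
    0.12 + rng.normal(0, 0.01, 10),              # loop delay (s)
    2.0 - 0.25 * (gain - 6.0) + rng.normal(0, 0.05, 10),  # trades off with gain
    0.20 + rng.normal(0, 0.02, 10),              # predictor delay (s)
    0.03 + rng.normal(0, 0.005, 10),             # threshold
])

corr = np.corrcoef(optima, rowvar=False)         # pairwise parameter correlations
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]:>10} vs {names[j]:<10}  r = {corr[i, j]:+.2f}")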

In the early stages, I tested quite a few different models, some (as
I reported a year or more ago) using a power function of the error
to feed the output integrator. The problem with this is that it is
very difficult to distinguish a change of gain from a change in
exponent in its effect on most of the track. I tested most of these
models on only a few tracks (with the computing power available, I
could do only one model on one subject's data in a day, so model
testing takes weeks).

Finishing this week, I have actually tested every subject, every
track, with two rather different models. But the results provide
another problem. On average, the fits for model A are better than for
model B, but most of the "best fits" (i.e. the best of the 10 for a
particular track) are better for model B than for model A. This
probably means that Model B is the better model, but that the way the
parameters interact makes it harder to find the optimum set for Model
B. And that means that the trends in the data over the duration of
sleep loss are unreliable, because changes in one parameter may be
attributed to another. The subject may be increasing, for example,
the threshold below which the error is considered negligible, but it
could look as if on one trial the threshold is dropping and the gain
increasing, whereas on the next the reverse is happening. The data
plots become very noisy.

So, I'm in the process of rethinking, mostly in the direction of
seeing how I can automatically rotate and stretch the parameter
fitting space to make it more spherical, and allow the e-coli method
not to get caught in narrow valleys caused by highly correlated
variables.
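One plausible way to do that rotation and stretching automatically
(my own assumption, not necessarily the method that will be adopted)
is to whiten the parameter space using the covariance of the
near-optimum parameter sets already in hand, and then run the e-coli
search in the whitened coordinates:

import numpy as np

def whitening_transform(param_sets):
    # param_sets: (n_sets, n_params) array, e.g. the 10 optima found per track.
    X = np.asarray(param_sets, float)
    mean = X.mean(axis=0)
    # Eigendecomposition of the covariance gives the rotation; dividing each
    # eigenvector by sqrt(eigenvalue) stretches the narrow-valley directions.
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    W = evecs / np.sqrt(np.maximum(evals, 1e-12))
    to_white = lambda x: (np.asarray(x, float) - mean) @ W
    from_white = lambda z: np.asarray(z, float) @ np.linalg.inv(W) + mean
    return to_white, from_white

The search would propose steps in the whitened space, mapping each
candidate back with from_white before running the simulation, so that
equal-sized steps correspond to roughly equal changes in fitness.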

Sorry to be so long-winded, but I thought an update on the experiment
was in order, especially as it involves fitting over 1300 individual
tracks.

I'm not really expecting comment, but if you have any, I'll be
looking. As I said, my contributions will continue to be sporadic for
a while.

Martin

[From Bill Powers (2004.11.19.0653 MST)]

Martin Taylor 2004.11.18.17.51–

Often, on CSGnet, it is claimed that the validity of a model fit in a
tracking task is attested by the high correlation usually observed
between the real data and the data simulated by the model. These
correlations are often in the high 0.9’s. In my view, such correlations
may indicate no more than that both the model and the subject are
acting as control systems capable of countering disturbances in the
task at hand. If the subject controls well, the model controls well,
and the disturbance excursions substantially exceed the noise, high
correlations between subject and model are inevitable. But that doesn’t
imply that the model is the right model.

“Right model” has several meanings. Is it the right kind of
model, and is the particular implementation of that kind of model the
right one? The main question I have tried to answer is the first; the
second ranges from hard to answer without doing detailed quantitative
experiments to impossible to answer without knowing the detailed
neuroanatomy and neural functions.

There is, in fact, no reason to think that if two people control the same
thing in the same way at the gross level, their nervous systems implement
that control in the same way. There is always more than one way to
accomplish a given function, so there is no reason to think that all
nervous systems accomplish the same ends by using the same circuitry
– at any level of detail. But they can still all be control
systems in the PCT sense of the term.

The high correlations between model and real behavior that we find with
the simplest control model are, of course, so high as to be
uninformative. I prefer looking at RMS differences between model and real
measures, and even better simply examining the differences in
detail. Correlations mask differences – how much difference is
there between a correlation of 0.95 and 0.97? Then ask what the
difference in prediction errors is, and it will be far larger as a
proportion. I prefer measures that magnify differences rather than
minimizing them. But, all that said, we should not lose sight of the Big
Picture.

Is there any other model that, even in its idealized form, will imitate
behavior as well under the different conditions that we use? Even without
fitting a model to the data – just using a linear integrating control
model with very high loop gain and zero lags – we can show that human
control behavior in a simple tracking task is the same as that of the
model within five to ten percent RMS. As Tom Bourbon and I showed
in the article, “Models and their worlds,” the two main rival
models, “SR” and “Cognitive” as we labeled them, fail
not just quantitatively, but qualitatively: they predict entirely the
wrong KIND of behavior under the conditions we used. Only the
control-system model showed the right kind of behavior, like the behavior
of the human subject, under all three of the conditions we
used.

As far as I’m concerned, that made our case for this type of task. Tom
and I had hoped that others with more resources, seeing how we went about
testing the theories, would extend this approach to other realms of
behavior and ultimately to all of it, drawing the boundaries of the
theory of living control systems. Then it would be logical to start
trying to improve the fit of the model to real behavior.

In 1994, I was involved in an experiment in which people worked for
three days and two nights continuously except for a 10 or 15 minute
break every 2 hours in which they could chat, eat, or whatever, but not
sleep. There were quite a few different tasks, including some tracking
tasks programmed by Bill P. In that study, there were hints of a
counterintuitive result, that the sleep loss had less effect on complex
tasks than on simple ones.

As you know, I took exception to the interpretations of the results,
since the behavior was entirely too complex for a simple model to handle.
The fact that subjects could be asleep with the mouse in their hands for
seconds at a time made the model fits to the tracking data meaningless.
Even a slight lapse like a sneeze can increase the RMS fit error by a
substantial factor when tracking is being done very consistently
throughout a run. What would be the effects of simply ceasing to track
for an unknown length of time? Huge. So how do you salvage meaningful
results when there are large unpredictable changes in the higher-order
organization of the system? The answer is, you don’t. You abort the
mission and go back to the drawing board, and try not to regret the
expense. Naturally, that course was politically impossible to take, I
quite understand. But scientifically, that is what should have been
done.

In 2002, another experiment was run. This time it was over a 26-hour
span, meaning one sleepless night. I devised a tracking experiment that
incorporated three different levels of perceptual complexity and two
modes, one involving perception of visual magnitude, the other of
numerical magnitude. Each of 32 subjects was asked to do 42 separate
50-second tracks. So I have a lot of data. The question is to model it,
something that has been a problem for several reasons.

I still maintain that the problem to be solved is not what model to use,
but what experiment to do that will not include so many unknown factors
as to make the results meaningless. How do you know that you had three
different levels of perception? How do you know there were differences in
perceptual complexity? Did you have clear unequivocal data about the way
control parameters changed over these conditions (or did not change) in
subjects under normal conditions? Did you have a model that fit the data
over the whole range of conditions with good accuracy, so you were sure
you could see the effects of other experimental manipulations?

The issue I want to address now is model-fitting. The model I fit to
the data from the 1994 study was a simple “classical PCT” control model
with the addition of the “Marken prediction” element that makes the
reference signal become not the target, but the target advanced by
adding an amount proportional to the target velocity. It showed that
the drugs affected the importance the subjects gave to prediction, but
that effect showed up mainly in the second sleepless night.

I have no idea what you’re talking about here. What is the “Marken
prediction element?” I have never seen a PCT model in which the
target is advanced by adding an amount proportional to the target
velocity. Would you describe this model in more detail?

There are all sorts of possible criteria for a good fit. The simplest
probably is RMS error. But in a sleep-loss situation, that’s no good,
because subjects sometimes (often) go into a kind of microsleep
condition, in which they cease tracking for a few seconds, or else they
may wildly oscillate the cursor for a few seconds, as if they were
shivering through it. The RMS measure weights heavily these times of
large error, and that means that the “best fit model” means the one
that matches those bad times best. Sometimes I found that the best fit
was a model that simply didn’t track at all!

The problem is that you can’t model the lapses from control in any
detail. You can’t say that the subject’s hand will depart from its proper
position at so many centimeters per second and follow such and such a
curve. That is, you can’t do this yet, because you have no model of the
higher-order systems and the reference signals they are sending to the
control system you’re measuring. You may as well put random changes into
the model, but of course they won’t be the same as the actual changes, so
the fit will disappear in any case.
I think we have to maintain our standards for a “good fit”. I
know that others disagree with me on this; there is a feeling that any
data must be able to tell us something, even if only
approximately. Perhaps I should just say that I’m expressing a preference
for clear results even if this limits me, for now, to simple experiments.
If you can come up with clear and convincing results through the methods
you are using I will applaud and be envious that I couldn’t do as well.
But if I were a betting man, I would put a few pence on the claim that my
approach will get us to understanding faster than yours will (although
still not very fast).

Having a criterion allows one to look for optimum parameter values for
a model. I am using a modified e-coli method, not knowing any
computationally efficient way of finding an optimum in a surface of
unknown roughness. Here’s the actual method: I choose an arbitrary set
of parameter values that allows the model to track the target. From
that point in 5-parameter space I choose an arbitrary direction and
find the best fit parameter set along that line. This provides a new
starting point.

I’ve thought of doing this, but haven’t actually tried it. It is
definitely a good idea, especially for handling the local-minimum
problem. I’m not sure how real the “narrow valley” phenomenon
is – I’ve done exhaustive grid searches over four and five parameter
spaces in which several hours of computer time were used to look for all
the possible fits. All I can say is that if there are any real narrow
valleys of the kind you imagine, they have eluded grids with close to a
million cells. You’d think they would have shown up in more than one
cell, but all the minima I have found have been smooth and limited to at
most two or three that differ in the second or third decimal place. This
is bound to be the case when you do multiple determinations; the data are
fuzzy enough to conceal any real valleys.

The biggest problem, I find, is how to avoid getting so immersed in
ever-more-complex analysis that flaws in the basic approach are
overlooked. Is the experiment itself really telling us what we want to
know? Are we trying to squeeze blood from a turnip?

In the early stages, I tested quite a few different models, some (as I
reported a year or more ago) using a power function of the error to
feed the output integrator. The problem with this is that it is very
difficult to distinguish a change of gain from a change in exponent in
its effect on most of the track. I tested most of these models on only
a few tracks (with the computing power available, I could do only one
model on one subject’s data in a day, so model testing takes weeks).

The power function is another good idea. I think an error-detection
threshold may be warranted, too – I seem to have one, although it may be
a sign of age that wouldn’t show up in a younger subject. The problem
here is that if you have the right kind of model to begin with, you’re
already accounting for 95% or so of the variance, meaning that testing
improvements in the model has to be done by looking at changes in the
remaining 5%, which doesn’t give you much to go on. I find that the main
two parameters for tracking experiments are the integrator gain in the
output function, and the delay (which can be put anywhere from the input
to the output function). With just those two, the RMS error of fit is
about one half to one third of the RMS tracking error, when the task
difficulty is such as to produce about 10% RMS tracking error. A further
slight reduction can be obtained by making the output integrator leaky
and varying the time constant. But the fit error then looks pretty darned
random to me, and I don’t see any simple way to improve it. This is
complicated by the fact that I have a slight “essential tremor”
that puts small high-frequency waves into the data. I need some younger
subjects, but they don’t drop by very often and when they do they want to
get on the web with my DSL line.
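For concreteness, here is a minimal sketch of that two-parameter model
(integrating output plus transport delay, with the optional leak); the
constants and the placement of the delay on the input side are
illustrative choices, not fitted values.

import numpy as np

def simple_tracking_model(target, disturbance, dt=1/60,
                          gain=8.0, delay=0.15, leak=0.0):
    # gain: integration rate of the output function.
    # delay: transport lag in seconds (placed on the input side here,
    #        though it could go anywhere in the loop).
    # leak: leak rate of the integrator (0 = pure integrator).
    n = len(target)
    lag = int(round(delay / dt))
    cursor = np.zeros(n)
    o = 0.0
    for i in range(1, n):
        j = max(0, i - 1 - lag)                  # delayed view of the loop
        e = target[j] - cursor[j]                # error as seen 'delay' ago
        o += (gain * e - leak * o) * dt          # (leaky) integrating output
        cursor[i] = o + disturbance[i]
    return cursor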

Well, I wish you luck with your modeling of the sleep effects. Actually,
I think you will probably come up with results that are much better than
the usual run of psychological “findings,” so we can hope that
your work will bring a few more onlookers into the fold. The very subject
of research through modeling – in the new book I call it
“investigative modeling” – is pretty new to mainstream
psychology in the large, and your investigations might well create some
wider interest in this approach. Hope you can get it published.

Best,

Bill P.

[From Erling Jorgensen (2004.11.19.1325 EST)]
Re: Fitting models to data

Martin Taylor 2004.11.18.17.51
Bill Powers (2004.11.19.0653 MST)

When you guys get on the net, it’s so darn interesting
that it’s really easy to get drawn from the things I
“should” be doing…

There are all sorts of possible criteria for a good fit. The simplest
probably is RMS error. But in a sleep-loss situation, that’s no good,
because subjects sometimes (often) go into a kind of microsleep
condition, in which they cease tracking for a few seconds, or else
they may wildly oscillate the cursor for a few seconds, as if they
were shivering through it. The RMS measure weights heavily these
times of large error, and that means that the “best fit model” means
the one that matches those bad times best. Sometimes I found that the
best fit was a model that simply didn’t track at all!

The problem is that you can’t model the lapses from control in any detail.
You can’t say that the subject’s hand will depart from its proper position
at so many centimeters per second and follow such and such a curve. That
is, you can’t do this yet, because you have no model of the higher-order
systems and the reference signals they are sending to the control system
you’re measuring. You may as well put random changes into the model, but of
course they won’t be the same as the actual changes, so the fit will
disappear in any case.

Isn’t the situation here one of having two distinct phenomena
to include in the model – i.e., control & loss of control?
It seems as though “gain going to zero” would be a way to model
loss of control.

Are there any data on how often the microsleep conditions tend
to occur, or for how long? Even approximate estimates from the
tracking data would be helpful.

Then a given model (or set of parameters) could contain X number
of seconds where gain goes to zero, for Y number of occasions.
The main problem would be fitting the occasions to appropriate
locations in the tracking run. Perhaps onset to first occasion
of microsleep could be estimated (by averaging several runs of
actual data?), with a set interval after that to insert the
other occasions into the model’s run.
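A minimal sketch of what that could look like, assuming an ordinary
integrating controller and hand-placed lapse intervals; all of the
constants and interval placements below are illustrative, not fitted
to anyone's data.

import numpy as np

def track_with_lapses(target, disturbance, dt=0.01, gain=8.0,
                      lapses=((12.0, 14.5), (31.0, 33.0))):
    # lapses: (start, end) times in seconds during which the output gain is
    # set to zero, standing in for microsleep; placement is arbitrary here.
    n = len(target)
    cursor = np.zeros(n)
    out = 0.0
    for i in range(1, n):
        t = i * dt
        g = 0.0 if any(a <= t < b for a, b in lapses) else gain
        error = target[i - 1] - cursor[i - 1]    # reference (target) minus perception
        out += g * error * dt                    # integrating output, gated by g
        cursor[i] = out + disturbance[i]
    return cursor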

I realize this would likely result in mismatched data, between
subjects & the model, since the microsleep conditions of model
& subject would not necessarily overlap. In fact, it would
likely be doubly mismatched, since sometimes the model would
continue to control while a given subject wasn’t, & sometimes
it would cease to control while that subject maintained
adequate control.

But it seems this model would provide baseline model-fitting
data, with a model that reasonably produced the two conditions
of control & occasional loss of control. The degree of fit
could then be compared with manually making the loss of control
occasions overlap, to see if the model’s predictions substantially
improved.

I agree that our knowledge of higher order references (leading
to microsleep conditions with resultant loss of control) is not
sufficient to have the model itself predict the timing. But
any model-fitting with these kind of data, where control and
loss of control are distinctly present, should at least use
some parameter to approximate those occasional periods where
control does not occur.

Those are my thoughts, at any rate. Now, back to the “shoulds.”
Sigh…

All the best,
Erling


[From Rick Marken (2004.11.19.1500)]

Bill Powers (2004.11.19.0653 MST)--

Martin Taylor 2004.11.18.17.51--

So I have a lot of data. The question is to model it, something
that has been a problem for several reasons.

I still maintain that the problem to be solved is not what model to use, but
what experiment to do that will not include so many unknown factors as to make
the results meaningless.

I agree. I think finding the right experiment is usually the best way to
find the best fitting model. I've been fitting a couple of different models
to the baseball data I obtained -- great data inasmuch as it gives ball
trajectories and running patterns for each catch -- and both models fit the
data pretty well. Both are control models but they control different
variables. Ultimately, the only way to see which model is really best will
be to create an experimental situation where the two models make very
different predictions.

I describe this kind of experimental approach to distinguishing models in the
"Degrees of Freedom in Behavior" paper reprinted in _Mind Readings_. In that
paper I describe a two dimensional tracking task and two models that account
for the basic data. Both models are control models, but one controls a
perception of the cursor represented in Cartesian (x,y) coordinates and the
other controls a perception of the cursor represented in Polar (rho, theta)
coordinates. It turns out that the models make quite different predictions
about how the controller will behave when there is a disturbance abruptly
applied to the cursor in one dimension. When you do the experiment, the
behavior of the subjects is exactly what is predicted by the model
controlling a Cartesian perception of the cursor.

I think that finding the best model of behavior is not just a matter of
varying model parameters to get the best fit to a particular data set. It's
also (and, perhaps, more importantly) a matter of varying experimental
conditions in order to produce the data that will most clearly distinguish
contending models.

Best

Rick


--
Richard S. Marken
MindReadings.com
Home: 310 474 0313
Cell: 310 729 1400


Re: Fitting models to data
[Martin Taylor 2004.11.21.11.40]

In response to

[From Bill Powers (2004.11.19.0653 MST)]

Rick Marken (2004.11.19.1500)

and

[From Rick Marken (2004.11.20.2200)]

I think both Bill and Rick completely missed the point of the
posting to which they both responded (Martin Taylor 2004.11.18.17.51).
In my turn, I may be missing their points, but I guess that’s
unavoidable in this kind of interchange. I hope I won’t be unfair in
this message.

I’ll take the last first. In response to Marc Abrams, Rick
said:

(Marc) Powers’ response to Martin yesterday was a shame and a challenge.

(Rick) It sounded like a very reasonable and measured explanation of how
to go about testing models when fitting different models to the same
data has been pushed to the limit and resulted in a tie.

I strongly disagree with Marc, but for different reasons. At no
point in my original message did I suggest that a situation existed in
which “fitting different models to the same data has been
pushed to the limit and resulted in a tie.” Nor would I
ordinarily be interested in such a situation.

What I presented was a situation in which the comparison of
different models presented evidence that an apparently reasonable
method of data fitting was shown to have statistically undesirable
characteristics, and that those characteristics would be likely to be
generalizable to other PCT-based experiments. I attempted to lay out
the reasons why these problems might be generalizable to a range of
experiments that attempt to go beyond the simple question of whether
people actually control when they are tracking a variable.

To repeat the observation that led me to this conclusion: When
model simulation runs are done many times to discover an optimum set
of parameters for each individual model, the average optimum fit for
Model A is better than that for Model B, but the best of these optima
is usually better for Model B than for Model A. This suggests that it
is harder for the optimization technique (a refinement of e-coli) to
find the optimum parameter values for Model B, but that Model B is
actually the better model.

Now for Rick’s other message, and then I’ll get to Bill’s because
both have similar points, but Bill’s deserves more specific
responses.

(Bill) I still maintain that the problem to be solved is not what model
to use, but what experiment to do that will not include so many unknown
factors as to make the results meaningless.

(Rick) I agree. I think finding the right experiment is usually the
best way to find the best fitting model.

(Which isn’t exactly what Bill said, but let that pass).

I think that finding the best model of behavior is not just a matter
of varying model parameters to get the best fit to a particular data
set. It’s also (and, perhaps, more importantly) a matter of varying
experimental conditions in order to produce the data that will most
clearly distinguish contending models.

The gist of these comments is that if you want to study the
differential effects of sleep loss and complexity on perceptual
control, you should not use sleepy subjects, and you should not use
complex perceptual conditions. But, on the other hand, if you want to
compare the effects of sleep loss and complexity on perceptual
control, you should use subjects who get sleepy and you should use
both simple and complex perceptual signals. Did I get that
right?

That may sound like an unfair paraphrase, but given the question
and the boundary conditions, I find a different interpretation hard to
discover.

If it is an unfair paraphrase, I’m sure Rick or Bill will let me
know.

Well, actually, I can think of a different interpretation, but
maybe that one is even less fair – it is that the PCT studies of even
straightforward tracking are impossible to do and to interpret except
under the idealized conditions of maximally alert subjects and
trivially simple presentation conditions. I really don’t think that
Rick and Bill mean this (and I hope I’m right).

Now Bill P.

Martin Taylor 2004.11.18.17.51–

Often, on CSGnet, it is claimed that the validity of a model fit in a
tracking task is attested by the high correlation usually observed
between the real data and the data simulated by the model. These
correlations are often in the high 0.9’s. In my view, such correlations
may indicate no more than that both the model and the subject are
acting as control systems capable of countering disturbances in the
task at hand. If the subject controls well, the model controls well,
and the disturbance excursions substantially exceed the noise, high
correlations between subject and model are inevitable. But that doesn’t
imply that the model is the right model.

“Right model” has several meanings. Is it the right kind of
model, and is the particular implementation of that kind of model the
right one? The main question I have tried to answer is the first; the
second ranges from hard to answer without doing detailed quantitative
experiments to impossible to answer without knowing the detailed
neuroanatomy and neural functions.

That’s a valid comment, in that it expresses what Bill has tried
to answer. I, on the other hand, was trying to tease out a small part
of the second question. In particular, I asked the question:
“What aspects of control change when the task is simple or
complex, and are those aspects differentially affected by loss of
sleep?” Having 32 subjects, each doing 42 separate tracks, under
conditions from fully alert through the loss of a night’s sleep to
restored wakefulness the next day, seemed to offer an opportunity to
do more than simply say “They seem to be controlling except when
they seem not to be”.

There is, in fact, no reason to think
that if two people control the same thing in the same way at the gross
level, their nervous systems implement that control in the same way.
There is always more than one way to accomplish a given function, so
there is no reason to think that all nervous systems accomplish the
same ends by using the same circuitry – at any level of detail.
But they can still all be control systems in the PCT sense of the
term.

Quite so, which is why all my analyses deal with the control
processes modelled for each individual (and each track) separately,
before any comparisons are made. All comparisons are among the
parameters modelled. That offers the potential to show up individual
differences in the ways subjects might be controlling.

The high correlations between model and
real behavior that we find with the simplest control model are, of
course, so high as to be uninformative. I prefer looking at RMS
differences between model and real measures, and even better simply
examining the differences in detail. Correlations mask
differences – how much difference is there between a correlation of
0.95 and 0.97? Then ask what the difference in prediction errors is,
and it will be far larger as a proportion. I prefer measures that
magnify differences rather than minimizing them.

Quite so. That’s one of the points I was trying to get across.
But I went a bit further, to point out that RMS differences are not
always appropriate. I have, in fact, been looking at the differences
in detail, which raises yet other questions I didn’t mention in my
initial message, and I won’t here. I might, in a later message when
the points at issue in my first message have been cleared
up.

Even without fitting a model to the data
– just using a linear integrating control model with very high loop
gain and zero lags – we can show that human control behavior in a
simple tracking task is the same as that of the model within five to
ten percent RMS…

As far as I’m concerned, that made our case for this type of task. Tom
and I had hoped that others with more resources, seeing how we went
about testing the theories, would extend this approach to other realms
of behavior and ultimately to all of it, drawing the boundaries of the
theory of living control systems. Then it would be logical to start
trying to improve the fit of the model to real behavior,

Nevertheless, you raise what seem to be objections in principle
to my attempting to follow that programme.

In 1994, I was involved in an experiment in which people worked for
three days and two nights continuously except for a 10 or 15 minute
break every 2 hours in which they could chat, eat, or whatever, but not
sleep. There were quite a few different tasks, including some tracking
tasks programmed by Bill P. In that study, there were hints of a
counterintuitive result, that the sleep loss had less effect on complex
tasks than on simple ones.

As you know, I took exception to the interpretations of the results,
since the behavior was entirely too complex for a simple model to
handle. The fact that subjects could be asleep with the mouse in their
hands for seconds at a time made the model fits to the tracking data
meaningless. Even a slight lapse like a sneeze can increase the RMS
fit error by a substantial factor when tracking is being done very
consistently throughout a run. What would be the effects of simply
ceasing to track for an unknown length of time? Huge. So how do you
salvage meaningful results when there are large unpredictable changes
in the higher-order organization of the system? The answer is, you
don’t.

Wrong. The answer is that you develop algorithms that
conservatively eliminate from the modelling those stretches of data
that clearly represent micro-sleep, and model the rest. In the 1994
study, the algorithm was very simple: stretches of longer than X
seconds in which the target moved but the cursor didn’t were
considered to be periods when the subject was not tracking (I forget
what I used for “X”). And to avoid the obvious picky
comment, yes, I included a “guard-band” around the
non-tracking interval.
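A sketch of that kind of screening rule, with placeholder values for
X, the movement threshold, and the guard band, since the actual
numbers are not given here:

import numpy as np

def non_tracking_mask(cursor, target, dt=0.01, min_gap=2.0,
                      guard=0.5, move_eps=1e-6):
    # Returns a boolean array: True = exclude this sample from fitting.
    # A stretch counts as non-tracking when the cursor stays still for more
    # than min_gap seconds while the target keeps moving; the flagged region
    # is then widened by a guard band of 'guard' seconds on each side.
    cursor, target = np.asarray(cursor, float), np.asarray(target, float)
    still = (np.abs(np.diff(cursor, prepend=cursor[0])) < move_eps) & \
            (np.abs(np.diff(target, prepend=target[0])) > move_eps)
    exclude = np.zeros(len(cursor), bool)
    i = 0
    while i < len(cursor):
        if still[i]:
            j = i
            while j < len(cursor) and still[j]:
                j += 1
            if (j - i) * dt > min_gap:
                g = int(guard / dt)
                exclude[max(0, i - g):j + g] = True
            i = j
        else:
            i += 1
    return exclude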

You abort the mission and go back to the
drawing board, and try not to regret the expense. Naturally, that
course was politically impossible to take, I quite understand. But
scientifically, that is what should have been done.

Actually, It’s rather difficult to study the effects of sleep loss if
you eliminate, a priori, conditions in which subjects might be
expected to be sleepy. I’d say that is a scientific problem, not a
political one.

In 2002, another experiment was run. This time it was over a 26-hour
span, meaning one sleepless night. I devised a tracking experiment that
incorporated three different levels of perceptual complexity and two
modes, one involving perception of visual magnitude, the other of
numerical magnitude. Each of 32 subjects was asked to do 42 separate
50-second tracks. So I have a lot of data. The question is to model it,
something that has been a problem for several reasons.

I still maintain that the problem to be solved is not what model to
use, but what experiment to do that will not include so many unknown
factors as to make the results meaningless. How do you know that you
had three different levels of perception?

OK. I ddin’t want extend an already overlong message, but since
you need to be reminded, here are the tasks. Each was a pursuit
tracking task. In one set of three, the target to be tracked was a
varying number and the subject’s mouse varied a number displayed on
the screen. In the other set, the target was the length of a line, and
the subject’s mouse varied the length of a line displayed horizontally
on the screen.

The three levels of complexity were (1) the subjects matched a
varying target, either a displayed number or the displayed length of a
line (presented vertically on the screen to avoid the possibility that
the subject might simply compare positions rather than line lengths);
(2) The subject matched the same varying target plus or minus a
fixed amount, the amount being indicated before the start of the trial
run – for example, the subject might be asked to make the controlled
number be 5 less than the varying target number; (3) The subject
matched the same target plus another number or line that varied more
slowly on both sides of zero (a downgoing secondary line indicated
subtraction).

I don’t think that this really matters to the points I was making
in the thread-initiating message, which was about questions relating
to model-fitting. Those questions were intended to be general to many
kinds of PCT-based experiments. The problems were exposed by the
sensitivity of the experiment, which made it seem appropriate to
mention how they arose.

The issue I want to address now is model-fitting. The model I fit to
the data from the 1994 study was a simple “classical PCT” control model
with the addition of the “Marken prediction” element that makes the
reference signal become not the target, but the target advanced by
adding an amount proportional to the target velocity.

I have no idea what you’re talking about here. What is the
“Marken prediction element?” I have never seen a PCT model
in which the target is advanced by adding an amount proportional to
the target velocity. Would you describe this model in more
detail?

Ask Rick. He was quite pleased with the improvements of model fit
it gave, and I simply copied it from him (with credit in the
publication).

Actually, it might be relevant for me to mention that the
original reason for trying several models was that I did not have an a
priori reason to prefer one competent controller over another, but I
thought that if the relation between direct observation of the target
and prediction of its future magnitude was important, that relation
should show up in a variety of structurally different models. The
important thing was to find structurally different models that
provided nearly equally good fits. If one model turned out to be a
unique way to fit the data best, so much the better, but that wasn’t
required for an answer to the experimental question.

… Perhaps I should just say that I’m
expressing a preference for clear results even if this limits me, for
now, to simple experiments.

In other words, given that I have a question raised by earlier
experiments: “There are hints that complex tasks are less
affected by sleep loss than are simple tasks; is this true?”, I
should not attempt to look at it from a PCT perspective? Or are you
saying that when a major sleep loss study is being conducted, I should
not attempt to introduce any kind of PCT-based study?

Having a criterion allows one to look for optimum parameter values for
a model. I am using a modified e-coli method, not knowing any
computationally efficient way of finding an optimum in a surface of
unknown roughness. Here’s the actual method: I choose an arbitrary set
of parameter values that allows the model to track the target. From
that point in 5-parameter space I choose an arbitrary direction and
find the best fit parameter set along that line. This provides a new
starting point.

I’ve thought of doing this, but haven’t actually tried it. It is
definitely a good idea, expecially for handling the local-minimum
problem. I’m not sure how real the “narrow valley”
phenomenon is

Actually, you can see it very easily, by looking at the
correlations among the pairs of parameter values that lead to
near-optimum fits. If you get the same near-optimum fit with (say)
high gain and low threshold as with low gain and high threshold, but
the true optimum is with mid gain and mid threshold, and raising or
lowering both together makes a big difference, then you have an
observable narrow valley. That is what I see in my data.

(I should emphasise that gain and threshold are not necessarily a
pair that exhibits high correlations in the actual data. I use them as
abstract example parameters. There are several pairs for which the
problem shows up. Actually, it was such a tight trade-off between gain
and exponent that led me to abandon the use of the exponent in fitting
models to data. Gain and exponent were almost surrogates for each
other in the fits.)

The result of this is that the further analysis of trends in
either of the two correlated parameters is very noisy, even if the
underlying real trends are smooth. The optimization for one track may
wind up with high gain and low threshold, and for the next with a low
gain and high threshold, when the hard-to-find true underlying optimum
would have shown (say) a slight increase in threshold and a slight
reduction in gain. The trade-off that shows up as correlation adds
spurious noise that obscures the trends that are the subject of the
experimental question. This, in turn, says that it is hard to tease
out individual differences. If the data were rescaled to be near
spherical, this “narrow valley” problem would go away.

Quite separately, there is an artificial kind of narrow valley
caused by a less than optimal scaling of the individual variables in
the different dimensions. That’s a question of the choice you make in
setting up what you treat as equal sized steps in the different
directions, but it’s just as real in the results you get as is the
intrinsic “narrow valley” shown by the inter-parameter
correlations you get from multiple optimizations for the same track.
The difference is that the noise induced by inappropriate scaling
affects only one of the variables at a time. It’s more a question of
making the search for the true optimum slower than necessary (perhaps
greatly slower) than it is a question of providing potentially
misleading results.

The biggest problem, I find, is how to
avoid getting so immersed in ever-more-complex analysis that flaws in
the basic approach are overlooked. Is the experiment itself really
telling us what we want to know? Are we trying to squeeze blood from a
turnip?

I’m well aware of that issue. Much of the reason that these
questions are coming up so long after the actual experiment is that
I’ve spent a lot of time going over the basic issues involved. Every
time I do this, I find more (unfortunately, science is like
that).

The power function is another good idea.
I think an error-detection threshold may be warranted, too – I seem
to have one, although it may be a sign of age that wouldn’t show up in
a younger subject.

I thought it might show up as subjects got sleepy. I thought that
perhaps they wouldn’t care about an error that they would have tried
to correct when they were alert.

The problem here is that if you
have the right kind of model to begin with, you’re already accounting
for 95% or so of the variance,

Variance in what? My problem with this kind of statement is that
ANY system that actually controls will account for almost all of the
variation that matches the variation in the target. That tells you
nothing. What interests me is the deviation between the target
and what the subject actually does. For that, it simply isn’t true
that “if you have the right kind of model to begin with, you’re
already accounting for 95% or so of the variance.” The problem is
exactly to find “the right kind of model.” That’s the end
product of the experiment, not the a priori starting point.

Well, I wish you luck with your modeling
of the sleep effects. Actually, I think you will probably come up with
results that are much better than the usual run of psychological
“findings,” so we can hope that your work will bring a few
more onlookers into the fold. The very subject of research through
modeling – in the new book I call it “investigative modeling”
– is pretty new to mainstream psychology in the large, and your
investigations might well create some wider interest in this approach.
Hope you can get it published.

Thank you. I hope so, too. But I have to resolve the issues I
brought up in my initial post, first.

Martin

[From Rick Marken (2004.11.21.1130)]

Martin Taylor (2004.11.21.11.40) --

What I presented was a situation in which the comparison of different models presented evidence that an apparently reasonable method of data fitting was shown to have statistically undesirable characteristics...

To repeat the observation that led me to this conclusion: When model simulation runs are done many times to discover an optimum set of parameters for each individual model, the average optimum fit for Model A is better than that for Model B, but the best of these optima is usually better for Model B than for Model A.

And my point was simply that it's time to forget the model fitting and start doing more experiments that will make the distinction between model A and model B clear.

I think that finding the best model of behavior is not just a matter of
varying model parameters to get the best fit to a particular data set. It's
also (and, perhaps, more importantly) a matter of varying experimental
conditions in order to produce the data that will most clearly distinguish
contending models.

The gist of these comments is that if you want to study the differential effects of sleep loss and complexity on perceptual control, you should not use sleepy subjects, and you should not use complex perceptual conditions. But, on the other hand, if you want to compare the effects of sleep loss and complexity on perceptual control, you should use subjects who get sleepy and you should use both simple and complex perceptual signals. Did I get that right?

No. The gist of the comment is that you have to do other experiments.

If it is an unfair paraphrase, I'm sure Rick or Bill will let me know.

It's not "unfair". It's just silly. If you are having trouble distinguishing two models that make predictions about the effect of sleep loss on control then I suggest developing new experimental manipulations that will allow for a clearer test. The solution to your model fitting problems lies not, I believe, in improved parameter estimation algorithms but in improved experimental methods.

The issue I want to address now is model-fitting.

You can't separate model fitting from experimental test. Again, I point you to the "Degrees of Freedom in Behavior" paper in _Mind Readings_ where I describe (on pp. 200-202) a version of the Test that makes it possible to distinguish models that make equally good predictions of control behavior in a simple tracking task (the Cartesian and Polar perception models) but can be readily distinguished by making a slight change in the experimental procedure.

Martin says:

The model I fit to the data from the 1994 study was a simple "classical PCT"
control model with the addition of the "Marken prediction" element
that makes the reference signal become not the target, but the target
advanced by adding an amount proportional to the target velocity.

Bill Powers asks:

I have no idea what you're talking about here. What is the "Marken prediction element?" I have never seen a PCT model in which the target is advanced by adding an amount proportional to the target velocity. Would you describe this model in more detail?

Martin replies:

Ask Rick. He was quite pleased with the improvements of model fit it gave, and I simply copied it from him (with credit in the publication).

I have no memory of this. I don't know what the "Marken prediction element" is. But if I did suggest using a predictive controller then thanks for the credit. If a predictive reference actually improves the fit of the model then that seems worth studying in itself. I'd like to see a more detailed explanation of the predictive model. What disturbances were used? Were they visible? What kind of controller was used for the model (proportional, integral) and does this make a difference in terms of model fit?

Best

Rick


---

Richard S. Marken
marken@mindreadings.com
Home 310 474-0313
Cell 310 729-1400

[From Bill Powers (2004.11.21.1529 MST)]
Martin Taylor 2004.11.21.11.40--

To repeat the observation that led me to this conclusion: When model
simulation runs are done many times to discover an optimum set of
parameters for each individual model, the average optimum fit for Model A
is better than that for Model B, but the best of these optima is usually
better for Model B than for Model A. This suggests that it is harder for
the optimization technique (a refinement of e-coli) to find the optimum
parameter values for Model B, but that Model B is actually the better model.

I really shouldn't be drawing conclusions without knowing more details.
When you say that the optimum fit for one model is better than the other,
what sort of fit are we talking about? I assume that moving a control
device, a mouse, is the means of control. What is the RMS difference
between model and real behavior in the two cases, expressed as a fraction
of the peak-to-peak real mouse movements? (This is a standard measure of
signal-to-noise ratio in electrical engineering).

Best,

Bill P.

[Martin Taylor 2004.11.21.17.35]

[From Rick Marken (2004.11.21.1130)]

Martin Taylor (2004.11.21.11.40) --

What I presented was a situation in which the comparison of
different models presented evidence that an apparently reasonable
method of data fitting was shown to have statistically undesirable
characteristics...

To repeat the observation that led me to this conclusion: When
model simulation runs are done many times to discover an optimum
set of parameters for each individual model, the average optimum
fit for Model A is better than that for Model B, but the best of
these optima is usually better for Model B than for Model A.

And my point was simply that it's time to forget the model fitting
and start doing more experiments that will make the distinction
between model A and model B clear.

I hate repeating myself, but maybe it will be a case of third time lucky.

The point is that the issues that turned up in this study are ones
that anybody concerned with testing PCT models against data ought to
be aware of. Your messages seem aimed at dismissing this concern, if
you even noted that it was a concern.

I'm not really interested in Model A and Model B and any attempt to
distinguish between them, but I am interested if the same kind of
trends show up in both (and in Model C and D if I were to test other
models).

I am also interested in finding _reliable_ ways to compare
structurally different models using tracking data, but that's a
byproduct of finding that the ways normally thought of as standard
don't work for normal cases.

>I think that finding the best model of behavior is not just a matter of
>varying model parameters to get the best fit to a particular data set. It's
>also (and, perhaps, more importantly) a matter of varying experimental
>conditions in order to produce the data that will most clearly distinguish
>contending models.

The gist of these comments is that if you want to study the
differential effects of sleep loss and complexity on perceptual
control, you should not use sleepy subjects, and you should not use
complex perceptual conditions. But, on the other hand, if you want
to compare the effects of sleep loss and complexity on perceptual
control, you should use subjects who get sleepy and you should use
both simple and complex perceptual signals. Did I get that right?

No. The gist of the comment is that you have to do other experiments.

Within the context of what you said earlier, have you not denied the
very possibility of doing ANY experiments to answer the questions at
hand?

If it is an unfair paraphrase, I'm sure Rick or Bill will let me know.

It's not "unfair". It's just silly. If you are having trouble
distinguishing two models that make predictions about the effect of
sleep loss on control

(Which I am not)

then I suggest developing new experimental manipulations that will
allow for a clearer test.

Of what?

The solution to your model fitting problems lies not, I believe, in
improved parameter estimation algorithms but in improved
experimental methods.

The issue I want to address now is model-fitting.

You can't separate model fitting from experimental test.

Oh yes, you can. As I have been doing.

Think a little, please.

Imagine a "standard" PCT loop in which the output of the error
function feeds a simple integrator, which is the output function. Now
interpolate a non-linearity (as you have often proposed) between the
comparator that produces the difference between perceptual signal and
reference signal, and the output integrator. Let this non-linearity be
a simple power function.
Here's what is fed to the integrator as a function of the r-p value.
"x" shows the value if there is no nonlinearity, "*" shows the value
with the nonlinearity.

  signal |               *    x
      to |              *  x
  output |            * x
 function|          *x
         |         *
         |        x*
         |      x *
         |    x *
         |  x *
         |x*
         +-------------------------
                   error

Clearly, unless the tracking performance is very bad, there's hardly
any difference between what is fed to the integrator over a few
up-and-down excursions of the disturbance (and hence of the error).
In a model-fitting simulation, one could get pretty well the same
quality of fit by changing the exponent or by changing the integrator
gain. Now that's less true if the tracking is bad enough for the
error to take on large values. But that's when the data become
"dirty", isn't it?

And that's why I stopped trying fits in which both gain and error
exponent were variable parameters in the model fits.
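For anyone who wants to see the trade-off numerically, here is an
illustrative simulation: a linear loop and a power-law loop whose gain
is matched at a typical error size e0 produce nearly the same track as
long as the error excursions stay in the neighbourhood of e0. The
target, the disturbance, the constants, and the matching point are all
assumptions, not taken from the experiment.

import numpy as np

def simulate(target, disturbance, gain, exponent, dt=0.01):
    # Integrating controller; the error feeds the integrator through
    # gain * sign(e) * |e|**exponent (exponent = 1 means no nonlinearity).
    cursor = np.zeros(len(target))
    o = 0.0
    for i in range(1, len(target)):
        e = target[i - 1] - cursor[i - 1]
        o += gain * np.sign(e) * abs(e) ** exponent * dt
        cursor[i] = o + disturbance[i]
    return cursor

t = np.arange(0, 50, 0.01)                       # one 50-second track
target = 0.3 * np.sin(2 * np.pi * 0.10 * t) + 0.2 * np.sin(2 * np.pi * 0.23 * t)
disturbance = 0.1 * np.sin(2 * np.pi * 0.17 * t)

e0, m = 0.05, 1.5            # assumed typical error size and exponent
lin = simulate(target, disturbance, gain=6.0, exponent=1.0)
pwr = simulate(target, disturbance, gain=6.0 * e0 ** (1 - m), exponent=m)

print("RMS tracking error (linear):       %.4f" % np.sqrt(np.mean((target - lin) ** 2)))
print("RMS difference between the tracks: %.4f" % np.sqrt(np.mean((lin - pwr) ** 2)))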

Martin says:

The model I fit to the data from the 1994 study was a simple "classical PCT"
control model with the addition of the "Marken prediction" element
that makes the reference signal become not the target, but the target
advanced by adding an amount proportional to the target velocity.

Bill Powers asks:

I have no idea what you're talking about here. What is the "Marken
prediction element?" I have never seen a PCT model in which the
target is advanced by adding an amount proportional to the target
velocity. Would you describe this model in more detail?

Martin replies:

Ask Rick. He was quite pleased with the improvements of model fit
it gave, and I simply copied it from him (with credit in the
publication).

I have no memory of this. I don't know what the "Marken prediction
element" is. But if I did suggest using a predictive controller then
thanks for the credit. If a predictive reference actually improves
the fit of the model then that seems worth studying in itself.

Which you did.

I'd like to see a more detailed explanation of the predictive model.

OK, since both of you have amnesia... Imagine again the standard PCT loop,

s -> p -> (r-p = e) -> (G*e=o) -> (o+d = s).

Now add a connection k*dp/dt + r = R, and substitute R for r in the above.

That's what you proposed, and what you showed to provide better fits
to some data you had, and that's what I used in the 1994 study, in
which I found the ratio k/G changed over the sleep-loss period
(mostly on the second night, though).
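For concreteness, a minimal simulation sketch of that loop, following
the connection as written (k*dp/dt added to r, with r here taken to be
the target), with identity perceptual and feedback functions and a
one-step difference for dp/dt. All constants are illustrative, and
with this sign convention k has to stay well below 1/G or the
simulated loop loses stability.

import numpy as np

def predictive_loop(target, disturbance, dt=0.01, G=5.0, k=0.05):
    # s -> p -> (R - p = e) -> integrate(G*e) = o -> (o + d = s),
    # with R = r + k*dp/dt and r = target position.
    n = len(target)
    p = np.zeros(n)                       # perceptual signal (cursor position here)
    o = 0.0                               # output quantity (integrator state)
    for i in range(1, n):
        dp = (p[i - 1] - p[i - 2]) / dt if i > 1 else 0.0
        R = target[i - 1] + k * dp        # working reference signal
        e = R - p[i - 1]                  # error signal
        o += G * e * dt                   # integrating output function
        p[i] = o + disturbance[i]         # environment closes the loop
    return p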

What disturbances were used? Were they visible? What kind of
controller was used for the model (proportional, integral) and does
this make a difference in terms of model fit?

I don't know. I only used an ordinary integrator. What did you do?

Martin

[From Bill Powers (2004.11.21.1613 MST)]

Martin Taylor 2004.11.21.17.35 --

OK, since both of you have amnesia... Imagine again the standard PCT loop,

s -> p -> (r-p = e) -> (G*e=o) -> (o+d = s).

Now add a connection k*dp/dt + r = R, and substitute R for r in the above.

That's what you proposed,

No wonder I didn't recognize it. You've taken a control system with a rate
component added to the perceptual signal, which is a physically meaningful
model, and turned it into a control system with the rate of change of the
perceptual signal added to a constant to generate the working reference
signal. That is not any model I recognize.

If you do the substitution, you get

e = R - p, or

e = r + k*dp/dt - p, or

e = r - (p - k*dp/dt)

where the term in parentheses is the perception being controlled.

Unfortunately, this perception involves positive feedback of the rate of
change, so unless k is negative, the system will become unstable for even
smallish values of k (depending on the gain in the rest of the loop). If
you really want to add a rate component to the perceptual signal in the
sense that will provide damping, k must be negative or you should write

e = r - (p + k*dp/dt).

An aside: Calling this "prediction" makes something simple into an example
of a broader class of processes, this class including kinds of "prediction"
that have nothing to do with first-derivative feedback. I know that this is
customary -- even heating engineers call rate feedback "anticipation" --
but I prefer the more descriptive terms, rate feedback or first-derivative
feedback.
That describes the case accurately while not implying greater generality
than may be appropriate.

In your fitting of this model, did you end up with a negative value of k,
or did you rule that out a priori? I suppose you were thinking of a system
that tries to match a perceptual signal to the future value of a changing
reference signal -- but if that is what you meant you should have written
R = r + dr/dt.
That would be a “feedforward” kind of “anticipation.” However, this would
be hard to compare against the real system, since we have no measure of r
or R.

Best,

Bill P.

[From Rick Marken (2004.11.21.2110)]

Martin Taylor (2004.11.21.17.35) --

Rick Marken (2004.11.21.1130)]

You can't separate model fitting from experimental test.

Oh yes, you can. As I have been doing.

Think a little, please.

Imagine a "standard" PCT loop in which the output of the error
function feeds a simple integrator, which is the output function. Now
interpolate a non-linearity (as you have often proposed) between the
comparator that produces the difference between perceptual signal and
reference signal, and the output integrator.

I have only proposed a non-linear error function as a possibility, as a
way of dealing with ever increasing error as the perceptual signal gets
farther and farther from the reference (a real possibility in a
conflict). This is really Bill's proposal, which he called the
"Universal Error Curve".

Let this non-linearity be a simple power function...

<graph of error functions>

Clearly, unless the tracking performance is very bad, there's hardly
any difference between what is fed to the integrator over a few
up-and-down excursions of the disturbance (and hence of the error).

It's not clear to me that a difference would show up even if the
tracking performance were poor. Do you know that this is the case from
running the different versions of the model?

In a model-fitting simulation, one could get pretty well the same
quality of fit by changing the exponent or by changing the integrator
gain. Now that's less true if the tracking is bad enough for the
error to take on large values. But that's when the data become
"dirty", isn't it?

If the models make different predictions as a function of the level of
control then the data obtained when control is poor would be golden,
not dirty.

And that's why I stopped trying fits in which both gain and error
exponent were variable parameters in the model fits.

It sounds to me exactly like the situation I described with the
Cartesian and Polar perceptual functions, which is like the models with
two different error functions in your case. In order to see which model
does better it seems to me that you would have to test subjects under
circumstances that you know result in very different behavior for the
two models.

Regards

Rick


---
Richard S. Marken
marken@mindreadings.com
Home 310 474-0313
Cell 310 729-1400