Simulation and interpretation

[Martin Taylor 2004.06.21.10:50]

[From Bill Powers (2004.06.20.0107 MDT)]

The reason I am so strong for modeling and simulations is that once you
have frozen your premises in the form of a simulation, the conclusions from
those premises are out of your hands. You can still bend the results by
careful interpretation, but the goal of simulation is to state the model so
clearly that the least possible wiggle room is left for the interpretation.

Simulation is a double-edged sword, though. I had this discussion
with a visiting lecturer when I was an undergraduate, and the facts
haven't changed since then.

A theory, if it is to have any generality, must deal with more than
one specific situation. To apply it to any particular situation
requires the insertion of values for boundary conditions (in PCT
simulations, such things as the nature of the perceptual functions,
the nature of the output functions, the values and spectra of
transport lags, and so forth).

If the simulation does accurately model the specific situation,
that's nice, but it's hard to tell to what degree the fit is a
consequence of a good theory, and to what degree it is a consequence
of good parameter and function choices. If multiple simulations fit a
range of apparently different kinds of situation, that's stronger
evidence that the theory isn't wildly wrong.

The problem arises when the simulation does not accurately model the
experimental data. One must first ask what is meant by "accurately."
For example, Newton's laws "accurately" model what we see in everyday
life, but Einstein's laws are better. Or, any control model will
"accurately" model the behaviour of a person tracking a movement, but
a control model with the right functions and parameters will be more
accurate. Does a correlation of 0.98 between model and human
behaviour indicate good accuracy? You can't tell. If the model's
behaviour deviates from perfect control at the same points and in the
same way as the person's, then probably "yes". But a correlation of
0.98 can also be achieved if the person controls well and the model
deviates from perfect control slightly, but in ways that are opposite
to the ways the person's control deviates.
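
To make the arithmetic concrete, here is a minimal Python sketch (the signals are invented for illustration, not taken from any tracking run): two trajectories that both follow a common target closely can correlate at roughly 0.98 even though their small deviations from perfect control are exactly opposite.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 60.0, 3600)                    # a one-minute run at 60 Hz

target = np.sin(0.3 * t) + 0.5 * np.sin(0.11 * t)   # smooth "ideal" track

# Small, slowly varying deviation from perfect control (illustrative only).
deviation = 0.1 * np.sin(1.7 * t + rng.uniform(0.0, 2.0 * np.pi))

person = target + deviation    # the person deviates one way
model = target - deviation     # the model deviates the opposite way

print(f"correlation(person, model)    = {np.corrcoef(person, model)[0, 1]:.3f}")
print(f"correlation of the deviations = {np.corrcoef(deviation, -deviation)[0, 1]:.1f}")
# The first correlation comes out around 0.98; the second is exactly -1.0,
# which is the kind of mismatch the raw correlation between trajectories hides.
```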

Accuracy also depends on what the theory is being compared with. If
there is another theory that claims to model the same data, then what
is "accurate" depends on how well the other theory makes its
predictions--and its predictions, too, depend on how well the
boundary conditions (parameters) are set for it.

To return to Bill's statement, "once you have frozen your premises
[i.e. the form of the theory, the forms of all implied functions, and
the values of all parameters] in the form of a simulation, the
conclusions from those premises are out of your hands." The question
is always how one distinguishes the conclusions that depend on the
form of functions and the values of free parameters from those that
depend on the theory.

That distinction becomes possible if (and I think only if) one also
approaches the theory from the more abstract side. If the theory is a
specialization of what is known more generally (as, for example, by
being a consequence of a particular parameterization of physical
laws, like PCT), then the main form of the theory has a fairly solid
foundation, and the accuracy or otherwise of a simulation can largely
be attributed to successful choice of boundary conditions. On the
other hand, if the theory is a radical departure from otherwise
accepted understanding (as PCT is with respect to the larger body of
psychological theory), then it is harder to demonstrate through
simulation that it is a better theory, unless repeated efforts to
parameterize and simulate competing theories fail to produce similar
accuracy over a similar range of situations.

One of the key points, I think, is that failure of simulation cannot
be used as a strong argument against the truth of a theory, and
accuracy of simulation can be used in support of a theory only to the
extent that it works over many different instances and does so more
accurately than any competitor theory.

Martin

[From Bill Powers (2004.06.21.1332 MDT)]

Martin Taylor 2004.06.21.10:50 --

A theory, if it is to have any generality, must deal with more than
one specific situation. To apply it to any particular situation
requires the insertion of values for boundary conditions (in PCT
simulations, such things as the nature of the perceptual functions,
the nature of the output functions, the values and spectra of
transport lags, and so forth).

If the simulation does accurately model the specific situation,
that's nice, but it's hard to tell to what degree the fit is a
consequence of a good theory, and to what degree it is a consequence
of good parameter and function choices. If multiple simulations fit a
range of apparently different kinds of situation, that's stronger
evidence that the theory isn't wildly wrong.

There are several ways to deal with this problem, which I agree exists. One
of them, which I believe you may have mentioned some years ago but didn't
mention this time, has to do with the _nature_ of the predictions of
behavior from a model. One of the striking things about the tracking models
is the detail of the predictions. In a great many conventional behavioral
studies, the prediction (if there is any at all) is only that a certain
effect of a manipulated variable will be seen (as opposed to not being
seen), p < 0.05. This is equivalent to predicting a measurement on a scale
with only two values, 0 and 1, and looking at only one data point (barring
the unlikely event of a replication of the study).

In our tracking studies, we look typically at 1800 or 3600 data points over
a period of one minute, record the behavior on a scale with 400 or more
possible values, and make a prediction for every data point. The RMS
prediction error measured over all 1800 - 3600 data points ranges from
perhaps 6% down to less than 1%.
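
For readers who want the arithmetic spelled out, here is a small sketch of how such an RMS prediction error is commonly expressed; the data below are toy values, the 400-unit scale comes from the description above, and the exact normalization used in the tracking studies may differ.

```python
import numpy as np

def rms_error_percent(person, model, scale_range=400.0):
    """RMS of (model - person), expressed as a percentage of the scale range."""
    person = np.asarray(person, dtype=float)
    model = np.asarray(model, dtype=float)
    rms = np.sqrt(np.mean((model - person) ** 2))
    return 100.0 * rms / scale_range

# Toy data: 3600 samples (one minute at 60 Hz) on a 400-unit scale.
rng = np.random.default_rng(1)
person = 200.0 + 150.0 * np.sin(np.linspace(0.0, 12.0 * np.pi, 3600))
model = person + rng.normal(0.0, 8.0, size=person.shape)   # ~8-unit RMS miss

print(f"RMS prediction error: {rms_error_percent(person, model):.1f}% of the scale")
# An 8-unit RMS miss on a 400-unit scale prints as roughly 2.0%.
```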

As I pointed out, also some years ago, this puts the predictions from
tracking experiments into a new range: instead of the probability that the
result occurred by chance being less than 1 in 20 (i.e., two standard
deviations) it becomes more like 1 in 10^9 (about six standard deviations). When
Popper spoke of "falsifiability," I don't think he ever contemplated the
results of any hypothesis about behavior approaching anything like this
degree of confirmation. You can say that in principle all that can be done
is to falsify, but I think this sort of result calls for rethinking that maxim.
It also affects the meaning of the following:

The problem arises when the simulation does not accurately model the
experimental data. One must first ask what is meant by "accurately."
For example, Newton's laws "accurately" model what we see in everyday
life, but Einstein's laws are better. Or, any control model will
"accurately" model the behaviour of a person tracking a movement, but
a control model with the right functions and parameters will be more
accurate. Does a correlation of 0.98 between model and human
behaviour indicate good accuracy? You can't tell. If the model's
behaviour deviates from perfect control at the same points and in the
same way as the person's, then probably "yes". But a correlation of
0.98 can also be achieved if the person controls well and the model
deviates from perfect control slightly, but in ways that are opposite
to the ways the person's control deviates.

It's hard to imagine how the model could deviate in such a fortuitous way,
considering that the deviations are produced mainly by disturbances which
are the same for the model and the person. But the point is that the
tracking models fit the person's behavior better than they track the
target (that is, they show tracking errors like the person's tracking
errors). And to put this in perspective (see Models and Their Worlds), they
predict behavior to within a few percent, whereas all other models of behavior
can do no better than about 50%, and often much worse, in the same situation
(random disturbances of target _and_ cursor). Even the "perfect" control
model (negligible tracking error) reproduces behavior better than other
models do, despite deviating from real behavior by perhaps 10% RMS.

Accuracy also depends on what the theory is being compared with. If
there is another theory that claims to model the same data, then what
is "accurate" depends on how well the other theory makes its
predictions--and its predictions, too, depend on how well the
boundary conditions (parameters) are set for it.

We have done that sort of comparison frequently among PCT models. The
simplest model, a control system with a pure integrating output, fits
behavior within typically less than 10% RMS. Adding a perceptual delay
reduces the prediction error to perhaps 3% RMS on the average, and adding
an adjustable leak parameter to the integrator gets us down to perhaps 2%.
The latest version postulates both velocity and position feedback, as well
as using a perceptual delay and a leaky integrator; the errors are
frequently less than 1% RMS over the whole run.
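
As an illustration of the kind of model family being described (my own sketch, with invented parameter values rather than fitted ones), the progression from pure integrating output to leaky integrator with a perceptual delay can be written in a few lines; the velocity-plus-position version is omitted here.

```python
import numpy as np

def simulate_tracking(disturbance, dt=1/60, gain=5.0, delay_steps=8, leak=0.0):
    """
    Elementary compensatory tracking loop:
      cursor  = handle + disturbance        (the controlled perception)
      error   = 0 - delayed cursor          (reference is zero)
      handle' = gain * error - leak * handle
    leak = 0 gives the pure integrating output; leak > 0 the leaky integrator.
    delay_steps models the perceptual transport lag (8 steps ~ 133 ms at 60 Hz).
    """
    n = len(disturbance)
    handle = np.zeros(n)
    cursor = np.zeros(n)
    for k in range(1, n):
        cursor[k - 1] = handle[k - 1] + disturbance[k - 1]
        delayed_cursor = cursor[max(k - 1 - delay_steps, 0)]
        error = 0.0 - delayed_cursor
        handle[k] = handle[k - 1] + dt * (gain * error - leak * handle[k - 1])
    cursor[-1] = handle[-1] + disturbance[-1]
    return handle, cursor

# A smooth two-component disturbance over a one-minute run at 60 Hz.
t = np.arange(3600) / 60.0
d = 30.0 * np.sin(2 * np.pi * 0.20 * t) + 20.0 * np.sin(2 * np.pi * 0.07 * t + 1.0)

handle, cursor = simulate_tracking(d, leak=0.5)
print(f"RMS cursor error (quality of control): {np.sqrt(np.mean(cursor ** 2)):.2f}")
```

Fitting then amounts to adjusting gain, delay_steps, and leak to minimize the RMS difference between the simulated handle and the person's recorded handle positions.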

Since I don't know of any other models that can predict behavior under
these circumstances at all, I can't judge the relationship to other models.

To return to Bill's statement, "once you have frozen your premises
[i.e. the form of the theory, the forms of all implied functions, and
the values of all parameters] in the form of a simulation, the
conclusions from those premises are out of your hands." The question
is always how one distinguishes the conclusions that depend on the
form of functions and the values of free parameters from those that
depend on the theory.

I think it's hard to tell the difference. If you represent each possible
connection between variables by a polynomial function with, say, up to two
derivatives, that covers such a huge number of possible organizations that
the result depends entirely on how you set the parameters.

That distinction becomes possible if (and I think only if) one also
approaches the theory from the more abstract side. If the theory is a
specialization of what is known more generally (as, for example, by
being a consequence of a particular parameterization of physical
laws, like PCT), then the main form of the theory has a fairly solid
foundation, and the accuracy or otherwise of a simulation can largely
be attributed to successful choice of boundary conditions. On the
other hand, if the theory is a radical departure from otherwise
accepted understanding (as PCT is with respect to the larger body of
psychological theory), then it is harder to demonstrate through
simulation that it is a better theory, unless repeated efforts to
parameterize and simulate competing theories fail to produce similar
accuracy over a similar range of situations.

I don't know of any theory in which the choice of boundary conditions
doesn't spell the difference between success and failure. Just consider the
question of the coefficient one assigns to the gain in the output function:
if it's positive, the model works at least a little bit; if it's negative,
the system goes unstable and the model fails totally. Perhaps I'm not
getting your point here -- an example would help.
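
A tiny sketch of that particular point (arbitrary numbers, no delay or leak): the same bare integrating loop with the sign of the output gain flipped.

```python
def final_error(gain, steps=600, dt=1/60, disturbance=10.0):
    """Run a bare integrating control loop for 10 s and return the final |error|."""
    handle = 0.0
    for _ in range(steps):
        cursor = handle + disturbance    # controlled variable
        error = 0.0 - cursor             # reference is zero
        handle += dt * gain * error      # integrating output function
    return abs(handle + disturbance)

print(f"|error| after 10 s with gain +5: {final_error(+5.0):.3f}")   # driven to ~0
print(f"|error| after 10 s with gain -5: {final_error(-5.0):.3e}")   # grows without bound
```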

One of the key points, I think, is that failure of simulation cannot
be used as a strong argument against the truth of a theory, and
accuracy of simulation can be used in support of a theory only to the
extent that it works over many different instances and does so more
accurately than any competitor theory.

I can agree with that, though I don't entirely know what I'm agreeing with
here, since adjustment of parameters for best fit with a known case is part
of the process of simulation as I see it. The case that I think should be
discussed is that of adjusting the parameters for the best possible fit
over some range of conditions, and _then_ seeing how the model fares. If a
simulation fails after the best-fit adjustments, this says there are some
critical variables that have been left out of the model, or some
fundamental error in its organization (which amounts to the same thing). A
failure implies that the model doesn't behave anything like the real
system. But I think you're talking about the case in which the model works
when fit to known data, but fails to fit data under new conditions with
anything like the same accuracy. I think that's a very useful kind of
failure, and maybe the easiest to fix.
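
One way to make that procedure concrete (a sketch only: the "person" below is simulated stand-in data, and the model is the bare integrating loop with a single free parameter rather than the full fitted model): adjust the parameter on one run, freeze it, and then score the frozen model against a run collected under a new condition.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def model_output(disturbance, gain, dt=1/60):
    """Minimal integrating control model; returns simulated handle positions."""
    handle = np.zeros(len(disturbance))
    for k in range(1, len(disturbance)):
        error = -(handle[k - 1] + disturbance[k - 1])
        handle[k] = handle[k - 1] + dt * gain * error
    return handle

def rms(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

# Stand-in "person" data: pretend the person behaves like a gain-6 loop plus noise.
rng = np.random.default_rng(3)
t = np.arange(3600) / 60.0
d_fit = 30.0 * np.sin(2 * np.pi * 0.20 * t)                 # fitting condition
d_test = 25.0 * np.sin(2 * np.pi * 0.13 * t + 0.7)          # new condition
person_fit = model_output(d_fit, 6.0) + rng.normal(0.0, 1.0, t.size)
person_test = model_output(d_test, 6.0) + rng.normal(0.0, 1.0, t.size)

# Fit the single free parameter (gain) to the first run only, then freeze it.
fit = minimize_scalar(lambda g: rms(model_output(d_fit, g), person_fit),
                      bounds=(0.1, 20.0), method="bounded")
g_best = fit.x

print(f"fitted gain: {g_best:.2f}")
print(f"RMS error on fitting run: {rms(model_output(d_fit, g_best), person_fit):.2f}")
print(f"RMS error on new run:     {rms(model_output(d_test, g_best), person_test):.2f}")
```

If the second RMS error stays close to the first, the model generalizes; if it blows up under the new condition, that is the useful kind of failure described above.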

Best,

Bill P.