Goodness of fit

[From Bruce Abbott (961128.1025 EST)]

Bill Powers (961128.0130 MST) --

RMS error has to be compared with something, doesn't it? What about RMS
error divided by mean value? This would give us the error as a fraction of
the mean observed value. Of course that wouldn't work too well if the mean
observed value were close to zero.
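As a rough sketch of the ratio suggested above (the function names and the
min_mean threshold are my own illustrative choices, not anything standard),
one could compute RMS error as a fraction of the mean observed value and
decline to report it when that mean is near zero:

```python
import math

def rms_error(predicted, observed):
    """Root-mean-square difference between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def relative_rms_error(predicted, observed, min_mean=1e-9):
    """RMS error divided by the absolute mean observed value.

    Returns None when the mean is too close to zero for the ratio to
    mean anything. min_mean is an arbitrary, illustrative threshold.
    """
    mean_observed = sum(observed) / len(observed)
    if abs(mean_observed) < min_mean:
        return None
    return rms_error(predicted, observed) / abs(mean_observed)

# Made-up numbers: every prediction is off by exactly 1, mean observed is 10.
observed = [10.0, 12.0, 8.0, 10.0]
predicted = [11.0, 11.0, 9.0, 9.0]
print(relative_rms_error(predicted, observed))   # 0.1
```

With these made-up values the error is 10% of the mean. A zero-mean series
such as [1, -1] returns None, which is the weak spot noted above.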

RMS error basically tells you how far the data are from the model's
predictions, on average, in the units of the data (it is computed much as a
standard deviation is). A given RMS error can
reflect a variety of faults in the model (not to mention noise in the data).
A large RMS error might occur if:

1. the prediction tracks the data perfectly but with a constant offset;
    this constant offset would appear as a difference between the means of
    the predicted and actual values.

2. the two waveforms vary according to the same pattern and have the same
    mean, but differing amplitudes.

3. the two waveforms have identical patterns, amplitudes, and means, but
    differ in phase.

4. the two waveforms have identical amplitudes and means, and nearly
    identical waveforms except for a very low-frequency component
    (i.e., a slow drift).

5. the two waveforms have identical means and amplitudes but different

RMS error tells you how much of a problem you have with fit, but it does not
tell you what the source of that error is.
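To illustrate the point with a toy case (a pure sine wave standing in for the
data; the target value of 0.2 and all names are assumptions made for this
sketch), here are three differently faulty "models" tuned so that each gives
the same RMS error:

```python
import math

def rms_error(predicted, observed):
    """Root-mean-square difference between two equal-length sequences."""
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(total / len(observed))

n = 200
x = [2 * math.pi * 2 * i / n for i in range(n)]   # exactly two full cycles
observed = [math.sin(v) for v in x]

target = 0.2   # the RMS error each faulty model is tuned to produce

# Fault 1: perfect tracking plus a constant offset. RMS error equals the offset.
offset_model = [math.sin(v) + target for v in x]

# Fault 2: same pattern and mean, larger amplitude.
# RMS error is (gain - 1) / sqrt(2) for a unit sine.
gain = 1 + target * math.sqrt(2)
amplitude_model = [gain * math.sin(v) for v in x]

# Fault 3: same pattern, amplitude and mean, shifted in phase.
# RMS error is sqrt(2) * sin(phase / 2) for a unit sine.
phase = 2 * math.asin(target / math.sqrt(2))
phase_model = [math.sin(v + phase) for v in x]

results = {
    "offset": rms_error(offset_model, observed),
    "amplitude": rms_error(amplitude_model, observed),
    "phase": rms_error(phase_model, observed),
}
for name, value in results.items():
    print(f"{name:10s} RMS error = {value:.4f}")
```

All three come out at 0.2 (to rounding), so the single number cannot
distinguish an offset from an amplitude error from a phase error.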

The correlation coefficient (Pearson r) removes any average difference
between the predicted and observed values (DC component), and then removes
any difference in RMS amplitude between the waveforms. It thus compares the
shapes of the two waveforms. However, if there is a low-frequency component
of relatively large amplitude in comparison to higher-frequency components,
the size of Pearson r will mainly reflect the agreement between model and
data in the low-frequency component. For example, if model and data both
drift upward over time, and this upward drift accounts for most of the
change in Y over time, then the correlation will be high even though the
match with the higher-frequency components is poor.
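A small made-up example of this effect follows. It assumes a linear upward
drift shared by model and data, with the model getting the fast component
exactly backwards. All names and numbers are invented for illustration, and
the drift is subtracted directly only because it is known here; with real
data it would have to be estimated.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

n = 1000
t = [i / n for i in range(n)]
drift = [10.0 * v for v in t]                         # slow upward drift
fast = [math.sin(2 * math.pi * 20 * v) for v in t]    # 20 fast cycles

data = [d + f for d, f in zip(drift, fast)]
model = [d - f for d, f in zip(drift, fast)]   # fast component inverted

r_raw = pearson_r(data, model)

# Remove the (known) drift and correlate what is left.
data_fast = [y - d for y, d in zip(data, drift)]
model_fast = [y - d for y, d in zip(model, drift)]
r_fast = pearson_r(data_fast, model_fast)

print(f"r with drift included: {r_raw:.3f}")
print(f"r with drift removed:  {r_fast:.3f}")
```

The first correlation is high (well above 0.8) because the shared drift
dominates the variance, while the second is essentially -1: the model's
fast component is as wrong as it could be.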

In addition, Pearson r will be strongly affected by phase differences
between the two waveforms. A solution to this problem is to shift one
waveform with respect to the other by steps and examine the fit at each
step, a technique called cross correlation.
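A bare-bones sketch of that stepping procedure is given below, using a sine
wave as the data and a model that lags it by 5 samples. The function names,
the lag range, and the convention that a positive lag pairs observed[i] with
model[i + lag] are all assumptions of this sketch.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def lagged_correlation(observed, model, lag):
    """Correlate observed[i] with model[i + lag] over the overlapping part."""
    if lag >= 0:
        a = observed[:len(observed) - lag]
        b = model[lag:]
    else:
        a = observed[-lag:]
        b = model[:len(model) + lag]
    return pearson_r(a, b)

n, period, true_lag = 200, 50, 5
observed = [math.sin(2 * math.pi * i / period) for i in range(n)]
model = [math.sin(2 * math.pi * (i - true_lag) / period) for i in range(n)]

# Step the model past the data and record the fit at each step.
max_lag = 10
scan = {lag: lagged_correlation(observed, model, lag)
        for lag in range(-max_lag, max_lag + 1)}
best_lag = max(scan, key=scan.get)

print(f"r at zero lag: {scan[0]:.3f}")
print(f"best lag: {best_lag}, r there: {scan[best_lag]:.3f}")
```

The zero-lag correlation is only about 0.81, but the scan recovers the
5-sample lag, where the two waveforms match essentially perfectly. The lag
range should stay well under one period, since a periodic signal matches
again one full period later.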

It would seem to me that a variety of measures are needed to adequately
assess the fit of model to data with respect to all these different aspects
of the two waveforms. I'm not sure how well it would work, but it would
seem that an analysis of the two waveforms into different frequency
components (e.g., by passing them through a low-pass filter, high-pass
filter, etc.) and then comparison of these filtered model and observed
waveforms might identify where the model fits best and worst (e.g., it might
do an excellent job of following the low-frequency variations in the data
while failing to reproduce the higher-frequency components).
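I haven't tried this on real data, but a crude sketch of the idea might look
like the following. A centered moving average serves as the low-pass filter
and the residual as the high-pass output. The window width, the test
signals, and all names are arbitrary choices for illustration; a properly
designed filter would do better.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def split_bands(series, half_width):
    """Split a series into (low, high) bands with a centered moving average.

    Only points where the full window fits are kept, so both outputs are
    shorter than the input by 2 * half_width samples.
    """
    width = 2 * half_width + 1
    low = [sum(series[i - half_width:i + half_width + 1]) / width
           for i in range(half_width, len(series) - half_width)]
    trimmed = series[half_width:len(series) - half_width]
    high = [s - l for s, l in zip(trimmed, low)]
    return low, high

n = 1000
t = [i / n for i in range(n)]
slow = [math.sin(2 * math.pi * v) for v in t]

# Made-up data and model: same slow variation, unrelated fast components.
observed = [s + 0.3 * math.sin(2 * math.pi * 25 * v) for s, v in zip(slow, t)]
model = [s + 0.3 * math.sin(2 * math.pi * 40 * v) for s, v in zip(slow, t)]

obs_low, obs_high = split_bands(observed, half_width=20)
mod_low, mod_high = split_bands(model, half_width=20)

r_low = pearson_r(obs_low, mod_low)
r_high = pearson_r(obs_high, mod_high)
print(f"low-frequency band:  r = {r_low:.3f}")
print(f"high-frequency band: r = {r_high:.3f}")
```

Here the low band correlates almost perfectly while the high band shows
essentially no agreement, which pinpoints where this made-up model fails.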