[From Bill Powers (961126.1200 MST)]

I have a problem that people on CSGnet might be able to help me solve. How

do we objectively evaluate the goodness of fit between a model's behavior

and real behavior?

For years we've been using correlations as a rough way of showing how well a

model fits tracking behavior. This started mostly in order to compare the

results of using a control model with other approaches to behavior which

traditionally use correlations as a way of conveying how well a theory fits

the facts. But we've known all along that correlations really don't have

much meaning in this application; the differences between models and real

performance aren't distributed according to any standard distribution, and

as several people have pointed out the data points being compared aren't

temporally independent.

Those are relatively minor problems with correlations compared to the one I

have now. Bruce and I have been fitting models to various data obtained in

operant behavior experiments. One of these models describes what is, from the

standpoint of a weight control system, an environmental feedback function or

EFF. It is the function that converts any pattern of daily food intake into

the animal's weight, both being functions of time.

What we get from the model is a predicted waveform of weights that is

generated by the observed waveform of total food intakes. For rat 1, the

weights go as low as 175 grams and as high as 290 grams. Over the times of

interest (after initial deprivation has been compensated for), the range is

even less -- say from 250 grams to 290 grams.

The model fits the observed weight values within about 4 grams RMS. The

correlation of model weight with real weight is in the high 0.9s -- around

0.96 or 0.98 depending on the parameters taken to be the best fit.

But suppose that the model's weight, while following the variations in the

real weight, had been everywhere 100 grams too low or too high. The

correlation would remain exactly the same! This follows because the first

step in calculating a correlation is to remove the mean values from the data

arrays being correlated. If the model's output is systematically too high or

too low by a constant amount, the correlation will be unchanged.

The same holds true if the model variations are all too great or too small

by some constant factor. The correlation process normalizes the data, so

that any constant factors disappear from the result.
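To make this concrete, here is a minimal numerical sketch (in Python with
NumPy; the weight series is illustrative, not the actual rat data). A model
shifted 100 grams everywhere, or scaled by a constant factor, has exactly the
same correlation with the real record, while the RMS error exposes the gross
mismatch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative "real" weight record: roughly 250-290 g with some noise.
real = 270 + 20 * np.sin(np.linspace(0, 6, 50)) + rng.normal(0, 2, 50)

model = real + rng.normal(0, 4, 50)   # a model fitting within ~4 g RMS
shifted = model + 100                 # same model, 100 g too high everywhere
scaled = 0.5 * model                  # same model, off by a constant factor

def rms(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# The correlation is identical for all three model outputs...
print(corr(real, model), corr(real, shifted), corr(real, scaled))

# ...but the RMS error immediately exposes the offset and scale errors.
print(rms(real, model), rms(real, shifted), rms(real, scaled))
```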

The upshot is that a model can appear to fit the data extremely well, in

terms of correlations, when it is actually in error by a huge amount.

The opposite problem also occurs. In matching the weight control model to

the data, one of the things we want it to predict is daily food intake in

the home and experimental cages. The average intake is something like 15

grams per day, with variations above and below that level as conditions

change (for example, absence of food in the home cage). The model's total

food intake correlates with the actual total food intake only about 0.8 to

0.9. But in calculating this correlation, no account is taken of the fact

that the model predicts the _mean_ food intake better than it predicts the

variations in food intake. The mean values are removed by the correlation

calculation, so the correlation reflects only how well the _variations_ in

model intake match _variations_ in the real intake. The model gets no credit

for predicting the mean values correctly. In fact, _too much_ credit is

given for matching the variations in food intake, because the model's

variations are generally visibly smaller than those of the real intake,

although proportional to them, a fact that the correlation calculation

doesn't pick up. And too little credit -- none at all -- is given for the

fact that the model predicts the 13-gram mean value of intake very

accurately, instead of predicting, for example, 2 grams or 200 grams.
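A sketch of this second problem, with made-up numbers standing in for the
intake data: a model whose variations are exactly proportional to the real
ones but half their size earns a perfect correlation of 1.0, and a model that
also gets the mean wildly wrong earns exactly the same perfect score:

```python
import numpy as np

rng = np.random.default_rng(1)

real = 15 + rng.normal(0, 3, 60)          # real daily intake around 15 g

# Model variations proportional to the real ones but visibly smaller:
damped = real.mean() + 0.5 * (real - real.mean())

# Same damped variations, but with the mean predicted wildly wrong:
wrong_mean = 200 + 0.5 * (real - real.mean())

r1 = np.corrcoef(real, damped)[0, 1]      # exactly 1.0
r2 = np.corrcoef(real, wrong_mean)[0, 1]  # also exactly 1.0
print(r1, r2)
```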

I used this analogy in discussing this with the rat group. Suppose a model

predicts that a car will go 20,000 feet and then stop. We observe the real

car, on successive trials, travelling 20,000 feet plus or minus 20 feet.

Since the model's predictions are the same every time, when we calculate the

correlation of the model's predictions against reality, we get a correlation

of ZERO. Yet the model has predicted the real distance of travel within 0.1%

of the observed value over all the trials.
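Numerically, the coefficient in this case is not even a clean zero: the
model's predictions have zero variance, so the correlation formula divides by
zero, and software will report the result as undefined. Either way it carries
no information about the fit. A minimal sketch (Python, with illustrative
trial data):

```python
import numpy as np

rng = np.random.default_rng(2)

observed = 20000 + rng.normal(0, 20, 10)   # trials: 20,000 ft +/- 20 ft
model = np.full(10, 20000.0)               # model says 20,000 ft every time

# The model's predictions never vary, so their standard deviation is zero
# and the Pearson correlation is 0/0 -- it tells us nothing about the fit.
print(np.std(model))                       # prints 0.0

# Yet the model's relative error is tiny on every trial:
rel_err = np.abs(observed - model) / observed
print(rel_err.max())
```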

So my question is simple. Is there any standard way, akin to a correlation,

of comparing two data sets for goodness of fit that takes into account both

constant offsets and proportionality factors as measures of error?

Best,

Bill P.