[Hans Blom, 970605]
How do we arrive at models that we can believe in? Some of Galileo's
experiments have been repeated by a group of students, whose tale can
be found at
http://es.rice.edu/ES/humsoc/Galileo/Student_work/Experiments95/index.html
One of the experiments was with a falling object (but they used a
ball rolling off an inclined plane, just like Galileo). They
collected the data with the accuracy that they presumed Galileo's own
measurements had, and employed curve fitting techniques to arrive at a
mathematical expression of distance traversed versus time. They also
noted -- I hope -- that a curve fit doesn't tell all: there may be
other, independent reasons to select a certain model. One reason
might be that the data points the model has to explain are too
limited a selection; although this might deliver a model that
accurately "explains" the chosen/measured data points, the model may
not be general enough. Another reason could be that too few models
are tested against the data: if you haven't thought of a
certain hypothesis, you will not test it, and you will not be able to
establish its explanatory power.
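A small sketch of the first point, in Python. The points below are not the students' measurements: they are hypothetical, noise-free values generated from an exact square-root law (the constant 4.26 is borrowed from the fit quoted further down, and the x range is an assumption). Even so, a straight line fitted over this limited range correlates very well with them.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical x values (assumed range); y follows an exact parabola law.
xs = [20.0, 40.0, 60.0, 80.0, 100.0]
ys = [4.26 * math.sqrt(x) for x in xs]

slope, intercept = fit_line(xs, ys)
r = pearson_r(xs, ys)
print(f"line: y = {slope:.3f}x + {intercept:.2f}, r = {r:.4f}")
```

The wrong model still scores a correlation close to 1 here, simply because the chosen x values do not reach down to the region near zero where a line and a parabola differ most.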
They note: "If the path of the projectile was a straight line, a
straight line should fit the data best and if the path of the
projectile was a parabola, an equation having the basic properties of
a parabola would fit the data best."
Using linear regression, the best fits were (y is horizontal
distance, x is vertical fall):
Type graph      Equation                              Corr Coef
straight line   y = 0.47x + 7.39                      0.9951
parabola        y = 4.26(x^0.5)   [x = (0.235y)^2]    0.9998
The correlation coefficient tells, of course, how closely the
equation reflects the data, where 1 would be a perfect fit. The
straight line model is very good. But it shows, I think, that the
data point (x=0, y=0) was not included in the curve fit. In any case,
that point is badly "explained" by the model, which predicts y = 7.39
at x = 0. This data point was, obviously, not measured, and its
inclusion in the data set -- although logically obvious -- was thus
"forgotten". But note how, when one forgets this data point, the
parabolic fit is hardly an improvement over the linear fit. We might
reject the parabola on the grounds of parsimony (why make the model
more complex if it hardly explains the data any better?) and of
significance (the two models cannot be told apart, given the
magnitude of the measurement errors).
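The remark about the origin can be checked directly from the two reported equations. The sketch below, in Python, only evaluates those equations; the function names are mine, and no measured data are involved.

```python
import math

def straight_line(x):
    """Reported straight-line fit: y = 0.47x + 7.39."""
    return 0.47 * x + 7.39

def parabola(x):
    """Reported parabolic fit: y = 4.26 * x^0.5."""
    return 4.26 * math.sqrt(x)

# At the start of the fall (x = 0) the ball has travelled no horizontal
# distance, so the "forgotten" data point is (0, 0).
line_error_at_origin = straight_line(0.0) - 0.0
parabola_error_at_origin = parabola(0.0) - 0.0

# The bracketed inverse form x = (0.235y)^2 follows from 1 / 4.26.
inverse_coefficient = 1.0 / 4.26

print(line_error_at_origin, parabola_error_at_origin, inverse_coefficient)
```

The straight line misses the origin by its full intercept, while the parabola passes through it exactly. Had that single point been in the data set, the two fits would have been much easier to tell apart.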
But note what the students did subsequently: "I used linear
regression a third time to determine exactly what type of curve would
fit the data most closely. This curve is almost a parabola."
Type graph      Equation                                  Corr Coef
curve           y = 3.53(x^0.537)   [x = (0.282y)^1.86]   0.9999
The correlation coefficient is even better. Yet, this is not the
model that was eventually adopted by Newton. And one wonders whether
either he or Galileo would have known general power law formulas.
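One common way to obtain a power law y = a * x^b with "linear regression" is to fit a straight line to (log x, log y). Whether the students did exactly this is an assumption on my part. The sketch below, in Python, uses hypothetical, noise-free points generated from the reported curve, not the students' measurements, and only shows that the procedure recovers the exponent.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx               # slope in log-log space is the exponent
    a = math.exp(my - b * mx)   # intercept in log-log space is log(a)
    return a, b

# Hypothetical x values (assumed); y generated from y = 3.53 * x^0.537.
xs = [10.0, 20.0, 40.0, 80.0, 160.0]
ys = [3.53 * x ** 0.537 for x in xs]

a, b = fit_power_law(xs, ys)
print(f"y = {a:.2f} * x^{b:.3f}")
```

Note that a fit of this kind cannot use the point (0, 0) at all, since log 0 is undefined. With a free exponent, the fit will also happily settle on 0.537 rather than 0.5 if that follows the noise in the measurements a little more closely.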
This tale has a warning, I think: do not strive for high correlation
coefficients alone.
Greetings,
Hans