curve fitting Galileo's data

[Hans Blom, 970605]

How do we arrive at models that we can believe in? Some of Galileo's
experiments have been repeated by a group of students, whose tale can
be found at

http://es.rice.edu/ES/humsoc/Galileo/Student_work/Experiments95/index.html

One of the experiments was with a falling object (but they used a
ball rolling off an inclined plane, just like Galileo). They
collected the data with the accuracy they presumed Galileo could
achieve, and employed curve fitting techniques to arrive at a
mathematical expression of distance traversed versus time. They also
noted -- I hope -- that a curve fit doesn't tell all: there may be
other, independent reasons to select a certain model. One reason
might be the too limited selection of data points that the model has
to explain; although this might deliver a model that accurately
"explains" the chosen/measured data points, the model may not be
general enough. Another reason could be a limitation of the number of
models that the data are tested against: if you haven't thought of a
certain hypothesis, you will not test it, and you will not be able to
establish its explanatory power.

They note: "If the path of the projectile was a straight line, a
straight line should fit the data best and if the path of the
projectile was a parabola, an equation having the basic properties of
a parabola would fit the data best.

Using linear regression, the best fits were (y is horizontal
distance, x is vertical fall):

Type of graph   Equation                                Corr. coef.
straight line   y = 0.47x + 7.39                        0.9951
parabola        y = 4.26(x^0.5)  [x = (0.235y)^2]       0.9998

The correlation coefficient tells, of course, how closely the
equation reflects the data, where 1 would be a perfect fit. The
straight line model is very good. But it shows, I think, that the
data point (x=0, y=0) was not included in the curve fit. Anyway, that
point is badly "explained" by the model. This data point was,
obviously, not measured and its inclusion in the data set -- although
logically obvious -- was thus "forgotten". But note how, when one
forgets this data point, the parabolic fit is hardly an improvement
over the linear fit. We might reject the parabola on the grounds of
parsimony (why make the model more complex if it hardly explains the
data any better?) and of significance (the two models cannot be
distinguished, given the magnitude of the measurement errors).
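
The remark about the origin can be checked directly from the two quoted equations. What follows is a minimal sketch that uses only the coefficients reported above; the function names are illustrative and not taken from the students' page.

```python
import math

def straight_line(x):
    """Reported straight-line fit: y = 0.47x + 7.39."""
    return 0.47 * x + 7.39

def parabola(x):
    """Reported parabolic fit: y = 4.26 * sqrt(x)."""
    return 4.26 * math.sqrt(x)

# At the moment of release the ball has neither fallen nor travelled,
# so the "forgotten" data point is (x=0, y=0).
line_at_origin = straight_line(0.0)
parabola_at_origin = parabola(0.0)

print(f"straight line predicts y = {line_at_origin:.2f} at x = 0")
print(f"parabola predicts      y = {parabola_at_origin:.2f} at x = 0")
```

The straight line misses the origin by its intercept, 7.39 in whatever units the students used for y, while the parabola passes through it by construction.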

But note what the students did subsequently: "I used linear
regression a third time to determine exactly what type of curve would
fit the data most closely. This curve is almost a parabola."

Type of graph   Equation                                Corr. coef.
curve           y = 3.53(x^0.537)  [x = (0.282y)^1.86]  0.9999

The correlation coefficient is even better. Yet, this is not the
model that was eventually adopted by Newton. And one wonders whether
either he or Galileo would have known general power law formulas.
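
The quoted passage does not say how linear regression produced a power law. One common route, assumed here and not confirmed by the source, is to fit a straight line to (log x, log y): the slope is then the exponent. The sketch below uses stand-in points generated from the reported curve itself, not the students' measurements, and the helper name fit_power_law is hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space = log(a)
    return a, b

# Hypothetical stand-in data, NOT the students' measurements: points
# generated exactly from the reported curve y = 3.53 * x**0.537.
xs = [10.0, 20.0, 40.0, 60.0, 80.0, 100.0]
ys = [3.53 * x ** 0.537 for x in xs]

a, b = fit_power_law(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```

An exponent fixed at 0.5 is the parabola. Leaving the exponent free makes the parabola a special case of the fitted family, so a least-squares fit in the same space cannot do worse, whatever the true law is.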

This tale has a warning, I think: do not strive for high correlation
coefficients only.

Greetings,

Hans

[From Richard Kennaway (970605.1800 BST)]

[Hans Blom, 970605]

http://es.rice.edu/ES/humsoc/Galileo/Student_work/Experiments95/index.html

The correct URL is
http://es.rice.edu/ES/humsoc/Galileo/Student_Work/Experiment95/index.html

...

They also
noted -- I hope -- that a curve fit doesn't tell all: there may be
other, independent reasons to select a certain model. One reason
might be the too limited selection of data points that the model has
to explain

Perhaps this is nit-picking, but there is no limitation on the number of
data points that the model has to explain -- it has to explain the
behaviour of all falling objects. The limitation is on the number of data
points from which the students constructed their model. (And to pick
another nit, I'd say "describe" rather than "explain" for the results of
statistical curve-fitting.)

Statistically estimated models also need to be tested on data that were
not used in constructing them, otherwise over-fitting may occur. The more
parameters you add to squeeze the correlation coefficient c ever higher,
the worse the correlation is likely to be when tested on new data.
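
The over-fitting point can be illustrated with a small made-up example. This is a sketch under stated assumptions: the underlying law, the "measurement errors" and the test points are all invented for reproducibility, and mean squared error stands in for the correlation coefficient. A two-parameter line is compared with a polynomial that has as many parameters as there are data points.

```python
def fit_line(xs, ys):
    """Two-parameter model: least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_interpolant(xs, ys):
    """One parameter per data point: Lagrange interpolating polynomial."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def truth(x):
    return 2.0 * x + 1.0   # hypothetical underlying law

# Hand-picked, made-up "measurement errors" so the example is reproducible.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
train_y = [truth(x) + e for x, e in zip(train_x, noise)]

# New data not used in fitting (taken noise-free for simplicity).
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [truth(x) for x in test_x]

line = fit_line(train_x, train_y)
interp = fit_interpolant(train_x, train_y)

for name, model in (("line (2 params)", line),
                    ("interpolant (6 params)", interp)):
    print(f"{name}: train MSE = {mse(model, train_x, train_y):.4f}, "
          f"test MSE = {mse(model, test_x, test_y):.4f}")
```

The interpolant reproduces the training points exactly, yet it does worse than the plain line on the held-out points, because it has fitted the errors along with the law.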

...

straight line   y = 0.47x + 7.39                        0.9951
parabola        y = 4.26(x^0.5)  [x = (0.235y)^2]       0.9998

...

the parabolic fit is hardly an improvement
over the linear fit.

Depends on what you look at. The proportions of unexplained variance
(1-c*c) are 1% and 0.04% respectively -- a factor of 25.
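
The arithmetic behind these figures is short enough to spell out. A minimal sketch using only the three correlation coefficients quoted in the thread; the function name is illustrative.

```python
def unexplained_variance(c):
    """Fraction of variance not accounted for by a fit with correlation c."""
    return 1.0 - c * c

# Correlation coefficients as quoted for the three fits.
fits = {"straight line": 0.9951, "parabola": 0.9998, "power curve": 0.9999}
for name, c in fits.items():
    print(f"{name}: {100 * unexplained_variance(c):.2f}% unexplained")

ratio = unexplained_variance(0.9951) / unexplained_variance(0.9998)
print(f"line / parabola ratio: {ratio:.1f}")
```

With the coefficients as quoted, the ratio comes out a little under 25, so "a factor of 25" is a fair round figure.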

This tale has a warning, I think: do not strive for high correlation
coefficients only.

Indeed. More important than high c is understanding. Get understanding of
what is going on, and one's predictions will have a high correlation with
observations anyway. I'll let Richard Marken do the bit about how PCT
gives just such an understanding of living systems, and how the 95%+
correlations in PCT experiments compare with its rivals.

-- Richard Kennaway, jrk@sys.uea.ac.uk, http://www.sys.uea.ac.uk/~jrk/
   School of Information Systems, Univ. of East Anglia, Norwich, U.K.

[From Rick Marken (970605.1100 PDT)]

Hans Blom (970605) --

Some of Galileo's experiments have been repeated by a group
of students, whose tale can be found at

http://es.rice.edu/ES/humsoc/Galileo/Student_work/Experiments95/index.html

I get a "URL not found" on this. Sounds like a cute idea, though; I'd
like to find the location. Good thing Galileo didn't have statistical
analysis around to confuse him;-)

This tale has a warning, I think: do not strive for high
correlation coefficients only.

But we can still ignore low ones, can't we? Like the ones we get
when we try to fit the MCT model to data;-)

Best

Rick

--

Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: marken@leonardo.net
http://www.leonardo.net/Marken

[Martin Taylor 970607 16:30]

[From Rick Marken (970605.1100 PDT)]

Hans Blom (970605) --

Some of Galileo's experiments have been repeated by a group
of students, whose tale can be found at

http://es.rice.edu/ES/humsoc/Galileo/Student_work/Experiments95/index.html

I get a "URL not found" on this.

There's an extra "s". It's "Experiment95".

But it's more interesting if you go to

http://es.rice.edu/ES/humsoc/Galileo/

and start from there.

Martin

(Back again, I'm afraid, with several other messages deserving
comment they probably won't get from me. It's been an interesting month
on CSGnet, I see).

[Hans Blom, 970609]

(Richard Kennaway (970605.1800 BST))

... a curve fit doesn't tell all: there may be other, independent
reasons to select a certain model. One reason might be the too
limited selection of data points that the model has to explain

Perhaps this is nit-picking, but there is no limitation on the
number of data points that the model has to explain -- it has to
explain the behaviour of all falling objects.

That's asking a bit much :-). What are we talking about? You do an
experiment with a ball rolling down an inclined plane, collect some
five or six data points. That's it. You find that the data points can
be reasonably well fitted with a straight line, a parabola and an
"almost" parabola. You have not, really, gathered enough data to even
be able to decide which of the three "models" is best.

Any other conclusion is "going beyond the data". It's tempting to
think that the model you found from such a limited data set can
describe _all_ cases of falling, but you'll soon find out that it
doesn't: when you let go of a feather, its falling velocity is
somewhat erratic, and closer to constant than to linearly increasing.

A theory that accurately describes the behavior of _all_ falling
objects (including, say, the falling of a feather or of a star falling
into a black hole) is still pretty far off, I guess...

Greetings,

Hans