[From Bill Powers (2009.09.12.0933 MDT)]
Rick Marken (2009.09.11.2355) --
>> BP earlier: All of the most successful sciences depend primarily on making models.
>
>RM: I think their success depends on both modeling and observation. And
> observation must come first or you don't know what to model.
Yes, it's an iterative process. But the observations don't start with correlations; they start with recording the values of variables. The relationships you look for are determined by the kind of model you have in mind. If you use an S-O-R model you identify one variable as a stimulus and another as a response to the stimulus, so you look for a correlation between S and R: you think that stimuli are causing responses. In a PCT model you're looking for a lack of correlation between S and CV and between CV and R. You have to have a reason to look for the CV -- to find the joint effect of S and R that is being kept from varying. The control-system model suggests you look for it. Otherwise, you end up with S-R theory because you're not interested in variables unless they vary.
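A minimal simulation sketch of the correlation pattern described above may help. It is not taken from any program mentioned in this thread. The integrating output function, the gain, the time step, and the two-sine disturbance standing in for S are all illustrative assumptions. The controlled variable is taken to be CV = S + R, the joint effect of the disturbance and the system's output.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(gain=50.0, dt=0.01, steps=6000, reference=0.0):
    """Run a simple integrating control loop; return lists of S, CV and R.

    All parameter values here are assumptions chosen for illustration.
    """
    s_vals, cv_vals, r_vals = [], [], []
    output = 0.0  # R: the system's output (the "response")
    for i in range(steps):
        t = i * dt
        # S: a slowly varying disturbance (the "stimulus"), assumed waveform
        s = (math.sin(2 * math.pi * 0.1 * t)
             + 0.5 * math.sin(2 * math.pi * 0.23 * t + 1.0))
        cv = s + output            # CV: joint effect of S and R
        error = reference - cv     # comparator
        s_vals.append(s)
        cv_vals.append(cv)
        r_vals.append(output)
        output += gain * error * dt  # integrating output function
    return s_vals, cv_vals, r_vals

s_vals, cv_vals, r_vals = simulate()
print("corr(S, R)  =", round(pearson(s_vals, r_vals), 3))
print("corr(S, CV) =", round(pearson(s_vals, cv_vals), 3))
print("corr(CV, R) =", round(pearson(cv_vals, r_vals), 3))
```

Under these assumptions the S-R correlation comes out strongly negative while both correlations involving CV stay near zero. Someone who looks only at variables that vary would see S and R and never think to look for CV.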
> BP earlier: I don't agree with that. You have to have a model before you can even say what "data" means.
RM: I think there are clear examples where this is not true. The
measurements made by Tycho Brahe were not based on a model.
What Brahe did was measure relative positions of stars and planets with greater precision than ever achieved before. That is data about positions, so you're right. But what do those positions reveal to us? That depends on what model we're using. If we used Ptolemy's model of crystalline spheres and epicycles, we would be trying to use these measures to calculate the radius and rotational speed of each sphere carrying a planet. If we used the Keplerian model that came later, we would be trying to calculate the elliptical orbits of the planets around the Sun: the period and the major and minor axes of the ellipses. If we used Newton's model of gravitation, we might use the data to calculate G, the universal constant of gravitation. What we calculate, given the observations, depends on the model we're using or testing.
> BP earlier: The model tells you which of the infinity of possible
> variables present at a given moment is relevant.
RM: I don't think that was true for Tycho. He knew what variables to
measure and he measured them well.
I don't think he was trying to predict or explain anything. He was simply observing the appearances very carefully. I agree that this doesn't require a model.
RM: The linear assumption of correlation is not a model; it's the simplest
possible quantitative relationship that can exist between variables.
But see my post to Martin. Correlation assumes a single linear relationship, but tells us nothing about the mechanisms that create that relationship. I am interested mainly in the mechanisms; to me, the observations are just a way of learning more about the mechanisms. This isn't because I'm not interested in the observations, but because I know that until we understand the mechanisms our ability to interpret and predict observations is going to be very limited.
RM: The r number simply measures the fit of the data points to that line.
The correlation is no more than a numerical measure of what Dag sees
when he says the data in Frans' graphs show an obvious relationship
between age and number correct (or whatever the Y variable is). What
he is seeing is that the points on the graph are visually close to a
straight line; when you connect the dots they look even more like a
straight line. But if you don't think it's the fit to a straight line
that is being visually evaluated when you look for a relationship in
the data, then you can use the rank order correlation, which is a
correlation number that simply measures the fit of the data to
monotonicity (increasing or decreasing). Or use some other reference
for measuring the graph. The number is just a description, like a mean
or a median.
I commented on this to Martin: it's a question of whether we're only interested in the apparent relationship between X and Y, or are interested in finding the real relationship, the mechanism lying between X and Y. I'm not interested in exactly how a given person moves a cursor while trying to make it track a target; everybody does it a little differently and exactly how they do it is of no importance to me -- except as a test of the model. I want to know what kind of internal organization -- architecture, as they say -- is needed to produce the sort of tracking behavior we observe. If, just by adjusting a few parameters, I can match a model's behavior equally well to the behavior of any person doing the tracking, I know I have got the right kind of underlying architecture. It doesn't matter to me if the parameters for best fit vary from one person to another, though some day that might be valuable information. I expect them to vary. What I don't expect to see changing is the kind of organization in the model: the fact that it is a negative feedback control model.
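The fitting procedure described above can be sketched roughly as follows. This is a hypothetical toy, not the actual tracking model or its fitting code. The "observed" cursor trace is synthetic, generated from the same model with a hidden gain plus a little random noise. The single adjustable parameter, the grid of candidate gains, and the target waveform are all assumptions made for illustration.

```python
import math
import random

def target(t):
    """Assumed target waveform for the tracking task (illustrative only)."""
    return (math.sin(2 * math.pi * 0.1 * t)
            + 0.5 * math.sin(2 * math.pi * 0.23 * t + 1.0))

def run_model(gain, dt=0.01, steps=6000):
    """Negative feedback tracking model: the cursor integrates the error."""
    cursor = 0.0
    trace = []
    for i in range(steps):
        trace.append(cursor)
        error = target(i * dt) - cursor   # reference is "cursor on target"
        cursor += gain * error * dt       # integrating output function
    return trace

def rms_diff(a, b):
    """Root-mean-square difference between two equal-length traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def fit_gain(observed, candidates):
    """Pick the candidate gain whose model trace best matches the data."""
    return min(candidates, key=lambda g: rms_diff(run_model(g), observed))

# Synthetic stand-in for one person's tracking record (NOT real data).
rng = random.Random(0)
TRUE_GAIN = 8.0
observed = [c + rng.gauss(0.0, 0.01) for c in run_model(TRUE_GAIN)]

candidates = [0.5 * k for k in range(2, 41)]   # gains 1.0 .. 20.0
best = fit_gain(observed, candidates)
print("best-fitting gain:", best)
```

The organization of the model is held fixed and only the parameter is adjusted. A different "person" (a different hidden gain) would give a different best-fit value, but the same negative feedback architecture would still be doing the fitting.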
>> You are talking about the inferential use of statistics. I have been
>> presenting correlations only as observations.
I argue that correlations are not observations or measurements. They are calculations based on measurements. And why are those particular calculations used, when the same measurements could be the basis for many different kinds of calculations, such as those we use in PCT? They are used because they contain a model which assumes a linear relationship as a first approximation to the actual form of a relationship between two variables.
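As a small illustration of the point that a correlation is a calculation applied to measurements, the sketch below runs two different calculations on the same list of numbers. The numbers are made up for the example and come from nobody's data. The Pearson calculation builds in a straight-line relationship; the rank-order calculation builds in only monotonicity.

```python
import math

def pearson(xs, ys):
    """Pearson r: how well the points fit a straight line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank positions 1..n (this simple version assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def rank_order_correlation(xs, ys):
    """Spearman-style rank correlation: Pearson r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Made-up measurements: y rises with x, but along a curve, not a line.
x = list(range(1, 11))
y = [v ** 3 for v in x]

r_linear = pearson(x, y)
r_rank = rank_order_correlation(x, y)
print("Pearson r  :", round(r_linear, 3))
print("rank-order :", round(r_rank, 3))
```

The measurements are identical in both cases. What differs is the assumed form of the relationship that each calculation carries with it.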
RM: Correlation is just one way to describe the relationships I find in my
research; the other ways to describe these relationships are to show
the time series next to each other and to produce scattergrams.
This makes it clear to me where we differ. You are interested in plots of the variables showing their relationships to each other. I am interested in the nature of the physical connection between the variables, the mechanism that lies between them. The plots of the relationships in no way reveal the nature of the mechanisms. An infinity of different mechanisms could produce those same plots. I am trying to narrow that infinity down to a smaller set of mechanisms, and as nearly as possible narrow that set down to just one kind of organization that would best explain what we observe. PCT is the result, so far.
RM: I have no idea why you are trying to make this out to be a statistical
issue; this is a data representation issue to me.
BP: Not representation: INTERPRETATION. The data are the points obtained from observations, the raw lists of numbers. Anything beyond that is theory and interpretation.
RM: But the whole thing
came up because I reported (among other things) a low positive
correlation between taxes and growth. Perhaps if I just presented a
scattergram of the data (which I might do next) it would have posed
less of a problem for you. But whether that data is presented as a
graph or a number, I think they provide no basis for the idea that
increased taxes slow growth. If you don't think that these data
provide no such basis, then there are a bunch of folks who could use
your help tomorrow in their anti-tax march on DC. Be sure to leave
your correlations at home and bring your tin foil hat.
BP: So if I disagree with you, I'm just another kook? Do you ever ask yourself how your remarks might be seen by other people before you let your typing reflexes emit them? Sometimes you sound just like one of those guys on Fox News, or Sarah Palin and her Death Panels. Sort of resentful and vindictive.
The data you talk about provide no basis for any conclusions. Taxes might reduce profits, a drop in profits might lead to layoffs; layoffs might slow growth. But taxes might also, at the same time and in parallel, increase government revenues, redistribute spending where more income is needed, restore profitability of small enterprises, and increase growth. They probably affect growth by many other paths, too. So what is "THE" effect on growth of raising taxes? There is no such thing, unless you just look at the bottom line and ignore the details. There are multiple effects, some of them contradictory, and all of them probably variable over time. You can't predict the effect of raising or lowering taxes without a proper model that takes more into account than just taxes. You have to deal simultaneously with ALL the important variables in the system, not just a few of them. Try predicting control behavior when you omit any signal or function from the PCT model. You can't even solve the equations, and the simulation won't even run.
I should thank both you and Martin because after all this I see what my own position is a lot more clearly.
Best,
Bill P.