[Martin Taylor 2009.01.06.09.56]

This should be a short one, following on from...

[Martin Taylor 2009.01.04.15.32] (Episode 2)

Following on from ...

[Martin Taylor 2009.01.03.22.54] (Episode 1)

Which should be understood before reading the new material.

The issue in this episode is the difference between prior probabilities and conditionals.

Remember that all probabilities are conditional, and the subjective probability of any event or state depends upon its condition. The question is often to make explicit what the conditions are, since they are very often largely implicit and ignored, as if to say that the probability in question is absolute, in the same way that Newton thought position in space and time was absolute.

We start from the Bayes equation: P(H|D) = P(H)P(D|H)/P(D)

Before the data were acquired, we had a subjective probability that H was true, a subjective probability that the data we finally obtained would have been obtained if the hypothesis happened to be true, and a (prior) subjective probability that the data we actually got would be what we finally did get. From these we could have computed P(H|D) for any possible data we might have obtained. All these probabilities depended on conditionals, fixed preconditions that might or might not have been true, or even believed. In other words the conditionals say: "Provided that these conditions hold, the probabilities are ...".

We can collect all the conditionals together and label them collectively C. Bayes' Theorem then should be rewritten:

P(H|C,D) = P(H|C)P(D|C,H)/P(D|C), or possibly

((P(H|D) = P(H)P(D|H)/P(D)) | C)

In episode 2 of this series, we considered two different conditionals for the "White Swan" problem. The first was that from all the swans in the world, any one of them would be equally likely to be the next one we see. The other conditional we considered was that swans travel in flocks all of the same colour, which implied that the next swan to be seen would be more likely to have the same colour as its predecessor than would a swan randomly chosen from the whole world of swans. These conditionals affect the Bayes equation because they affect P(D|C,H). (They also affect P(D|C), though this vanishes when we deal with relative likelihoods of hypotheses judged from the same data and conditionals, so we tend to ignore it).
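The remark that P(D|C) vanishes when comparing hypotheses can be checked with a few lines of Python. This is only a sketch: the two hypotheses and their numbers are hypothetical, chosen for illustration, and the function name `posterior` is my own. Everything is understood to be conditional on the same C.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P(H|C,D) for each hypothesis H.

    priors[h] stands for P(H|C) and likelihoods[h] for P(D|C,H),
    all under one fixed set of conditionals C.
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # this is P(D|C)
    return {h: joint[h] / evidence for h in joint}


# Hypothetical numbers, for illustration only.
priors = {"H1": Fraction(1, 2), "H2": Fraction(1, 2)}
likelihoods = {"H1": Fraction(3, 10), "H2": Fraction(1, 10)}

post = posterior(priors, likelihoods)

# Ratio of the full posteriors, which uses P(D|C) ...
posterior_ratio = post["H1"] / post["H2"]

# ... and the same ratio computed without ever forming P(D|C).
unnormalised_ratio = (priors["H1"] * likelihoods["H1"]) / (
    priors["H2"] * likelihoods["H2"]
)

print(posterior_ratio, unnormalised_ratio)
```

Both ratios are the same, because P(D|C) divides the numerator and the denominator alike. It matters only when an absolute posterior is wanted, not a comparison among hypotheses.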

To set a condition is to say "If this happened to be the case". There is no need to believe it.

Prior probabilities are different. They represent states of belief in the various hypotheses before the acquisition of the data from the observation. The data may (usually do) result in some hypotheses becoming more credible and some becoming less so, given the conditionals. Observations can't change the conditionals, as the influence of the data on the probabilities depends on the conditionals. Observations change the belief structure, given the specific conditionals. Prior probabilities become posterior probabilities because of the data, under the assumption that the conditionals are true.

A given set of data may modify the belief structure about the same set of hypotheses differently under different conditionals, as we showed with the White Swan problem. In that problem, suppose you see 10 white swans in a row followed by a black one. An infinite number of hypotheses are under test, all of the form Hx: the hypothesis that the "next swan" will be white with probability x, where x ranges from 0 to 1.0. We also tested an independent set of hypotheses of the same form, where in this second infinite set, Hx is the hypothesis that a random swan chosen from among all swans will be white with probability x, where x ranges from 0 to 1.0.

Under the conditional "any swan is as likely as any other to be seen next", the Bayesian analysis shows the best bet to be that it is 10 to 1 that the next swan or a random swan will be white, and that a 3 to 1 bet is only likely to be about half as profitable. (I read that from the second figure in Episode 2). On the conditional that "swans tend to travel in flocks all of one colour", the best bet is that the next swan will be black, despite the preponderance of white swans among those seen, whereas under this same conditional it is 50-50 or 2-1 (depending on when the black swan was seen) that a random swan will be white. So, the same data lead to different "best bet" values depending on the conditionals and on how the swan to be evaluated is chosen.
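For the first conditional only ("any swan is as likely as any other to be seen next"), the calculation can be sketched numerically. The assumptions here are mine and may not match Episode 2 exactly: successive swans are treated as independent draws, the prior over Hx is flat, and the infinite set of hypotheses is approximated by a finite grid. The names `likelihood_independent`, `grid` and `best_x` are illustrative. No attempt is made to model the flock conditional, since that depends on details given in Episode 2.

```python
def likelihood_independent(x, whites=10, blacks=1):
    """P(D|C,Hx) when each swan seen is an independent draw
    that is white with probability x (the order does not matter)."""
    return x**whites * (1 - x) ** blacks


# Finite grid standing in for the continuum of hypotheses Hx.
# 1100 steps are used so that x = 10/11 falls exactly on the grid.
grid = [i / 1100 for i in range(1101)]

# With a flat prior the posterior is proportional to the likelihood,
# so the most credible Hx is the one that maximises the likelihood.
best_x = max(grid, key=likelihood_independent)
best_odds = best_x / (1 - best_x)

# Credibility of the "3 to 1" hypothesis (x = 0.75) relative to the best one.
relative_3_to_1 = likelihood_independent(0.75) / likelihood_independent(best_x)

print(f"most credible x = {best_x:.4f}, odds about {best_odds:.1f} to 1")
print(f"relative credibility of 3 to 1: {relative_3_to_1:.2f}")
```

Under these assumptions the most credible value is x = 10/11, that is, odds of 10 to 1, and the 3 to 1 hypothesis comes out at roughly 0.4 of the credibility of the best one. Changing the conditional means replacing `likelihood_independent` with a different function of the same data, and that is what moves the "best bet".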

Here is another example of the same point, that if the conditionals change, so do the probabilities. In [From Bill Powers (2009.01.05.1518 MST)], Bill says: "A gambler can do much better than chance at predicting the probability that a given horse will win, given the track conditions, if he has discovered that the probability known to the bookie comes from a delayed broadcast, while the one known to the gambler comes from a phone call from a confederate at the finish line -- five minutes ago."

Conditionals, like anything else, may be believed or not believed. That doesn't matter to their influence on the probability or likelihood calculations. It does matter to how one can generalize the results of the calculations. For example, if the conditional is "Swans fly in flocks of one colour", then the results from observing a few consecutive swans will be useless in generalizing the probabilities of whiteness to randomly chosen swans. Many flocks would have to be observed. In generating the subjective likelihoods of the different hypotheses, they all would have to be subject to the condition "given that swans fly in flocks of one colour". If that condition proved not to represent other observations, it might be perceived as untrue, but its falsity would not affect how credible the hypothesis would have been had the conditional proved true.

Conditionals, as suggested in the previous paragraph, subsume the sampling problem. I'll discuss this further in a later episode, but for now let me just mention the idea of "random sampling". There has long been a philosophical question about what is "really random", a question that has been answered in many subtle ways. I think that it is the wrong question when applied to sampling data to test the credibility of hypotheses. "Random" in this context means no more than "chosen in a way for which there is no reason to suggest it affects the relation of the data to the hypothesis". The phrase "there is no reason to suggest" always allows for a later discovery that the method of choice does affect the relation of the data to the hypothesis. It also allows for the conditionals themselves to become hypotheses whose validity is subject to investigation using other data.

To understand this, consider a drug that has been tested on a lot of people, but the investigator never took note of the gender of the people being tested, thinking there was no reason to believe the results would be any different for men or for women. Now imagine a hypothesis such as "this painkiller works well with women". Under the conditional "the gender of the person makes no difference", the credibility of this hypothesis is the same as its credibility in the original study. A background conditional is presumably also that all women react similarly to the painkiller. Under the conditional "Men and women might react differently to the drug", the earlier study offers little evidence relevant to the hypothesis.

The conditional "this painkiller works equally well for men and women" can become a hypothesis. So can the conditional "all women react similarly to this drug". They can be tested, and their likelihoods assessed. But when that is done, they are not conditionals. Nor are they probabilities. They are hypotheses for which we may have a subjective probability. The probability we assign to them is conditional on whatever we think relevant, but it must be conditional on something.

I expect the next episode to consider discrimination among hypotheses, unless in writing it a different topic seems to need explaining beforehand. At any rate, in the episode on discrimination I expect to use Rick's experiment and the Keizer study as examples...but don't hold me to that.

Martin
