Statistics: what is it about?

[Martin Taylor 2009.01.06.21.59]

[From Bill Powers (2009.01.06.1557 MST)]

Control systems do not have to consider probabilities or make decisions or predict anything. They compare a perception to the given reference signal, and on the basis of the error, they act. If there is a significant amount of noise in the perception, they simply act a little erratically, as if there is a random disturbance acting on the controlled variable. All this stuff about Bayesian probability applies only if, at the level where we behave according to rules of logic or mathematics or culture, we perform certain calculations and act on the basis of their results.

It's more than that, and less. The Bayesian analysis only provides a limit, an ideal that is the absolute best that could be done by any mechanism with the current background knowledge and data. It has no relation to what method anyone or any control system might use. The essay Richard Kennaway linked has nice examples of how bad people actually are at making conscious inferences from data. Where I think there might be some value in considering it, though, is at low levels, where the control requirements have been pretty constant over long periods of evolutionary time. Natural selection is usually pretty good at coming up with near-optimum solutions, given time.

We don't have to do that [perform certain calculations and act on the basis of their results], but we can if we want to.

Sometimes we can. Most people don't have the skills or training, and those who do often get it wrong if they are in a hurry.

After we have behaved, it is of course possible to examine the details and come up with explanations that show how the behavior would have been produced if it had been the result of decision-making or cultural influences or probability calculations. However, unless we can prove that it WAS generated in that way, there is no reason to believe those explanations.

There's no reason to believe ANY explanation unless there is evidence for it that is better than the evidence for any other explanation. That doesn't mean, however, that one can simply dismiss an explanation because it isn't provable. No explanation is provable, except in pure mathematics. The best we can do is to accept the explanation for which we currently have a much higher subjective probability than any other, or hold off on accepting any if there are several of equal credibility and there's no pressing need to make a decision. The history of science suggests (by a "White Swan" kind of analysis) that it is most likely that any "current best" explanation will eventually be superseded. There are no counter-examples, and in principle there can be none.

It's perfectly possible, indeed likely, that there are ways other than calculating to arrive at behavior organized in the optimal way according to Bayesian analysis or signal-processing principles or decision theory. That doesn't mean that these concepts have anything to do with the way behavior works. The mistake is in offering an explanation without having a good Bayesian reason for thinking it applies.

I couldn't agree more. "Thinking it applies" is much better than requiring proof that an explanation is correct. I'd be extremely surprised if we found that any organism, including humans, actually does a Bayesian analysis when behaving normally, other than when a controlled perception involves explicitly doing a Bayesian analysis.

Martin

[Martin Taylor 2009.01.06.22.22]


[From Mike Acree (2009.01.06.1607 PST)]
[Martin Taylor 2009.01.06.16.48]--
I agree that such rescalings are immaterial; Jeffreys pointed out long
ago that we could equally well measure (subjective) probabilities on a
scale that was the logarithm of what we currently use. But the
nonadditive "probabilities" of Bernoulli, Dempster, and Shafer are
something else. The additive, frequency

Oops! Frequency???

scale is unable to distinguish
the cases (a) a hypothesis and its contrary are supported by strong
evidence which is evenly balanced (e.g., testimony of conflicting
authorities) and (b) there is no evidence bearing on the question one
way or the other. I think you are saying that discrimination doesn't
matter to you. (I should add that I don't see much practical value in
attempts to develop a calculus of the Dempster-Shafer nonadditive
probabilities.)

As far as the subjective probability of the hypothesis being correct is
concerned, I see no distinction between cases (a) and (b). In both
cases the situation is that the existing background and data leave the
hypothesis and its contrary in balance. What is different is in the
surrounding circumstances. Case (a) suggests the possibility that there
is a further hypothesis you haven’t yet thought of, which might account
for the apparent disagreement between the evidence for and the evidence
against, but that really isn’t relevant to the current credibility of
the hypothesis at issue. Both cases suggest that it might be useful to
get more data that might shift the balance.

I’m not sure how “I think you are saying that discrimination doesn’t
matter to you” connects with the preceding sentences. Discrimination
does matter, when it matters to whoever needs to make a decision. I
suppose the primary objective of Bayesian analysis in practice is to
help people make those discriminations as finely as the data warrant.


How do you understand it when I say that the Carnot cycle is a
mathematical ideal of a heat engine, an ideal that makes explicit a
limit that any heat engine, no matter how constructed, cannot improve
upon? Do you understand me to be advocating that real engines should
be built so as to be thermodynamically reversible? Or to be saying
that they actually are?

The former; you had used the words "mathematical ideal":

    the Bayesian system . . . isn't a model of what people
    do, but a model of the best they could do. It's a mathematical
    ideal

I had responded:

    And I wouldn't be inclined to say, even in legitimate
    applications, that Bayes' Theorem constituted a model of our
    thinking, any more than I would say that matrix algebra modeled
    our thinking about spatial transformation.

To which you responded: "Nor would I. Nor have I." So I understand you
to be saying (a) the Bayesian system is a model of human thinking, in
the sense of a mathematical ideal,

No, I said it WASN’T a model of human thinking. Instead, it is a
mathematical ideal.

and (b) that you
would never say any such thing. When I denied that Bayesianism was a
model of human thinking, I was using "model" in the sense you say you
intended.

Then I don’t know what sense you mean. I thought I was quite explicit
that it was not a model in the sense of suggesting how thinking might
be performed. It is only a mathematical ideal.

I can only infer that you think I tacitly switched to the
empirical meaning of "model"?

So, do you understand now? The Carnot cycle stands in the same relation
to physically realizable heat engines as Bayesian analysis stands to
models of thinking. Neither purports to represent the mechanisms of the
physical (physiological) entities for which it provides a limiting
idealization.

[From Rick Marken (2009.01.06.2210)]

Martin Taylor (2009.01.06.22.22) --

The Carnot cycle stands in the same relation to
physically realizable heat engines as Bayesian analysis stands to models of
thinking. Neither purports to represent the mechanisms of the physical
(physiological) entities for which it provides a limiting idealization.

I think statistics is indeed an idealization; it is the ideal way to
double -- that's right, double -- your chances of winning at "Let's
Make a Deal" ;-)

Best

Monty Marken


--
Richard S. Marken PhD
rsmarken@gmail.com

[From Mike Acree (2009.01.07.0926 PST)]

[Martin Taylor 2009.01.06.22.22]—

This exchange is reminding me of one on economics
about 10 years ago, where it ultimately turned out that what you meant by “inflation”
was what I meant by “interest.”

[MA] The additive, frequency

[MT] Oops! Frequency???

[MA] scale is unable to distinguish
the cases (a) a hypothesis and its contrary are supported by strong
evidence which is evenly balanced (e.g., testimony of conflicting
authorities) and (b) there is no evidence bearing on the question one
way or the other.

Not sure what the “Oops” is about.
“Frequency” is what I meant; you could substitute “aleatory”
if you liked that better.

[MT] As far as the subjective probability of the hypothesis being correct is
concerned, I see no distinction between cases (a) and (b).

Yes, that’s what I was posing as a problem.
A probability of .5 could mean either fairly substantial support for a proposition
or complete ignorance. You go on to say that the ability to discriminate these two
cases does matter, but that’s something (additive) probability theory can’t
do.

[MT]         the Bayesian system . . . isn't a model of what people
               do, but a model of the best they could do. It's a mathematical
               ideal

[MA] So I understand you to be saying (a) the Bayesian system is a model of
human thinking, in the sense of a mathematical ideal,

[MT] No, I said it WASN’T a model of
human thinking. Instead, it is a mathematical ideal.

You used the word “model”
yourself in the sentence I keep quoting (above), so perhaps you can muster a
bit of sympathy for my difficulty in understanding your repeated vehement
denials that Bayesianism is a model. I’m not sure whether you’re
introducing a third sense of “model” here, but it doesn’t
matter: I deny that Bayes’ Theorem is a mathematical ideal of human
reasoning, or a model of reasoning in any other sense. I see it, as I
said, simply as the appropriate formula for a certain kind of arithmetic
problem. But I wouldn’t describe it as a mathematical ideal of our
thinking about such problems, any more than I would say that a² + b² = c²
was a mathematical ideal of our thinking about right
triangles. And when you get past the textbook problems used by
experimental psychologists and AI theorists, I see very few practical
applications that don’t look like a stretch.

It’s clear that you reject the basic
thesis of my paper, about the incoherence of the concept of probability, but I
still very much appreciate the attention you’ve given it. My thesis
is as radical, in its limited sphere, as PCT is in psychology, albeit much less
consequential. “All concepts are destructible,” Spencer Brown
wrote in his destructive analysis of the concept of randomness (ordinary
language conflates two meanings which are not only different, but conflicting),
“and it is not always obvious which to destroy.” All of us,
at least those at the formal-operational level, struggled at some point to
master the peculiar modern dualistic concept of probability, and it’s
hard for us now to imagine doing without it. So I see no reason to expect
my views to enjoy nearly the success that PCT has. But with enough such
patient readers as you, I might eventually draw some lessons for clarifying, or
revoking, my message.

Mike

[Martin Taylor 2009.01.07.12.54]

[From Mike Acree (2009.01.07.0926 PST)]

[Martin Taylor 2009.01.06.22.22]—

This exchange is reminding me of one on economics about 10 years ago,
where it ultimately turned out that what you meant by “inflation” was
what I meant by “interest.”

[MA] The additive, frequency

[MT] Oops! Frequency???

[MA] scale is unable to distinguish
the cases (a) a hypothesis and its contrary are supported by strong
evidence which is evenly balanced (e.g., testimony of conflicting
authorities) and (b) there is no evidence bearing on the question one
way or the other.

Not sure what the “Oops” is about. “Frequency” is what I meant; you
could substitute “aleatory” if you liked that better.

“Oops” was about the slip in substituting “frequency” for
“probability”. My primary argument has been against using frequency as
anything other than one of the sources that a person might use to
create a perception of the probability of something. To substitute
“frequency” both suggests that it is the only way a person can
legitimately (if such a word means anything when dealing with a
perception) develop a perception of probability, and that the
perception is of something that has an objective existence in the
environment. To my mind, that’s rather a big “Oops”, and substitution
of “aleatory” makes it no better.

What, from a frequency point of view, is the objective probability that
the next reigning Queen of England will be called “Martha”? What is
your subjective probability of that event?

[MT] As far as the subjective probability of the hypothesis being
correct is concerned, I see no distinction between cases (a) and (b).

Yes, that’s what I was posing as a problem. A probability of .5 could
mean either fairly substantial support for a proposition or complete
ignorance.

Yes. Why is that a problem? I really must be missing something here.
What I can see is that the context of the hypothesis is different, but
the context involves different perceptions, both at the level of the
hypothesis and above it. The context in one case includes the
perception that people have cared enough about the hypothesis to
produce or examine relevant data. The sum total of the result of their
efforts is to leave the subjective probability of the hypothesis right
where it would have been if there had been no previous interest in the
hypothesis. That seems intuitively correct to me.

You go on to say that the ability to discriminate these two cases does
matter, but that’s something (additive) probability theory can’t do.

If that’s how you interpreted my response, then I misinterpreted your
assertion. My response was supposed to say that if people have gone to
the trouble of gathering evidence, then discrimination between the
“yes” and “no” hypotheses matters to them. You seem to mean “Do I
care whether it is possible to discriminate between situations about
which people care and situations that mean nothing to people”. Yes, I
care whether it is possible to discriminate between those situations,
but the competing hypotheses “several people care” and “nobody cares”
are different hypotheses than the one about which they may care – the
original “yes or no” dimension. It seems perverse to me to expect those
two separate discriminations to be represented by one scalar value.

[MT]         the Bayesian system . . . isn't a model of what people
               do, but a model of the best they could do. It's a mathematical
               ideal

[MA] So I understand you to be saying (a) the Bayesian system is a
model of human thinking, in the sense of a mathematical ideal,

[MT] No, I said it WASN’T a model of human thinking. Instead, it is a
mathematical ideal.

You used the word “model” yourself in the sentence I keep quoting
(above), so perhaps you can muster a bit of sympathy for my difficulty
in understanding your repeated vehement denials that Bayesianism is a
model.

I have never denied it is a model. I denied that it is a model of what
people do. Exactly as I said initially, it’s a model of the best they
could do. I do not comprehend your difficulty in understanding the
distinction between a model of what people do and a model of the ideal
best they could possibly do no matter what methods they might use.

It’s clear that you reject the basic thesis of my paper, about the
incoherence of the concept of probability,

I had thought you were arguing against inference from “objective”
“frequentist” concepts of probability. I’ll have to read more
carefully. Are you arguing that it is only an illusion that people have
a perception of the probability of some things? Is someone wrong to say
“That’s highly unlikely”, or “I’ll bet 3 to one against it”?

By the way, do you know Watanabe’s book “Knowing and Guessing”? You
don’t reference it.

All of us, at least those at the formal-operational level, struggled
at some point to master the peculiar modern dualistic concept of
probability, and it’s hard for us now to imagine doing without it.

Are you talking about the distinction between “probability” and
“likelihood” when you talk about a “dualistic concept”? You use the
word “probability” for both in your essay. It’s a long time since I
read Watanabe, but I seem to remember that he uses “creditation”,
“credibility”, or some similar word for what I call “likelihood”.

As an aside on your essay, consider the passage (p8): “Suppose, to use
an example from Jonathan Cohen (L. J. Cohen, 1977), that there were
three independent pieces of evidence that were relevant, and they had
been established with probabilities of .8, .8, and .75. Then we should
ordinarily consider the case as having been rather well established,
with the weakest proposition having a probability of .75. But on a
relative-frequency scale the probability that all three propositions
are true is the product of their probabilities, which is less than .5—a
finding in favor of innocence.”

On what basis do you say that this is a finding in favour of innocence?
If exoneration had been stated as depending on any one of the pieces of
evidence being false, then I would not ask the question. But this
conditional was not stated. If guilt could be established if any one of
them were true, the probability of guilt would be (1-.8)(1-.8)(1-.75)
= 0.99. That conditional was not stated, either. Or it could take any
two of the three, which is a more complicated expression that gives an
intermediate result that I have not calculated. Or it could be that a
particular one of them being false might by itself establish innocence,
but if it were true, the truth of either of the others would establish
guilt. Without specifying the conditionals, you have an ill-posed
example.

So I see no reason to expect my views to enjoy nearly the success that
PCT has. But with enough such patient readers as you, I might
eventually draw some lessons for clarifying, or revoking, my message.

I can see that I had not fully appreciated what you were trying to
achieve.

Martin

[From Rick Marken (2009.01.07.1230)]

Martin Taylor (2009.01.07.12.54) --

I have never denied it [Bayesianism] is a model. I denied that it is a model of
what people do. Exactly as I said initially, it's a model of the best they could
do.

I agree. For example, in order to determine the best people can do
in the "Let's Make a Deal" situation (the Monty Hall problem) you have
to know the algebra of conditional probabilities, which I think is
basically what Bayesianism is about, or you have to know how to write
a computer simulation (which is the way I did it). I suppose the best
(Bayes) solution to that problem could provide an initial hypothesis
about what people are controlling for when asked whether or not they
want to change their choice of doors. But once you find that a person
is not controlling for the best solution (which, I've found, is nearly
always the case), I'm not sure what the merits of knowing the best
solution are. You still have to come up with another hypothesis about
what the person is controlling for, and that just takes good old
American know-how.
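
A minimal sketch of such a simulation -- not Rick's actual program,
which isn't shown, just one way to check the 1/3 vs. 2/3 result in
Python, assuming the standard rules of the game:

import random

def play(switch: bool) -> bool:
    """One round of the Monty Hall game; True if the player wins."""
    doors = [0, 1, 2]
    prize = random.choice(doors)
    pick = random.choice(doors)
    # Monty opens a door that is neither the player's pick nor the prize.
    opened = random.choice([d for d in doors if d != pick and d != prize])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

trials = 100_000
for switch in (False, True):
    wins = sum(play(switch) for _ in range(trials))
    print(f"switch={switch}: win rate ~ {wins / trials:.3f}")

# Staying wins about a third of the time; switching about two thirds:
# switching doubles the chance of winning.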

Best

Rick


--
Richard S. Marken PhD
rsmarken@gmail.com

[From Mike Acree (2009.01.07.1235 PST)]

[Martin Taylor 2009.01.07.12.54]–

This was a clarifying message.

“Oops” was about the slip in substituting
“frequency” for “probability”. My primary argument has been
against using frequency as anything other than one of the sources that a person
might use to create a perception of the probability of something. To substitute
“frequency” both suggests that it is the only way a person can
legitimately (if such a word means anything when dealing with a perception)
develop a perception of probability, and that the perception is of something
that has an objective existence in the environment. To my mind, that’s rather a
big “Oops”, and substitution of “aleatory” makes it no
better.

Just because the word “probability”
has been used (loosely) by Bernoulli, Shafer, and others to include a
nonadditive concept, I inserted the qualifier “frequency” (or “additive,”
or “aleatory”) to restrict my meaning to such scales. I did
not mean to exclude Bayesian probabilities. But I can see that the
qualifier may have been confusing, just because hardly anybody considers
nonadditive probability.

What, from a frequency point of view, is the objective probability that the
next reigning Queen of England will be called “Martha”? What is your
subjective probability of that event?

I suggested (e.g., p. 5), if I didn’t
actually say, that I don’t regard such probabilities as very
meaningful. It is true that we can make such judgments, and in the last
century or two (perhaps for not so long) we have become comfortable assigning
numbers to such judgments. Borrowing an analogy from Krantz, I think this
operation is somewhat like rating the esthetic pleasure we receive from viewing
a painting by matching it with the sweetness of a graded series of sucrose
solutions of known concentration. I don’t see much of scientific
value coming out of it. There may be useful information in the report
that there is a 20% chance of rain tomorrow (or that our current threat
level is orange), but I would be leery of putting numbers like that in
an equation.

[MT] As far as the subjective probability of the hypothesis being correct is
concerned, I see no distinction between cases (a) and (b).

Yes, that’s what I was posing as a
problem. A probability of .5 could mean either fairly substantial support
for a proposition or complete ignorance.

You go on to say that the ability to discriminate
these two cases does matter, but that’s something (additive) probability
theory can’t do.

If that’s how you interpreted my response, then I misinterpreted your
assertion. My response was supposed to say that if people have gone to the
trouble of gathering evidence, then discrimination between the “yes”
and “no” hypotheses matters to them. You seem to mean “Do I
care whether it is possible to discriminate between situations about which people
care and situations that mean nothing to people”.

Yes.

Yes, I care whether it is possible to discriminate
between those situations, but the competing hypotheses “several people
care” and “nobody cares” are different hypotheses than the one
about which they may care – the original “yes or no” dimension. It
seems perverse to me to expect those two separate discriminations to be
represented by one scalar value.

Right; it does to me,
too. That was one of my criticisms of probability theory as a model of
inference.

I have never denied it is a model. I denied that it is
a model of what people do. Exactly as I said initially, it’s a model of the
best they could do. I do not comprehend your difficulty in understanding the
distinction between a model of what people do and a model of the ideal best
they could possibly do no matter what methods they might use.

I don’t have any trouble with the
distinction, and I took you from the beginning to mean the second. It was
evidently my slight paraphrasing of your words to which you strenuously
objected, so it looked to me as though you were simply contradicting yourself.

I had thought you were arguing against inference
from “objective” “frequentist” concepts of probability.
I’ll have to read more carefully. Are you arguing that it is only an illusion
that people have a perception of the probability of some things? Is someone
wrong to say “That’s highly unlikely”, or “I’ll bet 3 to one
against it”?

I was arguing against the concept of statistical inference per se. I
argued that adherence to an objective, frequentist concept of
probability leads, logically and historically, to a decision theory rather than
to an inference theory (as Neyman himself insisted), which retains legitimate
application in fields like epidemiology. I argued that Bayesianism has
the only claim to being a theory of statistical inference, but that it fails both descriptively and
prescriptively (or as an ideal). You disagree with at least this last
claim.

By the way, do you know Watanabe’s book “Knowing and Guessing”? You
don’t reference it.

I don’t know it; thanks for the
reference.

Are you talking about the distinction between “probability” and
“likelihood” when you talk about a “dualistic concept”? You
use the word “probability” for both in your essay.

No (assuming you mean by “likelihood”
something like what it means in the expression “maximum likelihood”);
I was talking about the distinction between objective and subjective, or
between aleatory and epistemic, to use Hacking’s terms. These are
the two concepts that I argue got stuck together more or less by historical
accident.

As an aside on your essay, consider the passage (p8): “Suppose, to use an
example from Jonathan Cohen (L. J. Cohen, 1977), that there were three
independent pieces of evidence that were relevant, and they had been
established with probabilities of .8, .8, and .75. Then we should
ordinarily consider the case as having been rather well established, with the
weakest proposition having a probability of .75. But on a
relative-frequency scale the probability that all three propositions are true
is the product of their probabilities, which is less than .5—a finding in
favor of innocence.”

On what basis do you say that this is a finding in favour of innocence? If
exoneration had been stated as depending on any one of the pieces of evidence
being false, then I would not ask the question. But this conditional was not
stated.

I think this condition was intended, but
you’re right that it was not explicit. Such background assumptions
plague most such textbook examples.

If guilt could be established if any one of them were
true, the probability of guilt would be (1-.8)(1-.8)(1-.75) = 0.99.

I think you mean 1 - (1-.8)(1-.8)(1-.75).
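
For the record, the corrected arithmetic, assuming the three pieces of
evidence are independent:

  P(all three true)    = .8 × .8 × .75 = .48, which is less than .5
  P(at least one true) = 1 - (1-.8)(1-.8)(1-.75) = 1 - .01 = .99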

I’m relieved that we’re making
some progress in communication!

Mike

[From Bill Powers (2009.01.07.1340 MST)]

[Martin Taylor 2009.01.06.21.59]

[WTP] It's perfectly possible, indeed likely, that there are ways other than calculating to arrive at behavior organized in the optimal way according to Bayesian analysis or signal-processing principles or decision theory. That doesn't mean that these concepts have anything to do with the way behavior works. The mistake is in offering an explanation without having a good Bayesian reason for thinking it applies.

[MMT] I couldn't agree more. "Thinking it applies" is much better than requiring proof that an explanation is correct. I'd be extremely surprised if we found that any organism, including humans, actually does a Bayesian analysis when behaving normally, other than when a controlled perception involves explicitly doing a Bayesian analysis.

At last. I think that wraps it up. For future reference, when I say "proof" you can always append the Bayesian conditional to it. I know that science is a matter of making the best guesses we can.

Best,

Bill P.

[Martin Taylor 2009.01.08.10.59]

[From Mike Acree (2009.01.07.1235 PST)]

[Martin Taylor 2009.01.07.12.54]–

What, from a frequency point of view, is the objective probability
that the next reigning Queen of England will be called “Martha”? What
is your subjective probability of that event?

I suggested (e.g., p. 5), if I didn’t actually say, that I don’t
regard such probabilities as very meaningful. It is true that we can
make such judgments, and in the last century or two (perhaps for not
so long) we have become comfortable assigning numbers to such
judgments. Borrowing an analogy from Krantz, I think this operation is
somewhat like rating the esthetic pleasure we receive from viewing a
painting by matching it with the sweetness of a graded series of
sucrose solutions of known concentration. I don’t see much of
scientific value coming out of it. There may be useful information in
the report that there is a 20% chance of rain tomorrow (or that our
current threat level is orange), but I would be leery of putting
numbers like that in an equation.

A fundamental question: E.T. Jaynes (Probability Theory: the logic of
science, Cambridge U.P, 2003) lists a set of desiderata that should
apply to any measure of plausibility, and goes on to show that
logically these desiderata lead to the standard (but not frequentist or
objective) measure of probability. Jaynes is arguing strongly for a
Bayesian approach, and by this point in the book has introduced a
hypothetical robot that discovers the plausibility of a hypothesis
based on available data. The desiderata are:

  1. Degrees of plausibility are represented by real numbers.

  2. There is a qualitative correspondence with common sense.

3a. If a conclusion can be reached in more than one way, then every
possible way must lead to the same result.

3b. The robot always takes into account all of the evidence it has
relevant to a question. It does not arbitrarily ignore some of the
information, basing its conclusions only on what remains. In other
words, the robot is completely non-ideological.

3c. The robot always represents equivalent states of knowledge by
equivalent plausibility assignments. That is, if in two problems the
robot’s state of knowledge is the same (except perhaps for the
labelling of the propositions), then it must assign the same
plausibilities in both.

The question is whether you subscribe to these desiderata?
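
As a small illustration of desideratum 3a (my own sketch, not taken
from Jaynes): with conditionally independent evidence, updating by
Bayes' rule reaches the same posterior no matter the order in which
the data are taken into account.

from fractions import Fraction

def update(prior, p_d_given_h, p_d_given_not_h):
    """One Bayesian update: returns P(H|D)."""
    num = prior * p_d_given_h
    return num / (num + (1 - prior) * p_d_given_not_h)

prior = Fraction(1, 2)
# Two independent pieces of evidence, as (P(D|H), P(D|~H)) pairs.
evidence = [(Fraction(8, 10), Fraction(3, 10)),
            (Fraction(6, 10), Fraction(9, 10))]

p_forward = prior
for lh, lnh in evidence:
    p_forward = update(p_forward, lh, lnh)

p_reverse = prior
for lh, lnh in reversed(evidence):
    p_reverse = update(p_reverse, lh, lnh)

print(p_forward, p_reverse, p_forward == p_reverse)  # identical posteriors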

[MT] As far as the subjective probability of the hypothesis being
correct is concerned, I see no distinction between cases (a) and (b).

Yes, that’s what I was posing as a problem. A probability of .5 could
mean either fairly substantial support for a proposition or complete
ignorance.

You go on to say that the ability to discriminate these two cases does
matter, but that’s something (additive) probability theory can’t do.

I believe I said that discrimination between these two cases was not
representable by a single scalar number at the same time as that number
represents the subjective probability (or plausibility) of the
competing hypotheses. One number can’t represent two independent
variables. You agreed with this a couple of paragraphs later, so I’m at
a loss to understand wherein we differ here.

I had thought you were arguing against inference from “objective”
“frequentist” concepts of probability. I’ll have to read more
carefully. Are you arguing that it is only an illusion that people
have a perception of the probability of some things? Is someone wrong
to say “That’s highly unlikely”, or “I’ll bet 3 to one against it”?

I was arguing against the concept of statistical inference per se. I
argued that adherence to an objective, frequentist concept of
probability leads, logically and historically, to a decision theory
rather than to an inference theory (as Neyman himself insisted), which
retains legitimate application in fields like epidemiology. I argued
that Bayesianism has the only claim to being a theory of statistical
inference, but that it fails both descriptively and prescriptively (or
as an ideal). You disagree with at least this last claim.

Yes, I disagree with the last claim, insofar as I argue that Bayesian
analysis provides limits on how much can be determined by any method
about the discrimination among discrete or continuous hypotheses based
on the data at hand; but I agree with the middle sentence. As for
arguing against the concept of statistical inference, I think the term
has to be defined a bit better before I will agree or disagree.

In my language, what Sherlock Holmes does is often largely statistical
inference. He observes a datum, and has noted that on many occasions
this datum is accompanied by another (example, Holmes’s observation that
Watson will not(?) invest in gold shares, based on billiard chalk on
Watson’s hand, and the knowledge that Watson usually plays billiards
with a certain opponent). He may well also have a model that reinforces
this relationship. From several such data, Holmes infers that the
person is of this or that character, or has done thus and so. Is that
the kind of statistical inference concept against which you argue,
because if it is, then I disagree with you.

I’m not clear how you can argue against a concept. I can see how you
could argue that it is misunderstood, misapplied, or is in principle of
no value. But if a person has a concept, how can you argue that it
doesn’t exist? It’s a perception, and I don’t think you can argue that
any perception someone has is invalid. As Bill P. from time to time
reminds us, our perceptions are the only contact we have with whatever
“real world” might be out there. Perceptions are the ultimate facts,
and concepts are perceptions. If you accept that, then it cannot be the
concept of statistical inference against which you argue, but its use.

If guilt could be established if any one of them were true, the
probability of guilt would be (1-.8)(1-.8)(1-.75) = 0.99.

I think you mean 1 - (1-.8)(1-.8)(1-.75).

Yes. “Brain fart”.

I’m relieved that we’re making some progress in communication!

I hope it eventually results in convergence.

Does absolute truth exist ?-)

Martin

[from Tracy B. Harms (2009-01-08 11:14 Pacific)]

Martin,

What you can say about the "White Swan" problem is that if the
conditions remain the same, the likelihood that the next swan
you see will be white increases the more swans you have seen
without seeing a non-white one.

I'd like to propose a change to this sentence:

With a Bayesian approach to the "White Swan" problem we may say that
if the conditions remain the same, the estimated likelihood that the
next swan you see will be white may be systematically increased the
more swans you have seen without seeing a non-white one.

Perhaps you won't find that change agreeable. The objection I expect
from Bayesians is that the only thing we can call likelihood is
estimated likelihood, so the qualification is counterproductive for
suggesting that some other likelihood might apply. I maintain that
qualification is necessary out of the following consideration: If in
fact all swans are white, the next observed swan will be white.

It will not do to say that there is a change in likelihood through
observations and calculations when what is in question is a matter of
fact, not a matter of estimation. Bayesian techniques are dandy
insofar as the topic is improving estimation. They are not when they
would supplant attention to the truth value of propositions.
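
One classical way to make that updating concrete is Laplace's Rule of
Succession (mentioned later in this thread). A sketch, assuming n
swans observed so far, all of them white:

from fractions import Fraction

def laplace_estimate(white_seen: int, total_seen: int) -> Fraction:
    """Rule of succession: estimated probability that the next
    observation is white, given the record so far."""
    return Fraction(white_seen + 1, total_seen + 2)

for n in (0, 1, 10, 100, 1000):
    print(n, laplace_estimate(n, n))

# 0 -> 1/2, 1 -> 2/3, 10 -> 11/12, 100 -> 101/102, 1000 -> 1001/1002:
# the estimate rises toward certainty but never reaches it.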

Tracy Harms


[From Bill Powers (2009.01.08.1229 MST)]

Tracy B. Harms (2009-01-08 11:14 Pacific) --

With a Bayesian approach to the "White Swan" problem we may say that
if the conditions remain the same, the estimated likelihood that the
next swan you see will be white may be systematically increased the
more swans you have seen without seeing a non-white one.

It will not do to say that there is a change in likelihood through
observations and calculations when what is in question is a matter of
fact, not a matter of estimation. Bayesian techniques are dandy
insofar as the topic is improving estimation. They are not when they
would supplant attention to the truth value of propositions.

I like "Estimated likelihood" a lot better than "likelihood" alone for perhaps a different reason from yours. And now that you have introduced the term "estimated", I would like to substitute it for "subjective" in the other context, too. An estimated probability (or an estimated likelihood) is one that someone estimates. Whether this has anything to do with what that someone experiences as a feeling of uncertainty is, as far as I am concerned, an unsettled question with my own hunch being that the answer is no.

Best,

Bill P.

[Martin Taylor 2009.01.08.16.40]


[from Tracy B. Harms (2009-01-08 11:14 Pacific)]

Martin,

What you can say about the "White Swan" problem is that if the
conditions remain the same, the likelihood that the next swan
you see will be white increases the more swans you have seen
without seeing a non-white one.

I'd like to propose a change to this sentence:

With a Bayesian approach to the "White Swan" problem we may say that
if the conditions remain the same, the estimated likelihood that the
next swan you see will be white may be systematically increased the
more swans you have seen without seeing a non-white one.

Perhaps you won't find that change agreeable. The objection I expect
from Bayesians is that the only thing we can call likelihood is
estimated likelihood, so the qualification is counterproductive for
suggesting that some other likelihood might apply. I maintain that
qualification is necessary out of the following consideration: If in
fact all swans are white, the next observed swan will be white.

It will not do to say that there is a change in likelihood through
observations and calculations when what is in question is a matter of
fact, not a matter of estimation. Bayesian techniques are dandy
insofar as the topic is improving estimation. They are not when they
would supplant attention to the truth value of propositions.

I’d have no objection to your suggested change, to the extent that Bill
P would have no objection to a suggested wording change that said all
“perception” was “conscious perception”, and required a different word
for the “perceptual signal”.

Maybe that sounds glib, but the point is that you refer to some
objective fact about the real world that is not accessible to any
observer. Nobody, least of all the person who has observed only 10
swans, could know whether it is a fact that every swan is white,
especially not in all the situations for which the “White Swan” problem
is a metaphor. I use the word “likelihood” in the context of
observables, where the facts are all perceptions, so there really
should not be any difficulty in distinguishing the two uses.

That having been said, in the course of writing these “episodes” I have
been contemplating using a different word for the same concept, namely
“credibility”, which seems to me to convey what is meant better than
does “likelihood”. So, I would reword what you reworded slightly
differently:

What you can say about the "White Swan" problem is that if the
conditions remain the same, the credibility that the next swan
you see will be white increases the more swans you have seen
without seeing a non-white one.

Would you go along with that? I think that it actually conveys the idea
in a way that is correct both in a Bayesian analysis and in everyday
intuition.

Martin


[From Bill Powers (2009.01.08.1926 MST)]

Martin Taylor 2009.01.08.16.40 –

Maybe that sounds glib, but the
point is that you refer to some objective fact about the real world that
is not accessible to any observer. Nobody, least of all the person who
has observed only 10 swans, could know whether it is a fact that every
swan is white, especially not in all the situations for which the
“White Swan” problem is a metaphor. I use the word
“likelihood” in the context of observables, where the facts are
all perceptions, so there really should not be any difficulty in
distinguishing the two uses.

That having been said, in the course of writing these
“episodes” I have been contemplating using a different word for
the same concept, namely “credibility”, which seems to me to
convey what is meant better than does “likelihood”. So, I would
reword what you reworded slightly differently:

I don’t think this is helping us much. Using a word like
“credibility” assigns a property to that which is believed, and
whether you mean something objective or a perception, believing is not
done by the thing that is believed. It’s a bit like a beer commercial
that just ran on my TV, claiming that this beer has a property called
“drinkability.” Does something to which you pay attention have
“attendability?” Does something you hate have hateability? How
about ignorability, findability, likeability, solvability,
postponability, and so on forever?

I could accept the term “credulity,” but not
“credibility.” We need language that makes clear whether we
mean the object or the subject of the verb.

Best,

Bill

[From Mike Acree (2009.01.09.2219 PST)]

[Martin Taylor 2009.01.08.10.59]–

A fundamental question: E.T. Jaynes (Probability Theory: the
logic of science, Cambridge U.P, 2003) lists a set of desiderata that should
apply to any measure of plausibility, and goes on to show that logically these
desiderata lead to the standard (but not frequentist or objective) measure of
probability. Jaynes is arguing strongly for a Bayesian approach, and by this
point in the book has introduced a hypothetical robot that discovers the
plausibility of a hypothesis based on available data. The desiderata are:

  1. Degrees of plausibility are represented by real numbers.

  2. There is a qualitative correspondence with common sense.

3a. If a conclusion can be reached in more than one way, then every possible
way must lead to the same result.

3b. The robot always takes into account all of the evidence it has relevant to
a question. It does not arbitrarily ignore some of the information, basing its
conclusions only on what remains. In other words, the robot is completely
non-ideological.

3c. The robot always represents equivalent states of knowledge by equivalent
plausibility assignments. That is, if in two problems the robot’s state of
knowledge is the same (except perhaps for the labelling of the propositions),
then it must assign the same plausibilities in both.

The question is whether you subscribe to these desiderata?

I’m familiar with some of Jaynes’ earlier work, and, as Bayesian
theorists go, I sort of like him, maybe because he’s an engineer. I
have trouble with Desideratum #1, as I indicated in my previous
message. All of us can assign such numbers easily enough, but whether
those operations mean anything of scientific value is not so easily
established. Many theorists, like R. T. Cox in his Algebra of Probable
Inference (1961), have put forward sets of criteria for plausible
inference; and they virtually always build in additivity and the
multiplicative rule, guaranteeing that they won’t break any
interesting new ground. I found Polya’s Mathematics and Plausible
Reasoning mildly interesting, but note that he didn’t attempt to
quantify anything.

[MA] A probability of .5 could mean
either fairly substantial support for a proposition or complete ignorance.

You go on to say that the ability to discriminate
these two cases does matter, but that’s something (additive) probability
theory can’t do.

I believe I said that discrimination between these two cases was not
representable by a single scalar number at the same time as that number
represents the subjective probability (or plausibility) of the competing hypotheses.
One number can’t represent two independent variables. You agreed with this a
couple of paragraphs later, so I’m at a loss to understand wherein we differ
here.

We agree that probability theory can’t
discriminate these two cases. We differ only in that I regard that as a
problem for probability theory.

In my language, what Sherlock
Holmes does is often largely statistical inference. He observes a datum, and
has noted that on many occasions this datum is accompanied by another (example,
Holmes observation that Watson will not(?) invest in gold shares, based on
billiard chalk on Watson’s hand, and the knowledge that Watson usually plays
billiards with a certain opponent). He may well also have a model that
reinforces this relationship. From several such data, Holmes infers that the
person is of this or that character, or has done thus and so. Is that the kind
of statistical inference concept against which you argue, because if it is,
then I disagree with you.

I actually don’t see the statistics. Can you tell whether Holmes was
using Laplace’s Rule of Succession? Or Bayes’ Theorem?

I’m not clear how you can argue against a concept. I can see how you could
argue that it is misunderstood, misapplied, or is in principle of no value. But
if a person has a concept, how can you argue that it doesn’t exist? It’s a
perception, and I don’t think you can argue that any perception someone has is
invalid. As Bill P. from time to time reminds us, our perceptions are the only
contact we have with whatever “real world” might be out there.
Perceptions are the ultimate facts, and concepts are perceptions. If you accept
that, then it cannot be the concept of statistical inference against which you
argue, but its use.

It’s true, all of us have concepts
of God, phlogiston, unicorns, and round squares (Meinong evidently believed in
a sort of metaphysical underworld populated by such entities); and nothing I
could say would make those concepts disappear, nor is that particularly my
wish. What I’m saying about statistical inference bears some
similarity to what Thomas Szasz says about mental illness (not that I want to
get into debating Szasz’s views), when he says it is a myth. Like
most thoroughly educated people, you believe in statistical inference, deeply
enough I’ll bet you can’t imagine (nondeductive) inference being
nonstatistical, deeply enough that you didn’t even notice that I was
challenging that belief. I will be quite surprised if we come to
agreement on that point.

Mike

[Martin Taylor 2009.01.10.01.40]

[From Mike Acree (2009.01.09.2219 PST)]

[Martin Taylor 2009.01.08.10.59]–

A fundamental question: E.T. Jaynes (Probability Theory: the logic of
science, Cambridge U.P, 2003) lists a set of desiderata that should
apply to any measure of plausibility, and goes on to show that
logically these desiderata lead to the standard (but not frequentist
or objective) measure of probability. Jaynes is arguing strongly for a
Bayesian approach, and by this point in the book has introduced a
hypothetical robot that discovers the plausibility of a hypothesis
based on available data. The desiderata are:

  1. Degrees of plausibility are represented by real numbers.

  2. There is a qualitative correspondence with common sense.

3a. If a conclusion can be reached in more than one way, then every
possible way must lead to the same result.

3b. The robot always takes into account all of the evidence it has
relevant to a question. It does not arbitrarily ignore some of the
information, basing its conclusions only on what remains. In other
words, the robot is completely non-ideological.

3c. The robot always represents equivalent states of knowledge by
equivalent plausibility assignments. That is, if in two problems the
robot’s state of knowledge is the same (except perhaps for the
labelling of the propositions), then it must assign the same
plausibilities in both.

The question is whether you subscribe to these desiderata?

I’m familiar with some of Jaynes’ earlier work, and, as Bayesian
theorists go, I sort of like him, maybe because he’s an engineer. I
have trouble with Desideratum #1, as I indicated in my previous
message.

Let’s focus on that, then. I think we agreed that it is not reasonable
to expect a single scalar number to represent both the balance of the
evidence for or against a hypothesis and the amount of evidence that is
available bearing on the hypothesis, as these are two independent
concepts. The first is what I would take to relate to the concept of
plausibility or probability. Are you saying that you would not make
that association, or that the balance is more complicated than can be
represented by a scalar variable?

All of us can assign such numbers easily enough, but whether those
operations mean anything of scientific value is not so easily
established. Many theorists, like R. T. Cox in his Algebra of Probable
Inference (1961), have put forward sets of criteria for plausible
inference; and they virtually always build in additivity and the
multiplicative rule, guaranteeing that they won’t break any
interesting new ground.

That’s exactly what Jaynes avoids by starting with plausibility rather
than probability, and using ONLY those desiderata. He half-way
apologises at one point for going through a logical derivation of what,
to most people, seems a trivially obvious point, in deriving rather
than assuming the normal rules that apply to the combinations of
probabilities. I’ve forgotten what Watanabe does, as it’s decades since
I read the book, but I seem to remember he did much the same.

I found Polya’s Mathematics and Plausible Reasoning mildly
interesting, but note that he didn’t attempt to quantify anything.

[MA] A probability of .5 could mean either fairly substantial support
for a proposition or complete ignorance.

You go on to say that the ability to discriminate these two cases does
matter, but that’s something (additive) probability theory can’t do.

No, because it’s on a different dimension. I come back to the idea that
one number can’t simultaneously represent two concepts.

I believe I said that discrimination between these two cases was not
representable by a single scalar number at the same time as that
number represents the subjective probability (or plausibility) of the
competing hypotheses. One number can’t represent two independent
variables. You agreed with this a couple of paragraphs later, so I’m
at a loss to understand wherein we differ here.

We agree that probability theory can’t discriminate these two cases.
We differ only in that I regard that as a problem for probability
theory.

I’m not at all clear WHY probability theory should be applied to the
degree of interest people have in the truth of a hypothesis, or to
measuring the quantity of evidence that has been addressed to the
question.

In my language, what Sherlock Holmes does is often largely statistical
inference. He observes a datum, and has noted that on many occasions
this datum is accompanied by another (example, Holmes’s observation
that Watson will not(?) invest in gold shares, based on billiard chalk
on Watson’s hand, and the knowledge that Watson usually plays
billiards with a certain opponent). He may well also have a model that
reinforces this relationship. From several such data, Holmes infers
that the person is of this or that character, or has done thus and so.
Is that the kind of statistical inference concept against which you
argue, because if it is, then I disagree with you.

I actually don’t see the statistics. Can you tell whether Holmes was
using Laplace’s Rule of Succession? Or Bayes’ Theorem?

The statistics come from Holmes having observed in his lifetime similar
situations and correlated them with other facts. I have no idea any
more than you what went on in his fictional brain.

I’m not clear how you can argue against a concept. I can see how you
could argue that it is misunderstood, misapplied, or is in principle
of no value. But if a person has a concept, how can you argue that it
doesn’t exist? It’s a perception, and I don’t think you can argue that
any perception someone has is invalid. As Bill P. from time to time
reminds us, our perceptions are the only contact we have with whatever
“real world” might be out there. Perceptions are the ultimate facts,
and concepts are perceptions. If you accept that, then it cannot be
the concept of statistical inference against which you argue, but its
use.

It’s true, all of us have concepts of God, phlogiston, unicorns, and
round squares (Meinong evidently believed in a sort of metaphysical
underworld populated by such entities); and nothing I could say would
make those concepts disappear, nor is that particularly my wish. What
I’m saying about statistical inference bears some similarity to what
Thomas Szasz says about mental illness (not that I want to get into
debating Szasz’s views), when he says it is a myth.

What about it is a myth? When someone says “I’ll give you three to one
against that” is he referring to a myth? I would take it as evidence
that he perceives something about the situation to which I would give
the label “probability”. I take it you would not.

Like most thoroughly educated people, you believe in statistical
inference, deeply enough I’ll bet you can’t imagine (nondeductive)
inference being nonstatistical, deeply enough that you didn’t even
notice that I was challenging that belief. I will be quite surprised
if we come to agreement on that point.

Now what do you mean by “statistical”? I have the feeling that you mean
“frequentist” “objective” measures of how often X is associated with Y.
Does your “statistical” include the use of formal and informal models,
intuition and hunches? Because the way I think of “probability”, those
may sometimes be the ONLY evidence available. “We should be able to go
by that road, because the weatherman says there really isn’t going to
be enough rain to flood it too deeply by the time we get to the low
point if we hurry; I’ll give 2 to 1 that we can.” What kind of
statistical inference is that? To me, it’s a perfectly valid
probability assessment, and a plausible inference from the evidence at
hand.
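
For reference, quoted odds convert directly to a probability (a
standard conversion, not something spelled out above):

  odds of a to b in favour   <=>  p = a/(a+b)
  "3 to one against"          =>  p = 1/(1+3) = .25
  "2 to 1 that we can"        =>  p = 2/(2+1) ≈ .67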

I have a problem relating “statistical” in its everyday sense to
“probability”, though I accept that statistics does often provide
evidence to influence one’s probability judgment. And what is
“nondeductive inference”?

Since I’m not clear on some of your definitions, at least not as clear
as once I thought I was, I guess I don’t know enough to know if I will
be surprised if we come to agreement.

Martin

[From Bill Powers (2009.01.10.0508 MST)]

Mike Acree (2009.01.09.2219 PST) --

[Martin Taylor 2009.01.08.10.59]–

A fundamental question: E.T. Jaynes (Probability Theory: the logic of
science, Cambridge U.P, 2003) lists a set of desiderata that should
apply to any measure of plausibility, and goes on to show that
logically these desiderata lead to the standard (but not frequentist
or objective) measure of probability. Jaynes is arguing strongly for a
Bayesian approach, and by this point in the book has introduced a
hypothetical robot that discovers the plausibility of a hypothesis
based on available data.

I think you guys are on the verge of something, but we have to work out
what it is. Partly because of just mulling over this whole subject for a
long time, my brain has started involuntarily putting ideas together.
This morning I awoke with some amazing series of thoughts, of which I
hope a glimmer will still remain now that I’m back from dreamland. See
what you make of this.
First, the Bayesian approach. The probability of A is the probability of
B given C. Isn’t this just extending two-variable probabilities to three
dimensions? Considering A, B, and C together, we have eight possible
combinations of true and false, each one of which can be written as a
Bayesian conditional probability – can’t it? Can’t they? So why not
generalize? I suppose that’s already been done. Bayesian probability is
just the first step toward multidimensional probability. Maybe.
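
To make the eight combinations concrete, a toy sketch (uniform,
made-up numbers, purely illustrative): any conditional probability
over three binary events is a ratio of sums over the eight cells.

from itertools import product

# A toy joint distribution over three binary events.
joint = {}
for a, b, c in product([False, True], repeat=3):
    joint[(a, b, c)] = 1 / 8  # uniform, just for illustration

def prob(pred):
    """Sum the joint probability over outcomes satisfying pred."""
    return sum(p for outcome, p in joint.items() if pred(*outcome))

# A conditional probability is a ratio of two such sums:
p_b_given_c = prob(lambda a, b, c: b and c) / prob(lambda a, b, c: c)
print(p_b_given_c)  # 0.5 under the uniform toy distribution
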
The main idea this morning was very simple: degrees of knowledge. What is
probability? Mike, you can perhaps put this in terms of kinds of
probability (apostolic? No, that’s not it), but to me it’s simply a
question of what we think we know by various means.
When we know nothing, we simply have faith: we decide what we want to
believe, and make it true. We believe it as long as doing so is pleasing
to us, or more pleasing than believing it’s not true… I’d call this the
lowest degree of knowledge. At least it’s based on something that matters
to us: intrinsic error, or lack of it. It’s not completely
arbitrary.
The next degree of knowledge is the bare detection of a regularity.
Something happens. Something happens. Something happens – ah, the
same thing
happens again. Now there is a single thing that
happens more than once. There it is again. “It” has happened
again. It is happening, again and again. Now it has repetition, duration
through time. And now it has stopped: I am remembering it, but no longer
sensing it.
And now there it is again. It is happening. And now it has stopped again.
It is not happening. And now it is happening. Not happening. Happening.
Now it is happening periodically: the happenings are happening in groups
and the groups become a series of “its” and something new
appears: the alternation through time, the frequency of
occurrence.
Then something else happens, different from the first thing.
Sometimes B happens, sometimes A happens. But if A stops happening, we
discover eventually, B stops happening, too. And when A starts up again,
so does B. It is not the case, we decide, that A is happening and B is
not happening, and if B is happening, A must have happened first. A is
causing B.
So knowledge gradually develops. At this point, A and B are simply
different. They have no meaning other than themselves, and other than the
fact that each one is not the other one.
We can now start to do statistics. Statistics is not about meanings; it
is about occurrences of anything. It doesn’t matter WHAT is occurring;
all that matters is THAT it occurred, and that it-1 is different from
it-2. No other relationship between the it-s matters. There is no
question of “plausibility.” We can count the total number of
its and also the number of times each it occurs. Whatever the ratio of
the two numbers is, we can say we expect future occurrences to be in the
same ratio, according to the latest observations. If they occur in the
ratio of 101 to 303, we can say we expect occurrence A to happen 1/3 as
often as occurrence B, or for more occurrences, we expect them to occur in
the ratios nA:nB:nC and so on. Without any further kind of distinction,
that is all we can say; the occurrences will happen in that ratio, but we
can’t say which will happen next because no concept of a repeatable
temporal sequence has yet been invented.
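
In code, that lowest statistical step is nothing more than counting.
Here is a minimal sketch using the 101-to-303 figures above:

  from collections import Counter

  # Bare occurrences: identity is all that matters, not meaning or order.
  stream = ["A"] * 101 + ["B"] * 303
  counts = Counter(stream)
  total = sum(counts.values())
  for label in sorted(counts):
      # The observed ratio is all we can project onto future occurrences.
      print(label, counts[label], "expected fraction:",
            round(counts[label] / total, 3))
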
Next, we observe that A and B occur as follows:
ABBAABAABBBAABAABBBBBAAABABBAAA.
That doesn’t provide any new knowledge, because we still can’t say what
will happen next, A or B. But we may come to realize that this is
happening:
ABBAABAABBBAABAABBBBBAAABABBAAA.
ABBAABAABBBAABAABBBBBAAABABBAAA.
ABBAABAABBBABAAABBBBBAAABABBAAA.
ABBAABAABBBAABAABBBBBAAABABBAAA.
ABBAABAABBBAABAABBBBBAAABABBAAA.
Now we are back to the start: something happens. Something happens.
Something happens. Eventually, the same thing is happening again and
again.
Now it’s a bigger thing, but it’s still a thing.

Notice, however, that this thing has parts, and the parts are the same in
every occurrence. If we see ABBAAB we now know what the rest of this thing
will be. No statistics are involved. This thing occurs, or it does not,
and there is nothing in between. We now have a logical variable. We have
a pattern.

It’s interesting that we have statistics made of logical variables whose
states are not statistical. It’s also interesting that we can see
repetitions of the same logical variables when they are not, in fact, the
same logical variables (see the third repetition above). When we see
enough of the pattern, we don’t have to see it all to make up our minds
what it is – that is, to cease to feel uncertain about what it
is.
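
That recognition step can be sketched in a few lines of Python. The
repertoire of stored patterns is invented, but the point survives: once
the seen prefix is consistent with only one stored pattern, the
remainder is “known” without any counting:

  KNOWN_PATTERNS = [
      "ABBAABAABBBAABAABBBBBAAABABBAAA",  # the pattern discussed above
      "ABABABABABABABABABABABABABABABA",  # an invented alternative
  ]

  def candidates(prefix):
      # Stored patterns still consistent with what has been seen so far.
      return [p for p in KNOWN_PATTERNS if p.startswith(prefix)]

  seen = ""
  for symbol in "ABBAAB":
      seen += symbol
      live = candidates(seen)
      if len(live) == 1:
          print("after", seen, "the rest must be", live[0][len(seen):])
          break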

As soon as we see a pattern, statistics is no longer necessary. The
elements of the pattern tell us which pattern we’re looking at, and we
don’t need to see what patterns came before it or after it. If we see N
elements of the temporal pattern, we KNOW what element N+1 will be; no
probability is calculated and none is needed. That’s the next level of
knowledge. In the above example, note that when we see the repeating
sequence of elements, we still don’t know whether the next sequence will
be the same one, but within the parts of each sequence, we do know. Or
think we do. Now we are using the distinction between A and B to generate
meaning: AB is different from BA, whereas AA is not different from AA. If
we see A we know that means we are not seeing B.

So statistics is used when we are trying to find regularities in
occurrences of elements that are meaningless in themselves and are
unrelated to any other elements along any dimension except occurrence.
This is why statistical equations do not have to say what real-world
variables are indicated by the symbols in the equations. It doesn’t
matter what the real-world variables are, because their properties (other
than existence) are irrelevant.

This tells us something about information theory. Information theory is
cast in terms of probabilities. The amount of information in a message
can be calculated, given the possible number of different messages,
without knowing what any of the messages means. This tells us that
whatever information is in the formal sense, it is not information in the
common-language sense; whatever the technical meaning of message is, it
is not what we normally mean: that is, it is not ABOUT anything. We can
calculate the information in a string of gibberish just as readily as the
information in a poem. That calculation will not reveal the difference in
meaning content. This suggests that perhaps the choice of the term
information in information theory was unfortunate. It is certainly not
concerned with what we normally think of as information, which is
meaning. There is no reason to assume that a message high in information
content has any meaning, nor that a message with only 1 bit of
information in it has a minimal amount of meaning (one if by day, two if
by night).
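
One hedged illustration in code: a first-order entropy estimate applies
equally to a line of verse and to a random shuffle of the same letters,
and cannot tell them apart:

  import math
  import random
  from collections import Counter

  def entropy_bits_per_symbol(text):
      # First-order Shannon entropy estimated from symbol frequencies.
      counts = Counter(text)
      n = len(text)
      return -sum((c / n) * math.log2(c / n) for c in counts.values())

  poem = "so much depends upon a red wheel barrow"
  gibberish = "".join(random.sample(poem, len(poem)))  # same letters, no meaning
  print("poem:     ", round(entropy_bits_per_symbol(poem), 3))
  print("gibberish:", round(entropy_bits_per_symbol(gibberish), 3))

The two numbers come out identical, which is exactly the point: the
measure sees symbol frequencies, not meaning.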

This brings us to the next level of knowledge, which is knowledge
specific to meanings. I haven’t got so far into this, but perhaps others
can carry it on from here. Where we end up at this level is with
theories: organized patterns of meanings which purport to explain the
temporal sequences and other kinds of relationships among things that we
observe. The simplest sort of theory is “What has happened before
will happen again,” and the most advanced, perhaps, consists of
models that simulate an unobservable reality to produce happenings that
can be checked against experience. Theories are partly statistical, and
models or simulations are not statistical at all. A model behaves only
and exactly as it is organized to behave, and can never do anything
different from that unless you specifically include a generator of
randomness in it – and then you will still be specifying exactly what it
affects.
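
The last point about specified randomness can be made concrete in a
couple of lines: a model whose noise generator is seeded (in effect,
pre-recorded) behaves identically on every run, so even its
“randomness” is exactly specified:

  import random

  def model(seed):
      # A trivial model: each output is its input plus noise injected at
      # one specified point; the seed makes the noise stream pre-recorded.
      rng = random.Random(seed)
      return [x + rng.gauss(0, 1) for x in range(5)]

  print(model(42) == model(42))  # True: behaviour predictable in full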

With that, the flow trickles to a stop. Time for coffee, a shower, and a
trip to pick up my granddaughter to take her to Lafayette, Colorado’s
Oatmeal Festival, at which people sit or stand around eating oatmeal with
any of a hundred or so different toppings. A nice thing to do in January.
My daughter Allie will be there at the recreation center demonstrating
her Bowen Body Therapy skills, which are sort of like a physiological
Method of Levels, gently directing attention here and there by tweaking
reflexes and apparently facilitating physiological reorganization in the
right spots.

Best,

Bill P.

···

The desiderata are:

  1. Degrees of plausibility are represented by real numbers.
  2. There is a qualitative correspondence with common sense.
  3a. If a conclusion can be reached in more than one way, then every
      possible way must lead to the same result.
  3b. The robot always takes into account all of the evidence it has
      relevant to a question. It does not arbitrarily ignore some of the
      information, basing its conclusions only on what remains. In other
      words, the robot is completely non-ideological.
  3c. The robot always represents equivalent states of knowledge by
      equivalent plausibility assignments. That is, if in two problems the
      robot’s state of knowledge is the same (except perhaps for the
      labelling of the propositions), then it must assign the same
      plausibilities in both.
The question is whether you subscribe to these desiderata.

I’m familiar with some of Jaynes’ earlier work, and, as Bayesian
theorists go, I sort of like him, maybe because he’s an engineer. I have
trouble with Desideratum #1, as I indicated in my previous message. All
of us can assign such numbers easily enough, but whether those operations
mean anything of scientific value is not so easily established. Many
theorists, like R. T. Cox in his Algebra of Probable Inference (1961),
have put forward sets of criteria for plausible inference; and they
virtually always build in additivity and the multiplicative rule,
guaranteeing that they won’t break any interesting new ground. I found
Polya’s Mathematics and Plausible Reasoning mildly interesting, but note
that he didn’t attempt to quantify anything.

[MA] A probability of .5 could mean either fairly substantial
support for a proposition or complete ignorance.

You go on to say that the ability to discriminate these two cases does
matter, but that’s something (additive) probability theory can’t do.

I believe I said that discrimination between these two cases was not
representable by a single scalar number at the same time as that number
represents the subjective probability (or plausibility) of the competing
hypotheses. One number can’t represent two independent variables. You
agreed with this a couple of paragraphs later, so I’m at a loss to
understand wherein we differ here.

We agree that probability theory can’t discriminate these two cases. We
differ only in that I regard that as a problem for probability theory.

In my language, what Sherlock
Holmes does is often largely statistical inference. He observes a datum,
and has noted that on many occasions this datum is accompanied by another
(example: Holmes’ observation that Watson will not(?) invest in gold
shares, based on billiard chalk on Watson’s hand, and the knowledge that
Watson usually plays billiards with a certain opponent). He may well also
have a model that reinforces this relationship. From several such data,
Holmes infers that the person is of this or that character, or has done
thus and so. Is that the kind of statistical inference concept against
which you argue? Because if it is, then I disagree with you.

I actually don’t see the statistics. Can you tell whether Holmes was
using Laplace’s Rule of Succession? Or Bayes’ Theorem?

I’m not clear how you can argue against a concept. I can see how you
could argue that it is misunderstood, misapplied, or is in principle of
no value. But if a person has a concept, how can you argue that it
doesn’t exist? It’s a perception, and I don’t think you can argue that
any perception someone has is invalid. As Bill P. from time to time
reminds us, our perceptions are the only contact we have with whatever
“real world” might be out there. Perceptions are the ultimate
facts, and concepts are perceptions. If you accept that, then it cannot
be the concept of statistical inference against which you argue, but its
use.

It’s true, all of us have concepts of God, phlogiston, unicorns, and
round squares (Meinong evidently believed in a sort of metaphysical
underworld populated by such entities); and nothing I could say would
make those concepts disappear, nor is that particularly my wish. What I’m
saying about statistical inference bears some similarity to what Thomas
Szasz says about mental illness (not that I want to get into debating
Szasz’s views), when he says it is a myth. Like most thoroughly educated
people, you believe in statistical inference, deeply enough I’ll bet you
can’t imagine (nondeductive) inference being nonstatistical, deeply
enough that you didn’t even notice that I was challenging that belief. I
will be quite surprised if we come to agreement on that point.

Mike


[Martin Taylor 2009.01.10.11.00]

[From Bill Powers (2009.01.10.0508 MST)]

[Mike Acree (2009.01.09.2219 PST)]

[Martin Taylor 2009.01.08.10.59]–

A fundamental question: E.T. Jaynes
(Probability Theory: the logic of science, Cambridge U.P., 2003) lists a
set of desiderata that should apply to any measure of plausibility, and
goes on to show that logically these desiderata lead to the standard
(but
not frequentist or objective) measure of probability. Jaynes is arguing
strongly for a Bayesian approach, and by this point in the book has
introduced a hypothetical robot that discovers the plausibility of a
hypothesis based on available data.

I think you guys are on the verge of something, but we have to work out
what it is. Partly because of just mulling over this whole subject for a
long time, my brain has started involuntarily putting ideas together.
This morning I awoke with an amazing series of thoughts, of which I
hope a glimmer will still remain now that I’m back from dreamland. See
what you make of this.

Thanks for this. It’s really good to have someone come at it from a
different viewpoint. It’s a useful disturbance.

First, the Bayesian approach. The probability of A is the probability of
B given C. Isn’t this just extending two-variable probabilities to three
dimensions? Considering A, B, and C together, we have eight possible
combinations of true and false, each one of which can be written as a
Bayesian conditional probability – can’t they?

Yes and no. I think I covered both in Episode 3 [Martin Taylor
2009.01.06.09.56], so I’ll be brief here.

First “No”. C, as a conditional, has no degree of belief. For the
purpose of figuring out whether B follows from A, C is just one of
possibly many conceivable background conditions.

You are interested in astrophysics, so I will use as an example the
calculations cosmologists do when they figure out what the Universe
would be like if one of the fundamental constants had a different value
from the value they measure. Nobody believes there is any possibility
that the constant might take on or have had in the past a different
value. It’s a conditional for the rest of the computation. The same is
true for conditionals in a Bayesian analysis.

Going back to Sherlock Holmes: Imagine Sherlock finding a footprint
outside a window. He has two hypotheses: (1) the print was made
innocently the previous day, (2) it was the print of an intruder in the
early morning. Conditional (a): it rained heavily in the late evening
until midnight – result, hypothesis 2 is much the likeliest;
Conditional (b): last night was dry – result, the two hypotheses are
indistinguishable; Conditional (c): there was light drizzle through the
night – result, hypothesis 2 is more likely than hypothesis 1, but
hypothesis 1 is still reasonably possible.

Now “Yes”. Sherlock doesn’t know how much rain there was and when it
fell. He observes other evidence of rain such as wet leaves (which
might be dew) or asks a local. Perhaps he makes a test footprint and
compares its crispness with that of the questionable footprint. These
are evidence that allows him to test hypotheses about C. At that point,
Sherlock does have a two-dimensional set of hypotheses {H1&Ca;
H1&Cb; H1&Cc; H2&Ca; H2&Cb; H2&Cc}.

To assess probabilities associated with those joint hypotheses Sherlock
has a whole lifetime of conditionals in which he believes, or at least
is willing to take as given for the purposes of this investigation. For
example, consider datum “wet leaves”: Hypothesis Cb is rather
discredited, while Hypotheses Ca and Cc remain viable. Add datum
“puddles”, and Hypothesis Ca becomes more likely than Cc. Add datum
“Locals say it didn’t rain much overnight”, and Cc regains some
credibility relative to Ca. These, then, can be used to influence the
relative probability of H1 and H2, since P(H&C) = P(H|C)P(C).

Ah, but now Sherlock thinks of a new Hypothesis (3): Someone in the
house is faking the presence of an intruder by making the footprint.
With conditionals Ca, Cb, and Cc, H3 cannot be distinguished from H1 or
H2, but under a new conditional Cd “Somebody sprayed water in the area
to make it look as though the footprint was made in the early morning”,
H3 would become appreciably more credible than either H1 or H2.

So now Sherlock looks for evidence as to whether the “wet leaves” and
“puddles” data affect the credibility of the conditionals, making them
into hypotheses. He notes that leaves outside this area are not wet,
and there are few puddles elsewhere. His mental models of rain make
that situation unlikely if Ca or Cc were true, but those data are
consistent with Cb & Cd. Now he can go back to the original
question, and finds P(H3|Cb,Cd) substantially exceeds P(H1|Cb,Cd) and
P(H2|Cb,Cd), whereas P(H2|Ca) >> P(H1 or H3|Ca) and P(H1|Cc) ~=
P(H2|Cc) ~= P(H3|Cc). It becomes most important to Sherlock to discover
which of the conditionals is most likely to be true.
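
That bookkeeping can be sketched in Python, with every prior and
likelihood invented purely to mimic the narrative. The data here bear
only on the conditionals, as in the story, and reweight the joint
posterior through P(H&C) = P(H|C)P(C) and Bayes’ rule:

  from itertools import product

  hypotheses = ["H1", "H2", "H3"]
  conditionals = ["Ca", "Cb", "Cc", "Cd"]

  # Start indifferent over all hypothesis-conditional pairs (invented prior).
  posterior = {hc: 1.0 / 12 for hc in product(hypotheses, conditionals)}

  # Invented p(datum | conditional): the weather conditionals drive the data.
  likelihood = {
      "wet leaves": {"Ca": 0.9, "Cb": 0.1, "Cc": 0.7, "Cd": 0.8},
      "puddles":    {"Ca": 0.8, "Cb": 0.05, "Cc": 0.3, "Cd": 0.4},
  }

  for datum in ["wet leaves", "puddles"]:
      for hc in posterior:
          posterior[hc] *= likelihood[datum][hc[1]]
      z = sum(posterior.values())
      posterior = {hc: p / z for hc, p in posterior.items()}

  # Marginal credibility of each conditional after the evidence:
  for c in conditionals:
      print(c, round(sum(p for hc, p in posterior.items() if hc[1] == c), 3))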

The point of “Yes and No” is that when a conditional is used AS a
conditional, it is taken to be absolutely true. If it is true, it leads
to certain conclusions about the hypotheses given the data. If it is
not true, then all bets are off about the hypotheses, just as in any
other “If A then B” statement.

When there is a question about whether some conditional C is true, then
that has to be taken into account in determining how strongly to
believe one hypothesis or another, given only the other background
conditionals. If there is a question, then it is always possible to
make a new test with the conditional not-C, though the universe of
not-C is often so broad as to make the new test rather indiscriminate.
In Sherlock’s case, for example, if the “wet leaves” were NOT wet with
water, what could they have been wet with, and how would that affect
Sherlock’s conclusions? Quite possibly it would leave all three
hypotheses equally credible.

So why not
generalize? I suppose that’s already been done. Bayesian probability is
just the first step toward multidimensional probability. Maybe.

I don’t think it is “multidimensional probability” so much as scalar
probability determined over a multidimensional space of hypotheses. You
can, of course, have a vector of probabilities, such as “P(rain this
afternoon), P(rain this evening), P(rain tonight), P(rain tomorrow
morning)…” but that’s not quite the same thing as a multidimensional
probability in the sense you suggest.

The main idea this morning was very simple: degrees of knowledge. What is
probability? Mike, you can perhaps put this in terms of kinds of
probability (apostolic? No, that’s not it), but to me it’s simply a
question of what we think we know by various means.

Isn’t that what all probability is?

When we know nothing, we simply have faith: we decide what we want to
believe, and make it true. We believe it as long as doing so is pleasing
to us, or more pleasing than believing it’s not true… I’d call this the
lowest degree of knowledge. At least it’s based on something that matters
to us: intrinsic error, or lack of it. It’s not completely arbitrary.

The next degree of knowledge is the bare detection of a regularity.
Something happens. Something happens. Something happens – ah, the
same thing
happens again.

Here is a BIG issue. What is “the same thing”?

“The same thing” is a categoric perception, is it not? When you have a
category-level perception, you can’t say much about the values of the
lower-level perceptions that contribute to the category, except that
they fall within whatever ranges are appropriate for the category. For
category “bird”, you can’t even say that “ability to fly” is a part of
the perception, for example – or maybe you can, for your personal
perception of the category “bird”. What is “the same thing” to you may
well not be “the same thing” to me. For each of us two instances of
“the same thing” at the category level are not necessarily the same at
the lower levels of perception.

But suppose it were (asserting a conditional): then you are beginning
to arrive at the development of statistical evidence for a probability
estimate. If you have a model that suggests how this “same thing” comes
to occur, that also is evidence (non-statistical, most probably) toward
your probability estimate.

Now there is a single thing that
happens more than once. There it is again. “It” has happened
again. It is happening, again and again. Now it has repetition, duration
through time. And now it has stopped: I am remembering it, but no longer
sensing it.

And now there it is again. It is happening. And now it has stopped again.
It is not happening. And now it is happening. Not happening. Happening.
Now it is happening periodically: the happenings are happening in groups
and the groups become a series of “its” and something new
appears: the alternation through time, the frequency of occurrence.

A new category-level perception.

Then something else happens, different from the first thing.
Sometimes B happens, sometimes A happens. But if A stops happening, we
discover eventually, B stops happening, too. And when A starts up again,
so does B. It is not the case, we decide, that A is happening and B is
not happening, and if B is happening, A must have happened first. A is
causing B.

A logic-level (program level?) perception – the beginning of a model.

So knowledge gradually develops. At this point, A and B are simply
different. They have no meaning other than themselves, and other than the
fact that each one is not the other one.

We can now start to do statistics. Statistics is not about meanings; it
is about occurrences of anything. It doesn’t matter WHAT is occurring;
all that matters is THAT it occurred, and that it-1 is different from
it-2. No other relationship between the it-s matters. There is no
question of “plausibility.”

True. “Plausibility” enters here only (so far as I can see) in whether
it-1 did occur when the lower-level perceptions are near the edges of
the ranges that are appropriate for the category it-1. Once you have
perceived it-1, you have perceived it, and not it-2. The decision has
been made.

We can count the total number of
its and also the number of times each it occurs. Whatever the ratio of
the two numbers is, we can say we expect future occurrences to be in the
same ratio, according to the latest observations. If they occur in the
ratio of 101 to 303, we can say we expect occurrence A to happen 1/3 as
often as occurrence B, or for more occurrences, we expect them to occur
in the ratios nA:nB:nC and so on. Without any further kind of
distinction, that is all we can say; the occurrences will happen in that
ratio, but we can’t say which will happen next because no concept of a
repeatable temporal sequence has yet been invented.

Yes, with the caveat that we would expect them to occur in something
near the same proportion in the long run, not exactly the same
proportion, and that we might be prepared to offer 3 to 1 odds against
the next event being A, both being expressions of subjective probability,
based in this case only on statistical evidence.
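
That caveat has a classical quantitative form, Laplace’s Rule of
Succession (mentioned elsewhere in this thread); under an assumed
uniform prior on the unknown proportion:

  def laplace_next(successes, trials):
      # Rule of Succession: posterior mean of p under a uniform prior.
      return (successes + 1) / (trials + 2)

  nA, nB = 101, 303
  p_next_A = laplace_next(nA, nA + nB)
  print("p(next is A) =", round(p_next_A, 3))  # about 0.251, near 1/4
  print("fair odds against A: about",
        round((1 - p_next_A) / p_next_A, 1), "to 1")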

Next, we observe that A and B occur as follows:
ABBAABAABBBAABAABBBBBAAABABBAAA.

That doesn’t provide any new knowledge, because we still can’t say what
will happen next, A or B. But we may come to realize that this is
happening:

ABBAABAABBBAABAABBBBBAAABABBAAA.
ABBAABAABBBAABAABBBBBAAABABBAAA.
ABBAABAABBBABAAABBBBBAAABABBAAA.
ABBAABAABBBAABAABBBBBAAABABBAAA.
ABBAABAABBBAABAABBBBBAAABABBAAA.

Now we are back to the start: something happens. Something happens.
Something happens. Eventually, the same thing is happening again and
again.
Now it’s a bigger thing, but it’s still a thing.

It’s another category, with variation in the perceptions that
contribute to it, themselves being categories. This is a sequence
category! Maybe, to me, or to Sherlock, the displaced B in the third
one is a critical clue to this being a different thing, not the same
thing as the others. Why did Mr Arbuthnot catch the 7:30 that morning
rather than the usual 8:10?

Notice, however, that this thing has parts, and the parts are the same in
every occurrence. If we see ABBAAB we now know what the rest of this
thing will be.

No we don’t. It’s another “White Swan”. You know what the rest will be
if it actually is another of “the same thing”, and if “the same thing”
category is tightly defined to allow only exact repetition. What you
may perceive is another instance of “the same thing coming up”. In your
example, you don’t know whether what follows is going to be AABBBAAB or
AABBBABA, even if you have already perceived the category
ABBAABAABBBAABAABBBBBAAABABBAAA.

No statistics are involved. This thing occurs, or it does not,
and there is nothing in between.

The latter, agreed, is true of categories. One perceives that an
instance of a category occurs, or not. But this does not follow from,
or lead to “No statistics are involved”. In a “White Swan” situation
where the perceptions are sequences, the more often “the same” start
has led to “the same” continuation, the more reasonable is the
development of the category perception of the whole once the start has
been perceived (at lower perceptual levels). You certainly wouldn’t
perceive that category the first time that sequence had occurred. The
second time, if the start were distinct enough from other patterns you
have seen, you might say “I’ve seen this before”, much as one does the
second time one hears a new piece of music. A category perception is a
decision. In the Bayesian sense, one perceives a category when the
evidence for it exceeds the evidence for a different category and for
“not a category I’ve seen” by a sufficient margin.

We now have a logical variable. We have
a pattern.

It’s interesting that we have statistics made of logical variables whose
states are not statistical. It’s also interesting that we can see
repetitions of the same logical variables when they are not, in fact, the
same logical variables (see the third repetition above). When we see
enough of the pattern, we don’t have to see it all to make up our minds
what it is – that is, to cease to feel uncertain about what it is.

True, except for the “states are not statistical”. They are the results
of decisions (category perceptions) based on statistical history.

As soon as we see a pattern, statistics is no longer necessary.

True, the decision has been made when the category has been perceived.

The
elements of the pattern tell us which pattern we’re looking at, and we
don’t need to see what patterns came before it or after it. If we see N
elements of the temporal pattern, we KNOW what element N+1 will be;

There’s no possibility ABBAABAABBBAA might be followed by CZVXY? None
at all? Zero?

no
probability is calculated and none is needed.

Are you sure? I don’t imagine anything like actual Bayesian analysis is
done during the development of the category perception. Nevertheless,
the same conceptual structure can be applied.

At what point in the sequence did you perceive that the sequence was in
fact a member of the category? Was it at “A”? How many sequence
categories are in your perceptual dictionary starting with “A”?
Considering “Is category ABBAABAABBBAABAABBBBBAAABABBAAA present”, it
makes a difference if ABBAABAABBBAABAABBBBBAAABABBAAA is the only
category you have known to start with “A” or if you can perceive a few
dozen all of which start with “A”. You can’t usefully have all those
dozens of category detectors telling you at the same time that their
category is present, all with 100% certainty.

So maybe it was at “AB” or, as you said above, at “AABBBAAB”. Why would
the category perceptual input function wait just that long and no
longer before you perceived the existence of the category? Could it
possibly be that it took this long before the probability of that
category sufficiently exceeded the probability of the others and of “no
category I know”?

OK, backing off to the first “A”. Assuming you had several different
sequence categories that start with “A”, at least you have eliminated
the categories that don’t start with “A”. Or have you? Sometimes you
might make a mistake with a category. You perceive something as
belonging to one category, and later see it as belonging to another.
Taking this “A” to be a letter rather than a metaphor for an instance,
suppose the letters are handwritten capitals, and sometimes you mistake
an H for an A or vice-versa. Probabilistically, it’s more likely that
you perceive category “A” when the sequence is
ABBAABAABBBAABAABBBBBAAABABBAAA, but sometimes you might see
HBBAABAABBBAABAABBBBBAAABABBAAA instead. On such an occasion, do you,
or do you not, perceive the category that you label
“ABBAABAABBBAABAABBBBBAAABABBAAA”? Probably you do. If there aren’t
many categories that you have developed with the following sequence
BBAABAABBB, you are likely to perceive the sequence category
ABBAABAABBBAABAABBBBBAAABABBAAA despite having perceived the first
letter to be “H”.

It’s all probabilistic. You can’t guarantee that
HBBAABAABBBAABAABBBBBAAABABBAAA isn’t a category with a quite different
meaning than ABBAABAABBBAABAABBBBBAAABABBAAA, even if you do know that
sometimes you see “A” as “H” and what you thought was an “H” was
“really(!)” “A”.
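
Here is one way to cash that out in code, with the confusion rate and
the priors invented: even when the first letter is read as “H”, a prior
strongly favouring the familiar A-category can outweigh the evidence of
the misread letter:

  CATEGORIES = {
      "ABBAABAABBBAABAABBBBBAAABABBAAA": 0.95,  # familiar (invented prior)
      "HBBAABAABBBAABAABBBBBAAABABBAAA": 0.05,  # rare rival (invented prior)
  }
  CONFUSION = 0.1  # invented chance of reading A as H, or H as A

  def symbol_likelihood(seen, true):
      # Invented noise model: A and H are mutually confusable, B is reliable.
      if true in "AH":
          if seen == true:
              return 1 - CONFUSION
          return CONFUSION if seen in "AH" else 0.0
      return 1.0 if seen == true else 0.0

  def score(observed, category, prior):
      # Prior times the likelihood of the observed prefix.
      like = prior
      for s, t in zip(observed, category):
          like *= symbol_likelihood(s, t)
      return like

  observed = "HBBAABAABBB"  # first letter misread, as in the text
  scores = {c: score(observed, c, p) for c, p in CATEGORIES.items()}
  z = sum(scores.values())
  for c, s in scores.items():
      print(c[:6] + "...", round(s / z, 3))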

That’s the next level of
knowledge. In the above example, note that when we see the repeating
sequence of elements, we still don’t know whether the next sequence will
be the same one, but within the parts of each sequence, we do know. Or
think we do. Now we are using the distinction between A and B to generate
meaning: AB is different from BA, whereas AA is not different from AA. If
we see A we know that means we are not seeing B.

We come to a new concept, “meaning”. That’s a really slippery one. I’m
sure we could have quite a long and inconclusive thread about it :-)

So statistics is used when we are trying to find regularities in
occurrences of elements that are meaningless in themselves and are
unrelated to any other elements along any dimension except occurrence.
This is why statistical equations do not have to say what real-world
variables are indicated by the symbols in the equations. It doesn’t
matter what the real-world variables are, because their properties
(other than existence) are irrelevant.

True.

This tells us something about information theory. Information theory is
cast in terms of probabilities. The amount of information in a message
can be calculated, given the possible number of different messages,
without knowing what any of the messages means. This tells us that
whatever information is in the formal sense, it is not information in the
common-language sense; whatever the technical meaning of message is, it
is not what we normally mean: that is, it is not ABOUT anything.

That is what most people say. It’s something with which I profoundly
disagree, basing my disagreement directly on Shannon. It is too much of
a leap to say that, because statistics, including information measures,
can be computed in the absence of meaning, the use of statistics (even
frequentist, “objective” statistics) therefore implies the absence of
meaning. Some men do not wear black hats, therefore a person wearing a
black hat is not a man, I suppose.

We can
calculate the information in a string of gibberish just as readily as
the information in a poem.

Can you? That’s news to me. A reference would be nice.

That calculation will not reveal the difference in
meaning content.

On the other hand, the meaning content profoundly affects the
difference in information content, as does your background
understanding of the poet’s aesthetic tendencies. So, from the Bayesian
approach, the information does relate closely to the meaning. The
numerical value, of course, does not. That’s just a scalar number, so
it can’t be expected to.

This suggests that perhaps the choice of the term
information in information theory was unfortunate.

No, it was used because it refers precisely to how much meaning you can
get out of a message ABOUT something. That’s very close to the everyday
use of the term, at least as close as PCT “perception” is to everyday
“perception”.

It is certainly not
concerned with what we normally think of as information, which is
meaning.

Oh, I do love the way you use “certainly” in your messages to signal
assertions of which you seem to be unsure, but that you wish your
audience to think are not to be questioned.

The meaning in any particular circumstance depends on the
perceptual/conceptual structure affected by the message. The quantity
of meaning that could have been passed by the message to a particular
receiver cannot be determined from the message itself. What can be
determined independently of meaning is “channel capacity”, a limit on
how rapidly information about A could reach B through the channel.

There is no reason to assume that a message high in information
content has any meaning, nor that a message with only 1 bit of
information in it has a minimal amount of meaning (one if by day, two if
by night).

That is true. There are four possibilities:

(1) High information content, much meaning

(2) High information content, little meaning

(3) Low information content, much meaning

(4) Low information content, little meaning

You correctly assert that possibilities 2 and 3 cannot be dismissed
reasonably. But then neither can 1 and 4. All remain possible. And
reasonable.

This brings us to the next level of knowledge, which is knowledge
specific to meanings.

In my way of looking at it, that’s all that “knowledge” can be. But
then, as I said earlier, the concept of “meaning” is abominably
slippery.

I haven’t got so far into this, but perhaps others
can carry it on from here. Where we end up at this level is with
theories: organized patterns of meanings which purport to explain the
temporal sequences and other kinds of relationships among things that we
observe. The simplest sort of theory is “What has happened before
will happen again,” and the most advanced, perhaps, consists of
models that simulate an unobservable reality to produce happenings that
can be checked against experience. Theories are partly statistical, and
models or simulations are not statistical at all. A model behaves only
and exactly as it is organized to behave, and can never do anything
different from that unless you specifically include a generator of
randomness in it – and then you will still be specifying exactly what it
affects.

You can specify precisely what a model will do only if you can also
specify its inputs precisely. If you can specify its inputs, no
considerations of probability or information can be applied to its
behaviour. If, however, it is to behave in an unpredictable world
(simulated or otherwise), then its actions become probabilistic, and
information-theoretic approaches are viable. This applies most
specifically to control systems that have no advance knowledge of
either their reference values or the disturbances to their perceptions.
For example, any limit on the capacity of the channel from the senses
to the perceptual input function affects the quality of control for
high-bandwidth disturbances.
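
A minimal sketch of that claim, with every parameter invented: a
proportional controller whose perceptual sample arrives only every k
ticks holds its variable against a wandering disturbance progressively
worse as the channel slows:

  import math
  import random

  def rms_error(update_every, steps=5000, gain=0.5, seed=1):
      # Control error when the sensory channel reports every
      # `update_every` ticks; between samples the output is held.
      rng = random.Random(seed)
      disturbance, output, total = 0.0, 0.0, 0.0
      for t in range(steps):
          disturbance += rng.gauss(0, 0.1)  # slowly wandering disturbance
          actual = output + disturbance     # the controlled variable
          if t % update_every == 0:         # a sample gets through
              output -= gain * actual       # act on the perceived error
          total += actual * actual
      return math.sqrt(total / steps)

  for k in (1, 4, 16):
      print("sample every", k, "ticks -> RMS error", round(rms_error(k), 3))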

I would like to suggest another meaning of “meaning”, which is “a
change in the value of a controlled perception”. This may not seem
reasonable on the face of it, but I think it can be argued. “Meaning”
is, to me, those aspects of the world that influence how your actions
can influence your perceptions. Changes that happen in the world that
have no relation to controlled perceptions may yet be perceived, but do
they have any meaning for you? What is the meaning of sunrise, even a
beautiful one, if it does not affect your actions (meaning that it does
not disturb your controlled perceptions)?

Thanks for the disturbance!

Martin

[From Bill Powers (2009.01.10.1330 MST)]
[Martin Taylor 2009.01.10.11.00]

Yes and no. I think I covered both in Episode 3 [Martin Taylor 2009.01.06.09.56], so I'll be brief here.

First "No". C, as a conditional, has no degree of belief. For the purpose of figuring out whether B follows from A, C is just one of possibly many conceivable background conditions.

When I say "B, given C", I do not mean just "B, given that C exists as a proposition." I mean specifically that the logical value of the proposition C is to be taken as true. I think you say the same thing. Similarly for B. We are describing the probability that B is true given that C is true. That translates into the logical expression B AND C. It looks to me as if the Bayesian expression p(B|C) is identically p(B AND C), which is p(B)*p(C). When you say "given C" you are implicitly saying what the probability of C is, and that this is a positive instance of it. If you don't know the probability of C, you can't compute the probability of (B|C).

..........................

On consideration, I've dumped most of what I wrote after this. Much too long-winded and full of detours. Just two quick points:

You can specify precisely what a model will do only if you can also specify its inputs precisely.

That's not how I specify precisely what a model will do. I specify it so we can compute its behavior for ANY pattern of inputs. You tell me what the input pattern will be, I'll predict what the output pattern will be. Some models will produce output patterns without any inputs. A model with a random noise generator in it will, of course, produce a range of outputs given any known input. That can be predicted, too, since we know what is affected by the random noise. If the noise is pre-recorded, we can predict the exact behavior at all times.

I would like to suggest another meaning of "meaning", which is "a change in the value of a controlled perception". This may not seem reasonable on the face of it, but I think it can be argued. "Meaning" is, to me, those aspects of the world that influence how your actions can influence your perceptions. Changes that happen in the world that have no relation to controlled perceptions may yet be perceived, but do they have any meaning for you? What is the meaning of sunrise, even a beautiful one, if it does not affect your actions (meaning that it does not disturb your controlled perceptions)?

That term "influence your actions" is much too ambiguous to suit me. It would fit a stimulus-response model, an operant-conditioning model, or a model in which I say that a sunset is beautiful in order to keep you from saying that I lack aesthetic appreciation of nature. The trouble with trying to come up with general statements is that you have an application in mind, but the very generality allows for interpretations that don't fit the application you mean.

My proposal for a definition of meaning is simply "the perceptions indicated by the symbol, as we have learned to interpret the symbol." So the meaning of "red" is the perceived color I imagine when I hear or read that word. The word "system" indicates somewhat more complex perceptions, which are harder to explain than the meaning of red.

Best,

Bill P.

It looks to me as if the Bayesian expression p(B|C) is
identically p(B AND C), which is p(B)*p(C).

Excuse me for interrupting, but it seems that it's time to spell out some
fundamentals of Bayes' Law ...

The joint probability of A and B is:-
  p(A&B)

The joint probability is related to the conditional probability in the
following way:-
  p(A&B) = p(A|B).p(B)
... or
  p(A&B) = p(B|A).p(A)

So, this means that:-
  p(A|B).p(B) = p(B|A).p(A)
... or
  p(A|B) = p(B|A).p(A)/p(B)

This is Bayes' theorem.

Note this is a 'theorem' _not_ a 'theory' (i.e. it can be derived from basic
mathematical axioms; it is not a hypothesis).

For more than two items, it is possible to expand the joint probability into
a chain of conditional probabilities:-
  p(A&B&C) = p(A|B&C).p(B&C)
... which further expands to:-
  p(A&B&C) = p(A|B&C).p(B|C).p(C)

Hence it is possible to derive a set of functions for computing the
probability of a hypothesis given two sources of evidence, here is one:-
  p(H|E1&E2) = p(H&E1&E2)/p(E1&E2)

Bayes' theorem is particularly useful in pattern recognition because it
shows how to compute p(class|observations) from
p(observations|class).p(class)/p(observations) - where the terms in the
latter formulation are easily estimated from training data (because
p(observations|class) is a 'generative' or 'forward' model). Hence a Bayes
approach leads naturally to what is called 'analysis-by-synthesis'.
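
A toy sketch of that recipe, with invented numbers standing in for the
quantities that would be estimated from training data:

  # Invented generative ('forward') models p(observation|class), plus priors.
  p_obs_given_class = {
      "cat": {"meow": 0.80, "bark": 0.05, "silence": 0.15},
      "dog": {"meow": 0.05, "bark": 0.70, "silence": 0.25},
  }
  p_class = {"cat": 0.5, "dog": 0.5}

  def posterior(observation):
      # Bayes: p(class|obs) = p(obs|class).p(class) / p(obs)
      unnorm = {c: p_obs_given_class[c][observation] * p_class[c]
                for c in p_class}
      p_obs = sum(unnorm.values())
      return {c: v / p_obs for c, v in unnorm.items()}

  print(posterior("bark"))  # 'dog' dominates, at about 0.93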

I hope this helps.

···

________________________________________________________________

Prof ROGER K MOORE BA(Hons) MSc PhD FIOA MIET

Chair of Spoken Language Processing
Speech and Hearing Research Group (SPandH)
Department of Computer Science, University of Sheffield,
Regent Court, 211 Portobello,
Sheffield, S1 4DP, UK

e-mail: r.k.moore@dcs.shef.ac.uk
web: http://www.dcs.shef.ac.uk/~roger/
tel: +44 (0) 11422 21807
fax: +44 (0) 11422 21810
mobile: +44 (0) 7910 073631

General Chair: INTERSPEECH-2009 http://www.interspeech2009.org/
________________________________________________________________


[From Mike Acree (2009.01.10.1851 PST)]

[Martin Taylor 2009.01.10.01.40]–

[MA] A probability of .5 could mean
either fairly substantial support for a proposition or complete ignorance.

You go on to say that the ability to discriminate
these two cases does matter, but that’s something (additive) probability
theory can’t do.

No, because it’s on a different dimension. I come back to the idea that one
number can’t simultaneously represent two concepts.

I’m not at all clear WHY probability theory should be applied to the degree of
interest people have in the truth of a hypothesis, or to measuring the quantity
of evidence that has been addressed to the question.

If you have a thermometer where a reading
of, say, 50 degrees means either a certain temperature or a certain humidity, I
would say you need a new thermometer. You say temperature and humidity
are two different concepts, so there’s no problem, and why would anybody
want to measure humidity with a thermometer anyway.

You said:

In my language, what
Sherlock Holmes does is often largely statistical inference.

I said:

I actually don’t see the statistics.

You replied that you don’t know what
goes on in Holmes’ head, but you’re sure it’s statistical.

My best guess is that you’re making
the same elision as Cowles, who started this thread: All uncertain
inference is probabilistic, therefore it’s statistical. I can agree
that it’s probabilistic in the qualitative, epistemic sense; but it is
only since the 18th century that anybody would have said it was
statistical. My willingness to bet that the next queen of England will be
named Martha doesn’t make my reasoning statistical. There are no
aleatory probabilities to be computed, formally or informally.

Mike