I forwarded Bill Powers' initial post, which focused on the correlation between two tests of depression, to a statistician I know. He has been traveling abroad and only got to it after he returned. Here's his response:

The relevance to the psychological issue is limited. First, conditions like depression are diagnosed (in the simplest setting) by having the individual fill out an inventory (e.g. the Beck Depression Inventory). The developer has selected a cut-score such that scores above it are taken as an indication of depression. So treating depression as a dichotomous outcome rests on a subjective choice of cut-score. Now, if you have another test, there is no reason to expect that its results would be independent of those of the first test. In fact, independence would be counter-intuitive. So one would have to specify two joint distributions for the two tests: one for when the individual is "really" depressed and one for when he is not.

Actually, it is more complicated if we take into account the actual score on the inventory rather than just which side of the cut-score it falls on. For example, suppose we have 100 individuals. Depending on the distribution of their scores on the inventory and the joint distribution of the two tests at each point on the scale, you can get a great variety of probabilities of agreement.
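
As a rough illustration of that last point (not part of the statistician's reply), the simplest dichotomous version can be sketched in a few lines of Python. Everything here is an assumption made for the example: the 30% prevalence, the 80% and 10% positive rates, the covariance values, and the helper names `joint_2x2` and `agreement_probability`. The sketch only shows that the same marginal test behaviour can give quite different agreement rates once the two tests are allowed to be dependent within each group.

```python
def joint_2x2(p1, p2, cov=0.0):
    """Joint distribution of two binary test results.

    p1, p2: probability that test 1 / test 2 comes out positive.
    cov:    covariance between the two results (0.0 means independent).
    Returns (both_pos, only_test1, only_test2, both_neg).
    """
    both_pos = p1 * p2 + cov
    only_test1 = p1 - both_pos
    only_test2 = p2 - both_pos
    both_neg = 1.0 - both_pos - only_test1 - only_test2
    cells = (both_pos, only_test1, only_test2, both_neg)
    # A covariance that is too large for the marginals yields negative cells.
    if min(cells) < -1e-12:
        raise ValueError("covariance is not feasible for these marginals")
    return cells


def agreement_probability(prevalence, joint_depressed, joint_not_depressed):
    """Overall probability that the two tests give the same verdict."""
    agree_depressed = joint_depressed[0] + joint_depressed[3]
    agree_not_depressed = joint_not_depressed[0] + joint_not_depressed[3]
    return (prevalence * agree_depressed
            + (1.0 - prevalence) * agree_not_depressed)


# Hypothetical numbers: 30% "really" depressed; each test flags 80% of the
# depressed group and 10% of the non-depressed group.
prevalence = 0.3

# Case 1: the tests are independent within each group.
independent = agreement_probability(
    prevalence, joint_2x2(0.8, 0.8), joint_2x2(0.1, 0.1))

# Case 2: same marginals, but the tests are positively related in each group.
dependent = agreement_probability(
    prevalence, joint_2x2(0.8, 0.8, cov=0.10), joint_2x2(0.1, 0.1, cov=0.05))

print(f"agreement, independent within groups: {independent:.3f}")
print(f"agreement, dependent within groups:   {dependent:.3f}")
```

With these made-up inputs the agreement rate moves from about 0.78 to about 0.91 even though each test's hit rate is unchanged. Replacing the two-by-two tables with a separate joint distribution at each point on the inventory scale, as the reply suggests, would widen the range of possible outcomes further.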

That said, it is certainly the case that errors of measurement and relationships among tests make inferences about treatment efficacy less powerful than we would like.

My take on what he's saying is that Bill's technical formulation of the problem was incomplete but that Bill's basic concern is well-founded.

I'm going to see if I can persuade him to read Richard Kennaway's paper next.

