An astrologer among the PCTers

[From Bill Powers (971120.1943 MST)]

Kennaway, Taylor, and Abbott (971120) --

I'm attaching the code for the program that generates the data and analysis
for the article in ABS, 'random.pas'. Also included is 'stats.pas', a unit
that calculates the correlations. The program, adapted as a demo, is more
or less self-explanatory.

This is not really a "third variable" effect; it's more a "wrong model"
effect. What I did was to set up a situation in which people could earn
"rewards" over a range of wages by putting out effort. "Costs" varied
randomly for everyone, subtracting random amounts from the produced reward.

The basic (naive) analysis is done as if effort is caused to increase by
the amount of obtained reward. That is, the more reward there is, the more
effort one would expect if reward causes effort. The straightforward S-R
analysis would simply assume a linear relation between cause and effect,
and calculate the correlation and regression line relating effort to reward
for the population. The result is, as the S-R hypothesis would predict,
that greater effort results from greater reward. The unpredictable "cost"
factor adds noise to the data.

Unbeknownst to the analyst, however, the individuals in this population are
all control systems rather than S-R systems. Instead of the effort output
being controlled by the reward input, the reward input is controlled by the
effort output. Each individual wants a different specific amount of reward,
and will put out effort proportional to the shortfall of received reward
below desired reward.

The random cost subtracted from the received reward represents
noncontrolled variables such as fatigue, variations in working conditions,
and everything else that can randomly contaminate the observed relationship.

Because the individuals are control systems, an increase in cost reduces
the reward, increases the error, and results in an increase in behavior.
Thus there is a strong negative relationship between reward and effort for
each individual.

I used the labels "effort", "reward", and "cost" for the variables in the
simulation, but just by changing the labels we could show that this
phenomenon would occur for any IV-DV relationship if the "DV" is really the
output of a control system controlling an effect of the "IV". The
population measures will show a positive relationship between IV and DV;
the control system equations will show a negative relationship for every
individual.

One of the basic assumptions in IV-DV analysis is that by selecting many
subjects randomly from within a large population, noncontrolled influences
on the behavior will tend to average out. But in this case, one of those
random influences, the reference level for reward, is directly responsible
for the false appearance of a positive relationship between IV and DV. The
control systems with high reference levels for reward will put out a lot of
effort, since that is required to satisfy the reference condition, while
those with low reference levels for reward will put out less effort, less
effort being required to achieve a lower reference condition of reward. In
the plot of effort versus reward, the subjects naturally sort themselves
along the effort dimension, the systems with higher reference levels
appearing farthest to the right in the plot.

I think some of the criticisms of my references to this simulation have
been based on a misunderstanding of what it is. I hope this clears up those
problems.

Best,

Bill P.

program randm;
uses dos,crt,graph,grutils,stats;

{demonstrates appearance of a plot between random variables with a
  systematic component, computes correlation, regression, and percent
  standard error of estimate from regression line. Then does same thing
  for subjects who want same number of units of reward. }

label exrand;

const maxdata = 3999;

var v1,v2,ref,c1,c2: array[0..maxdata] of integer;
     c,r1,r2,e: real;
     i,j,r0,origin,passes,vloc,reflev,firstpass: integer;
     b,k,d,effort,reward: real;
     n,n1,n2: string[60];
     ch: char;
     hsize,vsize,hcenter,vcenter: integer;

procedure initscreen;
begin
initgraphics;
hsize := getmaxx; vsize := getmaxy;
hcenter := (hsize+1) div 2;
vcenter := (vsize + 1) div 2;
end;

begin
initscreen;
restorecrtmode; clrscr;
  gotoxy(1,2); write(
' DEMONSTRATION OF DIFFERENCE BETWEEN MASS AND INDIVIDUAL MEASURES.');
  gotoxy(1,4); write(
' First a plot of 4000 control systems is shown. Each system wants a');
gotoxy(1,5); write(
'randomly-selected amount of reward (100 .. 300 units). Each system has a');
   gotoxy(1,6);
if graphdriver = CGA then
   write(
'randomly-selected cost (0..40 units) and "wage" (2.5 to 6.0 units per')
else write(
'randomly-selected cost (0..40 units) and "wage" (1.5 to 5.0 units per');
    gotoxy(1,7); write(
'unit of effort). After plot, "q" quits, space to go on.');
    gotoxy(1,8); write(
'After a space, the program then plots points for all systems that want the');
    gotoxy(1,9); write(
'same amount of reward (picked at random). Again, "q" to quit,');
    gotoxy(1,10); write(
'or space to plot another subgroup.');
    gotoxy(1,16); write(
' SPACE TO START PROGRAM');
ch := readkey;
if ch in ['q','Q'] then exit;
clrscr; gotoxy(17,12);
write('CREATING 4000 SAMPLES OF CONTROL SYSTEM BEHAVIOR');
origin := (hsize - 300) div 2; randomize;
firstpass := 0;
for i := 0 to maxdata do
begin
  if graphdriver = CGA then
  b := 2.5 + 3.5 * random
  else b := 1.5 + 3.5 * random;
  k := 5.0;
  d := -random(40);
  r0 := 100 + random(200);

  effort := k * (r0 - d)/ (1.0 + k * b);
  reward := (b * k * r0 + d) / (1.0 + k * b);
  v2[i] := round(effort);
  v1[i] := round(reward);
  ref[i] := r0;
end;
clrscr;
setgraphmode(graphmode); clearviewport;
vloc := (3 * vsize) div 4;
repeat
j := 0; reflev := 100 + random(200);
for i := 0 to maxdata do
begin
  if ref[i] = reflev then
  begin
   c1[j] := v1[i];
   c2[j] := v2[i];
   inc(j);
  end;
end;
for passes := firstpass to 1 do
begin

clearviewport;
if passes = 0 then
for i := 0 to maxdata do
  putpixel(origin +v1[i],vloc - v2[i],white)
else
for i := 0 to j - 1 do
  putpixel(origin +c1[i],vloc - c2[i],white);

for i := 0 to 300 do putpixel(origin +i,vloc,white);
for i := 0 to vloc do putpixel(origin,i,white);
if passes = 0 then begin
correl('i',@v1,@v2,maxdata + 1);
c := corr;
end
else begin
correl('i',@c1,@c2,j);
c := corr;
end;
if passes = 0 then r1 := regression else r2 := regression;
if c > 1.0 then c := 1.0;
e := sigy*sqrt(1 - c * c);
if passes = 0 then
for i := 80 to 300 do
  putpixel(origin +i,
      vloc - round((ybar + (i - xbar) * r1)),white);
outtextxy(origin + 8,10,'EFFORT');
outtextxy(origin + 310, vloc - 4,'REWARD');
outtextxy(origin - 4,vloc - 50,'- 50');
outtextxy(origin - 4,vloc - 100,'- 100');
outtextxy(origin - 4,vloc - 150,'- 150');
outtextxy(origin - 4,vloc - 200,'- 200');
outtextxy(origin + 200,vloc - 3,'|');
outtextxy(origin + 188,vloc + 5,'200');

str(c:5:3,n); n := 'Correlation, E:R = ' + n;
outtextxy(origin + 65,vloc + 20,n);
if passes = 0 then str(maxdata + 1,n) else str(j,n);
n := 'n = ' + n;
outtextxy(origin + 126,vloc + 30,n);
if passes = 0 then
  begin
   outtextxy(origin + 128,vloc + 40,'FIG. 1');
   outtextxy(origin + 220, vloc - 20, ' 99 < ref < 301');
  end
else
begin
  str(reflev,n);
  outtextxy(origin + 128, vloc + 40, 'FIG. 2');
  outtextxy(origin + 220, vloc - 20, 'Ref = ' + n);
end;

if passes = 0 then
begin
  str(r1:5:2,n);
  str(xbar:5:2,n1);
  str(ybar:5:2,n2);
end
else
begin
  str(abs(r2):5:3,n);
  str(xbar:5:2,n1);
  str(ybar:5:2,n2);
end;

if passes = 0 then
n := 'E = ' + n2 + ' + ' + n + '(R' + ' - ' + n1 + ')'
else
n := 'E = ' + n2 + ' - ' + n + '(R' + ' - ' + n1 + ')';
outtextxy(origin + 100,10,n);
if passes = 1 then
begin
  str( (-ybar/r2 + xbar):5:1,n);
  n := 'For E = 0, R = ' + n;
  outtextxy(origin + 100,25,n);
end;
outtextxy(0,vsize - 10,'q:quit, space:more');

ch := readkey;
if ch in ['q','Q'] then goto exrand;
firstpass := 1;
end; { of second pass }
until ch in ['q','Q'];
exrand:
restorecrtmode;
closegraph;
end.
{$N+}
unit stats;
interface

var sx,sy,sx2,sy2,sxy,xbar,ybar,sigx,sigy,corr,regression,intercept: real;

procedure correl(ty: char; x,y: pointer ; datasize: integer);

implementation

type dataarraytype = array[0..3999] of integer;  {was 0..2047: too small for
                                                   the 4000 samples passed in}
     rdataarraytype = array[0..3999] of real;
     dataptrtype = ^dataarraytype;
     rdataptrtype = ^rdataarraytype;

procedure correl;
var n,u,v,w,z: real;
    i: integer;
    dxptr,dyptr: dataptrtype;
    rdxptr,rdyptr: rdataptrtype;
begin
if ty = 'i' then
begin
  dxptr := x; dyptr := y;
end
else
begin
  rdxptr := x; rdyptr := y;
end;
sx := 0.0; sy := 0.0;
n := datasize;
for i := 0 to datasize - 1 do
  begin
   if ty = 'i' then
   begin
    u := dxptr^[i]; v := dyptr^[i];
   end
   else
   begin
    u := rdxptr^[i]; v := rdyptr^[i];
   end;
   sx := sx + u; sy := sy + v;
  end;
xbar := sx/n; ybar := sy/n;
sx2 := 0.0; sy2 := 0.0; sxy := 0.0;
for i := 0 to datasize - 1 do
  begin
   if ty = 'i' then
   begin
    u := dxptr^[i]; v := dyptr^[i];
   end
   else
   begin
    u := rdxptr^[i]; v := rdyptr^[i];
   end;
   sx2 := sx2 + (u - xbar)*(u - xbar);
   sy2 := sy2 + (v - ybar)*(v - ybar);
   sxy := sxy + (u - xbar)*(v - ybar);
  end;
sigx :=sqrt(sx2/n);
sigy :=sqrt(sy2/n);
if (abs(sigx*sigy) > 0.0001) then
  z := sxy/(n * sigx * sigy)
else z := 0.0;
corr := z;
if abs(sigx) > 0.001 then regression := z * sigy/sigx
else regression := 9999.99;
intercept := ybar - regression*xbar;
end;

end.

[Hans Blom, 971121]

(Bill Powers (971120.0655 MST))

>>Imagine that you were perfectly able to predict the future, yet the
>>future was not fully predetermined and you were still free to choose
>>your actions. You would know, for instance, what the stock market
>>would do the coming week and which lottery ticket would show up with
>>the winning number tomorrow. You can easily come up with even more
>>desirable predictions. No doubt most people would consider themselves
>>to be far better in control than we are now.

>True, but this is a fairy-tale. Don't get me wrong: I'm not saying that
>predicting the future is useless, especially when we're talking about
>population effects. But I think people have relied too much on planning
>and prediction, which (in the real world) are very limited in their
>capacity to forecast correctly. It is at least as important to be able
>to deal with disturbances as they arise, even when one does not know
>they exist and has not anticipated them.

Yes, I did present the ideal situation of perfect prediction and no
noise. To make a principle clear it often helps to present it in its
ideal form. Of course I agree with you that this ideal is far from what
we normally encounter in daily life, where we are plagued by all types
of uncertainties.

>If you relied entirely on prediction, you'd be helpless the moment
>something unpredicted happened.

I would put this even more strongly: something entirely unpredictable
cannot be coped with, except maybe by accident -- and hence with
probability zero.

>The more important it is to predict correctly, the higher will be the
>cost when the prediction is wrong.

What do you mean here? I perceive only a tautology.

>>Thus humans invent ways to forecast the future. We cannot _know_ the
>>environment function, but we can make a more or less accurate guess
>>of what it is. And the more accurate that guess is, the better we are
>>able to control.

>That's true, but this doesn't mean that all control can be
>substantially improved by making good predictions. If you can already
>control in the presence of the kind of disturbances that actually
>occur, and do it so well that any further improvement would be of
>negligible importance, investing in a lot of predictive machinery would
>be a waste of resources.

Here you describe the situation where learning is finished and cannot
be improved. Or where there is/can be no learning. You may well be
right that this is so for the lower five or six levels of control in
the HPCT hierarchy, although sensor adaptation might be considered a
primitive form of learning as well, for instance.

>Don't forget that even after you have predicted what is likely to
>happen, and have selected the appropriate action to take, you still
>must turn that selection into the actual action that has the required
>effect. And that requires present-time closed-loop control, because
>even your muscles act in unpredictable ways, gaining and losing
>sensitivity to neural signals according to recent use.

Yes, I was talking about the idealized case where there are no
"unpredictable ways". I wonder, however, what you mean by
"present-time" control. I thought that in an earlier discussion we had
established that control can only influence the _future_ state of
affairs; the present is there already and cannot be changed anymore.
Reference levels necessarily refer to the future. Right _now_ we have
to set the reference for _then_. The point is: how far into the future?
Do we just take the very near future into account, or is it also (our
expectation about) the far-away future that influences how we act? I
could point at numerous examples where this is the case. A great deal
of our economy is concerned with attempts to provide people with future
"certainty", i.e. predictability. Even though the risk exists that the
director of your pension fund is an embezzler...

And this type of counting on the far-reaching, reliable effect of
present-time actions certainly isn't limited to humans only. Bears
"insure" themselves by eating enough -- but not too much -- before they
start their winter sleep. Chipmunks (?) bury nuts in times when they
are available in plenty, to dig them up again in times of scarcity.

>And of course the details of the external world keep shifting in
>countless ways, so you have to be able to vary your actions according
>to the current external circumstances, only a small part of which you
>can sense.

Sure. Our predictions cannot be perfect; the world is far too complex.
Yet I would bet that normally some predictability helps some.

>You can't plan how you're going to turn the steering wheel before you
>take the automobile trip.

Yet you're pretty sure that when you turn the steering wheel to the
right, the car will turn right. If you could not count on a rather
reliable relationship between the two, you wouldn't take that trip. Not
in that car...

>It's nice to be able to predict the future, but in my opinion we trust
>our predictions too much and tend to forget the failures, at the
>expense of learning how to deal with life as it happens.

I fully agree with you. In no way, however, does that contradict my
thesis that control would be impossible without reliable, trustworthy
predictabilities. As we demand in that car.

Greetings,

Hans

···

[From Bruce Abbott (971120.2320 EST)]

Bill Powers (971120.1943 MST) --

>Kennaway, Taylor, and Abbott (971120)
>
>I'm attaching the code for the program that generates the data and analysis
>for the article in ABS, 'random.pas'. Also included is 'stats.pas', a unit
>that calculates the correlations. The program, adapted as a demo, is more
>or less self-explanatory.
>
>This is not really a "third variable" effect; it's more a "wrong model"
>effect. What I did was to set up a situation in which people could earn
>"rewards" over a range of wages by putting out effort. "Costs" varied
>randomly for everyone, subtracting random amounts from the produced reward.
>
>. . .
>
>I think some of the criticisms of my references to this simulation have
>been based on a misunderstanding of what it is. I hope this clears up those
>problems.

Thanks for the demo, Bill. I ran it and it is as fine an example of the
"third variable problem" in correlation as exists anywhere in the Western
hemisphere. Both plotted variables (reward and effort) are _dependent_
variables, and both increase across control systems as functions of the true
independent variable, reference level. The illusion of a direct
relationship between reward and effort appears because of the relationship
of each with the uncontrolled third variable, and would disappear in
group-average data if the control systems were assigned to groups so as to
equate their reference levels. Given that the experimenter has no knowledge
of these reference levels, his or her best strategy would be to randomize
control systems to groups, which would strongly tend to equate the groups on
this variable given the group sizes.

What is lacking in the demo as designed is any real attempt by the
experimenter to set the levels of reward, so reward is not an independent
variable there. The population-based direct relationship between reward and
effort might misleadingly suggest the possibility of a direct causal
influence of reward on effort, but as every undergraduate is taught, "ya
can't infer cause from correlation." A properly designed true experiment
would quickly rule out this apparent linkage.

Regards,

Bruce

[From Richard Kennaway (971121.0940 GMT)]

Bruce Abbott (971120.1700 EST):

>You have proven no such thing. What you've done is call attention to the
>"third variable problem" in correlational research. It doesn't apply to
>properly controlled experiments.

I can't think of any way of knowing that you have "properly controlled the
experiment so as to eliminate third variables", except by looking at the
data for each individual and seeing whether it has a distribution similar
to that of the whole population.

In which case, you're not deriving anything about the individuals from the
population data.

In the pay/satisfaction study (for which I'd still like refs -- if there's
a substantial body of research arising from it, is there by now an
authoritative textbook on the subject?), it appears from your description
of it that each subject went through the experiment exactly once. One data
point does not make a distribution. The raw data of the experiment do not
contain any information about what any individual did under varying
circumstances, and therefore no conclusion can be drawn about what any
individual might have done under varying circumstances.

>Kennaway is no doubt a fine
>mathematician, but he seems to be rather ignorant about the nature of
>experimental designs as pioneered by Sir Ronald Fisher, and about what
>conclusions can be drawn from their results. (I allow the possibility that
>I will soon be eating crow!)

What I don't know, I can learn, and I still remember something of an
undergraduate statistics course many years ago -- balanced block designs,
latin squares, F tests, and so on -- is this the stuff you're alluding to?

I use the correlational model and the bivariate normal distribution as a
case study in my paper simply because it is mathematically tractable and
widely used (see e.g. the multitude of studies cited in "The Bell Curve").
I claim in the paper that I expect my conclusions to be much the same for
other experimental designs and other distributions, but perhaps the paper
would be strengthened (and its completion delayed) if I were to also
discuss designs where the experimenter sets the distribution of one
variable.

-- Richard Kennaway, jrk@sys.uea.ac.uk, http://www.sys.uea.ac.uk/~jrk/
   School of Information Systems, Univ. of East Anglia, Norwich, U.K.

[From Bruce Gregory (971121.0940 EST)]

Rick Marken (971120.1800)

>By the way, this post is so incredibly intelligent because I
>have been listening to Mozart (a symphony he composed at the age
>of 12!) while writing it. Maybe there's something to that
>group-based approach after all;-)

Mozart has the same effect on me. Maybe we should require that
people listen to at least one Mozart piece before posting to
CSGnet ;-)

"The difficulty lies, not in the new ideas, but in escaping the
old ones, which ramify, for those brought up as most of us have
been, into every corner of our minds."

                                        John Maynard Keynes

Bruce

[From Bill Powers (971121.0647 MST)]

Bruce Abbott (971120.2320 EST)--

>Thanks for the demo, Bill. I ran it and it is as fine an example of the
>"third variable problem" in correlation as exists anywhere in the Western
>hemisphere. Both plotted variables (reward and effort) are _dependent_
>variables, and both increase across control systems as functions of the
>true independent variable, reference level.

OK, I guess that's a "third-variable" effect, even though (from the
traditional point of view) the reference level is not an observable
variable (it's a hypothetical variable in a model).

It would be interesting to look at effort versus wages. The experimenter
can manipulate wages, but not rewards -- the amount of reward received
depends, at any wage level, on the effort and so is always a dependent
variable (even though it's usually spoken of as if it were an independent
variable). I believe you would get the same sort of effect. Feel free to
modify the code if you want to try it.

>The illusion of a direct
>relationship between reward and effort appears because of the relationship
>of each with the uncontrolled third variable, and would disappear in
>group-average data if the control systems were assigned to groups so as to
>equate their reference levels.

To do this, you would have to understand control theory, wouldn't you? You
have to use a control model to deduce what any person's reference level is
for a given variable. My point (considering where the paper was published)
was to show how one could reach the wrong conclusions from a statistical
analysis if one didn't know that the organisms in the experiment were
control systems.

>Given that the experimenter has no knowledge
>of these reference levels, his or her best strategy would be to randomize
>control systems to groups, which would strongly tend to equate the groups
>on this variable given the group sizes.

That wouldn't really do it, would it? Each group would still contain a
distribution of reference levels from low to high, and those with the
higher reference levels would put out more effort to get more reward.

Since I wrote this program for myself, it's pretty hard to read. The
critical part of it is this:

for i := 0 to maxdata do
begin
  if graphdriver = CGA then
  b := 2.5 + 3.5 * random
  else b := 1.5 + 3.5 * random;
  k := 5.0;
  d := -random(40);
  r0 := 100 + random(200);

  effort := k * (r0 - d)/ (1.0 + k * b);
  reward := (b * k * r0 + d) / (1.0 + k * b);
  v2[i] := round(effort);
  v1[i] := round(reward);
  ref[i] := r0;
end;

As you may see, I just solved the control-system equations for the
variables instead of running a simulation. Note the "if CGA" -- this was
written a while back.

Here b is wages, k is the control system gain, r0 is the reference level,
and d is the cost-disturbance. The wages vary randomly between 1.5 and 5
units of reward per unit effort. The cost varies randomly between 0 and 40,
and the reference level between 100 and 300. You could sort your subgroups
by wages, since that's an independent variable that the experimenter could
manipulate.
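Bill's remark that he "solved the control-system equations for the variables instead of running a simulation" can be checked directly: iterate the loop and it settles on the same closed-form values. A Python sketch, with sample parameter values and a relaxation (slowing) factor added here for stability -- neither is in the original program:

```python
# Sample values; k, b, d, r0 play the same roles as in the Pascal listing.
k, b, d, r0 = 5.0, 3.0, -20.0, 200.0

# Closed-form steady state, exactly as computed in random.pas:
effort_cf = k * (r0 - d) / (1.0 + k * b)
reward_cf = (b * k * r0 + d) / (1.0 + k * b)

# Iterative version. The 0.05 slowing factor is an assumption added here:
# with the bare loop gain k*b a one-step iteration would oscillate, so the
# output is relaxed toward k*error instead.
effort = 0.0
for _ in range(2000):
    reward = b * effort + d                 # environment: wage times effort, plus (negative) cost
    error = r0 - reward                     # shortfall of reward below the reference
    effort += 0.05 * (k * error - effort)   # slowed output function

assert abs(effort - effort_cf) < 1e-6
assert abs(reward - reward_cf) < 1e-6
print("steady state:", round(effort, 2), round(reward, 2))   # 68.75, 186.25
```

The iteration converges to exactly the values the program writes into v1 and v2, which is why solving the equations and running the loop are interchangeable here.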
I think you will find that effort appears to increase with wages, although
for each individual, increasing the wage will decrease the effort. But
maybe not -- the only way to find out is to try it. It could be that there
will seem to be no relation between wages and effort. Let us know, if you
try it.

>What is lacking in the demo as designed is any real attempt by the
>experimenter to set the levels of reward, so reward is not an independent
>variable there.

Since reward depends on effort, there's no way to set it as an independent
variable. Wages would be the manipulated variable. If someone has a zero
reference level for rewards, the rewards would be zero at any wage level.

>The population-based direct relationship between reward and
>effort might misleadingly suggest the possibility of a direct causal
>influence of reward on effort, but as every undergraduate is taught, "ya
>can't infer cause from correlation." A properly designed true experiment
>would quickly rule out this apparent linkage.

Every undergraduate is taught that, and forgets it as quickly as possible.
Anyone who talks about controlling behavior through manipulating rewards
has forgotten it. You manipulate wages, not rewards.

Anyway, the point I was trying to make is that the usual impression we get,
which is that people will work harder if given higher rewards, is not true
if the people are controlling for reward. People will work harder if they
_want_ higher rewards. It's how much reward they want that determines how
much they get _and_ how hard they will work. You're right, it's a "third
variable" problem. But the third variable is inside the organism, and you'd
have to be a control theorist to guess what it is.

Best,

Bill P.

[From Hank Folson (971121)]

Bruce Abbott (971119.1550 EST) --

>The statistical theory behind this method is as follows. The basic logic
>of an experiment is to manipulate some variable and observe some other
>variable thought to be affected by it, while holding all other variables
>constant.
>(The variable manipulated by the experimenter is called the independent
>variable; the variable observed while the independent variable is being
>manipulated is called the dependent variable.)...

When I was studying engineering, we used this exact approach in all our
lab experiments on internal combustion engines, refrigeration systems, and
chemical reactions. We never had to rely on statistics to get meaningful
results. Inconsistent results indicated we had done something wrong.

I remember doing experiments on a steam engine with a flyball governor
speed control system. We applied the same independent-dependent variable
approach, and still got consistent accurate results. We did not need to
use statistics.

Bruce, why do engineering lab students get 100% correlations (on both S-R
& control systems) and do not need to use statistics on a population to
get meaningful results as psychologists do? The difference is not due to
engineers coming from a superior population. ;-)

Sincerely, Hank Folson

[From Mike Acree (971121.1220)]

Subscribing only to the digest leaves me perennially a day late; the
advantage is that many points I would have made have been made very
nicely by others. I've silently cheered much in the posts by Bill,
Rick, Richard, and Martin, and have only a couple of small observations
to add.

Abbott (971119.1550) writes: "There is a small chance that this
conclusion [that pay influences liking for the task] is incorrect." "A
small chance" sounds like a reference to the alpha level, but there is
in fact no mathematical probability of the conclusion being incorrect.
The only way we could speak of the probability of a conclusion being
correct, as C. S. Peirce pointed out, would be "if universes were plenty
as blackberries, and we could put a quantity of them in a bag, shake
them well up, draw out a sample," and find the proportion of them in
which the conclusion held true. The significance level is the
conditional probability of the data given the null hypothesis, not the
probability of the hypothesis given the data. These quantities are not
the same, just as, in the stock example, the probability that a man is
Catholic, given that he is the pope, is not the probability that a man
is pope, given that he is Catholic. Significant results do not in
themselves impugn the null hypothesis. Gosset was fond of pointing out
that, if he dealt himself a hand of 13 trumps after thoroughly shuffling
the cards, the chance hypothesis would still be more credible than any
conceivable alternative, whatever the odds against the observed event.

The confusion is not, however, confined to defenders of significance
testing. Marken (971119.1900) says, "ANOVA tells you the probability
that the observed ... F was drawn from a population of _groups_ with a
mean F ratio of 1.0." No, for the same reason: it tells you the
probability of drawing an F as large as the observed one IF you were
sampling from a central F distribution. The reference to a population
of groups in this context is also incidentally wrong; more precisely, it
could be applied only to the random-effects model. In the fixed-effects
model, there is no population of groups, only populations of
individuals, even though the purpose of the analysis is comparison of
means. Final technical quibble: the mean of the central F distribution
is df_error/(df_error-2), a little bit more than 1.

My own stats professor, Neil Rankin, once suggested that if
psychologists were trying to understand the operation of a
teeter-totter, having observed what appeared to be the relevant
variables, they might select the distance of one child from the fulcrum
as the dependent variable, then vary the other distance and weights of
the two children, in a three-way factorial ANOVA. Realistic data would
show all two- and three-way interactions to be highly significant,
leading to the conclusion that the teeter-totter was an extremely
complicated phenomenon. It would be an unusual psychologist who would
discover that a log transformation made all the interactions disappear.
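Rankin's teeter-totter is easy to make concrete: at balance the torques are equal, w1*d1 = w2*d2, so d1 = w2*d2/w1 -- full of "interactions" on the raw scale, but exactly additive after a log transform. A minimal Python check (the factor levels are made up for illustration):

```python
import math
from itertools import product

# Torque balance: w1 * d1 = w2 * d2  =>  d1 = w2 * d2 / w1.
w1_levels = [20.0, 30.0, 40.0]   # weight of child 1 (made-up factor levels)
w2_levels = [15.0, 25.0, 35.0]   # weight of child 2
d2_levels = [1.0, 1.5, 2.0]      # distance of child 2 from the fulcrum

log_d1 = {(w1, w2, d2): math.log(w2 * d2 / w1)
          for w1, w2, d2 in product(w1_levels, w2_levels, d2_levels)}

# After the log transform the "DV" is a pure sum of main effects, so a
# change in w1 shifts log(d1) by the same amount at every combination of
# the other two factors -- i.e. every interaction contrast vanishes.
shifts = [log_d1[(30.0, w2, d2)] - log_d1[(20.0, w2, d2)]
          for w2, d2 in product(w2_levels, d2_levels)]
assert max(shifts) - min(shifts) < 1e-12   # identical shift: no interaction
```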

As for the more relevant point about ANOVA, I'm reminded of Jane
Loevinger's challenge about factor analysis--psychology's oldest and
probably most distinctive contribution to statistics--30 years ago: to
name a single scientific discovery that resulted from it. I think the
same challenge could be put to ANOVA, or to path analysis or any of the
more elaborate statistical techniques in psychological research. Most
people's list of "greats" in psychology would likely be heavy with
Europeans who made no use of statistics; among Americans, the most
prominent is probably Skinner, who joins PCTers in their rejection of
statistics.

I leave the last word to Galton, who was disparaging the focus on
averages over a century ago: "It is difficult to understand why
statisticians commonly limit their inquiries to averages, and do not
revel in more comprehensive views. Their souls seem as dull to the
charm of variety as that of a native of one of our flat English
counties, whose retrospect of Switzerland was that, if its mountains
could be thrown into its lakes, two nuisances would be got rid of at
once."

Mike

[From Bruce Abbott (971121.1545 EST)]

Bill Powers (971121.0647 MST) --

Bruce Abbott (971120.2320 EST)

Thanks for the demo, Bill. I ran it and it is as fine an example of the
"third variable problem" in correlation as exists anywhere in the Western
hemisphere. Both plotted variables (reward and effort) are _dependent_
variables, and both increase across control systems as functions of the true
independent variable, reference level.

OK, I guess that's a "third-variable" effect, even though (from the
traditional point of view) the reference level is not an observable
variable (it's a hypothetical variable in a model).

It would be interesting to look at effort versus wages. The experimenter
can manipulate wages, but not rewards -- the amount of reward received
depends, at any wage level, on the effort and so is always a dependent
variable (even though it's usually spoken of as if it were an independent
variable). I believe you would get the same sort of effect. Feel free to
modify the code if you want to try it.

Good idea. But I think that it is a little confusing to label your two
dependent measures "effort" and "reward." These sound like amounts, but in
your model they are rates. I will call them "work rate" and "earnings
rate." Each control system has a randomly determined reference earnings
rate, and will increase or decrease the work rate as necessary to bring
earnings rate near its reference. In a given control system, the work rate
needed to maintain the earnings rate near its reference will decrease as
wages increase. Thus there will be an inverse relation between wages and
work rate (and also between wages and earnings rate). However, across
control systems, those having higher reference levels for earnings rate will
have commensurately higher earnings rates and work rates at a given wage
level. The points giving the earnings rate of each control system at that
wage level will form a vertical stack from bottom to top in order of
reference level. So will the points for work rate. The same will happen at
each wage level, but as wages increase, earnings rate and work rate will
both tend to decline. If a large number of control systems have been
randomly assigned to each wage level, the average earnings rate and the
average work rate will be seen to decline as wages increase. Thus the
inverse relation seen in the individual systems will also appear in the
averages.

The illusion of a direct
relationship between reward and effort appears because of the relationship
of each with the uncontrolled third variable, and would disappear in
group-average data if the control systems were assigned to groups so as to
equate their reference levels.

To do this, you would have to understand control theory, wouldn't you? You
have to use a control model to deduce what any person's reference level is
for a given variable.

In the absence of such an understanding, you could approximate the effect of
equating reference levels across different levels of wage by assigning
control systems to the different levels of the independent variable (wage)
at random, as described above. (With a large n for each group, the groups
are almost guaranteed to have the same average reference level.) You don't
need to know what the individual reference levels are, or even that they exist.

My point (considering where the paper was published)
was to show how one could reach the wrong conclusions from a statistical
analysis if one didn't know that the organisms in the experiment were
control systems.

In the new example you suggest, that would not happen, as described above.

Given that the experimenter has no knowledge
of these reference levels, his or her best strategy would be to randomize
control systems to groups, which would strongly tend to equate the groups on
this variable given the group sizes.

That wouldn't really do it, would it? Each group would still contain a
distribution of reference levels from low to high, and those with the
higher reference levels would put out more effort to get more reward.

But the average reference would be similar across groups, so what would be
seen in the averages is a decrease in effort (and reward) as wages increase.
You might want to modify your program and see for yourself.

Regards,

Bruce

[From Rick Marken (971121.1400)]

Bill Powers (971121.0647 MST) to Bruce Abbott (971120.2320 EST) --

You could sort your subgroups by wages, since that's an independent
variable that the experimenter could manipulate. I think you will
find that effort appears to increase with wages, although
for each individual, increasing the wage will decrease the effort.
But maybe not -- the only way to find out is to try it. It could
be that there will seem to be no relation between wages and effort.
Let us know, if you try it.

I did this experiment using a spreadsheet and your equations for
effort given wages, reference for rewards and cost. I selected
three levels of the IV (wages): .5, 1.5, 2.0. The "subjects" were
randomly assigned to each level of the IV (that is, each subject
had a different randomly selected reference for reward, gain and
cost factors -- on most runs I held the cost factor constant under
the assumption that this was one of the variables that could be
controlled across subjects by the experimenter).

The results were exactly what Bruce Abbott said they would be; the
average (over subjects) effort (DV) _decreases_ with increases in
wages (IV) -- which is the same as the relationship between wages
and effort found for each individual subject. So in this case, the
group average reflects the relationship between wages and effort
that exists for each individual.
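
Rick's run can be reproduced in a few lines of Python (a sketch, not his actual spreadsheet; the uniform ranges for gain and reference are my assumptions). Using the loop equations reward = wages*effort - cost and effort = gain*(R0 - reward), the equilibrium effort is gain*(R0 + cost)/(1 + wages*gain), and randomly assigned groups show declining average effort as wages rise:

```python
import random

def effort(wages, gain, r0, cost):
    """Equilibrium effort of one control system, from the loop equations
    reward = wages*effort - cost and effort = gain*(r0 - reward)."""
    return gain * (r0 + cost) / (1 + wages * gain)

def group_experiment(wage_levels, n_per_group=1000, seed=1):
    """Randomly assign fresh control systems (random gain and reference)
    to each wage level; return the mean effort per group."""
    rng = random.Random(seed)
    means = {}
    for w in wage_levels:
        sample = [effort(w, rng.uniform(1, 10), rng.uniform(5, 15), cost=1.0)
                  for _ in range(n_per_group)]
        means[w] = sum(sample) / n_per_group
    return means

means = group_experiment([0.5, 1.5, 2.0])
# Average effort declines as the wage level rises, matching the
# inverse relation present in every individual system.
```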

But does this mean that, in general, this kind of group experiment
is a legitimate way to study characteristics of individuals? Does
this mean that Bruce Abbott is (choke, gag) correct in arguing for
the appropriateness of using group data to learn about the nature
of individuals?

OF COURSE NOT! :wink:

The group averages in this experiment are a correct representation
of the characteristics of each individual because each individual
in this experiment _is the same in terms of this characteristic_!!
Every individual is controlling for reward (though for a different
amount with a different gain, so there are individual differences)
so there is a negative relationship between wages and effort
(increased wages are associated with decreased effort) FOR EACH
INDIVIDUAL, the same as the relationship between wages and effort
that was observed FOR THE GROUP. So in this particular situation
the group result correctly reflects the individual results.

But a real experimenter has no way of knowing (as I did in my
simulation) that all the individuals in the experiment ARE THE
SAME with respect to the characteristic being measured by the
group data (in this case, the negative relationship between
wages and effort). In a real experiment, many of the individuals
may not even be controlling for the variable called "reward" --
which is equivalent to having a reference of 0 for reward.

I revised my spreadsheet simulation so that a randomly selected
1/2 of my "subjects" were not controlling for reward (reference = 0).
The group results of this experiment were _the same_ as they
were when all people were controlling for rewards; there was a
marked inverse relationship between wages and effort. But now we
can see that this group average is _not_ a correct representation
of the response to wages of about 1/2 of the subjects. So if some
experimenter published the results of this research, saying "effort
is inversely related to wages", he would actually be _wrong_ about
half of the population (the ones who are not controlling for reward);
for that half of the population there is _no_ relationship between
wages and effort.
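
A sketch of this revised run (my Python stand-in for Rick's spreadsheet; treating the non-controlling half as emitting an effort unrelated to wages, as Rick describes later in the thread, with the uniform ranges my own assumptions): the group averages still decline with wages even though half the "subjects" show no wage-effort relation at all.

```python
import random

def controller_effort(wages, gain, r0, cost):
    """Equilibrium effort when the system is controlling reward."""
    return gain * (r0 + cost) / (1 + wages * gain)

def mixed_group_means(wage_levels, n_per_group=5000, p_control=0.5, seed=2):
    """Group mean effort when only a fraction of randomly assigned
    'subjects' control reward; the rest emit effort unrelated to wages."""
    rng = random.Random(seed)
    means = {}
    for w in wage_levels:
        total = 0.0
        for _ in range(n_per_group):
            if rng.random() < p_control:
                total += controller_effort(w, rng.uniform(1, 10),
                                           rng.uniform(5, 15), 1.0)
            else:
                # Not controlling for reward: effort reflects other goals,
                # independent of the wage level.
                total += rng.uniform(0, 10)
        means[w] = total / n_per_group
    return means
```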

So in terms of the main topic of discussion here (the appropriateness
of using group data for studying the nature of individuals) Bruce
Abbott is wrong. It is not appropriate to use group data to study
individual behavior, ever. Doing so is (as Bill noted) systematized
prejudice. I think this is a very important issue and Bruce Abbott
is in a position where he can either contribute to the continuation
of a criminal approach to research (by publishing more textbooks
defending this approach) or contribute to the development of a new
approach to research, based on testing individuals to determine what
variables they are actually controlling. The choice is yours, Bruce.
But, before you make it, I suggest taking two Mozart Piano
Concertos (17 & 21 will do), the final forgiveness aria from
Nozze di Figaro, and call me in the morning.

Best

Rick


--
Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken

[From Bruce Abbott (971121.1835 EST)]

Mike Acree (971121.1220) --

Abbott (971119.1550) writes: "There is a small chance that this
conclusion [that pay influences liking for the task] is incorrect." "A
small chance" sounds like a reference to the alpha level, but there is
in fact no mathematical probability of the conclusion being incorrect.

Nice catch, Mike. The significance level indicates the probability that a
difference as large or larger than the one actually observed would occur by
chance, given that the null hypothesis (that there is no treatment effect)
is true. If this probability is small enough, then one is willing to reject
the idea that the observed difference is a purely chance phenomenon (such a
difference is unlikely to have occurred by chance if the null hypothesis is
true). If the data are poorly explained by chance, then what remains to
explain the observed difference? The only other reasonable candidate (if
the experiment was conducted properly and the statistical assumptions were
met) is an effect of independent variable on the scores.
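
The definition can be illustrated with a toy Monte Carlo (my sketch, not from the thread; the group size and population parameters are illustrative). Draw both groups from one normal population, so the null hypothesis is true by construction, and count how often a mean difference at least as large as the observed one arises by chance:

```python
import random

def null_p_value(observed_diff, n_per_group, sd, n_sims=20_000, seed=3):
    """Fraction of simulated null experiments (both groups drawn from the
    same normal population) whose absolute difference of group means is
    at least as large as the observed difference."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_sims):
        a = sum(rng.gauss(0.0, sd) for _ in range(n_per_group)) / n_per_group
        b = sum(rng.gauss(0.0, sd) for _ in range(n_per_group)) / n_per_group
        if abs(a - b) >= observed_diff:
            count += 1
    return count / n_sims
```

If this fraction is small enough, the observed difference is poorly explained by chance, which is all the significance level asserts.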

The peculiar thing about this mistake is that I just got done pointing out
the same mistake to someone else (who made it in their published text).
Then I go and do it myself. Go figure. It must be that damned left brain
acting up again. (;-> Thanks for pointing it out.

Regards,

Bruce

[Martin Taylor 971121 10:40]

Rick Marken (971120.1800) --

re: Bruce Abbott (971120.1300 EST), Bruce Abbott (971120.1700 EST), and
Martin Taylor (971120 15:40), I said:

More detailed responses to follow as soon as I finish reading my
horoscope.

Again, what am I thinking? I'm not going to be able to tell
these guys anything that will change their minds. I admire
the persistence of those (like Richard Kennaway and Bill Powers)
who are willing to continually re-expose the impropriety and
perniciousness of using group data to study individual behavior.
And I suppose Abbott and Taylor should get some credit for
showing Richard Kennaway the kind of garbage he can expect to
get back from the Science reviewers.

It might be nice if you read the messages on which you comment, before
releasing your comments to the world, mightn't it? I'll let Bruce speak
for himself, but I would ask you to point out to me where in the
message of mine that you reference I espoused using group data to study
individual behaviour. Could you do that? I've reread my message, and I
can't find anything that can easily be reinterpreted in that way. But
I've come to learn the _w i d e_ range of interpretations you can put
onto other people's writings, so I'd be fascinated to learn what I wrote
that leads you to see me as advocating the use of group data to study
individual behaviour.

I didn't do that (much), even before I heard of PCT:-)

Martin

[From Bill Powers (971121.1628 MST)]

Bruce Abbott (971121.1545 EST)--

Good idea. But I think that it is a little confusing to label your two
dependent measures "effort" and "reward." These sound like amounts, but in
your model they are rates.

It makes little difference: you could do the same thing with total reward
and total effort to date: work done vs money received. But it would be
clearer to do as you suggest, or at least to understand the terms as you
interpret them (it's easier to write effort than energy expended per unit
time).

I agree with your analysis. It may be interesting to explore this a little
further. Let's look at the equations, understanding "reward," "effort," and
"cost" to be measured per unit time. If effort is measured in pieces per
hour and wages in dollars per piece, then reward is measured in dollars per
hour.

reward = wages*effort - cost

effort = gain * (R0 - reward)

where R0 is the desired amount of reward (rate).

Solving first for reward and then for effort, we obtain

           wages * gain * R0 - cost
reward = --------------------------
               1 + wages * gain

          gain * (R0 + cost)
effort = ------------------
           1 + wages * gain

Here we find the inverse relationship between wages and effort that you
noted. But we find another interesting relationship between reward and
wages. The reward is asymptotic to R0 as wages increase. This means that
these control systems are maintaining a particular level of reward by
reducing their efforts as wages increase, or increasing their efforts as
wages decrease.
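
This algebra can be double-checked numerically (a sketch; the leaky-integrator output slowing and the parameter values are my additions, not part of the original model code): iterate the closed loop to equilibrium and compare against the solved expressions, including the approach of reward to R0 at high wages.

```python
def closed_form(wages, gain, r0, cost):
    """Equilibrium reward and effort from the solved equations."""
    reward = (wages * gain * r0 - cost) / (1 + wages * gain)
    effort = gain * (r0 + cost) / (1 + wages * gain)
    return reward, effort

def simulate(wages, gain, r0, cost, dt=0.01, steps=5000):
    """Iterate the loop with a slowed (leaky-integrator) output so the
    discrete-time system converges: effort relaxes toward gain*(r0 - reward)."""
    effort = 0.0
    for _ in range(steps):
        reward = wages * effort - cost
        effort += dt * (gain * (r0 - reward) - effort)
    return wages * effort - cost, effort

# With high wages, the closed-form reward approaches R0: these systems
# hold their reward near the reference by cutting effort as wages rise.
```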

You spoke in an earlier post about the experimenter possibly "manipulating
the reward." As we can see here, manipulating the wages is not the same
thing as manipulating the reward. Because these systems are controlling
for a certain amount of reward, altering the wages has only a very small
effect on the amount of reward actually obtained. The principal effect of
raising wages is to lower the effort.

All of which goes to show that there must be something to doing a job
besides earning money, if people are really control systems.

In the absence of such an understanding, you could approximate the effect of
equating reference levels across different levels of wage by assigning
control systems to the different levels of the independent variable (wage)
at random, as described above. (With a large n for each group, the groups
are almost guaranteed to have the same average reference level.) You don't
need to know what the individual reference levels are, or even that they exist.

Technically, or numerically, that's probably true. But in the real world
you can't just "assign wages" -- where are you going to find a person with
a reference level for $500,000 per year to go in the group that is paid
$10,000 per year? If you sorted people in actual jobs into groups by wage,
you would find a range of reference levels for monetary reward within each
group, but I strongly doubt that you'd find the same average reference
level in each group. The people who wanted lots of money (and were willing
to put out the effort to get it) would most likely be found in the groups
earning higher wages. Isn't it likely that your experimental design would
leave out something important here? If people who had higher reference
levels got higher wages, the appearance would be that increasing wages
increases both reward and effort, whereas in each individual the
relationship is still the opposite of that.

My point (considering where the paper was published)
was to show how one could reach the wrong conclusions from a statistical
analysis if one didn't know that the organisms in the experiment were
control systems.

In the new example you suggest, that would not happen, as described above.

In fact it probably would happen, but for reasons outside the premises of
the experiment as given. The assumption that sorting the people by wage
would average out differences in reference level is probably wrong. You
wouldn't find many people who want to earn $100,000 per year applying for
minimum-wage jobs.

But the average reference would be similar across groups, so what would be
seen in the averages is a decrease in effort (and reward) as wages increase.
You might want to modify your program and see for yourself.

Note that the reward would not be predicted to decrease; it would increase
very slightly as wages increased. And I hope you agree that the average
reference level would probably NOT be similar across groups sorted by wage.

Best,

Bill P.

[From Rick Marken (971122.0950)]

Martin Taylor (971121 10:40) --

It might be nice if you read the messages on which you comment,
before releasing your comments to the world, mightn't it?

I'll make a mental note.:wink:

I'll let Bruce speak for himself,

Bruce won't reply to my posts anymore; his only defense against
the ultimate disturbance -- the spreadsheet;-)

but I would ask you to point out to me where in the message of
mine that you reference I espoused using group data to study
individual behaviour.

I was responding to your comment [Martin Taylor (971120 15:40)]
re: Bill's ABS demo. You said:

Isn't it a little disingenuous of you STILL to be citing that
paper in this context? Or have you reversed your understanding
of some years ago that the average of slopes is not necessarily
the slope of averages? The error is _only_ in assuming that
taking slopes commutes with taking averages, NOT in assuming that

The _apparent_ relationship you get from varying IVs and
measuring DVs over a population is NO INDICATOR AT ALL of the
actual relationship between IV and DV for any individual in the
group, and can be completely wrong for ALL of them

You were certainly not espousing the use of group data for the study
of individual behaviour. But it seemed to me that you were doing
one of your patented misdirection plays when you said Bill was
lying about the fact that his ABS demo shows that the relationship
you see between IVs and DVs over a population is NO INDICATOR AT
ALL of the actual relationship between IV and DV for any individual.
In fact, Bill's ABS demo is an excellent demonstration of this
fact. It is a correlational study but it still shows rather nicely
how you can obtain a population relationship between variables
that is exactly the opposite of the actual relationship between
the variables for each individual.

Bruce Abbott pointed out that the results of the demo were achieved
by manipulating a "third variable" (reference level) which differed
over individuals. Bruce correctly pointed out that this third
variable problem could be eliminated by factoring it out. Richard
Kennaway then correctly pointed out that, in order to do that, you would
have to _study each individual_ (using The Test), one at a
time, in order to measure the value of this third variable. Bruce
Abbott also pointed out that this third variable problem would be
eliminated in a completely randomized experimental design (assigning
individuals randomly to conditions -- levels of the IV). I [Rick
Marken (971121.1400)] verified Bruce's claim using spreadsheet
modeling (which seems to have made no impression on anyone) and
proceeded to show that this group result was still NO INDICATOR
AT ALL of the actual relationship between IV and DV for each
individual because the same group result (negative relationship between
IV and DV) is obtained when the actual relationship between
IV and DV) is obtained when the actual relationship between
IV and DV for half the subjects was nothing like the group result.
I'm sure it would be easy to build a spreadsheet model where the
group relationship between IV and DV differs from the actual
relationship between IV and DV for _every_ member of the population.

Are you opposed to the use of group data for studying the
characteristics of individuals, Martin? If so, just say so,
and abjure all the obfuscation. The issue is really too
important for that.

Best

Rick


[From Bruce Abbott (971122.1415 EST)]

Rick Marken (971121.1400) --

Bill Powers (971121.0647 MST) to Bruce Abbott (971120.2320 EST)

You could sort your subgroups by wages, since that's an independent
variable that the experimenter could manipulate. I think you will
find that effort appears to increase with wages, although
for each individual, increasing the wage will decrease the effort.
But maybe not -- the only way to find out is to try it. It could
be that there will seem to be no relation between wages and effort.
Let us know, if you try it.

I did this experiment using a spreadsheet and your equations for
effort given wages, reference for rewards and cost. I selected
three levels of the IV (wages): .5, 1.5, 2.0. The "subjects" were
randomly assigned to each level of the IV (that is, each subject
had a different randomly selected reference for reward, gain and
cost factors -- on most runs I held the cost factor constant under
the assumption that this was one of the variables that could be
controlled across subjects by the experimenter).

The results were exactly what Bruce Abbott said they would be;

Oh I love to hear that. Would you mind saying that again?

The results were exactly what Bruce Abbott said they would be;

O.K., one more time:

The results were exactly what Bruce Abbott said they would be; the
average (over subjects) effort (DV) _decreases_ with increases in
wages (IV) -- which is the same as the relationship between wages
and effort found for each individual subject. So in this case, the
group average reflects the relationship between wages and effort
that exists for each individual.

There must be a catch, right? I mean, you can't just leave it at that, can you?

But does this mean that, in general, this kind of group experiment
is a legitimate way to study characteristics of individuals? Does
this mean that Bruce Abbott is (choke, gag) correct in arguing for
the appropriateness of using group data to learn about the nature
of individuals?

OF COURSE NOT! :wink:

No, of course not!

The group averages in this experiment are a correct representation
of the characteristics of each individual because each individual
in this experiment _is the same in terms of this characteristic_!!
Every individual is controlling for reward (though for a different
amount with a different gain, so there are individual differences)
so there is a negative relationship between wages and effort
(increased wages are associated with decreased effort) FOR EACH
INDIVIDUAL, the same as the relationship between wages and effort
that was observed FOR THE GROUP. So in this particular situation
the group result correctly reflects the individual results.

But a real experimenter has no way of knowing (as I did in my
simulation) that all the individuals in the experiment ARE THE
SAME with respect to the characteristic being measured by the
group data (in this case, the negative relationship between
wages and effort). In a real experiment, many of the individuals
may not even be controlling for the variable called "reward" --
which is equivalent to having a reference of 0 for reward.

Yes, and?

I revised my spreadsheet simulation so that a randomly selected
1/2 of my "subjects" were not controlling for reward (reference = 0).
The group results of this experiment were _the same_ as they
were when all people were controlling for rewards; there was a
marked inverse relationship between wages and effort. But now we
can see that this group average is _not_ a correct representation
of the response to wages of about 1/2 of the subjects. So if some
experimenter published the results of this research, saying "effort
is inversely related to wages", he would actually be _wrong_ about
half of the population (the ones who are not controlling for reward);
for that half of the population there is _no_ relationship between
wages and effort.

Try again, Rick. If half your "subjects" at each wage level show one
relationship and half show the reverse, the average effect should be zero.
If you got _the same_ results as before, you screwed up.

So in terms of the main topic of discussion here (the appropriateness
of using group data for studying the nature of individuals) Bruce
Abbott is wrong. It is not appropriate to use group data to study
individual behavior, ever.

The second simulation should have produced a weak or nonexistent
relationship between wages and effort (or reward). The fact that one would
get something similar to the individual functions in the group means only
when all the functions were similar was pointed out by me in my post.
There I noted that the obtained group-based function would become weaker to
the extent that the individual functions are inconsistent. This is
different from finding a strong group-based relationship in any direction
whatever, which is what was being claimed. The observed function will
essentially represent an average of the individual functions.

I have never claimed that the group-based functions will tell you anything
about the function in a _given_ individual (unless all individuals have the
same function). My claim was that group-based functions can provide an
indication of the average individual function, and to the extent that
individuals are similar, reveal something about how the independent variable
may typically relate to the dependent variable in individual_s_ (your
mileage may vary). Where there is consistency in the individual functions,
group-based methods will do a reasonable job of revealing the relationship.
For some experimental variables there is no other method for determining
what the relationship is, as exposure to one level of the independent
variable permanently changes the system. The effect of learning is a prime
example.

Regards,

Bruce

[From Richard Kennaway (971122.1956 GMT)]

Bruce Abbott (971122.1415 EST):

My claim was that group-based functions can provide an
indication of the average individual function

What is "the average individual function"? Is that something different
from a "group-based function"?

and to the extent that
individuals are similar, reveal something about how the independent variable
may typically relate to the dependent variable in individual_s_ (your
mileage may vary).

This is a tautology. Knowing the group function tells you about those
individual functions which are similar to the group function. Big deal.
If you know, by observing, which individual functions are similar to the
group function, measuring the group function tells you nothing new about
any individual.

Where there is consistency in the individual functions,
group-based methods will do a reasonable job of revealing the relationship.
For some experimental variables there is no other method for determining
what the relationship is, as exposure to one level of the independent
variable permanently changes the system. The effect of learning is a prime
example.

In such a situation, how do you experimentally determine that there is
consistency in the individual functions? In fact, how do you even
determine an individual function, given that you can only get one data
point from any individual?

-- Richard Kennaway, jrk@sys.uea.ac.uk, http://www.sys.uea.ac.uk/~jrk/
   School of Information Systems, Univ. of East Anglia, Norwich, U.K.

[From Bruce Gregory (971122.1600 EST)]

Rick Marken (971122.0950)

Bruce Abbott pointed out that the results of the demo were achieved
by manipulating a "third variable" (reference level) which differed
over individuals. Bruce correctly pointed out that this third
variable problem could be eliminated by factoring it out. Richard
Kennaway then correctly pointed out that, in order to do that, you would
have to _study each individual_ (using The Test), one at a
time, in order to measure the value of this third variable. Bruce
Abbott also pointed out that this third variable problem would be
eliminated in a completely randomized experimental design (assigning
individuals randomly to conditions -- levels of the IV). I [Rick
Marken (971121.1400)] verified Bruce's claim using spreadsheet
modeling (which seems to have made no impression on anyone) and
proceeded to show that this group result was still NO INDICATOR
AT ALL of the actual relationship between IV and DV for each
individual because the same group result (negative relationship between
IV and DV) is obtained when the actual relationship between
IV and DV for half the subjects was nothing like the group result.

Just for the record, your spreadsheet model made an impression on me.
I'm always pleasantly surprised when someone actually looks to see
what the model predicts. You serve as my conscience. Have you ever
been accused of that before? :wink:

Bruce

[From Rick Marken (971122.1300)]

Bruce Abbott (971122.1415 EST) --

Try again, Rick. If half your "subjects" at each wage level show
one relationship and half show the reverse

I didn't say that they show the "reverse" relationship, oh
Mozart-deprived one. I said their reference for reward was 0. This would
make their effort output 0. Since the presence of a bunch of zeros
in the data would be obvious to even Mozart-deprived researchers,
I simply added a random number -- representing the contribution
of "effort" to any other variables these subjects are controlling --
as the value of the DV for the subjects who were not controlling
for reward.

The fact that one would get something similar to the individual
functions in the group means only when all the functions were
similar was pointed out by me in my post.

The group average function is what it is. It's a nice inverse
linear function relating wages to effort on repetitions of
my simulation; but this function happens to resemble the actual
function of a minority of the subjects (I've re-run it with only
20% of the subjects controlling for reward). In real experiments
the experimenter has no idea whether the observed group based
function applies to all, some or _none_ of the subjects. The
group result tells you only about the group, not the individuals
in it.

There I noted that the obtained group-based function would
become weaker to the extent that the individual functions are
inconsistent.

The experimenter cannot possibly know whether or not the observed
result is "weak" relative to what it would be if all subjects had
the same function. All the experimenter observes is the group
result. In my experimental simulation the observed correlation
between wages and average group effort is typically -.99. Is this
a strong or a weak result?

My claim was that group-based functions can provide an
indication of the average individual function, and to the
extent that individuals are similar, reveal something about
how the independent variable may typically relate to the
dependent variable in individual_s_

In my demo, 80% of the individuals were not controlling for the
perception of reward AT ALL. Yet the group average shows the kind
of relationship between effort and wages that you would expect
if every individual was controlling for reward. This means
that, in your individual dealings with people, if you believed
or predicted, based on the group research, that a particular
person would tend to exert less effort as their wages were
increased, you would usually be wrong. Of course, once you started
interacting with the person you would find that wages don't have
the expected effect on his behavior -- but you could have found
this out anyway, _without the group research results_. All the group
research results did was _prejudice_ you to think that this is the
way individual humans work. But this individual person (and possibly
all other individual people) DOESN'T work the way the group data
say they do.
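Rick's "80%" scenario can be sketched numerically. The following is a hypothetical reconstruction, not Bill's actual random.pas: it assumes high-gain controllers at their closed-loop equilibrium and fixed-effort non-controllers, with parameter ranges chosen arbitrarily for illustration.

```python
import random

def controller_effort(wage, ref, gain=100.0):
    # Closed-loop equilibrium of e = gain*(ref - wage*e):
    # the controller raises effort until reward (wage*e) nears its reference.
    return gain * ref / (1.0 + gain * wage)

random.seed(1)
n = 100
wages = [0.5 + 0.05 * i for i in range(20)]  # wage (reward-per-effort) levels

# 20% of subjects control reward; 80% exert a fixed effort regardless of wage
subjects = [("control", random.uniform(20.0, 60.0)) if i < n // 5
            else ("fixed", random.uniform(5.0, 15.0)) for i in range(n)]

group_effort = []
for w in wages:
    total = 0.0
    for kind, p in subjects:
        total += controller_effort(w, p) if kind == "control" else p
    group_effort.append(total / n)

def corr(xs, ys):
    # Pearson correlation, written out to keep the sketch dependency-free
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

r = corr(wages, group_effort)
print(round(r, 2))  # strongly negative, though 80% of subjects ignore wages
```

The group-average curve inherits the controllers' inverse wage-effort relation, so the wage/effort correlation comes out strongly negative even though only a fifth of the simulated subjects are controlling for reward at all.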

Therapists deal with individuals. If group research tells them
"problem X is caused by variable Y" then when people come in
"presenting" with problem X, the therapist will be inclined to
help the person by working on variable Y. But it is very likely
that variable Y has nothing to do with problem X for this or
any other individual. Hopefully, a wise therapist will eventually
figure this out and change the therapeutic strategy. But I imagine
that, in many cases, the initial prejudice that comes from the group
results is hard to overcome; the therapist becomes sure that the
_real_ cause of the problem is variable Y and the patient is just
"in denial". This, to me, is the evil of group research; it's
prejudice, plain and ugly.

Since you apparently don't want to climb out of the dung hole of
conventional psychological research that you have dug for yourself,
I'll just make one last point for the benefit of other researchers
who are not yet completely hopeless. I believe that the therapist
would be in a lot better shape if s/he could determine the variables
involved in _each individual_ patient's problem. That's the promise
of PCT research. It will let therapists know what is the same about
all individuals (their organization as input control systems) and
something about the _kinds_ of variables they control. It will
also show the therapist how to determine whether their patients
are trying to control a particular variable or not.

Group research is fine for people who deal with groups (policy
people, politicians, administrators, epidemiologists, etc) but
it's worse than useless for people who deal with individuals.

Best

Rick

--
Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken/

[Martin Taylor 971122 21:40]

Rick Marken (971122.0950) to Martin Taylor (971121 10:40)

Are you opposed to the use of group data for studying the
characteristics of individuals, Martin? If so, just say so,
and abjure all the obfuscation. The issue is really too
important for that.

I thought I had been rather clear on this issue, before. However, I'll
try again:

(1) You can't study AN individual by using group data.

(2) You CAN study the kinds of influence that are likely to affect
individuals by looking at group data.

(3) You can NOT get anything sensible out of a study by looking at the
significance levels of statistics (except to discover whether your
experiment is powerful enough to achieve "significance").

(4) You CAN determine how likely it is that a particular individual will
have some effect measured during the group study by looking at the
distributions found in the study (typically using a surrogate measure
such as a confidence interval, though that's really inadequate).

It's the pairing of these last two that leads us into all the quagmires.
Suppose the average over individuals is 3, and the distribution is
Gaussian with a standard deviation of 300. You are not going to do much
better than a coin toss if you want to say Mr. X will have a positive
value. But if the standard deviation is .003, you'd be pretty stupid to
pass up a bet with someone who offered to bet you that Ms Y had a negative
value (unless he knows Ms Y!-)
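Martin's two cases can be checked directly with the Gaussian CDF (a quick sketch using only the standard library):

```python
from math import erf, sqrt

def p_positive(mu, sigma):
    # P(X > 0) for X ~ N(mu, sigma^2), via the normal CDF
    return 0.5 * (1.0 + erf(mu / (sigma * sqrt(2.0))))

p_wide = p_positive(3.0, 300.0)    # barely better than a coin toss
p_narrow = p_positive(3.0, 0.003)  # a near-certain bet
print(round(p_wide, 3), round(p_narrow, 3))
```

With a standard deviation of 300 the chance that a random individual is positive is only about 0.504; with 0.003 it is indistinguishable from 1.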

Which leads to Bill's study.

I was responding to your comment [Martin Taylor (971120 15:40)]
re: Bill's ABS demo. You said:

Isn't it a little disingenuous of you STILL to be citing that
paper in this context? Or have you reversed your understanding
of some years ago that the average of slopes is not necessarily
the slope of averages? The error is _only_ in assuming that
taking slopes commutes with taking averages, NOT in assuming that

The _apparent_ relationship you get from varying IVs and
measuring DVs over a population is NO INDICATOR AT ALL of the
actual relationship between IV and DV for any individual in the
group, and can be completely wrong for ALL of them

You were certainly not espousing the use of group data for the study
of individual behaviour. But it seemed to me that you were doing
one of your patented misdirection plays when you said Bill was
lying about the fact that his ABS demo shows that the relationship
you see between IVs and DVs over a population is NO INDICATOR AT
ALL of the actual relationship between IV and DV for any individual.

I didn't say Bill was lying. I offered him the option of saying that
he had indeed reversed his understanding, an understanding we came to
jointly over a blackboard in Durango. The study shows nothing at all
about the relation between individual and group data.

In fact, Bill's ABS demo is an excellent demonstration of this
fact. It is a correlational study but it still shows rather nicely
how you can obtain a population relationship between variables
that is exactly the opposite of the actual relationship between
the variables for each individual.

Not if it's the study that we talked about in Durango, and that seems
to have been the one under discussion here recently (and still is).

What Bill showed is that it is improper to take the slope of a set
of averages and say that it is the average of a set of slopes. But
most people knew that already. I grant you that there are some pretty
unmathematical people in psychology, who might not see this fact as
obvious, so the demonstration is useful. But it has nothing "zero,
nada"--to use your words-- to do with showing you can't use group
data as an indicator of values for individuals in the group.
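The slope fact Martin grants can be made concrete. In this sketch (my own toy numbers, not data from Bill's demo) every subject's within-subject slope is exactly +1, yet the regression line through the subjects' (mean x, mean y) points slopes downward:

```python
def slope(xs, ys):
    # least-squares slope of y on x
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Each subject obeys y = x + b_i over a narrow range of x, but subjects
# operating at higher x have much lower intercepts b_i = -5*center.
subjects = []
for i in range(5):
    center = 10.0 * i
    xs = [center - 1.0, center, center + 1.0]
    ys = [x - 5.0 * center for x in xs]   # within-subject slope is +1
    subjects.append((xs, ys))

indiv_slopes = [slope(xs, ys) for xs, ys in subjects]
mean_xs = [sum(xs) / len(xs) for xs, _ in subjects]
mean_ys = [sum(ys) / len(ys) for _, ys in subjects]
group_slope = slope(mean_xs, mean_ys)
print(indiv_slopes, round(group_slope, 2))
```

Every individual slope is +1.0 while the slope through the averages is -4.0: taking slopes does not commute with taking averages, which is exactly the point under dispute.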

A demonstration that you can't use group data to predict individual
data has to use the _same_ measure for both. Is the group average
value of X related to the value of X for the individuals in the group?
If you can set up a demonstration wherein the group average value of
some X is outside the range of X for all the individuals, then _that_
would be a great demo. But I don't think it can be done.

Bruce Abbott pointed out that the results of the demo were achieved
by manipulating a "third variable" (reference level) which differed
over individuals. Bruce correctly pointed out that this third
variable problem could be eliminated by factoring it out. Richard
Kennaway then correctly pointed out that, in order to do that, you would
have to _study each individual_ (using The Test), one at a
time, in order to measure the value of this third variable.

So they did, but this is a separate issue, of no particular concern
since the fundamental observation is a red herring anyway.

Bruce
Abbott also pointed out that this third variable problem would be
eliminated in a completely randomized experimental design (assigning
individuals randomly to conditions -- levels of the IV). I [Rick
Marken (971121.1400)] verified Bruce's claim using spreadsheet
modeling (which seems to have made no impression on anyone) and
proceeded to show that this group result was still NO INDICATOR
AT ALL of the actual relationship between IV and DV for each
individual because the same group result (negative relationship between
IV and DV) is obtained when the actual relationship between
IV and DV for half the subjects is nothing like the group result.

Yep. I observed what you did, and why. And it is unrelated to my point
that it is illegitimate to compare the average of a set of slopes (or
any one of the slopes) with the slope of a set of averages.

I'm sure it would be easy to build a spreadsheet model where the
group relationship between IV and DV differs from the actual
relationship between IV and DV for _every_ member of the population.

Do it, using a legitimate measure, and we will all be indebted.

Martin

[Martin Taylor 971122 21:50]

[Martin Taylor 971122 21:40]

A small correction...

What Bill showed is that it is improper to take the slope of a set
of averages and say that it is the average of a set of slopes. But
most people knew that already.

It seems that most people who write to CSGnet didn't know this, so
I shouldn't have said "most people know it already." Nevertheless, it
is so, and I am surprised that there seems to be so much confusion
about it.

Just think of an ellipse shaded with cross-hatching leading from
NorthWest to SouthEast. In which direction is the main diagonal of
the ellipse?

Oh, you don't know?

Let's try again.

Just think of a cross-hatched ellipse with its main diagonal from
SouthWest to NorthEast. In which direction is the cross-hatching?

Oh, you still don't know?

Well, then....let's think. There must be SOME way to determine the
slope of the mid-points of these cross-hatch lines if we know their
individual slopes, mustn't there? Or is it that there must be a way
to find their individual slopes if we know their mid-points?

Well, if there is a way to do either of those things, I don't know it.

How are these problems different from trying to find a relation between
the slopes of individual data and the slope of a line through individual
averages?

I find it rather bitterly amusing that there should be a prolonged
argument over the proposition that the failure of an ellipse axis to
line up with the direction of cross-hatching should show that it is
improper to use group data as a guide to properties of individuals.

It doesn't say much for the ratio of rhetoric to thought in the group.

Sorry.

Martin

[From Richard Kennaway (971123.1000 GMT)]

Martin Taylor 971122 21:40:

It's the pairing of these last two that leads us into all the quagmires.
Suppose the average over individuals is 3, and the distribution is
Gaussian with a standard deviation of 300. You are not going to do much
better than a coin toss if you want to say Mr. X will have a positive
value. But if the standard deviation is .003, you'd be pretty stupid to
pass up a bet with someone who offered to bet you that Ms Y had a negative
value (unless he knows Ms Y!-)

But this is still a proposition about the group: the proposition that
nearly all of them have a positive value for this attribute. As you
point out, whether the bet is in my favour depends on whether the person
offering it knows Ms Y's individual value. The fact that if the bet is
made about a randomly selected individual, I'll likely win, is not a
proposition about the selected individual, it is a proposition about the
group. Ms Y does not have a probability of having a negative value; she
simply has whatever value she has. If I measure her value, other
people's values are irrelevant. If I do not, I'm just guessing.

Martin Taylor 971122 21:50:

I find it rather bitterly amusing that there should be a prolonged
argument over the proposition that the failure of an ellipse axis to
line up with the direction of cross-hatching should show that it is
improper to use group data as a guide to properties of individuals.

Indeed. Quite clearly, it shows exactly that.

-- Richard Kennaway, jrk@sys.uea.ac.uk, http://www.sys.uea.ac.uk/~jrk/
   School of Information Systems, Univ. of East Anglia, Norwich, U.K.