Science or mush

[From Dag Forssell (970929 23.15)]

Rick Marken (970928.0900) entertained us with a list of reasons PCT has
such a hard time. I continue to think that one very basic reason is that
there are very few people around (in the life sciences) who can tell a
descriptive "explanation" with almost zero utility from an in-depth,
working-model explanation. That is the difference between science and
mush. To me, the ongoing debate on CSGnet about reinforcement demonstrates
this very nicely.

Here is an article I just spotted in Business Week - October 6, 1997, that
discusses this issue.

Best, Dag

···

--------------------------
Science & Technology

TESTING

EVERYONE KNOWS E=MC2
NOW, WHO CAN EXPLAIN IT?

A science guru wants students to interpret data, not parrot it

"We're wasting everyone's time. We're saying science is some boring chore
,and the nation really doesn't need it"

BRUCE ALBERTS,
National Academy of Sciences

Bruce Alberts' quest to reform science testing began more than a decade
ago. Teaching biochemistry to medical students at the University of
California at San Francisco, he was appalled to find they "were not really
learning anything." The future doctors easily parroted back biochemical
terms but failed to grasp the concepts. The culprit? Multiple-choice
tests, Alberts charges, similar to those that form the backbone of
America's vast standardized testing industry. Only when he and UCSF
colleague Diane S. Colby turned to essay questions were they able to boost
students' understanding of biochemistry-and their interest in it. "It
showed the power of tests to shape what students learn and how they study,"
says Alberts.

Now, as president of the National Academy of Sciences, the 59-year-old
Alberts is using his prestigious post to try to end reliance on
standardized multiple-choice tests. He's especially taking aim at the
biology SAT II test, taken by 50,000 high schoolers each year. By
emphasizing memorization and word association over conceptual knowledge,
these tests are poor judges of students' abilities, he argues. Worse, the
relentless obsession with scores has had a pernicious effect on education:
Teachers prepare students to be good test-takers but not necessarily good
thinkers. And universities contribute to the vicious cycle by depending on
test scores for admissions decisions.

CRITICAL THINKING. The toll is immense, Alberts believes. Our current
science education system has turned off countless youths to the thrill of
science-and left too many people bereft of the analytical skills needed to,
say, interpret claims of global warming or to exercise the kind of critical
thinking that is valuable in many aspects of life-such as analyzing
financial data. "We're wasting everyone's time," Alberts warns. "We're
saying science is some boring chore and the nation really doesn't need it."

Alberts is one of a growing number of critics of widespread standardized
testing. For years, opponents have argued that the exams don't actually
measure anything important to students' eventual success, either in
academia or later life. Now, there's increasing evidence to back them up. A
study published in June by Wendy M. Williams, professor of human
development at Cornell University, and Yale University psychologist Robert
J. Sternberg, compared students' scores on the psychology Graduate Record
Examination (GRE) to the students' later accomplishments. With one small
exception, they found no link between scores and any measure of
performance, except for first-year grades. And even that correlation was
weak. "It's obvious that the tests are not corresponding to real
performance," concludes Williams. (The exception was a weak link between a
part of the GRE designed to test analytical reasoning and the quality of
dissertations in male students only.)

Meanwhile, physicists have been raising similar concerns. "The GRE physics
subject test may do more harm than good," says Harvard University physicist
Howard Georgi. Many top graduate programs rarely accept anyone whose score
falls below a certain level. That slams the door on students who might be
better scientists than some who ace the tests. For instance, among his own
graduate students, "I've observed that women did surprisingly poorly on the
GRE considering what good physicists they are," Georgi says. When he asked
one outstanding student why her score was so low, "she told me that the
physics GRE was simply too 'nerdy' to be taken seriously by an intelligent
woman," he recalls.

Promising physicists and psychologists aren't the only ones hurt by the
standardized gatekeepers. Barred by its board of regents from considering
race or gender in admissions decisions, the University of California system
now faces dramatic declines in black and Hispanic students. Of 196
African-American students who applied to the University of California at
San Diego medical school, for example, not one was admitted. The reason:
Test scores were used as a key criterion. That's a serious misuse of the
exams, critics say. "If the tests don't predict anything important, then
why are we using them?" asks Cornell's Williams.

What's ironic is that standardized testing once played a very different
role. The field sprang out of a need during World War I to quickly evaluate
huge numbers of potential soldiers. "For rapid, crude selection, these
tests are the best you can do," explains Paul Black, professor emeritus of
science education at King's College London. According to the critics,
that's precisely the problem: the tests aren't good for much beyond crude
screening. Later, the tests helped students from public high schools open
the doors to snobbish private universities. "Originally, people looked on
the tests as an opportunity rather than as a gatekeeper," explains Hessy
Taft, senior examiner for Educational Testing Service Inc. (ETS), which
develops the SAT and other tests.

OPPORTUNITIES? But now, the whole educational establishment has come to
rely on the tests. Scores even affect property values, as affluent parents
choose homes in school districts with high test scores. "It's almost a
crime the way our society is fixated on scores," says Yale's Sternberg.
People increasingly look at the tests as gatekeepers, not opportunities.
That's a fact that no one but admissions officers really likes. "We use the
tests in the wrong way," says Harvard's Georgi. Instead, he envisions a
test that's pretty easy, so every competent person would do well, but those
who really couldn't handle graduate school would be identified. In other
words, return to the idea of a very crude selection. Even ETS cautions that
scores predict little more than first-year grades. "No rational person
would tell you that you should base admission only on test scores," says
ETS' Taft. And yet the reality is that many admissions committees for elite
universities-facing many more qualified candidates than they can handle-do
just that.

That's why Bruce Alberts has been pushing for change on two fronts. One is
pressuring universities to place less reliance on the tests. Alberts' dream
is for top universities to tell high school students not to take the
biology SAT exam at all. The universities would rely instead on criteria
such as hands-on experience doing science.

Some university presidents, including Stanford University's Gerhard Casper,
have been supportive of the idea. But change has been slow. Alberts hopes
to force the issue during the annual meeting of the Association of American
Universities in mid-October. His effort may get a boost from a recent
University of California task force. Worried about the huge drop in
minority admissions, the task force recommended that the SAT and other
standardized tests be dropped as requirements.

At the same time, Alberts has been toiling on another front-trying to make
changes in the actual biology SAT. Here, with the help of like-minded
educators at ETS and the College Board, which oversees the exam, he has
helped bring about modest progress. His chief ally is Indiana University
biology professor J. José Bonner. Like Alberts, Bonner decries the way
science teaching-and testing-has come to focus on facts and conclusions
rather than the process of science.

ROOTING OUT ROTE. With Alberts pushing from the outside and Bonner and ETS's
Taft from the inside, ETS has made two changes to the biology SAT that will
take effect this fall. One is fewer questions (80 instead of 95) with a
higher percentage that ask students to interpret data rather than to
regurgitate facts (30% instead of 20% to 25%). "By increasing the part
that's reasoning-oriented, we'll give teachers the freedom to back off
requiring so much memorization," Bonner explains. The other change is
offering students a choice of two versions of the exam, one concentrating
on ecology and one focusing on molecular biology. That way, students won't
have to learn so many facts.

For the testing industry, these changes "were so dramatic they almost
didn't go through," says Bonner. The major hurdle: ETS statisticians who
worried that they wouldn't be able to ensure that, say, a 600 score meant
the same thing year after year.

But to Alberts, the changes are frustratingly small and slow in coming. Why
care if a 600 score always means the same thing if it doesn't measure
anything important, he asks. It's far better to substitute essay tests,
which tap students' creative juices-and encourage teachers to teach real
problem-solving skills. Such tests already exist. Britain, for instance,
firmly resisted jumping on the multiple-choice bandwagon. Here at home, ETS
has added an essay section to the Advanced Placement tests. At Cornell,
Williams is producing a new psychology test she hopes could replace the
psychology GRE. And in Alberts' old biochemistry course at UCSF, the tests
he and Colby developed have even had a remarkable effect: Some students
report that the exams' problem-solving puzzles stimulate them to want to
learn more biochemistry-instead of viewing the course as yet another dull
hurdle on their way to their M.D.s. That's the kind of testing, says
Alberts, that can lift education out of its doldrums and "bring both
science and business what they need: people who think."

By John Carey in Washington
------------------------------------------------------------------
[Box]
THE GREAT TESTING DEBATE

Critics such as Bruce Alberts have taken aim at standardized tests,
especially in the sciences. Here are the main arguments for and against.

CRITICS: Tests don't predict anything about future success except
for first-year grades.
DEFENDERS: May be true, but it's useful for colleges to know who
will do well in the first year.

CRITICS: Because they are poor predictors, tests shouldn't be used
as the basis for admission to college or graduate school.
DEFENDERS: Scores shouldn't be the sole criteria, but they are
useful. Besides, there's nothing better -- grades and
recommendations aren't enough.

CRITICS: Tests are turning thousands off to science because
curricula aimed at test preparation emphasize rote memorization,
neglecting concepts and the thrill of discovery.
DEFENDERS: Tests are being improved to emphasize understanding of
scientific concepts rather than memorization and word association.

CRITICS: An alternative exists in the form of essay-type tests
more like those used in Britain.
DEFENDERS: Such tests would be too expensive to grade, and
comparing scores from year to year would be too difficult.
------------------------------------------------------------------

[From Dag Forssell (951128 1420)]

[Bruce Abbott (9527.1650 EST)]

   And you were beginning to think I'd never reply to this one! (;->

And you probably have concluded I am not going to respond to you
either.

Since I have not posted for a month, a short recap of my assignment
may be in order:

[Bruce Abbott (951030.1715 EST)]

   At this point, Dag, I will stop to ask you what you think
   about what I've said so far in this little exposition of
   reinforcement theory. Do you think that Thorndike's approach
   was a sensible one? Was it, in your opinion, science or mush?
   Explain.

[Dag Forssell (951031 0945)]

   I am glad you have joined my challenge to distinguish science
   from mush. I have come to think that this is a very important
   issue, one to which the overwhelming majority of people never
   give any thought, and one that is at the root of many fruitless
   discussions on CSG-L.

Bill P., Rick M., Francisco A., and others joined the open-book exam.

Significant to me, you then stated:

[Bruce Abbott 95-10-31 17:27:25 EST]

   I take Thorndike seriously, but perhaps not in the way you
   imagine. I offer Thorndike's experiment and his analysis in
   order to bring into focus what sorts of activity you (and I)
   believe qualify as legitimate science. The fact that
   Thorndike's analysis may have "glaring errors" (from our
   perspective nearly 100 years later) is another issue. I take
   Thorndike seriously, not because I think his analysis was
   correct, but because I think his experiment and its analysis
   point to unresolved difficulties for HPCT. But this, again, is
   another issue.

I felt that you pulled the rug out from under my assignment even
before I began to address it. Both Bill and Rick provided detailed
critiques, however.

I am very pleased to see that an extensive thread has unfolded from
my challenge, while I have attended not one but, on successive
weekends, a total of three weddings and otherwise kept very busy in
my new, regular life (not spending full time on PCT, but developing
a paying career before I get back to PCT full time in the future).

I sent you my book on PCT. You may have noted that I recognize
that the words theory and science have such vastly different
meanings to different people that we might as well consider any
infant to be a scientist. After all, the infant is very busy
creating an understanding of the world in the developing mind. We
are all scientists. A review of my book would be welcome.

You have apparently felt that I, and other PCTers with me, slight
scientists of centuries past by labeling their theories mush. I
will readily grant you that Aristotle was the greatest scientist
who ever lived (if you count longevity of acceptance and
reverence), but does that mean that we consider his ideas science
today? Misleading mush is more like it.

By focusing your assignment on whether Thorndike's approach was
scientific in the context of his time, you pull the whole
discussion in a direction I did not expect. It has since been
granted that Thorndike did his best. Is this the essence of your
argument that we should learn from existing EAB research? They
have done their best! Why not learn from Aristotle and the
alchemists with the same sincere respect? I am not respectful of
Aristotle--he never tested anything. I *am* respectful of the
alchemists. They were able to make things! But their explanations
were mush, and we have thrown out their theories as essentially
useless for contemporary science.

I am also respectful of wise clinical psychologists, but
psychological theories are mush. They continue to be so, long
after Thorndike. There is no excuse for this, yet we must respect
these as serious theories anyway? I will grant you that clinical
psychologists are scientists, but not in the modern, physical sense
of the word that I have advocated (I have repeatedly suggested that
PCT is a physical science). Conventional psychologists are no more
and no less scientific than the wise men who wrote the Old
Testament, the Torah, the book of Tao, Buddha, Confucius, etc. The
discussion of what is science or a scientific approach came down to
the observation that it is a matter of private opinion. These
concepts are obviously carefully controlled, subjectively held
systems concepts. Every individual develops and holds their own
systems concepts.

Without a concept of a mechanism that holds up when tested, you
cannot build a science that "works" to a high degree, whether in
chemistry, astronomy, physics or psychology. This has nothing to
do with respect for the individual.

In your response to Bill P.'s critique of Thorndike you say:

[Bruce Abbott (951101.1535 EST)]

   This misstates the case. Probability is inferred from
   relative frequency (the observations), but the probability of
   a behavior is assumed to be a function of some as-yet-unknown
   mechanism in the brain, operating in conjunction with current
   perceptual inputs. The probability itself does not cause
   anything.

Your appeal to and willingness to take seriously "some as-yet-
unknown mechanism" is a major reason for the mushiness of
reinforcement theory. This kind of reasoning is not falsifiable.
This is why discussions around it become fruitless.

Shannon Williams (951101) (2 posts) made lucid observations,
further illustrating the mushiness of reinforcement theory:

   S-R describes a correlation between environment and behavior.
   But correlation is not cause. If you cannot visualize a
   causal mechanism, then you delude yourself if you think you
   hypothesize about cause. What is worse is: a hypothesis that
   does not have a causal mechanism is not subject to error. It
   is infallible. And progress stops because if you cannot see
   where a hypothesis needs improvement, what would cause you to
   improve it? And what would cause you to turn away from it?

Debate continued. You explained Thorndike's observations:
[Bruce Abbott (951102.1115 EST)]

   QUESTION 2

   During this ongoing activity, the door suddenly opened,
   presenting the cat an obvious escape route. This immediately
   satisfied the reference for the "find-a-way-out" system and
   thus reestablished control by the "minimize distance to food"
   system.
To talk about a "find-a-way-out" system and a "minimize distance to
food" system strikes me as a version of conventional talk about
behavior. I read: "find-a-way-out" behavior and "minimize distance
to food" behavior. This is a mushy way to discuss PCT, and this
kind of talk is what makes conventional psychological theory mush.

With PCT and especially HPCT we recognize that there is no such
thing as a "find-a-way-out" system or "minimize distance to food"
system. You are guessing about reference signals in a hierarchy
and doing a poor job of it. I don't think I have ever had a
reference for "minimum distance to food". Have you? (I did not
smear Christine's face with wedding cake).

   The classic answer is that the cat is FORMING AN ASSOCIATION.
   That is, it is beginning to relate the perceptions that were
   active at the time the door opened to the opening of the door.

   Why to the opening of the door? Because that event was at the
   time very important to the cat; it eliminated error in one of
   the cat's perceptual control systems at a time when its usual
   actions were proving incapable of doing so.

   However, there is a problem confronting the cat, and that is
   to identify which if any of its perceptions correlates with
   the opening of the door--the "assignment of credit" problem.
   It could have been a coincidence. It might have been produced
   by something the cat DID (i.e., cause-effect). If the latter,
   what was the critical element? What perception arising from
   the cat's ongoing behavior must be recreated? The perception
   of moving from left to right across the box? The perception
   of the pole pressing against the skin of the neck? Something
   else? If the cat lacks the capacity to reason it out, to
   develop and test hypotheses, what other mechanism might lead
   to a workable solution?

   My guess would be some kind of "coincidence detector." Back
   in the box for Trial 2, the cat's coincidence detector does
   not yet have enough input to narrow down the possibilities.
   So the cat goes back to its usual activities, under the
   control of those systems previously described, and probably
   others. With time, the cat's activities carry it across the
   pole for the second time, and again the door opens. The
   coincidence detector is starting to develop an association
   (correlation) between the perception of "doing _this_ as
   opposed to _that_"--that this particular goal was being pursued
   (e.g., perceive rubbing against the pole) and not some other.
   So the cat attempts to repeat what it was doing at the time
   the door opened. It rubs against the pole, but not hard enough
   to spring the latch. No coincidence. A little later, the cat
   tries again, and the door opens. What was different? More
   pressure against the pole? A different angle of approach?
   The coincidence detector adjusts its parameters. Ultimately,
   what is being adjusted is a set of reference levels having to
   do with approaching the pole from a given direction and making
   contact with it, with at least a certain amount of pressure,
   along a particular surface of the body. Approach from another
   direction, contact with a different part of the body, would
   also work, but the cat's mechanism does not "know" that. So
   the now efficiently-escaping cat is observed to repeat a
   highly stereotyped movement against the pole on each trial.

   Hans Blom (951101) has already proposed a mechanism similar to
   the one I have just described:

Bruce, I cannot see that you have proposed any kind of mechanism in
your description above. At best, you have painted an implausible
flow chart of words and images. It cannot be tested. It is
subject to different interpretation by every reader. The
"mechanism" you describe above is mush all the way through. And it
is mush coming from Bruce Abbott TODAY, not a century ago.

The next day Bill P. commented on the same post:

[Bill Powers (951102.1420 MST)]

   I think your discussion, EXPLAINING THORNDIKE'S OBSERVATIONS,
   is reasonable and well-ordered. I wonder, however, if you
   recognize just what an enormous mouthful you are biting off.
   . . .
   I think this is getting me close to my real objections. I
   think that in constructing his "simple" experiment Thorndike
   actually set up an enormously complex situation which seemed
   simple only because Thorndike chose to look at it in a simple
   way.

My personal view of this is that psychologists study enormously
complex phenomena. They appear not to be interested in a simple
thing like how a person can bend a single finger at will. That is
uninteresting "finger-bending behavior", taken for granted as
magic, explanation not expected. Hardly the stuff grants are made
of. Yet a PCTer need only stretch and bend a single finger to
reassure himself or herself that PCT offers a sound explanation for
life and our experience.
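
For concreteness, here is what a working-model explanation looks like
for holding a finger at a chosen angle: one negative feedback loop with
an integrating output function. This is a minimal sketch; the gain,
time step, and disturbance are illustrative numbers I have picked, not
parameters from any published PCT model.

# Minimal negative-feedback control loop of the kind PCT proposes
# for a single perceived variable, here a finger angle in degrees.
# All parameters are illustrative only.

def simulate(reference=30.0, gain=50.0, dt=0.01, steps=1000):
    output = 0.0             # muscle output
    disturbance = 10.0       # constant external push on the finger
    angle = output + disturbance
    for _ in range(steps):
        error = reference - angle        # compare perception to reference
        output += gain * error * dt      # integrating output function
        angle = output + disturbance     # environment: angle follows both
    return angle

print(round(simulate(), 2))   # -> 30.0: reference met despite the push

Change the disturbance and the output changes to oppose it while the
perceived angle stays at the reference. A verbal "finger-bending
behavior" description predicts none of this; the loop does.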

By theorizing about apparently complex phenomena, psychologists have
constructed systems of explanations that I visualize as the upper
stories of high-rise houses of cards -- without any foundation,
first or second story. They hang in thin air. Many different
versions exist, competing with each other, none connected to terra
firma.

Psychologists say that this is the best that can be done, and are
firmly convinced that when a physical foundation is eventually
found, it must necessarily connect with their research (since it is
scientifically sound) and validate it retroactively. Psychology
will then take a leap forward, joining the physical sciences, and
all the efforts of pioneering EAB psychologists will fit like
pieces in the then-completed puzzle.

PCT lays a physical foundation for a new psychology, but it turns
out that this foundation is located in the next county, and does
not connect with or support existing ideas. PCT demonstrates
clearly that the phenomena that are observed and explained by
contemporary ideas are illusions.

All year, I have not seen you agree that reinforcement is an
illusion. (The most recent comment: [Bill Powers (951123.0700)].
You keep talking about alternative explanations, as if they somehow
have equal validity.)

The houses of cards that have been built so elaborately, without
foundation, hanging in thin air, are bound to collapse. They are
mush.

Bill P. ended his post:

   The points you have brought up are points that Thorndike never
   considered, yet they are things that obviously have to be
   established before we can say ANYTHING scientific.

You were terribly upset by Bill's post. Was it because the obvious
implication of his statement was that Thorndike's EAB approach was
mush, and by implication that EAB is mush? That's what I suspect.
But Bill has told you dozens of times in the past year that
reinforcement theory is mush, based as it is on the study of an
illusion.

Discussing explanations with Shannon Williams you say:
[Bruce Abbott (951106.1055 EST)]

   Now I know what you are calling an explanation, although I do
   not agree that there is only one type of explanation. You are
   looking for a mechanistic explanation, as opposed to, say, a
   functional one.

I find this mushy, too. Functions never exist without a mechanism
that makes them appear, not in a physical universe. A mechanistic
explanation and a functional explanation are necessarily the same
thing. The mechanism is what physical science explains and what
PCT explains. You have indulged several times in painting a wordy
flow chart of "functions", none of them supported by any plausible
mechanism. This does not build a science that can stand the test
of time.

Discussions of Newton and Copernicus have been most interesting.
The analogy falls down, however. Tycho Brahe made observations of
angles, times, and other physical quantities. He did not interpret
his findings, and thus did not distort them with his preconceived
ideas. Johannes Kepler studied the data and observed that if the
heavenly bodies moved in ellipses (a mechanism), the model would
fit the observations. Modern physical science recognizes the
validity of Tycho's observations and Kepler's analysis. Newton
built on this.

By comparison, Thorndike made a large number of wild, unspoken,
subjective assumptions flavoring his verbal descriptions of his
"data". Analysis that follows from this is way off base.
Scientists have built on Thorndike and his guesswork analysis,
creating an ever more elaborate structure of guesses. Contemporary
EAB apparently suffers the same disease. Remember your admiration
for the EAB analysis of the goose rolling eggs, contrasted with
Bill P.'s posts from 1991? We cannot build on the ANALYSIS of EAB
research as you have claimed, because most of it is guesswork of
poor quality. Not all of it is poor, however, as we have already
conceded. The continuing discussion has recently turned to IV-DV.
I shall pull a post on IV-DV from the archives and post it
immediately below.

The thread goes on and on. I shall end this post soon.
Debating Rick you lament:
[Bruce Abbott (951110.1250 EST)]

   Could you describe this moderate position, please?

   Well, I've been trying to for about a year now. As your
   question so amply illustrates, it's been a waste of my time.

I don't think your participation on CSG-L has been a waste of time.
Seems to me that you have reconsidered a large number of your own
convictions already, sharply limiting your claims of the validity
of reinforcement theory and other accepted truths. You have drawn
out the best in Bill P. over and over again. PCTers and lurkers
have learned more about reinforcement theory, PCT and the arguments
on both sides. Seems to me that you have reorganized your firmly
held principles and systems concepts a great deal in the past year,
but that there is more to go before you become a genuine PCT
scientist, free from beliefs in and co-dependence on the mush of
reinforcement theory :-).

I'll end by going back to the beginning:

[Bruce Abbott 95-10-31 17:27:25 EST]

   What I have wanted to challenge are some of your systems
   concepts, so carefully constructed by your mind over a long
   time, still resisting disturbances.

   Fair enough. That is precisely what I have been doing with
   regard to certain systems concepts of yours.

Now, what systems concepts of mine did you mean to challenge?

Best, Dag

···

-------------------------------

Here is the PCTDOCS archive post on IV-DV I promised above.

-------------------------------

STUDY_IV.DV
Independent Variable - Dependent Variable

Unedited posts from archives of CSG-L (see INTROCSG.NET):

Date: Wed Apr 28, 1993 6:20 am PST
[From Bill Powers (930428.0700)]

General, on IV-DV:

IV = Independent Variable; DV = Dependent Variable

The term "IV-DV" threatens to degenerate on this net into a
stereotype of an approach to human behavior. All that this phrase
means is that one variable is taken to depend on another and the
degree and form of the dependence is investigated experimentally.
This is a perfectly respectable scientific procedure. Want to know
how the concentration of salt affects the boiling point of water?
Keep the atmospheric pressure constant, carefully vary the salt
content, and carefully measure the boiling point. You can find
relationships like this throughout the Handbook of Chemistry and
Physics, and so far nobody has suggested anything methodologically
wrong with these tables and formulae.
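
The salt example is worth spelling out, because it shows the kind of
lawfulness the behavioral literature rarely approaches. A sketch using
the standard ebullioscopic relation from any chemistry handbook (the
ideal van 't Hoff factor of 2 slightly overstates the effect for real
NaCl solutions):

# Boiling-point elevation: dT = i * Kb * m. Handbook values for water.

KB_WATER = 0.512    # ebullioscopic constant of water, degC * kg / mol
I_NACL = 2          # ideal van 't Hoff factor: NaCl -> Na+ + Cl-
M_NACL = 58.44      # molar mass of NaCl, g / mol

def boiling_point_c(grams_nacl, kg_water):
    molality = (grams_nacl / M_NACL) / kg_water     # mol per kg water
    return 100.0 + I_NACL * KB_WATER * molality     # degC at 1 atm

print(round(boiling_point_c(58.44, 1.0), 2))   # -> 101.02

Vary the salt (the IV), measure the boiling point (the DV), and the
formula predicts the result to the precision of the thermometer.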

If we're going to object to a procedure for investigating behavior,
let's not indulge in synecdoche, but say exactly what it is about
the method to which we object. There can be no valid objection to
the IV-DV approach itself.

The basic problem with the IV-DV approach as used in the bulk of
the behavioral sciences is that it is badly used; that bad or
inconclusive measures of IV-DV relationships are not discarded, but
are published. The basic valid approach has been turned into a
cookbook procedure that substitutes crank-turning for analysis,
thought, and modeling. The standards for acceptance of an apparent
IV-DV relationship have been lowered to the point where practically
anything that affects anything else, by however indirect and
unreliable a path, for however small a proportion of the
population, under however ill-defined a set of circumstances, is
taken as a real measure of something important, and is thenceforth
spoken of as if it were just as reliable a relationship as the
dependence of the boiling point of water on the amount of dissolved
salt.

While I was in Boulder, I spent some time in the library looking
through a few journals. By chance, I looked first through two
issues of the 1993 volume (29) of the Journal of Experimental
Social Psychology. With few exceptions, the articles were of the
form "the effect of A on B." One article went further: the title
was "Directional questions direct self-conceptions."

All of the articles rested on some kind of ANOVA, primarily
F-tests, and the justification for the conclusions was cited, for
example, as "F(1,82) = 7.88, p < 0.01." No individual data were
given; it was impossible to tell how many subjects behaved contrary
to the hypothesis or showed no effect. There was no indication,
ever, that the conclusion was not true of all homo sapiens.

I suppose that a person who understood F-tests (how about some
help, Gary) might be able to deduce the number of people in such
studies who didn't show the effect cited as universal. Even I could
see, in some cases, that there had to be numerous exceptions. For
example, paraphrasing,

Subjects covertly primed rated John less positively (M = 21.32)
than subjects not primed (M = 22.78). Ratings were significantly
correlated with the independent "priming" variable: r(118) = 0.35,
p < 0.001.

[Skowronski, J. J., "Explicit vs. implicit impression formation: The
differing effects of overt labeling and covert priming on memory
and impressions," J. Exp. Soc. Psychol. _29_, 17-41 (1993)]
When means differ by only 1.46 parts out of 22, it's clear that
many of the 120 students must have violated the generalization;
the conclusion would be true of only something close to half of
the students. The coefficient of uselessness is 0.94, showing the
same thing. The authors are teasing a small effect out of an almost
equal number of examples and counterexamples. In another study,
"When warning succeeds ...", a rating scale ran from -5 to +5, and
the mean self-ratings for one case were 0.89 and in the other
-0.92. A large number of the subjects must have given ratings in
the opposite order from the one finally reported.
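
The "coefficient of uselessness" is presumably what is elsewhere
called the coefficient of alienation, sqrt(1 - r^2): the proportion of
the standard deviation of the DV left unexplained by the IV. The
arithmetic checks out:

import math

r = 0.35                     # correlation reported by Skowronski (1993)
print(round(math.sqrt(1 - r**2), 2))   # -> 0.94, the figure in the text
print(round(100 * r**2, 1))            # -> 12.2, percent of variance explained

A correlation of 0.35 sounds respectable until restated this way: the
IV accounts for about 12 percent of the variance in the ratings.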

So what we're talking about here is not a bad methodology, but bad
science based on equivocal findings.

The IV-DV approach is not incompatible with a model-based approach
or with obtaining highly reliable data. In the Journal of
Experimental Psychology - General, I found a gem by Mary Kay
Stevenson, "Decision- making with long-term consequences: temporal
discounting for single and multiple outcomes in the future" (JEP-
General, _122_ #1, 3-22 (1993). Mary Kay Stevenson, 1364 Dept. of
Psychology, Psychological sciences building, Purdue University, W.
Lafayette, Indiana 47907-1364

This paper used old stand-bys like questionnaires and rating
scales, but it had some rationale in the observation that during
conditioning, delaying a consequence of a behavior lowers the
strength of the conditioning. It also freely postulated a thinking
organism making judgements -- this was actually an experiment with
high-level perceptions. Moreover, there was a systematic model
behind the analysis, and an attempt to fit an analytical form to
the data rather than just do a standard ANOVA.

Furthermore -- oh, unheard-of procedure -- Ms. Stevenson actually
replicated the experiment with 5 randomly-selected individuals,
fitting the model to each individual's data and verifying that the
curve for each one was concave in the right direction.

The mathematical model predicted between 97 and 99 percent of the
variance in the data.
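
The post does not reproduce Stevenson's actual equation, so the sketch
below substitutes a common analytical form from the temporal-discounting
literature, the hyperbola V = A / (1 + kD), and synthetic data. It
illustrates the per-individual model fitting being praised here, not
her specific model.

import numpy as np
from scipy.optimize import curve_fit

def hyperbolic(delay_days, k):
    # subjective value of a nominal $100 outcome after a delay
    return 100.0 / (1.0 + k * delay_days)

delays = np.array([0.0, 7.0, 30.0, 90.0, 180.0, 365.0])

# One synthetic "subject": true k = 0.02 plus rating noise.
rng = np.random.default_rng(0)
ratings = hyperbolic(delays, 0.02) + rng.normal(0.0, 2.0, delays.size)

(k_hat,), _ = curve_fit(hyperbolic, delays, ratings, p0=[0.01])
pred = hyperbolic(delays, k_hat)
r2 = 1 - np.sum((ratings - pred)**2) / np.sum((ratings - ratings.mean())**2)
print(f"fitted k = {k_hat:.3f}, variance accounted for = {r2:.1%}")

Fitting the curve separately to each subject, and checking its shape
each time, is exactly the kind of check an ANOVA on group means never
provides.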

I didn't have time to read the article carefully, but it certainly
seemed to show that high standards were applied and that an IV-DV
approach can yield data that anyone would call scientific. All
that's required is that one think like a scientist. A LOT of work
went into this paper. If only papers in psychology done to this
standard were published, all the different JEPs would fit into a
single issue.

In JEP-Human Perception and Performance, there was a good
control-theory experiment:

Viviani, P. and Stucchi, N., "Biological movements look uniform:
evidence of motor-perceptual interactions," JEP-HPP _18_ #3,
603-623 (August 1992).

Here the authors presented subjects with spots of light moving in
ellipses and "scribbles" on a PC screen, and had them press the ">"
or "<" key to make the motion look uniform (as many trials as
needed). The key altered an exponent in a theoretical expression
used to relate tangential velocity to radius of curvature in the
model. The correlation between the formula with an exponent of 2/3
(used as a generative model) and the subjects' adjustments of the
exponent was 0.896 (slope = 0.336, intercept = 0.090).

This is just the kind of experiment a PCTer would do to explore
hypotheses about what a subject is perceiving. By giving the
subject control over the perception in a specified dimension, the
experiment allows the subject to bring the perception to a
specified state -- here, uniformity of motion -- and thus reveals
a possible controlled variable (at the "transition" level?). The
authors didn't explain what they were doing in that way, but this
is clearly a good PCT experiment. Even the correlation was
respectable, if not outstanding (the formula was rather arbitrary,
so it should be possible to improve the correlation considerably by
looking carefully at the way the formula misrepresented the data).
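
The post does not quote the authors' exact expression. The standard
"two-thirds power law" in this literature relates angular velocity A to
curvature C as A = K * C^(2/3), which is equivalent to tangential
velocity v = K * R^(1/3) for radius of curvature R. A sketch of how
such a stimulus could be generated, with the exponent as the parameter
the subjects effectively adjusted:

import numpy as np

def ellipse_speed_profile(a=2.0, b=1.0, beta=1/3, n=360):
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    # radius of curvature of the ellipse x = a*cos(t), y = b*sin(t)
    R = (a**2 * np.sin(t)**2 + b**2 * np.cos(t)**2) ** 1.5 / (a * b)
    return R ** beta    # tangential speed along the path, up to a constant

v = ellipse_speed_profile()
print(round(v.max() / v.min(), 2))   # -> 2.0: speed varies around the path

With beta = 0 the dot's speed would be truly uniform; the finding is
that subjects set the exponent near the biological value instead, so
motion that follows the movement law is what "looks" uniform.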

There is a world of difference between the kinds of experiments
reported in J Exp. Soc. Psych and the two described above (and
between the two described above and most of the others in JEP).

From good experiments, even if one doesn't buy the interpretation,
one can go on to better experiments. From bad experiments there is
no place to go: you say "Oh" and go on to something completely
different.

Best, Bill P.
-------------------------------------

End archive file

[Martin Taylor 951129 12:00]

Dag Forssell (951128 1420) to Bruce Abbott

Dag, according to your analysis, ALL science is potential mush. It becomes
mush as soon as some future "scientist" discovers a better way of looking
at a phenomenon. The standards of the time when the "scientist" lives
are irrelevant to whether what he/she was doing is science. You accord
the honour of the label "scientist" to practitioners of the past while
denying that what they did was science.

PCT is based on the notion that perception is NOW, whether the perception
is of what is going on now, of a remembered past, or of an imagined time
or place. According to that notion, you are saying that there can be
no perception of something that can legitimately be called "science,"
because it depends on the perceptions of people yet unborn, and will
continue to do so until humanity and the future species toward which
humanity may evolve have all died.

If there's no possibility of anything we call "science" being reliably
not "mush", then why bother with the label? Let's just call what we
do "mush-looking."

As you may guess, I don't agree at all with your view that what constitutes
"science" changes with future history. "Science" to me is the application
of the best approach known at the time to the acquisition of knowledge
about how the universe works. My rusty high-school Latin suggests
the verb "scire"--to know.

There was a time when the best approach to knowing was thought to be
through the revelations of seers. Some people still believe in that
approach, but we don't call it science because we think there are
better ways. The content of science is scientific knowledge.

The knowledge may change over time, and what was "known" may later be shown
to be wrong; even the best methods may be improved upon later. But that
makes what was done in the past no less science than what is done now
according to the standards of today. Our science, even PCT, is not
guaranteed to be the "forever standard" against which all past and future
approaches to knowledge should be judged.

···

--------------

Without a concept of a mechanism that holds up when tested, you
cannot build a science that "works" to a high degree, whether in
chemistry, astronomy, physics or psychology. This has nothing to
do with respect for the individual.

A "mechanism" is only a reference to a functional description in some
other domain of knowledge, usually one that has a wider range of application
than the domain for which you use the mechanism. At base, all mechanisms
rest on the unknown. The benefit of a mechanism is not that it explains
anything "really", but that when the mechanism is provided with a very
small set of boundary conditions, it produces a large set of results that
correspond satisfactorily with observations. A good curve fit does as
much, but is devalued because it applies only within the specific data
set, whereas the mechanism applies to other sets of observations of "the
same kind." When you get down to the basic unknown, all "mechanism" becomes
curve-fitting, but curve fitting that applies under the widest range of
circumstances, such as "all interactions involving material objects and
electromagnetic effects."

All in all, I think your message to Bruce constitutes largely mush, insofar
as it refers to the work of people doing their best by the standards of
their times. Where you refer to the work of people presently acting as
"scientists" I make no judgement (in this posting).

Martin

[From Dag Forssell (951201 1330)]

[Martin Taylor 951129 12:00]

   All in all, I think your message to Bruce constitutes largely
   mush, insofar as it refers to the work of people doing their
   best by the standards of their times.

Yes, this does become a matter of interpretation and personal
opinion. I did not introduce the historical perspective to this
discussion, but it is a helpful one to apply.
I have found Thomas S. Kuhn's book _The Structure of Scientific
Revolutions_ most interesting. I have posted in the past on
articles I found in _American Psychologist_ that show how social
scientists quote Kuhn without taking his message to heart. [See
SCIENCE.PSY on the PCTDOCS disk]. The implications for
contemporary psychological theories are just too damning, I
suppose. I am comfortable calling anyone who is doing their best
a scientist, including a toddler who is in fact doing a magnificent
job of sorting out and making sense of their world. But since
anyone and everyone who wants to be called a scientist is one, we
develop other distinctions. Hence the popular terms "hard" and
"soft" science, where hard means science that works and soft means
science that "seems" to work -- sometimes.

   Where you refer to the work of people presently acting as
   "scientists" I make no judgement (in this posting).

When we discussed the difference between knowledge and belief last
spring, Bill P. identified SUPPORTABILITY versus ACCEPTANCE as the
only difference. Skepticism is required at the
initial consideration of ideas or stories, so that only supportable
ideas become beliefs. Without skepticism, any idea or story, no
matter how weak or non-existent its support, can be accepted as
belief.

Mary P. suggested to me recently that she has begun to think of the
"Systems Concept" level as "Belief" level. I think this has merit.
We all develop beliefs about all kinds of subjects and aspects of
life. Clusters of beliefs might be called identity, and we can
have several of those. Personally, I am a man, husband, father,
engineer, PCTer, atheist, Swede, translator, educator, semi-
vegetarian, etc. Our many beliefs make up reference signals which
determine how we live our lives.

Physical scientists BELIEVE in Newton's laws of motion, in
Einstein's theory of relativity, in Ohm's law, in chemical bonds
etc. They have (in most cases) personally replicated the basic
experiments so they know what the phenomena dealt with by these
laws are, and they know that the laws predict correctly with
99.99999999% correlation and better -- to the limit of measuring
instruments. This kind of belief is therefore well supported.
This kind of belief when applied to projects has allowed us to
travel to the moon and beyond. It "works". Physical scientists
have learned to expect this kind of dependability. They are
skeptical of anything less. I think everyone ought to be, but only
a minuscule portion of our population appears to develop this kind
of expectation and corresponding skepticism.

Other "scientists" BELIEVE in things that are not well supported at
all, where correlations in experiments are so weak as to be almost
meaningless, where applications work sometimes or not at all, and
the subjects of discussion can easily be shown (by applying
physical science, PCT style) to be products of fertile, unchecked
imagination. But young scientists BELIEVE just as strongly once
they have ACCEPTED what they have been taught, even if it is
lacking in support. When "scientists" do not expect precise
predictions and do not reject suggested "explanations" which fail
to predict with high accuracy, you get what I have called mush and
the stage is set for additional, future scientific revolutions. On
CSG-L we BELIEVE (well supported, we think and are prepared to
demonstrate) that PCT lays the foundation for a new, lasting order
in the social sciences, but as Kuhn shows, and we experience, this
new movement of supportable BELIEF is up against existing, firmly
entrenched but unsupported BELIEF.

As we have discussed over and over on CSG-L, the standards set in
the social sciences TODAY are pitifully low. Any confusing "verbal
flowchart/functional" "explanation" goes, no matter how
implausible, and is readily taken seriously and BELIEVED by minds
that have not learned to be skeptical.

The low standards and lack of healthy skepticism exhibited by
"scientists" in the life sciences doom them to be merchants of
mush rather than practitioners of what ought to be, and can be,
solid science TODAY.

The standards we strive for on CSG-L are the high standards of
physical science TODAY. Anything less invites and perpetuates
mush.

Best, Dag