p-hacking

Phil would have appreciated this. You will too, Rick.

http://www.chronicle.com/article/Spoiled-Science/239529

I agree; Phil would have appreciated this and probably applauded and said, “Finally!”

···

From: Bruce Nevin [mailto:bnhpct@gmail.com]
Sent: Tuesday, March 21, 2017 10:02 PM
To: Control Systems Group Network (CSGnet) CSGNET@listserv.illinois.edu
Subject: p-hacking

Phil would have appreciated this. You will too, Rick.

http://www.chronicle.com/article/Spoiled-Science/239529

[From Bruce Abbott (2017.03.22.1015 EST)]
Yes, Phil Runkel would have appreciated this. Please keep in mind, however, that Phil was not opposed to the methodology: It is the “casting nets” described in his Casting Nets and Testing Specimens.

Concern about so-called p-hacking is not new (although the catchy new label for it appeared relatively recently). However, renewed attention to the issue apparently was kicked off by this article in 2011:

https://www.ncbi.nlm.nih.gov/pubmed/22006061

Several decades ago I told my future coauthor on the research methods book that I thought statistical significance testing was the worst thing ever to happen to experimental psychology. I asked him recently if he remembered that. He said “Yes, and at the time I thought you were crazy.”

The term “p-hacking” obviously borrows from the idea of computer “hacking,” but the word “hacking” doesn’t mean the same thing in the two cases. Computer hacking involves breaking into a password-protected computer system in order to steal information or wreak havoc. In “p-hacking” the term refers to massaging the data in various ways until one obtains a statistically significant result, one in which the p-value (the probability of obtaining an effect at least as large as the one observed if chance alone is responsible for that difference) is less than some criterion value, typically p < .05, or 1 chance in 20. P-hacking involves such practices as conducting multiple tests on the same data set, which can result in “probability pyramiding” – getting at least one significant difference by chance, which is much more likely than the stated level of significance. It can also involve omitting data points that one can find some apparently reasonable excuse for excluding (e.g., outliers), eliminating certain conditions from the analysis that were included in the original design, and other such practices. My favorite example involves running a few subjects, doing the significance test, and repeating until one gets a statistically significant result. I recall reading one two-group study that included only three rats per group. As nobody sets out to run a group-based study with so few subjects, it was obvious to me that the author had conducted significance testing after running only the first three rats in each group, got p < .05, and quit. The power of a test (its ability to detect a real effect of the independent variable) would have been extremely low with such small samples, so the results almost certainly capitalized on chance.
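Both mechanisms described above can be put in numbers. The sketch below (Python, with hypothetical parameters, and using a known-variance z-test rather than the t-tests such studies actually report) shows the "probability pyramiding" arithmetic and then simulates the "run a few subjects, test, repeat" strategy under a true null:

```python
import random
from statistics import NormalDist

# Probability pyramiding: chance of at least one "significant" result
# when k independent tests are each run at alpha = .05 on null data.
alpha = 0.05
for k in (1, 5, 10, 20):
    print(f"{k:2d} tests -> P(at least one false positive) = {1 - (1 - alpha) ** k:.3f}")

# "Run a few subjects, test, repeat": optional stopping under a TRUE null
# (population mean 0, known sd 1, two-tailed z-test after every batch).
random.seed(1)
norm = NormalDist()

def stops_significant(max_batches=10, batch=5):
    data = []
    for _ in range(max_batches):
        data.extend(random.gauss(0, 1) for _ in range(batch))
        n = len(data)
        z = (sum(data) / n) * n ** 0.5          # mean / (1 / sqrt(n))
        if 2 * (1 - norm.cdf(abs(z))) < alpha:  # peek at the p-value
            return True                          # stop and declare "significant"
    return False

trials = 5000
rate = sum(stops_significant() for _ in range(trials)) / trials
print(f"False-positive rate with optional stopping: {rate:.3f}")  # well above .05
```

With twenty tests the chance of at least one spurious "significant" result is already about 64%, and peeking after every batch of five subjects pushes the nominal 5% error rate up severalfold, which is exactly why the three-rats-per-group study was suspect.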

Bruce A.

···

From: Bruce Nevin [mailto:bnhpct@gmail.com]
Sent: Tuesday, March 21, 2017 10:02 PM
To: Control Systems Group Network (CSGnet) CSGNET@listserv.illinois.edu
Subject: p-hacking

Phil would have appreciated this. You will too, Rick.

http://www.chronicle.com/article/Spoiled-Science/239529

[Bruce Nevin 20170322.10:56 ET]

Bruce Abbott (2017.03.22.1015 EST)–

The term “p-hacking” obviously borrows from the idea of computer “hacking,” but the word “hacking” doesn’t mean the same thing in the two cases. Computer hacking involves breaking into a password-protected computer system in order to steal information or wreak havoc.

Just for fun, here’s a minor and extraneous correction re hacking. Not all who hack into computer systems are ‘black hat’ hackers, and not all hacking involves breaking into computer systems on the internet. I spent many years in the company of computer engineers. Many a time I heard one say admiringly to another “that’s a good hack!” about some innovative or unexpected coding trick.

First, from the Random House dictionary:

    6. Slang. to deal or cope with; handle: He can’t hack all this commuting.

    7. Computers. to devise or modify (a computer program), usually skillfully.

Definition 6 gives a hint as to the origin of the computer slang.

From cyberlaws.com:

Computer hacking refers to the practice of modifying or altering computer software and hardware to accomplish a goal that is considered to be outside of the creator’s original objective. Those individuals who engage in computer hacking activities are typically referred to as “hackers.”

This fits p-hacking, but even this definition can be read in too narrow a scope.

PC Magazine offers a more balanced survey of meanings that gives some hints to the history and evolution of the term at

http://www.pcmag.com/encyclopedia/term/44046/hack

Beyond that, reverse-engineering is a form of hacking.

http://hackeracademy.com/lesson/reverse_engineering [and many, many other links for “is reverse engineering hacking”]

Consequently, to the extent that PCT is reverse-engineering of the human nervous system’s interaction with the environment, PCT can be legitimately said to be a hack. Strictly speaking, reverse engineering is a tool and the hack is a subsequent modification or repurposing exploiting understanding gained with the tool.

···

On Wed, Mar 22, 2017 at 10:14 AM, Bruce Abbott bbabbott@frontier.com wrote:

[From Bruce Abbott (2017.03.22.1015 EST)]
Yes, Phil Runkel would have appreciated this. Please keep in mind, however, that Phil was not opposed to the methodology: It is the “casting nets” described in his Casting Nets and Testing Specimens.

Concern about so-called p-hacking is not new (although the catchy new label for it appeared relatively recently). However, renewed attention to the issue apparently was kicked off by this article in 2011:

https://www.ncbi.nlm.nih.gov/pubmed/22006061

Several decades ago I told my future coauthor on the research methods book that I thought statistical significance testing was the worst thing ever to happen to experimental psychology. I asked him recently if he remembered that. He said “Yes, and at the time I thought you were crazy.”

The term “p-hacking” obviously borrows from the idea of computer “hacking,” but the word “hacking” doesn’t mean the same thing in the two cases. Computer hacking involves breaking into a password-protected computer system in order to steal information or wreak havoc. In “p-hacking” the term refers to massaging the data in various ways until one obtains a statistically significant result, one in which the p-value (the probability of obtaining an effect at least as large as the one observed if chance alone is responsible for that difference) is less than some criterion value, typically p < .05, or 1 chance in 20. P-hacking involves such practices as conducting multiple tests on the same data set, which can result in “probability pyramiding” – getting at least one significant difference by chance, which is much more likely than the stated level of significance. It can also involve omitting data points that one can find some apparently reasonable excuse for excluding (e.g., outliers), eliminating certain conditions from the analysis that were included in the original design, and other such practices. My favorite example involves running a few subjects, doing the significance test, and repeating until one gets a statistically significant result. I recall reading one two-group study that included only three rats per group. As nobody sets out to run a group-based study with so few subjects, it was obvious to me that the author had conducted significance testing after running only the first three rats in each group, got p < .05, and quit. The power of a test (its ability to detect a real effect of the independent variable) would have been extremely low with such small samples, so the results almost certainly capitalized on chance.

Bruce A.


From: Bruce Nevin [mailto:bnhpct@gmail.com]
Sent: Tuesday, March 21, 2017 10:02 PM
To: Control Systems Group Network (CSGnet) CSGNET@listserv.illinois.edu
Subject: p-hacking

Phil would have appreciated this. You will too, Rick.

http://www.chronicle.com/article/Spoiled-Science/239529


[From Bruce Abbott (2017.03.22.1135 EST)]

BA: Thanks, Bruce; I had forgotten about that use of the term “hacking.” Like most slang, its definition is evolving.

Bruce A.

[Bruce Nevin 20170322.10:56 ET] –

Bruce Abbott (2017.03.22.1015 EST)–

The term “p-hacking” obviously borrows from the idea of computer “hacking,” but the word “hacking” doesn’t mean the same thing in the two cases. Computer hacking involves breaking into a password-protected computer system in order to steal information or wreak havoc.

Just for fun, here’s a minor and extraneous correction re hacking. Not all who hack into computer systems are ‘black hat’ hackers, and not all hacking involves breaking into computer systems on the internet. I spent many years in the company of computer engineers. Many a time I heard one say admiringly to another “that’s a good hack!” about some innovative or unexpected coding trick.

First, from the Random House dictionary:

  6. Slang. to deal or cope with; handle: He can’t hack all this commuting.

  7. Computers. to devise or modify (a computer program), usually skillfully.

Definition 6 gives a hint as to the origin of the computer slang.

From cyberlaws.com:

Computer hacking refers to the practice of modifying or altering computer software and hardware to accomplish a goal that is considered to be outside of the creator’s original objective. Those individuals who engage in computer hacking activities are typically referred to as “hackers.”

This fits p-hacking, but even this definition can be read in too narrow a scope.

PC Magazine offers a more balanced survey of meanings that gives some hints to the history and evolution of the term at

http://www.pcmag.com/encyclopedia/term/44046/hack

Beyond that, reverse-engineering is a form of hacking.

http://hackeracademy.com/lesson/reverse_engineering [and many, many other links for “is reverse engineering hacking”]

Consequently, to the extent that PCT is reverse-engineering of the human nervous system’s interaction with the environment, PCT can be legitimately said to be a hack. Strictly speaking, reverse engineering is a tool and the hack is a subsequent modification or repurposing exploiting understanding gained with the tool.

···

/Bruce

On Wed, Mar 22, 2017 at 10:14 AM, Bruce Abbott bbabbott@frontier.com wrote:

[From Bruce Abbott (2017.03.22.1015 EST)]
Yes, Phil Runkel would have appreciated this. Please keep in mind, however, that Phil was not opposed to the methodology: It is the “casting nets” described in his Casting Nets and Testing Specimens.

Concern about so-called p-hacking is not new (although the catchy new label for it appeared relatively recently). However, renewed attention to the issue apparently was kicked off by this article in 2011:

https://www.ncbi.nlm.nih.gov/pubmed/22006061

Several decades ago I told my future coauthor on the research methods book that I thought statistical significance testing was the worst thing ever to happen to experimental psychology. I asked him recently if he remembered that. He said “Yes, and at the time I thought you were crazy.”

The term “p-hacking” obviously borrows from the idea of computer “hacking,” but the word “hacking” doesn’t mean the same thing in the two cases. Computer hacking involves breaking into a password-protected computer system in order to steal information or wreak havoc. In “p-hacking” the term refers to massaging the data in various ways until one obtains a statistically significant result, one in which the p-value (the probability of obtaining an effect at least as large as the one observed if chance alone is responsible for that difference) is less than some criterion value, typically p < .05, or 1 chance in 20. P-hacking involves such practices as conducting multiple tests on the same data set, which can result in “probability pyramiding” – getting at least one significant difference by chance, which is much more likely than the stated level of significance. It can also involve omitting data points that one can find some apparently reasonable excuse for excluding (e.g., outliers), eliminating certain conditions from the analysis that were included in the original design, and other such practices. My favorite example involves running a few subjects, doing the significance test, and repeating until one gets a statistically significant result. I recall reading one two-group study that included only three rats per group. As nobody sets out to run a group-based study with so few subjects, it was obvious to me that the author had conducted significance testing after running only the first three rats in each group, got p < .05, and quit. The power of a test (its ability to detect a real effect of the independent variable) would have been extremely low with such small samples, so the results almost certainly capitalized on chance.

Bruce A.

From: Bruce Nevin [mailto:bnhpct@gmail.com]
Sent: Tuesday, March 21, 2017 10:02 PM
To: Control Systems Group Network (CSGnet) CSGNET@listserv.illinois.edu
Subject: p-hacking

Phil would have appreciated this. You will too, Rick.

http://www.chronicle.com/article/Spoiled-Science/239529

[From Rick Marken (2017.03.22.1830)]

···

 Bruce Abbott (2017.03.22.1015 EST)-
BA: Yes, Phil Runkel would have appreciated this. Please keep in mind, however, that Phil was not opposed to the methodology: It is the “casting nets” described in his Casting Nets and Testing Specimens.

RM: Phil wasn’t opposed to the methodology when it was used to study groups; but he was very opposed to it when it was used to study individuals. Phil called his preferred methodology for studying individuals “testing specimens”, which, when applied to the study of living organisms, is the Test for the Controlled Variable (which, at the time Phil wrote about it, was just called The Test rather than the TCV). Phil describes this methodology starting on p. 117 of Casting Nets and Testing Specimens and in the chapter entitled “Testing Specimens” where, I’m honored to say, he uses a couple of my research studies as examples of the testing specimens methodology.

BA: Several decades ago I told my future coauthor on the research methods book that I thought statistical significance testing was the worst thing ever to happen to experimental psychology. I asked him recently if he remembered that. He said “Yes, and at the time I thought you were crazy.”

RM: Your thinking anticipates some current thinking on the issue. In 2010, an article by J. L. Rodgers in the American Psychologist described a quiet revolution that was happening in psychology, which involved a move away from statistical significance testing (which Rodgers called “null hypothesis statistical testing (NHST)”) to model testing. Sounds great, right? But unlike you, Rodgers thought of this as a methodological rather than a statistical analysis revolution. So I submitted a comment on the Rodgers article to American Psychologist but it was not accepted (the reason given was that there was “insufficient interest”;-). But I liked the comment so much that I put it up at my website and eventually included it in “Doing Research on Purpose”. The net version is at:

http://www.mindreadings.com/MMR.pdf

RM: The gist of the comment is that the revolution Rodgers describes is great but it is a statistical not a methodological revolution. I go on to explain that psychology does need a methodological revolution because the current methodology is based on the wrong model of behavior. The methodological revolution is aimed at testing to determine the variables controlled by individuals (testing specimens). I like to think Phil would have been very pleased had he lived to read it. I’d be interested in what others in the group think of the comment (if there is any interest;-)

Best

Rick


The term “p-hacking” obviously borrows from the idea of computer “hacking,” but the word “hacking” doesn’t mean the same thing in the two cases. Computer hacking involves breaking into a password-protected computer system in order to steal information or wreak havoc. In “p-hacking” the term refers to massaging the data in various ways until one obtains a statistically significant result, one in which the p-value (the probability of obtaining an effect at least as large as the one observed if chance alone is responsible for that difference) is less than some criterion value, typically p < .05, or 1 chance in 20. P-hacking involves such practices as conducting multiple tests on the same data set, which can result in “probability pyramiding” – getting at least one significant difference by chance, which is much more likely than the stated level of significance. It can also involve omitting data points that one can find some apparently reasonable excuse for excluding (e.g., outliers), eliminating certain conditions from the analysis that were included in the original design, and other such practices. My favorite example involves running a few subjects, doing the significance test, and repeating until one gets a statistically significant result. I recall reading one two-group study that included only three rats per group. As nobody sets out to run a group-based study with so few subjects, it was obvious to me that the author had conducted significance testing after running only the first three rats in each group, got p < .05, and quit. The power of a test (its ability to detect a real effect of the independent variable) would have been extremely low with such small samples, so the results almost certainly capitalized on chance.

Bruce A.


From: Bruce Nevin [mailto:bnhpct@gmail.com]
Sent: Tuesday, March 21, 2017 10:02 PM
To: Control Systems Group Network (CSGnet) CSGNET@listserv.illinois.edu
Subject: p-hacking

Phil would have appreciated this. You will too, Rick.

http://www.chronicle.com/article/Spoiled-Science/239529


Richard S. Marken

"Perfection is achieved not when you have nothing more to add, but when you
have nothing left to take away."
--Antoine de Saint-Exupery

[From Leeanne Wright (2017.03.22.12.27 AEST)]

Hi Richard/all,

I just read your comment and thought it was a good summary of the types of issues you write about in DRoP, which I read last year. Funnily enough, even though most undergraduate statistics in psychology starts with (and is mostly about) p-testing, your book helped me to understand the issues associated with this approach in a way that wouldn’t have been otherwise available to me. In advanced statistics we eventually got to model-making, which I enjoyed immensely (even though I thought it would be intimidating). I even managed an HD. I have followed this thread with interest, and have printed off the articles so that hopefully, when I go down to the intensive school next month and start looking for a research supervisor, I will not be forced to do the same old, same old…

I think this sort of discussion is really useful and interesting to students/newcomers by the way. At least it was for me.

Regards

Leeanne

···

Bruce Abbott (2017.03.22.1015 EST)-
BA: Yes, Phil Runkel would have appreciated this. Please keep in mind, however, that Phil was not opposed to the methodology: It is the “casting nets” described in his Casting Nets and Testing Specimens.

RM: Phil wasn’t opposed to the methodology when it was used to study groups; but he was very opposed to it when it was used to study individuals. Phil called his preferred methodology for studying individuals “testing specimens”, which, when applied to the study of living organisms, is the Test for the Controlled Variable (which, at the time Phil wrote about it, was just called The Test rather than the TCV). Phil describes this methodology starting on p. 117 of Casting Nets and Testing Specimens and in the chapter entitled “Testing Specimens” where, I’m honored to say, he uses a couple of my research studies as examples of the testing specimens methodology.

BA: Several decades ago I told my future coauthor on the research methods book that I thought statistical significance testing was the worst thing ever to happen to experimental psychology. I asked him recently if he remembered that. He said “Yes, and at the time I thought you were crazy.”

RM: Your thinking anticipates some current thinking on the issue. In 2010, an article by J. L. Rodgers in the American Psychologist described a quiet revolution that was happening in psychology, which involved a move away from statistical significance testing (which Rodgers called “null hypothesis statistical testing (NHST)”) to model testing. Sounds great, right? But unlike you, Rodgers thought of this as a methodological rather than a statistical analysis revolution. So I submitted a comment on the Rodgers article to American Psychologist but it was not accepted (the reason given was that there was “insufficient interest”;-). But I liked the comment so much that I put it up at my website and eventually included it in “Doing Research on Purpose”. The net version is at:

http://www.mindreadings.com/MMR.pdf

RM: The gist of the comment is that the revolution Rodgers describes is great but it is a statistical not a methodological revolution. I go on to explain that psychology does need a methodological revolution because the current methodology is based on the wrong model of behavior. The methodological revolution is aimed at testing to determine the variables controlled by individuals (testing specimens). I like to think Phil would have been very pleased had he lived to read it. I’d be interested in what others in the group think of the comment (if there is any interest;-)

Best

Rick

The term “p-hacking” obviously borrows from the idea of computer “hacking,” but the word “hacking” doesn’t mean the same thing in the two cases. Computer hacking involves breaking into a password-protected computer system in order to steal information or wreak havoc. In “p-hacking” the term refers to massaging the data in various ways until one obtains a statistically significant result, one in which the p-value (the probability of obtaining an effect at least as large as the one observed if chance alone is responsible for that difference) is less than some criterion value, typically p < .05, or 1 chance in 20. P-hacking involves such practices as conducting multiple tests on the same data set, which can result in “probability pyramiding” – getting at least one significant difference by chance, which is much more likely than the stated level of significance. It can also involve omitting data points that one can find some apparently reasonable excuse for excluding (e.g., outliers), eliminating certain conditions from the analysis that were included in the original design, and other such practices. My favorite example involves running a few subjects, doing the significance test, and repeating until one gets a statistically significant result. I recall reading one two-group study that included only three rats per group. As nobody sets out to run a group-based study with so few subjects, it was obvious to me that the author had conducted significance testing after running only the first three rats in each group, got p < .05, and quit. The power of a test (its ability to detect a real effect of the independent variable) would have been extremely low with such small samples, so the results almost certainly capitalized on chance.

Bruce A.

From: Bruce Nevin [mailto:bnhpct@gmail.com]
Sent: Tuesday, March 21, 2017 10:02 PM
To: Control Systems Group Network (CSGnet) CSGNET@listserv.illinois.edu
Subject: p-hacking

Phil would have appreciated this. You will too, Rick.

http://www.chronicle.com/article/Spoiled-Science/239529


Richard S. Marken

"Perfection is achieved not when you have nothing more to add, but when you
have nothing left to take away."
--Antoine de Saint-Exupery

[From Bruce Abbott (2017.03.23.1015 EST)]

[Martin Taylor 2017.03.22.10.18]

[From Bruce Abbott (2017.03.22.1015 EST)]

Yes, Phil Runkel would have appreciated this. Please keep in mind, however, that Phil was not opposed to the methodology: It is the “casting nets” described in his Casting Nets and Testing Specimens.

Concern about so-called p-hacking is not new (although the catchy new label for it appeared relatively recently). However, renewed attention to the issue apparently was kicked off by this article in 2011:

https://www.ncbi.nlm.nih.gov/pubmed/22006061

Several decades ago I told my future coauthor on the research methods book that I thought statistical significance testing was the worst thing ever to happen to experimental psychology. I asked him recently if he remembered that. He said “Yes, and at the time I thought you were crazy.”

MT: I don’t know how many decades ago that was for you, but it is the inverse of my experience. In my case it was my advisor who told me that "statistical significance testing was the worst thing ever to happen to experimental psychology."

BA: The early ‘80s, not long after I started my academic career at IPFW. I was doing experimental work from the Skinnerian perspective, which eschews the group-based null hypothesis significance testing approach and instead focuses on behavioral changes within the individual subject across treatments.

MT: Here’s a quote from the abstract of a paper on which one of his friends was senior author (Edwards, Lindeman and Savage, Psychol. Review., 70/3, p193ff): “A common feature of many classical significance tests is that a sharp null hypothesis is compared with a diffuse alternative hypothesis. Often evidence which, for a Bayesian statistician, strikingly supports the null hypothesis leads to rejection of that hypothesis by standard classical procedures.” My advisor suggested that the main reason for using significance tests was that it eased approval of publications by journal editors. I admit to having used it in that way quite often, but I have also objected when editors have asked for one.

BA: One reviewer of my methods text took issue with my assertion that the size of p is not an indicator of how likely it is that the treatment had a real effect. (I had claimed that it is inappropriate to think that a very small p-value (e.g., .000001) makes a treatment effect more likely than a larger p-value (e.g., .05).) The p-value estimates how probable a difference at least as large as the one observed would be, given that the null hypothesis is true and therefore the difference is a chance difference. It’s the probability of the data, given the null hypothesis, not the probability of the null hypothesis, given the data. However, the reviewer pointed out that one could estimate the latter if one is willing to state an a priori probability and apply Bayes’ Theorem.
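The reviewer’s suggestion is a one-line application of Bayes’ theorem. In the sketch below (Python), the likelihoods and the prior are hypothetical numbers chosen only for illustration; they do not come from the discussion above:

```python
def posterior_h0(p_data_h0: float, p_data_h1: float, prior_h0: float) -> float:
    """P(H0 | data) by Bayes' theorem for a simple two-hypothesis setup:
    P(H0|D) = P(D|H0)P(H0) / (P(D|H0)P(H0) + P(D|H1)P(H1))."""
    num = p_data_h0 * prior_h0
    return num / (num + p_data_h1 * (1 - prior_h0))

# Suppose the observed result would occur with probability .05 under the
# null and .26 under the alternative, starting from even prior odds:
print(round(posterior_h0(0.05, 0.26, 0.5), 3))   # -> 0.161
```

The null is weakened but far from refuted, which is the reviewer’s point: the probability of the null given the data depends on the prior and the alternative, not on the p-value alone.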

MT: Some years later, I had a paper rejected (by another of my graduate school colleagues) because I refused to use significance tests to prove that there was a difference between men and women in a colour discrimination experiment in which no woman scored below 80% and no man scored over 20% (or something like that). I took the position that you can always find an effect with a significance test if you look hard enough, but what mattered was what Ward Edwards called “The Interocular Traumatic Test.” The wording I used in my response to the editor was that with enough data you could find an effect of the inclination of the rings of Saturn on the curl in a puppy-dog’s tail.

BA: Lowell Shipper, from whom I took graduate courses in statistical analysis, mentioned the same test: You look at the data, and the effect hits you between the eyes! I had a similar encounter with a reviewer who wanted me to perform a statistical test on data in which there was NO OVERLAP in treatment samples.

MT: One of my graduate school colleagues did a study with the opposite effect of the p-hacking described in the article. Someone had proposed a theory that said there should be a weak population effect in some social study. Apparently it must have been considered an important theory, because six papers from different research groups had all found “no significant effect”. The consensus was that the theorized effect did not occur, because “no significant effect” was mentally translated as “no effect”. But my colleague noted that all the studies showed a consistent effect in the direction theorized (by a significance test, a 1/64 chance of happening if the actual effect magnitude was zero), and did a meta-analysis that showed the probability that there was no effect was actually vanishingly small. The totality of the studies that had led to a consensus that the theory was wrong actually provided strong evidence that the theory was right (or at least more right than the only other alternative tested).
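The 1/64 figure here is just a one-tailed sign test: six independent studies, each equally likely to fall on either side of zero if the true effect were nil, all landing in the predicted direction. A minimal sketch (Python):

```python
from math import comb

def sign_test(k: int, n: int) -> float:
    """P(at least k of n independent studies fall in the predicted
    direction by chance alone, each direction equally likely)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

print(sign_test(6, 6))   # -> 0.015625, i.e. 1/64
print(sign_test(5, 6))   # five or more of six: 7/64, still about .11
```

So even ignoring the sizes of the six effects, their shared direction alone was already "significant" at the conventional .05 level, which is what the individual-study significance tests obscured.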

BA: Interesting. I had a similar result in a study involving the effect of predictable versus unpredictable stress on pain sensitivity. I ran the experiment three times with changes I hoped would produce a larger difference. I obtained essentially the same result each time but in each case the effect failed to reach statistical significance. This was prior to the introduction of meta-analysis, which might have shown the effect.

MT: So we have in common a decades-long belief that "statistical significance testing was the worst thing ever to happen to experimental psychology".

BA: Thumbs up!

Bruce

[From Rick Marken (2017.03.23.1150)]

···

Leeanne Wright (2017.03.22.12.27 AEST)–

Hi Richard/all,

LW: I just read your comment and thought it was a good summary of the types of issues you write about in DRoP, which I read last year. Funnily enough, even though most undergraduate statistics in psychology starts with (and is mostly about) p-testing, your book helped me to understand the issues associated with this approach in a way that wouldn’t have been otherwise available to me. In advanced statistics we eventually got to model-making, which I enjoyed immensely (even though I thought it would be intimidating). I even managed an HD. I have followed this thread with interest, and have printed off the articles so that hopefully, when I go down to the intensive school next month and start looking for a research supervisor, I will not be forced to do the same old, same old…

LW: I think this sort of discussion is really useful and interesting to students/newcomers by the way. At least it was for me.

RM: Thanks Leeanne! You made my day! I think your newfound skills at modeling will allow you to make great contributions to the science of PCT. I hope I am not betraying too much of my own ignorance of modeling, however, if I ask: what is a HD?

Best regards

Rick


Regards

Leeanne

On 23 Mar 2017, at 11:33 am, Richard Marken rsmarken@gmail.com wrote:

[From Rick Marken (2017.03.22.1830)]



Bruce Abbott (2017.03.22.1015 EST)–
BA: Yes, Phil Runkel would have appreciated this. Please keep in mind, however, that Phil was not opposed to the methodology: It is the "casting nets" described in his Casting Nets and Testing Specimens.

RM: Phil wasn't opposed to the methodology when it was used to study groups; but he was very opposed to it when it was used to study individuals. Phil called his preferred methodology for studying individuals "testing specimens", which, when applied to the study of living organisms, is the Test for the Controlled Variable (which, at the time Phil wrote about it, was just called The Test rather than the TCV). Phil describes this methodology starting on p. 117 of Casting Nets and Testing Specimens and in the chapter entitled "Testing Specimens" where, I'm honored to say, he uses a couple of my research studies as examples of the testing-specimens methodology.

BA: Several decades ago I told my future coauthor on the research methods book that I thought statistical significance testing was the worst thing ever to happen to experimental psychology. I asked him recently if he remembered that. He said "Yes, and at the time I thought you were crazy."

RM: Your thinking anticipates some current thinking on the issue. In 2010, an article by J. L. Rodgers in the American Psychologist described a quiet revolution that was happening in psychology, which involved a move away from statistical significance testing (which Rodgers called "null hypothesis statistical testing", NHST) to model testing. Sounds great, right? But unlike you, Rodgers thought of this as a methodological rather than a statistical-analysis revolution. So I submitted a comment on the Rodgers article to American Psychologist, but it was not accepted (the reason given was that there was "insufficient interest" ;-). But I liked the comment so much that I put it up at my website and eventually included it in "Doing Research on Purpose". The net version is at:

http://www.mindreadings.com/MMR.pdf

RM: The gist of the comment is that the revolution Rodgers describes is great but it is a statistical not a methodological revolution. I go on to explain that psychology does need a methodological revolution because the current methodology is based on the wrong model of behavior. The methodological revolution is aimed at testing to determine the variables controlled by individuals (testing specimens). I like to think Phil would have been very pleased had he lived to read it. I’d be interested in what others in the group think of the comment (if there is any interest;-)

Best

Rick


The term "p-hacking" obviously borrows from the idea of computer "hacking," but the word "hacking" doesn't mean the same thing in the two cases. Computer hacking involves breaking into a password-protected computer system in order to steal information or wreak havoc. In "p-hacking" the term refers to massaging the data in various ways until one obtains a statistically significant result, one in which the p-value (the probability of obtaining an effect at least as large as the one observed if chance alone is responsible for that difference) is less than some criterion value, typically p < .05, or 1 chance in 20. P-hacking involves such practices as conducting multiple tests on the same data set, which can result in "probability pyramiding" -- getting at least one significant difference by chance, which is much more likely than the stated level of significance. It can also involve omitting data points that one can find some apparently reasonable excuse for excluding (e.g., outliers), eliminating certain conditions from the analysis that were included in the original design, and so on. My favorite example involves running a few subjects, doing the significance test, and repeating until one gets a statistically significant result. I recall reading one two-group study that included only three rats per group. As nobody sets out to run a group-based study with so few subjects, it was obvious to me that the author had conducted significance testing after running only the first three rats in each group, got p < .05, and quit. The power of a test (its ability to detect a real effect of the independent variable) would have been extremely low with such small samples, so the results almost certainly capitalized on chance.

Bruce A.
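The "probability pyramiding" Bruce Abbott describes follows from a textbook formula: with m independent tests each run at alpha = .05, the chance of at least one spuriously "significant" result is 1 - (1 - alpha)^m. A quick sketch:

```python
# Family-wise error rate for m independent tests at level alpha:
# the probability that at least one comes out "significant" by luck alone.
def familywise_error(alpha, m):
    return 1 - (1 - alpha) ** m

for m in (1, 5, 20):
    print(m, round(familywise_error(0.05, m), 3))
# prints 0.05, 0.226, 0.642 -- with 20 looks at the data, a
# spurious "effect" is more likely than not.
```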



Richard S. Marken

"Perfection is achieved not when you have nothing more to add, but when you have nothing left to take away."
--Antoine de Saint-Exupery


[From Leeanne Wright (2017.03.24.8.11 AEST)]

Hi Rick,

An HD is a High Distinction. I felt pretty chuffed with myself given that I only did Maths at high school to year 10 level (age 15), which was 1986 for me. Shows that an old dog can learn new tricks after all.

It's still a foreign language though. Whenever I read the results section of a paper, my most common question is "Now what the hell does that mean?". Funnily enough, the more I learn the more I realise I don't know… Ironically, that's what makes it fun…

Regards

Leeanne


[From Rick Marken (2017.03.24.1300)]

···

 Bruce Abbott (2017.03.23.1015 EST)

MT: One of my graduate school colleagues did a study with the opposite effect of the p-hacking described in the article. Someone had proposed a theory that said there should be a weak population effect in some social study. It must have been considered an important theory, because six papers from different research groups had all found "no significant effect". The consensus was that the theorized effect did not occur, because "no significant effect" was mentally translated as "no effect". But my colleague noted that all the studies showed a consistent effect in the direction theorized (by a significance test, a 1/64 chance of happening if the actual effect magnitude was zero), and did a meta-analysis that showed the probability that there was no effect was actually vanishingly small. The totality of the studies that had led to a consensus that the theory was wrong actually provided strong evidence that the theory was right (or at least more right than the only other alternative tested).

BA: Interesting. I had a similar result in a study involving the effect of predictable versus unpredictable stress on pain sensitivity. I ran the experiment three times with changes I hoped would produce a larger difference. I obtained essentially the same result each time but in each case the effect failed to reach statistical significance. This was prior to the introduction of meta-analysis, which might have shown the effect.

RM: But, of course, even if it did, we know from Powers' 1978 Psych Review paper that these apparent "effects" are an example of a behavioral illusion; an apparent causal relationship between a disturbance and the action (or a side-effect thereof) that protects a controlled variable from the effect of the disturbance. That is, the apparent causal relationship between an independent and dependent variable is a side effect of the process of control.

MT: So we have in common a decades-long belief that "statistical significance testing was the worst thing ever to happen to experimental psychology".

BA: Thumbs up!

RM: My decades-long belief is that the failure to see that behavior is a process of control was the worst thing ever to happen to experimental psychology, far worse than the addiction to statistical significance testing.

Best

Rick



[Martin Taylor 2017.03.25.11.44]

I suppose it might be mildly interesting to know how Powers's clear description of the Behavioural Illusion can be applied to the studies Bruce and I described. I re-read the Psych Review paper once more (in LSC I) and can find nothing relevant.

You may be right. A conceptual problem that changes the nature of what you think you are observing is more deeply problematic than a measuring error in the observations of the inappropriate variable.

Martin


[From Rick Marken (2017.03.25.1250)]

[Martin Taylor 2017.03.25.11.44]

MT: One of my graduate school colleagues did a study with the opposite effect of the p-hacking described in the article. Someone had proposed a theory that said there should be a weak population effect in some social study...

BA: Interesting. I had a similar result in a study involving the effect of predictable versus unpredictable stress on pain sensitivity. I ran the experiment three times with changes I hoped would produce a larger difference. I obtained essentially the same result each time but in each case the effect failed to reach statistical significance. This was prior to the introduction of meta-analysis, which might have shown the effect.

RM: But, of course, even if it did, we know from Powers' 1978 Psych Review paper that these apparent "effects" are an example of a behavioral illusion; an apparent causal relationship between a disturbance and the action (or a side-effect thereof) that protects a controlled variable from the effect of the disturbance. That is, the apparent causal relationship between an independent and dependent variable is a side effect of the process of control.

MT: I suppose it might be mildly interesting to know how Powers's clear description of the Behavioural Illusion can be applied to the studies Bruce and I described. I re-read the Psych Review paper once more (in LSC I) and can find nothing relevant.

RM: I think it would be very interesting to see how the Behavioral Illusion applies to the studies you and Bruce described. But I can't really do it without knowing a little more about the studies. All I know about the research you described is that it provided evidence of "a weak population effect in some social study". What I need is a description of the independent variable and the dependent variable on which the independent variable had a "weak effect". In Bruce's study I know that the independent variable was the predictability of a stressor (probably a shock), the stressor being either predictable or unpredictable, and the dependent variable was pain sensitivity. In order to show how the Behavioral Illusion applies I need to know a little more about how pain sensitivity was measured.
RM: So if I can get that information I can suggest how the Behavioral Illusion applies (or, perhaps, doesn't apply) to these studies. But I'll just say that I think the Behavioral Illusion does apply because the illusion is that variations in an independent variable have an effect on (are a cause of) variations in the dependent variable in an experiment. So when you say that you found evidence of "an effect" (whether the evidence comes from a single significance test or a meta-analysis) you are describing something that Bill Powers showed to be an illusion when the system under study is purposive (a control system), such as a living organism.
RM: Independent variables in experiments are disturbances to variables subjects were asked to control, and disturbances don't cause organisms to do anything. They are simply one source of an effect on the controlled variable, the other main source of an effect being the actions of the system itself. It is the state of the controlled variable -- not the disturbances to that variable -- that is the cause of the actions (or a side effect of those actions) that are mistakenly being seen as affected by the disturbance. Thus, any apparent "effect" of an independent variable on anything the control system does -- that is, the appearance that the effect of an independent variable "runs through" the organism causing variation in the dependent variable -- is an illusion.
Best
Rick


[Martin Taylor 2017.03.25.16.34]

I don't know what the study was, but I don't think it matters for the point at issue -- the misleading use of significance testing. I think we can all agree that the individuals whose actions were responsible for whatever effects were measured were controlling some perception or other, probably not all of them the same perception. Very probably the effects in question were related to the disturbances caused by side-effects of those control actions. But these facts that seem to follow from PCT don't seem relevant to the question in any way that I can see.

As Bill P pointed out more than once, when you are dealing with population effects, different considerations apply. There isn't "a" controlled perception affecting one environmental variable and having side effects on that same controlled perception. There's a whole network of cross-influences with highly variable coupling constants, and to find any social effects among the many that happen when lots of people control lots of perceptions, inducing lots of cross-disturbances and feedback networks, is interesting in itself. There's no single control system at work. There are myriads of them interacting in lots of different ways, creating some effects that are observable in social measures. But when the existence or non-existence of the social effect depends on a significance test (or many), there's a real problem.

I think it's dependent on individual control in the same way that the properties of materials with complex structures are dependent on the valence structures of atoms. They are, but to try to work out the material properties from the properties of the atoms is a fiendishly difficult problem.

Martin
···

[From Rick Marken (2017.03.25.1250]

                [Martin Taylor

2017.03.25.11.44]

            MT: I suppose it might be mildly interesting to

know how Powers’s clear description of the Behavioural
Illusion can be applied to the studies Bruce and I
described. I re-read the Psych Review paper once more
(in LSC I) and can find nothing relevant.

          RM: I think it would be very interesting to see how the

Behavioral Illusion applies to the studies you and Bruce
described. But I can’t really do it without knowing a
little more about the studies. All I know about the
research you described is that it provided evidence of “a
weak population effect in some social study”. What I need
is a description of the independent variable and the
dependent variable on which the independent variable had a
“weak effect”.

                              MT: One of my

graduate school colleagues did a study
with the opposite effect of the
p-hacking described in the article.
Someone had proposed a theory that
said there should be a weak population
effect in some social study…

                              BA: Interesting.  I had a similar

result in a study involving the effect
of predictable versus unpredictable
stress on pain sensitivity. I ran the
experiment three times with changes I
hoped would produce a larger
difference. I obtained essentially
the same result each time but in each
case the effect failed to reach
statistical significance. This was
prior to the introduction of
meta-analysis, which might have shown
the effect.

                        RM: But, of course, even if it did, we

know from Powers’ 1978 Psych Review
paper that these apparent “effects” are an
example of a behavioral illusion; an
apparent causal relationship between a
disturbance and the action (or a side-effect
thereof) that protects a controlled variable
from the effect of the disturbance. That is,
the apparent causal relationship between an
independent and dependent variable is a side
effect of the process of control.

RM: In Bruce’s study I know that the independent variable was the predictability of a stressor (probably a shock), the stressor being either predictable or unpredictable, and the dependent variable was pain sensitivity. In order to show how the Behavioral Illusion applies I need to know a little more about how pain sensitivity was measured.

RM: So if I can get that information I can suggest how the Behavioral Illusion applies (or, perhaps, doesn’t apply) to these studies. But I’ll just say that I think the Behavioral Illusion does apply, because the illusion is that variations in an independent variable have an effect on (are a cause of) variations in the dependent variable in an experiment. So when you say that you found evidence of “an effect” (whether the evidence comes from a single significance test or a meta-analysis), you are describing something that Bill Powers showed to be an illusion when the system under study is purposive (a control system), such as a living organism.

RM: Independent variables in experiments are disturbances to variables subjects were asked to control, and disturbances don’t cause organisms to do anything. They are simply one source of an effect on the controlled variable, the other main source being the actions of the system itself. It is the state of the controlled variable – not the disturbances to that variable – that is the cause of the actions (or a side effect of those actions) that are mistakenly seen as affected by the disturbance. Thus, any apparent “effect” of an independent variable on anything the control system does – that is, the appearance that the effect of an independent variable “runs through” the organism, causing variation in the dependent variable – is an illusion.

Best

Rick

Richard S. Marken

“Perfection is achieved not when you have nothing more to add, but when you have nothing left to take away.”
                --Antoine de Saint-Exupery

[From Rick Marken (2017.03.25.1700)]

···

[Martin Taylor 2017.03.25.16.34]

MT: I don't know what the study was, but I don't think it matters for the point at issue – the misleading use of significance testing.

RM: The point at issue is whether or not Powers’ 1978 description of the Behavioral Illusion applies to the studies you and Bruce described. I am completely on board with the fact that “significance testing” can be quite misleading; it’s one of my favorite parts of the statistics class I teach.
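One way to see how misleading significance testing can be is to simulate the optional-stopping practice described earlier in the thread: run a few subjects, test, add more, and test again until p < .05. The sketch below is hypothetical code, not from any study discussed here; it uses a normal approximation to the t test, and both “groups” are drawn from the same distribution, so every “significant” result is a false positive.

```python
import math
import random
import statistics

def two_sample_p(a, b):
    """Two-sided p-value for a difference in means, using a normal
    approximation to the t distribution (adequate for a demonstration)."""
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    z = abs(statistics.mean(a) - statistics.mean(b)) / se
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def optional_stopping(max_n=60, start_n=5, alpha=0.05):
    """Add subjects pair by pair, testing after each addition, and stop
    as soon as p < alpha. There is NO real effect: both groups are N(0, 1),
    so any "significant" result is a false positive."""
    a = [random.gauss(0, 1) for _ in range(start_n)]
    b = [random.gauss(0, 1) for _ in range(start_n)]
    while len(a) < max_n:
        if two_sample_p(a, b) < alpha:
            return True          # "significant" -- stop and report
        a.append(random.gauss(0, 1))
        b.append(random.gauss(0, 1))
    return two_sample_p(a, b) < alpha

random.seed(42)
trials = 500
false_positives = sum(optional_stopping() for _ in range(trials))
rate = false_positives / trials
print(f"False-positive rate with optional stopping: {rate:.2f}")
```

With a single test at a fixed sample size the false-positive rate would hover near the nominal .05; testing after every added pair of subjects typically multiplies it several-fold.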

MT: I think we can all agree that the individuals whose actions were responsible for whatever effects were measured were controlling some perception or other, probably not all of them the same perception. Very probably the effects in question were related to the disturbances caused by side-effects of those control actions. But these facts that seem to follow from PCT don’t seem relevant to the question in any way that I can see.

RM: They are not relevant to the misleading use of the significance test. But they are relevant to how the Behavioral Illusion might relate to the studies mentioned by you and Bruce.

MT: As Bill P pointed out more than once, when you are dealing with population effects, different considerations apply. There isn’t “a” controlled perception affecting one environmental variable and having side effects on that same controlled perception.

RM: So do you think that Powers’ description of the Behavioral Illusion applies only to the apparent effect of an independent variable on a dependent variable in experiments where the effect is found for only one individual at a time? Such studies are rather rare in scientific psychology, and the 1978 paper was a critique of (or, as Bill put it, some “spadework at the foundations of”) scientific psychology.

RM: By far the majority of studies in scientific psychology use group data to determine whether there is an “effect” of an independent variable on a dependent variable; they are looking for what you call “population effects”. When the results of such studies are “significant”, the implication is that the “effect” that is found applies at the individual level. This is the problem with “population effects” that Powers pointed out more than once: the tendency to take group-level data as applying to the individuals in the group. But in the 1978 paper Powers took scientific psychologists “at their word”, so to speak, and allowed that when an “effect” was found in a psychological experiment (whether the effect was the average over many individuals, as in the typical experiment, or what was observed for each individual, as in operant experiments like Bruce’s experiment on the effect of the predictability of shock), it referred to the effect of the independent variable on the dependent variable at the individual level.
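The gap between a group-level “effect” and what is true of any individual can be made concrete with a toy simulation. The numbers here are hypothetical (they are not from Bruce’s study or any other study in this thread): suppose 30% of subjects respond strongly to a treatment and the rest do not respond at all.

```python
import random
import statistics

random.seed(7)
# Hypothetical population: each subject's true effect is either 10 units
# (a 30% subpopulation of "responders") or exactly 0 (everyone else).
effects = [10.0 if random.random() < 0.3 else 0.0 for _ in range(1000)]

group_mean = statistics.mean(effects)
print(f"Group mean effect: {group_mean:.1f} units")

# The group mean (about 3 units) describes not a single subject:
# every individual effect is either 0 or 10.
assert all(e in (0.0, 10.0) for e in effects)
assert group_mean not in (0.0, 10.0)
```

A significant group difference of “about 3 units” would be reported as the effect of the independent variable, yet it applies to no individual in the group, which is exactly the problem with taking group-level data as applying to individuals.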

RM: So if you really want to know how the Behavioral Illusion applies to the studies you and Bruce described, you have to know what was actually done in those experiments; in particular, what the independent and dependent variables were. Since you don’t remember the details of the experiment you mentioned, perhaps Bruce can describe the details of his experiment and we can see how the Behavioral Illusion might apply.

RM: Of course, it’s possible that you don’t want to know how the Behavioral Illusion applies, and that’s fine. But some people might find it interesting, especially those, like Leeanne Wright, who have expressed an interest in doing research based on PCT. So I think it might be useful to take a close look at some psychology experiments and see if (or how) Bill’s analysis in the 1978 paper applies to them.

Best

Rick

MT: There’s a whole network of cross-influences with highly variable coupling constants, and to find any social effects among the many that happen when lots of people control lots of perceptions, inducing lots of cross-disturbances and feedback networks, is interesting in itself. There’s no control system at work. There are myriads of them interacting in lots of different ways, creating some effects that are observable in social measures. But when the existence or non-existence of the social effect depends on a significance test (or many), there’s a real problem.
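The “a significance test (or many)” worry has a simple arithmetic core: if each of k independent tests is run at α = .05, the chance of at least one spurious “significant” result is 1 − (1 − α)^k. A minimal sketch (hypothetical illustration, not tied to any study in this thread):

```python
# Familywise false-positive probability for k independent tests,
# each run at the conventional alpha = .05.
alpha = 0.05
for k in (1, 5, 10, 20):
    familywise = 1 - (1 - alpha) ** k
    print(f"{k:2d} tests -> P(at least one false positive) = {familywise:.3f}")
```

Ten tests push the familywise rate to about .40, and twenty give better-than-even odds (about .64) of at least one false positive, even when no real effect exists anywhere.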

MT: I think it’s dependent on individual control in the same way that the properties of materials with complex structures are dependent on the valence structures of atoms. They are, but to try to work out the material properties from the properties of the atoms is a fiendishly difficult problem.

Martin







Richard S. Marken

“Perfection is achieved not when you have nothing more to add, but when you have nothing left to take away.”
                --Antoine de Saint-Exupery


[From Leeanne Wright (2017.03.27.1.39 AEST)]

Hi Rick, Martin & Bruce,

LW: I continue to be fascinated by this thread! The topic of significance-testing in psychology seems to be very salient at the present time. Therefore, good research fodder, perhaps! Maybe, the time has finally come…?

LW: My interest, I have to admit, is definitely piqued!

LW: Yes, I am very interested! Both in doing research based on PCT and in how the Behavioural Illusion might apply to those (or other) studies. I am doing the coursework component of Honours this year and the research component next year, although I will be approaching potential supervisors when I go to the Intensive School in a couple of weeks’ time… I have my fingers and toes crossed, because having crossed the PCT bridge, so to speak, there really is no going back…

Regards

Leeanne


[Martin Taylor 2017.03.27.00.01]

I find it difficult to see what research needs to be done that was not done 5 decades or more ago.

That’s interesting. I was totally under a misapprehension that the thread had the subject title “p-hacking”, the misuse of significance testing.

In what way? The significance testing or the Behavioral Illusion? The former is a generic problem that applies everywhere you are concerned with signal in noise. It would occur in the extraction of a “neural current” from a bunch of sporadically firing neurons, if that PCT approximation were to be seriously studied. For some reason Rick can’t find enough S-R studies to criticize in the recent literature, so he has to go back to studies mentioned in anecdotes about things that happened decades ago. Whether the “Behavioural Illusion” critique would apply to them is hard to know, at least in the case of my fellow graduate student, because not only do I not know now what he was working on, I didn’t know at the time (1958 or 1959). What I know is that he demonstrated that a whole field of enquiry had been misled by significance testing, and wanted to tell everyone about it. Whether the studies would hold up in light of PCT is a quite separate question on which I have no opinion.

If you want to talk about the applicability of the Behavioural Illusion to the interpretation of population studies, I’m happy to contribute. I suggest you open a new thread and start it with some proposition that might be analysed and studied seriously, rather than mixing it up with the p-hacking issue.

It’s usually clear. Bill’s description was very clear, which helps us to determine when the Behavioural Illusion applies and when it either clearly doesn’t or when it’s hard to tell. But in a different thread, please.

Most people who come to that bridge either get scared before they are half-way across and turn back, or get across into a new world. Some who get across just stay where they are, saying “how beautiful” in some kind of a trance while they make a nice garden plot at the end of the bridge, while others say “Yes, it’s really beautiful; now let’s see where all these barely travelled paths lead in this lovely new territory”. I prefer the latter approach, but not everyone does. That’s just a difference of cognitive style.

The Behavioural Illusion is worth taking seriously, not as a mantra, but as an object of study, which is why it deserves its own thread.
Martin

···

I wasn’t going to continue with this thread, having said all I wanted to say. I have no interest in the studies that were mentioned, except as amusing and perhaps interesting examples of the misinterpretation of significance tests. However, since Leeanne chimed in …


Phil would have appreciated this. You will too, Rick.

http://www.chronicle.com/article/Spoiled-Science/239529

HB: I must say I like it too. How seemingly innocent articles about RCT can do so much damage to PCT.

Best,

Boris

···

From: Bruce Nevin [mailto:bnhpct@gmail.com]
Sent: Wednesday, March 22, 2017 3:02 AM
To: Control Systems Group Network (CSGnet)
Subject: p-hacking