Summary: Jacob Cohen's 1994 paper "The earth is round (p < .05)" has long been one of my favorites, both as a piece of academic writing and as an evisceration of null-hypothesis significance testing. Here I review a few of my favorite passages and encourage everyone doing research to read it in its entirety.
"The statistical folkways of a more primitive past continue to dominate the local scene"
Some years ago I came across Cohen (1994): The earth is round (p < .05). It always stuck with me as an entertaining yet piercing critique of null-hypothesis significance testing (NHST). I've recently re-read this masterpiece, which is full of gems, starting with the introduction:
I make no pretense of the originality of my remarks in this article. One of the few things we, as psychologists, have learned from over a century of scientific study is that at age three score and 10, originality is not to be expected. David Bakan said back in 1966 that his claim that "a great deal of mischief has been associated" with the test of significance "is hardly original," that it is "what 'everybody knows,'" and that "to say it 'out loud' is...to assume the role of the child who pointed out that the emperor was really outfitted in his underwear" (p. 423). If it was hardly original in 1966, it can hardly be original now. Yet this naked emperor has been shamelessly running around for a long time.
Like many men my age, I mostly grouse. My harangue today is on testing for statistical significance, about which Bill Rozeboom (1960) wrote 33 years ago, "The statistical folkways of a more primitive past continue to dominate the local scene" (p. 417).
And today, they continue to continue. And we, as teachers, consultants, authors, and otherwise perpetrators of quantitative methods, are responsible for the ritualization of null hypothesis significance testing (NHST; I resisted the temptation to call it statistical hypothesis inference testing) to the point of meaninglessness and beyond. I argue herein that NHST has not only failed to support the advance of psychology as a science but also has seriously impeded it.
Yes, and 20 years later the situation has changed little, if at all. Can we at least adopt the name Cohen was tempted by and refer to traditional p-value testing as "statistical hypothesis inference testing"?
Some other choice quotes:
- "So even when used and interpreted 'properly,' with a significance criterion (almost always p < .05) set a priori...H0 has little to commend it in the testing of psychological theories in its usual reject/confirm-the-theory form."
- "Even a correct interpretation of p values does not achieve very much, and has not for a long time."
Cohen ends with some sensible suggestions, the first quoted directly and the rest paraphrased:
- "First, don't look for a magic alternative to NHST, some other objective mechanical ritual to replace it. It doesn't exist."
- Understand and improve the data that we collect.
- Report effect sizes in the form of confidence intervals.
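
To make the last suggestion concrete, here is a minimal sketch of reporting a standardized effect size with a confidence interval instead of a bare p value. It is my own illustration, not something from Cohen's paper. It assumes two independent groups, uses Cohen's d with a pooled standard deviation, and relies on a common large-sample normal approximation for the standard error of d. The function name `cohens_d_with_ci` and the scores are made up for the example.

```python
import math
from statistics import NormalDist, mean, variance


def cohens_d_with_ci(a, b, confidence=0.95):
    """Return (d, lower, upper) for two independent samples.

    Hypothetical helper for illustration; the interval uses a
    large-sample normal approximation, so treat it as rough for
    small groups.
    """
    n1, n2 = len(a), len(b)
    # Pooled standard deviation from the two sample variances.
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    )
    d = (mean(a) - mean(b)) / pooled_sd
    # Approximate standard error of d.
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return d, d - z * se, d + z * se


# Made-up scores, for illustration only.
treatment = [5.1, 6.3, 5.8, 7.0, 6.1, 5.5, 6.8, 6.0]
control = [4.9, 5.2, 5.0, 6.1, 5.4, 4.7, 5.9, 5.3]

d, lower, upper = cohens_d_with_ci(treatment, control)
print(f"d = {d:.2f}, {95}% CI [{lower:.2f}, {upper:.2f}]")
```

The point of reporting it this way is that the reader sees both the estimated size of the effect and how uncertain that estimate is, rather than only whether it crossed the .05 line. With groups this small, an exact interval based on the noncentral t distribution would be a better choice than the approximation used here.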
The article is a quick and entertaining read, and it should be required reading for anyone doing any sort of statistical analysis.