Chris Carter, author of Science and Psychic Phenomena and other books, has sent me a detailed response to J.E. Kennedy's review and to some of the comments that have appeared on this blog. He's given me permission to publish it, so here it is in its entirety. (Everything that follows is from Chris.)
=========
Michael, this is directly from my book, Science and Psychic Phenomena:
“However, it later turned out that Milton and Wiseman had botched their statistical analysis of the ganzfeld experiments by failing to consider sample size. Dean Radin simply added up the total number of hits and trials conducted in those thirty studies (the standard method of doing meta-analysis) and found a statistically significant result with odds against chance of about twenty to one.
“The thirty studies that Milton and Wiseman considered ranged in size from four trials to one hundred, but they used a statistical method that simply ignored sample size (N). For instance, say we have three studies, two with N = 8, 2 hits (25 percent), and a third with N = 60, 21 hits (35 percent). If we ignore sample size, then the unweighted average percentage of hits is only 28 percent; but the combined average of all the hits is just under 33 percent. This, in simplest terms, is the mistake they made. Had they simply added up the hits and misses and then performed a simple one-tailed t-test, they would have found results significant at the 5 percent level. As statistician Jessica Utts later pointed out, had Milton and Wiseman performed the exact binomial test, the results would have been significant at less than the 4 percent level, with odds against chance of twenty-six to one.” (p. 99)
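The arithmetic in the passage above is easy to check. A minimal sketch, using the three hypothetical studies from the example:

```python
# Two small studies (8 trials, 2 hits each) and one larger study (60 trials,
# 21 hits), as in the example above.
studies = [(2, 8), (2, 8), (21, 60)]

# Unweighted approach: average the per-study hit rates, ignoring sample size.
unweighted = sum(hits / trials for hits, trials in studies) / len(studies)

# Pooled approach: add up all hits and all trials, then take the ratio.
total_hits = sum(hits for hits, _ in studies)
total_trials = sum(trials for _, trials in studies)
pooled = total_hits / total_trials

print(f"Unweighted mean hit rate: {unweighted:.1%}")  # about 28%
print(f"Pooled hit rate: {pooled:.1%}")               # just under 33%
```

The pooled figure is higher because the larger, higher-scoring study carries its full weight of 60 trials instead of counting as one study among three.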
Kennedy wrote: “The large number of methodological decisions for meta-analyses, like other types of post hoc analyses, provides great opportunity for researchers to consciously or unconsciously bias the results. The endless debates about different possible statistical tests, inclusion cutoff criteria, data trimming, data transformations, and so forth, have no convincing resolutions.”
We can easily see that Kennedy’s point is nonsense. Simply adding up the total number of hits and trials and then performing a straightforward t-test of significance – as Dean Radin did – has nothing to do with, as Kennedy puts it, “endless debates about different possible statistical tests, inclusion cutoff criteria, data trimming, data transformations, and so forth.” It is a straightforward method of statistical analysis taught in all first-year statistics courses.
Table 1: Standard Ganzfeld Replications 1991-2003

| Laboratory | Sessions | Hit Rate* |
|---|---|---|
| PRL, Princeton, NJ | 354 | 34 percent |
| University of Amsterdam, Netherlands | 76 | 38 percent |
| University of Edinburgh, Scotland | 97 | 33 percent |
| Institute for Parapsychology, NC | 100 | 33 percent |
| University of Edinburgh, Scotland | 151 | 27 percent |
| University of Amsterdam, Netherlands | 64 | 30 percent |
| University of Edinburgh, Scotland | 128 | 47 percent |
| University of Gothenburg, Sweden | 150 | 36 percent |
| University of Gothenburg, Sweden | 74 | 32 percent |
| Totals: | 1194 | 34.4 percent |
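The table’s bottom line can be approximately reproduced from the rows above. Since the published hit rates are rounded to whole percents, the session-weighted average recovers the reported total only to within rounding error; a quick sketch:

```python
# (sessions, hit rate in percent) for the nine replication studies in Table 1
rows = [(354, 34), (76, 38), (97, 33), (100, 33), (151, 27),
        (64, 30), (128, 47), (150, 36), (74, 32)]

total_sessions = sum(n for n, _ in rows)
# Weight each study's hit rate by its number of sessions -- the opposite of
# the unweighted averaging Carter criticizes Milton and Wiseman for.
weighted_rate = sum(n * r for n, r in rows) / total_sessions

print(total_sessions)           # 1194
print(round(weighted_rate, 1))  # close to the 34.4 percent reported
```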
Strangely, in an exchange with me in the book Debating Psychic Experiences, arch-“skeptic” Ray Hyman mentioned only one of these replication studies:
Instead of conducting meta-analyses on already completed experiments on the Ganzfeld, for example, the parapsychologists might have tried to directly replicate the auto-Ganzfeld experiments with a study created for the stated purpose of replication. The study would be designed specifically for this purpose and would have adequate power. In fact, such studies have been carried out. An example would be Broughton’s attempt to deliberately replicate the auto-Ganzfeld results with enough subjects to insure adequate power. This replication failed. From a scientific viewpoint this replication attempt is much more meaningful than the retrospective combining of already completed (and clearly heterogeneous) experiments. (emphasis added)
But is this replication attempt really “much more meaningful from a scientific viewpoint” than the combined results in a meta-analysis? If the true hit rate were 33 percent, with 25 percent expected by chance alone, then the probability that a study of 151 sessions will fail to yield results significant at the 5 percent level is 28 percent. In other words, Broughton’s failure to replicate with a sample that small is even less remarkable than flipping a coin twice and getting heads both times.
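The 28 percent figure can be reproduced with a one-tailed z-test on proportions. This is one plausible reading of the calculation; the normal approximation, rather than an exact binomial cutoff, is an assumption here:

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n = 151    # sessions in Broughton's replication attempt
p0 = 0.25  # chance hit rate
p1 = 0.33  # assumed true hit rate

# One-tailed critical hit rate at the 5 percent level under the null.
se0 = math.sqrt(p0 * (1 - p0) / n)
crit = p0 + 1.645 * se0

# Probability the observed hit rate falls below the critical value when the
# true rate is 33 percent -- i.e., the chance of a "failed" replication.
se1 = math.sqrt(p1 * (1 - p1) / n)
p_miss = phi((crit - p1) / se1)

print(f"Probability of failing to reach significance: {p_miss:.0%}")  # about 28%
```

This is a power calculation: with only 151 sessions, a real 33 percent effect goes undetected at the 5 percent level more than a quarter of the time.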
As an example of a replication study, Hyman could just as easily have mentioned Kathy Dalton’s (1997) study using creative individuals, which achieved a hit rate of 47 percent. The odds against chance of this result are over 140 million to one. This closely replicated the auto-Ganzfeld results mentioned above (Schlitz & Honorton, 1992), which found a 50 percent hit rate for students from the Juilliard School. It also closely matched results from a study using primarily musicians (Morris, Cunningham, McAlpine, & Taylor, 1993), which found a 41 percent hit rate.
These figures should make the conclusion clear: the earlier results have been replicated by a variety of researchers in different laboratories in different cultures, with similar hit rates. Hyman (1996a) wrote: “The case for psychic functioning seems better than it has ever been…. I also have to admit that I do not have a ready explanation for these observed effects” (p. 43). Hyman and the other “skeptics” have lost the Ganzfeld debate.
Meta-analysis of the Ganzfeld
However, instead of debating the merits of individual studies, what does the data considered as a whole tell us? Meta-analysis is designed specifically to answer this question, and Dean Radin (2006) has performed it on all Ganzfeld experiments (confirmatory and exploratory) performed over a 30-year period. He wrote:
From 1974 through 2004 a total of 88 Ganzfeld experiments reporting 1,008 hits in 3,145 trials were conducted. The combined hit rate was 32 percent as compared to the chance-expected 25 percent. This 7 percent above-chance effect is associated with odds against chance of 29 quintillion to 1. (p. 120)
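Radin’s odds figure is consistent with a simple normal approximation to the binomial. A sketch, not Radin’s own calculation, which may differ in detail:

```python
import math

trials, hits = 3145, 1008
p0 = 0.25
expected = trials * p0                  # 786.25 hits expected by chance
sd = math.sqrt(trials * p0 * (1 - p0))  # binomial standard deviation

z = (hits - expected) / sd              # roughly 9.1 standard deviations
# One-tailed p-value; math.erfc stays accurate for tiny tail probabilities.
p_value = 0.5 * math.erfc(z / math.sqrt(2.0))

print(f"z = {z:.1f}, odds against chance about 1 in {1 / p_value:.0e}")
```

A deviation of nine standard deviations puts the tail probability in the 10^-20 range, which is the order of magnitude of Radin’s “29 quintillion to 1.”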
Could the results be due to a file-drawer problem of unreported failures? Radin answers:
If we insisted that there had to be a selective reporting problem, even though there’s no evidence of one, then a conservative estimate of the number of studies needed to nullify the observed results is 2,002. That’s a ratio of 23 file drawer studies to each known study, which means that each of the 30 known investigators would have had to conduct but not report 67 additional studies. Because the average Ganzfeld study had 36 trials, these 2,002 “missing” studies would have required 72,072 additional sessions (36 x 2002). To generate this many sessions would mean continually running Ganzfeld sessions 24 hours a day, 7 days a week, for 36 years, and for not a single one of those sessions to see the light of day. (p. 121)
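The file-drawer arithmetic in the passage above is easy to verify directly:

```python
known_studies = 88
nullifying_studies = 2002  # Radin's conservative file-drawer estimate
investigators = 30
avg_trials = 36

print(nullifying_studies / known_studies)  # about 23 unreported per known study
print(nullifying_studies / investigators)  # about 67 per investigator
print(nullifying_studies * avg_trials)     # 72,072 "missing" sessions
```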
Ersby’s comments
"Kennedy’s argument is that meta-analyses offer enough subjective leeway to adjust results until the required figure is achieved. The Bem, Palmer & Broughton paper actually supports it, rather than refutes it. The BPP paper is, after all, a reworking of a previous meta-analysis with an additional inclusion criteria [sic] which had never been used before and has not been used since. This is exactly the kind of subjectivity that Kennedy is talking about.”
The “additional inclusion criterion” was simply the status of the studies used, that is, whether they were meant to be confirmatory or exploratory. Far from never having been used before, or being somehow “subjective,” this was actually a criterion specified in the joint communiqué written with skeptic Ray Hyman. As I wrote in my book:
“In their joint communiqué, Hyman and Honorton asked future ganzfeld investigators, as part of their “more stringent standards,” to clearly document the status of the experiment; that is, whether it was meant to merely confirm previous findings or to explore novel conditions. The problem with the Milton and Wiseman study was that it simply lumped all studies together, regardless of whether the status of each study was confirmatory or exploratory. In other words, Milton and Wiseman made no attempt to determine the degree to which the individual studies complied with the standard ganzfeld protocol as spelled out in the joint communiqué.” (p. 100)
Meta-analysis is essentially combining many smaller studies into one larger study in order to exploit the greater statistical power of larger sample sizes to detect effects with statistical significance. As such, it is only useful when all of the studies are of the same nature. This is indeed standard practice, and common sense.
Ersby adds:
“It’s worth noting that Standardness alone does not bring Milton & Wiseman’s meta-analysis up to statistical significance. In other words, the hypothesis that M&W’s poor results was due to including exploratory work is not supported by the data. It is only when the new data from 1997-1999 is introduced does the result reach significance.”
Simply not true. As I wrote above, had they simply added up the hits and misses and then performed a simple one-tailed t-test, they would have found results significant at the 5 percent level. Wiseman, by the way, did not dispute this when statistician Jessica Utts pointed this fact out at a conference.
“Which brings me to the other argument about the M&W: that Milton and Wiseman deliberately excluded the Dalton experiment to achieve a null result. But if you add the Dalton experiment to the M&W database, and use their method to calculate the effect size, it still doesn't achieve significance.”
Again, simply not true. It attains significance even without the Dalton experiment included.
“Lastly, Chris Carter's claim that Milton & Wiseman 'botched' their statistical analysis doesn't stand up to scrutiny. M&W used the same method as Honorton did in 1985. And, as Kennedy states, the method used by Carter is not considered a standard meta-analysis technique."
Again, not true. According to Honorton: "I calculated the exact binomial probability for each study and obtained its associated Z score." This means his z scores were indeed weighted by sample size, so no, Milton & Wiseman did not use the same method that Honorton did. And by the way, the exact binomial test (which uses probabilities and combinations of possible chance results and so explicitly takes into account sample size) most certainly is considered a standard technique.
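The point about the exact binomial test is easy to illustrate: the same hit rate constitutes very different evidence at different sample sizes, which is precisely why the test is sensitive to N. A sketch (these particular study sizes are hypothetical):

```python
import math

def binom_tail(hits, trials, p=0.25):
    """Exact one-tailed binomial probability of at least `hits` hits."""
    return sum(math.comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(hits, trials + 1))

# A 30 percent hit rate in a tiny study vs. the same rate in a larger one.
p_small = binom_tail(3, 10)    # 3 hits in 10 trials
p_large = binom_tail(30, 100)  # 30 hits in 100 trials

print(f"n=10:  p = {p_small:.3f}")  # far from significant
print(f"n=100: p = {p_large:.3f}")  # much stronger evidence, same hit rate
```

Converting each exact p-value to a z score, as Honorton describes, therefore builds sample size into every study’s contribution.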
No statistician would ever argue that sample size should be ignored in performing statistical tests. Sample size is intrinsic to proper statistical analysis, and I would not expect Milton & Wiseman’s error from any competent first-year statistics student. The error is not even sophomoric.
No wonder this person writes under a pseudonym.
------
Bem, D., and Honorton, C., 1994. “Does Psi Exist?”, Psychological Bulletin, Vol. 115, No. 1, pages 4 – 18.
Bem, D., Palmer, John, and Richard Broughton, 2001. “Updating the Ganzfeld Database: a victim of its own success?”, The Journal of Parapsychology, Vol. 65, September, pages 207-218.
Bierman, Dick J., 1995. “The Amsterdam Ganzfeld Series III & IV: Target clip emotionality, effect sizes, and openness,” Proceedings of the 38th Annual Parapsychological Association Convention, pages 27 – 37.
Broughton, Richard, and Cheryl Alexander, 1995. “AutoGanzfeld II: The first 100 sessions,” Proceedings of the 38th Annual Parapsychological Association Convention, pages 53 – 61.
Broughton, R.S., & Alexander, C.H. 1997. "AutoGanzfeld II: an attempted replication of the PRL Ganzfeld research." Journal of Parapsychology, 61, 209-226.
Carter, Chris, 2012. Science and Psychic Phenomena: the Fall of the House of Skeptics, Rochester, Vermont: Inner Traditions.
Collins, H.M., 1985. Changing Order: Replication and Induction in Scientific Practice. Beverly Hills, CA: Sage.
Dalton, K., (1997). “Exploring the Links: Creativity and Psi in the Ganzfeld.” Proceedings of Presented Papers: the Parapsychological Association 40th Annual Convention, pp. 119-134.
Harris, M., & Rosenthal, R. 1988. Human Performance Research: an overview. Washington, DC: National Academy Press.
Honorton, C., 1975. “Error some place!” Journal of Communication, 25, pages 103-116.
Honorton, C., 1985. “Meta-analysis of Psi Ganzfeld Research: A Response to Hyman.” Journal of Parapsychology, 49, pages 51-91.
Honorton, C., 1993. “Rhetoric over Substance: the Impoverished State of Skepticism.” Journal of Parapsychology 57, pages 191-214.
Hyman, R., and Honorton, C., 1986. “A joint communiqué: the psi Ganzfeld controversy.” Journal of Parapsychology, 50, pages 351-364.
Hyman, Ray. 1991. “Comment” Statistical Science, 6, pages 389-392.
Hyman, Ray. 1996a. “Evaluation of Program on Anomalous Mental Phenomena.” Journal of Scientific Exploration, 10, pages 31-58.
Hyman, R. 1996b. "The Evidence for Psychic Functioning: Claims vs. Reality." Skeptical Inquirer, March/April 1996, pp. 24-26.
Morris, R., Cunningham, S., McAlpine, S., & Taylor, R., 1993. "Toward replication and extension of autoGanzfeld results." Proceedings of the Parapsychological Association 36th Annual Convention, Toronto, Canada, pp. 177-191.
Morris, Robert, Kathy Dalton, Deborah Delanoy and Caroline Watt, 1995. “Comparison of the sender/no sender condition in the Ganzfeld.” Proceedings of the 38th Annual Parapsychological Association Convention, pages 244 – 259.
Parker, A., 2000. “A Review of the Ganzfeld Work at Gothenburg University.” Journal of the Society for Psychical Research, 64, pages 1-15.
Parker, A., 2003. “We ask, does psi exist?” Journal of Consciousness Studies, 10, No. 6-7, 111-134.
Radin, Dean. 1997. The Conscious Universe: The Scientific Truth of Psychic Phenomena. San Francisco: HarperCollins.
Radin, Dean. 2006. Entangled Minds. New York, NY: Pocket Books.
Rosenthal, R., (2002) "Covert communication in classrooms, clinics, courtrooms, and cubicles." American Psychologist, 57 (11), pp. 839-849.
Schlitz, M.J., & Honorton, C., (1992). "Ganzfeld psi performance with an artistically gifted population." Journal of the American Society for Psychical Research, 86, pp. 93-98.
Wright, T., & Parker, A., 2003. “An Attempt to Improve ESP Scores using the Real Time Digital Ganzfeld Technique.” European Journal of Parapsychology, 18, 69-75.