HARKing on Reproducibility

It’s October, barely the beginning of the fiscal year, and already there is another crisis. What is being identified as a crisis is something we have heard about for quite a while–the status of the science that underlies our profession, and more importantly, our clinical interventions. I agree that as a profession, it is our obligation, if we advertise “science-based” interventions, to ensure the accuracy of our scientific findings and the methods they are based on as best we possibly can. Our repertoire of clinical interventions has expanded rapidly in the past half-century, mostly driven by the results of clinical investigations into “what works,” and more specifically, “what works for which conditions.” Think here about the evidence base for specific therapies like dialectical behavior therapy (DBT), or specific interventions for insomnia, panic disorder, and the like–all based on the outcomes of well-designed, often double-blinded and randomized, trials.

So forgive me if I don’t respond to the latest scientific crisis with a huge amount of alarm. True, our science is often flawed. It is extremely difficult to create a double-blinded study of psychotherapy conditions, as wait-list controls are simply not informative, and key variables, such as therapist allegiance or the therapist-patient bond, are extraordinarily difficult to quantify. Also, social science studies often lack the funding seen for investigations of drugs or medical devices, and this makes replication more difficult. Finally, as a comparatively young science, our clinical investigations take us to places where a science-blazed path is faint, if it exists at all. A lot of ink has been expended on the lack of science-based rigor in newer areas of psychological investigation–studies of feminist, LGBTQ, or ethnic minority psychologies, to name a few. Work in these fields is not terribly old, and it is often hard to force investigations in these emerging areas into the categories that define more established fields of study. Often this work is narrative or qualitative, methods that don’t lend themselves well to traditional analysis. I admit that sometimes it is bad: poorly thought out, inadequately quantified, or more reflective of a political or advocacy-based agenda than of science. And yes, in areas where scientific norms have not been fully developed, it is possible–perhaps even likely–that more “bad” research makes its way into press than in better-studied fields. But the degree to which this invalidates emerging psychological science is uncertain.

Although it is dangerous to make relativistic comparisons, particularly using negative data, much is occurring in the world of established science that suggests we should not be overly alarmed by some of the less stellar work in newer, emerging areas of psychological science. Has not the larger, more established world of cancer research been recently fixated on revelations that leading researchers at leading institutions (Memorial Sloan-Kettering, for example) had deliberately hidden their well-remunerated ties to industry? The journal Science has yet again published a front-page series of articles (Science under scrutiny, 21 September 2018) questioning the methods and results of many meta-analytic studies, often in areas of applied psychology. The chief exemplar in the meta-analytic critique in Science dealt, perhaps not surprisingly, with psychological research, specifically into video gaming and a suspect publication in the preeminent journal Psychological Bulletin. A follow-on article dealt with the widespread issue of “HARKing,” or “Hypothesizing After the Results are Known,” a variant of what those of you who were exposed in your graduate school research methods courses to Cook and Campbell’s masterful Quasi-Experimentation knew more poetically as “fishing and the error rate problem,” one of many post-hoc temptations those authors outlined. This is probably closest to the modern phenomenon called “p-hacking,” where researchers selectively report the p values that support their hypotheses. Not necessarily academic fraud, but something that certainly tests the boundaries of rigorous data analysis and reporting.
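The error-rate inflation behind “fishing” can be made concrete with a short simulation (a minimal sketch, not drawn from the column; the numbers of tests and studies are illustrative assumptions). Under the null hypothesis, p values are uniformly distributed, so a researcher who runs twenty tests and reports only the smallest p value will “find” significance far more often than the nominal 5%:

```python
import random

random.seed(42)

ALPHA = 0.05
N_TESTS = 20        # outcomes "fished" per study (assumed for illustration)
N_STUDIES = 10_000  # simulated studies, all with NO true effect

def fished_study(n_tests: int) -> float:
    """Under the null, each p value is uniform on [0, 1].
    A fishing researcher reports only the smallest one."""
    return min(random.random() for _ in range(n_tests))

false_positives = sum(fished_study(N_TESTS) < ALPHA for _ in range(N_STUDIES))
rate = false_positives / N_STUDIES

# Probability that at least one of k null tests crosses alpha: 1 - (1 - alpha)^k
theoretical = 1 - (1 - ALPHA) ** N_TESTS

print(f"simulated: {rate:.3f}, theoretical: {theoretical:.3f}")  # both far above 0.05
```

With these assumed numbers, roughly two-thirds of purely null studies yield at least one “significant” result, which is exactly why post-hoc fishing, uncorrected and unreported, corrupts the literature.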

Academics are pressured to publish, preferably positive findings; journals are pressured to maintain profits, and definitely prefer to publish positive findings; and new research technologies continually challenge orthodoxy. Excuse me while I stifle a dainty yawn, but weren’t all of us exposed to these dilemmas in our graduate education–mine occurred during the era of the computer punch card and the perforated multifold printout. If you’re not old enough to remember those days, consider yourself lucky. It’s not that I’m dismissing the importance of correctly collected and analyzed data. I view fudging and post-hoc fishing as dimly as the most august “journalologist,” a term that has recently come into use to describe those who study scientific publishing and strive to improve the quality of published research. Nor do I disbelieve reports that p-hacking and other dubious methods are endemic, or that negative findings are routinely squelched. More importantly, I think the journalologists and other scientific watchdogs are correct in believing that the fixes to this problem aren’t hard. Preregistration of clinical trials is now mandatory for a drug or device seeking FDA approval. This isn’t a failsafe mechanism, but it provides some protection that psychological research would undoubtedly benefit from. A discipline-wide commitment to sharing raw data would help, as would a commitment by professors to avoid seeking publication in for-profit journals of questionable quality (and to teach their students to steer clear of such temptations). A true embrace of negative findings, rather than a dismissive acknowledgement that such findings can be as informative as positive ones, would provide balance in the archival record.

All good science is iterative, and our research technologies are not static. As was pointed out in one of the suite of articles in the 21 September Science issue, the whole concept of meta-analysis is barely 40 years old, and our definitions of what constitutes a convincing research finding are ever-evolving. Randomized controlled trials in medicine are themselves a recent phenomenon–according to some, though this is contested, the first RCT in medicine appeared in the late 1930s and was a study of psychostimulants in children with minimal brain dysfunction, now ADHD.

It is not that all psychological science is bad, nor is it all irreproducible. We probably stand up pretty well in this regard compared to science in most applied fields, including medicine. We can take some, albeit not much, comfort in knowing that the publication of erroneous or misleading findings in psychotherapy outcome studies has less drastic real-world consequences than the publication of misleading results in treatments for cancer or other life-threatening diseases. What we cannot do is become complacent.

In the spring of this year, as readers of this column will recall, APA published a clinical practice guideline on the treatment of PTSD that greatly discomfited many in the practice community. At the time, I cautioned that avid embrace of these guidelines would be unwise. I simultaneously cautioned that wholesale rejection of such guidelines would be similarly unwise. Now, APA is seeking public comment on a guideline for the treatment of depression. I predict that a similar outcry will result when the guideline is finally released. Some will legitimately argue that the guidelines are based on only those favored therapies that have received the most scientific attention. This is a criticism that has at least some merit. Others will point out the weaknesses of the meta-analyses that underlie the guideline. This criticism also has merit. What isn’t meritorious is the belief that such guidelines are part of a vast conspiracy to eradicate certain forms of therapy, or to impose lock-step rigidity on the delivery of psychotherapy. CBT may be a perfectly justified intervention for depression that is rooted in inaccurate cognitive schemata. It may be less effective for a patient whose depression stems from existential questioning of their place in the world or their contributions to their profession or the lives of others. But surely it is incumbent on all of us to examine these guidelines, weigh their merits, and determine if what we practice is the best we can offer our patients.

Copyright © 2018 National Register of Health Service Psychologists. All Rights Reserved.
