On the Ethics of Clinical Data Collection: Are Data Informative or Transformative?


All psychologists who do third-party billing, and particularly those who work with electronic health records, provide, whether they know it or not, a steady stream of patient-related data to an unseen army of analysts. Every coded encounter gets swept up and tossed into an analytic mill, where insurers, actuaries, and others chart healthcare engagement, costs, outcomes, and myriad other factors. Psychologists who work in most healthcare delivery settings, and even independent practitioners, are increasingly bound to the Promethean rock by two adamantine chains: electronic health records and outcomes data. Viewed separately, these chains are easy to characterize as real fetters that constrain our practices and limit our independence. But combining them may prove transformative, and may even turn them into a benefit for patients and providers alike.

Insurers and regulators are increasingly concerned with clinicians' ability to demonstrate positive outcomes. Patients must get better, and preferably quickly, for providers to be reimbursed. Unfortunately, despite years of trying, we don’t have very robust standardized outcomes measures. One of the most common, and one increasingly relied on by planners, is the Patient Health Questionnaire-9, or PHQ-9. Variants of this form exist (there is a PHQ-2, which some say produces outcomes data as informative as those from the longer version). Whichever version you choose, it’s not very good. It may or may not surprise you to know that the PHQ-9 is based on an earlier assessment tool called the PRIME-MD, which was developed in the 1990s essentially as a helpmeet to pharmaceutical companies, to aid the sale of antidepressants in primary care. The copyright on the PHQ-9 is still owned by Pfizer, Inc.

While not all outcomes measures are copyrighted by pharmaceutical firms, most are problematic. First, they are not terribly descriptive, and it has long been argued that a standardized form offers scant incremental validity over the classic “How ya doin’?” Second, some of the better outcomes measures, like the OQ-45, take significant amounts of time to complete and are therefore not widely used clinically, although they may be robust research tools. Third, few if any of these were devised specifically for psychology.

But the biggest issue with most outcomes questionnaires is that they simply do not capture patient progress in any meaningful way. Scores on instruments may move in a desired direction, but this does not truly describe a patient’s trajectory through an episode of care. What is statistically significant is often not clinically significant. At some level, clinicians understand this, which creates resistance to complying with payor-mandated submission of standardized outcomes data. An alternative exists, one that, when fully utilized, may provide us with a system that not only better predicts patient wellness but also relieves patients and providers of the burden of completing outcomes measures.

Let us, for argument’s sake, posit that data are informative but never transformative. The best data answer a question or set of questions with precision and brevity. Using the scientific method, we accumulate discrete data points that give us new understanding of a problem. “Aha” moments exist, but they occur rarely. In general, the process of discovery is slow, painstaking, and, truth be told, rather tedious. Discrete data (information) points accrete gradually. Taken together, such data can transform our understanding of a particular problem, but it is the synthesis, not the individual datum, that creates new understanding, i.e., that is transformative.

But now I will argue that an exception exists. An individual datum will always be informative rather than transformative, but big data can be transformative. “Big data” in health care refers to data sets collected from interactions with many hundreds of thousands of patients treated by many thousands of clinicians. One of the biggest health data sets in existence is collected by the U.S. military, which tracks the aggregate health care provided to approximately 9 million beneficiaries. Every patient encounter is recorded in this system: every prescription, every procedure, every diagnosis, every test, and every recorded outcome. The VA, large civilian HMOs, and others maintain parallel data sets. Electronic health record systems like Epic, which are used by small groups and other healthcare systems, also collect data on many thousands of encounters: data provided by all of us, wittingly or not. In aggregate, these data can be used to identify factors predictive of a positive outcome, factors generally not included in current clinical outcomes measures.

We know that standardized questionnaires do not adequately capture patient progress. We also know that, however effective our interventions are, the time we spend with patients represents a vanishingly small part of their lives. The modal number of psychotherapy visits remains stubbornly stuck at 1. The average course of psychotherapy remains at around 8 sessions, hardly a significant portion of any patient’s life. Our estimates of our ability to transform patients’ lives are very likely highly inflated. So how do we ultimately measure the outcome of our interventions?

The key is the use of condition management schemes that rely on the existence of big data sets. Such mechanisms allow us not only to track a patient’s progress through an episode of care but also to detect whether other factors, perhaps not directly related to that episode of care, have contributed to a positive or negative outcome. Here is where big data become transformative, because the curative factors they reveal are often counterintuitive or unexpected. Incorporating these unexpected factors can lead to algorithms that guide more effective care.

It follows, then, that to use these data sets we have to rethink how we measure progress: not as responses on standardized questionnaires but as separate factors that are predictive of a return to wellness or optimum functionality. Such factors are difficult, if not impossible, to identify without big data sets, as they may be both theoretically and temporally removed from an evidence-based intervention. To use a far-fetched example, comparing two groups of depressed patients in a large data set, one that has returned to full functionality and one that remains impaired, might reveal that the prescription of a multivitamin, rather than any antidepressant or course of psychotherapy, made the difference between stasis and recovery. Such a serendipitous finding could emerge only from the analysis of very large data sets.

Scary? Possibly. These data repositories can be hacked, and while significant protections guard against the disclosure of individual patient data, breaches and misuses do occur. In one recent, horrible example, the insurer Aetna sent thousands of letters via U.S. mail on which patients’ HIV status was visible through a see-through envelope window. Other large hospital systems have recently had patient information held hostage by hackers, permanently compromising data in those systems (even paying an electronic ransom cannot guarantee that individual patient data will not be divulged in the future).

Psychologists and other healthcare providers have an ethical obligation to ensure that health delivery systems protect privacy. Because of the sensitivity of our work, we are held to a higher standard to ensure that patient data are safeguarded from disclosure. We also have an ethical obligation to ensure that large healthcare organizations do not misuse such data, for example, by imposing treatment regimens that do not account for the needs or wishes of individual patients. Rather than focus on these ethical concerns, however, many psychologists have adopted the view that the very existence of these data sets is unethical. But these data sets are a reality, and have been for many years. In this electronic era, it is as pointless to argue against amassing large data sets as it was for King Canute to try to stop the incoming waves a millennium ago. Rather, the better questions are a) how do we protect the confidentiality of these data at both the individual and the systems level, and b) what surprises do they hold that might allow us to materially advance the efficacy of our treatments for patients with mental disorders?

Copyright © National Register of Health Service Psychologists, All rights reserved.

1200 New York Ave NW, Ste 800

Washington DC 20005

p: 202.783.7663

f: 202.347.0550

Endorsed by the National Register