Let me start out by saying that this is all my opinion. I am throwing my thoughts out there in the hope that this will stimulate some discussion about what study design needs to look like, and what kinds of evidence are appropriate to represent acceptable evidence that a technology does, or does not, provide improvements.
I periodically see or hear discussions that disparage the evidence for automation as "weak" for lack of study design, with the notion that the only good study is a randomized, double-blinded, placebo-controlled study with powerful statistical significance.
There can be no doubt that such study design is necessary when dealing with the effects of therapeutic drugs in humans:
- There is a well-known placebo effect; a 1996 study in the British Medical Journal on the effect of pill color on perceived effectiveness is an interesting read on the subject.
- Evaluation of effectiveness by clinicians can be colored by their expectation of (or desire for) results.
- There can be wide variance in human response to a medication; we can only perceive true effectiveness when enough patients, from a wide enough variety of patients, have tried the medication and it can be shown statistically that those who were treated fared better than those who were not, or who received more standard therapy.
Even then, there can be, and are, constraints on what kinds of studies can be performed. For instance, there are ethical prohibitions against denying patients therapy that might be life-saving, or against providing therapies that might be more harmful than useful.
Further, we can, and should, rely on basic science to inform some of our decisions. I daresay that none of us needs a double-blind, controlled trial on parachutes, nor would any of us agree to participate in such a study. We all experience gravity every day and do not require a study to remind us that it still exists. It is a fundamental, and well-documented, part of our human experience (unless, of course, you are an astronaut).
So, how are studies involving technology different?
- There really aren't placebos for most of our technologies. They are large enough, and invasive enough that we cannot blind their use. They are either there or they are not. If we need to demonstrate benefit, we need to measure the involved processes before the application of technology and afterward.
- Our technologies either produce specific endpoints or they do not. In that sense, their outcomes are quite measurable and deterministic, so statistics provide less value in differentiating pre- and post-implementation measurements. The question is less whether the pre- and post-implementation systems are statistically different than whether any difference is meaningful. Most of our technologies can produce reams of data at exceptional levels of detail. If, for example, the desired outcome for a technology is an improvement in the speed of a process, then what matters is not that the pre- and post-implementation data are statistically different, but whether the improvement in speed is sufficient to produce other benefits. If the improvement in speed is 0.2 seconds, that difference may be statistically significant and yet still unlikely to produce a benefit unless the process being measured occurs hundreds of thousands of times a day.
- It does turn out that the definition of those endpoints is often where technology studies fail. Automation generally affects processes, and the entire process needs to be measured, not just sub-processes. For example, a study of IV workflow that focuses only on the time it takes to physically prepare the dose would not be able to measure benefits like reducing the number of doses being made in a day, the reduction in waste from being able to re-purpose doses that were made but no longer needed, or the ability to capture the "dead" time between preparation and checking to reduce the overall transit time of a dose through the preparation process.
- It also turns out that good technology can change the way work is performed, in ways that require some time to relearn how to perform old processes in new ways. Studies on the adoption of technology therefore need to include some measure of what kinds of work changes were needed and how long it took for people to become proficient in the new, automated processes. And it is critically important that the impact of these technologies be measured after the users have become proficient with them.
- Technologies may be incompletely designed for the work they are to address. For example, an IV robotic system that can only make doses that have a single active ingredient in a single, commercially available fluid cannot completely replace the laminar air flow hood, because those doses with multiple ingredients still have to be prepared somewhere. There may be enough of those single-agent doses to be prepared that the robotics are still valuable, but it won't eliminate the need for the IV room any time soon. The more things that have to be prepared and dispensed via alternative mechanisms, the less valuable the automation is likely to be.
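The speed example above comes down to simple arithmetic: a small per-event saving only matters at high volume. Here is a minimal sketch of that calculation; the volumes and the 0.2-second figure are purely illustrative, not drawn from any study.

```python
# Hypothetical illustration: a statistically significant speed improvement
# may or may not be practically meaningful, depending on daily volume.

def daily_hours_saved(seconds_saved_per_event: float, events_per_day: int) -> float:
    """Total hours saved per day from a per-event time improvement."""
    return seconds_saved_per_event * events_per_day / 3600

# A 0.2-second improvement in a process performed 500 times a day:
low_volume = daily_hours_saved(0.2, 500)        # about 0.03 hours -- negligible
# The same improvement in a process performed 500,000 times a day:
high_volume = daily_hours_saved(0.2, 500_000)   # about 27.8 hours -- meaningful

print(f"Low volume: {low_volume:.2f} h/day; high volume: {high_volume:.1f} h/day")
```

The point of the sketch is only that the same statistically significant difference yields wildly different practical value at different volumes, which is why concrete measures matter more than p-values here.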
This list is not meant to be exhaustive; it just contains the things that have been running through my head about what kinds of evidence are needed to properly evaluate automation. It seems pretty clear to me that double-blind, placebo-controlled trials are unlikely to be practical, and that, ultimately, statistical evaluation of results is going to be less meaningful than other, more concrete measures (like reduction in cost, improvement in speed, or improvement in accuracy) which are rather deterministic.
So what do you think?
As always, the contents of this blog represent my own thinking, and not necessarily that of ASHP or my employer, BD.
Dennis A. Tribble, PharmD, FASHP
Ormond Beach, FL
datdoc@aol.com