How Strong Is the Evidence For Evidence-Based Therapies?

A recent review highlights a need for more rigorous testing of treatments.

Source: Dmytro Zinkevych/Shutterstock

The world of mental health treatments is vast, ranging from cognitive behavioral therapy (CBT) and Acceptance and Commitment Therapy (ACT) to psychodynamic approaches, exposure therapy, family therapy, and others. Some treatments have stacks of empirical studies backing them up.

But in the last decade, scientists have found that a variety of seemingly well-supported psychological findings may actually be the product of questionable research practices and other methodological shortcomings. Concerns raised by the “replication crisis,” which undercut some high-profile ideas in social psychology and other areas of the field, are touching clinical psychology as well.

A team of psychologists recently analyzed the research connected to therapies that have been designated “empirically supported treatments” (ESTs) by Division 12 of the American Psychological Association. The clinical psychology division has labeled more than 80 treatment types based on how well-supported each one is by research, giving ratings such as “strong,” “modest,” or “controversial.”

The reviewers applied a range of measures to assess the robustness of studies flagged as supporting these treatments. These included intuitive tests of quality, such as how frequently studies of a particular treatment misreported statistics, as well as less obvious ones. One metric used, called the Replicability-Index, accounts for how statistically well-powered a study was to find the reported effects. Statistically unlikely results could signal "p-hacking" or selective reporting that masks a less-flattering body of evidence.
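The logic behind the Replicability-Index can be sketched in a few lines. This is a simplified illustration of the general idea, not the authors' actual analysis code; the function name and input values are hypothetical:

```python
from statistics import mean

def r_index(observed_powers, significant_flags):
    """Simplified Replicability-Index: mean observed power minus an
    inflation estimate (the rate of significant results minus mean
    observed power). When studies 'succeed' far more often than
    their power would predict, the index drops, hinting at
    selective reporting or p-hacking."""
    mean_power = mean(observed_powers)
    success_rate = mean(1.0 if sig else 0.0 for sig in significant_flags)
    inflation = success_rate - mean_power
    return mean_power - inflation

# Three modestly powered studies (hypothetical values), all reporting
# significant effects: a 100% success rate against ~55% mean power
# yields a low index of about 0.10.
print(r_index([0.5, 0.6, 0.55], [True, True, True]))
```

Under this scheme, a literature whose success rate matches its average power would score close to its power; one where every underpowered study nonetheless "works" scores much lower.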

“We selected a diverse array of metrics that tapped into different ideas of what it would mean to have good scientific evidence,” says John Kitchener Sakaluk of the University of Victoria, who co-authored the review, published in the Journal of Abnormal Psychology.

“EST research was typically underpowered with a co-occurring inflation in the reporting of statistically significant effects,” the authors write. In other words, as is the case with many past psychology studies, the treatment studies tended to lack statistical power (which is influenced by the number of participants in a study, among other factors), and the proportion of positive results was often implausible. According to the authors, fewer than half of the treatments’ evidence bases were “consistently credible” on all metrics. For some treatments, the research showed weaknesses across the board. For others, the results were more mixed, with both positive and negative signs. “In those cases, it's more difficult to adjudicate their current status,” Sakaluk says.
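A back-of-the-envelope calculation shows why small trials make unbroken runs of significant results implausible. The sketch below uses a standard normal approximation for a two-sided, two-sample comparison; the effect size and sample size are made-up illustrations, not figures from the review:

```python
from math import sqrt, erf

def normal_cdf(x):
    """Standard normal cumulative distribution via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def approx_power(d, n_per_group, z_crit=1.96):
    """Approximate power of a two-sided two-sample test at alpha = .05
    for a standardized effect size d with n participants per group."""
    noncentrality = d * sqrt(n_per_group / 2.0)
    return normal_cdf(noncentrality - z_crit)

# A 'medium' effect (d = 0.5) with 25 patients per arm gives only
# about 42% power, so most honestly reported trials of this size
# should come up non-significant.
print(round(approx_power(0.5, 25), 2))
```

If such studies nonetheless report significant effects nearly every time, the observed success rate outstrips what their power allows, which is exactly the inflation the reviewers flagged.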

It’s not necessarily the case that treatments for which the evidence was ambiguous are ineffective. The evidence for Dialectical Behavior Therapy (DBT) in the treatment of borderline personality disorder, which the APA's Division 12 has called "strong," received low grades from Sakaluk and colleagues on nearly every metric. “But our review is not saying that DBT for borderline personality disorder does not work,” Sakaluk stresses. “What we’re saying is that if this is what it means to be a strong EST, and that's how DBT for borderline is labeled, those articles do not appear, in terms of these metrics, in the way that I think most people would hope.” The scope of the review was limited to references listed by Division 12; other research on DBT may be more rigorous.

Some treatments fared better under scrutiny. The studies supporting cognitive behavioral therapy for generalized anxiety disorder and obsessive-compulsive disorder, for example, were judged strong on most of the metrics applied.

The broad range of research quality emphasizes the need for patients to talk to their therapists about the effectiveness of therapy. “It’s been recommended for many years now that, in therapy, a clinician and a patient assess how well the process is going,” says co-author Alexander J. Williams, a clinical psychologist at the University of Kansas who supervises students in the use of empirically supported treatments. “Probably, what’s going on here is a need to double down on that. We’re not saying that anything goes.”

He advises that “the patient should be checking in with the therapist—‘How will we assess how therapy is going?’—the therapist should be encouraging the patient to communicate with them about how they’re doing, and that should be an ongoing, frequent part of therapy.”

The review “solves only some of the problems” with how treatments are studied, contends University of California, Davis clinical psychologist Christopher Hopwood, who was not involved in the research. “Many of the issues are more fundamentally related to [study] design and measurement.” He’s skeptical of what he describes as “the general premise of randomized controlled trials pitting ostensibly different treatments” against what he considers questionable diagnostic categories.

Testing treatment effectiveness can give an oversimplified picture of how therapy works, according to PT blogger Gregg Henriques, a psychologist at James Madison University. While randomized controlled trials, considered the gold standard in the field, are intended to make an airtight case for a given treatment, they may not account for certain variables that are important to therapeutic outcome, such as a patient’s individual personality and commitment to treatment, he says. And the control groups that receive a different treatment, and serve as the comparison for patients receiving the main one, sometimes set too low a bar for assessing the treatment’s effectiveness.

Nevertheless, some clear-cut principles of effective treatment do exist, he says. Exposure and response prevention is one approach that cuts across different treatment types and is used for phobias and other conditions: In brief, if a person with a pathological aversion to something is carefully exposed to it, “and if nothing bad happens, the system habituates and develops new learning structures that function to inhibit the strong negative emotional reaction.” A generic approach to therapy that utilizes such principles, according to Henriques, could provide a stronger comparison group against which to test treatments in need of empirical support.