Tuesday 7 April 2015

Publication Bias and Meta-Analyses: Tainting the Gold Standard with Lead

Cute, yes. But effective?
As Ben Goldacre notes in his excellent book Bad Pharma, for decades the gold standard for medical evidence was the review article - an essay looking at most or (hopefully) all of the research on a particular question and trying to divine a general trend in the data toward some conclusion ("therapy X seems to be good for condition Y," for example).

More recently, the format of review articles has shifted - at least where the questions addressed have lent themselves to the new style, known as meta-analysis. The idea has been to look at the original data for all of the studies available, and in effect reanalyze them as though the research participants were all taking part in one gigantic study. By increasing the number of data points and averaging across the vagaries of different studies, a clearer finding might emerge.

The meta-analysis has gone on to be revered as a strategy for advancing healthcare. It has vulnerabilities, of course:

  • It depends on the availability of a number of original studies.
  • It can be distorted by a particularly strong result in one study with a lot of participants.
  • It can only be as strong as the research design of its constituent parts.

Nevertheless, if there are a number of well-designed studies with roughly similar formats addressing a similar question, the meta-analysis can provide a balanced, weighted result that points nicely toward treatment selection decisions.

But how are meta-analyses affected by unpublished studies? 

In my last post I discussed how publication bias (most commonly, a bias against publishing negative results) leads to a situation in the literature roughly equivalent to reporting only the participants who benefited from a treatment - and sweeping under the rug the data from those who did not. And in fact this is a problem for meta-analyses.

Imagine that we want to evaluate the effectiveness of a radical new therapy in which depressed individuals talk to the therapist about their relationships with their pets. I don't practice this form of therapy myself, you'll be happy to know, but I'm sure someone does. Call it "Talking About Cats Therapy," or TACT. Studies examining it compare participants' mood improvements from pre- to post-therapy with the improvement seen in a placebo therapy (PT; let's make it a sugar pill, for simplicity's sake, though you'd generally want something that looks more like the treatment being tested).

We look at the published literature and find that there are six published studies. By an amazing coincidence, all six had the same number of participants (100; 50 in each condition), roughly similar outcomes (TACT participants improved on average 4 points more on the Beck Depression Inventory than PT participants), and the same amount of variability in response (lots: in every case, some people improved a lot and some less; a few even worsened).

Given this wide variability, we'll imagine that only two of the studies meet the effect size necessary to achieve statistical significance. In the other four studies TACT was statistically no better than PT, despite still showing a 2-3 point advantage for TACT.

We conduct our meta-analysis, combining the subjects of the 6 studies into one analysis with 600 participants - 300 in TACT and 300 in PT. We average the greater gains made by the participants in TACT, which come to 4.0 points overall. Because we now have 300 people per group, our analysis is more powerful, and that 4-point difference is enough to reach statistical significance at a stricter level (p<.01) than the two original studies that were significant (both p<.05).
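To make the arithmetic concrete, here is a minimal sketch of how pooling six such studies could play out. The post gives only the group sizes and the 4.0-point average, so everything else is an assumption for illustration: a standard deviation of 15 points for the improvement scores, the individual study advantages (two at 7.0 and four at 2.5, which average 4.0), a simple fixed-effect inverse-variance pooling, and a normal approximation for the p-values.

```python
from math import erfc, sqrt

# Assumptions (not from the post): the SD of improvement scores and the
# per-study TACT-minus-PT advantages, picked so that they average 4.0.
SD = 15.0
N_PER_ARM = 50  # participants per condition, as in the example
published = [7.0, 7.0, 2.5, 2.5, 2.5, 2.5]


def two_sided_p(z):
    """Two-sided p-value for a z statistic (normal approximation)."""
    return erfc(abs(z) / sqrt(2.0))


def pool_fixed_effect(effects, ses):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, sqrt(1.0 / sum(weights))


# Standard error of the difference between two group means of 50 each.
se = SD * sqrt(2.0 / N_PER_ARM)
study_ps = [two_sided_p(d / se) for d in published]

pooled, pooled_se = pool_fixed_effect(published, [se] * len(published))
pooled_p = two_sided_p(pooled / pooled_se)

for d, p in zip(published, study_ps):
    print(f"study advantage {d:4.1f}  p = {p:.3f}")
print(f"pooled advantage {pooled:.1f}  p = {pooled_p:.4f}")
```

Under these assumed numbers, only the two 7.0-point studies fall below .05 (and neither falls below .01), while the pooled 4.0-point advantage falls below .01. Different assumed values would shift the exact p-values but not the pattern.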

But there's a secret.

In our fantasy universe there weren't just 6 studies of TACT versus PT. There were 10. In 4 of the studies the results suggested that TACT actually made people worse, and the people receiving sugar pills improved a little due to expectancy (about the same amount as they did in the published trials).

Those four studies, like most of the many unsupportive studies of antidepressant medication discussed in my last post, were not published.

The developers of TACT, who firmly believe in the therapy (and stand to make big money from a well-supported therapy via training workshops), decided that there must be some flaw with these negative studies. In retrospect, perhaps the therapists weren't so well trained, and somehow the TACT condition included a lot of people who didn't actually like their cats. And anyway, the journals surely wouldn't be interested in publishing articles about therapies that are worse than placebo, so no point in trying.

But this unpublished data is important.

If we conducted a meta-analysis on all 10 studies, we would find that the positive-ish and negative studies average out, leading to a difference between TACT and PT of 0.00: a complete null effect. The unavailability of negative trials causes our state-of-the-art meta-analysis to misperceive a null therapy as effective.
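A minimal sketch of that comparison, pooling the published studies alone and then all ten. The individual study results are hypothetical: the post states only that the published studies average a 4.0-point advantage and that all ten average out to zero, which (with equal study sizes) implies the four unpublished studies average a 6-point disadvantage. The standard deviation of 15 and the fixed-effect, normal-approximation pooling are likewise assumptions.

```python
from math import erfc, sqrt

SD = 15.0        # assumed SD of improvement scores (hypothetical)
N_PER_ARM = 50   # participants per condition in every study

# Hypothetical TACT-minus-PT advantages, in BDI points.
published = [7.0, 7.0, 2.5, 2.5, 2.5, 2.5]   # mean +4.0
unpublished = [-6.0, -6.0, -6.0, -6.0]       # TACT did worse than PT


def pooled_result(effects, se):
    """Fixed-effect pooled advantage and its two-sided p-value."""
    weights = [1.0 / se ** 2] * len(effects)
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return pooled, erfc(abs(pooled / pooled_se) / sqrt(2.0))


se = SD * sqrt(2.0 / N_PER_ARM)  # same SE for each equal-sized study

pub_effect, pub_p = pooled_result(published, se)
all_effect, all_p = pooled_result(published + unpublished, se)

print(f"published only: advantage {pub_effect:.2f}, p = {pub_p:.4f}")
print(f"all ten trials: advantage {abs(all_effect):.2f}, p = {all_p:.4f}")
```

With the unpublished trials left out, the pooled advantage is 4.0 points and clearly significant. With them included, it is zero (up to floating-point rounding) and the p-value is essentially 1.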

Why does this matter?

When negative studies go unpublished, and when meta-analyses depend only on the published work, the problems of biased data are not averaged out; they are combined. The result can be a stronger finding for a null or harmful therapy than was found in ANY of the studies upon which the meta-analysis was based (stronger, that is, in terms of significance level). Theoretically, it would be possible to obtain a significant meta-analysis of a hundred studies, none of which had reached significance on their own.
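The hundred-study claim follows from how standard errors shrink under pooling. As a rough sketch, assume k equal-sized studies that each find the same advantage d with the same standard error s, pooled with a simple fixed-effect average (the symbols d, s, k and z are introduced here for illustration and are not from the post):

```latex
z_i = \frac{d}{s}, \qquad
s_{\text{pooled}} = \frac{s}{\sqrt{k}}, \qquad
z_{\text{pooled}} = \frac{d}{s/\sqrt{k}} = \sqrt{k}\, z_i
```

If each study has z = 1.0 (two-sided p of about .32, nowhere near significance), then pooling k = 100 of them gives z = 10, which is significant by any conventional standard.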

Meta-analysis is often viewed as a way of averaging out results and flaws in constituent studies. The lack of representativeness brought about by the nonpublication of negative data (which is the most common type of publication bias) is not compensated for by combining the published studies - it is made worse.

The researchers working with the Cochrane Collaboration, a group dedicated to creating systematic reviews of medical therapies, attempt to correct this problem by locating research trials that have gone unpublished. The results are frequently at variance with the conclusions that would be reached by a review of the published data alone - largely because researchers (or funders) frequently opt not to publish trials that are unsupportive.

Does this really matter? After all, if you are arguing that it is possible for a human to climb Mount Everest without oxygen, it takes only one positive result to make your point. It is irrelevant how many previous attempts resulted in failure.

In healthcare research, however, it matters a great deal. We are not asking whether it is possible for a given therapy or approach to benefit at least one person who gets it. Every therapy - whether it is past-life regression, Vitamin C, or high-colonic enemas - will appear to have helped someone, whether because of expectancy, spontaneous recovery, or pure chance. It is for this reason that patient testimonials are not considered to be valid evidence in favour of health-related procedures.

The question we are always asking is whether a therapy is effective (or damaging) for a group of people, be they male airplane phobics, all diabetes sufferers, or post-transplant patients on immunosuppressive drugs. We look at the variability (versus consistency) of response across individuals in our target group, the magnitude of effect, and the size of the effect once the influences of expectancy are removed (usually by comparing the treatment group with a placebo condition). This is precisely the type of judgement likely to be affected by examining only a subset of the data.

What this means is that although meta-analysis is a tremendously useful tool in healthcare research, it remains subject to one of the largest sources of research bias - the selective publication of results.

What should we do?

The obvious solution, arrived at by anyone who looks at the problem, is to create a registry for trials before they are carried out, with the understanding that only pre-declared trials will be published, and that all pre-declared trials will be published regardless of the results.

This initiative, at least for pharmaceutical trials, has been agreed upon and declared by a consortium of prominent journals, leading many of us to believe that a big part of the problem had been solved. (At least for medications commencing trials now - it does nothing to resolve the situation for medications already on the market.) I have openly stated as much at numerous workshops on depression treatment.

Unfortunately, I may have spoken too soon. According to Goldacre, the solemn pronouncements of the editors of many of medicine's most prestigious journals have meant what few of us were cynical enough to fear: Nothing at all. The journals have gone on publishing unregistered trials much as they did before.

There's just one difference. Having seen and acknowledged a fundamental problem that compromises the validity of the research they promote, the editors are now engaged in an overt and conscious (rather than simply neglectful) abandonment of the principles of science.

Whether it will be decided, perhaps by future editors, that the welfare of patients merits an improvement in practice remains to be seen. We can only hope.


Goldacre, Ben (2012). Bad Pharma. New York: Faber & Faber.