
Friday 7 November 2014

The PHQ-9: A Tool for the Overdiagnosis of Depression

Physicians and other clinicians like to have quick and easy screening tools for common disorders. One of the most common mental disorders is Major Depression, so there is an understandable desire for a tool that can give the clinician a quick sense of whether there might be a problem.

The Patient Health Questionnaire, or PHQ-9, seems like a good option, and it is one of the most widely used screening tools in clinical practice. The main section has nine items, corresponding to the nine diagnostic symptoms for Major Depressive Disorder in the DSM-5. Patients rate each item on a 0 to 3 scale in response to the question “Over the last 2 weeks, how often have you been bothered by any of the following problems?” The ratings are accompanied by the following descriptors:

0: Not at all
1: Several days
2: More than half the days
3: Nearly every day

The “Over the last 2 weeks” specifier is relevant, because to count as a Major Depressive Episode (MDE) the symptoms have to have been present for at least two weeks. Importantly, eight of the nine DSM-5 criteria also specify that the symptom must be present "nearly every day" (NED). The suicidality item is the exception, and weight gain or loss over the course of a month without dieting can be substituted for appetite change.

There’s a second item asking how difficult these problems have made it to work, take care of things at home, or get along with other people. This also seems relevant, because in order to count toward a diagnosis the symptoms must be disruptive in a person’s life.

On many versions of the form patients can score their own answers simply by adding up the numbers (0-3) for the nine symptoms. The questionnaire takes people only a minute or two to complete, and the clinician can score it in seconds. Perfect.
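For readers who record scores electronically, the adding-up step might look like the following minimal Python sketch. The function name `phq9_total` and the example ratings are hypothetical and are not part of the PHQ-9 itself.

```python
def phq9_total(ratings):
    """Sum the nine symptom ratings (each 0-3) into a total from 0 to 27."""
    if len(ratings) != 9:
        raise ValueError("expected nine ratings, one per symptom item")
    if any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("each rating must be 0, 1, 2, or 3")
    return sum(ratings)


# Hypothetical answers to the nine symptom items, in form order.
example = [1, 2, 1, 2, 0, 1, 1, 0, 0]
print(phq9_total(example))  # 8
```

The total is only a count of self-reported symptom frequency. The second item, about how difficult the problems have made daily life, is not part of the sum.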

And let’s be clear: We use the PHQ-9 as a routine instrument in our own clinic. We like it. We print out the measure – and happily throw the accompanying interpretive guidelines in the trash. But what if you don’t?

So what's the problem?

The PHQ-9 is useful as a quick self-report measure that can be used as a springboard for a more formal face-to-face assessment of depression symptoms. If a client/patient truthfully returns a form with a score less than 5, it is highly unlikely that further questioning will reveal the presence of a current major depressive episode.

There are two problems, however.

First, many practitioners appear to use the scale to make the diagnosis, without formal follow-up in an interview. This is not an appropriate practice, because there is too much room for interpretation with many of the items. Clients might score 3 points on item e (poor appetite or overeating), for example, if they have had a lifelong pattern of overeating that is unrelated to current mood problems. Questioning would exclude these points.

Second, and more significantly, there is considerable vagueness in most of the interpretive guidelines accompanying the PHQ-9. A set distributed by a prominent pharmaceutical company, for example, states “Scores of 5, 10, 15, and 20 represent cutpoints for mild, moderate, moderately severe, and severe depression, respectively.” This mimics the wording of the most common versions of the scoring guidelines - and is a paraphrase of wording in an oft-cited article by some of the developers of the measure (Kroenke et al, 2001).

Notice the wording:  "depression." Not "Major Depressive Disorder." Throughout the writing on the PHQ-9 there is vagueness about whether we are talking about depression-like symptoms or a diagnosable medical condition. And it makes a difference. 
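To make the quoted cutpoints concrete, here is a small Python sketch that maps a total score to the band named in the guideline wording. The names `CUTPOINTS` and `symptom_band` are hypothetical, as is the label used for totals under 5. The function returns a symptom band only and says nothing about whether diagnostic criteria are met.

```python
# Cutpoints as quoted in the guideline: 5, 10, 15, and 20.
# Listed from highest to lowest so the first match wins.
CUTPOINTS = (
    (20, "severe"),
    (15, "moderately severe"),
    (10, "moderate"),
    (5, "mild"),
)


def symptom_band(total):
    """Return the guideline's band label for a PHQ-9 total (0-27)."""
    if not 0 <= total <= 27:
        raise ValueError("a PHQ-9 total must be between 0 and 27")
    for cutpoint, label in CUTPOINTS:
        if total >= cutpoint:
            return label
    return "below the lowest cutpoint"


print(symptom_band(7))   # mild
print(symptom_band(12))  # moderate
```

A total of 7 lands in the "mild" band here, which is exactly the kind of label that is easy to misread as a diagnosis.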

Comparing PHQ-9 cutoffs to DSM-5 criteria

If, reader, you are a clinician who uses the PHQ-9, I encourage you to get out a copy and lay it alongside your DSM-5.

Major Depressive Episodes are, as the guidelines indicate, coded as Mild, Moderate, or Severe in intensity (along with other specifiers outside the scope of this post). 

Mild MDE is diagnosed if a person just barely meets the criteria on five of the nine symptoms. This is impossible to do with a PHQ-9 score of less than 12. This would involve a person scoring three symptoms at “3”, the poor appetite/overeating symptom at "2" (a bit dodgy, but this might make up for the lack of a weight gain specifier in the PHQ item), and the self-harm score at “1”, given that suicidality need not be present most days. All other symptoms would have to be scored "0."
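The floor of 12 can be checked by brute force. The Python sketch below enumerates every possible set of nine answers and keeps those that could count as five symptoms under the reading described above: a rating of 3 for most items, 2 for the appetite item, and 1 for the self-harm item. These threshold values, and the names `THRESHOLDS`, `CORE_ITEMS`, `could_meet_mde`, and `minimum_total`, are illustrative assumptions rather than anything published with the PHQ-9. The sketch also assumes the standard item order, with loss of interest and low mood as the first two items, and applies the DSM-5 requirement that at least one of those two be among the five symptoms.

```python
from itertools import product

# Lowest rating at which each item is counted as a symptom (assumed reading):
# "nearly every day" (3) for most items, 2 for appetite (item 5),
# and 1 for thoughts of self-harm (item 9).
THRESHOLDS = (3, 3, 3, 3, 2, 3, 3, 3, 1)
CORE_ITEMS = (0, 1)  # positions of the loss-of-interest and low-mood items


def could_meet_mde(answers):
    """True if the answers count as five or more symptoms, one of them core."""
    met = [a >= t for a, t in zip(answers, THRESHOLDS)]
    return sum(met) >= 5 and any(met[i] for i in CORE_ITEMS)


def minimum_total():
    """Smallest total among all 4**9 answer sets that could meet criteria."""
    return min(
        sum(answers)
        for answers in product(range(4), repeat=9)
        if could_meet_mde(answers)
    )


print(minimum_total())  # 12
```

Meeting these thresholds on paper does not establish a diagnosis, since the ratings still have to be confirmed in interview. The check only shows that no truthful profile below 12 could qualify.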

I have seen hundreds of PHQ-9s, and have never seen a profile like this. If a person scores 3 symptoms “3”, other symptoms are always present to at least some degree. But it is conceivable.

What isn’t possible is being truthful on the measure, getting a total score of 5 – or anything less than 12 – and meeting diagnostic criteria for MDE of ANY level of severity.

To meet criteria for MDE-Moderate, a person has to meet criteria for more than 5 symptoms or the intensity of symptoms must be significantly greater than that required for the cutoff – very unlikely without a score of at least 15. To meet criteria for MDE-severe the score would need to be significantly higher still.
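The figure of 15 follows from the same arithmetic applied to six symptoms instead of five. The short Python sketch below lists the lowest possible total for each number of symptoms met, using the same assumed reading of the items as above (3 for most items, 2 for appetite, 1 for self-harm). The names `THRESHOLDS` and `min_total` are hypothetical, and the mapping from symptom count to severity is a rough guide only, since DSM-5 severity also depends on intensity and impairment.

```python
# Lowest rating at which each item counts as a symptom (assumed reading).
THRESHOLDS = (3, 3, 3, 3, 2, 3, 3, 3, 1)


def min_total(n_symptoms):
    """Lowest PHQ-9 total at which n_symptoms items could count as symptoms.

    Takes the n cheapest items. For five or more symptoms at least three
    of them are rated 3, and any one of those can be a core item (loss of
    interest or low mood), so the core-symptom rule does not raise the total.
    """
    if not 5 <= n_symptoms <= 9:
        raise ValueError("n_symptoms must be between 5 and 9")
    return sum(sorted(THRESHOLDS)[:n_symptoms])


for n in range(5, 10):
    print(n, min_total(n))
# 5 12
# 6 15
# 7 18
# 8 21
# 9 24
```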

The widely-used PHQ-9 cutoffs, if misinterpreted as describing Major Depressive Episode, simply do not match up with DSM-5 criteria.

Who cares?

Some versions of the interpretive guidelines for the PHQ-9 acknowledge the distinction between symptoms and disorders. Others don't. One set that is widely distributed on the Internet states that a score from 0 to 4 suggests "the patient may not need depression treatment." On the other hand, perhaps they “may” anyway. Maybe we all do, the guidelines seem to imply.

The “may” is arguably defensible. It’s possible that the patient simply didn’t understand the questions, or flat-out lied to minimize their problems. One could as easily say that if the patient neglects to complete the measure altogether they “may not need depression treatment.”

Most sets of guidelines seem to suggest to physicians and others that if a patient scores 5 or more, the best guess is that the person has Major Depressive Disorder. This is far enough wrong that it comes across in some cases as the result of deliberate distortion. It is no surprise that the PHQ-9 is embraced with enthusiasm by pharmaceutical companies hoping to sell product.

If physicians take the suggested cutoffs seriously, the result would be (has been, perhaps?) a mammoth rate of overdiagnosis of Major Depression. 

The main function of diagnosis is to point the way to treatment. The dominant form of treatment for depression in today’s healthcare system is the prescription of antidepressant medication. The risk, then, is that vast numbers of people not suffering from Major Depressive Disorder will be prescribed medication for normal-range mood disturbance.

The effectiveness of antidepressant medication has never been properly evaluated with this group. Recent reviews examining its effectiveness with the largest group of MDE sufferers – those meeting full criteria for the Mild form – suggest that it is very difficult to discern a therapeutic effect over and above the placebo response (Fournier et al, 2010; Kirsch et al, 2008). Given this, it’s hard to imagine that the results would be more impressive if we actually had data looking at antidepressant effectiveness in the subclinical population.

What should we do?

I like the use of brief screeners, but I sometimes shudder at the thought of what people do with the results. At our clinic we use both the PHQ-9 (as mentioned above) and the GAD-7, but we treat the cutoffs as pharmaceutical-promotion literature and throw them away. We never diagnose based on screener results, instead using them as jumping-off points for a formal diagnostic interview. This seems to retain the usefulness of the measures while compensating for their shortcomings.



Fournier, JC, DeRubeis, RJ, Hollon, SD, Dimidjian, S, Amsterdam, JD, Shelton, RC, & Fawcett, J (2010) Antidepressant drug effects and depression severity: A patient-level meta-analysis. Journal of the American Medical Association, 303, 47-53.

Kirsch, I, Deacon, BJ, Huedo-Medina, TB, Scoboria, A, Moore, TJ, & Johnson, BT (2008) Initial severity and antidepressant benefits: A meta-analysis of data submitted to the Food and Drug Administration. PLoS Medicine 5(2):e45.

Kroenke, K, Spitzer, RL, & Williams, JB (2001) The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16, 606-613.
