The Book of Woe: The DSM and the Unmaking of Psychiatry


Author: Gary Greenberg


Narrow’s regular clinical interview doesn’t seem to include taking a swing at any of these hanging curveballs. Instead he delivers a volley of questions about whether she has ever been manic or terrified of germs, or had “worries that go around and around in your mind” (no, no, and no). He inquires into her smoking habits (a pack a day), her drug use (LSD once or twice a year, and marijuana if someone offers it to her), and her drinking (none).

“So what kind of person do you think you are?” he asks. “Shy? Withdrawn?”

“Well, I’m not outgoing,” Virginia says.

Nearly a half hour has gone by since we met Virginia Hamm. Somewhere paint has dried and grass has grown, and I’m beginning to think that back at APA HQ, where Virginia is Emily and Dr. Narrow is Bill, there is some trouble between them. Because she is not helping him at all. Her monosyllabic answers are draining his monotone questions of whatever vitality they might have had. She might be illustrating a clinical point with her reticence, but Narrow’s interrogation isn’t exploring it. Nor is it clear what use he’s making of the Cross-Cutting Measures or the PROMIS or anything else from the heap of REDCap data, or just how these vaunted dimensional measures are supposed to work. Neither does he seem intent on establishing trust by turning the bits of information he is extracting into the first tendrils of intimacy or, for that matter, on achieving any other therapeutic goal. In fact, for the life of me, I can’t figure out what he is up to.

But then again, I’m not a psychiatrist, let alone a psychiatrist at the top of my profession’s food chain, and it is the nature of the mental health professions that they are practiced in their own little silos, which means that even after nearly thirty years in the business, I might still not know how a regular clinical interview is supposed to go.

My BlackBerry buzzes. It’s a message from Michael First, who has the row behind me all to himself, and who is a psychiatrist at the top of his profession. “I’m not sure what the point of this exercise is,” it says. I am reassured, if no less mystified.

As if he has read our thoughts (or heard the BlackBerry and realized he’s losing his audience), Narrow suddenly stops the Q&A. “I think I’m going to go onto the computer now,” he says. “I’ll be quick.” He turns toward it and taps away. It’s not clear whether he’s reassuring us or the patient or the women in the front row.

We are looking at a giant version of his monitor on an overhead screen. To the left is a list of clickable diagnostic classifications. He tells us he is going to open the module for Hoarding Disorder—“the most dramatic example that we have,” he says, although this was surely damnation with faint praise. (It also happens to be a diagnosis proposed for DSM-5; they’ve smuggled in an advertisement for one of their new products.) We’re looking at questions keyed to the proposed diagnostic criteria for Hoarding Disorder. Narrow clicks on the items that, based on the interview so far, he thinks Virginia meets. The computer asks him if he wants to enter a diagnosis of Hoarding Disorder. He clicks yes and then clicks another box. The page disappears before I can read it.

There is a sudden commotion from the front row; the women are waving their hands and stage-whispering to Narrow. I’m thinking that maybe they’re going to remind him that he has left out something important, something that will allow us to see what innovation this rigmarole brings to diagnostics or, for that matter, how it does anything other than provide a near-perfect, if unintended, example of the circles in which psychiatric nosology runs, the way he got to the diagnosis through the symptoms and the symptoms through the diagnosis, how Hoarding Disorder is another line carved in sand, a diagnosis that will no doubt be the object of scorn for the leaders of DSM-6 or DSM-7, fodder for their paradigm-busting cannons, an opportunity for them to justify the new book by decrying the old one’s reifications. Perhaps, I think, they’re going to stand up and say, “But Bill, you didn’t even pretend to be in doubt for a moment about the outcome—which, after all, let’s face it, you knew all along, since you watched the videotape of the real Virginia—so you couldn’t demonstrate how to get to the diagnosis or what it has to do with emotionality or schizotypy or antagonism or with the PHQ and the ASRM or anything we’ve seen other than the story about the brothers and the dumpster and the piles of papers, which didn’t require a computer to figure out, which indeed may have come to you despite the computer,” and then add, sotto voce, how it might not be such a good idea to put all this unreadiness and ineffectiveness on such naked display, not even here in front of this crowd whose sparseness must seem to him, and to Kupfer and Regier, like a blessing.

But it’s none of that. “You didn’t save it!” one of them exclaims. Narrow has missed a click and lost all the data he’s spent the last twenty minutes entering.

Flop sweat breaks out on his brow. “I’m sorry,” he says. “This is only the second time I’ve done this.” He sounds more sheepish than petulant, like a batter returning to the dugout after whiffing in the clutch, explaining to his teammates that he’d faced this pitcher only twice before.

“Should I cut this short?” Narrow asks the women. He’s also looking across the stage, where Kupfer has replaced Regier. It’s pretty clear Narrow has had enough; he’s begging for the hook. But none is offered, so he takes matters into his own hands, announcing that he will skip over the modules for other diagnoses and will now turn to the severity measures.

These turn out to be quite simple and, unlike the other ratings, clear in their application, if not terribly revealing. He goes through the criteria proposed for Hoarding Disorder and asks Virginia to rate their intensity and the distress they cause her on a scale of one to five. Then, “because we don’t want total dependence on all these forms,” he gives his own rating, as we field trial clinicians will be expected to do. “I don’t have a lot of experience with this disorder”—and how could he, since it doesn’t yet exist, at least not officially?—“but I would say it’s moderate. It could be a lot worse. I mean, there’s no dead animals in there.”

Narrow looks back at the front row. “Anything else?”

“Save it,” someone replies.

He’s clicking through that procedure when Kupfer takes the lectern. There’s little mercy in his move. He doesn’t even thank Narrow (or Kuhl) for his efforts. Instead, he tells us to pick up our clickers. “Given the fact that we’ve spent a reasonable amount of time on the interview,” he begins (and I don’t hear any irony in his voice), we ought to be able to come up with a diagnosis. A list of choices flashes on the board. The steel drums play. The percentages are revealed. Sixty-seven percent of us have voted for Hoarding Disorder.

It is a regular landslide, unless you are in, say, Turkmenistan. Which, evidently, Kupfer wishes we were. He wants to know why a full third of the room—about a dozen of us—have not voted for the party-endorsed candidate. (I’m wondering the same thing. Assuming the five APA functionaries—15 percent of the electorate—voted the right way, only half of the rest of us voted for Hoarding Disorder, and I had cast votes for Hoarding Disorder on the four clickers I could easily reach.) He thinks the next question might help to provide an answer. How useful were the new criteria in reaching our conclusion? When 65 percent of us answer either moderately or extremely, he observes that this is pretty much the same percentage of the crowd who voted for Hoarding Disorder, as if this somehow strengthened the credibility of the criteria, as if it did more than indicate how circular diagnostic logic is. Of course they were useful in making the diagnosis; there was no other way to reach it and, pace that dissenting 33 or 35 percent, no other diagnosis to reach. (I’m feeling a little guilty about that two-point discrepancy; I cast a vote for moderately on one less clicker than in the first poll.)

When 31 percent of voters say they think Virginia had Mixed Anxiety-Depression—a proposed diagnosis that has so little to do with Virginia that I figure it was thrown in just to fill out the multiple choices—I begin to wonder if someone is intentionally committing mayhem. Ten people who want to embarrass Kupfer? A terrorist cell sent in by the proponents of MAD to blow up HD? Acting out by the terminally bored?

Kupfer must be wondering, too, because he invites people to come forward and explain their votes. No one does. He moves on. He asks how useful we found the Cross-Cutting Measures and whether the forms were too long (50 percent) or too short (4 percent; he’s lost even the front row). When 53 percent say that the DSM-IV criteria are superior (a meaningless question because the DSM-IV doesn’t list Hoarding Disorder), Kupfer is quick to call the result a “nice split.” When 30 percent say the new approach is superior to DSM-IV’s, and 20 percent say it’s equivalent, he points out that this means half of us thought it was the same or better. But this turd cannot be easily polished, and when Narrow ends the presentation by saying, “Of those who said this is worse or much worse, we’d like to hear why,” it is hard to imagine that he means it or that either man ever wants to hear about field trials again.

•   •   •

When the session is turned over to the audience for questions, Michael First stands up. He waits in line at the audience mic while a man takes Narrow to task for not asking more about Virginia’s substance abuse. “I hope you weren’t rating my interview, as opposed to the general approach,” Narrow responds.

First doesn’t attack Narrow for being unprepared or criticize his technique or ask him just exactly what the point of that exercise was. Instead, he says that as he watched the demonstration, he was wondering how he would have asked the questions and how every other clinician would have asked them; and, realizing that there are as many regular clinical interviews as there are clinicians, he was also wondering how Narrow, in his role as the head of research for DSM-5, was going to deal with that. How will he know, in the likely event of diagnostic discrepancies among clinicians, that they are the result of the criteria rather than the way each clinician asks the questions? How, in other words, will the field trials do the job they are supposed to do—evaluate the reliability of the new DSM?

“Well, that’s a very complex question,” Narrow replies, and proceeds not to answer it, except to say that they had tried to figure it out in a pilot study and it hadn’t worked out.

But First asks the question again. Narrow acknowledges that not only are they not requiring a structured interview, they are not even training the study clinicians on the new diagnoses or “telling them how they should be interpreting these measures.” They are simply asking them to familiarize themselves with the website. But, he assures us, this is not a weakness in the design but a strength: the field trials will mirror how clinicians practice in the real world and thus yield more realistic results than the DSM-III and DSM-IV field trials did. Those old numbers, the ones that did so much to restore psychiatry’s respectability, Narrow is saying, were overstated, inflated by the pristine conditions under which they were conducted. But now that the APA has cleverly dirtied up the trials, Narrow tells us, he can almost guarantee that reliability will be worse than it was in earlier DSMs.

I have underestimated First. He has managed to unearth the point of the exercise after all: to prepare us all for the lousy outcomes the field trials were evidently designed to yield. As we file out of the room, I ask him what it’s like to be a bystander to the proceedings. “Oh, it’s absolutely excruciating,” he answers, as if that were obvious. Which, come to think of it, it is.

•   •   •

Narrow was correct about at least one thing. As Helena Kraemer, the chief statistician on the DSM-5 task force, told a much larger crowd the next day, “People’s expectations of what reliability should be have been grossly inflated.” She left no question about who was responsible for this: Bob Spitzer.

Spitzer knew it was not enough to ask two doctors to diagnose a patient, compare their answers, and use the results to pronounce judgment on whether the diagnosis was reliable. That approach wouldn’t account for the possibility that the clinicians agreed by chance—by, say, flipping a coin or tossing a dart or just plain guessing—rather than because the diagnostic criteria were well written. Fortunately for Spitzer, in 1960 a statistician named Jacob Cohen had invented a method for calculating the extent to which agreement between two people using the same rating scale is the result of factors other than chance. The statistic had come to be known as Cohen’s kappa, and Spitzer, working with Cohen, had adapted it for use in evaluating the reliability of diagnoses.

Spitzer and Cohen introduced kappa to psychiatrists in 1967, promoting it as a way out of the reliability mess. At first, they used it primarily to quantify just how bad things were, and this agenda shaped the way they addressed a problem built into the statistic. A kappa of 0 indicates that any agreement is by chance alone; a kappa of 1 indicates that researchers have come to the same conclusion for nonrandom reasons (presumably because the criteria work). But what do the numbers in between mean? How much agreement is sufficient to call a diagnosis reliable (or not)? After all, even a low kappa means that clinicians outperformed coin tossers or monkeys at typewriters.
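The chance-correction idea is simple enough to sketch in a few lines of code. The following is an illustrative implementation, not anything from Spitzer or Cohen; the function name and the two clinicians’ example ratings are invented for demonstration. Kappa is the observed agreement minus the agreement expected by chance (from each rater’s own diagnostic tendencies), rescaled so that pure chance scores 0 and perfect non-chance agreement scores 1.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance (Cohen, 1960)."""
    n = len(rater_a)
    # Observed agreement: fraction of cases where the raters concur.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each category, the product of the two raters'
    # marginal rates of using it, summed over all categories.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical clinicians diagnosing the same ten patients.
a = ["HD", "HD", "MAD", "HD", "none", "HD", "MAD", "HD", "none", "HD"]
b = ["HD", "MAD", "MAD", "HD", "none", "HD", "HD", "HD", "none", "HD"]
print(round(cohens_kappa(a, b), 2))  # prints 0.64
```

Here the clinicians agree on 8 of 10 patients, but because both diagnose Hoarding Disorder so often, a good deal of that agreement is expected by chance alone (p_e = 0.44), so kappa lands well below the raw 80 percent figure.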

This has turned out to be a hotly contested question, or at least as hot as anything in statistics gets. In 1974, Spitzer proposed an answer. Kappas of around .40, he said, indicated “poor” agreement, .55 was “no better than fair,” .70 was “only satisfactory,” and more than .80 would be “uniformly high.” But as California professors Stuart Kirk and Herb Kutchins noted, Spitzer “could have employed very good, good, not so good, and bad,” and they pointed out that there was a reason he didn’t. Spitzer’s 1974 paper was an attempt to put numbers to the widely noted poor reliability of DSM-II diagnoses, “belittling the reliability of the past,” as Kirk and Kutchins put it, in order to set the stage for the transition to a criterion-based future.
