The Teacher Wars


Authors: Dana Goldstein


A more sensitive early value-added formula was developed in Dallas in the mid-1990s, where statisticians recognized the fact that disadvantaged students tend to experience slower academic growth than their middle-class peers, no matter how good their teachers. That's because poor children are more likely to experience out-of-school disruptions, such as poor nutrition, a move, or homelessness, which can affect learning. The Dallas research team created a value-added equation that included controls for children's demographic traits, such as parental income and proficiency in English, essentially giving teachers who worked with disadvantaged kids bonus points. This technique found smaller, yet still significant, teacher effects on kids' test scores.
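The effect of demographic controls can be illustrated with a toy calculation. The sketch below is not the Dallas equation, which the text does not give; it is a simplified stand-in that assumes each student's growth is compared with the average growth of students in the same demographic group. The teacher names, group labels, and scores are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (teacher, student_group, score_growth).
# Every name and number here is invented for illustration only.
records = [
    ("Teacher A", "low_income", 6.0),
    ("Teacher A", "low_income", 8.0),
    ("Teacher A", "low_income", 7.0),
    ("Teacher B", "middle_class", 9.0),
    ("Teacher B", "middle_class", 9.0),
    ("Teacher B", "low_income", 5.0),
    ("Teacher C", "middle_class", 10.0),
    ("Teacher C", "middle_class", 12.0),
]


def raw_growth(records):
    """Average score growth per teacher, with no demographic controls."""
    by_teacher = defaultdict(list)
    for teacher, _group, growth in records:
        by_teacher[teacher].append(growth)
    return {t: mean(g) for t, g in by_teacher.items()}


def adjusted_growth(records):
    """Average growth per teacher, relative to the typical growth of each student's group."""
    by_group = defaultdict(list)
    for _teacher, group, growth in records:
        by_group[group].append(growth)
    # Assumed control: the expected growth for a student is the group average.
    expected = {g: mean(v) for g, v in by_group.items()}

    by_teacher = defaultdict(list)
    for teacher, group, growth in records:
        by_teacher[teacher].append(growth - expected[group])
    return {t: mean(r) for t, r in by_teacher.items()}


if __name__ == "__main__":
    print("raw:     ", raw_growth(records))
    print("adjusted:", adjusted_growth(records))
```

In this invented data set, Teacher A has the lowest raw growth but ranks above Teacher B once each student is compared with similar peers, and the spread between the best and worst teacher shrinks without disappearing.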

Value-added measurement changed pretty much everything in our national conversation about student achievement. To assess a school's improvement or decline, No Child Left Behind compared the “snapshot” score of one group of third graders on an end-of-year math test to the scores of the children who were in third grade the previous year. These snapshots made the teachers and schools that serve poor children look especially bad, because those schools earned low scores year after year. Snapshots obscured whether any individual student was doing better or worse over time. Growth measures that track one group of children over the course of several years, like value-added, present a more nuanced picture. But in 2001, when NCLB was designed, most policy makers in Washington hadn't heard about value-added. While the law's real-world consequences were playing out in schools across the country, value-added research grew much more sophisticated. Economists created experiments that randomly assigned students within one school to various teachers, and then measured differences in test score growth. That method eliminated the bias caused by principals clustering the most challenging or most able students in particular classrooms. Researchers also identified more sensitive controls for the factors that influence a child's test score but are not related to his classroom teacher's performance. A value-added model developed by the University of Wisconsin for New York City included controls not only for family income and English proficiency, but also for a student's race, gender, disability status, how often he was absent from class, whether he had been enrolled in summer school, and whether he had recently moved, been suspended, or repeated a grade. The New York City value-added model also compared teachers only to other teachers who taught similar-sized classes, and who had the same number of years of experience.

Using these methods, labor economists produced a massive body of research. It suggested that a teacher's pathway into the classroom—whether through a traditional teachers college, a graduate-level program in teaching, or an alternative program like Teach for America—hardly mattered with regard to how well they raised student test scores, nor did their college major. There was more value-added variation between teachers within a school than across all the schools in a district—a hopeful finding proving what many urban teachers had long argued: that even “failing” schools employ some excellent educators. First-year teachers were not very good, but they made major leaps in effectiveness by the end of their second year on the job, and they continued improving steadily for five to ten years, after which their measurable performance generally flatlined.

The results of these experiments remain “noisy,” as social scientists say. When value-added is calculated for a teacher using just a single year's worth of test score data, the error rate is 35 percent—meaning more than one in three teachers who are average will be misclassified as excellent or ineffective, and one in three teachers who excel or are terrible will be called average. Even with three years of data, one in four teachers will be misclassified. It is difficult, if not impossible, to compute an accurate value-added score for teachers who work in teams within a single classroom—a method rapidly growing in popularity—or for the two-thirds of teachers who teach grades or classes not subject to standardized tests.
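Why more years of data reduce misclassification can be seen in a small simulation. This is a sketch under stated assumptions, not a reconstruction of how the 35 percent and one-in-four figures were derived: it assumes true teacher effects and yearly measurement noise are both normally distributed with a standard deviation of 1, and that a teacher counts as "excellent" or "ineffective" when more than one standard deviation from the mean. The function names and parameters are hypothetical.

```python
import random
from statistics import mean


def label(effect, cutoff):
    """Sort a teacher effect into one of three bands around the mean."""
    if effect > cutoff:
        return "excellent"
    if effect < -cutoff:
        return "ineffective"
    return "average"


def misclassification_rate(years, n_teachers=10000, noise_sd=1.0, seed=0):
    """Share of simulated teachers whose estimated band differs from their true band.

    Assumptions (not from the text): normal true effects with SD 1,
    independent normal noise each year, and a cutoff of one SD.
    """
    rng = random.Random(seed)
    cutoff = 1.0
    wrong = 0
    for _ in range(n_teachers):
        true_effect = rng.gauss(0.0, 1.0)
        # The estimate is the average of several noisy yearly measurements.
        yearly = [true_effect + rng.gauss(0.0, noise_sd) for _ in range(years)]
        estimate = mean(yearly)
        if label(true_effect, cutoff) != label(estimate, cutoff):
            wrong += 1
    return wrong / n_teachers


if __name__ == "__main__":
    for years in (1, 3):
        print(years, "year(s):", misclassification_rate(years))
```

Averaging three noisy measurements cancels part of the noise, so the simulated error rate falls with more years of data, but it does not approach zero. The exact rates depend entirely on the assumed noise level and cutoffs.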

Some advocates of value-added downplayed these problems and made huge claims based on the technique. The Stanford economist Eric Hanushek, a fellow at the conservative Hoover Institution and proponent of cutting school funding, advanced the hypothesis that if poor children were assigned five “good teachers in a row”—those with value-added scores in the top 15 percent—it would completely close the academic achievement gap between the poor and the middle class. In a 2006 paper for the Brookings Institution, three economists, Robert Gordon, Thomas Kane, and Douglas Staiger, used similar logic to estimate that firing the bottom 25 percent of first-year teachers annually, as determined by a single year's worth of value-added data, could create $200 billion to $500 billion in economic growth for the country, by enabling poor children to earn higher test scores and go on to obtain better jobs.

The most important thing to realize about these claims, which appear frequently in the media, is that they are untested. According to Tulane University economist Doug Harris, another leading value-added scholar, no experiment has ever been conducted in which poor children are randomly assigned to multiple high value-added teachers in a row, to test if the achievement gap totally closes. “It's still purely hypothetical,” he told me, “and it would be an incredibly tough experiment to pull off.” Even if such an experiment did take place, Harris guesses that it would fail to confirm the hypothesis that teachers alone can close achievement gaps. Here's why: The Hanushek theory is that five teachers who each add 10 points to a child's test score will move that child from the fortieth to the ninetieth achievement percentile over the course of five years. But in real-world conditions, value-added gains tend to fade out over time; next year the average child will lose 50 percent of the test score gains she made this year, and by three years from now she will have lost 75 percent of this year's gains. According to Harris, that means the academic and economic effects of having multiple above-average teachers in a row have been inflated by more than half. Effective teachers can narrow, but not close, achievement and employment gaps that reflect broader income, wealth, and racial inequalities in American society.
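The fade-out arithmetic can be worked through directly. The sketch below takes the 10-point gain, the five-year span, and the 50 percent and 75 percent losses from the passage. Two values are assumptions, because the passage does not state them: the share retained after two years (set here to the midpoint, 37.5 percent) and after four years (held at 25 percent). Like the passage, it treats points as simply additive.

```python
GAIN_PER_TEACHER = 10  # points added by each teacher, per the example in the text
YEARS = 5              # five teachers in a row

# Fraction of a gain still present k years after it was made.
# 1.0, 0.5 and 0.25 follow the text; the 2-year and 4-year values are assumptions.
RETAINED = {0: 1.0, 1: 0.5, 2: 0.375, 3: 0.25, 4: 0.25}


def naive_total(gain=GAIN_PER_TEACHER, years=YEARS):
    """Cumulative gain if nothing ever fades: 10 points a year for five years."""
    return gain * years


def faded_total(gain=GAIN_PER_TEACHER, years=YEARS, retained=RETAINED):
    """Cumulative gain at the end of the final year, after earlier gains fade."""
    # A gain made in year y has had (years - y) years to fade by the end.
    return sum(gain * retained[years - y] for y in range(1, years + 1))


if __name__ == "__main__":
    print("no fade-out:  ", naive_total())
    print("with fade-out:", faded_total())
```

Under these assumptions the child ends up 23.75 points ahead rather than 50, less than half of the no-fade figure, which is consistent with Harris's point that the projected effects are inflated by more than half.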

This reality was demonstrated by the most celebrated value-added study ever conducted. Economists Raj Chetty, John Friedman, and Jonah Rockoff tried to figure out if teachers who were good at raising test scores were also good at improving their students' long-term life outcomes—in other words, if value-added was a good proxy for some of the other goals, aside from raising test scores, that we want teachers to fulfill. Using tax returns and school district records from an unnamed large city, they examined twenty years of data from more than one million children and their teachers, tracking the students from third grade through young adulthood. One finding was that the current achievement gap is driven much more by out-of-school factors than by in-school factors; differences in teacher quality account for perhaps 7 percent of the gap. But it turned out that the group of students who had been assigned to just one top value-added teacher—a teacher one standard deviation more effective than the norm—experienced small, yet observable, differences in life outcomes. These students earned, on average, 1.3 percent more per year, the difference between a salary of $25,000 and $25,325. They were 2.2 percent more likely to be enrolled in college at age twenty, and were 4.6 percent less likely to become teen mothers.

The researchers posited that if there were a way to systematically move the top value-added teachers to the lowest-performing schools, perhaps 73 percent of the test score achievement gap could be closed. That, however, is a gargantuan policy challenge: When a separate Department of Education/Mathematica trial offered more than one thousand high value-added teachers $20,000 to transfer to a low-income school, less than a quarter chose to apply for the jobs. (Those who did transfer produced test score gains among elementary school kids, but not among middle schoolers.) There was another major caveat, which Chetty, Friedman, and Rockoff acknowledged: Like almost every other major value-added study ever conducted, this one took place in a low-stakes setting, meaning that teachers were not being evaluated or paid according to their students' test scores. It was possible, the three economists noted, that in a higher-stakes setting, test scores would lose their predictive power, for instance in cases where they reflected not students' true learning but rather teaching to the test or cheating.

Value-added measurement had proven to be a useful research tool. Now the question was whether it could actually be used as a policy instrument—to select, promote, train, reward, and terminate teachers. Education history (and some economists) urged caution. No Child Left Behind had provoked states to lower standards and the scores that would qualify as proficient. It had narrowed the curriculum and increased teaching to the test. These trends proved the wisdom of “Campbell's law,” the oft-quoted social scientific rule named for the educational psychologist Donald Campbell: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”

Yet by the time Chetty, Friedman, and Rockoff published their study with its caution about using value-added measurement in high-stakes settings, the tool had already made the giant leap from research into practice. Congress never did get around to updating NCLB. But Bill Gates stepped in. The technology titan's early philanthropic efforts in education had focused on placing computers in schools, and then on breaking up large high schools into smaller ones. He'd been frustrated by the mixed record of those reforms; they might have helped raise high school graduation rates, but they did not produce the test score jumps he was looking for.
In 2007 Gates met value-added scholars Thomas Kane and Robert Gordon, who had already talked to New York City schools chancellor Joel Klein about using value-added to compare some teachers, like those from the Teaching Fellows alternative certification program, to others. Gates, too, liked the idea, and within a year he was making grants of up to $100 million to school districts that agreed to use value-added measures to evaluate teachers. Gates loved data and believed in the importance of employee evaluation and incentives. As CEO of Microsoft, he had implemented a system known as stack ranking, in which every manager ranked his direct reports from worst to best in two stacks: one on performance this year, and one on potential to improve over time. The system was used to distribute cash bonuses and stock options, and sometimes to lay off low performers.

Gates's successor at Microsoft, Steve Ballmer, made the so-called “rank and yank” system much more rigid, using just one pile to rate employees on every factor of their performance. An August 2012 Vanity Fair article on Microsoft's corporate troubles called attention to some of the downsides of this plan, namely that by ranking workers from top to bottom, it focused employees on competing against the other members of their teams, instead of working together to share best practices. Elsewhere in the corporate world, companies spent the 1990s adopting a Japanese management tool, used at Toyota and Sony, called continuous quality improvement. Under that system, managers and teams of workers looked at performance data together, not to rank individuals, but to identify group weaknesses and work cooperatively to address them. Even Japanese schools are set up this way; through a practice called lesson study, teachers collaborate to plan lessons, observe one another delivering them, and then share feedback.

Many education leaders worried that high-stakes evaluation tying individual students to individual teachers had the potential to introduce the Microsoft problem of competition into a profession that almost every expert agreed was far too lacking in collaboration. American schools were finally moving toward group lesson planning and paired or team teaching. With those setups it could seem counterproductive to focus so much on the test score link between one teacher and one student. Many different teachers were now impacting each child's learning, even in the elementary grades.


“I am actually really intrigued by value-added systems done right,” said Randi Weingarten in 2005, then the president of the United Federation of Teachers in New York City. She agreed to an experiment in which teachers at two hundred New York City schools could win up to $3,000 in bonuses if value-added evaluation showed test scores had improved. It was a collective, not an individual, value-added scheme: If scores went up, every teacher would get the same amount of money, no matter how little or how much she had contributed to the effort. This was in line with what Al Shanker, shortly before his death, had laid out as the correct union position on performance pay tied to student learning outcomes: in favor of it, but only if achievement was measured collectively across classrooms, to encourage collaboration. Citing the high error rates and noise in value-added calculations, Weingarten resisted accountability plans that linked a teacher to the test scores of students in his or her individual classroom. But she was about to lose that battle in the biggest way possible—with the president of the United States as her adversary.

One of the hottest tickets at the 2008 Democratic National Convention was to a panel discussion on education reform hosted by a coalition of foundations, nonprofits, and businesses that supported charter schools and teacher evaluation based on value-added. The event was held at the glittering postmodern Denver Art Museum, and the featured speakers were two young African American mayors thought to have the ear of Barack Obama, the party's nominee: Adrian Fenty of Washington, D.C., and Cory Booker of Newark. Fenty praised his schools chancellor, Michelle Rhee, a Teach for America alum who was pushing a plan, funded by philanthropists, to weaken teachers' tenure protections in exchange for bonuses tied to value-added data and tougher classroom observations. A small number of top D.C. teachers would supposedly be able to earn the eye-popping salary of $130,000 per year. “The American Federation of Teachers, which I don't think does anything for the people of the District of Columbia, is weighing in against it,” Fenty said at the event, “and the only thing I can think of is that the heads of the union, they want to keep their jobs.” Booker added, “Ten years ago when I talked about school choice, I was literally tarred and feathered. I was literally brought into a broom closet by a union and told I would never win office if I kept talking about charters.”
