Authors: Dana Goldstein
The push to change how teachers were evaluated did, however, impact teachers' working lives. Principals were spending more time than ever before in classrooms, and more time filling out paperwork describing what they had seen there.
In New York City in 2012, under pressure from the Bloomberg administration, nearly half of all teachers who applied for tenure were denied; 3 percent of those were fired, and the rest were kept on probation. Just five years earlier 97 percent of teachers who applied for tenure had been approved.
In Washington, D.C., by 2012, 10 percent of the teacher corps had been laid off based on performance evaluations. It's worth looking more deeply at Washington, in part because Michelle Rhee's agenda for teachers anticipated, by several years, many of the policy trends that Race to the Top spread nationwide.
Despite the wave of research suggesting that merit pay was ineffective at raising student achievement, the District of Columbia stuck with its plan. Each year, six hundred to seven hundred teachers were offered annual bonuses, typically of $15,000 or less, though 20 to 30 percent of them were turning the money down because they were unwilling to lose their tenure protections in exchange. The city's average teacher salary rose to $77,512, higher than in the surrounding suburbs or in the region's charter schools. But only one or two teachers per year, out of four thousand, earned the city's top bonus of $25,000. Perhaps the biggest question is whether D.C. will be able to afford this generous pay moving forward: The program's philanthropic funding has run out, and it now costs the district $6 million annually.
Was teaching improving in D.C.? A 2013 study reported hopeful results. It found a significant number of low-rated teachers were choosing on their own to leave the district, while those low-rated teachers who stayed produced higher student test scores the following year. A separate report from The New Teacher Project presented more mixed evidence. It found 88 percent of the city's highest-rated teachers chose to stay, yet highly rated teachers who left were more likely than teachers in other districts to cite the evaluation system itself as one of the reasons they were unhappy. Historically, teachers have gotten little feedback on what they need to do better, much less how. New systems were meant to offer improvements, but often they failed to. Indeed, professional development in D.C. remained patchy, with only one-third of low-performing teachers and one-quarter of high-performing teachers reporting that they had received constructive feedback on their practice. Even more problematically, the data revealed that inexperienced and low-performing teachers were increasingly clustered in the city's poorest neighborhoods, east of the Anacostia River. It was unclear why this was the case: because it was easier to score well if one worked in a middle-class school, or because many effective teachers in D.C. were avoiding high-poverty schools.
To address these problems, in 2012 Michelle Rhee's successor, Kaya Henderson, accelerated the pace at which teachers in high-poverty schools could qualify for financial bonuses tied to student performance, hoping to make working in those schools more attractive. She also decreased the amount of teachers' evaluation scores tied to value-added, from 50 to 35 percent in tested subjects and grades, and she added a new evaluation category to reward teachers for “commitment to school community.”
Those shifts to a more holistic system of teacher evaluation were overshadowed by a series of exposés, published by Jack Gillum and Marisol Bello of USA Today, demonstrating that during Rhee's chancellorship, the test-maker CTB/McGraw-Hill flagged hundreds of D.C. classrooms for statistically improbable answer sheet erasure rates on state tests, possible evidence that adults had corrected students' mistakes. The average child erases zero, one, or two answers on a multiple-choice test; typical answer sheets at one D.C. school, the Noyes Education Campus, contained between five and twelve erasures, depending on the classroom. The school's principal, Wayne Ryan, resigned in disgrace, but only after collecting $20,000 in bonuses attached to test score increases.
Noyes was not an isolated case. Increasingly, there was evidence that a significant number of unscrupulous administrators and teachers nationwide had responded to the higher stakes attached to state-level standardized tests (evaluations, bonus pay, and public release of data) by cheating. The same USA Today team that revealed the D.C. irregularities studied six other states and found over sixteen hundred examples of probable test score manipulation between 2002 and 2010. (The newspaper would have almost certainly found even more cheating had it not zeroed in on only the most suspicious test score leaps: those that statisticians said were about as likely to be legitimate as a Powerball ticket was to be a winner. For example: At one Gainesville, Florida, elementary school, math proficiency rates jumped from 5 percent to 91 percent in three years.) A subsequent investigation by the Atlanta Journal-Constitution discovered 196 school districts across the country with suspicious test score gains.
Atlanta itself was the site of the nation's most infamous recent cheating scandal.
On March 29, 2013, thirty-five Atlanta teachers and administrators, including the city's former superintendent, Beverly Hall, were indicted. The grand jury report revealed a shockingly sick culture of adult cheating, in which Hall, who had been the 2009 national “superintendent of the year,” fired whistle-blowers and protected the jobs of employees who purposefully sat struggling kids next to high-performing ones to encourage cheating on tests, and who gathered at afterschool “erasure parties” to correct multiple-choice answer sheets before submitting them to be graded. Teachers and principals in Atlanta could earn thousands of dollars in bonuses for raising scores; Hall's bonuses totaled $580,000.
In the wake of this appalling ethical lapse, which resulted in thousands of Atlanta children, largely poor and black, being told they had acquired crucial academic skills they actually lacked, accountability reformers rushed to defend high-stakes testing policies. “The existence of cheating says nothing about the merits of testing,” Arne Duncan argued in the Washington Post. Bill Gates said that cheating represented just a “tiny” rounding error in the landscape of standardized testing. They all advocated blaming the adult cheaters while absolving the policies that provided incentives to cheat.
Even where no systemic cheating was alleged, there were disappointments with the new teacher evaluation schemes.
When New York City released value-added data for individual teachers in 2012, and the Times and other news organizations made them searchable by teachers' names, the margin of error was a staggering 53 points out of 100 for English teachers and 35 points out of 100 for math teachers. Numbers like that forced even strong supporters of data-driven accountability, including Bill Gates, Wendy Kopp, and Doug Harris, to speak out against the public release of such data. Kati Haycock began to worry that reformers, including many of her allies, had run “roughshod over those who were anxious about whether value-added was strong enough to support all of this … There are voices who said, 'Do it anyway! This is the moment!' Those people may still be right, but I count myself among a group of folks who are saying, as mad as I was about how slow we went before Race to the Top, I think I might be almost more upset now about the decision to go faster than these systems can handle.”
Chester Finn, the moderate Republican reformer and former assistant secretary of education in the Reagan administration, agrees. “We'll probably discover ten years from now you can't do truly quantitative achievement-based evaluation of teachers with any great reliability,” he told me. This is the typical hype-disillusionment cycle in American education reform, driven by moral panic about bad teaching.
Already there is some evidence that the new Race to the Top evaluation systems are failing to meaningfully distinguish between teachers, in much the same way that past evaluation systems failed. In Michigan and Tennessee in 2012, 98 percent of teachers were rated effective or better; in Florida, 95 percent; and in Georgia, 94 percent, numbers hardly different from those under the old systems.
It is unclear exactly why this is happening, but we can wager a few guesses. It could be that, as in the past, principals are not taking the time to thoroughly evaluate each teacher on the classroom observation components of these systems, either because of the large administrative burden this imposes (Florida's observation system requires ratings in sixty categories for each teacher) or because they lack the training in how to do so.
Teachers union leaders have suggested the low ineffective rates prove that only a tiny fraction of teachers, after all, are bad at their jobs. Before you dismiss this response as self-serving, consider this: Even tough reformers like Colorado state senator Mike Johnston say they'd like to see only the bottom 5 to 10 percent of teachers fired each year. Economist Eric Hanushek has even written, “The majority of [American] teachers are effective. They are able to compete with teachers virtually anywhere else in the world.” If only a small minority of teachers are truly terrible, then evaluation systems that flag 2 to 6 percent of a state's teachers as problematic, produce layoffs of 10 percent of teachers in D.C., and deny or defer 50 percent of teachers tenure in New York City represent a huge step forward toward a more accountable profession. In New Haven, a new union contract eliminated tenure protections for just the 2 percent of teachers declared “ineffective” annually. Superintendent Garth Harries, an accountability reformer, is satisfied. “I think the 2 percent represents a real and significant number of teachers,” he told me. “In the end, it's not a huge number, but the fact that these teachers are, in fact, leaving for reasons directly related to performance has a fairly profound impact on the rest of the force. Folks saying, 'Thank God!' and folks saying, 'They're serious. I have to make sure I get my act together!' If we're truly going to have a professional construct for teaching, I don't think there's a set number of teachers we remove and then we're done. I don't think I'd want it to be below 2 percent [annually]. But I'd be perfectly happy with 2 percent in perpetuity.”
Jonah Rockoff, a coauthor of the landmark value-added study linking test score growth to later income, says that because of concerns over teaching to the test, the next frontier for research will be to measure a teacher's impact in new ways. That could be done by looking at how teachers influence student behavior, attendance, or GPA. “We all know test scores are limited not just in their power and accuracy, but in the scope of what we want teachers and schools to be teaching our kids,” Rockoff said. “If we had a more holistic view of teaching, that would be great. But I don't mean touchy-feely, 'you can teach however you want.' It's the idea that there's not just one thing we care about our kids learning. We're going to measure how kids do on socio-cognitive outcomes and reward teachers on that, too.”
But as Arne Duncan has acknowledged, states can't simply use value-added to “fire their way to the top.” Even if test scores were a flawless reflection of student learning and teacher quality, there is no evidence that the new teachers who replace the bad teachers will be any better; it is practically impossible to predict, via demographic traits, test scores, grades, or pathway into the profession, who will become an effective teacher.
Research and experience demonstrate that it makes good sense to tie teacher tenure and job security more closely to performance, and less to seniority. The contract provisions of the 1960s and 1970s make less sense now that we know so much more about how teachers' mind-sets and practices impact children's learning. But the history of American public education shows that teachers are uniquely vulnerable to political pressures and moral panics that have nothing to do with the quality of their work. Even Michelle Rhee says she believes in due process, as long as the process of grieving a termination is conducted quickly. “I'd seen too many examples of good teachers who had been railroaded by ineffective administrators,” she wrote in her memoir. “Those teachers had to have a structure through which they could appeal evaluations when appropriate.”
If the key to systemwide improvement is not mass firings or union-busting, then what remains is to turn the existing average teacher into an expert practitioner, what Rockoff calls “moving the big middle” of the teaching profession. That effort will require a lot more than data; it will require a shared vision of what excellent teaching looks like, and the mentorship and training to get teachers there.
*1
Note how different this top-down theory of change is from that of the community control movement, in which parents at the grassroots level were conceived of as the vanguard of school reform.
*2
In Visible Learning, John Hattie notes that while researchers generally have trouble locating the effects of teachers' content knowledge on student outcomes, there is other evidence suggesting that teachers' general intellectual ability, particularly vocabulary and verbal facility, is positively associated with student achievement gains. These skills, however, may have very little to do with the competitiveness of a teacher's college or graduate school, or the content of the classes he or she took there.
*3
In 2007 TFA sent 13 percent of corps members to charter schools. In 2013, as recession budget cuts slowed district hiring, one-third of corps members were hired by charter schools and about half of alumni still teaching were working in charters. Not all charters are “no excuses” schools. Some, like Global Community in Harlem and Community Roots in Brooklyn, emphasize project-based learning and other progressive pedagogies.
*4
Johnston's enthusiasm for test-score-based accountability was a sign of changing times. Less than a decade earlier he had published a poignant memoir about his time as a TFA corps member in the Mississippi Delta, in which he complained about “innumerable state testing sessions” and “the furor to try to improve test scores.”
*5
School closings have emerged as one of the most controversial issues in education reform. Closings are sold as a way to get kids into better schools. But according to the Consortium on Chicago School Research, only 6 percent of Chicago students whose schools were shut down ended up enrolled in a school within the top achievement quartile, and 40 percent of students from closed schools ended up at schools that were on academic probation.
*6
Districts and charters pay TFA $2,000 to $5,000 per corps member, which helps cover the costs of the summer institute and the support TFA provides to its teachers during the school year.
*7
The foundation run by the Walton family, the descendants of the Walmart founders, is a key TFA funder, and has also contributed to the National Right to Work Legal Defense Foundation, an anti-union group.