Authors: Dana Goldstein
REFORMING EDUCATION BY EMPOWERING TEACHERS
To many American teachers, the last decade of value-added school reform has felt like something imposed on them from outside and from above: by politicians with little expertise in teaching and learning, by corporate philanthropists who long to remake education in the mold of the business world, and by economists who see teaching as less of an art than a science.
According to a 2013 poll conducted by Scholastic and the Gates Foundation, the majority of American teachers feel alienated from education policy making, with only a third reporting that their opinions are valued at the district level, 5 percent reporting they are valued at the state level, and just 2 percent reporting they are valued at the national level. Those frustrations have begun to break into the public debate. Dissident teachers and their unions are winning support from parent activists who are protesting the increased number of standardized tests, the time spent on test prep, and the lack of instructional time for projects, field trips, art, and music. Testing is a part of any functional education system, but in recent years it has often seemed like the horse of school improvement has been driven by the cart of collecting student data to be used in teacher evaluation. Meanwhile, more and more accountability reformers acknowledge that new teacher evaluation systems are not a panacea. They identify only a small number of teachers as ineffective, and do nothing, on their own, to guarantee that teachers' skills will actually improve over time. The hope that collecting more test scores will raise student achievement is like the hope that buying a scale will result in losing weight. We now have a lot of numbers to back up our inkling that something is wrong. But if we don't start improving instruction in the classroom, those numbers simply will not change.
“No excuses” strategies are not the only promising avenue for instructional reform. In the long term, reform programs that combine high-stakes standardized tests with scripted lesson plans and a limited arsenal of pedagogical strategies may make teaching a less attractive job for exactly the sort of ambitious, creative, high-achieving people we most want to attract.
Polls of teachers who leave the profession show many did so because they received no constructive feedback on their practice, they had too little time to think creatively and collaborate with colleagues, and they had no opportunity to take on additional responsibilities and grow as professionals. So the next step in American education reform may be to focus less on top-down efforts to ferret out the worst teachers or turn them into automatons, and more on classroom-up interventions that replicate the practices of the best. Today reformers across the country are experimenting with empowering teachers to coach their peers, to remake teacher education, to design creative curriculum materials, and to lead school turnaround efforts. These practices conceive of veteran teachers as assets, not liabilities. As history has taught us, that is a pragmatic stance crucial to sustaining any reform program, which teachers must carry out on the ground.
Race to the Top focused attention on the teacher evaluation process, particularly on how student test scores are used to judge teachers. But in every state, a large, if not dominant, part of a teacher's evaluation score is still tied to classroom observations.
Observation is a challenging endeavor, in large part because it can be so subjective. Remember William Maxwell? He was the superintendent in turn-of-the-century New York City who complained that 99.5 percent of teachers were being evaluated as “good.” He created a complex new A-to-D system based on principal observations and ratings, in which, it turned out, the vast majority of principals rushed through the motions and gave all their teachers a B+. For over a century, classroom observation has failed to successfully differentiate between teachers. So how can that change? How can observation capture what everyone knows, that some teachers are better than others, and what everyone doesn't yet know: What exactly makes them that way?
The importance of looking beyond value-added measurement to carefully watch how teachers work with children is underscored by new research on what actually occurs in many classrooms, especially those populated by low-income students. In 2009 economist Thomas Kane and the Bill and Melinda Gates Foundation began a massive study on teacher effectiveness, known as the MET (Measures of Effective Teaching) project. MET collected videos of 1,333 teachers at work and gave them to highly trained evaluators to analyze. The experts found that only a third of the classrooms showed evidence of teachers promoting intellectual growth beyond rote learning.
That aligns with past research.
A 2011 observation of elementary school classrooms in Baltimore showed that the majority of teachers failed to use challenging vocabulary words, failed to ask questions that probed for conceptual understanding (as opposed to simply correct answers), and rarely led their classes in whole-group discussions. In the weeks before state standardized tests, the Baltimore teachers engaged in those desirable activities even less frequently than usual and also decreased their personal interactions with students, who were “spending a good deal of their time on paper and pencil skill-based worksheets that did not require critical thinking or collaboration,” the researchers reported.
A 2009 review of the research literature on teacher practices, including several studies of thousands of elementary school classrooms across the country, found that low-income children are likely to spend their school days drilling in low-level skills, like spelling, and watching teachers deal with poorly behaved students.
Maybe none of that matters, if multiple-choice worksheets help children learn.
But research shows that when teachers promote more interactions among students and focus their lessons on concepts that are broader and more challenging than those represented on multiple-choice tests, children's scores on higher-level assessments, like those that require writing, actually go up. Rigorous, interactive classrooms promote higher student achievement.
When I visited Harrison District 2, the Colorado school system that created standardized tests for every subject and grade level, I accompanied crusading superintendent Mike Miles on a round of classroom observations at Fox Meadows Middle School.
There is no collective bargaining in Harrison, so Miles had incredible autonomy in shaping teachers' working conditions. Teachers had to keep their classroom doors open at all times. They were told to expect up to sixteen surprise observations each semester from administrators, instructional coaches, or outside consultants. Teachers told me they felt constantly watched. A TFA corps member who generally supported Miles's accountability efforts described Harrison as “a high-anxiety district to work in.”
This all seemed a little extreme. And since I hadn't been impressed with the quality of the district's paper tests in art or physical education, I was prepared to find the classroom observations less than compelling. But I was wrong. In spot observations of about ten or fifteen minutes each, Miles was able to make a series of insightful critiques of teachers' performance. He traveled through the school's hallways with a six-person team of administrators and consultants, some of whom were being trained in these new observation methods. When the group stepped into the classroom of a science teacher in her early twenties, the young woman began to tremble ever so slightly. She was conducting a lesson on “hypotheses, graphs, and data.” But the activity she had assigned the seventh graders, reading a graph and answering questions about the values on it, had nothing to do with hypotheses, which Miles thought was the most important concept in the lesson. Out in the hallway, he discussed with his team what they had observed. “Did she get to the idea of using data to construct a hypothesis?” he asked. “No.” He also noted that while the students worked in small groups, the teacher could have been moving around the classroom more actively, making sure each child was participating.
In a social studies class, Miles was unimpressed with the teacher's assignment for students: “Using geographic facts, which Western European country most resembles Colorado?” It was vague, kind of boring, and far too easy. Plus, there was no map displayed of Europe, making it hard for the students, most of whom had never left Colorado, to visualize what they were learning about.
For a math teacher working with students on circumference, the superintendent suggested that instead of simply writing equations on the whiteboard, the teacher could have demonstrated the concept using a physical object, like a basketball. A second math teacher spent ten minutes, far too long, explaining that the word “denominator” referred to the bottom part of a fraction. “She's not a superstar,” Miles said. He knew of an exemplary math teacher whose classroom this teacher should observe.
Before he left Fox Meadows that day, Miles took some time to evaluate the evaluators. Leafing through the notes the administrators and consultants had taken on each teacher, he said, “I'm not seeing enough validating comments. Every room I went into, I saw positive things.” Later on he told me he thought the school had two to three “distinguished teachers,” the district's highest designation, and perhaps four who would be dismissed at the end of the year, one of whom had tenure. In a school of thirty teachers, that was a roughly 14 percent ineffective rate.
There is nothing new about the idea of an administrator taking a detailed look at a teacher's classroom practice. Progressive Era reformers promoted “efficiency” observations, in which supervisors used lengthy rubrics to rate teachers according to measures like how many children were late to class or how many seconds it took to hand out worksheets. After World War II, a “supervisory visit” to a teacher's classroom might have entailed a principal judging whether sufficiently “democratic ideals” were being promoted in the lesson. By 1980 many school administrators used a “clinical” model to observe classrooms, a system based on medical rounds, popularized by Robert Goldhammer of Harvard's graduate-level teaching program. Principals would conduct pre-observation and post-observation conferences with teachers to reflect on their practice, areas for improvement, and long-term goals. But because Goldhammer had not defined specific characteristics of effective instruction, principals who used his model often failed to provide concrete, helpful feedback to teachers.
Later, Madeline Hunter's more prescriptive “lesson design” system became popular. Principals looked for whether a teacher's lesson included several key components, such as a lesson objective written on the board, a “model” of successful performance, and an opportunity for students to practice new concepts. Mike Miles was clearly influenced by the Hunter system, a combination of direct instruction and student group work, similar to how Teach for America expects its corps members to work. A criticism of Hunter is that she focused too much on teacher-directed behavior, and not enough on whether the teacher helped students become self-directed learners. Today there are other, potentially more sensitive classroom observation tools, and because of Race to the Top's emphasis on improving teacher evaluation, these methods are now being adopted in thousands of classrooms nationwide. The Classroom Assessment Scoring System, CLASS, was developed at the University of Virginia. It gives teachers numeric ratings based on whether they exhibit behaviors associated with achievement gains, such as “expanding on student talk”: repeating a child's speech back to her, using corrected grammar and more sophisticated vocabulary.
Another popular and detailed classroom observation model is embedded within Charlotte Danielson's Framework for Teaching, first developed in 1996. Danielson was a former Washington, D.C., public school teacher who went on to become an education researcher at Educational Testing Service, the test-maker. She knew that during the “competency” craze of the 1970s there had been a lot of vague talk about asking students “higher-order questions.”
Danielson wanted to watch teachers work, look at their students' performance, and figure out exactly what effective, higher-order instruction looked like. According to her findings, an effective classroom discussion question has more than one correct answer. (Not “When did Hitler come to power?” but “What social, political, and economic factors led to the Nazi Party gaining power in Germany? Which factors do you think were most crucial? Why?”) Danielson found that an excellent teacher asks students to explain concepts to one another, instead of repeating herself ad nauseam. She highlights connections between disciplines, for example by giving students background information on Elizabethan England before assigning a Shakespeare play. She chooses books and works of art from a broad range of cultures, including the cultures from which her students hail. She allows students to debate one another in class, and requires them to cite evidence to support their claims. If the teacher is really skilled, the students can talk among themselves for many minutes about the topic at hand without her interrupting. She assesses her students throughout a unit, not just at the beginning and end, through pointed questioning and the use of tricks like “exit slips”: quick problems students must solve, on paper, before they leave the classroom.
In New York City, principals must now use the Danielson framework to observe each teacher's classroom at least four times per school year, and to conference with each teacher for professional development. If there is a downside to systems like these, it is that most research-driven observation rubrics require administrators to rate teachers on many, many different competencies: twenty-two in the Danielson framework. Historically, evaluation systems with heavy time and paperwork burdens have not been viable in the long term, because principals either go through the motions without making meaningful distinctions among teachers, or they find ways to use the great number of subjective variables in these rubrics (for example, “compliance with standards of conduct,” in Danielson's framework) as a way to target disfavored teachers for dismissal, regardless of more objective measures of performance.