“An Invalid Measure”: The Fundamental Flaws of Standardized Testing
The swelling number of test-defiers is rooted in the proliferation of profoundly flawed standardized exams. Often, these tests don't reflect the concepts emphasized in the students' classes and, just as often, the results are not available until after the student has already left the teacher's classroom, rendering the test score useless as a tool for informing instruction. Yet the problem of standardized bubble tests' usefulness for educators extends well beyond the lag time (which can be addressed by computerized tests that immediately calculate results). A standardized bubble test does not help teachers understand how a student arrived at answer choice “C.” The student may have selected the right answer without knowing why it was right or, conversely, may have chosen the wrong answer through sophisticated reasoning that shows a deeper understanding of the concept than that of someone who randomly guessed correctly. Beyond the lack of utility of standardized testing in facilitating learning, there is a more fundamental flaw. A norm-referenced, standardized test compares each individual student to everyone else taking the test, and the score is then usually reported as a percentile. Alfie Kohn describes the inherent treachery of the norm-referenced test:
No matter how many students take an NRT [norm-referenced test], no matter how well or poorly they were taught, no matter how difficult the questions are, the pattern of results is guaranteed to be the same: Exactly 10 percent of those who take the test will score in the top 10 percent. And half will always fall below the median. That's not because our schools are failing; that's because of what the word median means.
10
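Kohn's arithmetic can be checked with a few lines of code. What follows is only a minimal sketch under stated assumptions: the two cohorts are invented for illustration, every student is assumed to earn a distinct raw score (ties would shift the shares slightly), and the function names are hypothetical rather than taken from any testing vendor's software.

```python
from bisect import bisect_left


def percentile_ranks(scores):
    """Return, for each score, the percent of test-takers who scored strictly lower."""
    ordered = sorted(scores)
    n = len(scores)
    return [100.0 * bisect_left(ordered, s) / n for s in scores]


def share_in_top_decile(scores):
    """Fraction of test-takers whose percentile rank is 90 or higher."""
    ranks = percentile_ranks(scores)
    return sum(r >= 90.0 for r in ranks) / len(scores)


def share_below_median_rank(scores):
    """Fraction of test-takers whose percentile rank is below 50."""
    ranks = percentile_ranks(scores)
    return sum(r < 50.0 for r in ranks) / len(scores)


# Hypothetical cohorts (assumption): 100 students with distinct raw scores.
weak_cohort = list(range(100))                    # raw scores 0 through 99
strong_cohort = [s + 900 for s in weak_cohort]    # every student scores 900 points higher

for name, cohort in [("weak", weak_cohort), ("strong", strong_cohort)]:
    print(name, share_in_top_decile(cohort), share_below_median_rank(cohort))
```

Under these assumptions, both cohorts report the same shares in the top decile and in the bottom half, even though every student in the second cohort outscored every student in the first. The percentile describes only a student's position relative to other test-takers, not how much anyone learned.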
And as professor of education Wayne Au explained in 2011, when he was handed a bullhorn at the Occupy Education protest outside the headquarters of the Gates Foundation, “If all the students passed the test you advocate, that test would immediately be judged an invalid metric, and any measure of students which mandates the failure of students is an invalid measure.”
Unsurprisingly, the Gates Foundation was not swayed by the logic of Au's argument. That is because standardized testing serves to reinforce the mythology of a meritocracy in which those on the top have achieved their position rightfully, because of their hard work, their dedication to hitting the books, and their superior intelligence as proven by their scores. But what researchers have long known is that what standardized tests measure above all else is a student's access to resources. The most damning truth about standardized tests is that they are a better indicator of a student's zip code than a student's aptitude. Wealthier, and predominately whiter, districts score better on tests. Their scores do not reflect the intelligence of wealthier, mostly white students when compared to those of lower-income students and students of color, but do reflect the advantages that wealthier children have: books in the home, parents with more time to read with them, private tutoring, access to test-prep agencies, high-quality health care, and access to good food, to name a few. This is why attaching high stakes to these exams only serves to exacerbate racial and class inequality. As Boston University economics professors Olesya Baker and Kevin Lang's 2013 study, “The School to Prison Pipeline Exposed,” reveals, the increases in the use of high-stakes standardized high school exit exams are linked to higher incarceration rates. Arne Duncan's refusal to address the concerns raised by this study exposes the bankruptcy of testocratic policy.
Perhaps the testocracy's most cherished standardized test concept is “value-added modeling” (VAM), which attempts to gauge the contribution of a teacher toward student learning by complicated formulas involving multiple test scores. The absurdity of using VAM scores to evaluate pedagogy was on full display in Tampa, Florida, when Jefferson High social studies teacher Patrick Boyko was named the 2014 Hillsborough County Teacher of the Year. Despite being recognized by his school community as a stellar teacher, Boyko's VAM score for the 2012-13 school year was -10.23 percent (meaning his students scored 10 percent worse on the Florida Comprehensive Assessment Test (FCAT), supposedly due to his teaching, than comparable students across Florida). In 2011-12, Boyko attained an even lower VAM score of -19.44 percent. That score “would never reflect on what I do,” Boyko said.
11
The American Statistical Association (ASA, the largest organization of statisticians in the world) agreed. The ASA released an April 2014 study stating, “VAMs are generally based on standardized test scores and do not directly measure potential teacher contributions toward other student outcomes.” The study continues, “VAMs typically measure correlation, not causation: Effects, positive or negative, attributed to a teacher may actually be caused by other factors that are not captured in the model.”
12
As Dr. Audrey Amrein-Beardsley, associate professor at Arizona State University, explained in her invaluable book Rethinking Value-Added Models in Education, VAMs are unreliable because “a teacher classified as adding value has approximately a 25-50% chance of being classified as subtracting value the following year.”
Authentic Assessment
Let's make this much clear: Educators are not against the assessment of students. Teachers rely on various forms of assessment every day to help understand the thought processes, progress, and conceptual obstacles faced by their students, all in the service of informing instruction for the next steps in a student's development. In the next chapter I describe how, in the wake of the MAP boycott in Seattle, educators came together to form the Teacher Work Group on Assessment and created guidelines called “Markers of Quality Assessment,” which defined authentic assessments as those that reflect actual student knowledge and learning, not just test-taking skills; are educational in and of themselves; are free of gender, class, and racial bias; are differentiated to meet students' needs; allow students opportunities to go back and improve; and undergo regular evaluation and revision by educators. Authentic forms of assessment, used for helpful diagnostic purposes instead of doling out punishment, are prerequisites to an education designed to promote creativity and critical thinking. As Phyllis Tashlik explains in chapter 27,
The general public gleans what the media throw at them and the tendency is for people to think, “Oh, if you're against standardized testing, then you're against assessments,” which is not the case at all. What we're against is an assessment that has the consequence of narrowing curriculum and teaching and learning. It's important to realize that as soon as you institute these standardized tests, you're also affecting curriculum, and you're affecting how teachers teach, and you're affecting how time is used. And it's that connection between assessment, curriculum, and instruction that just doesn't get explained enough in the public conversation about testing. Performance assessments offer such a greater opportunity to develop interesting curriculum and structure more opportunities for the teacher to relate to the kids in front of them.
Much has been written describing alternative forms of assessment to standardized testing (for a more comprehensive discussion on this topic, read the Rethinking Schools book Pencils Down, edited by Wayne Au and Melissa Bollow Tempel). One straightforward way to visualize a superior substitute to bubble testing is to picture the process of getting a PhD. When PhD candidates prepare to graduate, their committees do not judge their knowledge by having them eliminate wrong answer choices on a standardized test. Candidates engage in the much more meaningful process of defending a dissertation. Doctoral students develop a thesis, conduct research over time, collaborate with an advisor, revise the thesis as needed, and finally defend the thesis before a panel of experts. Innovative classrooms around the nation (and around the world) have adapted just this model, tailoring it to every subject and age. This form of performance-based assessment, often coupled with a portfolio of the student's work over a period of time, has many advantages over standardized bubble testing, but, perhaps most important, it challenges each student to explain her or his ideas around issues actually being taught in the classroom. The drawback to this form of assessment, from the testocracy's vantage point, is that it empowers the teachers and students in the classroom, fosters critical thinking, and, without a standardized exam to sell to every district, makes it harder to turn a profit.
Rotten to the Common Core
The jewels in the crown of the testocracy are the high-stakes exams encrusted in the Common Core State Standards. As of June 2014, forty-three states had adopted these standards “to ensure all students are ready for success after high school,” as the CCSS website explains, and to “establish clear, consistent guidelines for what every student should know and be able to do in math and English language arts from kindergarten through 12th grade.”
13
The CCSS were described by Lyndsey Layton in the Washington Post as “one of the swiftest and most remarkable shifts in education policy in U.S. history,” made possible because of the massive investment by Bill Gates. Layton points out, “The Bill and Melinda Gates Foundation didn't just bankroll the development of what became known as the Common Core State Standards. With more than $200 million, the foundation also built political support across the country, persuading state governments to make systemic and costly changes.”
14
And with the testocrats in charge of the development of the standards, the primary stakeholders in education were excluded from providing any meaningful input. As Rethinking Schools, a leading journal of social justice education, editorialized:
Written mostly by academics and assessment experts (many with ties to testing companies), the Common Core standards have never been fully implemented and tested in real schools anywhere. Of the 135 members on the official Common Core review panels convened by Achieve Inc., the consulting firm that has directed the Common Core project for the NGA, few were classroom teachers or current administrators. Parents were entirely missing. K-12 educators were mostly brought in after the fact to tweak and endorse the standards, and lend legitimacy to the results.
15
In some instances the CCSS have replaced deeply flawed standards or scripted curricular regimes that require teachers to read lessons from a script. In these instances, the CCSS's claim of not prescribing to teachers how to meet the standards and of being “based on application of knowledge through higher-order thinking skills” can appear emancipating. Yet because of the lack of educator and parent input to the standards, there are serious limitations. As Diane E. Levin and Dr. Nancy Carlsson-Paige (the latter a contributor to this book) have explained about the negative impact of CCSS on early childhood development,
The proposed common core national education standards for K-12, which will impose higher academic standards on younger children, contradict decades of early education theory and research about how young children learn best and how to close the achievement gap. The imposition of one-size-fits-all standards on young children can't solve the problems of an education system that is fundamentally unequal.
16
Chicago Public Schools preschool teacher and parent Kirstin Roberts elaborates on the research about how children best learn when she writes in chapter 19 that NCLB, RttT, and the CCSS have been responsible for “the dramatic increase in testing of the very young over the last decade,” and “[have] pushed out developmentally appropriate curriculum, including play-based learning, from early childhood classrooms.”
The pitfalls of the CCSS are best illustrated by what its supporters have to say. Bill Gates said of the Common Core in 2009, “When the tests are aligned to the common standards, the curriculum will line up as well, and that will unleash powerful market forces in the service of better teaching. For the first time, there will be a large base of customers eager to buy products that can help every kid learn and every teacher get better” (my emphasis on “unleash powerful market forces” and “large base of customers eager to buy products”).
17
The Thomas B. Fordham Institute, a conservative think tank, estimates implementing the new standards will cost the nation between $1 billion and $8 billion. Nearly all the profits will go to book publishers and test creators like Pearson and CTB/McGraw-Hill.
18
Prominently displayed on one sidebar of the official Common Core website is a pull quote from Edward B. Rust, Jr., Chairman and CEO of State Farm Insurance Companies, who proclaims: “State-by-state adoption of these standards is an important step toward maintaining our country's competitive edge. With a skilled and prepared workforce the business community will be better prepared to face the challenges of the international marketplace.”
19
Here the true purpose of the CCSS is revealed: It has very little to do with helping students develop their capacities and much more to do with empowering US businesses to dominate global markets and stuff additional cash in the already bulging bespoke-suit pockets of testing executives. One of the most glaring examples of how the standards are designed to accomplish this goal is in their approach to literacy. The CCSS emphasize informational texts at the expense of literature, fundamentally impeding students' understanding of a central element of human expression. As award-winning children's author Alma Flor Ada argues in chapter 25, literature “not only gives an example of the power of language, but becomes a model of living consciously, of paying attention to what happens around us, of discovering a deeper meaning in life.” But if global competition is the purpose of education, then Ada's contention that education be about investigating the meaning of life should be deemed frivolous and pushed from our classrooms.