
PISA Brainwashing: Measure, Rank, Repeat

When Mary Catherine Bradshaw, a teacher since 1984 in Nashville, TN, announced her retirement from public schools, Bradshaw pointed her finger at one major reason, standardized testing:

[S]he says standardized testing is the reason….

Testing, she said, has taken away from instructional time and taken the joy out of learning.

Much has changed, she said, since she took her first job as a teacher at Hillsboro in 1984 when she said she was attracted to its diversity and commitment to academic reputation.

“There was more of a focus on the whole student, the joy of learning, building a community and finding one’s own passion in the midst of the K-12 experience,” she said.

“Now, with the focus on testing, data collection and closing a too narrowly defined gap among learners, I have found myself ready to retire from public education.”

Bradshaw’s concern about the loss of joy due to the central place of testing in education is echoed in a recent statement about PISA rankings [1], as Peter Wilby details in Academics warn international school league tables are killing ‘joy of learning’:

Now nearly 100 leading educational figures from around the world have issued an unprecedented challenge to Pisa – and what they call “the negative consequences” of its rankings – in a letter to its director, Andreas Schleicher….

“Education policy across the world is being driven by the single aim of pushing up national performance levels on Pisa,” says one signatory, Stephen Ball, professor at London university’s Institute of Education. “It’s having a tremendously distorting effect, right down to the level of classroom teaching.” Another signatory, Sally Tomlinson, research fellow at Oxford university’s education department, says that, though the Pisa league tables appear to be scientifically based, “you really can’t compare a country the size of Liechtenstein with one the size of China and nor can you compare education systems that developed over the years in different political, social and cultural contexts”.

The signatories are particularly concerned about the UK, the US and other countries imitating schools in Asian countries that come high in the Pisa rankings. They are suspicious of Shanghai’s success. “Shanghai’s approach is an incredibly strategic one,” says Ball. “Their students practise the tests. It’s difficult to see what their maths teachers can say to ours except ‘teach to the test’.”

While international rankings based on test scores have influenced public perception of U.S. public education for at least 60 years (see Hyman Rickover’s books lamenting U.S. rankings, for example), state rankings based on NAEP and SAT/ACT scores have also been central to perception as well as policy, especially since the early 1980s.

While the open letter to Schleicher is a powerful and important challenge to the misleading influence of PISA, the essential problem is high-stakes testing coupled with ranking as well as a persistent misinterpretation of test data (see this excellent examination of how test scores are misunderstood and misused).

As I have addressed often about the SAT (see HERE and HERE), even when a comparison of states appears fair and accurate (South Carolina with Mississippi, for example, since the states share similar high-poverty student demographics), the reality is far more complex: MS has a higher SAT average score than SC because the test-taking populations of students are significantly different despite the overall student populations being similar:

Two Southern states, Mississippi and South Carolina, share both a long history of high poverty rates (Mississippi at over 30% and SC at over 25%) and reputations for poor school systems. Yet, when we compare the SAT scores (pdf) from Mississippi in 2010 (CR 566, M 548, W 552 for a 1,666 total) to SAT scores in SC (CR 484, M 495, W 468 for a 1,447 total), we may be compelled to charge that Mississippi has overcome a higher poverty rate than South Carolina to achieve, on average, a score 219 points higher.

This conclusion, based on a “few data points”, is factually accurate, but ultimately misleading once we add just one more data point: the percentage of students taking the exam. Just 3% of Mississippi seniors took the exam, compared to 66% in South Carolina. A fact of statistics tells us that SC’s larger percentage taking the exam is much closer to the normal distribution of all seniors in that state, thus the average must be lower than a uniquely elite population, such as in Mississippi. Here, the statistics determined by the populations taking the exam trump the raw data of test averages, even when placed in the context of poverty. (The truth about failure in US schools)
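To make the arithmetic and the participation effect concrete, here is a minimal Python sketch. The section scores are the ones quoted above. Everything in the second half is an assumption for illustration only: the 1,000-student population is invented, the helper `mean_of_top_fraction` is hypothetical, and the idea that only the highest-scoring students sit the exam is a deliberate simplification, not a claim about who actually takes the SAT in either state.

```python
# Section averages quoted above (2010 SAT: critical reading, math, writing).
mississippi = {"CR": 566, "M": 548, "W": 552}
south_carolina = {"CR": 484, "M": 495, "W": 468}

ms_total = sum(mississippi.values())
sc_total = sum(south_carolina.values())
gap = ms_total - sc_total

print(f"Mississippi total: {ms_total}")
print(f"South Carolina total: {sc_total}")
print(f"Gap: {gap}")


def mean_of_top_fraction(scores, fraction):
    """Average score of the highest-scoring `fraction` of a population.

    Hypothetical helper: it assumes test takers are drawn strictly from
    the top of the distribution, which is a simplification.
    """
    ranked = sorted(scores, reverse=True)
    count = max(1, round(len(ranked) * fraction))
    top = ranked[:count]
    return sum(top) / len(top)


# Invented population (NOT real data): 1,000 seniors whose total scores
# are spread evenly between 600 and 2400. Both "states" share it.
population = [600 + 1800 * i / 999 for i in range(1000)]

# Same students, different participation rates (3% versus 66%).
ms_like = mean_of_top_fraction(population, 0.03)
sc_like = mean_of_top_fraction(population, 0.66)

print(f"Average when 3% take the test: {ms_like:.0f}")
print(f"Average when 66% take the test: {sc_like:.0f}")
```

Even though the two hypothetical states share an identical student population, the one testing only 3% of its seniors reports the higher average. The gap comes entirely from who takes the test, which is the point of the passage above.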

Even if the open letter about PISA prompts reform by the OECD, we have evidence that the problem will persist. For example, The College Board struggles with both the statistical complexity of SAT data (see here about the recentering) and the misleading use of SAT data to rank states:

Educators, the media and others should:

8.1 Not rank or rate teachers, educational institutions, districts or states solely on aggregate scores derived from tests that are intended primarily as a measure of individual students. Do not use aggregate scores as the single measure to rank or rate teachers, educational institutions, districts, or states.

And yet, each year when SAT data are released, the media, political leaders, and public school critics rank states and pronounce schools a failure.

The open letter about PISA implores, “Slow down the testing juggernaut,” adding:

OECD’s narrow focus on standardised testing risks turning learning into drudgery and killing the joy of learning. As Pisa has led many governments into an international competition for higher test scores, OECD has assumed the power to shape education policy around the world, with no debate about the necessity or limitations of OECD’s goals. We are deeply concerned that measuring a great diversity of educational traditions and cultures using a single, narrow, biased yardstick could, in the end, do irreparable harm to our schools and our students.

Once we apply the brakes, we must then take a close look at the fundamental policy errors—high-stakes standardized testing, labeling, sorting, and ranking—and then abandon those practices for alternatives that address inequity both outside and inside schools and that honor the essential dignity and humanity of students and their teachers.

For Further Reading

Among the Many Things Wrong With International Achievement Comparisons, Gene V. Glass

More Things Wrong with International Assessments Like PISA, Gene V. Glass

[1] As full disclosure, I am a signatory on the letter.

Welcome to SC: A Heaping Stumbling-Bumbling Mess of Ineptitude

This is my 53rd year of living in South Carolina, the totality of my life.

This is my 31st year as an educator in SC—18 years as a high school English teacher and 13 years now in higher education.

My teaching career, coincidentally, began the exact year SC officially stepped into the accountability/standards/testing arms race that grew out of the early 1980s.

Over the past 30 years, SC has created, implemented, revised, and changed a nearly mind-boggling array of standards and tests:

  • SC Frameworks
  • SC state standards (revised multiple times)
  • BSAP and exit exams
  • PACT
  • PASS
  • HSAP and EOC (end-of-course) tests
  • Common Core and high-stakes tests TBD
  • Two concurrent and competing sets of school report cards (the long-standing state version and the federal letter-grade based version)

Before I move on, let me add that SC is a high-poverty state (in the bottom quartile of poverty in the U.S.) that is historically and increasingly racially diverse, a right-to-work state that picks fights with unions that have no power, and a challenging environment for children of color (in the bottom third of the U.S. for African American and Latino/a children).

So now let’s return to the Accountability Hunger Games: SC Edition:

SC Senate approves replacing Common Core in 1 year

That’s right, before SC schools, teachers, and students can actually make the transition from the repeatedly revised (and obviously failed) standards-and-tests Merry-Go-Round of state-based accountability to the all-mighty Common Core gravy train of world-class and college-ready standards and next-generation high-stakes tests [insert trumpet]:

The Senate on Thursday unanimously approved a bill that replaces Common Core education standards with those developed in South Carolina by the 2015-16 school year.

The bill, which passed 42-0, is a compromise of legislation that initially sought to repeal the math and reading standards that have been rolled out in classrooms statewide since their adoption by two state boards in 2010. Testing aligned to those standards must start next year, using new tests that assess college and career readiness, or the state will lose its waiver from the all-or-nothing provisions of the federal No Child Left Behind law.

But the state won’t be able to use tests South Carolina officials helped create with 21 other states. A bid must go out by September for their replacement.

[Insert wah-wah-wah]

As ridiculous and muddled as that all is (and I think “heaping stumbling-bumbling mess of ineptitude” may be understating the level of ridiculousness), Seanna Adcox’s coverage of this likely next move for SC is chock-full of even more ineptitude, so let me count the ways:

  1.  “‘We’re back on track,’ said Sen. Mike Fair, R-Greenville.” Fair has built a political career on his yearly efforts to dismantle SC’s science standards by inserting an endless series of not-so-clever Creationism edits to the evolution elements of those standards. [HINT: He may not be the best authority on SC’s decisions about standards.]
  2. “Democrats say the bill forces the Legislature to spend money on technology in classrooms….” Ironically and sadly, SC is an equal-opportunity state in terms of political party ineptitude. SC is notorious for our Corridor of Shame, a swath of high-poverty communities that roughly follows I-95 across the state. I will simply ask that you return to the Kids Count report on childhood opportunity and consider where tax dollars may be better spent than on technology investments for computer-based high-stakes testing that will further stigmatize the growing number of poor children of color in SC. [HINT: It ain’t on more technology that will fail and become obsolete.]
  3. “Computer testing allows for a better assessment of both students’ abilities and teachers’ effectiveness….” And nothing like baseless and inaccurate claims to help! [HINT: Nope.]
  4. “‘This is about maintaining control,’ said Campsen, R-Isle of Palms. ‘We shouldn’t cede our authority over children’s education to an outside process.’” See above and consider the smashing good job SC has done on its own for three decades. [HINT: That last sentence is sarcasm.]
  5. “Both the House and Senate budget proposals would spend about $30 million on technology next school year, focusing on rural districts.” $30 million on technology. [HINT: $30 million on technology.]

Unless that technology plan includes a provision for turning the iPads purchased into food trays once they are obsolete in a few months, I would posit that this entire farce is beyond ineptitude.

And I must add: SC is not some looney example of ineptitude in the world of education reform (although we do tend to be on the outer edges of looney in many things); in fact, the series of fits and starts that constitute SC’s heaping stumbling-bumbling mess of ineptitude are being replicated all across the U.S. as political manipulation of education collides with Tea Party lunacy.

If we may pause, then, and consider the real problems and the likely solutions: SC has an equity problem in our state and thus in our schools, and accountability/standards/testing do not address those essential equity problems.

SC must step off the accountability Merry-Go-Round, but this latest effort suggests we are enjoying the circus too much to make any reasonable decisions.

Legalizing Marijuana Offers Lesson for Changing Course in Education Reform

The role of causality in educational research needs to be questioned on the basis that education is not the same as medicine. As Biesta says: “Being a student is not an illness, just as teaching is not a cure.” (2007, p8) We should never assume that education is a “push and pull” process of simply linear causal relationships.

Tait Coles, Take no heroes; only inspiration.

“Batman has officially been kicking the ass of Gotham’s villains for 75 years,” explains Ryan Kristobak, “and so to honor the Dark Knight, the Warner Bros. panel unveiled the ‘Batman Beyond’ animated short at this year’s WonderCon.”

For long-time and recent fans of Batman, however, the legends of the Dark Knight are complicated by the many versions that exist among the DC comic book and graphic novel universe, films, TV, animated series, and video games.

The Batman Myth has several foundational characteristics and common themes that are nested in the Caped Crusader’s first appearance in Detective Comics 27 in 1939: Batman’s essential nature as a detective and crime fighter, the ambiguous relationship between Batman and the Gotham police department and city officials, and the larger themes about justice that are contrasted by Batman’s vigilante tendencies.

In The Dark Knight Rises, the final installment of the film trilogy directed by Christopher Nolan and starring Christian Bale, the opening scene framing the film also highlights a central message reflecting how justice is traditionally characterized in the U.S. The mayor of Gotham and Commissioner Gordon preside over Harvey Dent Day, named for the district attorney who is killed as Two-Face in The Dark Knight:

[the Mayor is giving a speech being hosted at Wayne Manor]

Mayor: Harvey Dent Day may not be our oldest public holiday, but we’re here tonight because it’s one of the most important. Harvey Dent’s uncompromising stand against organized crime has made Gotham a safer place than it was at the time of his death, eight years ago. This city has seen a historic turnaround. No city is without crime, but this city is without organized crime because the Dent Act gave law enforcement teeth in its fight against the mob. Now people are talking about repealing the Dent Act, and to them I say, not on my watch.

[the audience claps]

Mayor: I wanna thank the Wayne Foundation for hosting this event, and I’m told, Mr. Wayne couldn’t be here tonight. I’m sure he’s with us in spirit….

Mayor: Jim Gordon can tell you the truth about Harvey Dent. He could…but I’ll let him tell you himself. Commissioner Gordon!

[the audience claps as Gordon makes his way to the stand, Gordon looks down at his prepared speech and says to himself as he remembers the real truth of what happened to Dent]

Commissioner Gordon: The truth…

[he addresses the audience]

Commissioner Gordon: I have a speech telling the truth about Harvey Dent. Maybe the time isn’t right.

[he puts the speech away in his jacket pocket]

Commissioner Gordon: Maybe right now, all you need to know is that there are one thousand inmates in Blackgate Prison as the direct result of the Dent Act. These are violent criminals, essential cogs in the organized crime machine. Maybe, for now, all I should say about the death of Harvey Dent is this; it has not been for nothing. (transcript found here)

Justice in Nolan’s Gotham reflects the central elements of justice found in the U.S.: the right laws, the right people to enforce those laws, and the evidence those laws are working represented by a growing prison population.

Reagan Era Mass Incarceration and Education Accountability

As I have detailed in Education Reform in the New Jim Crow Era, the 1980s and the Reagan administration planted the seeds of both an era of mass incarceration, labeled the New Jim Crow by Michelle Alexander, and the high-stakes accountability era in public education.

The most troubling aspects of both mass incarceration and high-stakes education accountability are that the policies have created, not ended, the claimed problems they were designed to address.

Over the past thirty years, the criminal justice system in the U.S. has filled prisons with a disproportionate number of African American men as part of our most recent war on drugs—despite whites and African Americans using recreational drugs at the same rates.

The current era of mass incarceration has unintended consequences similar to prohibition in the 1920s and 1930s:

Prohibition turned law-abiding citizens into criminals, made a mockery of the justice system, caused illicit drinking to seem glamorous and fun, encouraged neighborhood gangs to become national crime syndicates, permitted government officials to bend and sometimes even break the law, and fostered cynicism and hypocrisy that corroded the social contract all across the country. With Prohibition in place, but ineffectively enforced, one observer noted, America had hardly freed itself from the scourge of alcohol abuse – instead, the “drys” had their law, while the “wets” had their liquor.

The recent legalization of marijuana suggests a possible social recognition that the traditional view of the right laws enforced by the right people and resulting in the right people sitting in prison is the wrong formula for either justice or a peaceful and equitable society.

Along with a growing number of states legalizing or decriminalizing marijuana is a concurrent discussion of releasing prior drug offenders from prison, again suggesting a social admission that the laws we establish create criminals, but rarely deter crime.

Seeking justice must not be separated from seeking equity. If the shift in how people in the U.S. view marijuana signals anything, I think, it shows a broader concern for equity: Just as changing inequitable laws surrounding powder cocaine and crack came to represent an inequitable criminal justice system, legalizing marijuana is yet another effort to move the pursuit of justice in the U.S. toward a pursuit of equity.

Legalizing Marijuana: A Lesson for Changing Course in Education Reform

The war on drugs and the resulting mass incarceration have proven to be the wrong policies for achieving justice or equity in the U.S. Directly, we know that mass incarceration negatively impacts children (see Holly Yettick and Children of the Prison Boom).

But the parallel era of high-stakes education accountability shares the central flaws now being recognized in mass incarceration: high-stakes accountability creates failure in schools, teachers, and students (see FairTest’s Reports: High Stakes Testing Hurts Education).

Under Barack Obama and Secretary of Education Arne Duncan, federal and state education policies have remained focused on identifying the right standards and the right tests, most recently Common Core standards and so-called “next generation” tests. Unlike the move toward legalizing marijuana, education reform remains trapped and unable to see the Bitter Lessons from Chasing Better Tests, as Duncan proclaimed in 2009:

Until states develop better assessments—which we will support and fund through Race to the Top—we must rely on standardized tests to monitor progress—but this is an important area for reform and an important conversation to have.

Debating the quality of Common Core and the related tests, however, is the wrong argument because high-stakes accountability is the wrong policy paradigm just as the war on drugs and mass incarceration are the wrong policies for justice.

Adopting and implementing Common Core as yet another round of seeking the right standards and the right tests will not work. We have three decades of evidence on that approach revealing that there is no correlation between the existence or quality of standards and student achievement (see Mathis, 2012).

The war on drugs has proven to be finding ourselves in a hole and continuing to dig. Legalizing marijuana is dropping the shovel and choosing instead to acknowledge that failure and to try another approach, one more rightly attuned to equity.

This is a lesson high-stakes accountability advocates need to learn.

Common Core and the related high-stakes tests are the wrong approach to equity and high-quality education; they are finding ourselves in a hole we created and continuing to dig.

As legalizing marijuana signals a possible turn to the end of mass incarceration, we need also to end the era of high-stakes accountability in education.

Let’s choose instead An Alternative to Accountability-Based Education Reform.

Writing Is a Journey: Thoughts on Writing, College, and the SAT

A writer’s writer often ignored is James Baldwin, who examines his drive to write in the context of race:

INTERVIEWER

If you felt that it was a white man’s world, what made you think that there was any point in writing? And why is writing a white man’s world?

BALDWIN

Because they own the business. Well, in retrospect, what it came down to was that I would not allow myself to be defined by other people, white or black. It was beneath me to blame anybody for what happened to me. What happened to me was my responsibility. I didn’t want any pity. “Leave me alone, I’ll figure it out.” I was very wounded and I was very dangerous because you become what you hate. It’s what happened to my father and I didn’t want it to happen to me. His hatred was suppressed and turned against himself. He couldn’t let it out—he could only let it out in the house with rage, and I found it happening to myself as well. And after my best friend jumped off the bridge, I knew that I was next. So—Paris. With forty dollars and a one-way ticket. (The Paris Review interview)

Prompted by the announcement from the College Board that the SAT would be revamped in 2016, including dropping the writing section added in 2005, The New York Times has included a Room for Debate on Can Writing Be Assessed?

So, unlike the moment when the SAT added writing (one that heralded only doom for the field of composition), I want to take this moment to examine writing and the teaching of writing because dropping writing from the SAT may prove to be a positive watershed moment for both.

First, let me offer a few points of context.

I am 53 and have been teaching for 31 years, most of that life and career dedicated to writing and teaching writing. I read and write every day—much of that reading and writing is serious in that it is connected to my professional work. But I also read and write extensively for pleasure, including my life as a poet.

Two facts about my writing life: (1) I write because I must, not because I choose to, and (2) I am always learning to write because writing is a journey, not something one can acquire fully or finish.

As well, I strongly embrace the foundational belief that writing is an essential aspect of human liberty, autonomy, agency, and dignity; this is part of the grounding of my work as a critical educator. Living and learning must necessarily include reading, re-reading, writing, and re-writing the world (see Paulo Freire, bell hooks, and Maxine Greene, just to mention a few).

Writing is also integral to academics, in terms of learning and scholarship. Writing is part of the learning process, but it is also a primary vehicle for scholarly expression.

Next, considering the importance of writing in human agency and education, any effort to standardize the assessment of writing or to use writing assessments as gatekeepers for any child’s access to further education is essentially corrupt and corrupting.

Adding writing to the SAT in 2005, then, was one of several powerful contexts that have seriously crippled the teaching of writing in formal education; those forces include also:

All three of the above fail the fundamental value in writing because they distract from the process and act of writing as well as misread writing as a fixed skill that can be attained at some designated point along the formal education continuum.

As the Faculty Director of First Year Seminars at my university, I focus primarily on how we address the teaching of writing in those seminars (and throughout the curriculum). That role has highlighted for me a lesson I also learned while teaching high school English for 18 years: Many teachers, including English teachers, do not see themselves as writing teachers and often expect that students should come to their courses already proficient writers.

Essentially, then, using a writing assessment of some sort to identify students as college-ready as writers perpetuates the idea that we can and should have students demonstrating some fixed writing outcomes before we allow them access to higher education; this presumes in some ways that college will not be a place where people can and should learn to write.

In much the same way that the accountability paradigm is misguided in fixating on outcomes over conditions, seeing writing as a measurable skill useful for gatekeeping college entrance shifts our focus away from what experiences students need so that their continual learning to write in college can be better supported.

Yes, student outcomes matter, and samples of student writing in the right contexts may provide some powerful evidence of what students know as writers and what students need as writers. But something in the addition of writing to the 2005 SAT must not be forgotten: One-draft, timed, and prompted writing scored by rubrics, and even by computers, works against the important goals of writing [1].

Just as grading should be shunned in favor of feedback when teaching writing (see my chapter here), the question is not whether writing can be assessed, but how do we ensure that all students have access to the common experiences necessary at all points along the formal education experience?

What, then, are those common experiences—and once we implement those, how do we document those experiences in order to support both students having equitable access to higher education and to the continual learning to write that must be central throughout higher education?

Some thoughts on common experiences:

  • Rich and multi-genre/media reading experiences that include choice and assigned reading. Students need to develop genre awareness and discipline-specific awareness as readers.
  • Rich and multi-genre/media writing experiences that include the following: choice and assigned writing, peer and teacher feedback and conferences, workshop experiences drafting short and extended multi-draft compositions, and discipline-specific writing experiences.
  • Analysis of and experiences with a wide range of citation and documentation style sheets for integrating primary and secondary sources in original writing.
  • Continual consideration of expectations for writing both in academic/school settings and real world settings—challenging school-based norms such as thesis sentences and template essay formats.

While this isn’t meant to be exhaustive, the point is that instead of seeking ways to assess test-based writing well or continuing to explore tests and metrics that correlate strongly with actual writing proficiencies, we must commit ourselves to all students having the sorts of common experiences with writing necessary to grow as writers, both for their own agency and for their academic pursuits.

Finally, if we can commit to these conditions of learning instead of outcomes, we should then find ways to gather artifacts of these common experiences to use instead of metrics as we guide students through—and not gatekeep them from—formal education.

INTERVIEWER

Did what you wanted to write about come easily to you from the start?

BALDWIN

I had to be released from a terrible shyness—an illusion that I could hide anything from anybody. (The Paris Review interview)

[1] See The New Writing Assessments: Where Are They Leading Us? (Newkirk), From Failing to Killing Writing: Computer-Based Grading, and More on Failing Writing, and Students.

NOTE: For a historical perspective on teaching writing see selected works by Lou LaBrant.

If Fewer or No Tests, Then What?

When I responded to Students Should Be Tested More, Not Less by Jessica Lahey and the related study by Henry L. Roediger III and Jeffrey D. Karpicke in the blog post Students Should Be Tested Less, Then Not at All, the resulting comments and Tweets suggested that the topic of moving toward fewer and even no tests needs further discussion and clarification.

One aspect of debating the role of tests in education revolves around the term “test.” For the general public, Lahey’s headline, I am certain, triggers a relatively basic view of tests—students answering questions created by a teacher or a standardized testing company. For the general public, distinguishing between teacher-made tests and high-stakes standardized tests or between summative and formative assessments will likely not change that basic perception.

And thus, Lahey’s headline is certain to cause more problems than good in the public debate about accountability, education reform, teacher effectiveness, and student achievement.

Many have noted the headline problem, but quickly argue that Lahey’s article and Roediger and Karpicke’s research make a valuable case for formative assessment, adding that the study also raises concerns about high-stakes standardized testing and seeks to encourage more in-class formative assessments.

As I noted in my initial post, however, Roediger and Karpicke’s study is flawed—in their narrow defining of learning as retention and recall as well as their idealizing of testing (they raise concerns, but argue the positives outweigh those negatives).

Here, then, I want to clarify that calling for fewer and then no tests is not hyperbole on my part and not some idealized goal unfit for the real world of public school. As a co-editor with Joe Bower and building off the work of Alfie Kohn, I have detailed how to de-grade and de-test the writing classroom—practices I began as a public high school English teacher for 18 years and then expanded as a writing teacher in first-year seminars.

In terms of magnitude, yes, high-stakes standardized tests are by far the most corrosive type of test negatively impacting teaching and learning. Standardized tests remain significantly biased by race, class, and gender, and their high-stakes status encourages the worst characterizations of schools, teachers, and students while also draining valuable resources and time from teaching and learning.

Despite the tradition of using standardized tests, U.S. education should end all high-stakes standardized testing—with a reasonable compromise being the use of randomized samplings of NAEP periodically to monitor large trends in measurable student outcomes (recognizing the limitations of measurable outcomes).

While ending standardized testing, or even lessening its frequency and impact, would be a huge move forward, continuing in-class testing would remain a misguided practice. Let me offer a few reasons and then an alternative.

Even the best in-class and teacher-made tests are reductive and only partial representations of learning because testing by its nature is artificial.

For example, consider testing any courses or student activities outside the so-called core curriculum, such as visual art, music, or athletics.

A course in painting that seeks students who can create their own original paintings does not begin with paint-by-numbers, and art teachers would never rely on traditional in-class tests of any kind to represent a student’s ability as a visual artist.

High school football teams, as well, line up each Friday night and the high school players actually play football; they don’t sit in desks and take tests to decide the best team (see Childress for an elaboration on this idea).

In other words, education has conceded the least accurate process, testing, to the core courses that we deem essential, while allowing in the so-called non-essential courses and activities the most authentic demonstrations of learning and teaching practices.

If tests are inadequate for determining a student’s ability in chorus, art, or soccer (where we allow and require students and players to perform the real task), I suggest that they are also inadequate for English, math, science, and social studies.

Now, before offering a brief consideration of what should replace testing, let me also explain that testing fails because it occupies time better spent doing real activities and receiving authentic feedback from teachers. The same problem arises with isolated grammar instruction, which fails the teaching of writing.

Isolated grammar instruction does not transfer to student writing and the time spent on that futile grammar instruction would have been better spent asking students to write. Such is the case with testing—as it wastes time better spent doing whole and authentic activities.

A transition to whole and authentic activities by students in class must begin by reconsidering the place of content acquisition and retention. Most commitments to testing see content as fixed and assume that memorization of that content must come before application, evaluation, or synthesis.

This is the distorted traditional view of Bloom’s taxonomy applied both to instruction and assessment in U.S. education—a view that reduces Bloom’s work on assessment to a linear and sequential model of teaching and learning.

To embrace students engaging in whole and authentic activities instead of tests, the acquisition of knowledge must be re-imagined as the result of that engagement, not a prerequisite to that engagement.

We own and know facts, knowledge, and details because and once we have used those facts in whole and authentic ways. Again, consider how we have learned to paint a work of art, play an instrument, or participate in an athletic event. All of these require some basics, some practice, some artificial preparation, but the real learning comes from the doing, the feedback while performing as a novice, and then the re-doing, and re-doing.

About 60 years ago, Lou LaBrant (1953) lamented:

It ought to be unnecessary to say that writing is learned by writing; unfortunately there is need. Again and again teachers or schools are accused of failing to teach students to write decent English, and again and again investigations show that students have been taught about punctuation, the function of a paragraph, parts of speech, selection of “vivid” words, spelling – that students have done everything but the writing of many complete papers. Again and again college freshmen report that never in either high school or grammar school have they been asked to select a topic for writing, and write their own ideas about that subject. Some have been given topics for writing; others have been asked to summarize what someone else has said; numbers have been given work on revising sentences, filling in blanks, punctuating sentences, and analyzing what others have written….Knowing facts about language does not necessarily result in ability to use it. (p. 417)

And this is essentially my argument about testing.

If we want students to be better at taking tests, then more testing will certainly accomplish that goal (again, that is basically what Roediger and Karpicke show).

But if we redefine learning and frame our teaching goals toward whole and authentic behaviors by students, we must recognize that students learn by doing those whole and authentic things.

Instead of tests, then, and grades, students need extended blocks of time in school to perform in whole and authentic ways (ways that occur in the real world outside of school; ways that occur in art class, chorus, and band, and on athletic fields and courts) along with having teachers observing and offering rich and detailed feedback that contributes to those students trying those performances again and again.

Not tests, whether we call them formative or summative, of the artificial kind, but whole and authentic performances and rich feedback leading to more and more performances.

Again, if you seek examples of what should replace the inordinate amount of time spent testing in schools, visit an art class, a chorus, or an athletic event, or consider that labs are a central aspect of science courses.

Commitments to testing are commitments to the static classroom where teachers are active, students are passive, and content is central. These commitments are asking very little of students.

I am calling for de-testing and de-grading the classroom in order to increase student activity, engagement, and thus learning in ways that are whole and authentic.

As Childress concludes in his argument that football is better than high school:

What I am saying is that we have a model for learning difficult skills — a model that appears in sports, in theater, in student clubs, in music, in hobbies — and it’s a model that works, that transmits both skills and joy from adult to teenager and from one teenager to another.

For Further Reading

More on Failing Writing, and Students

Education Done To, For, or With Students?

Teacher Quality, Wiggins and Hattie: More Doing the Wrong Things the Right Ways

Students Should Be Tested Less, Then Not at All

Students Should Be Tested More, Not Less by Jessica Lahey is not a compelling case to test students more, but another example of journalism failing to represent accurately a relatively limited study related to education.

Several aspects of the article reveal that the title and apparent claim of the need for more testing are misleading:

Henry L. Roediger III, a cognitive psychologist at Washington University, studies how the brain stores, and later retrieves, memories. He compared the test results of students who used common study methods—such as re-reading material, highlighting, reviewing and writing notes, outlining material and attending study groups—with the results from students who were repeatedly tested on the same material. When he compared the results, Roediger found, “Taking a test on material can have a greater positive effect on future retention of that material than spending an equivalent amount of time restudying the material.” Remarkably, this remains true “even when performance on the test is far from perfect and no feedback is given on missed information.”

And to be fair, this is the actual abstract of the study discussed above:

A powerful way of improving one’s memory for material is to be tested on that material. Tests enhance later retention more than additional study of the material, even when tests are given without feedback [emphasis added]. This surprising phenomenon is called the testing effect, and although it has been studied by cognitive psychologists sporadically over the years, today there is a renewed effort to learn why testing is effective and to apply testing in educational settings. In this article, we selectively review laboratory studies that reveal the power of testing in improving retention [emphasis added] and then turn to studies that demonstrate the basic effects in educational settings. We also consider the related concepts of dynamic testing and formative assessment as other means of using tests to improve learning. Finally, we consider some negative consequences of testing that may occur in certain circumstances, though these negative effects are often small and do not cancel out the large positive effects of testing. Frequent testing in the classroom may boost educational achievement at all levels of education.

Not to trivialize the study, but in short, the research associates “learning” with retention (memorization), and assumes a relatively direct correlation between test scores and the narrow view of learning as retention. In other words, if you want to raise summative test scores of retention, a series of smaller (and formative) tests is more effective in raising those scores than the study strategies it was compared against.

The problem with this “well, duh” study is that it remains trapped within the testing paradigm, even though the authors do concede (and then marginalize) problems with high-stakes testing and also briefly endorse the power of formative assessment: “the general procedure of using the results of classroom assessments as feedback for teachers to guide future instruction and also for students to guide their future studying” (p. 201).

This study, however, is not the compelling argument* for more testing that the title claims.

In fact, it is an ideal opportunity to argue that we must move beyond retention, recall, and memorization as foundational to what counts as learning. We must also begin to reject the idea that traditional testing formats (including selected-response formats in the classroom as well as standardized testing such as the SAT) are credible goals or evidence of learning.

Students should be tested less, and then not at all. Students should be offered opportunities to practice and perform whole and authentic activities (such as playing an instrument, creating a work of art, composing an essay, designing a budget for a project) during class time instead of preparing for and taking a battery of narrow assessments. Additionally, students need ample teacher feedback, and not grades, as part of drafting and revision processes surrounding those activities.

Retention and enhanced memory come from authentic engagement with real behaviors that students want to perform; memorization need not precede authentic displays of understanding, and must not be a primary proxy for learning. Ultimately, memorization is not deep learning, and testing limits, and never enhances, deep learning. Test scores also misrepresent student learning, teacher impact, and school quality.

Lahey’s article and the research on testing do offer valuable concerns about the high stakes associated with testing, and lend credibility to formative assessment, but both in the end remain trapped within the failed testing paradigm that needs to be lessened and then rejected entirely.

* Broadly, the authors ignore entirely issues related to who decides what should be learned; in other words, critical educators tend to explore education not bound by the traditional testing paradigm within which this study resides like a bug trapped in amber. The narrow and static view of knowledge and learning is as problematic as the idealized view of testing that the study fails to challenge.

Faith-Based Education Reform: Common Core as Standards-and-Testing Redux

Let’s start with irony:

Compelling research suggests that the public in the U.S. is unique in its commitment to belief, often at the expense of evidence—leading me to identify the U.S. as a belief culture.

Additionally, while I remain convinced that the U.S. is a belief culture, I also argue that the political cartoon below, posted at Truthout, captures another important dynamic: Many committed to their own beliefs both do not recognize that they are committed to belief and belittle others for being committed to their beliefs:

By Clay Bennett, Washington Post Writers Group | Political Cartoon

And this brings me to advocacy for Common Core standards, with one additional point: Along with embracing belief over evidence, the public (along with political leadership) in the U.S. tends to lack historical context.

Placed in the century-plus commitment to pursuing new and supposedly higher standards for public schools, then, Common Core advocacy falls into only two possible characterizations:

  1. Common Core is a response to the historical failure of all the many standards movements that have come before, and thus, the success of CC depends on CC being somehow a different and better implementation of an accountability/standards/testing paradigm.
  2. CC advocacy is yet another example of finding oneself in a hole and persisting with digging despite evidence to the contrary. In other words, CC may well be yet another commitment to a reform paradigm that isn’t appropriate regardless of how it is implemented, as John Thompson details in his review of The Allure of Order:

Jal Mehta’s masterpiece, The Allure of Order, answers the question, “Why have American [school] reformers repeatedly invested such high hopes in these instruments of control despite their track record of mixed results?” He starts with the review of how the bloom fell off the NCLB rose, explaining why its results in the toughest schools have been “miserable.” In the highest poverty schools the predictable result has been “rampant teaching to the test” which has robbed children of the opportunity to be taught in an engaging manner.

Mehta explains that this “outcome might have been surprising if it were the first time policymakers tried to use standards, tests, and accountability to remake schooling from above.” The contemporary test-driven reform movement is the third time that reformers have used the “alluring but ultimately failing brew” of top down accountability to “rationalize” schools and, again, they failed [emphasis added].

These two claims are themselves evidence-based (and it will be interesting to watch as others respond, as they have to my previous work on CC, by either ignoring evidence or garbling evidence to support what proves to be faith-based commitments to CC), and thus should provide a foundation upon which to continue the debate about CC.

CC advocacy and criticism are often based on false narratives and baseless claims (see Anthony Cody for one example of this problem and Ken Libby’s [@kenmlibby] cataloguing on Twitter #corespiracy), again reinforcing the pervasive and corrosive consequences of faith-based, but not evidence-based, debates.

Instead, we should start with an evidence-based recognition about standards-driven education reform.

For example, the existence and/or quality of standards are not positively correlated with NAEP or international benchmark test data—leading Mathis (2012) to conclude about CC: “As the absence or presence of rigorous or national standards says nothing about equity, educational quality, or the provision of adequate educational services, there is no reason to expect CCSS or any other standards initiative to be an effective educational reform by itself [emphasis in original]” (p. 2 of 5).

Therefore, CC advocacy has some principles within which it should continue if that advocacy is to be credible and thus effective:

  • Claims that CC advocacy is separate (and can be separated) from high-stakes testing must show evidence of when standards have been implemented without high-stakes tests (and how that was effective) or evidence of some state implementing CC without high-stakes tests connected. Otherwise, this is a faith-based claim.
  • Claims that accountability built on standards and high-stakes testing is an effective education reform strategy must show evidence of how that has worked in the previous state-based accountability era and then explain why those examples of success must now be replaced by the new CC set of standards. Otherwise, this is a faith-based claim.
  • CC advocacy has been endorsed as a logical next step built on the call in NCLB for scientifically based education reform; thus, CC advocates must either comply with the two points above or concede that the CC era is a break from evidence-based reform.

I am no advocate for remaining only within rational, evidence-based, and quantifiable norms for decision making, by the way, but I am convinced we must make clear distinctions between evidence and belief—and I am equally convinced that many education reformers enjoy a flawed freedom to call for evidence from their detractors while practicing faith-based reform themselves.

It is the hypocrisy that bothers me, the hypocrisy of power:

scientist evidence – Married to the Sea

Let’s acknowledge that teachers currently work under the demand of measurable evidence of their impact on students while CC advocates impose faith-based policies such as CC, new generation high-stakes testing, merit pay, charter schools, value-added methods of teacher evaluation, and a growing list of commitments to education reform at least challenged if not refuted by evidence.

CC advocates now bear the burden of either offering the evidence identified above or admitting they are practicing faith-based education reform.

REVIEW: De-Testing and De-Grading Schools, Bower and Thomas

Reviewed by J. Spencer Clark, Utah State University, whose review concludes:

The purpose of this book was to offer a map of the high-stakes accountability and standardization landscape, and more importantly to provide ways to navigate this landscape in positive ways. Bower and Thomas are successful in this regard and have provided a powerful critique that equally identifies powerful alternatives to high-stakes accountability. Overall, this is a fresh look at how to meld the theories behind de-grading and de-testing schools with actual classroom practice. This book could be a useful tool for instructors of pre-service methods and assessment courses, and possibly educational foundations courses at all levels, as it provides both an analysis of key aspects of a failing system of accountability and possible alternatives to it.

Bower, J., & Thomas, P. L. (2013). De-testing and de-grading schools: Authentic alternatives to accountability and standardization. New York, NY: Peter Lang USA.

De-Testing and De-Grading Schools

GUEST POST: Continu—what? Sara Newell

Continu—what?

Sara Newell

How do you derive meaning from a number? Should a parent or student respond differently to a 97% than to a 99%? What about a 75%? How do you know what number to assign to a student-created product you’ve never seen before? As a 5th grade teacher at the Charles Townes Center for highly gifted students in grades 3-8, I felt these questions were a constant thorn in my side.

My students qualified for invitation to the center in Greenville, South Carolina based on scores in the top percentile on nationally normed tests. The current, numerical grading system has always presented quite a challenge to me as a public school classroom teacher—how do I push my students to strive for excellence without encouraging the crippling effects of perfectionism? In giving students and parents a true measure of learning, personal achievement, and goal-setting, the numerical grading system always seemed to me ineffective at best. Since I began teaching gifted students, I have been in a constant struggle to find a more effective way to provide accurate feedback about their current performance while motivating them to continue to give their best effort on whatever challenges are presented next.

The assessment issues faced in our school were exaggerated versions of the problems caused by the numerical grading system in schools across the country. The nature of our students simply intensifies the problem. For example, the vast majority of my students can ace a grade-level multiple-choice test before I even engage in the first lesson. Should they just receive “A’s”? Is that what they earned? And, if I – instead – increased the depth and complexity of my instruction to provide the appropriate intellectual challenge and a student then only mastered 92% of that material—is it “fair” to assign a less than stellar grade?

This issue becomes even more important as students begin to move into high-school level courses. How does a 92 affect their GPA when they are enrolled in high-school and honors courses beginning in 7th grade? Should they be scored less than their peers who attend mainstream schools? Teachers of gifted (and all) students face these types of problems again and again as they are asked to differentiate to meet the needs of diverse learners. How does a teacher maintain some sort of equity and still challenge students appropriately? Some schools have attempted to rectify this disparity by offering higher grade points for honors or AP classes. This does not remedy the problem—it simply magnifies the spectrum of an inaccurate ruler and introduces an additional disadvantage for college applicants whose schools do not offer this option.  The issue of quality feedback and appropriate challenge remains.

For a while, I thought the solution to the problem was that I needed to design better rubrics. If I could just break assignments down into more concrete sections, the students would see what they needed to do and would be able to demonstrate mastery in a way that provided equal access to all while challenging students appropriately. (And I could still put a number on it and feel good about it.) Unfortunately, there were still roadblocks.

In a subject-integrated inquiry-based classroom, how do you quantify “delightful,” “sophisticated,” “clever” and all of the other descriptors that address the work of students who clearly went above and beyond the scope of the assignment? The scale model in gingerbread of the Metropolitan Museum of Art or the original musical composition in response to a Langston Hughes poem received a 100% that was “worth” exactly the same amount as the 100% given to the student who ploddingly met the minimum requirement for each element. So, the rubric was a start, but it still lacked the depth I was seeking to truly communicate effectively with my students (not to mention their parents) about the quality of their work. Truly authentic assessment with feedback that can guide students into becoming independent learners still seemed out of reach.

Then, our principal brought back the idea of using continua from a school visit in Seattle. These reading, writing and math continua are based on the work of Bonnie Campbell Hill and provide a system to analyze student skill and progress over many years. The lists are simple and concise. They do not include every possible state standard but instead provide an overview of the crucial skills students need to be successful.

I jumped on these tools and piloted using them with my students almost immediately. My students completed self-evaluations, rating themselves at the “beginning,” “developing,” “proficient,” or “independent” levels described on the continua. I then added my own assessment of their skills. We used these in our student-led conferences, and I could see the beginnings of evidence-based discussions in their conversations. Students were using their writing portfolios and math assessments to provide concrete support for their evaluations. This represented a terrific shift in the way students and parents thought and talked about student work.

Instead of parent comments like, “What did you miss?” or “Great job!” I was hearing, “How did you decide you were proficient in reading fluency instead of independent?” One parent asked his son, “I didn’t know you should be reading different genres. What are you reading right now? Is that the kind of book you always read?” These conversations were so much richer than the previous years’ events, which basically consisted of students proudly showing their work while their parents made appreciative mumbles and nodded their heads. I was excited by the beginnings of the give and take that marks a truly thoughtful discussion, but something was still missing. There was still no way to communicate truly exceptional work, or to flag the gifted student who was playing it safe.

After I had mused on this initial success and talked repeatedly with a middle school colleague struggling with many of the same frustrations, the two of us decided that we needed to create an additional continuum. The difference between the “minimum doer” and the outstanding student in our school was based not only on the ability to demonstrate skill mastery, but on the willingness to strive to apply critical and creative thinking skills. With this in mind, I pulled together a number of resources and began to hammer out a draft of a critical and creative thinking skills continuum. (I still haven’t hammered out a shorter name, though.) Dr. Richard Paul’s mini-guides on critical and creative thinking, Torrance’s work on creativity and Van Tassel-Baska’s writing on application of these skills in the classroom were all of great benefit to me as I worked. My hope was that this document would bridge the gap between the seemingly arbitrary nature of a number grade and the lightning strike of truly outstanding work. I ended up with a scale more rooted in psychology and child development than pedagogy and standards. This was initially surprising, but it became more satisfying as I realized that perhaps with this tool we might finally get to the roots of why one student was clearly outperforming another and, more importantly, what to do about it.

The purpose of this creative and critical thinking skills continuum is to provide specific feedback for students and parents about the students’ current progress as well as to communicate in a straightforward way the next steps in their educational growth. Numeric grades are loaded with judgment, both objective and subjective, as well as academic stigma. Students feel that a 100% means that you are perfect while a 67% means that you are a loser. I’ve even had students tell me that even numbers are better than odd numbers (a 99% means that I am a point away from perfect—the most frustrating thing—but a 98% means that I’m solidly in the high “A” category). The focus on the number rather than what the number represents is a bizarre, yet true manifestation of the problem with attempting to quantify something as variable as knowledge and learning. Students become so focused on the number and what it “means” that they completely lose sight of the true purpose of assessment– reflection and growth. A continuum has no numbers—hence, no judgment. There is no “right” or “wrong” way to evaluate oneself with this method.

On first executing the continua in my classroom, I did not ask my students to provide evidence to support their evaluations. (That came later…) It was absolutely fascinating to see students read through and begin their self-evaluations on the critical and creative thinking continuum. I only allowed one hour of the class period for students to complete their analysis of this one-page document. However, most of my students took much more time than that. The room was silent. My students were incredibly focused on their reading and analysis. As 5th graders are still fairly ego-centric at this concrete operational stage (thanks Piaget), they seemed to feel that an assessment all about them was well worth their time. The questions students asked about concepts like “intellectual humility” and perseverance got to the core of what I had been trying to teach for years. Why is it important to continue to try to find a solution to a difficult problem? What does it mean to demonstrate originality? How do I know if I am taking an intellectual risk? These were the questions that I wanted my students to ask—and this was finally a document that set the stage to ask them.

Another revelation occurred when I reviewed these documents individually. I began to get a much more relevant picture of how each child saw himself or herself. It was striking to compare the self-assessments with the list of test-score data that my principal had just sent out. (Yes, we are still in a public school. And yes, we still have to do things like set learning goals based on the number of points students “should” improve on certain tests.) Those standardized test scores have been relatively meaningless to me in the past. However, coupled with the information from the continuum self-assessments, a fascinating phenomenon was revealed. By and large, students in the top performing test score group had consistently given themselves the lowest evaluations on the continuum while the students with the lowest (comparative) test scores had marked themselves as having mastered all or almost all of the critical and creative thinking skills. The Dunning-Kruger effect in action! We had a tremendous class discussion about this effect, in which less competent people in a field tend to overestimate their abilities. We analyzed how it applied to their attitudes and approaches to learning. I began to see a shift in several students’ attitudes and performance following this one illuminating discussion.

This initial work was very inspiring. I was surprised and pleased at the effort my students put into their evaluations. The vocabulary from the continuum was popping up in our discussions again and again. Instead of “I don’t get it,” I was hearing comments like, “I need to clarify this—do you mean…?” The students were beginning to look at learning through this alternate lens. I continued to have students review the continuum and reflect on their progress as we completed units of instruction. They documented their growth and reflected on their struggles.

We also used the continuum to decide on areas of focus for the next units. I previewed with the students what I felt were the “big ideas” for learning while they made choices about skills they thought it important to develop. The quality of our communication continued to improve. Our goals were aligned—I was attempting to provide opportunities for them to improve in areas that THEY had identified as needing work. This method gave them a sense of control over their own learning.

Other teachers in my school are currently working to apply the math, reading, writing, and thinking skills continua in their classrooms. In the middle school, students are expected to provide support for their analysis as they complete their initial evaluations. In the lower grades, teachers use the continua to shift the focus from what students can’t do to what students COULD do. These continua are shared between teachers vertically to provide a long-range picture of the student’s development over time. This is something that a numeric grade based on grade-level standards fails to communicate.

At first some teachers struggled with how to make the continua relevant to their students, and all teachers recognized that the thought and effort needed to accurately utilize the continua required more time than typical “grading.” However, the value of the knowledge gained far outweighs the extra effort the analysis requires.

The next breakthrough came when I began to use the critical and creative thinking continuum in one-on-one parent conferences. For years, my conferences followed a fairly typical script. First, I would go over the previous year’s test scores. Then, I would discuss grades. The parent(s) and I would discuss any issues or “concerns,” and then I would try to end on some kind of positive note. For the parents of my highest achievers though, this was not a helpful meeting. While I’m sure they enjoyed hearing me list all of the delightful adjectives that described their child, I’m not sure that they felt that they were getting a clear picture of what their child could do to continue to grow.

The use of the continua has changed our discussions. My conferences conducted this fall focused on which elements their child was clearly demonstrating as well as areas their child could continue to develop. I was able to explain the Dunning-Kruger effect to parents who thought their child was practically perfect but whose child in reality was barely making an effort. I described to the parents of the perfectionists what intellectual risk-taking was and how their child could begin to do it. The conversations were so much richer than in the past, and parents did not feel that I was judging their parenting, or their children.

Instead, the focus was on attributes and evidence.  Parents were surprised and fascinated when reading their child’s reflections. The conferences now were a detailed conversation about the whole child and how he or she interacted with the world. Even more importantly, parents were now able to support our classroom objectives with greater accuracy.  One parent commented, “We were delighted to discuss and learn about the Creative Continuum. The Continuum is visual and the skill sets are clearly presented…Our meeting was one of the most informative conferences I have attended.”

Shifting our focus from a numerical grading system to a continuum-based evaluation has started to address many of the assessment issues I was facing. My students have stopped asking, “Is this for a grade?” as though that alone determines the value of an assignment. I continue to work to provide more opportunities for students to develop those critical and creative thinking skills. Knowing that I am going to be asking them to evaluate their growth, I am very conscious of the need to design learning experiences that require students to demonstrate those skills.

Most importantly, the students themselves feel a sense of ownership over their learning, and now they are making the effort to ask accurate, insightful questions about what they can do and what they still need to learn to do. Removing the focus from the number grade and putting it back on the evaluation of skills and attributes improves the quality of instruction, performance and communication. The end result is a focus on authentic student learning and success.

References

Davis, G.A., & Rimm, S.B. (2009). Education of the gifted and talented (6th ed.). Boston, MA: Allyn and Bacon.

Dunning, D., Johnson, K., Ehrlinger, J., & Kruger, J. (2003). Why people fail to recognize their own incompetence. Current Directions in Psychological Science, 12(3), 83–87.

Elder, L., & Paul, R. (2005). The miniature guide to critical thinking concepts & tools (4th ed.). Dillon Beach, CA: The Foundation for Critical Thinking.

Hill, B. C. (2008). Retrieved from http://www.bonniecampbellhill.com/support.php

VanTassel-Baska, J., & Stambaugh, T. (2006). Comprehensive curriculum for gifted learners (3rd ed.). New York, NY: Pearson.

For Further Reading

Rubrics

Kohn, A. (2006). The trouble with rubrics. English Journal, 95(4), 12–15.

Wilson, M. (2007). Why I won’t be using rubrics to respond to students’ writing. English Journal, 96(4), 62–66.

Wilson, M. (2006). Rethinking rubrics in writing assessment. Portsmouth, NH: Heinemann.

Self-Assessment

Liberating Grades/Liberatory Assessment, sj Miller