Category Archives: Testing

Rethinking Literacy (and All) Assessment

June 22, 2017 plthomasedd 3 Comments

Whatever effectiveness I have had as a teacher over a 33-year (and counting) career directly and indirectly connected to teaching literacy has been grounded in my inclination to assess my practices constantly against my instructional goals.

Teaching is some combination of curriculum (content, the what of teaching), instruction (pedagogy, the how of teaching), and assessment (testing, the monitoring of learning). When I was in teacher education as a candidate, the world of teaching was laser-focused on instruction—our learning objectives scrutinized and driving everything.

Over the three decades of accountability grounded in standards and high-stakes testing, however, and the rise of backward design, both how students are tested (test formats) and what tests address have become the primary focus of K-12 teaching.

Accountability’s state and national impact has increased the importance of standardized testing: not only the number of tests students are required to take but also the format of in-class assessments teachers use to prepare students for those tests.

High-stakes and large-scale testing is governed in many ways by efficiency (formats such as multiple choice that can be marked by computer); therefore, many K-12 teachers model their assessment content and formats on what students will face in these high-stakes environments.

Over my career, then, I have watched teaching to the test move from a practice shunned by best practice to the default norm of K-12 education.

As a committed practitioner of de-grading and de-testing the classroom, I offer below some big picture concepts that I believe every teacher should consider in order to improve the quality of grading and testing practices. The measure of that quality is whether and how our assessments match our instructional goals, not how efficient our tests are or how well our classroom assessments prepare students for (really awful) large-scale high-stakes tests.

The principles and practices below are imperative for literacy instruction and learning, but apply equally well to all learning goals and content.

Holistic v. skills (standardized tests). Let’s imagine for a moment that you wish to learn to play the piano, and you are given lessons on scales, proper fingering, etc., using worksheets. After a unit on playing the piano, you are given a multiple-choice test on that material, scoring an A.

Having never played the piano or practiced at the piano, what do you think of that A?

To be proficient in the context of efficient skills-based tests is not the same as being proficient in holistic behaviors. While the testing industry has sold us on the idea that efficient skills-based tests (usually multiple choice) correlate strongly with the authentic goals for learning we seek, we should be far more skeptical of that claim.

Compounding the problem of efficiency in standardized tests and in selected-response classroom assessments are the historical and current purposes of large-scale testing, for example, IQ tests and college entrance exams such as the SAT and ACT.

IQ testing has its roots in identifying low academic ability (identifying people who were expendable) and has never overcome problems with race, class, and gender bias.

College entrance exams began as a process for distinguishing among top students; therefore, test items that create spread are “good,” regardless of how well the question achieves our instructional goals.

For classroom teachers who seek assessments that support better teaching and learning, then, we should be seeking to assess in holistic ways first, and then to expose students to the formats and expectations of high-stakes testing.

One goal for rethinking assessment is to emphasize allowing and requiring students to practice whole behaviors (composing original texts, reading full texts by choice, etc.) and then to assess students’ levels of proficiency by asking them to repeat whole behaviors in testing situations.

Accomplishment v. deficit perspective. I am certain we have all experienced and many of us have practiced this standard approach to grading a student’s test: Marking with an “X” the missed items and then totaling the grade somewhere on the sheet, such as 100 – 35 = 65.

Let’s consider for a moment the assumptions and implications (as well as negative consequences) of this process.

First, this implies that students begin tests with 100 points—for doing nothing. Further, that creates an environment in which students are trying not to lose something they did not earn to begin with.

Now, a much more honest and healthy process for all assessments is that students begin with zero, nothing, and then the teacher evaluates the test for what the student accomplishes, not looking for and marking errors (something Connie Weaver calls, and rejects, as the “error hunt”).

By avoiding a deficit perspective (starting with 100 and marking errors) and embracing an accomplishment perspective (starting with zero and giving credit for achievement), we are highlighting what our students know and helping them to overcome risk aversion fostered by traditional (behavioral) practices in school.

Moving toward an accomplishment perspective is particularly vital for literacy development since taking risks is essential for growth. It is particularly powerful when giving feedback on and grading student writing (I learned this method during Advanced Placement training on scoring written responses to the exam).

Collaboration v. isolation. “[T]he knowledge we use resides in the community,” explains Gareth Cook, examining Steven Sloman and Philip Fernbach’s The Knowledge Illusion: Why We Never Think Alone, adding, “We participate in a community of knowledge. Thinking isn’t done by individuals; it is done by communities.”

However, traditional approaches to assessment are nearly always done in isolation; collaboration in testing situations is deemed cheating, in fact.

Consider for a moment your own lives as readers and writers. What do we love to do when reading a new novel? Talk with a trusted friend about the book, right? Community and collaboration fuel a better understanding of the work.

When we write, feedback is essential: another eye on our ideas, an uninvested editor to catch our mistakes.

While many of us have embraced community and collaboration in our instruction—implementing workshops or elements of workshops—we rarely allow collaboration in assessment.

See this post for an example of collaborative assessment in my introductory education course.

Feedback v. grades. One of the most frustrating aspects of practicing a de-graded classroom is that my students often identify on their opinion surveys of my courses that I do not provide adequate feedback—because they conflate grades (which I do not give throughout the semester) with actual feedback on their assignments (which I do offer, abundantly and quickly).

Most teachers, I believe, spend far too much time grading, while students receive too little feedback of the kind that requires them to interact with and learn from that help.

One element of my concern is that when teachers provide extensive feedback on graded work, most students check the grade and do not engage at all with the feedback; this wastes the teacher’s time and does not contribute to student learning.

Ideally, we should be providing ample and manageable feedback on work that requires students to address that feedback, either in some response or through revision (see below).

For literacy instruction, foregrounding feedback, requiring and allowing revision, and then delaying grades all support a much more effective process than traditional grading.

Revision v. summative assessment. That process above embraces revision over summative grading.

Whole literacy experiences, low-stakes environments that encourage risk, high-proficiency modeling and mentoring, and then opportunities to try again, to revise—these are the tenets of powerful and effective literacy instruction and assessment.

When students experience reading and writing as one-shot events mainly produced to be graded, they are cheated out of the awareness that literacy is cyclical, and recursive—to read and then to read again, to write and then to write again.

For Paulo Freire, literacy is agency, empowerment; we must read the world and re-read the world, write and re-write the world.

At the very least, we should decrease summative assessments and grading while increasing how often we require and allow revision.

Many argue that reducing grading also removes necessary accountability for student engagement, and while I find these arguments less compelling, I do replace my use of grades with minimum requirements for credit in any class or course. And I use those minimum requirements to emphasize the aspects of learning experiences I believe are most important.

Therefore, drafting of essays and revision are required, just as conferencing is.

Ultimately, our assessment and grading policies and practices send very strong messages about what matters in our classes; we must be diligent that we are sending the messages we truly embrace.

Recalibrating grade scales (with a caveat) and no more averaging grades. Debates and policies about what numerical grades constitute each letter grade—such as whether a 90, a 93, or a 94 is the lower end of the A-range—are little more, to me, than rearranging chairs on the deck of the Titanic.

Instituting uniform grade scales in schools, districts, or entire states is unlikely to produce the results proponents claim; however, some policy moves concerning grades are both warranted and highly controversial—such as creating a floor score (such as a 50 or 62) for an F.

Rick Wormeli and others have very effectively demonstrated the inequity of traditional grading scales that have about 10 points per letter grade until the F, which may have 50-70 points.
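That imbalance is easy to see by counting the whole-number scores in each letter band. The sketch below is only an illustration: the 10-point cutoffs and the floor of 50 are assumed values, and `band_widths` is a hypothetical helper, not anything taken from Wormeli.

```python
# Hypothetical 10-point scale: the lowest score that earns each letter.
CUTOFFS = {"A": 90, "B": 80, "C": 70, "D": 60, "F": 0}

def band_widths(cutoffs, top=100):
    """Count the whole-number scores that fall in each letter band."""
    widths = {}
    upper = top
    # Walk from the highest cutoff down so each band ends just below the one above it.
    for letter, low in sorted(cutoffs.items(), key=lambda item: -item[1]):
        widths[letter] = upper - low + 1
        upper = low - 1
    return widths

print(band_widths(CUTOFFS))                   # the F band holds 60 scores
print(band_widths({**CUTOFFS, "F": 50}))      # with a floor of 50, the F band holds 10
```

On this assumed scale the A through D bands hold 10 or 11 scores each while the F band holds 60; a floor of 50 shrinks the F band to the same width as the others.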

Low numerical summative grades and the flawed practice of averaging grades have very negative consequences for students—the worst of which is creating a statistical death penalty for students early in a course that may encourage those students to stop trying.
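A quick calculation shows how heavy that early penalty can be. This is a minimal sketch with made-up grades; the floor of 50 is one of the floor scores mentioned above, and `average_with_floor` is a hypothetical helper written only for this illustration.

```python
from statistics import mean

def average_with_floor(grades, floor=0):
    """Average the grades after raising any grade below the floor up to the floor."""
    return mean(max(grade, floor) for grade in grades)

# Made-up example: one very poor early assignment followed by steady work at 80.
early_zero = [0, 80, 80, 80, 80]

print(average_with_floor(early_zero))            # 64: one zero drags four 80s down 16 points
print(average_with_floor(early_zero, floor=50))  # 74: the same work with a floor of 50
```

In this made-up case, a single zero outweighs everything the student does afterward; with the floor, the average stays much closer to the level of work the student sustained.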

Creating a floor grade on F’s is instructionally and statistically sound, then, but only if combined with the minimum requirement concept discussed above. In other words, converting a zero to 50 or 62 when a student does poorly on an assignment is not the same thing as converting a zero to 50 or 62 when a student submits no work at all.

The latter must not be allowed since students can game the system by doing no work until late in the grading period and depending on averages to produce a passing grade for the course.

Therein lies the failure of averaging grades.

Averages skew the weight of grades earned while learning instead of honoring the assessment or assessments after students have had ample time to learn, practice, and create a showcase artifact of learning.

As well, averages are not as representative of reality as modes, for example. Consider the following grades earned by a student: 10, 10, 85, 85, 85, 85, 85, 85, 100, 100.

The average for these grades is 73, but the mode is 85, and if these grades are earned in this order (10 early and the 100 last) on cumulative assessments, the 100 is also a potentially fair grade.
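For anyone who wants to check the arithmetic, here is a minimal sketch using Python's standard `statistics` module; the variable name `grades` is simply my label for the list above.

```python
from statistics import mean, mode

# The ten grades from the example, in the order they were earned.
grades = [10, 10, 85, 85, 85, 85, 85, 85, 100, 100]

print(mean(grades))   # 73, the traditional average
print(mode(grades))   # 85, the most frequent grade
print(grades[-1])     # 100, the most recent (cumulative) grade
```

The three numbers tell three different stories about the same student, which is why the choice of summary matters.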

Grades and grade scales, then, are incredibly flawed in their traditional uses. If de-grading is not an option, I recommend combining a revised, equitable numerical/letter grade structure (with minimum requirements of participation included) with modes or portfolio assessment in place of averaging.

The concepts above about rethinking assessment are effective ways to interrogate current assessment practices, and they are urgent for improving literacy instruction.

I do urge seeking ways to de-grade and de-test the classroom regardless of what is being taught, but in the real world, I recognize that goal may seem impossible.

The ways I offer above to rethink assessment, I believe, are quite practical and certainly are justifiable once we consider if and how our assessment practices do or don’t reflect our teaching and learning goals.

And thus: “A critical pedagogy asks us to reconsider grading entirely,” argues Sean Morris, “and if we can’t abandon it whole-hog, then we must revise how and why we grade.”

Charleston Post and Courier, Coleman, College Board, Education, education reform, Equity, inequity, Meritocracy, Poverty, privilege, race, racism, SAT, Testing

Reader 22 May 2017 [UPDATED]: Connecting Dots

May 22, 2017 plthomasedd

Why people are rich and poor: Republicans and Democrats have very different views

See: UPDATE 21 (20 May 2017): Grit, Education Narratives Veneer for White, Wealth Privilege

Minorities Who ‘Whiten’ Résumés More Likely to Get Interview, Michael Harriot

“Whitening” is an all-encompassing term for when prospective employees scrub their résumés of anything that might indicate their race. Applicants with cultural names will sometimes use their initials. Community or professional work with African-American fraternities, sororities or other organizations are deleted. One student omitted a prestigious scholarship he was awarded because he feared it might reveal his race.

Although the practice sounds demeaning and reductive in the year 2017, apparently it works. In one study, researchers sent out whitened résumés and nonwhitened résumés to 1,600 employers. Twenty-five percent of black applicants received callbacks when their résumés were whitened, compared with 10 percent of the job seekers who left their ethnic details on the same résumés.

The results were the same for employers who advertised themselves as “equal opportunity employers” or said that “minorities are strongly encouraged to apply.”

Whitened Résumés: Race and Self-Presentation in the Labor Market, Sonia Kang, Katy DeCelles, András Tilcsik, and Sora Jun

Abstract

Racial discrimination in labor markets is a critical process through which organizations produce economic inequality in society. Though scholars have extensively examined the discriminatory decisions and practices of employers, the question of how job seekers try to adapt to anticipated discrimination is often overlooked. Using interviews, a laboratory experiment, and a résumé audit study, we examine racial minorities’ attempts to avoid discrimination by concealing or downplaying racial cues in job applications, a practice known as “résumé whitening.” While some minority job seekers reject this practice, others view it as essential and use a variety of whitening techniques. When targeting an employer that presents itself as valuing diversity, however, minority job applicants engage in relatively little résumé whitening and thus submit more racially transparent résumés. Yet, our audit study shows that organizational diversity statements are not actually associated with reduced discrimination against unwhitened résumés. Taken together, these findings suggest a paradox: Minorities may be particularly likely to experience disadvantage when they apply to ostensibly pro-diversity employers. These findings illuminate the role of racial concealment and transparency in modern labor markets and point to an important interplay between the self-presentation of employers and the self-presentation of job seekers in shaping economic inequality.

Experts: Conflicts over Confederate names and symbols likely to continue, Paul Hyde

But Thomas said school administrators should encourage student debate over historical figures such as Wade Hampton — as an important lesson in democracy.

“If we really think that public education is to prepare people to live in a democracy, children need to have experiences with democratic processes,” Thomas said. “I think this specific protest should be seen as an opportunity for students to see what the democratic process looks like, with everybody’s voice mattering. Principals and superintendents of public schools — they have incredibly hard jobs — but they are the people who have to show students what moral courage is. If administrators and teachers can’t show moral courage, how do we expect our children to?”

See: Dismantling Monuments: History as a Living Document

When Standardized Tests Don’t Count | Just Visiting, John Warner

And yet, when it comes to marginalized and vulnerable populations within Charleston County Schools, these standardized assessments provide a rationale for top-down oversight and control.

This is entirely common and predictable. “Accountability” is often weaponized against those without the means to defend themselves.

I have no wish to upend the academic culture of the Citadel over their terrible CLA scores, but maybe some of those who are willing to give our elite storied places a pass can extend the same spirit to those who have no such protections.

See Are America’s top schools ‘elite’ or merely ‘selective?’

Why the New SAT Is Not the Answer, Akil Bello and James Murphy

If anything, the discord between them is likely to grow as the College Board pursues an equitable society using a test that is designed to mark and promote distinctions.

For all the positive changes the College Board has made, the new SAT shouldn’t be counted among them. It is a test, not a solution.

Every attempt to manage academia makes it worse, Mike Taylor

The problem is a well-known one, and indeed one we have discussed here before: as soon as you try to measure how well people are doing, they will switch to optimising for whatever you’re measuring, rather than putting their best efforts into actually doing good work.

In fact, this phenomenon is so very well known and understood that it’s been given at least three different names by different people:

Goodhart’s Law is most succinct: “When a measure becomes a target, it ceases to be a good measure.”

Campbell’s Law is the most explicit: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”

The Cobra Effect refers to the way that measures taken to improve a situation can directly make it worse.

America has locked up so many black people it has warped our sense of reality, Jeff Guo

According to a Wonkblog analysis of government statistics, about 1.6 percent of prime-age white men (25 to 54 years old) are institutionalized. If all those 590,000 people were recognized as unemployed, the unemployment rate for prime-age white men would increase from about 5 percent to 6.4 percent.

For prime-age black men, though, the unemployment rate would jump from 11 percent to 19 percent. That’s because a far higher fraction of black men — 7.7 percent, or 580,000 people — are institutionalized.

UNEQUAL ENFORCEMENT: How policing of drug possession differs by neighborhood in Baton Rouge

College Board, Education, education reform, Educational Research, Gilles Deleuze, literacy, Michel Foucault, NCLB, Poverty, Standards, Statistics, Teaching, Testing, Writing

Reformed to Death: Discipline and Control Eclipse Education

May 19, 2017 plthomasedd

An enduring gift of being a student and a teacher is that these experiences often create lifelong and powerful personal and professional relationships. Reminiscing about these experiences, however, is often bittersweet because we are simultaneously reminded of the great promise of education as well as how too often we are completely failing that promise.

After writing about my two years as a co-lead instructor for a local Writing Project summer institute, the former student I discussed called me, and we found ourselves wading deeply into the bittersweet.

She has in the intervening years been a co-facilitator in the same workshop where I taught her now more than 15 years ago; she also has worked in many capacities providing teachers professional development and serving as a mentor to pre-service teachers completing education programs and certification requirements.

As we talked, the pattern that emerged is extremely disturbing: the most authentic and enriching opportunities for teachers are routinely crowded out by bureaucratic and administrative mandates, often those that are far less valid as instructional practice.

In my chapter on de-grading the writing classroom, I outlined how the imposition of accountability ran roughshod over the rise of the National Writing Project (NWP), which embodied both the best of how to teach writing and a gold standard approach to professional development.

What is best for teachers and what is best for students, however, are mostly irrelevant in the ongoing high-stakes accountability approach to education reform, a process in which discipline and control eclipse education.

Local sites of the NWP are crucibles of how the reform movement is a death spiral for authentic and high-quality teaching and learning as well as teacher professionalism.

At the core of the NWP model is a charge that teachers must experience and become expert in that which they teach; therefore, to guide students through a writing workshop experience, teachers participate in extended summer writing workshop institutes.

While NWP site-based institutes and other programs once thrived despite the weight of the accountability era, that success appears to be waning under accountability-based mandates that are in a constant state of reform; teachers are routinely required to seek new certification while they and their students must adapt to a perpetually different set of standards and high-stakes tests.

That bureaucracy is often Orwellian since “best practice” and “evidence-based”—terminology birthed in authentic contexts such as the NWP—have become markers for programs and practices that are aligned with standards and testing, not with the research base of the field. The logic is cripplingly circular and disturbingly misleading.

This erosion and erasing of teaching writing well and effectively is paralleled all across the disciplines in K-12 education, in fact—although how writing is particularly ruined in standards- and testing-based programs and practices remains our best marker of accountability as discipline and control, not as education.

I want to end here by staying with writing, but shifting to the sacred cow of the reform movement: evidence.

High-stakes testing of writing has been a part of state accountability and national testing (NAEP and, briefly, the SAT) for more than 30 years since A Nation at Risk ushered in (deceptively) the accountability era of K-12 public education in the U.S.

What do we know about high-stakes testing as well as the accountability paradigm driven by standards and tests?

George Hillocks has documented [1] that high-stakes testing of writing reduces instruction to training students to conform to anchor papers, template writing, and prescriptive rubrics. In other words, as I noted above, “best practice” and “evidence-based” became whether or not teaching and learning about writing conformed to the way students were tested—not if students had become in any way authentic or autonomous writers, and thinkers.

My own analysis of NAEP tests of writing [2] details that standardized data touted as measuring writing proficiency are strongly skewed by student reading abilities and by significant problems with the alignment of the assessment’s prompts and scoring guides.

And now, we have yet more proof that education reform is fundamentally flawed, as Jill Barshay reports:

“(T)he use of the computer may have widened the writing achievement gap,” concluded the working paper, “Performance of fourth-grade students in the 2012 NAEP computer-based writing pilot assessment.” If so, that has big implications as test makers, with the support of the Department of Education, move forward with their goal of moving almost all students to computerized assessments, which are more efficient and cheaper to grade.

Not only does high-stakes testing of writing fail the research base on how best to teach composition [3], but also the pursuit of efficiency [4] continues to drive all aspects of teaching and learning, effectively contradicting the central claims of reformers to be pursuing seemingly lofty goals such as closing the achievement gap.

Writing instruction and assessment are prisoners of the cult of proficiency that is K-12 education reform, and are just one example of the larger accountability machine that has chosen discipline and control over education.

Reform has become both the means and the ends to keeping students and teachers always “starting again,” “never [to be] finished with anything,” as Gilles Deleuze observed [5].

Barshay ends her coverage of the IES study on computer-based writing assessment with a haunting fear about how evidence drives practice in a high-stakes accountability environment, a fear I guarantee will inevitably become reality:

My fear is that some educators will respond by drilling poor kids in the QWERTY keyboard, when the time would be better spent reading great works of literature and writing essays and creative stories.

As long as reforming and accountability are the masters, we will continue to make the wrong instructional decisions; we will continue to be compelled to make the wrong decisions.

[1] See Hillocks’s “Fighting Back: Assessing the Assessments” and The Testing Trap: How State Writing Assessments Control Learning.

[2] See 21st Century Literacy: If We Are Scripted, Are We Literate?, co-authored with Renita Schmidt.

[3] See The Impact of the SAT and ACT Timed Writing Tests – NCTE.

[4] See NCTE Position Statement on Machine Scoring.

[5] See Gilles Deleuze, Postscript on the Societies of Control:

The administrations in charge never cease announcing supposedly necessary reforms: to reform schools, to reform industries, hospitals, the armed forces, prisons….In the disciplinary societies one was always starting again (from school to the barracks, from the barracks to the factory), while in the societies of control one is never finished with anything—the corporation, the educational system, the armed services being metastable states coexisting in one and the same modulation, like a universal system of deformation.

AlterNet, College Board, Education, education reform, Equity, inequity, Meritocracy, opportunity gap, Poverty, privilege, race, racism, SAT, Standards, Testing

Elite or Selective?: Reconsidering Who We Educate and How

May 15, 2017 plthomasedd 1 Comment

Sharde Miller’s “California teen describes his road from Compton to Harvard University” offers a powerful subtext about the American Dream as well as the enduring belief in education as the “great equalizer,” embodied by Elijah Devaughn Jr.:

Devaughn grew up in a single-parent household in Compton, California, a city that has been plagued by gun violence and gang activity for decades….

“Getting accepted into a prestigious university like Harvard, I think it means the world,” Devaughn said. “It means God is able. It means that hard work pays off. It means that, you know, struggles end.”

What if we unpack the label of “prestigious” by making an important caveat: Is Harvard University elite or selective?

As a point of reference, over the past three decades of high-stakes accountability in public education, schools have been annually labeled as excelling and failing; however, once we look beneath the A-F rankings, a strong and consistent correlation persists between schools identified as excelling or failing and the socio-economic status of the students [1] (as well as the racial and language demographics).

Consider also that for every year the SAT has been administered, average scores have risen in step with parental income and parental years of education [2].

My university has begun gathering data to analyze our impact on students. The university is selective, having high standards for the academic backgrounds and achievements of students.

Some initial data are telling. When students with high preparation are compared to students with low preparation, extrapolating over four years of college, high preparation students are more successful and the gap with low preparation students widens during years 2 and 3 and then never closes by year 4 (year 1 and year 4 gaps are about the same).

If we persist in suggesting that education is the great equalizer (despite ample evidence education does not, in fact, equalize) and a foundational mechanism of the American Dream, we must reconsider how and why we identify any schools as “prestigious.”

Alexander W. Astin’s Are You Smart Enough? seeks to examine if our prestigious and excelling schools are elite or merely selective. Astin exposes part of the problem with labeling colleges, for example, as “prestigious”:

The “quality” or “excellence” of a college or university is thus judged on the basis of the average test score of its entering students, rather than on how well it educates them once they enroll.

What is lost in the rush to ascribe success and failure to schools is, as Astin argues, the essential charge of any formal schooling:

On the contrary, the quality of our national talent pool depends heavily on how well colleges and universities develop the students’ capacities during the college years. And this means all students.

And thus, Astin asserts: “More parents need to be asking, ‘Why should an educational system invest the least in the students who may need the most in higher education?’”

Here, then, is the dirty little secret: “Prestigious school” (K-12 as well as colleges/universities) is a veneer for “selective”; such schools are “elite” not in terms of their educational impact but in terms of the conditions at those schools.

Public universities are less selective than private liberal arts colleges, and the former experience is distinct from the latter in, for example, faculty/student ratios and class size.

In other words, more academically successful students tend to come from more affluent and better-educated families, and they are then afforded higher education experiences that are identifiably superior to those afforded relatively less successful students from less affluent and less educated families.

Reconsidering how we label schools, the “selective” versus “elite” divide, is a first step in seeking ways to turn a tarnished myth (“education is the great equalizer”) into a reality.

Too often “prestigious” and “elite” are code for “selective,” praising a college/university for gatekeeping, and not educating; too often “excellent” and “failing” are code for student demographics, ranking K-12 schools for proximity, and not educating.

Testing, ranking, and accountability in the U.S. have entrenched social and educational inequity because, as Astin confronts, “there are two very different uses for educational assessment: (a) to rank, rate, compare, and judge the performance of different learners and (b) to enhance the learning process.”

We have chosen the former, pretending as well that those metrics reflect mostly merit although they are overwhelmingly markers of privilege.

Let’s return to Devaughn as a rags-to-riches story.

Late in the article we learn Devaughn attended private school before his acceptance to Harvard—again bringing us back to the issue of opportunity and what we are learning at my university about well prepared students versus less prepared students.

Devaughn’s story should not be trivialized, but carefully unpacked, it does not prove what I think it was intended to show. The American Dream and claims that education is the great equalizer are, in fact, deforming myths.

Race, gender, and the socioeconomic factors of homes and communities remain resilient causal factors in any person’s opportunities and success:

Black unemployment is significantly higher than white unemployment regardless of educational attainment | Economic Policy Institute

All schools at any level must re-evaluate who has access to the institution, and why, and then focus on what impact the educational experience has on those students. Therein must be the evidence for determining excellence and prestige.

[1] See here and here for examples in South Carolina.

[2] See The Conversation: Tests don’t improve learning. And PARCC will be no different.

Charleston Post and Courier, Charter schools, Education, education reform, Educational Research, Teach for America, Teacher Evaluation, Teaching, Testing, VAM

Don’t Buy Bluster from Teacher Quality VAM-pires

April 12, 2017 plthomasedd Leave a comment

The responses are predictable online and through social media any time I address teacher quality and policy focusing on teacher evaluation such as my recent commentary on Charleston adopting value-added methods (VAM).

How dare I, some respond, suggest that teacher quality does not matter!

The pattern is exhausting because most responding in indignation first misrepresent what I have claimed and then make the most extreme arguments themselves in order to derail the conversation along their own agenda, usually linked to the charter school movement grounded in teacher bashing and making unobtainable promises.

So let me state here that the central elements of what we know about teacher quality and efforts such as VAM-based teacher evaluation are these: teacher quality is not an independent variable (any teacher may be effective for one student and ineffective for another, for example) and, since student high-stakes testing is not designed to measure teacher quality and is more strongly linked to out-of-school factors, VAM is both a horrible technique for identifying teacher quality and, ironically, a guaranteed process for devaluing the importance of teachers.

Teacher quality is unparalleled in importance in terms of student learning, but it is also nearly impossible to measure or quantify, especially through student scores on high-stakes standardized tests.

Teacher quality VAM-pires, then, often have agendas [1] that are masked by their bluster about teacher quality.

Trying to measure and quantify teacher quality is a mistake; linking any evaluation of teacher quality to student test scores lacks validity and reliability—and VAM discourages teachers from teaching the most challenging populations of students (high-poverty, special needs, English language learners).

Focusing on simplistic and inappropriate measures reduces teacher impact to 10-15% of what high-stakes standardized testing measures; in other words, VAM itself devalues teacher quality.

My informed argument, based on 18 years as a public school classroom teacher and 15 years as a teacher educator and scholar, then, is that we must recognize teacher quality is impacted by teacher preparation, teaching/learning conditions, student characteristics, and dozens of other factors inside and outside of schools—many of which are beyond the control of teachers or students.

As well, we must address the teacher quality issues that political and administrative leaders can control: class size, school funding, and most important of all, teacher assignment.

Just as decades of research have revealed that teacher quality accounts for no more than 10-15% of student test scores, decades of research show that affluent and white students are assigned the most experienced and certified teachers while poor and black/brown students are assigned new/inexperienced and un-/under-certified teachers.

The charter school crowd’s bluster about teacher quality is pure hokum because charter schools increase that inequity of teacher assignment by depending on new and uncertified teachers such as candidates from Teach For America.

No one is saying teacher quality does not matter (I clearly am not saying that), but dishonesty about teacher quality does lie at the feet of the edu-reformers and the VAM-pires who wave their collective arms any time we call them on their failed policies and their political agendas.

[1] See the evangelical urge of Broad-trained acolytes, the resume building and cut-and-run patterns of edu-reformers, and the post-truth practices of turn-around and charter advocacy.

Education, Testing

Collaborative Assessment in the De-Graded Classroom

March 3, 2017 plthomasedd 1 Comment

Today my foundations in education class took their midterm:

My @FurmanU foundations course – midterm class discussion @pgorski https://t.co/rmZlC7Yiii pic.twitter.com/wGtpa5z9oL

— Paul Thomas (@plthomasEdD) March 3, 2017

My classes are already disorienting for students, especially the high-achieving types we attract at a selective liberal arts college, because I do not grade any assignments, although I must give a final grade in the courses.

Just before this midterm, in fact, I returned the group grade sheets that had scores of √+, √, and √-, prompting one student to ask before the exam just what grade those are.

In this course, I do not have a traditional synoptic text, but I do assign two powerful books—Paul Gorski’s Reaching and Teaching Students in Poverty: Strategies for Erasing the Opportunity Gap and Chris Emdin’s For White Folks Who Teach in the Hood…and the Rest of Y’all Too.

Gorski’s book is the first half of the course, and Emdin’s, the second; but both are explored through a book club format in which students meet in small groups four or five times over the half of the semester the book is assigned to discuss as they read.

They submit written reflections, but there are no tests, except that we use Gorski’s book for our midterm experience.

I say “experience” since the midterm I now use is a discussion, a collaborative assessment.

All students must submit before the exam period four or five talking points from Gorski’s book, noting page numbers, quotes, key ideas, and possibly connecting these with other aspects of the class such as their tutoring field work or readings connected to the topics of the course.

Then during the exam period, students have small group discussions for about 15-20 minutes before we move to a whole-class discussion, while I only eavesdrop.

I then use the final 7-10 minutes to debrief about the entire low-stakes reading experience and the unusual exam format.

Since I have been doing this now for several years, some key patterns have developed.

First, and I believe important to stress, despite not being graded or tested, virtually all the students actually read the book, and then the discussions are always animated and detailed.

Students today and in the past stressed that the low-stakes approach (no grades, no tests) helped make the reading and discussion richer.

Next, and related, the exam itself becomes a learning experience; students have greater understanding of the material after the exam than from preparing for the exam.

In a low-stakes collaborative exam setting, students who prepare well can feel confident they will have an opportunity to succeed, unlike the anxiety that occurs when students do study intensely but find the test itself unlike what they have prepared for.

Of course, and we discussed this, some negative consequences do come with low-stakes collaborative assessment such as this class discussion.

One of the most complex is that discussion honors a very limited way for students to be engaged: talking aloud. Introverted or self-conscious students are at a disadvantage in these “on stage” activities.

The two ways I address that are having each student send in talking points and starting with small-group discussions in which virtually all students do feel comfortable participating.

Another problem is helping students overcome their natural anxiety about not being graded since they have depended on that process for many years of schooling.

I also address that by telling students they may at any time and as often as needed meet with me in order to discuss what their grade would be in the course if I were grading.

Finally, since students run the entire exam discussion, we run the risk of misinformation being shared without any real mechanism to address it; however, over the years, this has rarely happened, and when it has, I simply come back to it in a later class session.

At first, the class discussion exam was an experiment, but now, it is a staple of my courses that has proven time and again to be one of the best days in any course I teach.

AlterNet, Charleston Post and Courier, Charter schools, Education, education reform, Educational Research, inequity, miracle schools, No Excuses Reform, opportunity gap, politics, Poverty, privilege, race, racism, School choice, school funding, Segregation, Standards, Teach for America, Teaching, Testing, Truthout

Education Reform in the Absence of Political Courage: Charleston (SC) Edition

February 15, 2017 plthomasedd 5 Comments

Words matter, and thus, I must apologize by opening here with a mundane but essential clarification of terms.

As I have written over and over, everything involving humans is necessarily political, even and especially teaching and learning. Therefore, no teacher at any level can truly be apolitical, objective. Taking a neutral or objective pose is a political choice, and an endorsement of the status quo.

Key to that claim is recognizing the difference between political and partisan. Partisan politics involves allegiance to and advocacy for organized political parties, notably Republicans and Democrats.

A partisan feels compelled to place party loyalty above ideology or ethics. To be political can be and should be a moral imperative.

We can avoid being partisan, even as that is political. And when many people call for education and educators to avoid being political, what they really are seeking is that education and educators not be partisan—a position that is achievable and one I endorse.

This distinction matters in public education and public education reform because all public institutions in the U.S. are by their tax-supported status at the mercy of partisan politics.

From around 1980, in fact, politicians at the local, state, and national levels have discovered that public education is a powerful and effective political football. The standard politician’s refrain is “Schools are horrible, and I can make them better!”

The current rise of the inexpert ruling class at the presidential level has been foreshadowed for more than three decades by the partisan politics around education reform—politicians and political appointees with no experience or expertise in education imposing pet reform initiatives onto public schools because these policies appeal to an equally mis-informed public.

Even with large failed crucibles such as New Orleans post-Katrina, political leaders remain committed to finding themselves in a hole and continuing to dig.

In my home state of South Carolina, infamous for our Corridor of Shame, Charleston, on the east coast and part of that corridor, continues to represent the savage inequalities that result from a combination of an inexpert ruling class and an absence of political courage.

Charleston schools reflect the most stark facts about and problems with K-12 education across the U.S.: private and gate-keeping public schools (such as academies, magnet schools, and some charter schools) that provide outstanding opportunities for some students in contrast to grossly ignored high-poverty, majority-minority public schools that mis-serve “other people’s children.”

As a result of these inequities and dramatically different student outcomes exposed by the accountability era obsession with test scores, Charleston has played the education reform game, committing to provably failed policies over and over: school choice, school closures and takeovers, school turnaround scams, overstating charter schools as “miracles,” and investing in Teach For America.

Why do all these policies fail and what ultimately is wrong with inexpert leadership? The absence of political courage to address directly the blunt causes of inequitable student outcomes in both the lives and education of students.

Currently in Charleston, the closing of Lincoln High and transferring those students to Wando High (see here and here) highlight that the gap between commitments to failed edureform and political courage to do something different persists.

The debates and controversy over how former Lincoln students are now performing at Wando offer some important lessons, such as the following:

The media and the public should be aware of partisan political code. A garbled reach for “the soft bigotry of low expectations” has been used to explain why Lincoln students’ grades have dropped while at Wando. The “soft bigotry” mantra is a conservative slur triggering the public’s belief in “bleeding heart liberals,” who coddle minorities. But the more damning part of the code is that it focuses blame on the administration and teachers in high-poverty, majority-minority schools and thus away from political leadership.
And thus, the public needs to distinguish between blaming educators at Lincoln for low expectations (again, garbled as “low standards”) and the expected consequences of high-poverty, majority-minority schools suffering with high teacher turnover, annual under-staffing, and persistent teacher workforces that are new and/or un-/under-certified. Additionally, the accountability era has unrealistic demands of these schools when compared to low-poverty, low-minority schools that have much greater percentages of experienced and certified teachers.
The apparent drop in student grades and test scores from Lincoln to Wando is extremely important data that deserve close scrutiny, but so far, that scrutiny has been reduced to partisan politics and deflecting blame. Dozens of reasons could explain the grade differences, including the transfer as well as the staffing differences between the two schools (neither of which is the simplistic “soft bigotry” argument used primarily to justify closing a community school).

The partisan political approaches to schools and education reform are tarnished by both willful ignorance and a confrontational blame game.

The willful ignorance of politicians and the public refuses to acknowledge huge social inequity driven by racism and white privilege; the blame game seeks ways to blame the victims of those inequities instead of confronting systemic forces.

What should political leaders be doing and what should the public be demanding that is different from the patterns identified above and from the policies already proven to be failures?

Recognize that in-school only reform creates two serious problems: (1) unrealistic demands with high-stakes consequences produce unethical behavior among otherwise good people (see the Atlanta cheating scandal), and (2) since out-of-school factors overwhelmingly influence measurable student achievement, even the right in-school only reform is unlikely to result in measurable improvement.
Interrogate the proclaimed cause of low student achievement (“low expectations”) and instead seek to understand the complex reasons behind that low achievement by poor and black/brown students based on available evidence that includes carefully interviewing the administrators, teachers, and students involved.
Advocate for public policy that addresses serious inequity in the lives of children—policy impacting access to health care, a stable workforce, access to safe and stable housing, and high-quality food security.
Refuse to ignore needed in-school reform, but reject accountability-based reform in favor of equity-based reform focusing on equitable teacher assignment for all students, articulated school funding that increases funding for schools serving struggling communities, guaranteeing the same high-quality facilities and materials for all children regardless of socioeconomic status of the communities served, and eliminating gate-keeping policies that track high-needs students into test-prep while advantaged students gain access to challenging courses such as Advanced Placement and International Baccalaureate.

Ultimately, the absence of political courage in SC and across the U.S. is where the real blame lies for inequitable student achievement along race and class lines.

Many students, the evidence shows, are doubly and triply disadvantaged by the consequences of their lives and their schools.

Trite and misleading political rhetoric, along with “soft bigotry of low expectations,” includes soaring claims that a child’s ZIP code is not destiny.

Well, in fact, ZIP code is destiny in SC and the U.S.; it shouldn’t be, but that fact will remain as long as political leadership chooses to ignore the expertise within the field of education and continues to lead without political courage.

Political courage requires direct action, even when it isn’t popular, and refuses to deflect blame, refuses to wait for what market forces might accomplish, and instead takes the right action now.

Political courage, as James Baldwin expressed, embraces that “[t]he challenge is in the moment, the time is always now.”

For More on Political Courage

Support Betsy Devos Shoot Yourself In The Foot, Andre Perry

Black Activists Don’t Want White Allies’ Conditional Solidarity!, Stacey Patton

Education, education reform, Education Week, Media, Statistics, Testing

Don’t Count on Grading, Ranking Educational Quality

January 4, 2017 plthomasedd 2 Comments

Having been a long-time advocate for and practitioner of de-testing and de-grading the classroom, I also reject the relentless obsession of mainstream media to grade and rank educational quality among states as well as internationally (see Bracey and Kohn).

As Kohn recognizes: “Beliefs that are debatable or even patently false may be repeated so often that at some point they come to be accepted as fact.”

And thus, with the monotonous regularity and mechanical lack of imagination of a dripping faucet, Education Week once again trumpets Quality Counts.

Like a college course no one wants to register for, Quality Counts 2017 gives the nation a C while no state makes an A or an F.

The appeal of all this much ado about nothing includes:

The U.S. has a perverse obsession with quantification that is contradicted by a people who are equally resistant to science and expertise.
People love the overly simplistic use of charts and interactive maps.
These grades and rankings always confirm the enduring narrative that public schools are failing.

However, the real problem is not how states and the nation rank, but that we persist at the grading and ranking as if that process reveals something of importance (it doesn’t) or as if that process somehow is curative (it isn’t).

How, then, does grading and ranking educational quality fail us?

As with regularly changing standards and high-stakes testing as part of accountability, grading and ranking educational quality is part of the larger failure of imagination, a belief in doing the same thing over and over while expecting different results. Media have been grading and ranking for decades, and the narrative of failing schools has continued; in other words, this process has no positive impact on education reform—but it feeds a media and social need to bash public schooling.
Anything can be quantified and ranked, and the statistics needed to quantify and rank are necessarily what drive both; thus, A-F grades and then extending the measurements so that ranking is possible become goals of the process that often distort the message of that process. For a simple analogy, in the 400-meter dash at the Olympics, the event creates finishers ranked 1-10; however, all of them are world-class and the distinction among them is minuscule, for all practical purposes irrelevant except for the need to declare winners and losers.
Grades and rankings of all kinds in education focus almost entirely on observable and measurable outcomes, glossing over or ignoring powerful influences on measurable student outcomes. Decades of research show that out-of-school factors account for 60-80+% of those measurable outcomes; and thus, outcome-based data of educational quality are more likely a reflection of social conditions than school-based quality. The inherent problem with using test scores, for example, for ranking and determining educational quality has been acknowledged by the College Board for years (see page 13).
Grades and rankings feed into a competition model as well as deficit ideology. These are both harmful in education because collaboration is more effective than competition and because our focus is on flaws (deficits) that we associate primarily with schools, teachers, and students, perpetuating a “blame the victim” mentality that ignores (as noted above) factors beyond the control of schools, teachers, and students (such as poverty, racism, and sexism, all of which significantly impact measurable learning outcomes).
And finally, grading and ranking fail because of a common misunderstanding about statistical facts as they contradict political and public expectations: large populations of humans (90% of students attend public schools) will always have a range of measurable outcomes (height, 40-yard dash times, test scores)—although also misunderstood, think the bell-shaped curve—which will appear to be a “failure” when posed against the political/public call for 100% proficiency by students. In other words, the U.S. demands that everyone be above average and then is disappointed when statistics show a range of human outcomes.

Since the mid-1800s, fueled by the Catholic church’s market fears, there has existed a media, political, and public obsession with bashing public education.

In this era of fake news and post-truth debate, as I have noted over and over, mainstream media are as culpable—if not exactly the same—as fake news and click-bait because practices such as Quality Counts by EdWeek are lazy and misleading, enduring, as Kohn noted, mostly because it is something media have always done and because these rankings feed into confirmation bias.

If quality counts, beating the grades-and-rankings drums is a sure way to ensure that it will never be attained.

If truth matters, a first step in that direction would include resisting the failed practice of grading and ranking educational quality.

Education, education reform, Standards, Testing

Education Reform and the Eternal Failure of the Unimaginative Bureaucratic Mind

December 9, 2016 plthomasedd Leave a comment

In the 2006 film Idiocracy, the U.S. five centuries into the future is suffering crippling crop shortages due to a dust bowl that the main character (a survivor of suspended animation from the present of the film’s opening, around 2005), Private Joe Bauers, discovers is human-made since the nation of idiots has been irrigating those crops with a Gatorade-like sports drink.

This science fiction satire has experienced a resurgence due to many pundits associating the rise of Trump with the film’s extrapolation about humanity becoming more and more a nation of idiots, but for those of us in education, Idiocracy speaks to the most recent era of education reform driven by accountability, standards, and testing.

The human-made dust bowl is the result of an initial false analogy: If the sports drink, they reasoned, is a powerful fluid for human hydration, then it must be an ideal solution to struggling crops needing hydration.

If we unpack this idiotic logic a bit more, we must add that even the initial idea—sports drinks filled with sugar and salt as powerful hydration fluids—is mostly a false belief based on a great deal of clever marketing and gullibility in the consumers.

Before Bauers forces this future of idiots to reimagine their problem in order to rethink their solution, the status quo of hydrating crops with sports drink continues along with the puzzlement among the idiots about why nothing is improving.

So let’s do a little thought experiment with that film in mind.

First, consider this from Rebecca Smith:

In the late 1800s, the United States was feeling the impact of the industrial revolution. Influenced by Taylorism and the desire for scientific management, statistics and measurement were evolving as objective methods used to evaluate and systematically organize information. Education was swept up in the measurement and statistical movement. Thorndike (1918), relying on his psychological work, believed scientific measurement utilized in educational settings could create efficient systems where ‘knowledge is replacing opinion, and evidence is supplanting guess-work in education as in every other field of human activity’ (p. 15). To Thorndike, the measurement of educational products was the means by which education could become scientific through rigor, reliability, and precision….

To [Thorndike], the connection between science, measurement, and human behavior was clear (Cremin, 1964). Lewis Terman published the Binet-Simon IQ Test in 1916; this test provided the context for psychologists to assess abilities, explain differences in students’ performance, and improve schools (Chapman, 1988). Standardized academic tests measure performance in the areas of handwriting, maths, and reading. Data from these tests offered superintendents, teachers, parents, or pupils ‘guidance in many different sorts of decisions and actions’ including ‘the fate of pupils, the value of methods, and the achievement of school systems’ (Thorndike, 1918, pp. 19, 22). Although Thorndike used the term ‘product’ instead of ‘data’, concepts such as rigor, reliability, and precision became part of educational discourse, measuring unseen changes in human beings. Intelligence had become objectified in numbers. The quantification of children’s intelligence, demographic characteristics, and school performance resulted in columns of numbers compared, contrasted, and evaluated in the United States. As the scientific gaze turned towards children, they became classified, compared, and evaluated according to numbers (Cannella, 1997). (pp. 3, 4)

Just a decade after this film and almost a century after Thorndike, in 2016, consider this:

In the latest international comparison of student achievement, public schools in the United States ranked no better than 24th in the world. But the public schools of Massachusetts had few peers.

Perhaps Massachusetts has something to teach the rest of the nation.

Well, unless you listen to Massachusetts, where researchers determined that two-thirds of the state’s effort at education reform has been a failure:

The evidence we have gathered strongly suggests that two of the three major “reforms” launched in the wake of the 1993 law — high-stakes testing and Commonwealth charter schools — have failed to deliver on their promises.

On the other hand, the third major component of the law, providing an influx of more than $2 billion in state funding for our schools, had a powerfully positive impact on our classrooms. But we will show that, after two decades, the formula designed to augment and equalize education funding is no longer up to the task.

So what we have here is an idiocracy of education reform, a failure of imagination to reconsider the problem and then to rethink solutions.

The U.S. need not idealize Finland or Massachusetts—or let’s not forget in 1962, it was the Swiss.

And the relentless commitment to accountability based on ever-new standards and ever-new tests is no different than the idiots continuing to hydrate crops with sports drink.

Like sports drinks, testing is inherently a sham, and our refusal to step away from a paradigm that has never worked despite countless efforts at making it work is our own version of a very real and current Idiocracy.

Critical Pedagogy, Education, education reform, Teaching, Testing

Traditional Assessment Isolates Learning, Devalues Community and Collaboration

October 17, 2016 plthomasedd 2 Comments

I attended junior high well before the rise of the middle school; therefore, I did not enter high school until 10th grade.

But the greatest shift for me as a student was my sophomore English class taught by Lynn Harrill. Throughout junior high, English class had been a never-ending Sisyphean hell of grammar textbook exercises and a sentence-diagramming marathon throughout 9th grade.

I entered high school a devoted math and science student—but more importantly, I had written essentially nothing of consequence as a student, ever.

Until Mr. Harrill’s class, in which we wrote two essays that sophomore year.

My close friends were a somewhat smaller subset of the so-called “top” students who were tracked in the honors classes. We were both socially and academically close.

By my senior year, we had begun to peer-edit our essays—which we feared was cheating because the workshop approach to teaching writing was not in practice yet and we had as “good students” learned all the unspoken lessons of schooling.

From “Cover your papers” during tests to “Don’t copy your friend’s homework,” we knew that collaboration was cheating—but my close circle of friends also knew something very important: when we were collaborative, we learned, and we learned in ways that surpassed traditional teacher-centered learning.

We were each other’s spell checkers, grammar editors, and unofficial peer-teachers.

Despite the rise of the National Writing Project and the mostly widespread awareness of process writing (although it remains too often misunderstood and mischaracterized), students throughout K-12 and university education experience traditional assessment in isolation—significantly one of the least authentic aspects of traditional assessment.

Throughout my 30-plus-year career, I have advocated for and practiced de-testing and de-grading, but during the more recent 14-plus years at the university level, I have been able to experiment more fully with how this looks in the classroom.

One element of authentic assessment and feedback for students that I have explored is moving away from assessment that isolates and toward collaborative assessment, assessment opportunities that require and emphasize community.

While university professors benefit from much greater professional autonomy than K-12 teachers, universities still require grades and mid-term/final exams; notably, these exam sessions are pretty strictly regulated in that professors need to show some use of the exam times/days for assessment.

Since I give no tests (a practice I started while a public school English teacher), I have developed mid-term sessions that are collaborative and discussion-based.

For example, each fall my first-year writing seminars and foundations of education class have assignments that build toward spending the actual mid-term exam time in small and whole group discussions.

Class discussions as mid-term exams pose several significant problems in the context of traditional schooling. First, every teacher has experienced the resistance by students and their parents to grades on group work—especially when “good” students get nicked on grades because the group had a member who didn’t pull her/his weight.

Discussion also privileges extroverted students and, just as most traditional class structures do, disadvantages introverted students.

And as with any form of alternative assessment, students are often uncomfortable with it and may fail to perform well because of different contexts for motivation and accountability.

The classroom discussion as mid-term exam originated with a foundations of education class several years ago—as we confronted the problems with traditional grades and tests, I encouraged the class to brainstorm with me how to create a more authentic mid-term experience.

Last week, I implemented the discussion as mid-term exam in both courses I am teaching.

First-year writing students choose, contact, and interview a professor in a department students are considering for a major. Each student records the interview as an artifact to prove she/he fulfilled that requirement, but then, students come to class with several key take-aways from the interview, which focuses on the professor as a scholar and writer.

The class begins with small-group (3-4 students) discussions that I casually monitor, and then we move to a whole group discussion.

I list the departments/disciplines on the board, and I help structure the discussion to focus on what scholars do and how academics write and submit work for publication (and how some disciplines do not conform to that norm, such as artists and musicians who create and perform).

In the foundations of education course, students read Paul Gorski’s Reaching and Teaching Students in Poverty: Strategies for Erasing the Opportunity Gap throughout the first half of the course, including a few class sessions for discussions of their reading.

Before the mid-term date, students submit talking points for the class discussions. I encourage those notes to be as specific as possible (quotes, page numbers).

The class session also starts with small-group discussion and then moves to whole group, but in this class, I remain entirely outside the discussions and require the students to navigate everything.

At the end, I briefly debrief the class about the experience.

These assessments have a few key elements in common: requiring artifacts of participation, creating small spaces for students to share if whole-group dynamics are uncomfortable, and shifting as much ownership of the learning to the students as possible.

I have been doing this for several years now, and every one of these mid-term sessions has impressed me in a way that no traditional test has.

In the debrief with my foundations of education class last week, I pointedly asked them to compare the mid-term discussion of a textbook reading to a standard test or individual essay.

Students were eager to argue that the discussion was far more powerful in terms of their understanding of and engagement with Gorski’s work; in fact, the whole-class discussion became extremely animated, and I witnessed students negotiating with both Gorski’s ideas on poverty and their classmates’ evolving awareness about poverty.

In short, the assessment was not a mere recording of learning, but a learning experience itself.

And what I learned, what the experience reinforced for me? Learning is collaborative, knowledge is the result of a community, and traditional assessment fails miserably since it isolates learners from each other and the teacher while reducing knowledge to a commodity.

As a critical educator, I continue on a journey to practice Paulo Freire’s vision of the teacher-student charged with educating students-teachers.

Assessment as collaboration and community is both something we can all practice in traditional settings and something we must do if we honor education as an act of liberation and the classroom as a space that honors human autonomy and dignity.