Tag Archives: VAM

Teacher Motivation: Context and Culture

Catherine Joynson and Ottoline Leyser’s The culture of scientific research identifies the motivation of scientists, which:

provide[s] additional insights into how they view research, and the majority of the survey respondents clearly chose a career in science in order to find out more about the world around them. When respondents were asked to rank phrases to describe what they believe motivates them in their work, the top three were:

  1. Improving my knowledge and understanding
  2. Making scientific discoveries for the benefit of society
  3. Satisfying my curiosity

And then, they confront the impact of competition:

High levels of competition in scientific research emerged as a strong theme running through all the project activities. Applying for funding is thought to be very competitive by the majority of the survey respondents (94 per cent), as is applying for jobs and promotions (77 per cent). Around nine in ten think making discoveries and gaining peer recognition is quite or very competitive.

High levels of competition for jobs and funding in scientific research are believed by survey respondents both to bring out the best in people and to create incentives for poor quality research practices, less collaboration, and headline chasing [emphasis added]. For example, behaviours such as rushing to finish and publish research, employing less rigorous research methods and increased corner-cutting in research were raised by 29 per cent of survey respondents who commented on the effects of competition on scientists.

Immediately this analysis reminded me of the increasing political calls for weeding out “bad” teachers and concurrently rewarding good teachers, notably from the right but also from the left, including support for value-added methods (VAM) for teacher evaluation and retention as well as merit pay.

While simplistic calls for rewarding good teachers are politically popular, they fail to confront the inherent negative consequences and to acknowledge the research base on what exactly motivates teachers.

Teaching and learning are highly sensitive to the same problems noted above about science: VAM and merit pay create competitive cultures in schools, discouraging collaboration and incentivizing teachers to view their students as tools of success (and thus, creating winners and losers when we claim a goal of everyone winning).

Research shows that merit pay for teachers is harmful:

Some researchers have warned, however, that merit pay may change the relationships between teachers and students: poor students may pose threats to the teacher’s rating and rewards [emphasis added] (Johnson 1986). Another concern is that merit pay plans may encourage teachers to adjust their teaching down to the program goals, setting their sights no higher than the standards (Coltham 1972).

Odden and Kelley reviewed recent research and experience and concluded that individual merit and incentive pay programs do not work and, in fact, are often detrimental (1997). A number of studies have suggested that merit pay plans often divide faculties, set teachers against their administrators, are plagued by inadequate evaluation methods, and may be inappropriate for organizations such as schools that require cooperative, collaborative work [emphasis added] (Lawler 1983).

Evidence on VAM reveals similar warnings:

High-stakes uses of teacher VAM scores could easily have additional negative consequences for children’s education. These include increased pressure to teach to the test, more competition and less cooperation among the teachers within a school, and resentment or avoidance of students who do not score well. In the most successful schools, teachers work together effectively (Atteberry & Bryk, 2010). If teachers are placed in competition with one another for bonuses or even future employment, their collaborative arrangements for the benefit of individual students as well as the supportive peer and mentoring relationships that help beginning teachers learn to teach better may suffer. (p. 24)

So what does motivate teachers?

Frase identified two sets of factors that affect teachers’ ability to perform effectively: work context factors (the teaching environment) and work content factors (teaching)….

Work context factors are those that meet baseline needs. They include working conditions such as class size, discipline conditions, and availability of teaching materials; the quality of the principal’s supervision; and basic psychological needs such as money, status, and security.

In general, context factors clear the road of the debris that block effective teaching. In adequate supply, these factors prevent dissatisfaction. Even the most intrinsically motivated teacher will become discouraged if the salary doesn’t pay the mortgage….

Work content factors are intrinsic to the work itself. They include opportunities for professional development, recognition, challenging and varied work, increased responsibility, achievement, empowerment, and authority. Some researchers argue that teachers who do not feel supported in these states are less motivated to do their best work in the classroom (NCES 1997).

Data from the National Center for Education Statistics (1997) confirm that staff recognition, parental support, teacher participation in school decision making, influence over school policy, and control in the classroom are the factors most strongly associated with teacher satisfaction [emphasis added]. Other research concurs that most teachers need to have a sense of accomplishment in these sectors if they are to persevere and excel in the difficult work of teaching.

VAM, merit pay, accountability built on standards and high-stakes testing, “no excuses” ideologies, and zero tolerance policies all remain essential elements of education reform, although they are likely to create the worst possible contexts and cultures for teaching and learning.

Top-down and technocratic approaches to school policy, which de-professionalize teaching and teachers, are creating harmful cultures in public schools, proving further that the partisan political control of education remains tone-deaf to evidence and educators.

Teachers, like scientists above, are already quite likely to have chosen the profession in order to serve others. VAM and merit pay destroy those initial reasons for teaching.

Political commitments to harmful policies suggest the real problem in education is the motivation of those political leaders, not teachers.

Time to Invoke Reagan Directive: “And please abolish that abomination, the Department of Education”

It’s the end of the world as we know it (and I feel fine).
R.E.M.
The challenge is in the moment, the time is always now.
James Baldwin

Speaking as a witness from within the bowels of the Ronald Reagan administration when President Reagan gave the committee responsible for A Nation at Risk their prime directives, Gerald Holton ended with Reagan’s emphatic “And please abolish that abomination, the Department of Education.”

About thirty years later, we must now admit it is time to invoke the Reagan directive because the USDOE cannot be any other kind of government than the very worst kind: All uninformed bureaucracy that seeks always to dig deeper from the bottom of a very deep and fruitless hole.

While Reagan’s characterizing the USDOE as an “abomination” may have been premature in the early 1980s, we must admit now that Reagan was prescient.

Throughout the 1980s and 1990s, states scrambled to “fix” public education through a series of accountability-based bureaucratic mandates built on standards (and new standards) and high-stakes tests (and new high-stakes tests).

By the turn of the century, we witnessed the tipping point that would prove Reagan right—No Child Left Behind [1], the ultimate shifting of know-nothing bureaucracy from the states to the federal government, specifically the USDOE.

Few could have been brave enough to predict that eight years of horrible education policy under George W. Bush could be trumped by the Obama administration, but we are now solidly in the reality that the USDOE is a total Obamination: relentless failed bureaucracy piled on top of failed bureaucracy.

Under Obama and the appointee-leadership of Secretary Arne Duncan, public education has been bombarded by competitive grants, teacher bashing, union bashing, and a series of policies at the state and federal levels that are neither supported by research nor appropriate responses to the very real problems facing public schools (many of which are beyond the walls or control of those schools).

The two latest abominations are calling for expanding value-added methods (VAM) into teacher education and ranking colleges and universities.

Even those along a wide spectrum of ideologies who believe in the promise of VAM have consistently demonstrated that VAM is not as effective as policies claim and that VAM should not be used in any high-stakes contexts for schools, teachers, or teacher education.

Those of us who see no promise for VAM add that all this expanded testing is a tremendous waste of time and money—most notably because grasping at measurable data is missing the greatest problems burdening our schools, social and educational inequities (ironically, all circumstances that could be addressed effectively if government would behave as government as demonstrated in many other countries around the world).

As Gerald Bracey explained in numerous contexts, ranking itself is fool’s gold—and in education, ranking is particularly caustic since it creates competition where we should be in collaboration (this is also a fundamental problem with VAM as a mechanism for sorting teachers, schools, or schools/departments of education).

The two most recent abominations are not unique, however, but lie in a long line including Race to the Top, Opting Out of NCLB, and Common Core.

Simply stated, these policies are designed and promoted by people with little or no experience or expertise in the field of education. Their advocacy remains plagued by the bipartisan political tactic of simply saying things that aren’t true and then using the bully pulpit of election or appointment to plow ahead (and thus, beware the roadbuilders).

Maybe this will sound outlandish, but let’s consider what people who have taught, studied, and researched the field of education recognize about the proposal to hold colleges/departments of education accountable for the test scores of students being taught by graduates from those colleges/departments (holding grandparents responsible for their grandchildren’s behavior, in effect):

Ridiculous I suppose—like asking the legal profession to weigh in on jurisprudence or the medical profession to craft health policy. [2]

Many people have called for the ghost of Ronald Reagan, and I never counted myself among them until now. But in the waning days of 2014, I welcome that ghost of administration’s past to ramble into the room and, as Holton paraphrased, make the call once again: “And please abolish that abomination, the Department of Education.”

[1] The irony is NCLB called for scientifically based policy in education, and we have gotten anything except: Whatever Happened to Scientifically Based Research in Education Policy?

[2] Education has a very long history of being ignored as a field in terms of policy, and public education has also long labored under a misguided business model; see from Callahan, R. E. (1962). Education and the cult of efficiency: A study of the social forces that have shaped the administration of the public schools. Chicago: The University of Chicago Press:

For while schools everywhere reflect to some extent the culture of which they are a part and respond to forces within that culture, the American public schools, because of the nature of their pattern of organization, support, and control, were especially vulnerable and responded quickly to the strongest social forces. . . .The business influence was exerted upon education in several ways: through newspapers, journals, and books; through speeches at educational meetings; and, more directly, through actions of school boards. It was exerted by laymen, by professional journalists, by businessmen or industrialists either individually or in groups. . ., and finally by educators themselves. Whatever its source, the influence was exerted in the form of suggestions or demands that the schools be organized and operated in a more businesslike way and that more emphasis be placed upon a practical and immediately useful education….

The tragedy itself was fourfold: that educational questions were subordinated to business considerations; that administrators were produced who were not, in any true sense, educators; that a scientific label was put on some very unscientific and dubious methods and practices; and that an anti-intellectual climate, already prevalent, was strengthened. (pp. 1, 5-6, 246)

VAM Remedy Part of Inequity Disease

But their remedies do not cure the disease: they merely prolong it.
Indeed, their remedies are part of the disease.

Oscar Wilde, “The Soul of Man under Socialism”

In Reliability and validity of inferences about teachers based on student test scores (ETS, 2013), Edward H. Haertel draws an important conclusion about value-added methods of evaluating teachers built on standardized tests: “Tests aligned to grade-level standards cannot fully register the academic progress of students far above grade level or far below grade level,” and thus create a “bias against those teachers working with the lowest performing or the highest performing classes,” adding:

High-stakes uses of teacher VAM scores could easily have additional negative consequences for children’s education. These include increased pressure to teach to the test, more competition and less cooperation among the teachers within a school, and resentment or avoidance of students who do not score well. In the most successful schools, teachers work together effectively (Atteberry & Bryk, 2010). If teachers are placed in competition with one another for bonuses or even future employment, their collaborative arrangements for the benefit of individual students as well as the supportive peer and mentoring relationships that help beginning teachers learn to teach better may suffer. (pp. 8, 24)

All of these consequences of high-stakes testing and VAM, then, are likely to negatively impact high-poverty and minority students, who disproportionately score low on such tests.

Matthew Di Carlo’s new examination of VAM in DC reinforces Haertel’s concern:

Specifically, you’ll notice that almost 30 percent of teachers in low-poverty schools receive the highest rating (“highly effective”), compared with just 7-10 percent in the other categories. In addition, just over seven percent of teachers in low-poverty schools receive one of the two lowest ratings (“minimally effective” or “ineffective,” both of which may result in dismissal), versus 18-21 percent in the medium- and high-poverty schools.

So, the relationship between school poverty and IMPACT ratings may not be linear, as the distributions for medium- and high-poverty schools are quite similar. Nevertheless, it seems very clear that IMPACT results are generally better among teachers in schools serving lower proportions of poor students (i.e., students eligible for subsidized lunch), and that the discrepancies are quite large.

High-poverty schools already share some disturbing characteristics, including that they often reflect and perpetuate the inequities found in the homes and communities of the children they serve (see HERE and HERE). But high-poverty schools also struggle to attract and retain experienced and certified/qualified teachers.

And while virtually no researchers advocate using VAM in high-stakes policies, mounting evidence shows that VAM is likely to further deter teachers from the schools and students most needing high-quality dedicated teachers.

No Child Left Behind (NCLB) and then Common Core have been sold to the public as policy intended to close the so-called achievement gap (a misnomer for the equity gap; see HERE and HERE)—just as advocates of VAM have attributed school failure to “bad” teachers and VAM as a way to rid schools of those “bad” teachers, again to address the achievement gap.

However, the evidence refutes the rhetoric because accountability built on standards and high-stakes testing, Common Core, and VAM have not and will not address equity, but are likely to increase the exact problems advocates claim they will solve (see Mathis, 2012; Hout & Elliot, 2011; Haertel, 2013; Di Carlo, 2014).

If left unchecked, VAM as an education reform remedy will prove to be yet another part of the inequity disease.

South Carolina Officially Vamboozled

If anyone is wondering why the “bad” teacher crisis remains central to what political leaders want the public to hear, South Carolina offers a heaping dose of why: Political leaders are enormously incompetent and need us all to remain distracted from that fact.

Case in point: SC has now been officially vamboozled, passing a new teacher evaluation system that includes the new sham word “growth” nearly 50 times in a little over 30 pages—notably (in part):

The changes described in this document will result in a support and evaluation system that is valid, reliable, and fair and that will

…use multiple valid measures (including but not limited to observations, professional practice, and student growth) in determining performance levels, with data on student growth for all students (including English Language Learners and students with disabilities) being a significant factor in the calculation of the overall effectiveness score (growth measure for teachers of tested grades and subjects include growth based on statewide assessments as a component)….

Of course, the most important point here is that VAM (teacher evaluations based on student test scores, or now euphemistically “student growth”) has been proven again and again to fail against the measures of “valid, reliable, and fair.”

SC political leadership has emphasized that their hands are tied because of federal mandates linked to opting out of NCLB and the lure of “filthy lucre” promised therein—a baffling stance taken by a state infamous for taking the most preposterous stances just to shun the federal government.

It is well past time to stop listening to political leaders who have no credibility (hint: If you hear a politician or read a politician, you are in the presence of one who has no clue) and to turn our gazes away from the distractions (hint: “bad” teachers are not the problem).

It is time to invoke the Oliver Rule for claims about VAM and for policies that include it; thus, here is the enormous body of evidence so far refuting VAM (see related research cited within the following):

There is, however, a growth we should be concerned about because it is cancerous; that growth is VAM. Let’s remove it before it spreads.

Devaluing Teachers in the Age of Value-Added

“We teach the children of the middle class, the wealthy and the poor,” explains Anthony Cody, continuing:

We teach the damaged and disabled, the whole and the gifted. We teach the immigrants and the dispossessed natives, the transients and even the incarcerated.

In years past we formed unions and professional organizations to get fair pay, so women would get the same pay as men. We got due process so we could not be fired at an administrator’s whim. We got pensions so we could retire after many years of service.

But career teachers are not convenient or necessary any more. We cost too much. We expect our hard-won expertise to be recognized with respect and autonomy. We talk back at staff meetings, and object when we are told we must follow mindless scripts, and prepare for tests that have little value to our students.

During the 1980s and 1990s, U.S. public schools and the students they serve felt the weight of standards- and test-based accountability—a bureaucratic process that has wasted huge amounts of tax-payers’ money and incalculable time and energy assigning labels, rankings, and blame. The Reagan-era launching of accountability has lulled the U.S. into a sort of complacency that rests on maintaining a gaze on schools, students, and test data so that no one must look at the true source of educational failure: poverty and social inequity, including the lingering corrosive influences of racism, classism, and sexism.

The George W. Bush and Barack Obama eras—resting on intensified commitments to accountability such as No Child Left Behind (NCLB) and Race to the Top (RTTT)—have continued that misguided gaze and battering, but during the past decade-plus, teachers have been added to the agenda.

As Cody notes above, however, political leaders, the media, and the public simultaneously claim that teachers are the most valuable part of any student’s learning (a factually untrue claim) but also that high-poverty and minority students can be taught by those without any degree or experience in education (Teach for America) and that career teachers no longer deserve their profession: no tenure, no professional wages, no autonomy, no voice in what or how they teach.

And while the media and political leaders maintain these contradictory narratives and support these contradictory policies, value-added methods (VAM) of evaluating and compensating U.S. public school teachers are being adopted, again simultaneously, as the research base repeatedly reveals that VAM is yet another flawed use of high-stakes accountability and testing.

When Raj Chetty, John N. Friedman, and Jonah E. Rockoff released (and re-released) reports claiming that teacher quality equates to significant earning power for students, the media and political leaders tripped over themselves to cite (and cite) those reports.

What do we know about the Chetty, et al., assertions?

From 2012:

[T]hose using the results of this paper to argue forcefully for specific policies are drawing unsupported conclusions from otherwise very important empirical findings. (Di Carlo)

These are interesting findings. It’s a really cool academic study. It’s a freakin’ amazing data set! But these findings cannot be immediately translated into what the headlines have suggested – that immediate use of value-added metrics to reshape the teacher workforce can lift the economy, and increase wages across the board! The headlines and media spin have been dreadfully overstated and deceptive. Other headlines and editorial commentary has been simply ignorant and irresponsible. (No Mr. Moran, this one study did not, does not, cannot negate  the vast array of concerns that have been raised about using value-added estimates as blunt, heavily weighted instruments in personnel policy in school systems.) (Baker)

And now, a thorough review concludes:

Can the quality of teachers be measured the way that a person’s weight or height is measured? Some economists have tried, but the “value-added” they have attempted to measure has proven elusive. The results have not been consistent over tests or over time. Nevertheless, a two-part report by Raj Chetty and his colleagues claims that higher value-added scores for teachers lead to greater economic success for their students later in life. This review of the methods of Chetty et al. focuses on their most important result: that teacher value-added affects income in adulthood. Five key problems with the research emerge. First, their own results show that the calculation of teacher value-added is unreliable. Second, their own research also generated a result that contradicts their main claim—but the report pushed that inconvenient result aside. Third, the trumpeted result is based on an erroneous calculation. Fourth, the report incorrectly assumes that the (miscalculated) result holds across students’ lifetimes despite the authors’ own research indicating otherwise. Fifth, the report cites studies as support for the authors’ methodology, even though they don’t provide that support. Despite widespread references to this study in policy circles, the shortcomings and shaky extrapolations make this report misleading and unreliable for determining educational policy.

Similar to the findings in Edward H. Haertel’s analysis of VAM, Reliability and validity of inferences about teachers based on student test scores (ETS, 2013), the American Statistical Association has issued ASA Statement on Using Value-Added Models for Educational Assessment, emphasizing:

Research on VAMs has been fairly consistent that aspects of educational effectiveness that are measurable and within teacher control represent a small part of the total variation in student test scores or growth; most estimates in the literature attribute between 1% and 14% of the total variability to teachers. This is not saying that teachers have little effect on students, but that variation among teachers accounts for a small part of the variation in scores. The majority of the variation in test scores is attributable to factors outside of the teacher’s control such as student and family background, poverty, curriculum, and unmeasured influences.

The VAM scores themselves have large standard errors, even when calculated using several years of data. These large standard errors make rankings unstable, even under the best scenarios for modeling. Combining VAMs across multiple years decreases the standard error of VAM scores. Multiple years of data, however, do not help problems caused when a model systematically undervalues teachers who work in specific contexts or with specific types of students, since that systematic undervaluation would be present in every year of data.

Across Di Carlo, Baker, Haertel, and the ASA, several key patterns emerge regarding VAM: (1) VAM remains an experimental statistical model, (2) VAM is unstable and significantly impacted by factors beyond a teacher’s control and beyond the scope of that statistical model to control, and (3) implementing VAM in high-stakes policies exaggerates the flaws of VAM.

The rhetoric about valuing teachers rings hollow more and more as teaching continues to be dismantled and teachers continue to be devalued by misguided commitments to VAM and other efforts to reduce teaching to a service industry.

VAM as reform policy, like NCLB, is sham-science being used to serve a corporate need for cheap and interchangeable labor. VAM, ironically, proves that evidence does not matter in education policy.

As with all workers in the U.S., we simply do not value teachers.

Political leaders, the media, and the public call for more tests for schools, teachers, and students, but they themselves continue to fail to acknowledge the mounting evidence against test-based accountability.

And thus, we don’t need numbers to prove what Cody states directly: “But career teachers are not convenient or necessary any more.”

Conservative Leadership Poor Stewardship of Public Funds

In South Carolina and across the U.S., conservative leadership of education reform has failed to fulfill a foundational commitment to traditional values: good stewardship of public funds. [1]

The evidence of that failed stewardship is best exposed in commitments to three education reform policies: Adopting and implementing Common Core State Standards (CCSS), designing and implementing new tests based on CCSS, and proposing and field-testing revised teacher evaluations based on value-added models (VAM).

SC committed a tremendous amount of time and public funding to the accountability movement thirty years ago as one of the first states to implement state standards and high-stakes testing. After three decades of accountability, SC, like every other state in the union, has declared education still lacking and thus once again proposes a new round of education reform primarily focusing on, yet again, accountability, standards, and high-stakes testing.

Almost absent from the political and public debate over committing to CCSS, new high-stakes tests, and teacher evaluation reform are needs and cost/benefit analyses of these policies.

More of the Same Failed Policies?

If thirty years of accountability has failed, why is more of the same the next course of reform? If thirty years of accountability has failed, shouldn’t SC and other states first clearly establish what the problems and goals of education are before committing to any policies aimed at solving those problems or meeting those goals?

Neither of these questions has been adequately addressed, yet conservative political leadership is racing to commit a tremendous amount of public funding and public workers’ time to CCSS, an increase in high-stakes testing never experienced by any school system, and teacher evaluation proposals based on discredited test-based metrics.

Just as private corporations have reaped the rewards of tax dollars in SC during the multiple revisions of our accountability system, moving through at least three versions of tests and a maze of reformed state standards, the only guaranteed outcomes of commitments to CCSS, new tests, and reformed teacher evaluations are profits for textbook companies, test designers, and private consultants—all of whom have already begun cashing in on branding materials with CCSS and the yet-to-be designed high-stakes tests that will eventually be implemented twice a year in every class taught in the state.

SC as a state and as an education system is burdened by one undeniable major problem, inequity of opportunities in society and in schools spurred by poverty.

Numerous studies in recent years have shown that schools across the U.S. tend to reflect and perpetuate inequity; thus, children born into impoverished homes and communities are disproportionately attending schools struggling against and mirroring the consequences of poverty.

Commitments in SC to CCSS, new high-stakes tests, and reforming teacher evaluations based in large part on those new tests are at their core poor stewardship of public funding in a state that has many more pressing issues needing the support of state government.

A further problem with conservative leadership endorsing these education reforms is that much of the motivation for CCSS, new tests, and reforming teacher evaluations comes from funding mandates by the federal government.

Misguided education reform is not only a blow to conservative economics but also a snub to traditional trust in local government over federal control.

Recently, as well, a special issue on VAM from Education Policy Analysis Archives (EPAA) includes two analyses that should give policy makers in SC and all states key financial reasons to pause if not halt commitments to education reform based on student test scores—the potential for legal action from a variety of stakeholders in education.

Baker, Oluwole, and Green explain: “Overly prescriptive, rigid teacher evaluation mandates, in our view, are likely to open the floodgates to new litigation over teacher due process rights. This is likely despite the fact that much of the policy impetus behind these new evaluation systems is the reduction of legal hassles involved in terminating ineffective teachers.”

Further, Pullin warns: “For public policymakers, there are strong reasons to suggest that high-stakes implementation of VAM is, at best, premature and, as a result, the potential for successful legal challenge to its use is high. The use of VAM as a policy tool for meaningful education improvement has considerable limitations, whether or not some judges might consider it legally defensible.”

Do schools across SC need education reform? Yes, just as social policy in the state needs to address poverty as a key mechanism for supporting those schools once they are reformed.

But in a state driven by traditional values and conservative political leadership, current commitments to CCSS, new high-stakes tests, and reforming teacher evaluations are neither educationally sound nor conservative.

[1] Expanded version of Op-Ed published in The State (Columbia, SC), March 8, 2013: “Conservatives poor stewards of education funds”

NFL again a Harbinger for Failed Education Reform?

During the impending NFL strike in 2011—the act of a union—I drew a comparison between how the public in the U.S. responds to unionization in different contexts:

“I am speaking about the possible NFL strike that hangs over this coming Super Bowl weekend: a struggle between billionaires and millionaires, which, indirectly, shines an important light on the rise of teacher and teacher union-bashing in the US. Adam Bessie, in Truthout, identifies how the myth of the bad teacher has evolved.”

Once again, the NFL is facing a situation that I believe and even hope is another harbinger of how education reform can be halted: A suit filed by the family of Junior Seau:

“The family said the league not only ‘propagated the false myth that collisions of all kinds, including brutal and ferocious collisions, many of which lead to short-term and long-term neurological damage to players, are an acceptable, desired and natural consequence of the game,’ but also that ‘the N.F.L. failed to disseminate to then-current and former N.F.L. players health information it possessed’ about the risks associated with brain trauma.”

This lawsuit has prompted a considerable amount of debate concerning whether or not the NFL as we currently know it could be dramatically reconfigured under the pressure of more lawsuits. In other words, the inherent but often ignored or concealed dangers of football are now being exposed by legal action, in much the same way as the tobacco industry was unmasked and thus the entire culture of smoking has radically changed in the last couple of decades.

With the release of the Education Policy Analysis Archives (EPAA) Special Issue on “Value-Added Model (VAM) Research for Educational Policy,” a similar question should now be raised about the future of implementing high-stakes accountability policies that focus on teacher evaluation and retention through VAM-style metrics.

“High-Stakes Implementation of VAM,…Premature”

Two articles in the special issue from EPAA examine the validity and reliability of VAM-based teacher evaluation in high-stakes settings and then place these policies in the context of legal ramifications faced by districts and states for those policies.

“The Legal Consequences of Mandating High Stakes Decisions Based on Low Quality Information: Teacher Evaluation in the Race-to-the-Top Era” (Baker, Oluwole, & Green, 2013) identifies the current trend: “Spurred by the Race-to-the-Top program championed by the Obama administration and a changing political climate in favor of holding teachers accountable for the performance of their students, many states revamped their tenure laws and passed additional legislation designed to tie student performance to teacher evaluations” (p. 3). Because of the political and public momentum behind reforming teacher evaluation, Baker, Oluwole, and Green seek “to bring some urgency to the need to re-examine the current legislative models that put teachers at great risk of unfair evaluation, removal of tenure, and ultimately wrongful dismissal” (p. 5).

While Baker, Oluwole, and Green offer a detailed and evidence-based examination of the VAM-based and student growth model approaches to high-stakes teacher accountability, they ultimately place the weaknesses of reform policies in the context of potential challenges from teachers who believe they have been wrongfully evaluated or dismissed:

“In this section, we address the various legal challenges that might be brought by teachers dismissed under the rigid statutory structures outlined previously in this article. We also address how arguments on behalf of teachers might be framed differently in a context where value-added measures are used versus one where student growth percentiles are used. Where value-added measures are used, we suspect that teachers will have to show that while those measures were intended to attribute student achievement to their effectiveness, the measures failed to do so in a number of ways. That is, where value-added measures are used to assign effectiveness ratings, we suspect that the validity and reliability, as well as understandability of those measures would need to be deliberated at trial. However, where student growth percentiles are used, we would argue that the measures on their face are simply not designed for attributing responsibility to the teacher, and thus making such a leap would necessarily constitute a wrongful judgment. That is, one would not necessarily even have to vet the SGP measures for reliability or validity via any statistical analysis, because on their face they are invalid for this purpose.”

The analysis ultimately discredits both the use of narrow metrics to determine teacher quality and the high-stakes policies being implemented using those metrics, concluding with the ironic consequences of these policies: “Overly prescriptive, rigid teacher evaluation mandates, in our view, are likely to open the floodgates to new litigation over teacher due process rights. This is likely despite the fact that much of the policy impetus behind these new evaluation systems is the reduction of legal hassles involved in terminating ineffective teachers” (pp. 18-19).

In “Legal Issues in the Use of Student Test Scores and Value-added Models (VAM) to Determine Educational Quality” (Pullin, 2013), the rapid increase of VAM-based accountability is further examined in the context of “a wide array of potential legal issues [that] could arise from the implementation of these programs” (p. 2).

Pullin notes the motivation for reforming teacher evaluation:

“VAM initiatives are consistent with a highly publicized press from the business community and many politicians to make government services more like private business, data-driven to measure productivity and accountability (Kupermintz, 2003). VAM approaches are in part a response to concerns that the current system of selecting and compensating teachers based their education and credentials is insufficient for insuring teacher quality (Corcoran, 2011; Gordon, Kane & Staiger, 2006; Hanushek & Rivkin, 2012; Harris, 2011). There have been increasing expressions of concern that teacher evaluation practices are not robust and do not improve practice (Kennedy, 2010). In the contemporary public policy context, much of the support for the use of student test scores for educator evaluation comes from a concern that the current system for evaluation is ineffective and that the current legal protections for teachers are too cumbersome for schools seeking to terminate teachers (Harris, 2009, 2011).”

While a business model for addressing quality control of a work force may seem efficient, Pullin highlights that legal ramifications are likely with these new models.

Pullin’s analysis offers a detailed and useful examination of previous court cases involving the use of test scores to evaluate educators, including recent cases involving VAM, concluding that the picture is not clear on how the courts may rule in the future, but that a pattern exists of “heavy judicial deference to state and local education policymakers and the allure of using test scores to make decisions about education quality” (p. 5).

Further, Pullin notes “there are differences of perspective among social scientists about VAM and the defensibility of using it to make high-stakes decisions about educators,” further complicating the concerns of legal action (p. 9).

While raising many other complications, Pullin also notes that students and parents may enter legal battles using VAM metrics “to substantiate their own legal claims that schools are not meeting their obligations to provide education” (p. 14).

Pullin concludes with a sobering look at teacher quality reform built on VAM and implemented in high-stakes environments:

“In the broad contemporary public policy context for education reform, the desire for accountability and transparency in government, coupled with heavily financed criticisms of public school teachers and their unions, may mean that VAM initiatives will prevail. The concerns of education researchers about VAM, coupled with legal obligations for the validity and reliability of education and evaluation programs should require judges and education policymakers to take a closer look for future decision-making. At the same time, the social science research community should be generating substantial new and persuasive evidence about VAM and the validity and reliability of all of its potential uses. For public policymakers, there are strong reasons to suggest that high-stakes implementation of VAM is, at best, premature and, as a result, the potential for successful legal challenge to its use is high. The use of VAM as a policy tool for meaningful education improvement has considerable limitations, whether or not some judges might consider it legally defensible.” (p. 17)

Like the NFL, federal and state governments may soon be compelled to reform the reform movement under the threat of legal action from a variety of stakeholders since the science of teacher evaluation remains far behind the curve of implementation, particularly when teacher evaluation is high-stakes and based on VAM and other metrics linked to student test scores.

The special issue from EPAA is yet another call for political leadership to pause if not end wide-scale teacher evaluation and retention models that pose legal, statistical, and funding challenges that those leaders appear unwilling to acknowledge or address.

VAM: A Primer

Education reform has existed at some policy and public levels since at least the 1890s in the U.S. The current reform movement grounded in state-based accountability began in the early 1980s with A Nation at Risk, and then was nationalized in 2001 with No Child Left Behind.

In the first decades of the recent accountability era, standards and high-stakes testing were implemented and periodically revised at the state level with the primary focus being on student and school accountability. The current cycle, however, has seen an increase in policies and practices aimed at teacher accountability and increasing teacher quality—despite a solid research base showing that teacher quality constitutes only 10-15% of measurable student outcomes (test data).

The focus on increasing teacher quality and accountability has included both experiments and policies with value-added methods (VAM) of determining teacher quality in order to label, rank, sort, and retain or dismiss teachers. VAM claims to identify teacher quality through pre- and post-testing and to isolate it from the other factors reflected in test scores.
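As a rough illustration of the pre- and post-testing logic described above, the sketch below computes a naive gain score for each teacher: the average post-test minus pre-test gain of that teacher's students, compared with the average gain across all students. The teacher names and scores are invented, and the calculation is a deliberately simplified assumption, not any state's or vendor's actual VAM formula, which typically layers statistical controls on top of this basic idea.

```python
from statistics import mean

# Hypothetical (teacher, pre_test, post_test) records; all values are invented.
records = [
    ("Teacher A", 60, 70),
    ("Teacher A", 75, 80),
    ("Teacher B", 50, 55),
    ("Teacher B", 65, 75),
    ("Teacher C", 80, 82),
    ("Teacher C", 70, 78),
]

def naive_value_added(rows):
    """Return each teacher's mean student gain minus the overall mean gain."""
    gains_by_teacher = {}
    for teacher, pre, post in rows:
        gains_by_teacher.setdefault(teacher, []).append(post - pre)
    # Average gain across every student, regardless of teacher.
    overall = mean(gain for gains in gains_by_teacher.values() for gain in gains)
    return {t: mean(g) - overall for t, g in gains_by_teacher.items()}

scores = naive_value_added(records)
for teacher, score in sorted(scores.items()):
    print(f"{teacher}: {score:+.2f}")
```

Nothing in this arithmetic separates what a teacher contributed from everything else that shaped those gains; the number simply attributes the whole difference to the teacher, which is the attribution problem at issue here.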

Before policy-makers and stakeholders in education commit to reforming teacher evaluation and retention, foundational questions must be addressed, and then the current facts about VAM must be acknowledged.

First, the foundational questions:

(1) What evidence exists identifying teacher quality as a primary or significant problem facing a school or district? Where, then, does teacher quality rank as a priority for a school, district, or state in terms of cost effectiveness in committing funding to the reform?

(2) Is including VAM at any percentage in the revised teacher evaluation process valid for determining teacher quality? In other words, what measures are taken to account for using student test scores (designed to reflect student learning, and not designed to reflect teacher quality) as data points for teacher quality?

(3) Have the conditions of students’ homes and schools been addressed to ensure equitable teaching and learning environments in which determining teacher quality becomes valid?

Next, the current knowledge base about VAM*:

(1) Including VAM at any percentage in reformed teacher evaluation models is currently in the experimental phase. Data and the validity of including VAM are being tested, but almost no researchers currently claim VAM (at any percentage but certainly at high percentages such as 40-50%) to be ready for widespread implementation.

(2) VAM models for labeling teacher quality are highly unstable. Teacher rankings tend to shift with new populations of students.

(3) Researchers agree that VAM is unlikely ever to be completely stable; thus, it is possible that VAM will never be a practical or fair element in teacher evaluation, particularly at the individual teacher level or in any single year.

(4) The statistical and practical requirements to isolate teacher quality in student test scores pose tremendous costs in time and funding that may not prove cost effective in the context of most school systems’ priorities. In order to implement a fair and equitable teacher evaluation system that includes VAM at any percentage, states must create and administer pre- and post-tests for all students in all teachers’ courses, creating a new and costly commitment to education funding.

(5) Decades of research on high-stakes testing have shown many negative unintended consequences to accountability measures focusing on students and schools; including VAM in high-stakes accountability policies focusing on teachers is likely to have similar unintended negative consequences such as discouraging high-quality teachers from working with high-needs populations of students.
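The instability noted in point (2) can be illustrated with a small simulation. The sketch below is not drawn from any of the studies referenced here; every number in it (30 teachers, 25 students per class, a teacher effect that is small relative to student-level variation) is an assumption chosen only to show how rankings can shuffle when each teacher receives a new group of students.

```python
import random
from statistics import mean

def simulate_year(true_effects, class_size, student_sd, rng):
    """Estimate each teacher's effect from one year's randomly drawn class."""
    return [
        mean(effect + rng.gauss(0, student_sd) for _ in range(class_size))
        for effect in true_effects
    ]

def ranks(values):
    """Rank positions (0 = lowest) for a list of distinct values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result

def rank_correlation(xs, ys):
    """Spearman rank correlation, assuming no tied values."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Assumed: true teacher effects have SD 1; student-level noise has SD 10.
true_effects = [rng.gauss(0, 1) for _ in range(30)]
year_one = simulate_year(true_effects, 25, 10, rng)
year_two = simulate_year(true_effects, 25, 10, rng)

# Count teachers whose rank shifts by 10 or more places between the two years.
moved = sum(
    1 for a, b in zip(ranks(year_one), ranks(year_two)) if abs(a - b) >= 10
)
print("Year-to-year rank correlation:", round(rank_correlation(year_one, year_two), 2))
print("Teachers moving 10+ rank places:", moved)
```

Under these assumed settings the noise in a class average (10 divided by the square root of 25, or 2) is larger than the spread in true teacher effects (1), so the same teachers with different students need not land in the same order. Different assumptions would give different results.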

Thus, policy-makers and stakeholders are strongly cautioned to consider education reform priorities, the experimental nature of VAM, and the current knowledge base on VAM before committing tax dollars to either field testing or implementing new teacher evaluation policies built in any way on VAM.

* See a recent review of teacher evaluation reform for links and citations to numerous research studies on VAM.

Assembled Pieces Reveal Disturbing Reform Picture

Every time I write about Michelle Rhee, as I noted in a recent post, I feel like I should reenact the shower and wire-brush scene in Silkwood to purge myself of participating in the ceaseless media attention disproportionately afforded Rhee while the voices, daily efforts, and expertise of K-12 practitioners are not just ignored, but marginalized and even demonized.

So it is with a shared reservation (see Jose Vilson’s excellent post) that I once again wade into the Common Core State Standards (CCSS) debate, not to rehash my unequivocal opposition to the CCSS movement, but to offer a brief look at the picture revealed once all the pieces of the corporate/“no excuses” reform movement puzzle are assembled. First, then, let me identify the primary pieces of that puzzle:

CCSS

National high-stakes tests built on CCSS

Reformed teacher evaluation driven by VAM-based teacher ranking

Teach for America

Charter schools

These various pieces are an effective strategy with a common thread because separately each reform element creates a focal point of debate; for an educator or researcher to challenge any one of these policies is a seemingly endless task since the reform agenda is being set by those with political and financial power. Refuting the need for new standards, much less the flaws with implementing those new standards, immediately positions educators as reactionary and allows the self-appointed reformers to characterize those challenges as being for the status quo and against reform and accountability.

For example, teachers in my home state of South Carolina who have spoken against VAM-style teacher evaluation reform have been publicly labeled by the state superintendent of education, Mick Zais, as trying to avoid being held accountable for their work.

The picture these reform pieces show is not a patchwork of evidence-based and innovative strategies for improving public education, but a carefully unified process of infusing even more deeply the power of high-stakes standardized testing into the fabric of public schools. Look beneath any of the elements listed above and find the allure of new and better tests, as Secretary of Education Arne Duncan (2010) celebrated himself:

Today is a great day! I have looked forward to this day for a long time–and so have America’s teachers, parents, students, and school leaders. Today is the day that marks the beginning of the development of a new and much-improved generation of assessments for America’s schoolchildren. Today marks the start of Assessments 2.0. And today marks one more milestone, testifying to the transformational change now taking hold in our nation’s schools under the courageous leadership and vision of state and district officials.

Duncan’s enthusiasm doesn’t stop there:

This new generation of mathematics and English language arts assessments will cover all students in grades three through eight and be used at least once in high school in every state that chooses to use them. In addition, the PARCC consortium will develop optional performance tasks to inform teachers about the development of literacy and mathematics knowledge and skills in kindergarten through second grade.

I am convinced that this new generation of state assessments will be an absolute game-changer in public education. For the first time, millions of schoolchildren, parents, and teachers will know if students are on-track for colleges and careers–and if they are ready to enter college without the need for remedial instruction. Yet that fundamental shift–re-orienting K-12 education to extend beyond high school graduation to college and career-readiness–will not be the only first here.

For the first time, many teachers will have the state assessments they have longed for– tests of critical thinking skills and complex student learning that are not just fill-in-the-bubble tests of basic skills but support good teaching in the classroom.

And what provides the basis upon which Duncan makes these claims?

Yet existing assessments are only part of the problem. An assessment system and curriculum can only be as good as the academic standards to which the assessments and curriculum are pegged. We want teachers to teach to standards–if the standards are rigorous, globally competitive, and consistent across states. Unfortunately, in the last decade, numerous states dummied down their academic standards and assessments. In effect, they lied to parents and students. They told students they were proficient and on track to college success, when they were not even close.

The Common Core standards developed by the states, coupled with the new generation of assessments, will help put an end to the insidious practice of establishing 50 different goalposts for educational success. In the years ahead, a child in Mississippi will be measured against the same standard of success as a child in Massachusetts.

Even if we account for the sort of soaring rhetoric associated with political discourse, Duncan clearly envisions policy that must include a staggering and unprecedented commitment to testing that rises to the level of parody. But for all stakeholders in public education, the results of all the policies linked to standardized testing must include a brave new world of testing that boggles the mind in terms of the amount of time and funding required to design, field test, implement, and manage pre- and post-tests aligned with CCSS for every single course and teacher year after year after year.

As Yong Zhao has detailed carefully in an exchange with Marc Tucker, commitments to education reform policy linked to CCSS and the high-stakes tests built on these new standards are not anything new, are not justified by any clearly identified problems or needs, and are not consistent with the larger democratic goals of universal public education:

[L]et me restate my main point: it is impossible, unnecessary, and harmful for a small group of individuals to predetermine and impose upon all students the same set of knowledge and skills and expect all students progress at the same pace (if the students don’t, it is the teachers’ and schools’ fault). I am not against standards per se for good standards can serve as a useful guide. What I am against is Common and Core, that is, the same standards for all students and a few subjects (currently math and English language arts) as the core of all children’s education diet. I might even love the Common Core if they were not common or core.

Classroom teachers, educational researchers, and educational historians have offered and continue to offer a clear and valid voice that Duncan’s claims and the resulting policies are deeply flawed, but as Brian Jones asks, “If all of this testing is so bad for teaching and learning, why is it spreading?” According to Jones, the answer detailed in the full picture is clear:

As the tests spread and the consequences associated with them rise, absurdities abound….

The shift toward using student data to evaluate teachers is part of a larger trend of restructuring public education to align it with the rest of the economy. As one of the last heavily unionized groups of workers in the country, teachers stand in the way of privatization. And to the extent that they are self-governed, self-motivated and enjoy professional autonomy, teachers are a ‘bad’ example for other workers.

Even though it may not make for great teaching or genuine learning, high-stakes standardized testing is spreading because it is the perfect tool for controlling and disciplining teachers–and for training the next generation to internalize the priorities of the system.

The attempt to quantify and track every aspect of an employee’s ‘performance’ is not new.

Standardized testing—the inevitable consequence of commitments to CCSS, reformed teacher evaluation, and each piece of the corporate reform puzzle—combines the veneer of objectivity with the power of perpetual control over schools, teachers, and students, what Foucault characterized as “…entering the age of infinite examination and of compulsory objectification” (p. 200):

The exercise of discipline presupposes a mechanism that coerces by means of observation; an apparatus in which the techniques that make it possible to see induce effects of power in which, conversely, the means of coercion make those on whom they are applied clearly visible….

[T]he art of punishing, in the regime of disciplinary power, is aimed neither at expiation, nor precisely at repression….It differentiates individuals from one another, in terms of the following overall rule: that the rule be made to function as a minimal threshold, as an average to be respected, or as an optimum toward which one must move. It measures in quantitative terms and hierarchizes in terms of value the abilities, the level, the ‘nature’ of individuals….The perpetual penalty that traverses all points and supervises every instant in the disciplinary institution compares, differentiates, hierarchizes, homogenizes, excludes. In short, it normalizes….

The examination combines the techniques of an observing hierarchy and those of normalizing judgment. It is a normalizing gaze, a surveillance that makes it possible to qualify, to classify, and to punish….

In discipline, it is the subjects who have to be seen. Their visibility assures the hold of the power that is exercised over them. It is the fact of their being constantly seen…that maintains the disciplined individual in his subjection. And the examination is the technique by which power…holds them in a mechanism of objectification. (pp. 177, 170, 197, 199)

Now, in the context of whether or not the U.S. is committed to universal public education as a central element of a commitment to democracy and individual liberty, and then whether or not education reform is seeking that foundational goal, time has come to set aside the puzzle-piece-by-puzzle-piece dismantling of the corporate reform agenda and confront directly the central flaw with the picture itself, as Jones acknowledges:

The solution to this dilemma is not to develop better tests, but to tear down the whole enterprise of high-stakes standardized assessment and replace it with authentic assessments that are organic to the process of real teaching and learning.

In sum, the attempt to quantify learning and teaching in a standardized manner is extremely expensive; takes up weeks and, in some places, months of time in school; narrows the curriculum; undermines the intrinsic joy of learning; and leads to a culture of corruption and cheating. As a measure of student learning, standardized tests are an extremely limited instrument. As a measure of teacher effectiveness, they are even more flawed.

Measuring, labeling, ranking, and then sorting students, teachers, and schools is an anti-democratic process, a dehumanizing process, and a mechanism for control. At the center of this process being antithetical to both our democracy and our faith in education is the fundamental flaw of high-stakes standardized testing.

Do many of the puzzle pieces of the corporate reform puzzle misuse standardized tests and the data drawn from those tests? Yes.

But we must not fall prey to the simplistic claim that the problem is how tests are used and not the tests themselves.

The ugly full picture of corporate reform shows that the problem is testing. Period.