
SAT Lessons Never Learned: NAEP Edition

Yesterday, I spent an hour on the phone with the producer of a national news series.

I realized afterward that much of the conversation reminded me of dozens of similar conversations with journalists throughout my 40-year career as an educator: once again, I had to carefully and repeatedly clarify what standardized tests do and what they mean.

For more than the first half of my career, I watched annually as the US slipped into Education Crisis mode when SAT scores were released.

Throughout the past five decades, I have been strongly anti-testing and anti-grades, but most of my public and scholarly work challenging testing addressed the many problems with the SAT—and notably how the media, public, and politicians misunderstand and misuse SAT data.

See these, for example:

Over many years of critically analyzing SAT data as well as the media/public/political responses to the college entrance exam, I saw several key lessons emerge, including the following:

  • Lesson: Populations being tested impact data drawn from tests. The SAT originally served the needs of elite students, often those seeking Ivy League educations. However, over the twentieth century, more and more students began taking the SAT for a variety of reasons (scholarships and athletics, for example). The shift in the population of students being tested from an elite subset (the upper end of the normal curve) to a more statistically “normal” population necessarily drove the average down, a statistical fact that has nothing to do with school or student quality (see the simulation sketch after this list). While statistically valid, the drop in SAT scores caused by population shifts created media problems (see below); therefore, the College Board recentered the scoring of the SAT.
  • Lesson: Ranking by test data must account for population differences among students tested. Media reporting of average SAT scores for the nation and by states created a misleading narrative about school quality. Part of that messaging was grounded in the College Board reporting average SAT scores ranked by state, and then the media treating those rankings as a valid assessment of state educational quality. The College Board eventually issued a caution: “Educators, the media and others should…not rank or rate teachers, educational institutions, districts or states solely on the basis of aggregate scores derived from tests that are intended primarily as a measure of individual students.” However, the media continued to rank states using SAT average scores. SAT data has always been strongly correlated with parental income, parental level of education, and characteristics of students such as gender and race. But another significant driver of average SAT scores is the rate of participation among states. See for example a comparison I did among SC, NC, and MS (the latter having a higher poverty rate yet a higher average SAT because of a much lower participation rate, including mostly elite students):
  • Lesson: Conclusions drawn from test data must acknowledge the purpose of the test being used (see Gerald Bracey). The SAT has one very narrow purpose, predicting first-year college grades, and primarily one use: serving as a data point for college admission based on that sole purpose. However, historically, media/public/political responses to the SAT have used the data to evaluate state educational quality and the longitudinal progress of US students in general. In short, SAT data has been routinely misused because most people misunderstand its purpose.
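To make the statistics behind the first two lessons concrete, here is a minimal simulation sketch. The numbers are hypothetical, not actual SAT data, and it assumes for simplicity that the highest-achieving students are the most likely test-takers: as participation broadens from an elite subset toward the full population, the average score falls even though underlying achievement never changes.

```python
import random
import statistics

# Hypothetical illustration (not real SAT data): simulate a population of
# students whose underlying achievement follows a normal distribution,
# then compute the average observed score at different participation
# rates, assuming (a simplification) that the highest-achieving students
# are the ones most likely to take the test.

random.seed(42)
POPULATION = sorted(
    (random.gauss(500, 100) for _ in range(100_000)), reverse=True
)

def average_score(participation_rate: float) -> float:
    """Mean score when only the top `participation_rate` share tests."""
    takers = POPULATION[: int(len(POPULATION) * participation_rate)]
    return statistics.mean(takers)

for rate in (0.05, 0.25, 0.50, 0.90):
    print(f"participation {rate:>4.0%}: average {average_score(rate):.0f}")

# Approximate output:
# participation   5%: average 706
# participation  25%: average 627
# participation  50%: average 580
# participation  90%: average 520
#
# Underlying achievement never changes; the falling average is purely a
# selection effect, the same dynamic behind the recentered SAT and behind
# low-participation states posting higher state averages.
```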

Recently, the significance of the SAT has declined, with students taking the ACT at a higher rate and more colleges going test-optional, but the nation has simply shifted to panicking over NAEP data instead.

The rise in significance of NAEP is tied to the focus on “proficiency” embedded in NCLB mandates (which required all states to reach 100% student proficiency by 2014).

The problem now is that media/public/political responses to NAEP mimic the exact mistakes made during the hyper-focus on the SAT.

NAEP, then, like the SAT, needs its own moment of reckoning.

Instead of improving public and political messaging about education and education reform, NAEP has perpetuated the very worst stories of educational crisis. That is in part because there is no single standard for “proficiency” and because NAEP was designed to provide a check on state assessments, which could otherwise set cut scores and achievement levels however states wanted:

Since states have different content standards and use different tests and different methods for setting cut scores, obviously the meaning of proficient varies among the states. Under NCLB, states are free to set their own standards for proficiency, which is one reason why AYP school failure rates vary so widely across the states. It’s a lot harder for students to achieve proficiency in a state that has set that standard at a high level than it is in a state that has set it lower. Indeed, even if students in two schools in two different states have exactly the same achievement, one school could find itself on a failed-AYP list simply because it is located in the state whose standard for proficient is higher than the other state’s….

Under NCLB all states must administer NAEP every other year in reading and mathematics in grades 4 and 8, starting in 2003. The idea is to use NAEP as a “check” on states’ assessment results under NCLB or as a benchmark for judging states’ definitions of proficient. If, for example, a state reports a very high percentage of proficient students on its state math test but its performance on math NAEP reveals a low percentage of proficient students, the inference would be that this state has set a relatively easy standard for math proficiency and is trying to “game” NCLB.

What’s Proficient?: The No Child Left Behind Act and the Many Meanings of Proficiency

In other words, NAEP was designed as federal oversight of state assessments, not as an evaluation tool to standardize “proficient” or to support education reform, instruction, or learning.

As a result, NAEP, as the SAT/ACT did for years, feeds a constant education crisis cycle that in turn fuels concurrent cycles of education reform and education legislation that have become increasingly authoritarian (mandating specific practices and programs as well as banning others).

Given the lessons from the SAT above, NAEP reform should include the following:

  • Standardizing “proficient” and shifting from grade-level to age-level metrics.
  • Ending state rankings and comparisons based on NAEP average scores.
  • Testing populations of students by age level instead of grade level (addressing the impact of grade retention, a form of states “gaming the system” of the kind NAEP sought to correct). NAEP testing should include children in an annual band of birth months/years regardless of grade level.
  • Providing better explanations and guidance for reporting and understanding NAEP scores in the context of longitudinal data.
  • Developing a collaborative relationship between federal and state education departments and among state education departments.

While I remain a strong skeptic of the value of standardized testing, and I recognize that we over-test students in the US, I urge NAEP reform, and a NAEP reckoning, for the sake of students, teachers, and public education.

Recommended

Literacy and NAEP Proficient, Tom Loveless

The NAEP proficiency myth, Tom Loveless

Test Scores Reflect Media, Political Agendas, Not Student or Educational Achievement [UPDATED]

In the US, the crisis/miracle obsession with reading mostly focuses on NAEP scores. For the UK, the same crisis/miracle rhetoric around reading is grounded in PIRLS.

The media and political stories around the current reading crisis cycle have interesting and overlapping dynamics in these two English-dominant countries, specifically a hyper-focus on phonics.

Here are some recent media examples for context:

Let’s start with the “soar[ing]” NAEP reading scores in MS, LA, and AL as represented by AP:

‘Mississippi miracle’: Kids’ reading scores have soared in Deep South states

Now, let’s add the media response to PIRLS data in the UK:

Reading ability of children in England scores well in global survey

Now I will share data on NAEP and PIRLS showing that media and political responses to test scores are fodder for predetermined messaging, not real reflections of student achievement or educational quality.

A key point is that the media coverage above represents a bait-and-switch approach to analyzing test scores. The claims in both the US and UK focus on rank among states/countries rather than on trends within states/countries.
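Here is a toy illustration of that bait-and-switch, using invented numbers rather than actual NAEP results: a state whose scores stay essentially flat can still rise to the top of the rankings simply because other states decline.

```python
# Hypothetical scores (not actual NAEP results) illustrating rank versus
# trend: "State A" is flat while the others decline, yet State A's rank
# climbs from last to first, which can then be sold as a "miracle."

years = [2013, 2015, 2017, 2019, 2022]
scores = {
    "State A": [217, 216, 217, 218, 217],  # flat trend
    "State B": [226, 224, 222, 219, 216],  # declining trend
    "State C": [230, 228, 225, 221, 215],  # declining trend
}

for i, year in enumerate(years):
    # Rank states from highest to lowest score for this year
    ranking = sorted(scores, key=lambda s: scores[s][i], reverse=True)
    print(year, "rank order:", " > ".join(ranking))

# Output:
# 2013 rank order: State C > State B > State A
# ...
# 2022 rank order: State A > State B > State C
#
# State A finishes "first" without its own scores improving at all;
# its rank rose only because other states fell.
```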

Do any of these state trend lines from FL, MS, AL, or LA appear to be “soar[ing]” data?

The fair description of the “miracle” states identified by AP is that test scores are mostly flat, and AL, for example, appears to have peaked more than a decade ago and is trending down.

The foundational “miracle” state, MS, has had two significant increases, one before its SOR (science of reading) commitment and one after; but there remains no research explaining why those increases occurred:

Scroll up and notice that in the UK, PIRLS scores have tracked flat and slightly down as well.

The problematic element in all of this is that many journalists and politicians have used flat NAEP scores to shout “crisis” and “miracle,” while in the UK, the current flat and slightly declining scores are reason to shout “Success!” (although research on the phonics-centered reform in England since 2006 shows it has not delivered as promised [1]).

Many problems exist with relying on standardized test scores to evaluate and reform education. Standardized testing remains heavily race, gender, and class biased.

But the greatest issue with test data is that inexpert and ideologically motivated journalists and politicians persistently conform the data to their desired stories: sometimes crisis, sometimes miracle.

Once again, the stories being sold—don’t buy them.


Recommended

Three Twitter threads on reading, language and a response to an article in the Sunday Times today by Nick Gibb, Michael Rosen

[1] Wyse, D., & Bradbury, A. (2022). Reading wars or reading reconciliation? A critical examination of robust research evidence, curriculum policy and teachers’ practices for teaching phonics and reading. Review of Education, 10(1), e3314. https://doi.org/10.1002/rev3.3314

UPDATE

Mainstream media continues to push a false story about MS as a model for the nation. Note that MS, TN, AL, and LA demonstrate that political manipulation of early test data is a mirage, not a miracle.

All four states remain at the bottom of NAEP reading scores for both the proficient and basic achievement levels a full decade into the era of SOR reading legislation:

The Proficiency Trap and the Never-Ending Crisis Cycles in Education: A Reader

The newest NAEP crisis (until the next one) concerns history and civics NAEP scores post-pandemic.

Similar to the NAEP crisis around reading (grounded in a misunderstanding of “proficiency” and of what NAEP shows longitudinally; see Mississippi, for example), this newest round of crisis rhetoric around NAEP exposes a central problem with media, public, and political responses to test data, as well as with embedding proficiency mandates in accountability legislation.

As many have noted, announcements of a reading crisis are contradicted by longitudinal NAEP data:

But a possibly more problematic issue with NAEP is the confusion of NAEP achievement levels with commonly used terms such as “grade-level proficiency” (notably as related to reading).

Yet, as is explained clearly on the NAEP web site: “It should be noted that the NAEP Proficient achievement level does not represent grade level proficiency as determined by other assessment standards (e.g., state or district assessments).”

Public, media, and political claims that 2/3 of students are below grade-level proficiency, then, are false, based on misreading NAEP data and misunderstanding the term “proficiency,” which is determined by each assessment or state (not a fixed metric).

Here is a reader for those genuinely interested in understanding NAEP data, what we mean by “proficiency,” and why expecting all students to be above any level of achievement is counter to understanding human nature (recall the failed NCLB effort to mandate 100% student proficiency by 2014):