Third Grade Retention: The Fool’s Gold of Reading Reform

[Header Photo by Renee Kiffin on Unsplash]

Here is a report on reading reform across the US that is very important, but likely not in the ways intended: The Effects of Early Literacy Policies on Student Achievement by John Westall and Amy Cummings.

A key value of this report is its comprehensive data on reading reform in the US, such as these two figures:

Notably, most US states have early literacy policies, adopted in a significant cluster since about 2010. While this is important context, the figures also reveal a key problem with this report: its source, the conservative think tank ExcelinEd.

ExcelinEd is a Jeb Bush venture and represents the political and ideological connections among third grade retention, reading policy, and political gain.

I want here to focus on that dynamic, specifically how this report provides further evidence of the need for intense and critical re-evaluation of third grade retention.

ExcelinEd is grounded in Florida’s reading reform and high rates of grade retention, which have produced exceptionally high NAEP scores in grade 4 reading (an outcome this report confirms across the US) but also the largest decrease from grade 4 to grade 8 reading scores.

Let’s here note what Westall and Cummings detail about grade retention:

  • Third grade retention (required by 22 states) significantly contributes to increases in early grade high-stakes assessment scores as part of comprehensive early literacy policy.
  • Retention does not appear to drive similar increases in low-stakes assessments.
  • No direct causal claim is made about the impact of retention since other policy and practices linked to retention may drive the increases.

Here is where this report is important, I think, but, again, not as intended:

Similar to the results for states with comprehensive early literacy policies, states whose policies mandate third-grade retention see significant and persistent increases in high-stakes reading scores in all cohorts. The magnitude of these estimates is similar to that of the “any early literacy policy” estimates described in Section 4.1.1 above, suggesting that states with retention components essentially explain all the average effects of early literacy policies on high-stakes reading scores. By contrast, there is no consistent evidence that high-stakes reading scores increase in states without a retention component.

Grade retention has immediate political appeal since we as a nation primarily discuss and judge schools and students based on high-stakes testing data.

What is lost in that political appeal is that this report clearly notes that we still have significant gaps in understanding the role of retention in raising test scores, evidence that early test-score increases fade by middle grades testing, and evidence that retention creates inequity and non-academic harm to students.

Therefore, third grade retention is the Fool’s Gold of reading reform.

What I suspect you will not see emphasized by the most ardent reading reform advocates are the closing concessions in this report:

Although our study sheds light on the potential benefits of early literacy policies, there are some limitations that point to areas for future research. For example, while we provide evidence that comprehensive early literacy policies and retention mandates play an important role in improving state summative assessment scores, we cannot examine the mechanisms by which these policy components improve outcomes. Further research on the implementation of these policy components is therefore vital to understanding how early literacy policies operate. Additionally, we only focus on short-run test-score outcomes. However, prior work has established the importance of early literacy skills in determining non-cognitive outcomes and long-term student success (Cunningham & Stanovich, 1997; Fiester & Smith, 2010; Hernandez, 2011; Sparks et al., 2014). To fully understand the benefits of early literacy policies, it is important to enumerate their non-cognitive and long-term impacts. Finally, this study does not examine the costs associated with early literacy policies.

I want here to emphasize the need to critically examine “mechanisms by which these policy components improve outcomes.”

Again, as I have stressed before, we need a more standard and understandable set of terminology and assessments so that NAEP and state-level high-stakes testing data can help drive authentic reform (not misleading early gains followed by drops in later grades).

Currently, NAEP “proficient” remains misleading, and the terminology used in state-level testing is incredibly mixed and difficult for the media, the public, and political leaders to navigate (see the information provided here).

Next, since England has implemented early literacy reform at a comprehensive and national level beginning in 2006, we must heed the lessons found in its outcomes.

In terms of the impact of grade retention on high-stakes testing, the UK implements phonics checks that have shown score increases by month of age, suggesting that age-based development, rather than any policy or instruction, could be driving scores:

And thus, I agree with this argument from the UK:

There is certainly a strong argument for changing primary assessment to take account of age to lessen the risk of singling out summer born pupils as the low achievers. Assessments should be fewer in number, standardised, comparable with one another and generate norm-referenced age-standardised scores. And even then, the phrase ‘below age-related expectations‘ would be a misnomer; pupils with low attainment for their age would be more appropriate. This is not about re-designing the assessment system for Ofsted; this is about creating a more efficient and effective approach that would provide accurate, timely data capable of ironing out the creases caused by differences in age and allow attainment to be tracked over time. Yes, it would allow Inspectors  – and teachers – to identify those in the lowest 20% nationally – for their age! – but it would also have an interesting side-effect: a move to age standardisation would signal the end of expected standards as we know them.

My concern has always been that since NAEP is grade-based, grade retention removes the lowest scoring students from the testing pool and then reintroduces them when they are biologically older than their grade peers. Both of those skew test data by distorting the testing pool.
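That pool-distortion concern can be illustrated with a minimal arithmetic sketch (the scores below are invented for illustration, not drawn from any actual assessment):

```python
# Hypothetical reading scores for one grade cohort (invented numbers).
scores = [5, 10, 20, 40, 50, 60, 70, 80, 90, 95]

avg_all = sum(scores) / len(scores)  # average if every student is tested

# A retention mandate holds back the lowest scorers, removing them
# from the grade-level testing pool for that year.
retained = sorted(scores)[:2]                      # the two lowest scores
tested = [s for s in scores if s not in retained]  # remaining testing pool
avg_tested = sum(tested) / len(tested)

print(f"average, all students tested: {avg_all:.1f}")    # 52.0
print(f"average, after retention:     {avg_tested:.1f}")  # 63.1

# The grade-level average rises even though no student's score changed,
# and the retained students later re-enter the pool biologically older
# than their grade peers.
```

This is only arithmetic, but it captures why grade-based averages can rise under a retention mandate without any underlying gain in reading achievement.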

The NAEP Long-Term Trend (LTT) data is age-based and often reveals different outcomes than grade-based NAEP.

Finally, we must start with better data but also be more honest about what we know and do not know.

The first thing we know is that out-of-school factors account for 60% or more of the variation in high-stakes testing data.

And as this report concludes, we do not know how the matrix of policy reforms [1] impacts high- and low-stakes testing:

This report is incredibly important in that it does suggest that, despite that complex list of different policy elements, grade retention may be the single policy that produces the outcomes that are politically attractive (the same dynamic holds in college admissions, where, despite a matrix of admission criteria, SAT/ACT scores often are the determining data point).

Finally, although this report identifies evidence on grade retention as mixed, the body of research over decades confirms significant negative consequences from retention.

Therefore, until we can answer these questions, we are making political and not educational decisions about early literacy in the US:

  • How causally linked is biological age with high-stakes assessment, and thus, how does grade retention distort grade-level testing?
  • What are the criteria for assessments labeled “reading,” and do those criteria impact the ability to increase test scores without improving student achievement?
  • Are there policies and practices linked to grade retention that can support student achievement without negative outcomes for those students?
  • How do we reform reading in the US by focusing more on equity than high-stakes testing data?

I predict that if we answered these questions, we would expose grade retention as the Fool’s Gold of reading policy.

And unless we change how we are debating and mandating reading policy, those students who need and deserve reform the most will continue to be cheated by education reform as industry.


[1] Note that although most of the current state-level reading policy is identified as conforming to the “science of reading,” many of the mandates support practices not supported by the current body of research (LETRS training, Orton-Gillingham phonics, decodable texts, etc.):

The Science of Reading: A Literature Review