More Testing ≠ The Testing Effect: What is the Testing Effect?

The Testing Effect is a well-documented phenomenon: retrieving information through testing significantly improves long-term retention. Intrigued by this, I’ve spent the past few weeks reading more deeply into the research behind it. One thing has become clear: the testing effect is often misunderstood and misapplied. 


I remember the start of my teaching career and how often I tested my pupils. During my PGCE, the school I was placed in assessed students with a mid-topic test, an end-of-topic test, an end-of-term test, and an end-of-year exam. The teachers were constantly marking. Following each assessment, we wasted time creating intervention lessons, colour-coded intervention tasks for different students, and revision materials for the next assessment, which was only two weeks away.

There was a constant drive to summatively assess pupil understanding, and at the time, I felt completely overwhelmed, especially by the sheer volume of red ink I was using. When I spoke to another teacher about this, I was introduced to the testing effect, and I then held a completely wrong understanding of it for the first few years of my career.

Research into the Testing Effect

The testing effect has several origins, but most attribute the first empirical evidence of the effect as a method to improve student outcomes to Herbert Spitzer and his 1939 paper Studies in Retention. He tested 3,600 students split into groups: some had repeated testing, while others had none. It was this paper that first recognised that testing students improved their retention of information over time.

While this was one of the first clear examples of the testing effect, it’s worth examining what Spitzer actually tested. He gave students two texts and asked them to remember as much as possible. The students were split into eight groups; some took a test immediately, while others repeated the exact same test multiple times over several weeks. The study concluded with all groups completing the test again. He noted that those who were repeatedly tested performed better than those who were not. Ergo, the testing effect.

In reality, this is impractical in a school setting. For example, in science, pupils are constantly acquiring new knowledge, and to run the same test over nine weeks would be, frankly, a pointless task. Even if they retain information from the first topic, when would I assess the second, or the third?

Even from this original study, the testing effect is clearly the end phenomenon of pupils being continually assessed: it is not the test itself, but the improvement observed from regular retrieval of information. In other words, retrieval practice.

Further studies on retrieval from Tulving, Bjork, and the classic article by Roediger and Karpicke confirmed this: it is the act of retrieval, not the test itself, that strengthens memory and produces the testing effect.

What is the Testing Effect?

During the beginning of my career, I was constantly testing students and stuck in an endless cycle of marking and feedback, all in the name of the testing effect. At the time, I believed the research justified it. But looking back, I can’t help but see how much time I wasted for such little impact. The effect on student outcomes was minimal, and the hours I spent could have been used far more effectively.

Through reading, I have developed the following answer for when anyone asks me: What is the testing effect?

The testing effect is the end phenomenon of pupils being continually assessed – it is not the test, it is the phenomenon observed from regular retrieval of information – retrieval practice. 

Whilst some testing is of course necessary, the evidence just isn’t there for the relentless cycle of testing, marking, and feedback. When teachers and students are constantly in high-stakes testing mode, more time is spent collecting data than actually using it. Spreadsheets might show that Timmy can’t recall the names of the periodic table groups, but with another test looming in two weeks (on content that hasn’t even been taught yet), the teacher has no time to intervene. They’re already planning interventions based on the next set of data from the next test.

For a teacher to make meaningful interventions based on student test data, there must be adequate time not only to interpret the data but also to thoughtfully plan how gaps in understanding can be addressed. Without this space, the value of the testing itself is diminished, as both the teacher and the student have entered what I like to call data satiation.


Retrieval Practice Leads to the Testing Effect

It’s important to remember that any form of retrieval practice can trigger the testing effect. A do now or review now task will do this. So will mini-whiteboard questioning or a well-designed homework task. All of these are significantly quicker to implement, place less demand on teachers, and are far less stressful for students than formal testing.

Interventions, then, do not need to be intensive, time-consuming sessions crafted from a detailed QLA. They can instead take the form of the general insights the teacher has into pupils’ misconceptions, which are then embedded into do now, review now, homework, or, if needed, full reteaching sessions.

In my department, we’ve removed end-of-topic tests and now use a single assessment at the end of each term. Even that might be more than necessary, as my headteacher only expects two assessments per year (an option I’m currently considering). The reason for this change is that more frequent assessments can become demanding, and the benefits they offer are relatively limited compared to what we do in the classroom. So, if the workload is burdensome, scrap it.

In the classroom, we are constantly assessing our students’ prior knowledge within the lesson itself and adapting: messy marking, do nows, mini-whiteboards, exam questions, and so on. You can only remove formal testing if you are constantly gleaning information from pupils within the lesson; otherwise, no testing effect occurs.

Further Reading on the Testing Effect

Now, there is a wealth of articles and books available on retrieval practice from people with far more knowledge than I, so I won’t dwell on the broader evidence base. However, there are two studies I’ve read that I believe are particularly worth highlighting.

Sweller and van Gog (2015) highlight that the benefits of the testing effect tend to diminish as the complexity of the material being recalled increases. They emphasise the idea that as the number of interacting pieces of information within a task increases, so too does its complexity. 

In fact, Sweller and van Gog point out that this isn’t a new insight: research from as far back as a century ago had already observed that the testing effect weakens with more complex learning material. Yet, they both mention that this finding seems to have been largely forgotten or overlooked in more recent discussions around retrieval practice.

This means, when planning for retrieval practice, ensure that the task is not too complex. The complicated misconceptions and elaborate understanding can be tackled using other means, such as a show call under the visualiser, practising exam questions or reteaching sessions. Asking pupils to recall a method for a practical, for example, risks negatively impacting the testing effect due to its high level of complexity. Instead, retrieval practice should be limited to quick recall unless the teacher is there to support them along the way.

A meta-analysis by Yang et al. (2021) supports the importance of exposure and repetition. Their study found that repeated encounters with the same information can significantly improve outcomes, suggesting that the testing effect is strengthened not just by recall, but by revisiting material in a structured way. 

In practice, this means spaced retrieval. Again, something many teachers are aware of, but it is important to always stay up to date with the newest research, even if this confirms what we may already know.


The testing effect is undeniably powerful. However, there is often an overreliance on it, with the assumption that frequent assessments alone will lead to better retention. In reality, this approach can result in generating data for its own sake, rather than supporting meaningful learning. Effective use of the testing effect requires time for teachers to address misconceptions and support students in overcoming forgetting. By spacing out rigorous summative assessments to just a few times a year, teachers can more effectively plan spaced retrieval across multiple lessons. Importantly, this does not mean that infrequent summative assessments should be the sole source of information about student learning. Teachers must continuously assess students formatively within lessons, adapting their teaching to meet the needs of the learners in front of them.

Fewer assessments. Smarter use of data. Responsive teaching. Better outcomes, less burnout.
