Karpicke, J.D., and Roediger, H.L., III (2008). The Critical Importance of Retrieval for Learning. Science, 319, 966-968.
This year I’ve been teaching a couple of Instructional Technologies classes at my local university, and one thing I’ve noticed is that beginning instructional designers tend to think that the content presentation portions of their designs are where all the learning takes place, while the tests are neutral events that merely assess what was learned.
In this fascinating report, published in the February 15, 2008 issue of the journal Science, Karpicke and Roediger show that testing is not a neutral event. In fact, for improving long-term retrieval, testing can be even more important than studying.
They divided learners into 4 groups. Each group was given a list of 40 Swahili vocabulary words, along with their English equivalents. Then, in a series of alternating study and test sessions, learners in each group first studied the vocabulary list and were then tested on it.
The first group, designated ST, studied all 40 words during each study session and was tested on all 40 during each test session.
In the second group, designated SnT, learners dropped words from their subsequent study sessions after they were correctly recalled during a test session. However, they continued to be tested on all 40 words at every test session.
The third group, designated STn, studied all 40 words at every study session, but was tested only on those words not yet recalled in a previous test session.
In the final group, designated SnTn, words were dropped from both study sessions and subsequent test sessions as soon as they were recalled correctly in a test session.
After 4 {study session, test session} pairs, the ST group had experienced 160 study events and 160 test events (4 sessions of 40 words each). The SnT group experienced, on average, 76.8 study events and 160 test events. The STn group experienced 160 study events and, on average, 83 test events. And the SnTn group experienced, on average, 77.4 study events and 77.4 test events.
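To make the bookkeeping concrete, here is a minimal Python sketch of how study and test events add up under the four conditions. It is only an illustration, not the authors' procedure or data: the function name `count_events`, the `CONDITIONS` table, and the example learner (who first recalls 10 new words in each of the 4 test sessions) are all assumptions made up for this post.

```python
from typing import Dict, List, Tuple

N_SESSIONS = 4  # number of {study session, test session} pairs

# Hypothetical encoding of the four conditions as (drop_study, drop_test):
# does a word leave study / test sessions once it has been recalled?
CONDITIONS: Dict[str, Tuple[bool, bool]] = {
    "ST": (False, False),
    "SnT": (True, False),
    "STn": (False, True),
    "SnTn": (True, True),
}

def count_events(first_recall: List[int], drop_study: bool, drop_test: bool) -> Dict[str, int]:
    """Count study and test events for one learner.

    first_recall[i] is the 1-based test session in which word i was first
    recalled correctly (0 means it was never recalled).
    """
    study = 0
    test = 0
    for session in range(1, N_SESSIONS + 1):
        for recalled_in in first_recall:
            # The word was recalled in an earlier test session.
            already_recalled = 0 < recalled_in < session
            if not (drop_study and already_recalled):
                study += 1
            if not (drop_test and already_recalled):
                test += 1
    return {"study": study, "test": test}

# Made-up learner (an assumption, not data from the paper):
# 10 of the 40 words are first recalled in each of the 4 test sessions.
example_learner = [1] * 10 + [2] * 10 + [3] * 10 + [4] * 10

for name, (drop_study, drop_test) in CONDITIONS.items():
    print(name, count_events(example_learner, drop_study, drop_test))
```

For this made-up learner, the ST condition yields 160 study and 160 test events, while every dropped session type falls to 100 events. The averages reported in the paper are lower than 100 because real learners recalled many of the words early on.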
One week later, learners in each of the 4 groups were tested on the complete list of 40 words. Which group do you think performed best on this long-term recall test?
As the authors point out, one possibility is that, after a word has been correctly recalled, a learner’s study time might be better spent focusing on learning the remaining words in the list. Alternatively, perhaps the learner will forget the word if it’s dropped from the study session. If testing is a neutral event, then dropping words from test sessions should have no important impact.
They found that the two groups that were consistently tested on all 40 words (ST and SnT) showed a greater than 150% improvement in long-term recall, compared to the two groups that dropped words from test sessions (STn and SnTn).
At the same time, there was practically no difference in long-term recall between the two high-scoring groups (ST and SnT), even though the ST group underwent about 80 more study events than the SnT group. So all that extra studying led to no real gain and was essentially wasted effort.
Similarly, there was practically no difference in long-term recall between the two low-scoring groups (STn and SnTn), even though the STn group underwent about 80 additional study events compared to the SnTn group. Again, all that extra studying led to no practical gain.
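As a quick check on the "about 80" figure, the extra study events can be worked out from the averages quoted earlier. This is just arithmetic on the reported numbers; the variable names are my own.

```python
# Study events for a group that studied all 40 words in all 4 study sessions.
FULL_STUDY_EVENTS = 4 * 40

# Average study events reported for the two groups that dropped words from study.
reported_avg_study = {"SnT": 76.8, "SnTn": 77.4}

# Extra study events for the matching full-study group (ST vs. SnT, STn vs. SnTn).
extra_study = {
    group: round(FULL_STUDY_EVENTS - avg, 1)
    for group, avg in reported_avg_study.items()
}
print(extra_study)
```

The differences come to 83.2 and 82.6 study events, so "about 80" is a fair rounding in both comparisons.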
The authors conclude: “Repeated retrieval practice enhanced long-term retention, whereas repeated studying produced essentially no benefit.” [p. 967]
So testing is not a neutral event.