Lawrence J. Zwier, testing expert and series advisor for Q: Skills for Success, Second Edition, looks at some strategies for measuring student progress in language learning.
Language teachers often discuss the difficulty of measuring how well their students are doing. A typical comment goes something like this: “When you’re testing in a history class (or biology, or law, etc.), it’s easy. They either remember the material or they don’t.” This oversimplifies the situation in “content classes,” where analysis may be valued just as highly as memory, but the frustrated ESL/EFL teacher has a point. A language class aims not to convey a body of knowledge but to develop skills, and skill development is notoriously hard to assess. It’s even harder when the skills are meant for use outside the language classroom but the only place you can measure them IS the language classroom.
However, all is not lost. There are many good, solid principles to apply in measuring how your students are doing. What’s more, they don’t require the assistance of test-construction experts or the statistical skills of a psychometrician. The average ESL/EFL teacher can do the measurement and interpret the results in ways that will have immediate benefits for their students.
The idea that measurement benefits students can get lost in discussions of measuring progress. So often, we think of measurement as serving the educational institution (which needs to promote people, issue grades, and so on) or the teacher (who needs to know how well a certain objective is being met). But it’s an established principle of memory science that frequent measurement (or, more familiarly, testing) is one of the best aids to learning. Researchers at Kent State University tested subjects’ recall of English-Lithuanian word pairs; that is, they studied how well subjects remembered not just the Lithuanian or English words but also the pairing of those words across languages. The main variable was how often a given subject was tested on the pairs. The researchers found a clear correlation between the number of “retrievals” (the number of times a participant had to recall the pairs on tests) and long-term memory of the pairs.
You may be sensing a dichotomy you’ve noticed before: formative vs. summative evaluation. Summative evaluation comes after a period of learning and is meant to show how much learning took place. Think final exams, midterms, end-of-unit tests, and so on. Formative evaluation occurs during the period of learning and is part of that learning; the test is a teaching tool. Each type of testing has its place. There’s nothing wrong with summative testing, and the educational system would lose structure without it. Many students would also lose motivation, because, love them or hate them, big tests have a way of making people work. But the Kent State research mentioned above clearly shows that formative testing is not just some touchy-feely distraction. Measuring your students often is instructive for you and for them. You can easily find examples of formative-assessment activities through a Web search; a good place to start is https://wvde.state.wv.us/teach21/ExamplesofFormativeAssessment.html.
Here is a brief look at some important principles for measuring the progress of ESL/EFL students.
Use many small measures, not just a few big ones. This is just common sense. If you rely on only two or three measures over the course of a semester, your results are much more vulnerable to factors that skew them: the students’ health, the students’ moods, problems with classroom technology, your own fallibility in writing test items, and so on. If your program requires some big tests, so be it. Make every effort to add smaller tests and quizzes along the way as well, and have them count significantly toward the students’ grades. Also, share the results of these measurements with your students. An especially effective technique is to make these smaller tests, and the way they are graded, echo the larger tests. That way, the frequent tests offer not only periodic retrieval of language points but also practice with the format of the larger test.
Don’t administer what you can’t evaluate. You can’t give frequent assessments if it takes you five hours to grade each one. Most of the questions in your measures should be discrete-point items, meaning that each question has a clearly identifiable correct answer that is limited in scope. Yes, I love seeing my students produce essays or get in front of the class to give five-minute presentations. However, I can’t assess (or give meaningful feedback on) more than two or three such long-form outputs in a semester. Especially when I’m teaching reading or listening, I have to depend on multiple-choice, true/false, fill-in, and matching questions, and all the other limited-output formats. What you may have a harder time believing is that short-form questions are appropriate in writing and speaking classes as well. A writing student can demonstrate many skills in two or three sentences. A speaking student can demonstrate a lot by speaking for 45 or 60 seconds, as test takers do on the Internet-based TOEFL.
Avoid unnecessary interference from other skills. This dovetails with the previous point. If I am trying to measure reading comprehension (a very elusive target, if you think about it), I don’t want the student’s weaknesses in writing, speaking, or even listening to get in the way. I want to ask a comprehension question that can tell me something about the student even if the student cannot compose a good sentence, articulate a spoken answer, or understand a long spoken introduction. Give me something that requires minimal output to show how the student handled the input. Of course, there is no perfect question, nothing that can get me inside that student’s head and point out the relevant neurons firing, but a simply worded question that requires circling a letter, writing T/F, or drawing a line is less likely to be muddied by other factors than one that requires a complex answer. Gillian Brown and George Yule (1983) noted long ago how hard it is to assess actual listening comprehension. They pointed out that a listener’s “personal representation of the content of a text” is “inside the student’s head and not directly available for extraction and objective examination.” Simplify your attempts to examine it by avoiding obscuring factors.
Beware viral items. Digital technology makes test security harder every year. And don’t assume that student lore on the Internet concerns itself only with the big boys (the large, high-stakes tests). If you’ve handed out a piece of paper with a test question on it, there’s a decent chance that it now roams somewhere in the pastures of the Web. If you were not terribly observant during the test, a student may have snapped a cell-phone picture of it. Even if you were watching like a hawk, many students, by the time they reach 18 or so, have prodigious memories and a tradition of getting together beforehand to divvy up the memorization of a test: “You take questions 1–3, Sam will take 4–7, and I’ll take 8–10.” My colleagues and I have adapted by simply not reusing old material in important measures of progress. For quick practice exercises with nothing at stake, I might not care. However, each truly important measurement instrument is a new one, though perhaps based on an old one with the answers reworked and reordered. (Such reshuffling reduces the amount of writing I have to do.)
Be your own grumpy editor. I work frequently with the best item writers in the ESL/EFL field. One mark of a good item writer is the assumption that there’s something wrong with the first draft of anything. After you write a measurement item, let it sit for a few hours or a day. Then go back to it carrying a nice, sharp boxcutter. You’ll be surprised how often you discover that the question doesn’t really address what you want to assess, that there are actually two possible correct answers among your multiple-choice options, or that the reading/listening passage doesn’t clearly establish whether a true/false statement is true or false. Good measurement is impossible without good items. It’s worth the effort to slash and rewrite.
References and Further Reading
Association for Psychological Science. “Testing improves memory: Study examines why memory is enhanced by repeated retrieval.” ScienceDaily. 16 June 2011. www.sciencedaily.com/releases/2011/06/110615171410.htm
Brown, Gillian, and George Yule. Teaching the Spoken Language: An Approach Based on the Analysis of Conversational English. Cambridge, UK: Cambridge University Press. 1983.
West Virginia Department of Education. “Examples of Formative Assessment.” Accessed 31 October 2014. https://wvde.state.wv.us/teach21/ExamplesofFormativeAssessment.html