Workshop Review: Theresa Gorman
By Rob Nisbet
On a chilly Saturday in Wildau, Theresa Gorman led a group of us through the mysteries of the theory and practice of language testing. Theresa began by refreshing and enhancing our understanding of the key factors one must consider when setting a test for any purpose, whether a placement test, progress test or final examination. The theory came from ‘Language Testing in Practice’ (OUP 1996) by Bachman and Palmer – no, not a 1970s prog rock group or a famous TV detective duo, but rather the authors of a useful and detailed reference work for understanding the process of the design, planning and organisation of tests.
We started off by brainstorming what we thought ‘made a useful test’. After having a chance to compare each other’s ideas, we worked through Bachman and Palmer’s 6 recommended qualities, which any test designer should bear in mind.
First the easy ones (paraphrased for brevity):
Authenticity – how closely does the test task correspond to the student’s real life tasks?
Reliability – can we reliably compare results from different students, at different times?
Practicality – essentially, how practical or ‘do-able’ is the test given the time and resources available to administer it?
Now the trickier ones:
Impact – the impact the test has on its takers, teachers, the educational system and society at large, both positive and negative.
Construct Validity – how well does our interpretation of the test scores match the students’ abilities in the language skills we claim to be measuring? In other words, is the test really measuring what it’s supposed to measure?
Interactiveness – how much and in what way does a test task engage the student’s individual characteristics? These are defined as:
- Topical knowledge, or knowledge of real-world subjects and situations
- Affective schemata, or ‘emotional’ response to a test topic based on the student’s personal views, biases and experiences
- Language ability
Interactiveness turned out to be the most difficult of all the 6 qualities for us to get our heads around. This was partly due to its relative complexity, but also, we felt, due to our own preconceptions about what ‘interactivity’ is, given the preponderance of the term in modern life. It was suggested that the quality should be renamed for future discussions!
So what to do with this knowledge? Well, tests are normally made up of a number of tasks, and Bachman and Palmer state that each task plus the test as a whole should be evaluated on how well they embody these 6 qualities. In the design of the test there will inevitably be a series of judgements and perhaps compromises to be made, and so the test as a whole should be designed following these three principles:
- It is the overall usefulness of the test which is to be maximised, rather than the individual qualities that affect usefulness.
- The individual test qualities cannot be evaluated independently, but must be evaluated in terms of their combined effect on the overall usefulness of the test.
- Test usefulness and the appropriate balance among the different qualities cannot be prescribed in general, but must be determined for each specific testing situation.
Once we were armed with this useful theoretical framework and refreshed by the lunch that Wildau had kindly arranged for us, Theresa invited us to try out our new skills by evaluating a number of test tasks against the criteria we were now familiar with.
The practice scenario was a 90-minute written test, given at the end of a Business English course taken by students with a range of abilities from A2 to C1, and including such topics as formal letter writing, email and report writing and conversation in various scenarios.
We worked in pairs, each pair evaluating one of six tasks from the test and then presenting their findings to the group. Following some discussion we gave each task a grade as to how useful we felt it was. Key to this was understanding how each task fitted into the whole, so whilst we could evaluate quite easily how each task related to the course topics and real world situations of the students, we needed to see the other tasks to evaluate how well ours fitted into the overall jigsaw.
The activity revealed how the theoretical considerations came to life in the practical business of designing a test. For example, was the topical knowledge required for the task of understanding and completing a short report likely to have been obtained from the course? Would even the weaker students know enough of the vocabulary and the content of a ‘Staff Performance Review System’ to understand the report, suggest section titles and then write the final section?
We argued about how clearly some of the task instructions were written, how clearly the task marking criteria were set out (should a % weighting be given to each criterion?) and whether a writing task which simulated a conversation scenario was appropriate at all, and if so, whether it was too open-ended and unstructured.
We also strayed into some interesting related topics, such as how useful it is to test formal letter writing, such as letters of complaint, when we write fewer and fewer formal letters these days (conclusion: still relevant, as these are some of the most difficult letters to write), and the importance of testing ‘less formal’ English. This is the kind of English located between formal and personal, which we often use when corresponding with colleagues, collaborators or other business contacts via email, using a friendly and personal tone mixed with more formal vocabulary and work-related content.
So how was the seminar? It was authentic, as we were put in the position of evaluating and suggesting changes to tests in realistic situations. It had a positive impact, as we all went away with a better understanding of what is involved in testing. And, most important of all, it was interactive in all senses of the word! Many thanks to Theresa and the guys at Wildau for a fun and stimulating day.