I have just walked up the hill to the University of Bath listening to my radio. On the Today programme there was an interview with the head of Ofqual explaining the new GCSE (or whatever they are) exams. Instead of the 8 grades we presently have there will be 9, and so there will be more “differentiation” between candidates. I am due to write half an exam paper today for my bit of Quantum and Atomic Physics (PH20013/60), and that got me thinking.
Many moons ago Universities judged students by asking them questions in one-to-one interviews, or vivas as we call them. But these are time-consuming and open to the whim of the examiner. So written exams were introduced. We still have them. I have sat countless exam papers during my school and undergraduate life. Apart from my very last exam as an undergraduate, I always did well, which is why I’m now a lecturer. I have to say I always enjoyed them: the one-to-one combat of pitting myself against the cunning and sly lecturer.
But exams are not really about the top end showing off. They are about giving a grade to a student that can be entered into a spreadsheet, so that upon graduation the student can be given the correct piece of paper which tells the world something about them. But what does it tell? What are these elusive marks that are at the core of the game we play at University? These are grand ideas and questions that will have little bearing on how I compile my exam paper in practice.
Here’s roughly how I do it:
(1) Read through the lecture notes and the unit outline to get a feel for what I said the students should be able to do.
(2) Split the total number of marks roughly equally between the main sections of the course.
(3) Write out questions for 40 % of my marks that allow conscientious and studious students, who will know more about the subject than those who didn’t sit through the lectures but who are really not very bright, to pass. These will typically be “state the principle of…” style questions showing a bit of knowledge. And perhaps a few repeat questions already covered in problem sheets.
(4) Write a few more questions for, say, 20 % on number crunching.
(5) The next 20 % goes on more extended questions on the physics of the course, which get at understanding and may well include unseen questions.
(6) Finally, the last 20 %. These questions will include physics from outside the course, typically from an earlier year, to see if the candidate can see the bigger picture and how things fit together.
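To make the split in steps (3) to (6) concrete, here is a minimal sketch of the arithmetic in Python. It assumes a paper worth 60 marks; the band labels and the function name are shorthand made up for this illustration, not part of any official scheme.

```python
# Percentage of the paper given to each band, from steps (3) to (6).
# The labels are illustrative shorthand only.
BANDS = {
    "knowledge and repeat questions": 40,
    "number crunching": 20,
    "extended physics": 20,
    "bigger picture": 20,
}


def marks_per_band(total_marks):
    """Return a whole number of marks per band, summing to total_marks."""
    allocation = {
        name: round(pct * total_marks / 100) for name, pct in BANDS.items()
    }
    # Any rounding leftover is given to the first band.
    first = next(iter(BANDS))
    allocation[first] += total_marks - sum(allocation.values())
    return allocation


for band, marks in marks_per_band(60).items():
    print(f"{band}: {marks} marks")
```

On a 60-mark paper this gives 24 marks to the first band and 12 to each of the other three.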
OK, but the candidate has a set time for this and in this time we try and ensure we ask questions on as much of the course as possible, and we try to see if the candidate can think. This is hard.
I look through past papers, at the course work problems, in books and online for good ideas for questions. I bung them together. I then sit the exam myself, and if I can’t physically write out all the answers in half the allotted time, the exam is too long. I then revise the questions and, if I am feeling really diligent, I sit the exam again. Phew.
Marking? Well, I write out a mark scheme for each of my questions. I divvy the marks up based on the length of the question and the degree of toughness, which usually reflects the number of “physics” steps required to get the answers.
I’m finished! I hand in my exam and answers to the University and relax. How do we ensure my questions are fair? Peer review. One of my colleagues reads my paper and makes comments. Then we send it off to an external colleague and they also make comments. Once all these QA processes have been ticked off I have it: a document that will tell me if you are worth 58 %, or 61 %, or perhaps 36 % or even 97 % (well done). But wait, my exam is out of 60 marks, so the smallest % quantum is 1.7 %. But wait, the University explicitly bans certain % totals: the examiner is asked to review certain percentages and think again. But only for these cases, and with the understanding (or at least my understanding) that I’ll just bump the % up out of these forbidden zones. But we don’t do this for all marks, so the % scale is not only quantised but also non-linear. These are the obvious sources of uncertainty on a final mark; what about the other sources?
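The quantisation is easy to check. The sketch below assumes nothing more than a paper marked out of 60 in whole marks, with percentages rounded to the nearest integer; the function names are made up for the illustration, and the University's forbidden zones are left out because they aren't specified here.

```python
TOTAL_MARKS = 60  # the paper total quoted in the text


def percent_step(total=TOTAL_MARKS):
    """Size of one raw mark expressed as a percentage of the paper."""
    return 100 / total


def reachable_percentages(total=TOTAL_MARKS):
    """Whole-number percentages that some whole raw mark rounds to."""
    return sorted({round(100 * mark / total) for mark in range(total + 1)})


step = percent_step()
reachable = reachable_percentages()
unreachable = [p for p in range(101) if p not in reachable]

print(f"one mark is worth {step:.1f} %")
print(f"{len(reachable)} of the 101 whole percentages can actually occur")
print(f"{len(unreachable)} cannot occur on this paper alone")
```

With only 61 possible raw marks (0 to 60), at most 61 of the 101 whole-number percentages can ever appear from this paper on its own, before any bumping or scaling is applied.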
What if a question or exam paper is just too easy or too hard? What then? Now we get into the world of normalization, mark scaling and other shenanigans so that the average mark looks OK. Last year I set an exciting and interesting exam. It was excellent. It really tested the understanding of the students and allowed many to shine and really show they are excellent physicists and scientists with a broad grasp of the subject and a good understanding of how it all fits together. Well done me (and them).
However, what also happened was that the middling students, who are conscientious and worked hard, were shafted. I had slightly changed the rules on them. I had somehow abandoned my scheme outlined above and tried to get at understanding a bit more. Or it could have been just a slightly badly worded question, or slightly too much in the paper, or a bit that I taught badly and the students just didn’t get. The upshot was a lot of post-processing to make the outcome fair for the students; fair, that is, in terms of the overall game that we play at University.
There is a lot of room for error, a lot of subjectivity, a lot of cross-correlation, and a lot of reliance on the setter (me!) knowing the difference between a 56 % student and a 58 % student. These grades are important, but perhaps they should come with a standard deviation.
Will the new GCSE exams, with their 9 grades, really have a standard deviation of less than 2 %?