19th August 2020

Blog | Reflections on exam results 2020: an avoidable crisis?

I know that it is my job to be evidence-based and data-driven but I refuse to believe that March wasn’t at least five years ago now.

For those working in the country’s schools and colleges it may seem even longer. The start of lockdown, switching to remote learning, staying open for some, trying to ensure others were fed, wondering where the funding for it was coming from, waiting for the delivery van to turn up at the school gates with much-needed laptops, trying to assess grades for students in GCSEs, A levels, BTECs and more, and when they were finished with that, figuring out how they were going to get everybody back into school in September.

Before we get there, we’ve got to complete the annual results season. These days in August are the traditional end-of-season finale. A culmination of years of schooling for hundreds of thousands of pupils and of the work of a similar number who have taught them, or supported them, through school.

But this is unlikely to be a finale that anybody looks back on with any degree of affection.

It started in March when the government announced that the summer’s exams had been cancelled and charged the qualifications regulator Ofqual with coming up with an approach to awarding qualifications to those that would have sat them. In his direction letter at the end of March, the Secretary of State said that: [1]

“students should be issued with calculated results based on their exam centres’ judgements of their ability in the relevant subjects, supplemented by a range of other evidence…In order to mitigate the risk to standards as far as possible, the approach should be standardised across centres…. Ofqual should ensure, as far as is possible, that qualification standards are maintained and the distribution of grades follows a similar profile to that in previous years.”

The approach used by Ofqual, now painfully familiar to all, was based on collating Centre Assessed Grades for all students and also asking exam centres to rank all students within those grades. This was required for the second part of the process where, in order to maintain a similar distribution of grades this year to previous years, grades were then adjusted to reflect the level and distribution of results in the exam centre in previous years.
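
To make the mechanics of that second step concrete, here is a deliberately simplified sketch in Python. It is not Ofqual’s actual model: the function name, the inputs and the rounding rule are all assumptions made for illustration. It only shows the general idea of handing out grades down a centre’s rank order so that the shares of each grade match that centre’s historical shares.

```python
from typing import Dict, List


def allocate_grades_by_rank(
    ranked_students: List[str],
    historical_shares: Dict[str, float],
) -> Dict[str, str]:
    """Assign grades down a rank order to match historical grade shares.

    ranked_students: student ids, best first (hypothetical input).
    historical_shares: grade -> share of past entries, listed from the
        highest grade to the lowest and summing to 1 (hypothetical input).
    """
    n = len(ranked_students)
    grades = list(historical_shares.items())
    result: Dict[str, str] = {}
    cumulative = 0.0
    start = 0
    for i, (grade, share) in enumerate(grades):
        cumulative += share
        # The last grade takes everyone left, so no student goes ungraded.
        end = n if i == len(grades) - 1 else round(cumulative * n)
        for student in ranked_students[start:end]:
            result[student] = grade
        start = max(start, end)
    return result


cohort = [f"s{i}" for i in range(1, 11)]  # s1 is ranked highest
past_profile = {"A": 0.2, "B": 0.5, "C": 0.3}
awarded = allocate_grades_by_rank(cohort, past_profile)
```

In this toy version the only things that determine a student’s grade are their position in the rank order and the centre’s past results, which is why the ranking exercise mattered so much.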

We have been clear throughout that there is no perfect solution on offer here and it is not possible to fully replicate what would have happened if students had sat examinations. Teacher estimated grades are likely to be, on average, higher since they are largely trying to determine what a pupil is capable of. At an individual level it’s impossible to model some of the random factors that might affect performance on the day. Some might be positive – exams where the ‘right’ questions came up – but others, probably more prevalent, are negative – feeling ill, an argument at home, arriving at the last minute because the bus was late. How do you assign those in your estimated grades?

But we did highlight concerns we had with the proposed approach.

The first was that it did not address any within-school bias in the grading of underrepresented groups. The adjustments made as part of the process were to the overall performance of the school with no changes to the relative performance of pupils within them (i.e. no changes to ranks). Our second concern was that the proposal required teachers to generate grades for individual students from scratch, without any statistically based starting point, despite then having to conform to a statistical profile at the next stage. In short, we thought the ordering of teacher judgement followed by a statistical model was the wrong way round.

We recommended that, as a starting point, schools should be shown what their ranked order would look like if pupils followed national patterns from recent years based on prior attainment and characteristics and the performance of the school. Using this model as a starting point, teachers would then apply professional judgement as to how rankings within the school differ based on internal assessments, classwork and homework.

One key benefit is that it would be possible to build in validation checks. These could highlight where a school’s decisions have disproportionately moved the ranking of particular groups, or indeed the whole cohort, up or down. Schools would then need to justify those changes where they had a material impact on grades.
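
As a rough illustration of the kind of validation check we had in mind, the Python sketch below compares a model-suggested rank order with the teachers’ final rank order and reports the average movement for each pupil group. The function names, the group labels and the threshold are hypothetical; a real check would need a properly justified measure of what counts as a disproportionate shift.

```python
from collections import defaultdict
from typing import Dict, List


def mean_rank_shift_by_group(
    model_rank: Dict[str, int],
    teacher_rank: Dict[str, int],
    group_of: Dict[str, str],
) -> Dict[str, float]:
    """Average rank movement per group (rank 1 is best).

    A positive value means the group was moved up the order by teachers
    relative to the model's starting point; negative means moved down.
    """
    shifts = defaultdict(list)
    for student, m in model_rank.items():
        shifts[group_of[student]].append(m - teacher_rank[student])
    return {g: sum(v) / len(v) for g, v in shifts.items()}


def flag_groups(mean_shifts: Dict[str, float], threshold: float) -> List[str]:
    """Groups whose average movement is at least `threshold` places."""
    return sorted(g for g, s in mean_shifts.items() if abs(s) >= threshold)


model_rank = {"a": 1, "b": 2, "c": 3, "d": 4}
teacher_rank = {"a": 3, "b": 4, "c": 1, "d": 2}
group_of = {"a": "X", "b": "X", "c": "Y", "d": "Y"}  # illustrative labels

shifts = mean_rank_shift_by_group(model_rank, teacher_rank, group_of)
flagged = flag_groups(shifts, threshold=1.5)
```

A flag here would not mean the teachers were wrong, only that the school should be asked to explain the change.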

While we were clear that this could result in a different national distribution of grades than in previous years – and in the spirit of openness we probably underestimated the effect – we considered that, on balance, fairness to pupils was a more important factor than a neat and consistent grade distribution.

A question we have asked ourselves over the past week is whether this would have been a better approach than the one adopted by Ofqual. Perhaps unsurprisingly, we think yes. But it also seems unlikely that we would have come through A levels and GCSEs without some criticism of our approach, without people saying we had allowed grade inflation, or without cases where it had not worked well coming to light.

And that is why any statistical model needs to be grounded and tested in the real world for the purpose for which it is intended. If I do analysis at a national level of school funding allocations for EPI and I inadvertently get things wrong for two hundred schools and assign their money elsewhere, it is embarrassing, it is a failure of quality assurance, but it probably does not affect my results and it is highly unlikely to directly affect anyone in schools. If I did it within DfE when dealing with actual allocations to individual schools then we would have a funding crisis and calls for ministers to resign.

So last week we had ministers celebrating the fact that A level results had only increased by a couple of percentage points, that standards had been maintained. But this was not a model that needed to work at a national level, it needed to work for hundreds of thousands of individuals in thousands of schools. It does not matter if your total number of grades is correct if a large number of them have been assigned to the ‘wrong’ candidates. There’s no way of knowing if you have achieved that without going out and testing the results back in schools and colleges. Unfortunately, without verifying results over the summer, the testing stage and publication stage were the same thing.
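
The gap between getting the totals right and getting individuals right is easy to show with a toy example in Python. The data and function names are invented for illustration, and the ‘deserved’ grades are a hypothetical stand-in for what students would have achieved in an exam, which in reality nobody can observe.

```python
from collections import Counter
from typing import Dict


def distribution_matches(awarded: Dict[str, str], deserved: Dict[str, str]) -> bool:
    """True if the count of each grade is identical in both sets."""
    return Counter(awarded.values()) == Counter(deserved.values())


def share_misassigned(awarded: Dict[str, str], deserved: Dict[str, str]) -> float:
    """Proportion of students whose awarded grade differs from the other set."""
    wrong = sum(1 for s in deserved if awarded[s] != deserved[s])
    return wrong / len(deserved)


# Hypothetical four-student cohort
deserved = {"p1": "A", "p2": "B", "p3": "B", "p4": "C"}
awarded = {"p1": "B", "p2": "A", "p3": "B", "p4": "C"}

same_totals = distribution_matches(awarded, deserved)
wrong_share = share_misassigned(awarded, deserved)
```

Here the grade totals are identical, yet half the students hold a different grade from the one in the comparison set.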

There was also no attempt to recognise uncertainty in the outcomes in the awarded grades. There was uncertainty in the ranking of students, there was uncertainty in the baseline performance of the school (based as it is on a ‘sample’ of students who attended in the last few years), so there is uncertainty in the outputs of the model. Yet for the student it resulted in a single grade. Could some of the controversy have been eased by presenting results with a confidence interval, if not explicitly as that then as some kind of ‘band’ of results?
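
One way to get a feel for the uncertainty in a school’s baseline is to resample its past results. The Python sketch below uses a simple bootstrap to put a rough interval around the share of a given grade in a small historical cohort. This is an illustrative technique I have chosen, not something Ofqual did, and the function name, the default number of resamples and the 90 per cent level are all assumptions.

```python
import random
from typing import List, Tuple


def bootstrap_share_interval(
    grades: List[str],
    target: str,
    n_resamples: int = 2000,
    level: float = 0.9,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the share of `target` in `grades`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(grades)
    shares = []
    for _ in range(n_resamples):
        # Resample the cohort with replacement and recompute the share.
        sample = [rng.choice(grades) for _ in range(n)]
        shares.append(sum(g == target for g in sample) / n)
    shares.sort()
    tail = (1 - level) / 2
    lo = shares[int(tail * (n_resamples - 1))]
    hi = shares[int((1 - tail) * (n_resamples - 1))]
    return lo, hi


# Hypothetical past cohort of ten entries, three of them grade A
past_grades = ["A"] * 3 + ["B"] * 7
low, high = bootstrap_share_interval(past_grades, "A")
```

With only a handful of past entries the interval will typically be wide, which is the point: a single awarded grade hides how little the baseline can tell us for small centres.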

Ultimately, moving to Centre Assessed Grades at GCSE and A level seems to have been the most pragmatic and fairest approach that was available so late in the process. The key aim of Ofqual’s algorithm for adjusting grades was to maintain confidence in this year’s qualifications. Unfortunately, the events of the last week show that that confidence has been lost and we need to ensure young people are given the opportunity to progress in education or into employment.

We still have concerns about how grades may have been assigned within schools and the potential for there to be bias in those judgements against particular groups. We expect that Ofqual will publish updated equalities analysis with these changes and this will need careful scrutiny – particularly in relation to GCSE results. Equalities issues were always likely to be far more prevalent at GCSE than they are at A level simply because the A level cohort is restricted to pupils who have already done well at GCSE – so we plan to do much more analysis of our own as the detailed pupil-level data becomes available through the DfE’s National Pupil Database.

Overall, we believe that throughout this process too much emphasis has been placed on maintaining standards over time. While Monday’s announcement tips the scales the other way, it is nevertheless a fairer outcome for young people who might otherwise have missed out on courses in schools, colleges and universities.

But it does now create new challenges throughout the system, not least in higher education where offers have already been fulfilled or withdrawn based on the results published last week. Some universities and colleges have suggested that they will allow those originally offered places this year to take them up or at least defer a place until next year. The government has supported that by lifting the cap that it had placed on student numbers in higher education this year. But it does need to be mindful of protecting those institutions that are less in demand, which may struggle to recruit in sufficient numbers and suffer financially should the more sought-after universities expand rapidly. Similarly, the government may need to consider flexibility in the funding system to ensure that schools and colleges can offer places to students and enable them to move courses later on if necessary.

We urgently need a fully independent review of what happened this year so that the errors made are clearly understood and so that the right lessons are learned for the future. Ofqual have already consulted on exam arrangements for 2021, but apart from some relatively minor changes to a limited number of courses this is largely predicated on being ‘business as usual’. Students who are just about to begin Year 11 and Year 13 have already been affected by the pandemic and many will be competing for university places next year against candidates who have just received a Centre Assessed Grade.

We urge Ofqual and the government to develop a credible contingency plan in case the COVID-19 pandemic is still affecting schools next spring, and they need to do that now.

We simply cannot have a repeat of this year’s debacle.