How and Why to Make AI-resistant Assessments


 

I have spent most of the summer looking into, exploring and testing AI in relation to research and writing.

I have easily spent several full working weeks making the assessments for my modules on our BA programmes AI-resistant.

I wanted to say ‘AI-proof’, but that is not possible.

I find myself fretting about the number of academics who are still unlikely to change their assessments this year.

Yet, this is the year I think we will see AI use and abuse peak among students. So I have tried to make significant changes to my approach to assessments and grading (despite the huge drag factor of not being allowed to change the form or nature of the assessments significantly - a process which can take up to a year to do because of our university regulations).

Here is a brief rundown of some of the things I have established:

  • All standard forms of written and spoken assessment can and will be faked by abuse of AI. All of them.

  • Presentations and essays can be generated easily, within seconds, and if you use the right services, these will neither be generic nor easy to spot. Nor will it be possible to prove that they were AI-generated. They will be well-referenced and nuanced - and the references will be real. The submission of generic ChatGPT slop will only be done by those who have not yet heard of all of the other AI services that are out there for students and researchers.

  • To those (better) platforms, you can now upload module guidance documents, samples of work, or anything you would like to plunder or copy in form or style, and the AI will do the work for you within the terms and instructions of the module handbook, with one eye on the assessment criteria documents you have uploaded.

  • Therefore, any academic who doesn’t change their student assessments in light of this certainty is teaching their students nothing other than that they’d be a mug not to use AI, as AI will get them higher grades for a lot less work. People who don’t change their assessment this year are teaching their students that cheating is both easy and effectively mandatory. It’ll be the normalisation of cheating.


After loads and loads of work, here are some considerations that I have tried to factor into my assessment for this year:

  1. Make the student want to do this work. The best way to do this is to make it in some way about them. Everyone wants to talk about themselves, right? There is nothing more interesting. So, for me, rather than asking students to, for example, ‘use x theory to analyse y media text or film’, I am now asking them to choose a film or programme that mattered to them in some way, a text whose meaning or value for them has changed as a consequence of something on the module. This, I hope, will inspire all but the coldest, most closed down, disengaged students to want to open up intellectually about their relationship to something they care about.

  2. Tie the assessment somehow to something in lectures and seminars. In my case, I have asked students to reference conversations or points made in classes. They will be rewarded for this. This, I think, will also encourage attendance at lectures and seminars, and engagement while in the room. If it doesn’t, then students are missing out on potential points - points that could simply accrue from nothing more than paying attention while sitting in a room. (This aspect of the assessments will be hard to fake: even though my module is large with multiple seminar leaders, it will be easy for seminar tutors to share emails saying ‘hey, did this exchange ever happen in one of your classes?’ without compromising the mandatory but deeply flawed principle of anonymous marking.)

  3. Focus on process. I am requiring the submission of supplementary appendices documenting process and avenues that led to dead-ends. (These could of course be faked, but doing so would probably require as much effort as, or more than, simply including real process notes and evidence of issues wrestled with.)

  4. Focus on complexity and irresolution. I have actually changed the entire structure of my lectures to focus on complexity and irresolution - the lack of crisp, clean, singular meanings in and around media, and in one’s own head. AI doesn’t like irresolution. (Sure, it can be simulated, but to ask AI to do so requires working out prompts that themselves demonstrate some kind of nascent awareness of complexity and irresolution, which is the whole point.)

  5. Encourage stylistic variety. On my module, we are reading one work (by Ben Highmore) that argues for the value of stylistic variety, a work that itself employs multiple voices. Even if this is ‘faked’ using AI, it will take effort and a rationale on the part of the student for which voices to use in which parts of the work, and why.

  6. Reward evidence of conceptual messiness, journey and struggle. I’m sure AI could help with this, but I imagine the work involved in getting it to produce anything worthwhile would be greater than simply writing down the messiness, process and conceptual wrestling yourself.

  7. Say directly that you will penalise overly polished or generic AI slop. My hope is that this simple warning will be enough to scare students out of submitting it.

  8. Finally, and perhaps in a way that those teaching other subjects will not be in a position to replicate, I am lucky enough to be teaching a module on which theoretical, ethical and political reflection on technology is directly relevant. So, I have framed the entire ‘assessment week’ in terms of philosophies and theories of technology, and also in terms of questions about the point and purpose of education within society. Not everyone will be able to do this. But I am, and many in my field (media, communication, film, cultural studies) will be able to, and so they should. (I blogged about this on Substack, here.)

In working on and with all of this, I have repeatedly consulted AI services. There is no sense in which using AI has reduced my workload. It has massively increased it. But I think the end result is - or will be - worth it. The (almost) final version of the (now many-page-long) assessment guidelines is not AI-proof, but ‘AI-informed’ and AI-abuse-resistant. (I also blogged about my new guidelines, here.)

Despite some disheartening early moments in the process - AI told me that my assessment structure was really great, only to follow up with the question ‘would you like me to draft a sample piece of work using these guidelines?’ - it has recently been more reassuring. When I asked how easy it would be for an unscrupulous student to abuse AI in order to complete the assessments without really engaging with them authentically, it said:

It is not easy for an unscrupulous student to cheat and have AI generate all or most of the assessment successfully in this module. The structure, marking rubrics, process-record requirements, and transparency rules are crafted specifically to detect and disadvantage generic, non-reflective, or impersonally generated work. Only with considerable deception (both in falsifying personal learning journeys and documentation) could a student attempt to cheat—and even then, the chance of producing a high-scoring, undetectable assignment is very low due to the rubric’s focus on personal engagement and originality.

It has been a huge amount of work. I have transformed my assessment guidelines from a few hundred words into thousands. In fact, I have rewritten the entire module to focus on process, irresolution, polysemy, multivalence, etc. But I think this has been worthwhile - and, this year, arguably absolutely necessary.

I hope that (as frustrating as it has been not to be able to overhaul the entire assessment structure) more good will come of this in the coming semester than bad. For instance:

  1. Having now tied assessment to classroom events, I hope this will encourage better attendance.

  2. I also hope, by the same token, it will encourage better engagement while in the classroom.

  3. I have also now formulated explicit rewards for AI use and non-use, and explicit punishments for AI abuse.

  4. Crucially, I have also formulated grounds on which to deem something AI abuse, even in the face of an apparent ‘lack of evidence’. This is much needed in an era when we are constantly told that you can’t prove AI use. In my instructions, I simply say that anything that looks too much like AI - especially in the context of an overall piece of work that looks too much like AI - will be treated as sub-standard and unsatisfactory. This is not actually much of a change. It is pretty much a translation or update of a piece of guidance I have always used: ‘if you write about something “in general” then you are not writing about anything in particular, and this cannot get you more than 50%’. The current version simply updates that rule: if you are writing in a way that is clearly ‘too AI’, then that is not going to get you more than 49%. It is a clear stylistic and academic quality guideline. No slop!
