Some Advice for Handling AI in the Classroom This Fall
Should you focus on academic integrity or embrace ChatGPT this fall? Don't believe the hype: you can do both, largely by doing what you've always done.
Everyone seems worried that classes will be flooded with AI writing this fall. Many professors' first instinct will be to ban AI, which means policing violations with online detectors or their own intuition. But despite what you may have read, there is significant debate about whether AI detectors work, or at least whether they are reliable or equitable (for the record, I don't think they are). Beyond the fact that they do make false accusations, it is important to understand why such accusations are fundamentally impossible to prove. University lawyers are quickly discovering that this is a real problem.
So what do we do? Many seem to think the choice is between giving up on academic integrity and reinventing the wheel when it comes to assessment, but that's a false choice. What few seem to realize is that the fears around AI plagiarism are something of a red herring. As I'll argue here, if you take a traditional approach focused on essay writing and in-class testing (much as we've always done in History), students will have a very hard time using AI to write their assignments for them. Just as importantly, this approach leaves room for students to use AI for more legitimate purposes, like editing their writing, understanding concepts, and thinking through problems. You can, in other words, do what you've always done and still embrace AI.
Why You Won’t Know AI Writing When You See It (Even If You Think You Will)
To begin, though, we need to understand why trying to detect AI writing and prosecute offenders is a losing game. Large Language Models (LLMs) are designed to write, and they form sentences and paragraphs by predicting the next word based on a massive amount of training data. It sounds logical to conclude from this that an LLM should always answer the same question in much the same way, but this is not how these models actually work in practice.
Under the hood, LLMs like ChatGPT have a handful of settings (usually called things like temperature, top-p, and repetition penalties) that introduce randomness into the predictive process and actually penalize the model for repeating itself. When these settings are minimized, the models do indeed produce similar answers to similar questions. But when they are tuned the way they are on all commercial models like ChatGPT, the LLM chooses one of the most probable words to continue the answer rather than the single most probable word. This introduces a level of variation that has a minimal effect on coherence but a profound, cascading effect on the diversity of the syntax and content of the answers. In effect, it ensures that AI writing is almost never exactly the same twice, because the number of possible combinations of somewhat-probable words is unimaginably large: it grows exponentially with every new word added to the AI's response.
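If you want to see this mechanism for yourself, here is a minimal sketch of how temperature and nucleus (top-p) sampling work. The vocabulary and scores are invented for illustration; real models weigh tens of thousands of candidate words at every step, and commercial systems tune these knobs for you behind the scenes.

```python
import numpy as np

# Toy next-word distribution: raw scores ("logits") a model might assign to
# candidate continuations of "Lord Durham's Report was ...".
# These words and numbers are invented for illustration only.
vocab = ["published", "issued", "released", "written", "tabled"]
logits = np.array([2.1, 1.7, 1.5, 0.9, 0.4])

def sample_next(logits, temperature=0.8, top_p=0.9, rng=None):
    rng = rng or np.random.default_rng()
    # Temperature rescales the scores: higher values flatten the distribution,
    # making less-probable words more likely to be picked.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    # Nucleus (top-p) sampling: keep only the smallest set of words whose
    # cumulative probability reaches top_p, then sample from that set.
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]
    return rng.choice(keep, p=probs[keep] / probs[keep].sum())

# Each call can pick a different, still-plausible word, and the choice at one
# step reshapes every step that follows -- the cascade described above.
for _ in range(5):
    print(vocab[sample_next(logits)])
```

Run it a few times and the mix of words changes. Now imagine that draw happening at every word of a 500-word answer and you can see why two generations of the "same" response almost never match.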
This is why you typically won't be able to prove that something is AI-written: it is almost never the same twice. Now this does not mean that LLMs are not derivative, repetitive, and stylized. They are. LLM writing is generally bland and formulaic, largely because the underlying models have been fine-tuned to answer questions in ways that humans find desirable (the underlying base models do not do this nearly as well). This is why, if you ask ChatGPT to identify and explain the historical significance of Lord Durham's report, it will almost always start with something like: "Lord Durham's Report, officially known as the 'Report on the Affairs of British North America,' was published in 1839…" It will also usually end with something like "In conclusion, Lord Durham's Report is seen as a landmark document that had profound and lasting effects on Canadian governance and identity..." That is pretty bland, but there is an obvious problem with assuming that answers structured like this are automatically AI-generated: the AI was trained to write like this precisely because it is what humans ideally want to see when asked about the importance of the Durham Report. Go back in time a year: is this not how you would want the answer to begin if generative AI did not exist? There is a real risk that if we start assuming everyone is using AI writing, we will begin to see it everywhere, as one Texas professor discovered last spring.
The Perils of AI Detection Software
Enter the AI detectors. Because LLM writing is inherently unpredictable, so-called AI detectors (which are usually built on top of LLMs themselves) cannot simply match text against a database; instead, they are typically designed to look for these kinds of stylized formulations. They also look for an absence of spelling and grammatical errors and for specific patterns of logical reasoning, all of which are said to be typical of artificial intelligence. The problem, though, is that unlike conventional plagiarism detectors, these tools are rarely able to back up an accusation with hard evidence, only with an assessment of the likelihood that a piece of writing was AI-generated.
If that sounds "good enough" to you, think about it for a second: AI detectors are themselves LLMs that have been tasked with identifying AI-generated writing. This means that all the usual arguments mounted against generative AI apply here too. First, remember that we cannot explain how an LLM reaches any conclusion, and that remains true when a detector says a text was "highly likely" to be AI-written. Studies have also shown that the writing of ESL students and visible minorities is more likely to be flagged as AI-generated by these tools. Remember too that LLMs are prone to hallucination, meaning they sometimes just make things up. This is exactly why OpenAI, which released one of the first AI detectors last winter, quietly withdrew it a few weeks ago: it was inaccurate.
Now here is an obvious caveat. The fine-tuning process also means that LLMs will sometimes say things like: "As a large language model developed by OpenAI, I am unqualified to assess the significance of Lord Durham's report…" If you see this in a student's paper, then, to my mind at least, it constitutes pretty good evidence of AI use. But here's the thing: LLMs only rarely include such caveats, and when they do, I think most students would simply delete them before submission.
Don’t Reinvent the Wheel to Thwart AI
So does this mean that you should either give up or completely re-evaluate all your assessment methods? The simple answer is “no” and “not necessarily”.
The truth is, I think most professors are worried about AI plagiarism because they haven't yet become familiar with what generative AI can and cannot do. If you're worried about AI plagiarism but don't have a paid OpenAI account, as a first step I would strongly encourage you to fork over the $20 for at least one month and try out GPT-4. The free version, GPT-3.5, is not what students will be using: it will underwhelm you with its tendency to make up sources, hallucinate, and generally betray itself as a fallible AI at every opportunity. GPT-4 is far from perfect, but it is a wholly different animal from its predecessor.
The biggest reason to try GPT-4, though, is that it will help you face your worst fears. Cut and paste your assignment directions into the chatbot and see what you get back.
Don’t Use These Types of Assignments
After many months of working with both GPT-4 and Llama models, here are the types of assignments at which I think generative AI will excel.
First, ChatGPT is best at writing short answers of less than 750 words. It is extremely good at identify-and-explain types of questions, probably scoring around 90-100%, even on very obscure topics. So if you are still doing online exams or take-home tests, these are exactly the types of questions that ChatGPT will be able to answer in an undetectable way.
Next up are book reviews and primary document analyses. These staples of the history syllabus are relatively short assignments, and even when they run longer than 1,000 words, their formulaic nature makes them pretty easy to get ChatGPT to do. Even if a book is relatively new, GPT-4 will still do a pretty decent job of writing a critical review. The same goes for primary documents: try uploading a document to ChatGPT and asking it some questions about it. And if you were thinking of thwarting AI by using handwritten documents, take a look at Transkribus, which offers a limited, free version on its website. It has become even better at transcribing handwritten documents (thanks to AI), so keep that in mind. This doesn't mean that you need to abandon book reviews and document analyses, though. I would suggest that they need to be significantly longer: starting at around 2,500 words, students will have a hard time assembling a coherent paper from a variety of ChatGPT responses. Or consider incorporating them into in-class testing (see below).
Third, discussion board participation for grades. Again, this typically requires students to write short paragraphs in response to a question as well as to other students' posts. To generate a discussion board answer with ChatGPT, all a student needs to do is cut and paste the conversation into the chatbot and ask it to respond. Remember too that you can tell ChatGPT to adopt a persona: because ChatGPT was trained on Twitter (X?), Reddit, and Facebook posts, it can write informally when asked to do so.
Fourth, reflection papers are an equally easy target. Although ChatGPT cannot write more than around 750 words at a time, it can be prompted to expand its answers by cutting and pasting paragraphs back into the chatbot. Given the nature of reflective writing, as opposed to essays, this is relatively straightforward and quick to do. Again, here is where I would encourage you to try GPT-4 and see for yourself: you can ask it to adopt any perspective, attitude, or identity, or to write for a specific purpose and audience.
These Assignments Are Generally More AI-Resistant (But Also Good Pedagogy)
The ironic thing is that if you are worried about AI, a back-to-basics approach will probably work best. If you choose a longer written assignment and apply rigorous standards, any students looking to cheat with AI will most likely fail themselves through a lack of coherence, citations, or length, all without your having to meet the impossible burden of proving that AI was used in the writing process.
This is not because AI is not good at writing, but because it won't (at present) produce more than 750-900 words at a time. You can get it to do more by working in stages and with some elaborate prompting, but it would be a real chore to get it to produce a full, coherent research paper of 3,000-5,000 words. Indeed, crafting such a paper with AI would, in effect, require students to go through all the same pedagogical steps and processes as writing their own paper (hence why it may be worth considering letting them use AI anyway). The reality is that when students try to do this in order to cheat, they will almost certainly end up with a short, disjointed, poorly cited paper. It will fail on its own merits, no accusation of AI use required.
In addition to length, sourcing (and citation style) and the use of direct quotes are also big stumbling blocks for AI, but not for the reason you might think. GPT-4 is pretty good at coming up with real sources (unlike its predecessor), but it is not good at quoting them directly. When prompted to do so, GPT-4 will sometimes get a quotation verbatim, but more often it paraphrases the text within quotation marks. Misquoting a source is a problem in any academic work, and the intentional use of made-up quotations is grounds for failure regardless of AI.
In terms of citations, GPT-4 will also tend to get the general place in a book or article correct, but it will rarely get the exact page right. So if you are using a citation style that does not require page numbers (or URLs for web sources), I would start requiring them. Again, incorrect citations are always a problem, especially when they are pervasive.
Obviously, closed-book, in-class tests and exams are an excellent way of eliminating the AI problem too. But they also eliminate most other forms of cheating. Again, a traditional approach solves a lot of problems here.
Don’t Panic and Be Open to Innovation
The point is: don't panic. Students are going to use AI, and it will be mostly impossible to prove. But instead of giving up or reinventing the wheel, choose traditional assignments that teach students the skills they need to learn as historians but that will also be difficult to accomplish with AI. No, it won't be impossible to produce a 5,000-word AI research paper, and some students may see that as a challenge. But doing so and earning a good grade will take more work, and more domain-specific knowledge, than writing a B paper from scratch.
As worrying as AI might be for many of us, I think we should be wary of trying to avoid it altogether. After all, the students we are teaching right now will enter a workforce where they will almost certainly be expected to use AI in their research and writing. Pretending it doesn't exist is not going to be a winning strategy in the long run.
So as you prepare for the fall, consider the possibility that there may be some legitimate uses for AI in your classroom. Students who struggle with writing, including ESL students, can use generative AI as an editing tool. Because it is largely predictive, if you paste in an original paragraph and ask it to address grammatical and stylistic issues, it will generally do so without changing the substance of the ideas. When prompted, it can also explain to the student why it made those changes. It can do the same thing with readings, summarizing and explaining difficult concepts and filling in knowledge gaps. It can really be an effective and helpful tool.
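For the technically inclined, this editing workflow can even be scripted. Below is a minimal sketch using OpenAI's Python library (the ChatCompletion interface current as of this writing); the sample paragraph is invented, and you would substitute your own API key and adapt the instructions to your course's rules.

```python
import openai  # pip install openai

openai.api_key = "YOUR_API_KEY"  # replace with your own key

# An invented sample paragraph with grammar problems to correct.
paragraph = (
    "Lord Durhams report were published in 1839 and it recomend "
    "the union of Upper and Lower Canada."
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a writing tutor. Correct grammar and style only; "
                "do not change the substance of the ideas. After the corrected "
                "paragraph, briefly explain each change you made."
            ),
        },
        {"role": "user", "content": paragraph},
    ],
)

print(response.choices[0].message.content)
```

The system message is doing the pedagogical work here: it constrains the model to surface-level edits and asks it to explain itself, which is exactly the kind of transparent, tutor-like use you can sanction in a syllabus.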
Whatever you choose to do, though, be crystal clear about how students can and cannot use generative AI in your classroom. There is going to be a lot of variation out there this fall. And in all fairness to our students, remember that they will have a harder time navigating this confusing and paradoxical environment than you will. They also have a lot more at stake in a world where the rules are rapidly but unevenly changing.