The integration of generative AI directly into word processors will give rise to a new, acceptable form of synthesized AI-human writing and universities are not prepared.
Hi Mark!
Thanks for posting this piece. We have been having conversations at my own institution about this very thing for the past year or so. Same concern: GAI will become ubiquitous once it is seamlessly integrated into word processors and other standard office software; we need to prepare for this change.
I also signed up for Copilot Pro and integrated it with my personal Microsoft 365 account. Here are my observations of the product right now:
1. Copilot for Microsoft 365 is still very much a beta product. It is somewhat stapled onto the existing suite of Office applications (currently just Word, Excel, PowerPoint, OneNote, and Outlook). It has varying functionality and efficacy depending on what program you’re using (Excel’s version of Copilot is quite limited, especially with natural language data analysis, and nowhere near as sophisticated or useful as ChatGPT Plus).
The Copilot integrations are extremely buggy. The writer function in Word breaks repeatedly, throws up error messages, and sometimes just quits mid-sentence. But if you hammer away at it with prompts, you can get it to go and do much of what you say. Here’s a video of me writing a 2000+ word essay on the development of the welfare state in Canada in less than 15 minutes: https://youtu.be/htaX9qZR_e8?si=cpaehJbp8HtjgHh1
It’s impressive, but still limited. With that said, this is the worst this technology is going to get. It will improve with time.
2. The references are unreliable. In my tests writing AI-generated history essays, I struggled to get Copilot in Word to generate one real citation. I got some real historical quotes, but no accurate references. There were real journal titles, but fake articles and other fabricated citation details. This continues to be a severe limitation in this kind of software and I don’t know how soon the capabilities of GAI in generating footnotes will improve. Microsoft still hasn’t enabled the ability to add comments to footnotes so I’m not holding my breath just yet.
3. If you watch the video of my welfare state essay, you’ll see much of what you describe in your article. I can insert paragraphs, ask for particular details to be added, and expand this into a more complex piece of work. As I worked on this, a colleague asked me a key question: Would you be able to generate such effective prompts if you didn’t already have prior knowledge of the subject matter? I should probably try writing something on a topic I know nothing about. That would, at the very least, severely limit my ability to catch factual errors. For example, in an essay on the Fall of New France that I tried writing with Copilot, the first draft said that the Seven Years’ War was fought between Britain, France, and India.
4. Writing with Copilot in Word reminds me of vacuuming my living room with a robot vacuum. I have to move the furniture, clear cables and cords off the floor, check for cat toys that might have rolled under the couch, and then do a little sweeping after to catch areas the robot missed. At a certain point, writing with Copilot can start to feel more like just writing this thing myself. The line can get blurry, but from the perspective of academic misconduct, the work I need to do to achieve higher-order thinking and analysis in my writing with Copilot starts to exceed the work of just writing the essay without it.
I think this is where GAI is driving those who teach in higher education to rethink assessment. As you say, we will need to have difficult conversations about writing as a method of assessment. There are types of assignments that may no longer make sense, and we may have to push for assignments that ask for higher-order thinking and analysis. For now, we can start by asking for some research sources and citations.
Please keep up your writing on this subject. I’m reading with great interest.
It’s hard to understand what is actually happening with Copilot in Word behind the scenes. It is supposed to be powered by GPT-4 but yes, when you ask it for sources it responds much more like GPT-3.5. It seems to be better on internal search/writing, which makes me think it’s GPT-4 for internal queries and 3.5 for external. Either way, my assumption is that the underlying engine will be upgraded over time as they build capacity.
I think we need to keep in mind that it was not designed to write research papers but was built to write and summarize documents for use in the “real world”. And I think that creates a real disconnect: businesses will expect employees to use this technology because it speeds things up. But doing that effectively (and making the process actually faster) requires a very different skill set than most of us currently teach. Finding a way to square those shifting externally imposed expectations with our teaching is not going to be all that easy.
Very good point about the purpose of these tools. We have been reviewing ChatGPT Plus, Copilot Pro (with Microsoft 365 integrations), and now Gemini Advanced. Each has different capabilities and strengths, but none are designed for the purposes of conducting historical research (or any sort of scholarly research really).
Copilot and Gemini seem mainly to be trying to perform as search engines of a sort. ChatGPT Plus has been the best (in my experience) with analyzing documents and data. These latter functions I've found the most useful for my own research.
One more thing:
None of this currently works with commercial licenses at universities unless your institution has (a) enabled the functionality and (b) paid the additional user fees on top of the standard institutional license. Most universities use A3 or A5 Microsoft 365 for Education licenses. It costs $30 per user more to add Copilot for Microsoft 365.
All the testing I did was on a personal account.
One more thing again:
Copilot for Word cannot insert formatted footnotes. It just seems to create manual footnotes without using the built-in footnote function under the References tab. Strange limitation. But as you note, the sidebar Copilot cannot currently format a document. Further demonstrates that this tool is still very much in development and buggy. Feels like it's been attached to Word with duct tape.
Hi Sean! Fully agree. Copilot is definitely underwhelming. Personally I was really hoping it might automate some of the clunkier things in Word, but no. Same is true of Windows Copilot. Interestingly, Microsoft released an open source Windows extension last week called UFO that apparently will do this (i.e., you can ask it to open a web browser and send an email for you). It is here: https://github.com/microsoft/UFO. I haven’t tried it but it sounds like it runs on GPT-4 Vision, so it looks at the screen, visually finds the thing it needs to click, and “clicks”. This stuff is all very buggy indeed!
Hi Mark (and Sean), thanks very much for this discussion.
I have a "newbie" question. If I were to write an original research paper in Word with Co-Pilot enabled, would that mean that my writing would be added to Co-Pilot's "memory" so to speak and then my own in-progress research findings would become publicly available?
Case in point, I downloaded Co-Pilot Pro this morning to test it out in Word, then later on started to draft a new research article in Word wherein I was going to put the results of my recent research, but that little "Co-Pilot" cursor was blinking at me. It certainly gave me pause.
Does enabling Co-Pilot in MS Office make our data less secure than ever? What are your thoughts? Thanks in advance.
Hi Donica, great question. The short answer is "no". Microsoft's privacy policy is here:
https://learn.microsoft.com/en-us/microsoft-365-copilot/microsoft-365-copilot-privacy#data-stored-about-user-interactions-with-microsoft-copilot-for-microsoft-365.
It is pretty explicit: "Important: Prompts, responses, and data accessed through Microsoft Graph aren't used to train foundation LLMs, including those used by Microsoft Copilot for Microsoft 365."
I would add for clarity that in this case, the "data" is the paper you are working on in Word.