32 Comments

Totally agree! It’s the detective work I enjoy too, and I don’t think that will go away. But that is a different issue than the question of how far and how fast the economic value of human research and analysis will decline in the face of automation. I often think of furniture making as a good example. I am a woodworker, I love building arts and crafts furniture in my spare time, and I am pretty good at it. I’ve built a few pieces for other people too. Seventy years ago I could have made a career of it, but the reality today is that it would be very difficult to thrive. The problem is that for most people a $200 IKEA coffee table is good enough compared to a custom one for $1,800. That doesn’t change the enjoyment I get from my hobby, but it doesn’t change the economics of the business either. Certainly there are still professional custom furniture makers, just far fewer than there were 70 years ago.

Interesting perspective, but I’m not sure I agree. Deep Research is already an agentic system: it conducts systematic, unsupervised research on its own to solve a problem. In terms of reasoning, from a practical standpoint these models "reason" in that they are able to parse a question, distinguish relevant from irrelevant sources, read the material to find information that answers the question, and then write a coherent answer. You might look at the historiographical example, as it is a more coherent, cogent response to a more general question. I am not sure how one would argue that an LLM could simply infer the answer from the question in that case.

Thank you for this. I remain profoundly skeptical and also really surprised that so many people truly believe we have no control over how we as a society will use these technologies, and are not questioning the motivations behind their rapid expansion. We are in a time of radical attacks on history and especially on recent generations’ expansion and reclamation of suppressed histories. The idea that history is a simple and apolitical telling of the past is highly problematic. The threats to the profession that will make AI seem like an obvious choice are defunding, loss of tenured lines, attacks on the humanities, the rise of business schools, and the debasing of work by “women and DEI hires.” You may be taking this context for granted, but I think as historians we need to acknowledge it.

I totally agree that this is part of a bigger picture. To be clear, I do think we can have control over how we choose to adopt the technology, but that requires a realistic knowledge and understanding of it. My sense is that most historians (and people in general) are not ready to engage in those discussions in a meaningful way due to a lack of awareness of how generative AI works, its actual capabilities and limitations (versus wishful thinking), how it will be used in the knowable future, and the ways in which it is already being used. My fear is that if we continue to ignore AI, fail to keep up with developments, or pretend it can't do the things it can very obviously do, we won't have a meaningful say in those discussions. Personally, the last thing I want to see is the big tech companies dictating how we come to use AI because the learning curve was too steep for most people to climb.

They won’t, because only a small percentage of archival collections have been digitised and an even smaller percentage have had handwritten text recognition applied.

How many secondary sources have been OCR’d or are in digital form? Again, a very small percentage. It will be a very, very long time before all relevant resources are available online. And what about all the resources that have been lost to fire, flood, and the like? There are always gaps in the archival record.

What does a historian really do? They interpret, they interpolate. AI requires historians to reassess what they do and to evaluate what these tools can actually do and offer. AI won’t replace them; it will be an assistant.

Some historians even hallucinate as well! AI will hit the wall of the actual and real losses in the archival record. It will have to work with the remains.

Interesting. I think a key point to remember here is that LLMs can now read handwriting as well as humans can. The same goes for non-OCR’d printed text. And the reality is that the vast majority of printed secondary sources have been digitized.

I disagree. In New Zealand that is not the case: a lot hasn’t been digitised and never will be, for reasons of cost, priorities, and policy.

The key issue with LLMs is that, while some have been trained on a variety of handwriting styles and texts from different time periods, when you try them on the texts in front of you the recognition often isn’t great and needs close checking and correction. There are just some texts where you have to train a model from scratch. My comments come from using Transkribus on 19th-century archival documents. Transkribus now has great pretrained models, which I would use first to see what they produce and then evaluate whether a pretrained model is satisfactory or not. Examples where pretrained models aren’t great are texts containing many handwriting styles; these often mix print and entries in tables with the handwriting. Letterpress books containing copies of outward letters are another problem. At this point in time it is ‘horses for courses’: you have to analyse the texts you are working with (their context, their characteristics, etc.) and, given the project objectives, decide on the best approach.

Once you have transcripts in digital form, there are also considerations about metadata (content, format, and document structure), presentation of the data, and reuse of the dataset, say for visualisations or for analysis and searching of the data.

Have you tried my Transcription Pearl software? LLMs beat Transkribus by a large margin, and they are not “trained” to read handwriting; that is an emergent capability. If you combine Transkribus text with Sonnet 3.5, the error rates are at about human level (i.e. 98% accuracy) for 18th- and 19th-century English.
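
For anyone wondering what combining Transkribus output with an LLM might look like in code, here is a minimal sketch using the Anthropic Python SDK. To be clear, the model ID, prompt wording, and correct_transcript helper are illustrative assumptions, not the actual Transcription Pearl pipeline.

```python
# Hypothetical correction pass: feed a Transkribus HTR draft to Sonnet 3.5
# and ask it to fix recognition errors while preserving the original text.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def correct_transcript(draft: str) -> str:
    """Ask the model to correct a draft HTR transcription."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model ID
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": (
                "Below is a draft transcription of a 19th-century English "
                "manuscript produced by handwritten text recognition. Correct "
                "obvious recognition errors, preserve the original spelling "
                "and punctuation, and return only the corrected text.\n\n"
                + draft
            ),
        }],
    )
    return response.content[0].text

print(correct_transcript("Dear Sir, I recieved yor lettre of the 12th inst."))
```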

The research it wrote was consistent-looking, but it overstretched some conclusions: the first conclusion, about the copper deposits, is not at all supported, and the next one, about river systems, is better supported but makes the connection vapid. If you hadn’t formed your question well enough, it would never have connected these things. It just uses a format that looks proper, but the reasoning is still the same. Re-using the same iterative processes over several cycles, and extending the workload, isn’t going to make it agentic. There’s still going to be the annoying aspect of asking it questions and knowing when you’re simply not going to get anywhere with it. I wouldn’t worry about this to the degree you do. It’s not a hardware or software issue per se so much as the way it literally functions. What you call a hallucination is literally all it does. It’ll study papers and sources and make everything look like them to the best degree it can, but it’ll always be vapid, because real reasoning is nothing like that. It’s shooting all the real reasoning that you gave it right back at you.

I think the role of the serious historian is more important than ever, even in the starkest view, but you have to take the starkest view seriously to see the serious historian’s real value. You are the last mile that matters. https://hollisrobbinsanecdotal.substack.com/p/its-later-than-you-think

Hi Mark. I enjoyed this post. I look forward to the upcoming discussion.

I'm an amateur historian, not a professional, focused on preserving family histories.

I wonder, for family histories prepared by amateurs, whether an AI-generated history might be acceptable compared to the alternative of having no written history at all, since few people working in family history have the skill or commitment to produce a finished work.

For those of us doing this work in the last few decades of our lives, it will be fascinating to see how these new tools impact our small part of the world, where so much information is not yet online, and the final work requires capturing memories that may not yet be written down.

I think AI-generated histories will become an important part of family history research. Lots of people really enjoy the detective work involved in genealogy, but writing it up into a book or report is often a daunting challenge. AI probably helps with that enormously.

Tried it out too, with Grok. Definitely impressive, and it can cut out aspects of social science and humanities research. But it is still highly derivative. I think real in-person work, such as archival research, will become even more valuable.

That may well be true and I hope you are right. When I try these technologies, though, I focus less on what they can do now than what they will likely be able to do in the near and medium term. This is the worst they will ever be.

True. But I think they are very far from a real-world interface, e.g. walking into an archive and combing through materials. Someone still has to create the material they will use, even if that means a more limited role for the profession.

Excellent essay, Mark. I'm a writer, not a historian, but I've had the same unsettling reaction to ChatGPT. The upgrades in the last several months have markedly improved the AI's inferential and critical "thinking" abilities. As we speak, ChatGPT is as talented as any top editor I've ever written for. There are still too many factual errors, but its reasoning capabilities are profound. But I'm *very* concerned about humans outsourcing critical thinking skills. We already have a shortage of people with critical thinking skills.

Thanks, Mark. I’ve spent limited but intensive time in a couple of archives for my history project. Now I’ve tried ChatGPT and Perplexity for some initial deep-digging research and found the results to be occasionally time-saving but mostly ignorant. The reason is as others have mentioned above: much of detailed history, even as recent as the 1950s, is still in paper form or behind paywalls (ever tried to obtain a simple time series of past energy prices or exchange rates?). And I’m unlikely to try Deep Research at $200/month. But of course you’re also talking of possibilities, and there the prospects are amazing.

I think the application is interesting, but I’m curious about the market that will primarily use it. I know so many researchers, myself included, who are in the archives for the stories and the dopamine boost: the lure of discovery. I’m just not ready to hand over the reins. For passion’s sake, I guess.

Thank you for this excellent article about Deep Research. I have no expertise in using AI, so I appreciated that you explained it very clearly. Even if we do not use AI directly, we need to be aware of the impact it has on our lives.

If AI is going to allow people interested in history to pick or design the type of virtual historian they want to study under or use as a study partner, then we can be fairly sure about a few parameters:

o It will not care at all for scholarly traditions surrounding the PhD, including status markers, hazing rituals, and hurdles or protocols set up to defend the elite status or sensibilities of PhD holders.

o It will care, probably too much, about maximizing the student’s financial returns after graduation. Many students will steer development of the virtual tool in this direction to the exclusion of other factors.

o The Socratic method of lecture and class discussion will probably die off. Students are already on their phones throughout lectures. Why should they bother to attend in person when their phone can deliver a bespoke lecture for them? The only thing keeping students in class is the external check that poor attendance penalizes their grades.

o It will put the requirement that doctoral students and new professors conduct original research further at odds with the requirement that their published work also be widely read and cited if they want promotion and tenure. Spending years on original dissertation research will seem like a misallocation of resources.

o As AI is now critical to the understanding and production of knowledge in every academic discipline, it’s fair to say that universities that want to remain competitive will have to change PhD requirements to remove what is outdated and add formal study of how AI is applied to the discipline. Indeed, it might be more useful to the discipline for a PhD student to research a new AI application than to submit a traditional dissertation.

Much like John Warner argues that ChatGPT should kill off the five-paragraph essay, it seems like an LLM trained to take its time crawling the web and looking for sources will kill off the lit review. And, as you show, these models may aid historians in analyzing unstructured data in the form of letters and diaries. That is, if they can be made more reliable. Ben Breen’s writing at Res Obscura is a wonderful window into some of this potential.

You lose me with your headline. It is a big leap to imagine that this technology can replace historians. As Jade E. Davis puts it in The Other Side of Empathy, "Artificial Intelligence assumes the past has the answer to the future’s questions." AI can't ask the future's questions, though it may help historians answer today's questions.

LLMs are interesting new cultural tools, and they will extend the work of the digital humanities in ways we can barely see. But, replacing historians? Come on.

Interesting analysis, and thanks for the comment. On my good days, I would tend to agree with you that these tools will augment rather than replace. But I think we have to think through how even modest improvements to current models might significantly shift the equation. If you give Deep Research access to primary sources and academic secondary sources, it can develop its own (sensible) research questions when given the simple task of “identify and fill a gap in the literature.” It can (right now) write a very good paper that does this. The main limitation at the moment seems to be its ability to access the right sources, and that is a “solvable” technical problem rather than a limitation of the models. That is essentially what we do.

I want to understand how we articulate and demonstrate value for a human research process that can take months or years versus a machine process that takes minutes but achieves similar ends. Will students want to learn how to do this the “grass-fed” way when they can just ask a model to answer a given question? Will governments fund this type of work? I understand that the reality of this is nearer in some sub-fields than others, depending on how many readable digital sources exist, but that is how automation works: it plays out unevenly, hitting the lowest-hanging fruit first.
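
To make the kind of task I mean concrete, here is a minimal sketch of such a prompt using the OpenAI Python SDK. The model name, the propose_research_question helper, and the prompt wording are my illustrations; this is not how Deep Research itself is invoked.

```python
# Hypothetical "identify and fill a gap in the literature" task prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def propose_research_question(bibliography: str) -> str:
    """Ask the model to find a gap in a literature and frame a question."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[
            {"role": "system",
             "content": "You are a research assistant for a historian."},
            {"role": "user",
             "content": ("Given the following bibliography, identify a gap "
                         "in the literature and propose a sensible research "
                         "question that would fill it:\n\n" + bibliography)},
        ],
    )
    return response.choices[0].message.content
```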

These are the right questions, I think. And it may simply be a failure of imagination on my part to think there is something inherently human in asking interesting questions about the past. One question worth keeping in mind is what we want to automate. To repeat Joseph Weizenbaum's main point, there are some things we ought not have machines do.

Fully agree, and this is exactly what we need to start talking about.

Great post, Mark. It raises a few fears that, I think, are unfortunately increasingly likely to materialize. One is the reduction of the historian to digitizer and data organizer. I wonder if more and more the productive energies of historians will be channeled toward the problems associated with preparing data for use by AI agents, including most basically digitization but also maintaining/assigning bibliographic identifiers and other metadata that augment the work of AIs. Another fear is the adverse incentives that now exist to isolate or remove data from such processes. Historians threatened by AI will have even greater incentives to hoard images and text. Less perniciously, there will be a vague but powerful incentive to channel research toward fields whose data is least accessible to both AI agents and to other humans. But there is much to be excited about as well—I know I will certainly enjoy having custom, well-sourced histories of narrow topics available on demand.

Agreed! My prompt to MidJourney for the image at the top was "an AI robot historian frustrated with the speed of its human historian research assistants" (or something to that effect). That sounds like a terribly dark timeline.

On the second point, I think that is a real concern and also probably already happening to some extent. But I don’t think it will be all that sustainable, even in the short term. How would one publish or communicate results without sharing them with AI? If a human can find it, so can Deep Research. In Canada, those of us who hold SSHRC grants are also increasingly expected to make our raw primary sources available via institutional repositories. Everything we touch becomes digital in some way.

This leaves aside the very difficult conversations governments are going to start to have with universities about the efficiency of human vs AI research and teaching in the near future. That is not a good thing, to be clear, but it will happen. Cloistering ourselves away is unlikely to help.

As someone who does local history, I have not been impressed with the AI tools released over the last few years. Based on your article, I tried Deep Research and found it to be nearly as flawed as earlier ChatGPT versions. I think this has more to do with the fact that our local history is very poorly documented online.

Thank you for the article. I found it thought-provoking, and action-provoking too.

Are you sure it was Deep Research you tried? The names get very confusing. To access Deep Research you need a Pro subscription to ChatGPT, which is $200 per month.

Apparently I did not. My mistake.

I wish the AI companies would stop using the same names for everything. It’s getting impossible to follow!
