Introducing Archive Studio
An experimental research tool for exploring AI transcription and analysis of historical documents
Last fall, we introduced Transcription Pearl, a tool that used AI to transcribe historical handwritten documents. At that time, the best models were performing better on English-language documents than well-known programs like Transkribus, but they were still far from usable “out-of-the-box”. As Mark wrote a few weeks ago, though, that all changed with Gemini-2.5-pro: now you can get really accurate transcriptions (better than 95% accuracy, that is, a word error rate below 5%) on the first iteration.
This brings us to Archive Studio, an open-source, more comprehensive program that attempts to automate some of the more mundane elements of historical document processing. Our goal here is to explore what AI might offer historians, hopefully allowing them to move seamlessly from images through transcription, correction, analysis, and structuring.
That said, many of the features we’ve added are experimental and are mainly about exploring the potential (and problems) of using AI in research. If people find them useful, that is great, and we’d love to hear about it. But if you run into problems, or the AI generates inaccurate results, that is equally important, perhaps more so. The goal here is to help us generate information about what AI tools can and cannot do (yet?). Knowledge of both will help us have more robust discussions about how to respond to the technology.
Introducing Archive Studio
Archive Studio incorporates the original handwriting recognition and AI-powered correction capabilities available in Transcription Pearl, allowing users to generate initial transcriptions from images or refine existing ones using models from OpenAI, Anthropic, or Google. As we note above, though, we’ve found that with Gemini-2.5-pro, the correction step may no longer be worth the time.
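For readers curious about what such a call looks like under the hood, here is a minimal sketch using Google’s google-generativeai Python library. To be clear, this is our illustration of the general pattern, not Archive Studio’s actual code; the file name and prompt wording are placeholders.

```python
# A minimal sketch of an LLM transcription call. Illustrative only:
# the file name and prompt wording are placeholders, not Archive
# Studio's actual implementation.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-2.5-pro")

page = Image.open("journal_page_001.jpg")  # hypothetical page image
prompt = (
    "Transcribe the handwritten text in this image exactly as written. "
    "Preserve original spelling and line breaks. Return only the transcription."
)

response = model.generate_content([prompt, page])
print(response.text)
```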
You can download and run Archive Studio (which is free and open-source) from GitHub here: https://github.com/mhumphries2323/Archive_Studio. Right now you need to run it as a Python script. As with Transcription Pearl, we will shortly release a standalone .exe file for Windows.
We’ve also added some features to the integrated Image Preprocessing Tool, which provides essential functions for preparing document images. This separate utility allows users to split two-page spreads, crop unwanted margins, manually or automatically straighten skewed images, and rotate them, ensuring cleaner inputs for the AI functions and potentially reducing processing costs. Batch processing features within this tool help streamline the preparation of large numbers of images.
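To give a concrete sense of these operations, here is a rough sketch of the same kinds of transformations using the Pillow imaging library. The file names, skew angle, and margin sizes are placeholders; Archive Studio’s own tool handles this interactively.

```python
# A sketch of the kinds of operations the Image Preprocessing Tool
# performs, using Pillow. Values are placeholders.
from PIL import Image

img = Image.open("spread_001.jpg")  # hypothetical two-page spread
w, h = img.size

# Split the spread down the middle into two pages
left_page = img.crop((0, 0, w // 2, h))
right_page = img.crop((w // 2, 0, w, h))

# Straighten a slightly skewed page (angle found manually or automatically)
straightened = left_page.rotate(-1.5, expand=True, fillcolor="white")

# Crop unwanted margins before sending the image to the AI
cleaned = straightened.crop((50, 50, straightened.width - 50, straightened.height - 50))
cleaned.save("page_001_left.jpg")
```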
The real expansion, however, lies in the new tools designed for working with the transcribed text. Researchers can now employ AI functions to automatically format text according to defined styles, remove awkward line or page breaks, and standardize layouts or dating formats for better readability or use in external databases.
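These cleanup passes are prompt-driven. As a rough illustration (the wording below is ours, not one of Archive Studio’s shipped presets), an instruction for such a pass might look like this:

```python
# An illustrative formatting prompt of the kind these passes rely on.
# The wording is ours, not one of Archive Studio's shipped presets.
format_prompt = """Reformat the following transcription for readability:
- Join lines that were broken mid-sentence by the original page layout.
- Keep genuine paragraph breaks.
- Rewrite dates in the form 'DD Month YYYY' (e.g., '2 March 1821').
- Do not change spelling, and do not add or remove any words.

Transcription:
{transcribed_text}"""
```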
Analytical Features
Verifying the accuracy of the text and analyzing the data both require identifying key information. Archive Studio includes functions to automatically identify, highlight, and extract the names of people and places mentioned within the text. These aren’t just static lists, though: a dedicated collation tool manages variations in spelling and can standardize them across the entire project if desired.
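For intuition, the basic collation problem can be sketched with Python’s standard difflib module, though Archive Studio’s tool is AI-assisted and interactive rather than purely string-based; the names below are invented examples.

```python
# The basic collation problem, sketched with difflib. Archive Studio's
# collation tool is AI-assisted; the names here are invented examples.
from difflib import get_close_matches

extracted_names = ["McKenzie", "Mackenzie", "M'Kenzie", "Stuart", "Stewart"]
canonical = ["Mackenzie", "Stuart"]  # standard forms chosen by the researcher

standardized = {
    name: (get_close_matches(name, canonical, n=1, cutoff=0.6) or [name])[0]
    for name in extracted_names
}
print(standardized)
# {'McKenzie': 'Mackenzie', 'Mackenzie': 'Mackenzie',
#  "M'Kenzie": 'Mackenzie', 'Stuart': 'Stuart', 'Stewart': 'Stuart'}
```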
To further aid historical analysis, especially with large collections, Archive Studio offers a relevance assessment feature. Users can define specific research criteria and use an AI model to analyze each document, determining whether it is “Relevant” or “Partially Relevant.” Users can then quickly click through only those documents relevant to their criteria. This potentially allows for efficient filtering and focusing on the most pertinent materials within a large dataset. That said, we have very little information on the accuracy of these AI assessments, so we look forward to hearing your feedback.
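A minimal sketch of what such a relevance check involves, assuming an OpenAI-style client; the model name, the third label, and the prompt wording are our own illustrations, not Archive Studio’s exact implementation.

```python
# A sketch of a relevance check, assuming an OpenAI-style client.
# Model name, third label, and prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def assess_relevance(document_text: str, criteria: str) -> str:
    """Ask the model for a single relevance label."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; any capable model
        temperature=0.2,
        messages=[
            {"role": "system",
             "content": ("You assess whether a historical document matches a "
                         "researcher's criteria. Answer with exactly one label: "
                         "Relevant, Partially Relevant, or Not Relevant.")},
            {"role": "user",
             "content": f"Criteria: {criteria}\n\nDocument:\n{document_text}"},
        ],
    )
    return response.choices[0].message.content.strip()

entry_text = "Wednesday 2d. Two of the men laid up with the measles."  # invented example
print(assess_relevance(entry_text, "mentions of epidemic disease"))
```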
Sequential Data and Document Separation
One of the significant challenges we’ve had in using AI with historical documents is the sequential nature of historical data, where key information often needs to be inferred from earlier elements of a lengthy text. Think of sources like diaries or letterbooks, which are continuous in nature. For example, a year-long fur trade journal may mention the year only on the first page, while individual entries might be dated only with the day and omit the month altogether. These journals, although associated with a specific place, were also often written at more than one place over the course of the year as clerks travelled. Authorship, too, can change from one entry to the next. Without the right tooling, this critical historical context gets lost, resulting in a disembodied diary entry that might simply begin “Wednesday 2d”.
Archive Studio handles this problem through a multi-stage document separation process. First, an AI function analyzes the text, using customizable presets designed for different document types like “Letterbook” or “Diary,” to identify break points like the start of a new letter or entry. It inserts editable markers, joining text across page breaks where appropriate. Then, after verifying their accuracy, the researcher can apply the separations. The result is that, in the case of a 20-page post diary, instead of navigating page by page, the user can move between, say, 300 discrete entries.
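Conceptually, the process looks something like the sketch below; the marker token is our own invention for illustration, not Archive Studio’s actual format.

```python
# The two-stage separation, conceptually. The marker token below is
# our own invention for illustration, not Archive Studio's format.
MARKER = "<<ENTRY>>"

# Stage 1 (AI pass): the text comes back with a marker at each break point.
marked_text = """<<ENTRY>>Wednesday 2d. Fine weather. Men cutting wood.
<<ENTRY>>Thursday 3d. Two canoes arrived from the Forks with furs.
<<ENTRY>>Friday 4th. Sent two men to the fishery."""

# Stage 2 (after the researcher verifies the markers): split into entries.
entries = [e.strip() for e in marked_text.split(MARKER) if e.strip()]
print(len(entries))  # 3 discrete, navigable entries
```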
Customizable Functions
The behavior of all the AI functions is governed by customizable presets found in the Settings window. This allows researchers to select specific AI models, adjust parameters like temperature, and, critically, change the prompts and instructions given to the AI for each task from transcription to metadata extraction. Settings can be saved, loaded, and exported for sharing or backup.
We’ve found that the development of good, project-specific customized functions is essential. Thinking of our fur trade post journal example again, these documents often contain a mix of diary entries, letters, and tabular data like financial accounts, lists of supplies, and prices. To handle this variation, you can develop customized instructions which tell the AI what to do with each type of document. This may sound complicated, but it is actually quite simple.
In writing custom instructions, provide headers, explain what you want the AI to do and what you don’t want it to do, and give clear examples. Your instructions (which are technically just the prompts sent to the LLM) can be as long as you want. We’ve found that it’s best to describe your overall goal, the specific task, and requirements in the “General Instructions” (which is sent to the LLM as a system message). You can keep the “Specific Instructions” briefer. Generally, a lower temperature setting (that is, closer to 0 than to 1) is better for consistency and accuracy; on many tasks we suggest 0.2 to 0.3.
You can also use the settings to configure whether the LLM is sent images of the text on the current page or images of the previous or following pages (as well as how many of those images to send). This is useful when you want the AI to be able to read ahead (or behind) the current page of the document.
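Putting these pieces together, a preset might conceptually look like the following. The field names and values here are illustrative, not Archive Studio’s exact schema; the Settings window holds the real options.

```python
# An illustrative preset. Field names and values are our own, not
# Archive Studio's exact schema.
preset = {
    "name": "Fur Trade Post Journal",
    "model": "gemini-2.5-pro",
    "temperature": 0.2,  # low temperature for consistency
    # Sent to the LLM as the system message:
    "general_instructions": (
        "You are processing a daily fur trade post journal. Entries begin "
        "with a day of the week and day of the month. The journal also "
        "contains copied letters and tabular accounts; transcribe tables "
        "row by row rather than reformatting them as prose."
    ),
    "specific_instructions": "Process the text of the current page.",
    "images_before": 1,  # also send the previous page image as context
    "images_after": 0,
}
```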
Metadata Generation and Structured Data
Transforming transcribed text into usable data requires extracting key pieces of structured information. Archive Studio attempts to automate this process, again relying on user-defined “Metadata Presets” which are configured within the Settings window. These presets act as blueprints, specifying not only which metadata fields the user wants to extract or create (such as author, date, place of creation, recipient, or a concise summary of the document), but also providing the precise instructions, or prompts, that guide the AI on how to locate or infer this information from the document’s text.
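As a rough illustration of the pattern (the fields and wording are examples, not a preset shipped with Archive Studio), a metadata prompt might ask the model to return JSON that the program can parse:

```python
# A sketch of prompt-driven metadata extraction. The fields and
# wording are examples, not a shipped Archive Studio preset.
import json

metadata_prompt = """From the document below, extract these fields as a JSON object:
- "author": the writer, inferred from context if the document is unsigned
- "date": in YYYY-MM-DD form, inferring the year or month from earlier entries if omitted
- "place": the place of creation
- "recipient": for letters; otherwise null
- "summary": a one-sentence summary of the document

Return only the JSON object.

Document:
{document_text}"""

# The model's reply can then be parsed directly into a record
# (sample_reply is an invented example):
sample_reply = ('{"author": "the post clerk", "date": "1821-03-02", '
                '"place": "unknown", "recipient": null, '
                '"summary": "Routine activities at the post."}')
record = json.loads(sample_reply)
```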
Again, this is an experimental feature and we include it because it is important for us, as historians, to understand how effective these models are at these types of tasks in order to make decisions about how to respond to their availability. If they are useful and accurate—or at least show promise—that is important to know. But if the models fail on these common tasks—or make significant numbers of errors—they may prove to be quite harmful to the research process. Either way, we need to know.
Once documents have been processed and analyzed within Archive Studio, you can export the results in a variety of formats, such as PDFs, text documents, or a CSV (comma-separated values) spreadsheet file that you can use in programs like Excel or SPSS. The CSV export is especially important, as it is where the metadata and other extracted information can be stored and accessed.
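For those unfamiliar with the format, writing such records to CSV takes only a few lines with Python’s standard library; the columns shown here are illustrative, not Archive Studio’s exact export schema.

```python
# Writing extracted records to CSV with the standard library.
# Columns are illustrative, not Archive Studio's exact schema.
import csv

records = [
    {"page": 1, "date": "1821-03-02", "author": "the post clerk",
     "place": "unknown", "summary": "Routine activities at the post."},
]

with open("archive_export.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["page", "date", "author", "place", "summary"])
    writer.writeheader()
    writer.writerows(records)
```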
Final Thoughts
Archive Studio is far from a finished product; it is a prototype, and a test case at that. While we have published on the accuracy of LLM transcriptions, which offer a clear advantage over other automated options, we know surprisingly little about the accuracy of LLMs on analytical tasks. As a result, the use of the analysis features embedded in Archive Studio carries clear risks and, in some instances, may create more methodological problems than it solves. In many cases, the accuracy and usefulness of the outputs will depend on the type of instructions and criteria the user gives the LLM and the performance of the individual model assigned to the task. But in other cases, LLMs might simply not be up to the task you want them to perform.
We simply don’t know how good these models are at the types of tasks historians might ask them to perform, and we hope you will report your results in the comments below. In our testing, LLMs sometimes miss a name or place name and do not always identify all the relevant documents in a given set. It’s uncommon, but not exactly rare. For this reason, always check and verify any results, and please report your findings!
Ultimately, Archive Studio is meant to be a contribution to the evolving conversation about how historians and archivists can productively engage with AI. It aims to provide a tangible means for researchers to experiment with these tools and evaluate their potential usefulness (or not).
Here it is important to remember that, at present, most people are still using the kinds of chatbot interfaces that don’t really provide a fair test of an LLM’s actual capabilities. Many of the issues people experience using programs like ChatGPT come from the tooling around the models (that is, how information is passed to the model) rather than from the models themselves.
It may well be the case that LLMs will fail at some of the most basic things we do as historians, even when given the right tooling. If that is the case, we will need to identify and push back against the potential problems created by automated AI analysis—which is coming from private companies very soon. But if the models prove to be largely capable, that will necessitate other conversations. In truth, LLMs are likely to be capable in some areas, less so in others. Either way, we need to understand what these things can and cannot do to move forward.