Open-Source Archive Studio App Available for PC Users
Users who aren't comfortable running Python scripts can now download the open-source PC app
For those of you who are interested in trying out Archive Studio but don’t feel comfortable running Python scripts, we’ve added a new executable file to the project’s GitHub repository. In plain English, this means there is now a Windows app that you can download, double-click, and use. Right now it is only available for PC users, but if any Mac programmers would like to help by creating an executable that runs on Apple computers, please let us know!
An important note: when you run Archive Studio, Windows Defender or other antivirus software on your PC may flag the program as a virus or other potentially malicious software. Rest assured, it is not. The main reason this happens is that the program makes API calls to an external server. Professional software developers have access to certifications that allow them to verify the program’s authenticity with various antivirus providers, but that is not really something we are able to do. So apologies for the scary warnings, but if this happens, you can choose to add an exception and/or run the program anyway. If you are not sure how to do that, ask ChatGPT. It is really good at explaining those types of technical problems, and it is patient!
Getting Started
To use Archive Studio, you will need at least one API key from Google, Anthropic, or OpenAI. This may sound scary, but it takes a few minutes at most: it is as simple as signing up for any website and takes about three clicks. If you can subscribe to a newspaper, you can do this!
The process is outlined in the user manual, but here it is for each provider:
OpenAI API Keys
To register for an OpenAI API key, first create an account on the OpenAI platform (note that this is different from a ChatGPT account). Click your profile icon in the top right-hand corner and select “Your Profile”. In the menu on the left, click API-keys and then click the green “+Create New Secret Key” button at the top right. Give your key a name and create it.
Anthropic API Keys
To register for an Anthropic API key, first create an account on the Anthropic console (this is different from your Claude chat account). After you create an account, click your profile icon at the top right, select “API Keys” from the dropdown, and then click the orange “+Create Key” button at the top right.
Gemini API Keys
The process with Google can be somewhat more confusing because Google provides a number of different ways to access its models via API, specifically the Google AI Studio and Vertex platforms. You need to use the Google AI Studio platform. First, create a Google AI Studio account. Then, once you are logged in, select the blue “Get API Key” button on the left. In the next pane, click the blue “Create API Key” button.
Recommendations and Costs
If you are going to use just one provider, we would strongly recommend Google, not only because its Gemini models are currently the best available for these types of tasks but also because they are the most affordable.
The Google Gemini API provides a free tier to try out the service. It is also extremely affordable when you add a credit card number: we’ve found that the 2.5 Pro model is about $0.006 (or just over half a cent) USD per page while the 2.5 Flash model (which is often just as good) is only $0.0005 per page. We’d recommend using the Flash model for all other tasks like document segmentation, identifying named people and places, generating metadata, etc.
Be sure to review the costs associated with the model you choose before making large requests. To be clear, each of the processing operations (recognize text, correct, format, separate document, get names and places, generate metadata, etc.) involves calls to an API, which cost you money. Although the costs per request are small, they can start to add up if you don’t pay attention. All three of the providers above allow you to monitor your usage in real time, and you should do so carefully at first, at least until you get a sense of how much various operations cost and how useful they are.
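For readers who like to check the arithmetic before a big job, here is a minimal Python sketch of a cost estimate. It uses only the two per-page figures quoted above. The model labels, the function name, and the idea that every operation costs the same per page as a transcription are assumptions made for illustration, not part of Archive Studio, so treat the result as a rough guide and check your provider's current pricing.

```python
# Approximate per-page costs in USD, taken from the figures quoted above.
# The dictionary keys are illustrative labels, not official model identifiers.
COST_PER_PAGE_USD = {
    "gemini-2.5-pro": 0.006,
    "gemini-2.5-flash": 0.0005,
}


def estimate_cost(pages: int, model: str, operations: int = 1) -> float:
    """Rough cost of running `operations` API-backed steps on every page.

    Assumption: each operation costs about the same per page as a
    transcription. Real costs depend on how much text each request carries.
    """
    return pages * operations * COST_PER_PAGE_USD[model]


# 1,000 pages transcribed once with the Pro model.
print(f"Pro, 1 operation:     ${estimate_cost(1000, 'gemini-2.5-pro'):.2f}")

# 1,000 pages run through three operations with the Flash model.
print(f"Flash, 3 operations:  ${estimate_cost(1000, 'gemini-2.5-flash', 3):.2f}")
```

Under those assumptions, a 1,000-page collection comes to about six dollars for a single pass with the Pro model, and about a dollar and a half for three passes with Flash.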
Errors and Rate Limits
When you send a document to an LLM via Archive Studio, you are doing so through an API which makes a call to the LLM’s server behind the scenes. These servers have rate limits which are like quotas dictating how many requests you can make per minute or per day as well as how much information you can send at one time. To speed things up, Archive Studio is designed to send documents in parallel (meaning at the same time) which may not work as well if you are not on a paid tier for one of the API services described above.
If all of your operations begin to fail (i.e., you get an error message with 0/10 documents completed and 10 errors), it’s likely you’ve hit your rate limit. You can change how many documents you try to send at once in Settings → API Settings → Batch Size. If you are on the free tier of Gemini, for example, you should set this to 1, as Gemini only allows 3 requests per minute and 15 per day for the Pro model at that level. The first paid tier allows 150 requests per minute and 1000 per day. We suggest you review the Gemini, Anthropic, and OpenAI rate limits and set your batch size accordingly, as they are all very different.
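To see what those quotas mean in practice, here is a small Python sketch that works out the shortest time a job could take under a per-minute and a per-day limit. It assumes one API request per page and uses the Gemini figures quoted above, which may change. The function name is made up for this example and is not part of Archive Studio.

```python
import math


def plan_job(pages: int, requests_per_minute: int, requests_per_day: int) -> dict:
    """Lower bounds on how long a job takes under the given rate limits.

    Assumption: one API request per page. Multi-step workflows (correct,
    format, extract names, and so on) multiply the request count.
    """
    return {
        "min_minutes": math.ceil(pages / requests_per_minute),
        "min_days": math.ceil(pages / requests_per_day),
    }


# 100 pages on the free tier (3 requests per minute, 15 per day).
print(plan_job(100, requests_per_minute=3, requests_per_day=15))

# The same 100 pages on the first paid tier (150 per minute, 1000 per day).
print(plan_job(100, requests_per_minute=150, requests_per_day=1000))
```

With these numbers, the daily quota is the real constraint on the free tier: 100 pages would take at least seven days, while the first paid tier could handle them in a single sitting.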
When using Gemini specifically, you may also sometimes find that one of your pages simply will not complete. When this happens, it may be because the page appears in the Gemini training data. Google has implemented a mechanism which identifies when Gemini is writing out data that was in its training set and stops the output. While the goal seems to be to prevent users from getting the models to commit overt copyright infringement, the mechanism does not discriminate between historical texts which are in the public domain and materials that are copyrighted. It only detects when the model is “reciting” from its training data. In those cases, you can switch to another model for that specific page (strangely, Gemini 2.5 Pro is much more likely to trigger this mechanism than Gemini 2.5 Flash although we don’t know why).
Practical Tips for Using Archive Studio
In the previous version of this program (Transcription Pearl), we recommended using one model for transcription and a second model for correction. As Mark wrote about earlier, though, most often this is no longer necessary. The Gemini 2.5 Pro models have gotten considerably better at transcription and are (for now) the gold standard. Correcting them actually tends to make the transcription worse. On our test set, Gemini 2.5 Pro achieves a raw Character Error Rate (CER) of 3.5% and Word Error Rate (WER) of 10.95%. That might sound high, but as we discuss in a recent article we published with our student research team in Historical Methods, when you ignore capitalization and punctuation errors, for Gemini 2.5 Pro those drop to 2.06% and 4.31% respectively. In practice, many transcriptions are great first drafts at the very least.
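For readers curious about what CER and WER measure, here is a minimal Python sketch. Both are edit distances (the number of insertions, deletions, and substitutions needed to turn the model's output into the correct transcription) divided by the length of the correct transcription, counted in characters for CER and in words for WER. The `normalize` helper shows one simple way to ignore capitalization and punctuation. This is a generic illustration, not necessarily the exact procedure behind the figures above, and the sample sentence is invented.

```python
import string


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete an item from the reference
                curr[j - 1] + 1,         # insert an item from the hypothesis
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate; assumes a non-empty reference."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def normalize(text: str) -> str:
    """Lowercase the text and strip ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


truth = "The Fort was taken, at dawn."   # invented ground-truth transcription
model = "the Fort was taken at dawn"     # invented model output

print(f"raw CER: {cer(truth, model):.2%}, raw WER: {wer(truth, model):.2%}")
print(f"normalized CER: {cer(normalize(truth), normalize(model)):.2%}")
```

In this toy case, three small slips (one lowercase letter and two missing punctuation marks) make half the words count as wrong, yet both error rates fall to zero once capitalization and punctuation are ignored. That is the same effect, in miniature, as the drop described above.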
That said, the models are really strange. Sometimes they just aren’t able to properly read a text and the reasons are rarely obvious. It doesn’t happen often, but we’ve seen them excel at poorly written texts in low-quality images but fail at standard 18th-century handwriting in high-resolution images. Gemini is also a “thinking” model, meaning it’s intended to ruminate before it answers. We’ve found this actually causes a significant performance drop and, again, the reasons are not immediately clear. In this latest version of Archive Studio, we’ve turned thinking off for Gemini models (a feature that only recently became available), which is one of the reasons we were waiting to announce the availability of the program.
So here are some practical tips if you’re not getting great results.
Don’t give the model two pages from a book at the same time because it is too much text at once. If you have a photograph that includes two pages of text, use the Image Editing utility (in the Tools menu) to split the image along the page boundary.
Don’t leave too much space around your document in the image. If your image has large areas of null space (a desk, blank microfilm, or other pages of text under the relevant document), this will confuse the model. Models do not read like we do: they take everything in at once. Use the crop tool in the Image Editing utility to crop the image to a reasonable size (there is an auto-crop feature that works well when the borders between your document and the background are relatively clear).
Try editing the instructions for text recognition. Under the settings menu, try playing around with the instructions you give the model under Functions → HTR. If you find that the model repeatedly makes the same mistakes, include a section labelled something like “RULES” and tell the model what you want it to do. For example, if it consistently reads a capital T as an F, tell it that “you might find that capital Fs look a lot like capital Ts in this text. Make sure you check the context of the sentence before finalizing a capital F or T.” You can then save these settings, export them, and import them for future documents. Use restore defaults to get back the original prompt.
Conclusion
We hope you find this program useful. Remember that it is experimental and you should not trust its outputs uncritically (as with any generative AI tool). That said, if you find any tips or tricks that might help others, we would encourage you to leave them here in the comments.