Transcription Pearl for Non-Coders and Next Steps
I've added a manual and downloadable executable file for non-coders who want to try Transcription Pearl on Windows PCs (sorry Mac users)
If you’ve been interested in trying Transcription Pearl but are unfamiliar with running Python code, I’ve created a stand-alone, no-code version of the program that runs on Windows PCs. Unfortunately, it won’t run on Macs at this time. Sorry!
You can download the program from the GitHub Repository or directly from this link. Then it is as simple as putting the file on your desktop and double-clicking. I’ve also co-authored a manual with Claude Sonnet-3.5 (more on that below… it was an interesting process), which you can download here. It will walk you through installing the program and working with your documents.
An important security note: when you run Transcription Pearl, it may be flagged as a virus or other potentially malicious program by Windows Defender or other antivirus software. The main reason this happens, as far as I can tell, is that the program makes API calls to an external server. If I were a professional software developer, I would have access to code-signing certificates that would allow me to verify the program’s authenticity with the various antivirus providers, but that is not really something I am able to do as a non-professional. So apologies for the scary warnings, but if this happens, you can always choose to add an exception and/or run the program anyway.
It Sounds Scary, but Getting an API Key is Easy
To make this work, you will need to get at least one API key from OpenAI, Anthropic, and/or Google Gemini. It is best to use one model for transcription and another for correction (we found that Gemini 1.5-Pro-002 was best at the initial transcription, while Claude Sonnet-3.5 was best at correcting transcriptions). While this might sound intimidating, it should not be: it is as simple as signing up for any website and takes about three clicks. If you can subscribe to a newspaper, you can do this!
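For the curious, here is a rough sketch (not Transcription Pearl’s actual code) of what that two-model workflow looks like under the hood, assuming the google-generativeai and anthropic Python packages; the file name, prompts, and model strings are my illustrative choices, not necessarily the ones the program uses:

```python
# A sketch of the transcribe-then-correct workflow described above.
# Assumes: pip install google-generativeai anthropic pillow
# and that GEMINI_API_KEY / ANTHROPIC_API_KEY are set in the environment.
import os
import PIL.Image
import google.generativeai as genai
import anthropic

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Step 1: Gemini 1.5-Pro-002 produces the initial transcription from an image.
gemini = genai.GenerativeModel("gemini-1.5-pro-002")
page = PIL.Image.open("letter_page_1.jpg")  # illustrative file name
draft = gemini.generate_content(
    ["Transcribe this handwritten page exactly as written.", page]
).text

# Step 2: Claude Sonnet-3.5 corrects the draft transcription.
claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
corrected = claude.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": "Correct any transcription errors in the following text, "
                   "changing nothing else:\n\n" + draft,
    }],
).content[0].text

print(corrected)
```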
The process is outlined in the user manual, but here it is for each provider:
OpenAI API Keys
To register for an OpenAI API key, first create an account on the OpenAI platform (note that this will be different from a ChatGPT account). Click your profile icon in the top right-hand corner and select “Your Profile”. In the menu on the left, click “API keys” and then click the green “+Create New Secret Key” button at the top right. Give your key a name and create it.
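If you are comfortable opening Python, a quick way to confirm the new key works is a one-line test call; this is purely optional and illustrative, not something Transcription Pearl requires:

```python
# Minimal smoke test for an OpenAI key (pip install openai).
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # paste your new secret key here
models = client.models.list()      # fails with an auth error if the key is bad
print(models.data[0].id)
```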
Anthropic API Keys
To register for an Anthropic API key, first create an account on the Anthropic console (this will be different from your Claude chat account). After you create an account, click your profile icon at the top right, select “API Keys” from the dropdown, and then select the orange “+Create Key” button at the top right.
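As with OpenAI, you can sanity-check the key from Python if you like (again, optional and illustrative):

```python
# Minimal smoke test for an Anthropic key (pip install anthropic).
import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")  # paste your new key here
reply = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=10,
    messages=[{"role": "user", "content": "Say OK."}],
)
print(reply.content[0].text)  # an auth error here means the key is wrong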
Gemini API Keys
The process with Google can be somewhat more confusing because it provides a number of different ways to access its models via API, specifically the Google AI Studio and Vertex AI platforms. You need to use the Google AI Studio platform. First, create a Google AI Studio account, then, once you are logged in, select the blue “Get API Key” button on the left. In the next pane, click the blue “Create API Key” button.
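And the same kind of optional sanity check for a Gemini key:

```python
# Minimal smoke test for a Google AI Studio key (pip install google-generativeai).
import google.generativeai as genai

genai.configure(api_key="AIza...")  # paste your new key here
model = genai.GenerativeModel("gemini-1.5-pro-002")
print(model.generate_content("Say OK.").text)
```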
APIs, Privacy, and Security
If you choose to try Transcription Pearl, you might have questions about privacy and security. Don’t we all! As this is an evolving field, many questions about rights, ethics, and the like remain unsettled. In general, though, I think that using APIs rather than ChatBots is more secure and more ethical. Here’s why.
Unlike ChatGPT, the API versions of the OpenAI, Anthropic, and Google LLMs will not retain or store your data, nor will they use it to train their models. So you aren’t feeding the models with data. Don’t take my word for it, though: read each of their privacy policies (as they pertain to the API, which is what Transcription Pearl uses) by clicking the links above. But if you don’t trust these companies to comply with their own user agreements, don’t use LLMs.
I personally see little difference between uploading a document to a cloud server like OneDrive or Google Drive and using an LLM via an API: both involve sending the same information over the internet to an external server for research purposes (although with LLMs it is not being retained or stored). To my mind, if you aren’t allowed to send a document to an LLM for legal or ethical reasons, you probably also can’t store it in the cloud (even if you hadn’t given that as much thought). So the key question when you go to transcribe a document is: do you have the necessary rights to use those materials? If you don’t, or you are unsure, you can’t use them.
For that reason, I only use AI on materials that are open-access, out of copyright, or that I know I can clearly use under the educational and fair-use provisions of the Copyright Act. Of course, you also cannot generally use either ChatBots or APIs with sensitive materials that have restrictions placed on them by Research Ethics Boards or external organizations (in the same way that you can’t generally store them on cloud services either). But even here it is worth noting that this is starting to change: the OpenAI API can be certified for use with Health Insurance Portability and Accountability Act (HIPAA) related materials in the United States, provided researchers get the necessary ethics approvals.
For now, know that if you use Transcription Pearl, you are using the more secure API interfaces to the LLMs mentioned above, not the less secure ChatBots. Nevertheless, it is up to you to ensure that you have the necessary rights and permissions to work with your documents.
Writing Technical Manuals with AI
As something of an aside, the process of writing a comprehensive manual for Transcription Pearl was illuminating. I began by developing a table of contents and then copied and pasted my source code (about 5,000 lines between Transcription Pearl and the Image Pre-Processing Tool) into Claude Sonnet-3.5 along with a few images of the interface. I then asked Sonnet-3.5 to write each of the technical sections based on the source code and images, and it generally did a great job. I then edited, refined, and added material as necessary.
The manual is about 5,000 words long and it only took around two hours to produce, start to finish. While I don’t think technical writing will be replaced by AI, I can see how it will make competent technical writers much faster and more efficient. And that means the field will invariably shrink.
Next Steps
I’ve been updating Transcription Pearl fairly regularly over the last few weeks in response to feedback (thanks!) and I’ll continue to do so. But I see Transcription Pearl as the first step in a larger workflow.
Ultimately, the goal is to produce software that automates the full workflow for historians: transcribing texts, uploading them to a database, and then performing various tasks with them, like asking questions, summarizing, and finding connections between documents that might otherwise remain hidden.
To this end, I’ve been working on the next version of Transcription Pearl, which will allow users to automatically generate metadata for a corpus of documents, exporting a sort of finding aid or box list to a spreadsheet. It works like this: you might upload a 100-page folder of documents containing letters that vary in length from one page to several pages.
After transcribing and correcting the documents in Transcription Pearl, the user hits “Process Text” and an LLM automatically generates a table of contents for the folder, noting where each letter begins and ends. It then reads each letter in turn and extracts metadata like the names of each person or place mentioned in the document, the name of its author, the name of the main correspondent, the date it was written, and where it was written.
It also summarizes the main points of the document according to requirements set by the user (i.e., to emphasize a certain research topic or theme), answers questions like whether the document is relevant to topic X, and assigns keywords/subject headings either automatically or from a bank of choices created by the user. The output is then saved to a spreadsheet, with each row assigned to a unique document, along with the original text and the path to the image(s).
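To make the idea concrete, here is a loose sketch of what such a metadata pass could look like; the field list, prompt, and file names are my illustrative assumptions, not the forthcoming version’s actual design:

```python
# Illustrative sketch only: ask an LLM for structured metadata about each
# transcribed letter, then write one spreadsheet row per document.
# Assumes: pip install anthropic, ANTHROPIC_API_KEY set in the environment.
import csv
import json
import anthropic

client = anthropic.Anthropic()

FIELDS = ["author", "correspondent", "date", "place_written",
          "people_mentioned", "places_mentioned", "summary", "keywords"]

def extract_metadata(letter_text: str) -> dict:
    prompt = (
        "Read the letter below and return only a JSON object with these keys: "
        + ", ".join(FIELDS)
        + ". Use null for anything you cannot determine.\n\n"
        + letter_text
    )
    reply = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    # A real tool would validate this output; a sketch just trusts the JSON.
    return json.loads(reply.content[0].text)

# One (image path, transcription) pair per document -- toy data.
letters = [("page_001.jpg", "Dear Margaret, I write from Halifax...")]

with open("finding_aid.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["image_path", "text"] + FIELDS)
    writer.writeheader()
    for image_path, text in letters:
        row = extract_metadata(text)  # lists are written as their repr here
        row.update({"image_path": image_path, "text": text})
        writer.writerow(row)
```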
While I think this will prove quite useful for researchers on its own, the goal is to ladder this into a larger piece of software. The exported spreadsheet can also be uploaded to a database and combined with other similar spreadsheets, creating a huge archive. Over time, this would become a fully searchable archive using keyword or semantic approaches, making all one’s research accessible to an AI research assistant.
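For what it’s worth, the “semantic” half of that search is typically done with text embeddings; here is a toy sketch, assuming OpenAI’s embedding endpoint (any provider’s embeddings would do), with made-up example documents:

```python
# Toy semantic search over transcribed documents using embeddings.
# Assumes: pip install openai numpy, OPENAI_API_KEY set in the environment.
import numpy as np
from openai import OpenAI

client = OpenAI()

docs = ["Letter from Halifax about grain shipments, 1891.",
        "Diary entry describing a winter storm at sea."]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)
query_vec = embed(["shipping and trade"])[0]

# Cosine similarity ranks documents by meaning, not exact keyword overlap.
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
print(docs[int(np.argmax(scores))])
```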