15 Comments
Jon Bang Ploug's avatar

I got it to work on mac - if anyone is interested

JS's avatar

How did you do this? I was so sad not to be able to try this out.

Vivienne Cuff's avatar

You need to evaluate this approach against what Transkribus offers. There are some situations where creating your own models is necessary because cursive handwriting in 19th century archives is very difficult to decipher, for example tabular data, or a series of documents that contains many hands with different personal handwriting styles.

Transkribus: https://www.transkribus.org

The key issues with these tools are usability for the end user and fit with an organisation's IT environment. Many organisations will not allow end users to run programs in their environment for security reasons.

Mark Humphries's avatar

Thanks for the comment. We did this in an article in Historical Methods using an earlier version of the software: https://www.tandfonline.com/doi/full/10.1080/01615440.2025.2500309.

Transkribus is a great tool but it is also expensive, and our goal is not to create a competitor but an open-source alternative. For context, on a 10,000 word, 50 page English language 18th and 19th century test set using dozens of different hands, out of the box (i.e. without fine tuning or training), we found Gemini-2.5-pro achieved a WER of 4.89% and a CER of 2.63% (excluding punctuation and capitalization as both can be ambiguous). On the same test set, the latest Transkribus Titan model achieves 13.2% WER and 6.6% CER. Transkribus also costs around 24 cents per page versus 0.8 cents per page with Gemini-2.5-pro.
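For readers unfamiliar with the metrics, here is a minimal sketch of how word error rate (WER) and character error rate (CER) are commonly computed, with punctuation and capitalization stripped first as described above. It is not the evaluation code behind the figures in this comment; the function names and the normalization rules (ASCII punctuation only, whitespace collapsed) are assumptions made for illustration.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace.

    Assumption: only string.punctuation is removed, so typographic
    quotes and other non-ASCII marks would be left in place.
    """
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref_words = normalize(reference).split()
    hyp_words = normalize(hypothesis).split()
    if not ref_words:
        raise ValueError("reference contains no words after normalization")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character-level edit distance divided by the reference length."""
    ref_chars = normalize(reference)
    hyp_chars = normalize(hypothesis)
    if not ref_chars:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # Hypothetical example lines, not taken from the test set.
    truth = "Received of Mr. Smith, ten pounds."
    output = "received of Mr Smith ten ponds"
    print(f"WER: {wer(truth, output):.2%}")
    print(f"CER: {cer(truth, output):.2%}")
```

Because both scores divide by the length of the reference, a single misread word costs far more in WER than in CER, which is why the WER figures above are roughly double the CER figures.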

Transkribus would probably approach and perhaps exceed Gemini’s performance if you fine tuned it on each hand, but that requires around 50 transcribed pages per hand. So on large datasets, Transkribus might be the best choice (and it might also be much better on non-English sets, we don’t know). But for sets of mixed documents or small sets of documents (or where cost is an issue), Gemini-2.5-pro in the API via a program like Archive Studio offers an alternative.
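To give a sense of scale, the per-page prices quoted above can be turned into a rough back-of-the-envelope estimate. This sketch takes the approximate figures from this comment at face value (24 cents and 0.8 cents per page); actual pricing will vary with plan, image size, and model, and the 1,000-page collection is a hypothetical example.

```python
# Approximate per-page prices quoted in the comment above (in US cents).
TRANSKRIBUS_CENTS_PER_PAGE = 24.0
GEMINI_CENTS_PER_PAGE = 0.8


def cost_in_dollars(pages: int, cents_per_page: float) -> float:
    """Estimated cost in dollars for a given number of pages."""
    return round(pages * cents_per_page / 100, 2)


if __name__ == "__main__":
    pages = 1000  # hypothetical collection size
    print(f"Transkribus: ${cost_in_dollars(pages, TRANSKRIBUS_CENTS_PER_PAGE):.2f}")
    print(f"Gemini-2.5-pro: ${cost_in_dollars(pages, GEMINI_CENTS_PER_PAGE):.2f}")
```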

Florindo Palladino's avatar

Congratulations! I’ve forked the repository—hope to give you feedback soon.

Silvia Escanilla Huerta's avatar

Is this only designed for English written sources? I have thousands of photographs taken at the archives that I could use to test this tool, but they're all manuscripts written in Spanish...

Mark Humphries's avatar

In our tests, it works just as well on French documents as English documents. Users have reported that it works well on German and Italian too, but I am not sure we have heard results for Spanish documents. The program uses Generative AI to read the documents, so success will entirely depend on the abilities of the model you use. My intuition is that Gemini-2.5-Pro would work well for Spanish documents, but I would be interested to hear any results.

Gunnar W. Knutsen's avatar

How well it performs on Spanish depends on the handwriting and subject material. For Inquisition records the LLMs perform rather badly unless you use a fine-tuned version for error checking.

Thiago's avatar

It seems revolutionary! I look forward to the .exe file that will make things easier for us non-coding/lazy historians. Incidentally, I tried Gemini 2.5 Pro and it seemed really impressive at first, but then I realized it was just hallucinating in an extremely convincing way, totally making things up, even when I called it out on that. Much worse than ChatGPT. I wonder why, and if (or why) it's so much better when you use the API.

Mark Humphries's avatar

Thanks for the comment. May I ask what you were asking it to do?

Thiago's avatar

Transcribe a page of a volume from the Royal African Company, because the post said Gemini was fairly good at transcriptions. It is fairly legible, albeit BW. It was disconcerting: the output looked totally legit in terms of eighteenth-century writing, but when I checked it had no relation to the image. Very weird!

Mark Humphries's avatar

So this was in the Gemini app, right? And I assume you uploaded a jpg rather than providing a URL? I just tried it in the Gemini app on a few different documents and it worked as expected. I can't share screenshots in the comments but this is what I did. First I went to Gemini:

https://gemini.google.com/app

Then I used the + button to upload a jpg. The JPG does not need to be perfect resolution, but it needs to be readable by a human. I then wrote "Transcribe this 18th century document." And the transcription was fine.

Not sure if that matches what you did, but I tried it with a few images.

Thiago's avatar

Yes, that's exactly what I did. It did really well on a French 1708 doc with perfect handwriting and a good image. Then it did poorly on a Brazilian notarial record from the 1700s (handwriting and image were worse). Then it hallucinated like it was 2022 with the RAC record. Then my preview ended (I only subscribe to ChatGPT Plus, not to other LLMs).

Mark Humphries's avatar

Interesting. I don't subscribe to Gemini either, but I haven't run into any issues on the free tier.

Not sure what to say other than that I haven't seen that happen in Transcription Pearl or Archive Studio with the Gemini API. I've found that performance with the API is way more consistent.

Thiago's avatar

Looking forward to experimenting with it!
