Congratulations! I’ve forked the repository—hope to give you feedback soon.
Is this only designed for English-language sources? I have thousands of photographs taken at the archives that I could use to test this tool, but they're all manuscripts written in Spanish...
In our tests, it works just as well on French documents as on English documents. Users have reported that it works well on German and Italian too... but I am not sure we have heard results for Spanish documents. The program uses Generative AI to read the documents, so success will depend entirely on the abilities of the model you use. My intuition is that Gemini-2.5-Pro would work well for Spanish documents, but I would be interested to hear any results.
How well it performs on Spanish depends on the handwriting and subject material. For Inquisition records the LLMs perform rather badly unless you use a fine-tuned version for error checking.
It seems revolutionary! I look forward to the .exe file that will make things easier for us non-coding/lazy historians. Incidentally, I tried Gemini 2.5 Pro and it seemed really impressive at first - but then I realized it was just hallucinating in an extremely convincing way, totally making things up, even when I called it out on that. Much worse than ChatGPT. I wonder why, and if (or why) it's so much better when you use the API.
Thanks for the comment. May I ask what you were asking it to do?
I asked it to transcribe a page of a volume from the Royal African Company, because the post said Gemini was fairly good at transcriptions. The page is fairly legible, albeit black and white. It was disconcerting: the output looked totally legit in terms of eighteenth-century writing, but when I checked, it had no relation to the image. Very weird!
So this was in the Gemini app, right? And I assume you uploaded a jpg rather than provide a URL? I just tried it in the Gemini app on a few different documents and it worked as expected. I can't share screenshots in the comments but this is what I did. First I went to Gemini:
https://gemini.google.com/app
Then I used the + button to upload a jpg. The JPG does not need to be perfect resolution, but it needs to be readable by a human. I then wrote "Transcribe this 18th century document." And the transcription was fine.
Not sure if that matches what you did, but I tried it with a few images.
Yes, that's exactly what I did. It did really well on a French 1708 doc with perfect handwriting and a good image. Then it did poorly on a Brazilian notarial record from the 1700s (handwriting and image were worse). Then it hallucinated like it was 2022 with the RAC record. Then my preview ended (I only subscribe to ChatGPT Plus, not to other LLMs).
Interesting. I don't subscribe to Gemini either, but I haven't run into any issues on the free tier.
Not sure what to say other than that I haven't seen that happen in Transcription Pearl or Archive Studio with the Gemini API. I've found that performance with the API is way more consistent.
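For anyone who wants to compare the app with the API on the same image, here is a rough sketch of a direct API call. It is not how Transcription Pearl or Archive Studio do it internally; the endpoint pattern, the `gemini-2.5-pro` model id, the `temperature` setting, and the helper names `build_request` and `transcribe` are my assumptions for illustration, so check them against the current Gemini API docs before relying on them.

```python
import base64
import json
import urllib.request

# Assumed endpoint pattern for the public Gemini REST API; verify against the docs.
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"


def build_request(image_bytes, prompt, mime_type="image/jpeg"):
    """Build the JSON body for one image-plus-prompt request (hypothetical helper)."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime_type,
                    # The image is sent inline as base64 text.
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }],
        # Assumption: a low temperature may make repeated runs easier to compare.
        "generationConfig": {"temperature": 0},
    }


def transcribe(image_path, api_key, model="gemini-2.5-pro"):
    """Send one image to the API and return the text of the first candidate."""
    with open(image_path, "rb") as f:
        body = build_request(f.read(), "Transcribe this 18th century document.")
    req = urllib.request.Request(
        API_URL.format(model=model),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    parts = reply["candidates"][0]["content"]["parts"]
    # Join all text parts; non-text parts are skipped.
    return "".join(part.get("text", "") for part in parts)
```

Calling `transcribe("page.jpg", api_key)` with your own key and image path would return the model's transcription as a string. It still needs to be checked against the image, since nothing here prevents hallucination.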
Looking forward to experimenting with it!