22–25 Sept 2025
GW1 Uni Bremen
Europe/Berlin timezone

LLM-assisted transcription and correction of herbarium labels

23 Sept 2025, 16:15
5m
H0070 (GW1-HS)

Lightning Talk | Application | Lightning Talks

Speaker

Ilja Bezrukov (Max-Planck-Institut für Biologie Tübingen)

Description

Plant specimens from the herbarium of Zimbabwe are being digitized in an ongoing project that aims to preserve the data and to build a resource for investigating changing distribution patterns.

To make the metadata accessible, the labels had to be transcribed. This proved challenging for standard OCR software, because the labels date back to the early 20th century and are mostly handwritten. Using an LLM (gpt-4o) with a custom-designed prompt that produces JSON output yielded highly reliable results requiring little manual correction. Localities from the extracted metadata were then geocoded with the Google Maps Geocoding API.
A user interface for manual curation was implemented as a Jupyter notebook.
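The transcription, geocoding, and curation steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the label fields in `LABEL_FIELDS`, the prompt wording, and the helper names (`parse_label_json`, `geocode_url`, `parse_geocode_response`) are hypothetical, and the actual request to gpt-4o (sending `PROMPT` together with the label image) is omitted. The geocoding part targets the public Google Maps Geocoding API endpoint, using its `address` parameter and a `components=country:ZW` filter to restrict results to Zimbabwe. Records with missing fields are flagged so they could be queued for review in a curation notebook.

```python
import json
import urllib.parse
from dataclasses import dataclass, field

# Hypothetical label schema; the talk does not publish its prompt or field list.
LABEL_FIELDS = ("scientific_name", "collector", "collector_number",
                "collection_date", "locality", "country")

# Hypothetical prompt text asking the model for a fixed JSON object.
PROMPT = (
    "Transcribe this herbarium label. Return only a JSON object with the keys "
    + ", ".join(LABEL_FIELDS)
    + ". Use null for anything illegible or absent; do not guess."
)


@dataclass
class LabelRecord:
    fields: dict
    needs_review: list = field(default_factory=list)  # reasons for manual curation


def parse_label_json(raw: str) -> LabelRecord:
    """Parse the model's JSON reply and flag problems for the curation step."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return LabelRecord(fields={}, needs_review=["invalid JSON"])
    if not isinstance(data, dict):
        return LabelRecord(fields={}, needs_review=["JSON is not an object"])
    # Keep only the expected keys; absent keys become None.
    fields = {k: data.get(k) for k in LABEL_FIELDS}
    reasons = [f"missing {k}" for k, v in fields.items() if v in (None, "")]
    return LabelRecord(fields=fields, needs_review=reasons)


GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(record: LabelRecord, api_key: str) -> str:
    """Build a Geocoding API request URL restricted to Zimbabwe (country code ZW)."""
    params = {
        "address": record.fields.get("locality") or "",
        "components": "country:ZW",
        "key": api_key,
    }
    return GEOCODE_URL + "?" + urllib.parse.urlencode(params)


def parse_geocode_response(body: dict):
    """Return (lat, lng) of the first result, or None if nothing usable came back."""
    if body.get("status") != "OK" or not body.get("results"):
        return None
    loc = body["results"][0]["geometry"]["location"]
    return loc["lat"], loc["lng"]
```

Asking the model to return `null` for illegible fields, rather than letting it guess, makes gaps explicit and gives the curation step a simple list of records to check.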

The image transcription cost was USD 0.011 per image; the geocoding requests stayed below the free usage cap and therefore incurred no costs.
Only moderate computational resources were required: image digitization and curation were performed on site in Zimbabwe using an off-the-shelf Windows laptop and a custom-designed photo station with a high-resolution digital camera. The project provides a workflow for cost-effective digitization and metadata extraction for herbaria in low-income countries.
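The cost figures reduce to simple arithmetic, sketched below. Only the USD 0.011 per-image price comes from the talk; the collection size, the geocoding free cap, and the per-request price beyond that cap are hypothetical parameters to be filled in from the provider's current pricing.

```python
# Per-image transcription price reported in the talk (gpt-4o).
USD_PER_IMAGE = 0.011


def transcription_cost_usd(n_images: int, usd_per_image: float = USD_PER_IMAGE) -> float:
    """Total LLM transcription cost, rounded to cents."""
    return round(n_images * usd_per_image, 2)


def geocoding_cost_usd(n_requests: int, free_cap: int, usd_per_request: float) -> float:
    """Geocoding cost: requests up to the (assumed) free cap are not billed."""
    billable = max(0, n_requests - free_cap)
    return round(billable * usd_per_request, 2)
```

For example, a hypothetical batch of 10,000 labels at USD 0.011 per image comes to `transcription_cost_usd(10_000)`, i.e. USD 110 for transcription, while geocoding stays free as long as the request count does not exceed the free cap.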

Agree to streaming: yes
Agree to internal publication of recording: yes

Authors

Ms Langalenkosi Gatula (Department of Biological Science, University of Zimbabwe)
Ilja Bezrukov (Max-Planck-Institut für Biologie Tübingen)

Co-authors

Christopher Chapano (National Herbarium and Botanic Garden, Department of Research and Specialist Services, Harare, Zimbabwe)
Patience Chatukuta (Max-Planck-Institut für Biologie Tübingen)
Clemence Zimudzi (Department of Biological Science, University of Zimbabwe)