Massey Documents by Type
Permanent URI for this communityhttps://mro.massey.ac.nz/handle/10179/294
Browse
2 results
Search Results
Item Large Multi-Modal Model Cartographic Map Comprehension for Textual Locality Georeferencing(Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2025-08-15) Wijegunarathna K; Stock K; Jones CB; Sila-Nowicka K; Moore A; O’Sullivan D; Adams B; Gahegan MMillions of biological sample records collected in the last few centuries archived in natural history collections are un-georeferenced. Georeferencing complex locality descriptions associated with these collection samples is a highly labour-intensive task collection agencies struggle with. None of the existing automated methods exploit maps that are an essential tool for georeferencing complex relations. We present preliminary experiments and results of a novel method that exploits multimodal capabilities of recent Large Multi-Modal Models (LMM). This method enables the model to visually contextualize spatial relations it reads in the locality description. We use a grid-based approach to adapt these auto-regressive models for this task in a zero-shot setting. Our experiments conducted on a small manually annotated dataset show impressive results for our approach (∼1 km Average distance error) compared to uni-modal georeferencing with Large Language Models and existing georeferencing tools. The paper also discusses the findings of the experiments in light of an LMM's ability to comprehend fine-grained maps. Motivated by these results, a practical framework is proposed to integrate this method into a georeferencing workflow.Item Automated georeferencing of antarctic species(Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021-09-01) Scott J; Stock K; Morgan F; Whitehead B; Medyckyj-Scott D; Janowicz K; Verstegen JAMany text documents in the biological domain contain references to the toponym of specific phenomena (e.g. species sightings) in natural language form “In Garwood Valley summer activity was 0.2% for Umbilicaria aprina and 1.7% for Caloplaca sp....” While methods have been developed to extract place names from documents, and attention has been given to the interpretation of spatial prepositions, the ability to connect toponym mentions in text with the phenomena to which they refer (in this case species) has been given limited attention, but would be of considerable benefit for the task of mapping specific phenomena mentioned in text documents. As part of work to create a pipeline to automate georeferencing of species within legacy documents, this paper proposes a method to: (1) recognise species and toponyms within text and (2) match each species mention to the relevant toponym mention. Our methods find significant promise in a bespoke rules- and dictionary-based approach to recognise species within text (F1 scores up to 0.87 including partial matches) but less success, as yet, recognising toponyms using multiple gazetteers combined with an off the shelf natural language processing tool (F1 up to 0.62). Most importantly, we offer a contribution to the relatively nascent area of matching toponym references to the object they locate (in our case species), including cases in which the toponym and species are in different sentences. We use tree-based models to achieve precision as high as 0.88 or an F1 score up to 0.68 depending on the downsampling rate. Initial results out perform previous research on detecting entity relationships that may cross sentence boundaries within biomedical text, and differ from previous work in specifically addressing species mapping.
