Automated georeferencing of antarctic species

dc.citation.volume: 208
dc.contributor.author: Scott J
dc.contributor.author: Stock K
dc.contributor.author: Morgan F
dc.contributor.author: Whitehead B
dc.contributor.author: Medyckyj-Scott D
dc.contributor.editor: Janowicz K
dc.contributor.editor: Verstegen JA
dc.coverage.spatial: Poznań, Poland (Virtual Conference)
dc.date.accessioned: 2025-06-10T03:00:54Z
dc.date.available: 2025-06-10T03:00:54Z
dc.date.finish-date: 2021-09-30
dc.date.issued: 2021-09-01
dc.date.start-date: 2021-09-27
dc.description.abstract: Many text documents in the biological domain contain references to the toponym of specific phenomena (e.g. species sightings) in natural language form “In <LOCATION> Garwood Valley summer activity was 0.2% for <SPECIES> Umbilicaria aprina and 1.7% for <SPECIES> Caloplaca sp....” While methods have been developed to extract place names from documents, and attention has been given to the interpretation of spatial prepositions, the ability to connect toponym mentions in text with the phenomena to which they refer (in this case species) has been given limited attention, but would be of considerable benefit for the task of mapping specific phenomena mentioned in text documents. As part of work to create a pipeline to automate georeferencing of species within legacy documents, this paper proposes a method to: (1) recognise species and toponyms within text and (2) match each species mention to the relevant toponym mention. Our methods find significant promise in a bespoke rules- and dictionary-based approach to recognise species within text (F1 scores up to 0.87 including partial matches) but less success, as yet, recognising toponyms using multiple gazetteers combined with an off-the-shelf natural language processing tool (F1 up to 0.62). Most importantly, we offer a contribution to the relatively nascent area of matching toponym references to the object they locate (in our case species), including cases in which the toponym and species are in different sentences. We use tree-based models to achieve precision as high as 0.88 or an F1 score up to 0.68 depending on the downsampling rate. Initial results outperform previous research on detecting entity relationships that may cross sentence boundaries within biomedical text, and differ from previous work in specifically addressing species mapping.
dc.description.confidential: false
dc.format.pagination: VII-
dc.identifier.citation: Scott J, Stock K, Morgan F, Whitehead B, Medyckyj-Scott D. (2021). Automated georeferencing of antarctic species. Janowicz K, Verstegen JA. Leibniz International Proceedings in Informatics (LIPIcs). (pp. VII-). Schloss Dagstuhl – Leibniz-Zentrum für Informatik.
dc.identifier.doi: 10.4230/LIPIcs.GIScience.2021.II.13
dc.identifier.elements-type: c-conference-paper-in-proceedings
dc.identifier.isbn: 978-3-95977-208-2
dc.identifier.issn: 1868-8969
dc.identifier.uri: https://mro.massey.ac.nz/handle/10179/73020
dc.publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
dc.publisher.uri: http://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.GIScience.2021.II.13
dc.rights: (c) 2021 The Author/s
dc.rights: CC BY 4.0
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source.journal: Leibniz International Proceedings in Informatics (LIPIcs)
dc.source.name-of-conference: 11th International Conference on Geographic Information Science (GIScience 2021)
dc.subject: Named Entity Recognition (NER)
dc.subject: Taxonomic Name Extraction
dc.subject: Relation Extraction
dc.subject: Georeferencing
dc.title: Automated georeferencing of antarctic species
dc.type: conference
pubs.elements-id: 448711
pubs.organisational-group: Other
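The abstract describes a two-stage pipeline: dictionary-based recognition of species and toponyms, then matching each species mention to a toponym, possibly in a different sentence, with tree-based models trained on downsampled candidate pairs. The Python sketch below illustrates only the general shape of that pipeline under stated assumptions. The greedy longest-match lookup, the `max_len` window, the three positional features, the `max_sentence_gap` cutoff and the `downsample` helper are all hypothetical illustrations, not the authors' rules, gazetteers, feature set or model. The tree-based classifier itself is not shown; the feature dictionaries are what such a model might be trained on.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    kind: str      # "SPECIES" or "TOPONYM"
    sentence: int  # index of the sentence containing the mention
    token: int     # document-level index of the mention's first token


def recognise(sentences, species_dict, gazetteer, max_len=3):
    """Hypothetical greedy longest-match lookup against two name dictionaries."""
    mentions = []
    offset = 0  # document-level index of the current sentence's first token
    for s_idx, tokens in enumerate(sentences):
        i = 0
        while i < len(tokens):
            match = None
            # Try the longest window first so "Garwood Valley" beats "Garwood".
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                phrase = " ".join(tokens[i:i + n])
                if phrase in species_dict:
                    match = (phrase, "SPECIES", n)
                elif phrase in gazetteer:
                    match = (phrase, "TOPONYM", n)
                if match:
                    break
            if match:
                mentions.append(Mention(match[0], match[1], s_idx, offset + i))
                i += match[2]
            else:
                i += 1
        offset += len(tokens)
    return mentions


def candidate_pairs(mentions, max_sentence_gap=2):
    """Pair each species with each toponym at most max_sentence_gap sentences away."""
    species = [m for m in mentions if m.kind == "SPECIES"]
    toponyms = [m for m in mentions if m.kind == "TOPONYM"]
    return [(s, t) for s in species for t in toponyms
            if abs(s.sentence - t.sentence) <= max_sentence_gap]


def features(species, toponym):
    """Hypothetical positional features a tree-based classifier could split on."""
    return {
        "sentence_gap": abs(species.sentence - toponym.sentence),
        "token_distance": abs(species.token - toponym.token),
        "toponym_first": int(toponym.token < species.token),
    }


def downsample(pairs, labels, rate, seed=0):
    """Keep every positive pair and roughly a fraction `rate` of negative pairs."""
    rng = random.Random(seed)
    return [(p, y) for p, y in zip(pairs, labels)
            if y == 1 or rng.random() < rate]


if __name__ == "__main__":
    text = ("In Garwood Valley summer activity was 0.2% for "
            "Umbilicaria aprina and 1.7% for Caloplaca sp.")
    mentions = recognise([text.split()],
                         species_dict={"Umbilicaria aprina", "Caloplaca sp."},
                         gazetteer={"Garwood Valley"})
    for s, t in candidate_pairs(mentions):
        print(s.text, "->", t.text, features(s, t))
```

Run on the abstract's example sentence, the sketch pairs both species with Garwood Valley. Because `candidate_pairs` compares sentence indices rather than requiring a shared sentence, pairs that cross sentence boundaries are also generated. Lowering `rate` in `downsample` discards more negative pairs, which is one plausible reading of the "downsampling rate" trade-off between precision and F1 mentioned in the abstract.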
Files
Original bundle
Name: 448711 PDF.pdf
Size: 717.46 KB
Format: Adobe Portable Document Format
Description: Published version.pdf
License bundle
Name: license.txt
Size: 9.22 KB
Format: Plain Text