MM5: Multimodal image capture and dataset generation for RGB, depth, thermal, UV, and NIR

dc.citation.issuePart A
dc.citation.volume126
dc.contributor.authorBrenner M
dc.contributor.authorReyes NH
dc.contributor.authorSusnjak T
dc.contributor.authorBarczak ALC
dc.date.accessioned2025-08-27T01:52:07Z
dc.date.available2025-08-27T01:52:07Z
dc.date.issued2026-02-01
dc.description.abstractExisting multimodal datasets often lack sufficient modality diversity, raw data preservation, and flexible annotation strategies, seldom addressing modality-specific cues across multiple spectral channels. Current annotations typically concentrate on pre-aligned images, neglecting unaligned data and overlooking crucial cross-modal alignment challenges. These constraints significantly impede advanced multimodal fusion research, especially when exploring modality-specific features or adaptable fusion methodologies. To address these limitations, we introduce MM5, a comprehensive dataset integrating RGB, depth, thermal (T), ultraviolet (UV), and near-infrared (NIR) modalities. Our capturing system utilises off-the-shelf components, incorporating stereo RGB-D imaging to provide additional depth and intensity (I) information, enhancing spatial perception and facilitating robust cross-modal learning. MM5 preserves depth and thermal measurements in raw, 16-bit formats, enabling researchers to explore advanced preprocessing and enhancement techniques. Additionally, we propose a novel label re-projection algorithm that generates ground-truth annotations directly for distorted thermal and UV modalities, supporting complex fusion strategies beyond strictly aligned data. Dataset scenes encompass varied lighting conditions (e.g. shadows, dim lighting, overexposure) and diverse objects, including real fruits, plastic replicas, and partially rotten produce, creating challenging scenarios for robust multimodal analysis. We evaluate the effects of multi-bit representations, adaptive gain control (AGC), and depth preprocessing on a transformer-based segmentation network. Our novel preprocessing techniques improved mean IoU from 70.66% to 76.33% for depth data and from 72.67% to 79.08% for thermal encoding, validating MM5’s efficacy in supporting comprehensive multimodal fusion research.
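Note: the abstract states that depth and thermal measurements are preserved as raw 16-bit data. The sketch below is a minimal, illustrative example of how such frames might be loaded and normalised before use with a segmentation network; the file names, directory layout, and percentile-based stretch are assumptions for illustration and do not reproduce the authors' preprocessing pipeline.

    import numpy as np
    import cv2  # OpenCV reads 16-bit PNGs when IMREAD_UNCHANGED is used

    def load_raw_16bit(path):
        # Hypothetical loader: reads a raw 16-bit depth or thermal frame as uint16.
        img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
        if img is None or img.dtype != np.uint16:
            raise ValueError(f"expected a 16-bit image at {path}")
        return img

    def normalise_percentile(img, lo=1.0, hi=99.0):
        # Simple percentile stretch to [0, 1]; a stand-in for the paper's
        # preprocessing, which is not reproduced here.
        p_lo, p_hi = np.percentile(img, [lo, hi])
        return np.clip((img.astype(np.float32) - p_lo) / max(p_hi - p_lo, 1e-6), 0.0, 1.0)

    # Example usage with hypothetical file names:
    # depth = normalise_percentile(load_raw_16bit("scene_001/depth_raw.png"))
    # thermal = normalise_percentile(load_raw_16bit("scene_001/thermal_raw.png"))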
dc.description.confidentialfalse
dc.edition.editionFebruary 2026
dc.identifier.citationBrenner M, Reyes NH, Susnjak T, Barczak ALC. (2026). MM5: Multimodal image capture and dataset generation for RGB, depth, thermal, UV, and NIR. Information Fusion. 126. Part A.
dc.identifier.doi10.1016/j.inffus.2025.103516
dc.identifier.eissn1872-6305
dc.identifier.elements-typejournal-article
dc.identifier.issn1566-2535
dc.identifier.number103516
dc.identifier.piiS1566253525005883
dc.identifier.urihttps://mro.massey.ac.nz/handle/10179/73428
dc.languageEnglish
dc.publisherElsevier B.V.
dc.publisher.urihttps://www.sciencedirect.com/science/article/pii/S1566253525005883
dc.relation.isPartOfInformation Fusion
dc.rights(c) 2025 The Author(s)
dc.rightsCC BY 4.0
dc.rights.urihttps://creativecommons.org/licenses/by/4.0/
dc.subjectMultimodal dataset
dc.subjectThermal imaging
dc.subjectUV imaging
dc.subjectPreprocessing
dc.subjectSensor fusion
dc.subjectDataset annotation
dc.titleMM5: Multimodal image capture and dataset generation for RGB, depth, thermal, UV, and NIR
dc.typeJournal article
pubs.elements-id502698
pubs.organisational-groupOther
Files
Original bundle
Name: 502698 PDF.pdf
Size: 4.49 MB
Format: Adobe Portable Document Format
Description: Published version
License bundle
Name: license.txt
Size: 9.22 KB
Format: Plain Text