MMAF-Net: Multi-view multi-stage adaptive fusion for multi-sensor 3D object detection

dc.citation.volume: 242
dc.contributor.author: Zhang W
dc.contributor.author: Shi H
dc.contributor.author: Zhao Y
dc.contributor.author: Feng Z
dc.contributor.author: Lovreglio R
dc.date.accessioned: 2024-08-04T23:09:56Z
dc.date.available: 2024-08-04T23:09:56Z
dc.date.issued: 2023-12-05
dc.description.abstract: In this paper, we propose MMAF-Net, a 3D object detection method based on multi-view, multi-stage adaptive fusion of RGB images and LiDAR point cloud data. The end-to-end architecture combines RGB images, a reflection-intensity front view of the point cloud, and a bird's-eye view of the point cloud, and adopts a multi-stage "data-level fusion + feature-level fusion" strategy to fully exploit multimodal information. The method addresses key challenges in current 3D object detection for autonomous driving, including insufficient feature extraction from multimodal data, rudimentary fusion techniques, and sensitivity to distance and occlusion. To integrate multimodal information comprehensively, we present a series of targeted fusion methods. First, we propose a novel input form that encodes dense point cloud reflectivity into the image to enhance its representational power. Second, we design the Region Attention Adaptive Fusion module, which uses an attention mechanism to guide the network in adaptively weighting different features. Finally, we extend the 2D DIoU (Distance Intersection over Union) loss function to 3D and develop a joint regression loss based on 3D DIoU and SmoothL1 to optimize the similarity between detected and ground-truth boxes. Experimental results on the KITTI dataset demonstrate that MMAF-Net handles heavily occluded and crowded scenes while maintaining real-time performance, and improves detection accuracy for smaller, harder objects that are occluded or far away.
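The joint regression loss mentioned in the abstract (3D DIoU combined with SmoothL1) can be sketched as follows. This is a minimal illustration only, not the authors' published implementation: the axis-aligned box parameterization (yaw ignored), the function names, and the weights `alpha`/`beta` are assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def diou_3d_loss(pred, target, eps=1e-7):
    """Illustrative 3D DIoU loss for axis-aligned boxes (x, y, z, w, l, h).

    DIoU = IoU - d^2 / c^2, where d is the centre distance and c the
    diagonal of the smallest enclosing box; the loss is 1 - DIoU.
    The paper's boxes also carry a heading angle, which is omitted here.
    """
    pc, ps = pred[:, :3], pred[:, 3:6]      # predicted centres, sizes
    tc, ts = target[:, :3], target[:, 3:6]  # ground-truth centres, sizes

    # Min/max corners of each box
    p_min, p_max = pc - ps / 2, pc + ps / 2
    t_min, t_max = tc - ts / 2, tc + ts / 2

    # Intersection and union volumes -> 3D IoU
    inter = (torch.min(p_max, t_max) - torch.max(p_min, t_min)).clamp(min=0)
    inter_vol = inter.prod(dim=1)
    union = ps.prod(dim=1) + ts.prod(dim=1) - inter_vol
    iou = inter_vol / (union + eps)

    # Squared centre distance and squared diagonal of the enclosing box
    centre_dist2 = ((pc - tc) ** 2).sum(dim=1)
    enclose = torch.max(p_max, t_max) - torch.min(p_min, t_min)
    diag2 = (enclose ** 2).sum(dim=1)

    return 1.0 - (iou - centre_dist2 / (diag2 + eps))


def joint_regression_loss(pred, target, alpha=1.0, beta=1.0):
    """Weighted sum of 3D DIoU loss and SmoothL1 on the raw box parameters."""
    l_diou = diou_3d_loss(pred, target).mean()
    l_smooth = F.smooth_l1_loss(pred, target)
    return alpha * l_diou + beta * l_smooth
```

In this reading, the DIoU term penalizes centre misalignment even when boxes do not overlap, while the SmoothL1 term regresses the individual box parameters; the relative weighting would be a tunable hyperparameter.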
dc.description.confidential: false
dc.identifier.citation: Zhang W, Shi H, Zhao Y, Feng Z, Lovreglio R. (2024). MMAF-Net: Multi-view multi-stage adaptive fusion for multi-sensor 3D object detection. Expert Systems with Applications. 242.
dc.identifier.doi: 10.1016/j.eswa.2023.122716
dc.identifier.elements-type: journal-article
dc.identifier.issn: 0957-4174
dc.identifier.number: 122716
dc.identifier.pii: S0957417423032189
dc.identifier.uri: https://mro.massey.ac.nz/handle/10179/71183
dc.language: English
dc.publisher: Elsevier B.V.
dc.publisher.uri: https://www.sciencedirect.com/science/article/pii/S0957417423032189
dc.relation.isPartOf: Expert Systems with Applications
dc.rights: (c) 2023 The Author/s
dc.rights: CC BY 4.0
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: 3D object detection
dc.subject: Multi-sensor fusion
dc.subject: Attention mechanism
dc.subject: Joint regression loss
dc.subject: Autonomous driving
dc.title: MMAF-Net: Multi-view multi-stage adaptive fusion for multi-sensor 3D object detection
dc.type: Journal article
pubs.elements-id: 485229
pubs.organisational-group: College of Health
Files
Original bundle: Published version.pdf (2.44 MB, Adobe Portable Document Format)
License bundle: license.txt (9.22 KB, Plain Text)