Salient Object Detection for complex scenes : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, School of Mathematical and Computational Sciences, Massey University, Albany, Auckland, New Zealand

Date
2024-09-19
Publisher
Massey University
Rights
© The Author
Abstract
Salient Object Detection (SOD), a core task in computer vision, aims to locate and segment the most visually striking regions within an image. In this thesis, we present three novel deep-learning-based methods to improve SOD performance in complex scenes.

First, we introduce the Multiple Enhancement Network (MENet), inspired by the boundary perception, gradual enhancement, frequency decomposition, and content-integrity mechanisms of the Human Visual System (HVS). We propose a flexible multi-scale feature enhancement module to aggregate and refine features, and use iterative training to improve the boundary and adaptive features in MENet's dual-branch decoder. A multi-level hybrid loss guides the network in learning pixel-, region-, and object-level features. Evaluations on benchmark datasets show that MENet outperforms other SOD models, especially when the salient region contains multiple objects with varied appearances or complex shapes.

Second, we propose TFGNet, an effective Transformer-based, frequency-guided network for saliency detection. TFGNet uses a parallel two-branch decoder that couples a pixel-wise decoder with a Transformer decoder to optimise high-spatial-frequency boundary details and low-spatial-frequency salient features. A novel loss based on frequency-distribution similarity further improves performance. Experimental results indicate that TFGNet accurately locates salient objects with more complete and precise boundaries against a variety of complex backgrounds. The framework also rekindles awareness of the advantages of exploiting images' spatial-frequency features in SOD.

Third, we design a multi-source weakly supervised SOD (WSOD) framework that effectively combines pseudo-background (non-salient region) labels with scribble labels to obtain more accurate salient features. We first create a comprehensive salient pseudo-mask generator from multiple self-learning features.
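As an illustration of the high-/low-spatial-frequency split that TFGNet's two-branch decoder exploits, the following is a minimal NumPy sketch that separates a 2-D map into low- and high-frequency components with an ideal low-pass mask in the Fourier domain. The function name, the `cutoff` parameter, and the choice of an ideal (rather than learned) filter are illustrative assumptions, not the decomposition used in the thesis.

```python
import numpy as np

def frequency_split(img, cutoff=0.1):
    """Split a 2-D array into low- and high-spatial-frequency parts
    via an ideal low-pass mask in the Fourier domain."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    # Normalised frequency coordinates centred on DC
    yy, xx = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    radius = np.sqrt((yy / h) ** 2 + (xx / w) ** 2)
    mask = radius <= cutoff                      # keep only low frequencies
    low = np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
    high = img - low                             # residual carries boundary detail
    return low, high
```

In a TFGNet-style design, the low-frequency component corresponds to coarse salient-region structure and the high-frequency residual to boundary detail, each supervised by its own decoder branch.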
Also, we pioneer the generation of salient pseudo-labels via point-prompted and box-prompted Segment Anything Models (SAM). We then propose a Transformer-based WSOD network, WBNet, which combines a pixel decoder and a Transformer decoder with an auxiliary edge predictor and a multi-source loss function to handle complex saliency-detection tasks.

In summary, we contribute three novel approaches to salient object detection in complex scenes. Comprehensive experiments on widely used benchmark datasets validate that each model achieves state-of-the-art performance.
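To make the scribble-supervision idea concrete, here is a minimal NumPy sketch of a partial binary cross-entropy that supervises only annotated pixels (foreground strokes and pseudo-background strokes) while ignoring unlabeled ones. The function name and the label convention (255 = unlabeled) are illustrative assumptions; WBNet's actual multi-source loss is defined in the thesis.

```python
import numpy as np

def partial_bce(pred, scribble, ignore=255):
    """Binary cross-entropy over annotated pixels only.

    pred     : predicted saliency in [0, 1], shape (H, W)
    scribble : sparse labels -- 1 = foreground stroke,
               0 = (pseudo-)background stroke, `ignore` = unlabeled
    """
    labeled = scribble != ignore                 # supervise scribbled pixels only
    p = np.clip(pred[labeled], 1e-7, 1 - 1e-7)   # guard against log(0)
    y = scribble[labeled].astype(float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
```

Under this formulation, unlabeled pixels contribute no gradient, so the network is free to infer them from the labeled strokes and any auxiliary signals (edges, pseudo-masks) combined in the full multi-source loss.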
Keywords
Salient Object Detection, deep learning, weakly supervised feature learning, computer vision