Massey Documents by Type
Permanent URI for this communityhttps://mro.massey.ac.nz/handle/10179/294
Browse
2 results
Search Results
Item Deep learning for video salient object detection : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, School of Mathematical and Computational Sciences, Massey University, Albany, Auckland, New Zealand(Massey University, 2025-09-23) Jiang, TaoVideo Salient Object Detection (VSOD) is a fundamental task in video analysis, focusing on identifying and segmenting the most visually prominent objects in dynamic scenes. In this thesis, we propose three novel deep learning-based approaches to enhance VSOD performance in complex environments. Firstly, we introduce an Inheritance Enhancement Network (IENet), a Transformer-based framework designed to improve the integration of long-term spatial-temporal dependencies. We propose a unidirectional cross-frame enhancement mechanism, ensuring consistent and orderly information propagation across frames while minimizing interference. Extensive experiments demonstrate that IENet significantly improves detection accuracy in complex video scenes. Secondly, we present a Knowledge-sharing Hierarchical Memory Fusion Network (KHMF-Net) to address the challenges of scribble-supervised VSOD, where annotations are sparse and ambiguous. Our approach utilizes a memory bank-based hierarchical encoder-decoder architecture to reduce error accumulation and mitigate background distractions. Additionally, we introduce a dual-attention knowledge-sharing strategy, enabling the model to refine predictions by leveraging learned information across frames. This approach effectively enhances feature consistency and improves the separation of foreground and background objects. Thirdly, we propose the Multimodal Energy Prompting Network (MEPNet), which leverages optical flow and depth as implicit prompts to fine-tune a Segment Anything Model (SAM) for improved VSOD performance. We first introduce a Spectrogram Energy Generator (SEG), which extracts energy-driven prompts that enhance the spatial-temporal representation of salient objects. Furthermore, the Modality-Energy Adapter (MEA) effectively integrates these prompts into SAM, improving the model's ability to capture motion and structural cues. Extensive evaluations show that MEPNet effectively incorporates multimodal information, resulting in more robust and precise VSOD outcomes. In summary, we propose three innovative approaches to improve VSOD in complex scenes. Each method undergoes extensive evaluation on benchmark datasets, achieving superior performance compared to existing state-of-the-art models. Our contributions provide new insights into advancing video-based salient object detection, paving the way for more robust and efficient VSOD frameworks.Item Graph learning and its applications : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, Massey University, Albany, Auckland, New Zealand(Massey University, 2022) Hu, RongyaoSince graph features consider the correlations between two data points to provide high-order information, i.e., more complex correlations than the low-order information which considers the correlations in the individual data, they have attracted much attention in real applications. The key of graph feature extraction is the graph construction. Previous study has demonstrated that the quality of the graph usually determines the effectiveness of the graph feature. However, the graph is usually constructed from the original data which often contain noise and redundancy. To address the above issue, graph learning is designed to iteratively adjust the graph and model parameters so that improving the quality of the graph and outputting optimal model parameters. As a result, graph learning has become a very popular research topic in traditional machine learning and deep learning. Although previous graph learning methods have been applied in many fields by adding a graph regularization to the objective function, they still have some issues to be addressed. This thesis focuses on the study of graph learning aiming to overcome the drawbacks in previous methods for different applications. We list the proposed methods as follows. • We propose a traditional graph learning method under supervised learning to consider the robustness and the interpretability of graph learning. Specifically, we propose utilizing self-paced learning to assign important samples with large weights, conducting feature selection to remove redundant features, and learning a graph matrix from the low dimensional data of the original data to preserve the local structure of the data. As a consequence, both important samples and useful features are used to select support vectors in the SVM framework. • We propose a traditional graph learning method under semi-supervised learning to explore parameter-free fusion of graph learning. Specifically, we first employ the discrete wavelet transform and Pearson correlation coefficient to obtain multiple fully connected Functional Connectivity brain Networks (FCNs) for every subject, and then learn a sparsely connected FCN for every subject. Finally, the ℓ1-SVM is employed to learn the important features and conduct disease diagnosis. • We propose a deep graph learning method to consider graph fusion of graph learning. Specifically, we first employ the Simple Linear Iterative Clustering (SLIC) method to obtain multi-scale features for every image, and then design a new graph fusion method to fine-tune features of every scale. As a result, the multi-scale feature fine-tuning, graph learning, and feature learning are embedded into a unified framework. All proposed methods are evaluated on real-world data sets, by comparing to state-of-the-art methods. Experimental results demonstrate that our methods outperformed all comparison methods.
