Massey Research Online - Browsing by Author "Qu G"



Cascaded Segmented Matting Network for Human Matting
(IEEE, 2021-11-04) Liu B; Jing H; Qu G; Guesgen HW; Raval MS
Human matting, the high-quality extraction of humans from natural images, is crucial for a wide variety of applications such as virtual reality, augmented reality, and entertainment. Because matting is an ill-posed problem, most previous methods rely on extra user input, such as a trimap or scribbles, as guidance for estimating the alpha value of pixels in the unknown region of the trimap. This reliance makes such methods difficult to apply to large-scale data. To solve this problem, we studied the distinct roles of semantics and details in image matting and decomposed the matting task into two sub-tasks: trimap segmentation based on high-level semantic information and alpha regression based on low-level detailed information. Specifically, we proposed a novel Cascaded Segmented Matting Network (CSMNet), which uses a shared encoder and two separate decoders to learn these two tasks collaboratively and achieve end-to-end human image matting. In addition, we established a large-scale dataset of 14,000 finely labeled human matting images. A background dataset was also built to simulate real pictures. Comprehensive empirical studies on these datasets demonstrate that CSMNet can produce a stable and accurate alpha matte without a trimap as input and achieve evaluation scores comparable to those of algorithms that require a trimap.
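As a rough illustration of what an alpha matte is used for, the sketch below applies the standard compositing equation I = alpha * F + (1 - alpha) * B to place a foreground on a new background, which is one plausible way a background dataset could be combined with matted humans. The function names and the nested-list image layout are assumptions made for this example and are not taken from the paper.

```python
def composite_pixel(foreground, background, alpha):
    """Blend one RGB pixel with the matting equation I = alpha*F + (1-alpha)*B."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))


def composite_image(foreground, background, alpha_matte):
    """Blend two images stored as nested lists of RGB tuples (hypothetical layout).

    alpha_matte has the same height and width and holds one alpha per pixel.
    """
    return [
        [composite_pixel(f, b, a) for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(foreground, background, alpha_matte)
    ]


# A 1x2 image: the first pixel is pure foreground, the second is half blended.
red_person = [[(255, 0, 0), (255, 0, 0)]]
blue_scene = [[(0, 0, 255), (0, 0, 255)]]
matte = [[1.0, 0.5]]
blended = composite_image(red_person, blue_scene, matte)
```

An alpha of 1 keeps the foreground pixel, 0 keeps the background, and fractional values mix the two, which is what makes fine detail such as hair edges look natural.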
DeepSIM: a novel deep learning method for graph similarity computation
(Springer-Verlag GmbH, 2024-01) Liu B; Wang Z; Zhang J; Wu J; Qu G
Graphs are widely used to model real-life information, and graph similarity computation is one of their most significant applications, for example inferring the properties of a compound from its similarity to a known group. Definition-based methods (e.g., graph edit distance and maximum common subgraph) have extremely high computational cost, while existing efficient deep learning methods suffer from inadequate feature extraction, which degrades similarity computation. In this paper, a double-branch model called DeepSIM is proposed that mines both graph-level and node-level features in depth to address these problems. On the graph-level branch, a novel embedding relational reasoning network is presented to capture the interaction between pairwise inputs. On the other branch, a new local-to-global attention mechanism is designed to improve the capability of the CNN-based node-level feature extraction module. In DeepSIM, the outputs of the two branches are concatenated to form the final feature. The experimental results demonstrate that our method performs well on several datasets compared with state-of-the-art deep learning models in related fields.
Efficient Limb Range of Motion Analysis from a Monocular Camera for Edge Devices
(MDPI (Basel, Switzerland), 2025-01-22) Yan X; Zhang L; Liu B; Qu G; Amerini I; Russo P; Di Ciaccio F
Traditional limb kinematic analysis relies on manual goniometer measurements. With computer vision advancements, integrating RGB cameras can minimize manual labor. Although deep learning-based cameras aim to offer the same ease as manual goniometers, previous approaches have prioritized accuracy over efficiency and cost on PC-based devices. Nevertheless, healthcare providers require a high-performance, low-cost, camera-based tool for assessing upper and lower limb range of motion (ROM). To address this, we propose a lightweight, fast, deep learning model to estimate a human pose and utilize predicted joints for limb ROM measurement. Furthermore, the proposed model is optimized for deployment on resource-constrained edge devices, balancing accuracy and the benefits of edge computing like cost-effectiveness and localized data processing. Our model uses a compact neural network architecture with 8-bit quantized parameters for enhanced memory efficiency and reduced latency. Evaluated on various upper and lower limb tasks, it runs 4.1 times faster and is 15.5 times smaller than a state-of-the-art model, achieving satisfactory ROM measurement accuracy and agreement with a goniometer. We also conduct an experiment on a Raspberry Pi, illustrating that the method can maintain accuracy while reducing equipment and energy costs. This result indicates the potential for deployment on other edge devices and provides the flexibility to adapt to various hardware environments, depending on diverse needs and resources.
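To make the idea of measuring ROM from predicted joints concrete, here is a minimal sketch that computes the angle at a joint from three 2D keypoints, much as a goniometer reads the angle between two limb segments. The function name, the keypoint format, and the choice of a plain 2D angle are assumptions for illustration; the paper's actual measurement protocol may differ.

```python
import math


def joint_angle_degrees(proximal, joint, distal):
    """Angle in degrees at `joint` between the segments to `proximal` and `distal`.

    Each argument is an (x, y) keypoint, e.g. shoulder, elbow, wrist.
    """
    v1 = (proximal[0] - joint[0], proximal[1] - joint[1])
    v2 = (distal[0] - joint[0], distal[1] - joint[1])
    norm1 = math.hypot(*v1)
    norm2 = math.hypot(*v2)
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("keypoints must not coincide with the joint")
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / (norm1 * norm2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))


# Hypothetical pixel coordinates for shoulder, elbow, and wrist.
elbow_flexion = joint_angle_degrees((0, 100), (0, 0), (100, 0))
```

In practice the keypoints would come from the pose model's output for each frame, and the range of motion would be the spread of this angle over a movement.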
Efficient Monocular Human Pose Estimation Based on Deep Learning Methods: A Survey
(IEEE, 2024-05-09) Yan X; Liu B; Qu G
Human pose estimation (HPE) is a crucial computer vision task with a wide range of applications in sports medicine, healthcare, virtual reality, and human-computer interaction. The demand for real-time HPE solutions necessitates the development of efficient deep-learning models that can be deployed on resource-constrained devices. While a few surveys exist in this area, none delve deeply into the critical intersection of efficiency and performance. This survey reviews the state-of-the-art efficient deep learning approaches for real-time HPE, focusing on strategies for improving efficiency without compromising accuracy. We discuss popular backbone networks for HPE, model compression techniques, network pruning and quantization, knowledge distillation, and neural architecture search methods. Furthermore, we critically analyze the existing works, highlighting their strengths, weaknesses, and applicability to different scenarios. We also present an overview of the evaluation datasets, metrics, and design for efficient HPE. Finally, we identify research gaps and challenges in the field, providing insights and recommendations for future research directions in developing efficient and scalable HPE solutions.
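Of the compression techniques listed above, quantization is the simplest to show in a few lines. The sketch below uses symmetric 8-bit quantization with a single scale per weight list, which is one common scheme and not necessarily one the survey covers in this form; all names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Fall back to a scale of 1.0 when every weight is zero.
    scale = max_abs / 127.0 if max_abs > 0.0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float weights from the 8-bit integers."""
    return [q * scale for q in quantized]


weights = [0.3, -0.8, 0.05]
q_weights, q_scale = quantize_int8(weights)
restored = dequantize_int8(q_weights, q_scale)
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half the scale per weight.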
Transformer-based multiple instance learning network with 2D positional encoding for histopathology image classification
(Springer Nature Switzerland AG, 2025-05) Yang B; Ding L; Li J; Li Y; Qu G; Wang J; Wang Q; Liu B
Digital medical imaging, particularly pathology images, is essential for cancer diagnosis but faces challenges in direct model training because of the extremely high resolution of the images. Although weakly supervised learning has reduced the need for manual annotations, many multiple instance learning (MIL) methods struggle to effectively capture crucial spatial relationships in histopathological images. Existing methods incorporating positional information often overlook nuanced spatial correlations or use positional encoding strategies that do not fully capture the unique spatial dynamics of pathology images. To address this issue, we propose a new framework named TMIL (Transformer-based Multiple Instance Learning Network with 2D positional encoding), which leverages multiple instance learning for weakly supervised classification of histopathological images. TMIL incorporates a 2D positional encoding module, based on the Transformer, to model positional information and explore correlations between instances. Furthermore, TMIL divides histopathological images into pseudo-bags and trains patch-level feature vectors with deep metric learning to enhance classification performance. Finally, the proposed approach is evaluated on a public colorectal adenoma dataset. The experimental results show that TMIL outperforms existing MIL methods, achieving an AUC of 97.28% and an ACC of 95.19%. These findings suggest that TMIL’s integration of deep metric learning and positional encoding offers a promising approach for improving the efficiency and accuracy of pathology image analysis in cancer diagnosis.
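The abstract does not say how the 2D positional encoding is constructed. As a hedged sketch of one common approach, the code below takes the fixed sinusoidal encoding from the original Transformer and extends it to two dimensions by concatenating an encoding of a patch's row with an encoding of its column. The function names and this particular construction are assumptions, not TMIL's confirmed design.

```python
import math


def sinusoidal_1d(position, dim):
    """Standard sine/cosine encoding of one integer position into `dim` values."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    encoding = []
    for i in range(0, dim, 2):
        frequency = 1.0 / (10000 ** (i / dim))
        encoding.append(math.sin(position * frequency))
        encoding.append(math.cos(position * frequency))
    return encoding


def positional_encoding_2d(row, col, dim):
    """Encode a patch's grid position: first half for the row, second for the column."""
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    half = dim // 2
    return sinusoidal_1d(row, half) + sinusoidal_1d(col, half)


# Encoding for the patch at row 3, column 5 of a slide's patch grid.
patch_encoding = positional_encoding_2d(3, 5, 8)
```

Such a vector would typically be added to a patch's feature vector so that attention can take the patch's location in the slide into account.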
