Journal Articles

Permanent URI for this collection: https://mro.massey.ac.nz/handle/10179/7915

  • Item
    Real-time human pose estimation and tracking on monocular videos: A systematic literature review
    (Elsevier B V, 2025-11-28) Chen Y; Feng Z; Paes D; Nilsson D; Lovreglio R
    Real-time human pose estimation and tracking on monocular videos is a fundamental task in computer vision with a wide range of applications. Recently, benefiting from deep learning-based methods, it has made impressive progress in performance. Although some works have reviewed and summarised the advancements in this field, few have specifically focused on real-time performance and monocular video-based solutions. The goal of this review is to bridge this gap by providing a comprehensive understanding of real-time monocular video-based human pose estimation and tracking, encompassing both 2D and 3D domains, as well as single-person and multi-person scenarios. To achieve this objective, this paper systematically reviews 68 papers published between 2014 and 2024 to answer six research questions. This review brings new insights into computational efficiency measures and hardware configurations of existing methods. Additionally, this review provides a deep discussion on trade-off strategies for accuracy and efficiency in real-time systems. Finally, this review highlights promising directions for future research and provides practical solutions for real-world applications.
  • Item
    Data-driven virtual sensor systems for dynamic temperature monitoring along food supply chains
    (Elsevier Ltd, 2026-01-01) Duan F; Meng X; Wu W; Zou Y; Zeng X
    Continuous monitoring of perishable food temperatures along supply chains is crucial for quality assurance and reducing food loss and waste. However, cost and installation constraints restrict sensor deployment, compromising the reliability of temperature monitoring. This study proposes a data-driven virtual sensor system that leverages deep learning to integrate multi-source data, enabling temperature estimation at sensor-inaccessible locations and thus reducing dependence on extensive physical sensor deployment. The system was evaluated across postharvest processing, storage, and transport. Results indicate that, with a fixed number of physical sensors, increasing the virtual-to-physical sensor ratio from 16 to 32 maintains the root mean square error below 0.3 °C. Further analysis shows that sensor placement within pallets has minimal impact on performance, whereas the choice of data sources and model architecture exerts a significant influence. Notably, a configuration of one sensor per pallet with a BiLSTM + attention model outperforms shallow networks, demonstrating the potential of data-driven virtual sensor systems to enhance temperature monitoring and efficiency along food supply chains.
  • Item
    Accurate machine learning model for human embryo morphokinetic stage detection
    (Springer Science+Business Media, LLC, 2025-08-20) Misaghi H; Cree L; Knowlton N
    Purpose: The ability to detect, monitor, and precisely time the morphokinetic stages of human pre-implantation embryo development plays a critical role in assessing their viability and potential for successful implantation. Therefore, there is a need for accurate and accessible tools to analyse embryos. This work describes a highly accurate machine learning model designed to predict 17 morphokinetic stages of pre-implantation human development, an improvement on existing models. This model provides a robust tool for researchers and clinicians, enabling the automation of morphokinetic stage prediction, standardising the process, and reducing subjectivity between clinics. Method: A computer vision model was built on a publicly available dataset for embryo morphokinetic stage detection. The dataset contained 273,438 labelled images based on Embryoscope/ + © embryo images. The dataset was split 70/10/20 into training/validation/test sets. Two different deep learning architectures were trained and tested, one using EfficientNet-V2-Large and the other using EfficientNet-V2-Large with the addition of fertilisation time as input. A new postprocessing algorithm was developed to reduce noise in the predictions of the deep learning model and detect the exact time of each morphokinetic stage change. Results: The proposed model reached an overall test F1-score of 0.881 and accuracy of 87% across 17 morphokinetic stages on an independent test set. Conclusion: The proposed model shows a 17% accuracy improvement compared to the best models on the same dataset. Therefore, our model can accurately detect morphokinetic stages in static embryo images as well as detect the exact timings of stage changes in a complete time-lapse video.
  • Item
    Synthetic hyperspectral reflectance data augmentation by generative adversarial network to enhance grape maturity determination
    (Elsevier B V, 2025-08) Lyu H; Grafton M; Ramilan T; Irwin M; Sandoval E
    Non-destructive and rapid grape maturity detection is important for the wine industry. The ongoing development of hyperspectral imaging techniques and deep learning methods has greatly helped the non-destructive assessment of grape quality and maturity, but the performance of deep learning methods depends on the volume and the quality of labeled data for training. Building non-destructive grape quality or maturity testing datasets requires damaging grapes for chemical analysis to produce labels, which is time-consuming and resource-intensive. To solve this problem, this study proposed a conditional Wasserstein Generative Adversarial Network (WGAN) with the gradient penalty data augmentation technique to generate synthetic hyperspectral reflectance data of two grape maturity categories (ripe and unripe) and different Total Soluble Solids (TSS) values. The conditional WGAN with the gradient penalty was trained for a range of epochs: 500, 1000, 2000, 8000, 10,000, and 20,000. After 10,000 training epochs, synthetic hyperspectral reflectance data were very similar to real spectra for each maturity category and different TSS values. Thereafter, contextual deep three-dimensional CNN (3D-CNN), Spatial Residual Network (SSRN) and Support Vector Machine (SVM) were trained on the original training and synthetic + original training datasets to classify grape maturity. The synthetic hyperspectral reflectance data, incrementally added to the original training set in steps of 250, 500, 1000, 1500, and 2000 samples, consistently resulted in higher model performance compared to training solely on the original dataset. The best results were achieved by augmenting the training dataset with 2000 synthetic samples and training with a 3D-CNN, yielding a classification accuracy of 91% on the testing set. To better assess the effectiveness of GAN-based data augmentation methods, two widely used regression models, Partial Least Squares Regression (PLSR) and one-dimensional CNN (1D-CNN), were used with the same data augmentation method. The best result was achieved by adding 250 synthetic samples to the original training set when training the 1D-CNN model, yielding an R² of 0.78, RMSE of 0.63 °Brix, and RPIQ of 3.36 on the testing set. This study indicated that deep learning models combined with conditional WGAN with the gradient penalty data augmentation technique have good application prospects in grape maturity assessment.
  • Item
    An in-depth survey on Deep Learning-based Motor Imagery Electroencephalogram (EEG) classification
    (Elsevier BV, Netherlands, 2024-01) Wang X; Liesaputra V; Liu Z; Wang Y; Huang Z
    Electroencephalogram (EEG)-based Brain–Computer Interfaces (BCIs) build a communication path between the human brain and external devices. Among EEG-based BCI paradigms, the most commonly used one is motor imagery (MI). As a hot research topic, MI EEG-based BCI has largely contributed to medical fields and the smart home industry. However, because of the low signal-to-noise ratio (SNR) and the non-stationary characteristic of EEG data, it is difficult to correctly classify different types of MI-EEG signals. Recently, advances in Deep Learning (DL) have significantly facilitated the development of MI EEG-based BCIs. In this paper, we provide a systematic survey of DL-based MI-EEG classification methods. Specifically, we first comprehensively discuss several important aspects of DL-based MI-EEG classification, covering input formulations, network architectures, public datasets, etc. Then, we summarize problems in model performance comparison and give guidelines for future studies for fair performance comparison. Next, we fairly evaluate the representative DL-based models using source code released by the authors and meticulously analyse the evaluation results. By performing an ablation study on the network architecture, we found that (1) effective feature fusion is indispensable for multi-stream CNN-based models; (2) LSTM should be combined with spatial feature extraction techniques to obtain good classification performance; (3) the use of dropout contributes little to improving the model performance; and (4) adding fully connected layers to the models significantly increases their parameters but might not improve their performance. Finally, we raise several open issues in MI-EEG classification and provide possible future research directions.
  • Item
    Potential rapid intraoperative cancer diagnosis using dynamic full-field optical coherence tomography and deep learning: A prospective cohort study in breast cancer patients
    (Elsevier B V on behalf of the Science China Press, 2024-06-15) Zhang S; Yang B; Yang H; Zhao J; Zhang Y; Gao Y; Monteiro O; Zhang K; Liu B; Wang S
    An intraoperative diagnosis is critical for precise cancer surgery. However, traditional intraoperative assessments based on hematoxylin and eosin (H&E) histology, such as frozen section, are time-, resource-, and labor-intensive, and raise concerns about specimen consumption. Here, we report a near-real-time automated cancer diagnosis workflow for breast cancer that combines dynamic full-field optical coherence tomography (D-FFOCT), a label-free optical imaging method, and deep learning for bedside tumor diagnosis during surgery. To classify the benign and malignant breast tissues, we conducted a prospective cohort trial. In the modeling group (n = 182), D-FFOCT images were captured from April 26 to June 20, 2018, encompassing 48 benign lesions, 114 invasive ductal carcinoma (IDC), 10 invasive lobular carcinoma, 4 ductal carcinoma in situ (DCIS), and 6 rare tumors. A deep learning model was built and fine-tuned on 10,357 D-FFOCT patches. Subsequently, from June 22 to August 17, 2018, independent tests (n = 42) were conducted on 10 benign lesions, 29 IDC, 1 DCIS, and 2 rare tumors. The model yielded excellent performance, with an accuracy of 97.62%, sensitivity of 96.88% and specificity of 100%; only one IDC was misclassified. Meanwhile, the acquisition of the D-FFOCT images was non-destructive and did not require any tissue preparation or staining procedures. In the simulated intraoperative margin evaluation procedure, the time required for our novel workflow (approximately 3 min) was significantly shorter than that required for traditional procedures (approximately 30 min). These findings indicate that the combination of D-FFOCT and deep learning algorithms can streamline intraoperative cancer diagnosis independently of traditional pathology laboratory procedures.
  • Item
    Towards asteroid detection in microlensing surveys with deep learning
    (Elsevier B.V., 2023-01-30) Cowan P; Bond IA; Reyes NH
    Asteroids are an indelible part of most astronomical surveys though only a few surveys are dedicated to their detection. Over the years, high cadence microlensing surveys have amassed several terabytes of data while scanning primarily the Galactic Bulge and Magellanic Clouds for microlensing events and thus provide a treasure trove of opportunities for scientific data mining. In particular, numerous asteroids have been observed by visual inspection of selected images. This paper presents novel deep learning-based solutions for the recovery and discovery of asteroids in the microlensing data gathered by the MOA project. Asteroid tracklets can be clearly seen by combining all the observations on a given night and these tracklets inform the structure of the dataset. Known asteroids were identified within these composite images and used for creating the labelled datasets required for supervised learning. Several custom CNN models were developed to identify images with asteroid tracklets. Model ensembling was then employed to reduce the variance in the predictions as well as to improve the generalization error, achieving a recall of 97.67%. Furthermore, the YOLOv4 object detector was trained to localize asteroid tracklets, achieving a mean Average Precision (mAP) of 90.97%. These trained networks will be applied to 16 years of MOA archival data to find both known and unknown asteroids that have been observed by the survey over the years. The methodologies developed can be adapted for use by other surveys for asteroid recovery and discovery.
  • Item
    DeepSIM: a novel deep learning method for graph similarity computation
    (Springer-Verlag GmbH, 2024-01) Liu B; Wang Z; Zhang J; Wu J; Qu G
    Graphs are widely used to model real-life information, where graph similarity computation is one of the most significant applications, such as inferring the properties of a compound based on similarity to a known group. Definition methods (e.g., graph edit distance and maximum common subgraph) have extremely high computational cost, and the existing efficient deep learning methods suffer from inadequate feature extraction, which degrades similarity computation. In this paper, a double-branch model called DeepSIM is proposed to deeply mine graph-level and node-level features to address the above problems. On the graph-level branch, a novel embedding relational reasoning network is presented to obtain interaction between pairwise inputs. Meanwhile, a new local-to-global attention mechanism is designed to improve the capability of the CNN-based node-level feature extraction module on the other branch. In DeepSIM, the double-branch outputs are concatenated as the final feature. The experimental results demonstrate that our methods perform well on several datasets compared to the state-of-the-art deep learning models in related fields.
  • Item
    A physically informed multi-scale deep neural network for estimating foliar nitrogen concentration in vegetation
    (Elsevier B.V., 2024-05-28) Dehghan-Shoar MH; Kereszturi G; Pullanagari RR; Orsi AA; Yule IJ; Hanly J
    This study introduces a Physically Informed Deep Neural Network (PINN) that leverages spectral data and Radiative Transfer Model insights to improve nitrogen concentration estimation in vegetation, addressing the complexities of physical processes. Utilizing a comprehensive spectroscopy dataset from various species across dry/ground (n = 2010), leaf (n = 1512), and canopy (n = 6007) scales, the study identifies 13 spectral bands key for chlorophyll and protein quantification. Key bands at 2276 nm, 755 nm, 1526 nm, 2243 nm, and 734 nm emerged as vital for accurate N% prediction. The PINN outperforms partial least squares regression and standard deep neural networks, achieving an R² of 0.71 and an RMSE of 0.42 (%N) on an independent validation set. Results indicate dry/ground data performed best (R² = 0.9, RMSE = 0.24 %N), with leaf and canopy data showing lower efficacy (R² = 0.67, RMSE = 0.45 %N; R² = 0.65, RMSE = 0.46 %N, respectively). This multi-scale approach provides insights into spectral and N% relationships, enabling precise estimation across vegetation types and facilitating the development of transferable models. The PINN offers a new avenue for analyzing remote sensing data, demonstrating the significant potential for accurate, scale-spanning N% estimation in vegetation. Further enriching our analysis, the inclusion of seasonal data significantly enhanced our model's performance in field spectroscopy, with notable improvements observed across summer, spring, autumn, and winter. This adjustment underlines the model's increased accuracy and predictive capability at the field spectroscopy scale, emphasizing the vital role of integrating environmental factors, including climatic and physiological aspects, in future research.
  • Item
    DL-PPI: a method on prediction of sequenced protein-protein interaction based on deep learning
    (BioMed Central Ltd, 2023-12) Wu J; Liu B; Zhang J; Wang Z; Li J
    PURPOSE: Sequenced Protein-Protein Interaction (PPI) prediction represents a pivotal area of study in biology, playing a crucial role in elucidating the mechanistic underpinnings of diseases and facilitating the design of novel therapeutic interventions. Conventional methods for extracting features through experimental processes have proven to be both costly and exceedingly complex. In light of these challenges, the scientific community has turned to computational approaches, particularly those grounded in deep learning methodologies. Despite the progress achieved by current deep learning technologies, their effectiveness diminishes when applied to larger, unfamiliar datasets. RESULTS: This study introduces a novel deep learning framework, termed DL-PPI, for predicting PPIs based on sequence data. The proposed framework comprises two key components aimed at improving the accuracy of feature extraction from individual protein sequences and capturing relationships between proteins in unfamiliar datasets. 1. Protein Node Feature Extraction Module: To enhance the accuracy of feature extraction from individual protein sequences and facilitate the understanding of relationships between proteins in unknown datasets, the paper devised a novel protein node feature extraction module utilizing the Inception method. This module efficiently captures relevant patterns and representations within protein sequences, enabling more informative feature extraction. 2. Feature-Relational Reasoning Network (FRN): In the Global Feature Extraction module of our model, the paper developed a novel FRN that leveraged Graph Neural Networks to determine interactions between pairs of input proteins. The FRN effectively captures the underlying relational information between proteins, contributing to improved PPI predictions. The DL-PPI framework demonstrates state-of-the-art performance in the realm of sequence-based PPI prediction.