Deep representation learning for action recognition : a dissertation presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Auckland, New Zealand

dc.contributor.authorRen, Jun
dc.date.accessioned2020-10-14T23:47:29Z
dc.date.available2020-10-14T23:47:29Z
dc.date.issued2019
dc.descriptionFigures 2.2 through 2.7, and 2.9 through 2.11 were removed for copyright reasons. Figures 2.8, and 2.12 through 2.16 are licensed on the arXiv repository under a Creative Commons Attribution licence (https://arxiv.org/help/license).en_US
dc.description.abstractThis research focuses on deep representation learning for human action recognition based on the emerging deep learning techniques using RGB and skeleton data. The output of such deep learning techniques is a parameterised hierarchical model, representing the learnt knowledge from the training dataset. It is similar to the knowledge stored in our brain, which is learned from our experience. Currently, the computer’s ability to perform such abstraction is far behind human’s level, perhaps due to the complex processing of spatio-temporal knowledge. The discriminative spatio-temporal representation of human actions is the key for human action recognition systems. Different feature encoding approaches and different learning models may lead to quite different output performances, and at the present time there is no approach that can accurately model the cognitive processing for human actions. This thesis presents several novel approaches to allow computers to learn discriminative, compact and representative spatio-temporal features for human action recognition from multiple input features, aiming at enhancing the performance of an automated system for human action recognition. The input features for the proposed approaches in this thesis are derived from signals that are captured by the depth camera, e.g., RGB video and skeleton data. In this thesis, I developed several geometric features, and proposed the following models for action recognition: CVR-CNN, SKB-TCN, Multi-Stream CNN and STN. These proposed models are inspired by the visual attention mechanisms that are inherently present in human beings. In addition, I discussed the performance of the geometric features that I developed along with the proposed models. Superior experimental results for the proposed geometric features and models are obtained and verified on several benchmarking human action recognition datasets. In the case of the most challenging benchmarking dataset, NTU RGB+D, the accuracy of the results obtained surpassed the performance of the existing RNN-based and ST-GCN models. This study provides a deeper understanding of the spatio-temporal representation of human actions and it has significant implications to explain the inner workings of the deep learning models in learning patterns from time series data. The findings of these proposed models can set forth a solid foundation for further developments, and for the guidance of future human action-related studies.en_US
dc.identifier.urihttp://hdl.handle.net/10179/15732
dc.language.isoenen_US
dc.publisherMassey Universityen_US
dc.rightsThe Authoren_US
dc.subjectHuman activity recognitionen_US
dc.subjectComputer visionen_US
dc.subjectMachine learningen_US
dc.subjectNeural networks (Computer science)en_US
dc.subject.anzsrc0801 Artificial Intelligence and Image Processingen
dc.titleDeep representation learning for action recognition : a dissertation presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Auckland, New Zealanden_US
dc.typeThesisen_US
massey.contributor.authorRen, Jun
thesis.degree.disciplineComputer Scienceen_US
thesis.degree.levelDoctoralen_US
thesis.degree.nameDoctor of Philosophy (PhD)en_US
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
RenPhDThesis.pdf
Size:
22.68 MB
Format:
Adobe Portable Document Format
Description:
License bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
3.32 KB
Format:
Item-specific license agreed upon to submission
Description: