Novel approaches for multimedia data processing : a thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, Auckland, New Zealand

dc.confidentialEmbargo: Yes
dc.contributor.advisor: Wang, Ruili
dc.contributor.author: Ji, Wanting
dc.date.accessioned: 2020-10-06T02:34:34Z
dc.date.accessioned: 2021-05-10T02:39:47Z
dc.date.available: 2020-10-06T02:34:34Z
dc.date.available: 2021-05-10T02:39:47Z
dc.date.issued: 2020
dc.description.abstract: Multimedia data processing is an active research field contributing to many frontiers of science and technology. It involves the processing of audio, image, video, text, and other forms of data. In this thesis, four novel approaches are proposed to address two key issues in multimedia data processing: (i) how to reduce the annotation cost of sound event classification/tagging, and (ii) how to improve the quality of video captions. To address the first issue, we propose a Gabor dictionary-based active learning (DBAL) approach for semi-automatic sound event classification. In DBAL, sound features are extracted from audio recordings through a Gabor dictionary. Based on the extracted features, sound events in the recordings are tagged either manually or automatically through active learning. A classifier is then trained on these recordings with their true or predicted labels, so DBAL can be evaluated by the accuracy of this classifier. Further, a learnt dictionary-based active learning (LDAL) approach is proposed to tackle the same issue. In LDAL, a K-SVD learnt dictionary replaces the Gabor dictionary for feature extraction; the same active learning mechanism and classifier are used for tagging and evaluation. Compared with existing approaches, our approaches (i.e., DBAL and LDAL) achieve higher classification accuracy while requiring far less annotation effort. To tackle the second issue, we propose an attention-based dual learning (ADL) approach for video captioning. ADL contains two modules (a caption generation module and a video reconstruction module) that are fine-tuned via dual learning, so ADL can enhance the quality of the generated captions by minimizing the differences between the raw and reconstructed/reproduced videos. Further, we propose a bidirectional relational recurrent neural network (Bidirectional RRNN) to tackle the same issue. By fully utilizing the local and global context information as well as the visual information in videos, Bidirectional RRNN can capture all events in a video, reason about the relationships between events, and generate a set of informative sentences to describe the video content. Experimental results on benchmark datasets demonstrate that our approaches (i.e., ADL and Bidirectional RRNN) are superior to state-of-the-art approaches. In conclusion, this thesis proposes four effective approaches for processing multimedia data, and experimental results show that they outperform the state-of-the-art approaches.
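The abstract describes a semi-automatic tagging pipeline: dictionary-based features are extracted, uncertain recordings are labelled by a human, confident ones are tagged automatically, and a classifier is retrained on the result. The snippet below is only a minimal sketch of a generic pool-based active learning loop under that description, not the thesis implementation; the function names (active_tagging, oracle), the SVC classifier, and the confidence-based query rule are illustrative assumptions, and dictionary feature extraction is assumed to have been done beforehand.

```python
import numpy as np
from sklearn.svm import SVC

def active_tagging(X_seed, y_seed, X_pool, oracle, budget=50, batch=10):
    """Sketch of pool-based active learning for sound event tagging.
    X_seed, y_seed: small manually labelled seed set (dictionary features).
    X_pool: unlabelled feature matrix, one row per recording.
    oracle(i): returns a manual (human) label for pool row i.
    budget: total number of manual annotations allowed."""
    X_train, y_train = X_seed.copy(), list(y_seed)
    unlabelled = list(range(len(X_pool)))
    clf = SVC(probability=True).fit(X_train, y_train)
    spent = 0
    while unlabelled and spent < budget:
        proba = clf.predict_proba(X_pool[unlabelled])
        # query the least confident recordings for manual tags
        order = np.argsort(proba.max(axis=1))[:batch]
        queried = [unlabelled[i] for i in order]
        for i in queried:
            X_train = np.vstack([X_train, X_pool[i:i + 1]])
            y_train.append(oracle(i))
        spent += len(queried)
        unlabelled = [i for i in unlabelled if i not in queried]
        clf = SVC(probability=True).fit(X_train, y_train)
    # remaining recordings are tagged automatically with predicted labels
    auto = clf.predict(X_pool[unlabelled]) if unlabelled else np.array([])
    return clf, dict(zip(unlabelled, auto))
```

In this reading, the annotation cost is the number of oracle calls (the budget), and the approach is evaluated by the accuracy of the final classifier, as stated in the abstract.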
dc.identifier.uri: http://hdl.handle.net/10179/16333
dc.publisher: Massey University
dc.rights: The Author
dc.subject: Multimedia systems
dc.subject: Research
dc.subject.anzsrc: 460399 Computer vision and multimedia computation not elsewhere classified
dc.title: Novel approaches for multimedia data processing : a thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, Auckland, New Zealand
dc.type: Thesis
massey.contributor.author: Ji, Wanting
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Massey University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (PhD)
Files
Original bundle
Name: JiPhDThesis.pdf
Size: 2.44 MB
Format: Adobe Portable Document Format