
Browsing by Author "Ji, Wanting"

    Novel approaches for multimedia data processing : a thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, Auckland, New Zealand
    (Massey University, 2020) Ji, Wanting
    Multimedia data processing is an active research field that contributes to many frontiers of science and technology. It involves the processing of audio, image, video, text, and other forms of data. In this thesis, four novel approaches are proposed to address two key issues in multimedia data processing: (i) how to reduce the annotation costs of sound event classification/tagging, and (ii) how to improve the quality of video captions.

    To reduce the annotation costs of sound event classification/tagging, we propose a Gabor dictionary-based active learning (DBAL) approach for semi-automatic sound event classification. In DBAL, sound features are extracted from audio recordings using a Gabor dictionary. Based on the extracted features, the sound events in the recordings are tagged manually or automatically through active learning. A classifier is then trained on these recordings with their true or predicted labels, so DBAL can be evaluated by the accuracy of that classifier. Further, a learnt dictionary-based active learning (LDAL) approach is proposed to tackle the same issue. In LDAL, a dictionary learnt with K-SVD replaces the Gabor dictionary for feature extraction; the same active learning mechanism and classifier are used for tagging and evaluation. Compared with other existing approaches, DBAL and LDAL achieve higher classification accuracy at a much lower annotation cost.

    To improve the quality of video captions, we propose an attention-based dual learning (ADL) approach for video captioning. ADL contains two modules, a caption generation module and a video reconstruction module, which are fine-tuned via dual learning. ADL thus enhances the quality of the generated captions by minimizing the differences between the raw and reconstructed/reproduced videos. Further, we propose a bidirectional relational recurrent neural network (Bidirectional RRNN) to tackle the same issue. By fully utilizing the local and global context information as well as the visual information in videos, Bidirectional RRNN can capture all events in a video, reason about the relationships between events, and generate a set of informative sentences to describe the video contents. Experimental results on benchmark datasets demonstrate that ADL and Bidirectional RRNN are superior to the state-of-the-art approaches.

    In conclusion, this thesis proposes four effective approaches for processing multimedia data, and experimental results show that they outperform the state-of-the-art approaches.
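
    The semi-automatic tagging step that the abstract describes for DBAL and LDAL (some recordings tagged manually, the rest automatically, then a classifier trained on the true or predicted labels) can be illustrated with a small pool-based active learning loop. This is only a sketch under stated assumptions, not the thesis's implementation: the abstract does not name the query strategy or the classifier, so margin-based uncertainty sampling and a nearest-centroid classifier are stand-ins chosen for brevity, the 2-D points are hypothetical placeholders for dictionary-based sound features, and the function names and the `budget` parameter are invented for illustration.

```python
import math


def centroids(features, labels):
    """Mean feature vector per class (a toy nearest-centroid 'classifier')."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_margin(cents, x):
    """Return (label, margin); a small margin means an uncertain prediction."""
    dists = sorted((math.dist(c, x), y) for y, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def semi_automatic_tagging(features, oracle, seed_idx, budget):
    """Ask the oracle (human annotator) about uncertain items; auto-tag the rest.

    Assumption: uncertainty sampling by smallest margin. The thesis may use
    a different selection criterion.
    """
    labels = {i: oracle(i) for i in seed_idx}  # manually labelled seed set
    manual = set(seed_idx)
    for _ in range(budget):
        unlabeled = [i for i in range(len(features)) if i not in labels]
        if not unlabeled:
            break
        cents = centroids([features[i] for i in labels],
                          [labels[i] for i in labels])
        # Query the item the current model is least sure about.
        query = min(unlabeled,
                    key=lambda i: predict_with_margin(cents, features[i])[1])
        labels[query] = oracle(query)
        manual.add(query)
    # Everything still unlabelled receives a predicted (automatic) tag.
    cents = centroids([features[i] for i in labels],
                      [labels[i] for i in labels])
    for i in range(len(features)):
        if i not in labels:
            labels[i] = predict_with_margin(cents, features[i])[0]
    return [labels[i] for i in range(len(features))], manual


# Hypothetical, well-separated toy "features": 20 points per class.
features = [(0.1 * i, 0.1 * j) for i in range(4) for j in range(5)]
features += [(5 + 0.1 * i, 5 + 0.1 * j) for i in range(4) for j in range(5)]
truth = [0] * 20 + [1] * 20

tags, manual = semi_automatic_tagging(
    features, oracle=truth.__getitem__, seed_idx=[0, 20], budget=4)
accuracy = sum(t == g for t, g in zip(tags, truth)) / len(truth)
print(f"manual labels: {len(manual)} of {len(features)}, accuracy: {accuracy:.2f}")
```

    In this toy run only the seed items and the queried items are labelled by the oracle; the remaining tags are predictions. That is the sense in which annotation cost is traded against classifier accuracy, although real audio features will not be this cleanly separable.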

Copyright © Massey University  |  DSpace software copyright © 2002-2025 LYRASIS
