Massey Documents by Type

Permanent URI for this community: https://mro.massey.ac.nz/handle/10179/294

Search Results

Now showing 1 - 5 of 5
  • Item
    Novel approaches for multimedia data processing : a thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, Auckland, New Zealand
    (Massey University, 2020) Ji, Wanting
    Multimedia data processing is an active research field contributing to many frontiers of science and technology. It involves the processing of audio, image, video, text, and other forms of data. In this thesis, four novel approaches are proposed to address two key issues in multimedia data processing: (i) how to reduce the annotation costs of sound event classification/tagging, and (ii) how to improve the quality of video captions. To reduce the annotation costs of sound event classification/tagging, we propose a Gabor dictionary-based active learning (DBAL) approach for semi-automatic sound event classification. In DBAL, sound features are extracted from audio recordings through a Gabor dictionary. Based on the extracted features, sound events in the recordings are tagged manually or automatically through active learning. A classifier is then trained on these recordings with their true or predicted labels. Thus, DBAL can be evaluated by the accuracy of the classifier. Further, a learnt dictionary-based active learning (LDAL) approach is proposed to tackle the same issue. In LDAL, a K-SVD learnt dictionary replaces the Gabor dictionary for feature extraction. The same active learning mechanism and classifier are used for tagging and evaluation. Compared with other existing approaches, our approaches (i.e., DBAL and LDAL) achieve higher classification accuracies at much lower annotation cost. To improve the quality of video captions, we propose an attention-based dual learning (ADL) approach for video captioning. ADL contains two modules (i.e., a caption generation module and a video reconstruction module), which are fine-tuned via dual learning. Thus, ADL can enhance the quality of the generated captions by minimizing the differences between raw and reconstructed/reproduced videos.
Further, we propose a bidirectional relational recurrent neural network (Bidirectional RRNN) to tackle the same issue. By fully utilizing the local and global context information as well as visual information in videos, Bidirectional RRNN can capture all events in a video, reason about the relationships between events, and generate a set of informative sentences to describe video contents. Experimental results on benchmark datasets demonstrate that our approaches (i.e., ADL and Bidirectional RRNN) are superior to the state-of-the-art approaches. In conclusion, this thesis proposes four effective approaches for processing multimedia data. Experimental results show that our approaches outperform the state-of-the-art approaches.
  • Item
    A story environment for learning object annotation and collection : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University, Palmerston North, New Zealand
    (Massey University, 2005) Chen, Tianjiao
    With the increase in computer power, network bandwidth and availability, e-learning is used more and more widely. In practice e-learning can be applied in a variety of ways, such as providing electronic resources to support teaching and learning, developing computer-based tutoring programs or building computer-supported collaborative learning environments. E-learning has become increasingly important because it can improve the quality of learning through the use of interactive computers, online communications and information systems in ways that other teaching methods cannot achieve. An important advantage of e-learning is that it offers learners a large amount of sharable and reusable learning resources. Current approaches, such as Internet search and learning object repositories, do not effectively help users to search for appropriate learning objects. The original story concept introduces a new semantic layer between collections of learning objects and learning material. The basic idea of the story concept is to add an interpretative, semantically rich layer, informally called 'Story', between learning objects and learning material that links learning objects according to specific themes and subjects (Heinrich & Andres, 2003a). One motivation behind this approach is to put a more focused, semantic layer on top of the untargeted metadata that are commonly used to describe a single learning object. In an e-learning context, the stories build on learning objects and become information resources for learning material. The overall aim of this project was to design and build a story environment to realize the above story concept. The development of the story environment includes story metadata, story environment components, the story browsing and authoring processes, and tools involved in story browsing and authoring. The story concept suggests that different types of metadata should be used in a story.
This project developed those different metadata specifications to support the story environment. Two prototype tools have been designed and implemented in this project to allow users to evaluate the story concept and story environment. The story browser helps story readers to read the story narrative and look at a story from different perspectives. The story authoring tool is used by story authors to author a story. Future work for this project has been identified in the areas of adding features to the current tools, user testing and further implementation of the story environment.
  • Item
    An investigation into teaching description and retrieval for constructed languages : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University
    (Massey University, 2004) Hoang, Son
    The research presented in this thesis focuses on an investigation into teaching concepts for constructed languages, and the development of a teaching tool, called VISL, for teaching a specific constructed language. Constructed languages have been developed for integration with computer systems to overcome ambiguities and complexities existing in natural language in information description and retrieval. Understanding and properly using these languages is one of the keys to successful use of these computer systems. Unfortunately, current teaching approaches are not suitable for users to learn features of those languages easily. There are different types of constructed languages. Each has specific features adapted for specific uses, but they have in common an explicitly constructed grammar. In addition, a constructed language commonly embeds a powerful query engine that makes it easy for computer systems to search for correct information from descriptions following the conditions of the queries. This suggests new teaching principles that should be easily adaptable to teach any specific structured language's structures and its specific query engine. In this research, teaching concepts were developed that offer a multi-modal approach to teach constructed languages and their specific query engines. These concepts are developed based on the efficiencies of language structure diagrams over the cumbersome and non-transparent nature of textual explanations, and the advantages of active learning strategies in enhancing language understanding. These teaching concepts were then applied successfully to a constructed language, FSCL, as an example. The research also explains how the concepts developed can be adapted for other constructed languages. Based on the developed concepts, a Computer Aided Language Learning (CALL) application called VISL is built to teach FSCL.
The application is integrated as an extension module in PAC, the computer system using FSCL for description and retrieval of information in qualitative analysis. In this application, users will learn FSCL through an interconnection of four modes: FSCL structures through the first two modes and its specific query engine through the second two modes. After going through the four modes, users will have developed a full understanding of the language. This will help users to construct a consistent vocabulary database, produce descriptive sentences conducive to retrieval, and create appropriate query sentences for obtaining relevant search results.
  • Item
    An investigation into multimedia local area networks : a thesis presented in partial fulfilment of the requirements for the degree of Master of Technology in Information Engineering at Massey University
    (Massey University, 1997) Kumar, Susarla Udaya
    In this thesis the performance of the Multimedia Local Asynchronous Transfer Mode Network (MLAN) protocol is evaluated by computer simulation using voice and data source models. SIMSCRIPT II.5, a discrete event simulation language, is used for the simulation. In addition, Fiber Distributed Data Interface (FDDI) and Fast Ethernet networks are simulated for data traffic and their performance is evaluated using COMNET III, a communication network simulation package. The main aim of this work is to evaluate the performance of MLAN and to analyse its suitability for multimedia traffic. The work is further extended by comparing the performance of MLAN with FDDI and Fast Ethernet LANs. Simulation results show that the MLAN protocol has some potential to operate as a multimedia LAN. However, analysis shows that some modification of the protocol is required to increase the bandwidth utilisation.
  • Item
    Towards a synthesis of multimedia and intelligent tutoring systems : a dissertation presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University
    (Massey University, 1998) Vethanayagam, Emma A. K
    Multimedia is being used in almost every field. This study is about the use of multimedia in the area of intelligent tutoring systems. This project studies the advantages and disadvantages of interactive multimedia and intelligent tutoring systems, and analyses the ways of combining these technologies in search of an interesting, learnable, flexible, compelling and technology-enhanced educational tool. Educational packages need to be evaluated for effectiveness. When it comes to computer-based instruction, technical concerns such as multimedia effects are taken seriously, but there is not enough emphasis on educational value. There is not much concern about the appropriateness of the instruction method to the computer medium. This research proposes a framework for evaluating educational packages that covers a number of issues. Several pieces of educational software were evaluated using this framework, and Diagnosis for crop protection, a multimedia software package that aids in teaching the process of diagnosing crop problems, was selected for modification as a practical application of the theoretical work. We studied different multimedia system development models and methodologies. We also analysed the cognitive issues and intelligent features that enhance learnability. Finally, the appropriate intelligent features and other factors that could make Diagnosis for crop protection a more 'active knowledge constructing' environment have been identified. The current version of Diagnosis for crop protection was represented using an appropriate methodology and the proposed changes were described in detail.