Massey Documents by Type

Permanent URI for this community: https://mro.massey.ac.nz/handle/10179/294

Search Results

Now showing 1 - 9 of 9
  • Item
    Essays on finance and deep learning : a thesis presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Finance, School of Economics and Finance, Massey University
    (Massey University, 2025-07-25) Pan, Guoyao
    This thesis aims to broaden the application of deep learning techniques in financial research and comprises three essays that make meaningful contributions to the related literature.
    Essay One integrates deep learning into the Hub Strategy, a novel chart pattern analysis method, to develop trading strategies. Utilizing deep learning models that analyze chart patterns alongside data such as trading volume, price volatility, and sentiment indicators, the strategy forecasts stock price movements. Tests on U.S. S&P 500 index stocks indicate that Hub Strategy trading methods, when integrated with deep learning models, achieve an annualized average return of approximately 25%, significantly outperforming the benchmark buy-and-hold strategy's 9.6% return. Risk-adjusted metrics, including Sharpe ratios and Jensen's alpha, consistently demonstrate the superiority of these trading strategies over both the buy-and-hold approach and standalone Hub Strategy trading rules. To address data-snooping concerns, multiple tests validate profitability, and an asset pricing model with 153 risk factors, estimated with Lasso-OLS (Ordinary Least Squares) regressions, confirms the strategy's ability to capture positive alphas.
    Essay Two utilizes deep learning techniques to explore the relationships between abnormal returns and their explanatory variables, including firm-specific characteristics and realized stock returns. Trained deep learning models effectively predict the estimated abnormal return directly. We evaluate the effectiveness of detecting abnormal returns by comparing our deep learning models against three benchmark methods. When applied to a random dataset, deep learning models demonstrate a significant improvement in identifying abnormal returns within the induced range of -3% to 3%. Moreover, their performance remains consistent across non-random datasets classified by firm size and market conditions. In addition, a regression of abnormal return prediction errors on firm-based factors, market conditions, and periods reveals that deep learning models are less sensitive to variables such as firm size, market conditions, and periods than the benchmarks.
    Essay Three assesses the performance of deep learning predictors in forecasting momentum turning points using the confusion matrix, comparing them to the benchmark model proposed by Goulding, Harvey, and Mazzoleni (2023). Tested on U.S. stocks from January 1990 to December 2023, deep learning predictors demonstrate higher accuracy in identifying turning points than the benchmark. Furthermore, our deep learning-based trading rules yield higher mean log returns and Sharpe ratios, along with lower volatility, than the benchmark. Two models achieve average monthly returns of 0.0148 and 0.0177, surpassing the benchmark's 0.0108. These gains are both economically and statistically significant, with consistent annual results. Regression analysis also shows that our models respond more effectively to changes in stock and market return volatility than the benchmark.
    Overall, these essays expand the application of deep learning in finance research, demonstrating high predictive accuracy, enhanced trading profitability, and effective detection of long-term abnormal returns, all of which hold significant practical value.
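    The Lasso-OLS alpha check in Essay One can be made concrete with a minimal sketch: Lasso first shrinks most of the 153 factor loadings to exactly zero, then OLS re-estimates the intercept (the alpha) on the surviving factors. The data, the number of truly priced factors, and all parameter choices below are synthetic placeholders, not the thesis's actual factors or estimates.

    ```python
    # Minimal sketch of a Lasso-OLS alpha test on synthetic data.
    import numpy as np
    from sklearn.linear_model import LassoCV, LinearRegression

    rng = np.random.default_rng(0)
    n_months, n_factors = 240, 153
    factors = rng.normal(size=(n_months, n_factors))   # monthly factor returns
    betas = np.zeros(n_factors)
    betas[:5] = rng.normal(scale=0.05, size=5)         # only 5 factors truly matter
    true_alpha = 0.002                                 # 20 bps per month (synthetic)
    excess = true_alpha + factors @ betas + rng.normal(scale=0.02, size=n_months)

    # Step 1: Lasso shrinks most of the 153 loadings to exactly zero.
    selected = np.flatnonzero(LassoCV(cv=5).fit(factors, excess).coef_)

    # Step 2: OLS on the surviving factors; the intercept is the alpha estimate.
    ols = LinearRegression().fit(factors[:, selected], excess)
    print(f"{selected.size} factors kept, estimated alpha = {ols.intercept_:.4f}")
    ```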
  • Item
    Towards explaining blackbox models using genetic network programming : a thesis presented in partial fulfilment of the requirements for the degree of Master of Information Science in Computer Science at Massey University, Albany, New Zealand
    (Massey University, 2024) Zhu, Haotian
    With the emergence of deep learning systems capable of learning intricate robot manoeuvres, control, team coordination, and planning both from data and through interaction with the environment, numerous complex and non-linear problems have found solutions. However, these systems often function as 'black boxes', lacking the ability to provide human-interpretable solutions. This study addresses the interpretability challenge in the field of Explainable AI by employing a 'black box' system as a target model and subsequently transforming it into a computational graph, equivalent to a Genetic Network Program (GNP). Furthermore, we present a methodology for refining and reducing the size of the GNP solution. Lastly, we test whether the black-box system can be used to guide the fitness function in a GNP architecture. To illustrate the approach's efficacy, we use a multi-goal path-finding problem (the Taxi environment) from the OpenAI Gym framework. The experimental results demonstrate the efficacy of the converted and refined GNP solution in successfully solving the Taxi problem across 500 environments, constituting a comprehensive dataset. Notably, the refined GNP solution exhibits no redundant or unnecessary nodes. Despite the research's focused scope, centering on a single agent with multiple goals, the algorithms introduced in this study lay the groundwork for the development of more sophisticated and interpretable algorithms. These advancements are poised to tackle more intricate challenges in the future.
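    To make the GNP idea concrete, here is a toy sketch of a genetic network program as a directed graph of judgement nodes (which branch on one observation feature) and processing nodes (which emit an action), with fitness measured as agreement with a black-box policy, echoing the idea of using the black box to guide the fitness function. The node layout, environment, and policy are illustrative assumptions, not the thesis's actual setup.

    ```python
    # Toy GNP: judgement nodes branch on an observation feature; processing
    # nodes emit an action. Fitness = agreement with a black-box policy.
    import random

    class Node:
        def __init__(self, kind, payload, successors):
            self.kind = kind              # "judge" or "process"
            self.payload = payload        # feature index or action id
            self.successors = successors  # next-node ids, one per branch

    def run_gnp(nodes, obs, start=0, max_steps=20):
        """Walk the graph until a processing node emits an action."""
        node = nodes[start]
        for _ in range(max_steps):
            if node.kind == "process":
                return node.payload
            node = nodes[node.successors[obs[node.payload]]]
        return 0  # fallback action if no processing node is reached

    def fitness(nodes, black_box, observations):
        """Fraction of observations on which the GNP matches the black box."""
        agree = sum(run_gnp(nodes, o) == black_box(o) for o in observations)
        return agree / len(observations)

    # Hypothetical black box: take action 1 only if both features are set.
    black_box = lambda obs: obs[1] if obs[0] else 0
    nodes = [Node("judge", 0, [1, 2]),   # branch on feature 0
             Node("process", 0, []),     # emit action 0
             Node("judge", 1, [1, 3]),   # branch on feature 1
             Node("process", 1, [])]     # emit action 1
    obs_set = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(100)]
    print("agreement:", fitness(nodes, black_box, obs_set))  # 1.0 for this graph
    ```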
  • Item
    Deep learning for action recognition in videos : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, School of Mathematical and Computational Sciences, Massey University, Albany, Auckland, New Zealand
    (Massey University, 2024-07-25) Ma, Yujun
    Action recognition aims to identify human actions in videos through complete action execution. Current action recognition approaches are primarily based on convolutional neural networks (CNNs), Transformers, or hybrids of both. Despite their strong performance, several challenges persist: (i) insufficient disentangled modelling of spatio-temporal features, (ii) a lack of fine-grained motion modelling in action representation, and (iii) limited exploration of the positional embedding of spatial tokens. In this thesis, we introduce three novel deep-learning approaches that address these challenges and enhance spatial and temporal representation in diverse action recognition tasks, including RGB-D, coarse-grained, and fine-grained action recognition. Firstly, we develop a multi-stage factorized spatio-temporal model (MFST) for RGB-D action recognition. This model addresses the limitations of existing RGB-D approaches that rely on entangled spatio-temporal 3D convolution. The MFST employs a multi-stage hierarchical structure where each stage independently constructs spatio-temporal dimensions. This progression from low-level features to higher-order semantic primitives results in a robust spatio-temporal representation. Secondly, we introduce a relative-position embedding based spatially and temporally decoupled Transformer (RPE-STDT) for coarse-grained and fine-grained action recognition. RPE-STDT addresses the high computational costs of Vision Transformers in video data processing, particularly due to the absolute-position embedding in frame patch tokenization. RPE-STDT utilizes dual Transformer encoder series: spatial encoders for intra-temporal index token interactions, and temporal encoders for inter-temporal dimension interactions with a subsampling strategy. Thirdly, we propose a convolutional transformer network (CTN) for fine-grained action recognition. Traditional Transformer models require extensive training data and additional supervision to rival CNNs in learning capabilities. The proposed CTN merges CNN strengths (e.g., weight sharing and locality) with Transformer benefits (e.g., dynamic attention and long-range dependency learning), allowing for superior fine-grained motion representation. In summary, we contribute three deep-learning models for diverse action recognition tasks. Each model achieves state-of-the-art performance across multiple prestigious datasets, as validated by thorough experimentation.
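    The disentangled spatio-temporal modelling that motivates the MFST can be illustrated with a generic factorized block: a 3D convolution is split into a spatial (1 x k x k) stage followed by a temporal (k x 1 x 1) stage, so space and time are modelled separately. This is a standard (2+1)D-style sketch, not the MFST's actual multi-stage design.

    ```python
    # Generic factorized spatio-temporal block: spatial conv, then temporal conv.
    import torch
    import torch.nn as nn

    class FactorizedSTBlock(nn.Module):
        def __init__(self, c_in, c_out, k=3):
            super().__init__()
            p = k // 2
            self.spatial = nn.Conv3d(c_in, c_out, (1, k, k), padding=(0, p, p))
            self.temporal = nn.Conv3d(c_out, c_out, (k, 1, 1), padding=(p, 0, 0))
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):                  # x: (batch, C, T, H, W)
            x = self.act(self.spatial(x))      # spatial stage
            return self.act(self.temporal(x))  # temporal stage

    clip = torch.randn(2, 3, 16, 112, 112)       # 2 clips of 16 RGB frames
    print(FactorizedSTBlock(3, 64)(clip).shape)  # torch.Size([2, 64, 16, 112, 112])
    ```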
  • Item
    Multi-source multimodal deep learning to improve situation awareness : an application of emergency traffic management : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Emergency Management at Massey University, Wellington, New Zealand
    (Massey University, 2023) Hewa Algiriyage, Rangika Nilani
    Traditionally, disaster management has placed a great emphasis on institutional warning systems, and people have been treated as victims rather than active participants. However, with the evolution of communication technology, the general public today contributes significantly to disaster management tasks, challenging traditional hierarchies in information distribution and acquisition. With mobile phones and Social Media (SM) platforms widely used, people at disaster scenes act as non-technical sensors that provide contextual information in multiple modalities (e.g., text, image, audio and video) through these content-sharing applications. Research has shown that the general public has extensively used SM applications to report injuries or deaths, damage to infrastructure and utilities, cautions, evacuation needs, and missing or trapped people during disasters. Disaster responders depend significantly on data for their Situation Awareness (SA), the dynamic understanding of “the big picture” in space and time for decision-making. However, despite the benefits, processing SM data for disaster response brings multiple challenges. The most significant is that SM data contain rumours and fake or false information, so responding agencies have concerns about utilising SM for disaster response. As a result, a high volume of important, real-time data that would be very useful for disaster responders' SA goes to waste. In addition to SM, many other data sources produce information during disasters, including CCTV monitoring, emergency call centres, and online news. The data from these sources come in multiple modalities such as text, images, video, audio and metadata. To date, researchers have investigated how such data can be automatically processed for disaster response using machine learning and deep learning approaches on a single source or single modality of data; only a few have investigated the use of multiple sources and modalities. Furthermore, there is currently no real-time system designed and tested for real-world scenarios that improves responder SA while cross-validating and exploiting SM data. This doctoral project, written in a “PhD-thesis-with-publication” format, addresses this gap by investigating the use of SM data for disaster response while improving reliability through validating data from multiple sources in real time. The research was guided by Design Science Research (DSR), which studies the creation of artefacts to solve practical problems of general interest. An artefact, a software prototype that integrates multi-source multimodal data for disaster response, was developed following the five-stage design science method framework proposed by Johannesson et al. [175] as the roadmap for design, development and evaluation. First, the initial research problem was clearly stated and positioned, and root causes were identified. During this stage, the problem area was narrowed from all disaster types to emergency traffic management, considering the real-time nature of the task and the availability of data for the artefact's design, development and evaluation. Second, the requirements for developing the software artefact were captured through interviews with stakeholders from a number of disaster and emergency management and transport and traffic agencies in New Zealand; domain knowledge and experimental information were also captured by analysing the academic literature.
Third, the artefact was designed and developed. The final stages focused on the demonstration and evaluation of the artefact. The outcomes of this doctoral research underpin the potential of using validated SM data to enhance responders' SA. Furthermore, the research explored appropriate ways to fuse text, visual and voice data in real time to provide a comprehensive picture for disaster responders. Data integration was achieved through multiple components. First, methodologies and algorithms were developed to estimate traffic flow from CCTV images and footage by counting vehicle objects. These outcomes extend previous work by annotating a large New Zealand-based vehicle dataset for object detection and developing an algorithm for vehicle counting by vehicle class and movement direction. Second, a novel deep learning architecture is proposed for making short-term traffic flow predictions using weather data. Previous research has mostly used only traffic data for traffic flow prediction; this research goes beyond that by including the correlation between traffic flow and weather conditions. Third, an event extraction system is proposed to extract event templates from online news and SM text data, answering What (semantic), Where (spatial) and When (temporal) questions. This doctoral project therefore makes several contributions to the body of knowledge in deep learning and disaster research. In addition, an important practical outcome of this research is an extensible event extraction system for any disaster, capable of generating event templates by integrating text and visual formats from online news and SM data to assist disaster responders' SA.
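    One way to picture the cross-validation of SM data against other sources is a weighted late fusion of per-source detection confidences, sketched below. The sources, weights and threshold are hypothetical illustrations, not the artefact's actual fusion logic.

    ```python
    # Weighted late fusion of per-source confidences for one candidate event.
    from dataclasses import dataclass

    @dataclass
    class Evidence:
        source: str        # "sm_text", "cctv" or "news"
        confidence: float  # detector confidence in [0, 1]

    SOURCE_WEIGHTS = {"sm_text": 0.3, "cctv": 0.5, "news": 0.2}  # hypothetical

    def fuse(evidence: list[Evidence], threshold: float = 0.5) -> bool:
        """Weighted mean of confidences; True = surface the event to responders."""
        total = sum(SOURCE_WEIGHTS[e.source] for e in evidence)
        score = sum(SOURCE_WEIGHTS[e.source] * e.confidence for e in evidence)
        return score / total >= threshold

    # A confident SM report passes alone, but low-confidence CCTV evidence
    # contradicting it pulls the fused score below the threshold.
    print(fuse([Evidence("sm_text", 0.8)]))                         # True
    print(fuse([Evidence("sm_text", 0.8), Evidence("cctv", 0.2)]))  # False
    ```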
  • Item
    Deep learning-based approaches for plant disease and weed detection : a thesis by publications presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Engineering, Massey University, Auckland, New Zealand
    (Massey University, 2022) Saleem, Muhammad Hammad
    To match ever-growing food demand, the scientific community has been actively addressing the various challenges faced by the agricultural sector. The major challenges are soil infertility, abrupt changes in climatic conditions, scarcity of water, untrained labor, emission of greenhouse gases, and many others. Moreover, plant diseases and weeds are two of the most important agricultural problems that reduce crop yield. Accurate detection of plant diseases and weeds is therefore essential for applying targeted and timely control measures; it can improve crop productivity and reduce the environmental effects and financial losses that result from the excessive application of fungicide and herbicide sprays on diseased plants and weeds. Among the various approaches to plant disease and weed detection, image-based methods are particularly effective for interpreting distinctive features. In recent years, image-based deep learning (DL) techniques have been reported in the literature for the recognition of weeds and plant diseases. However, the full potential of DL has not yet been explored, as most methods rely on modifications of DL models for well-known and readily available datasets. Current studies fall short in several ways, such as addressing various complex agricultural conditions, exploring several aspects of DL, and providing a systematic DL-based approach. To address these research gaps, this thesis presents various DL-based methodologies and aims to improve the mean average precision (mAP) for the identification of diseases and weeds in several plant species. The research on plant disease recognition starts with a publicly available dataset called PlantVillage, on which comparative analyses are conducted across various DL feature extractors, meta-architectures, and optimization algorithms. Later, new datasets are generated from local New Zealand horticultural farms, named NZDLPlantDisease-v1 & v2. The proposed datasets consist of healthy and diseased plant organs of 13 economically important horticultural crops of New Zealand, divided into 48 classes. A performance-optimized DL model and a transfer learning-based approach are proposed for the detection of plant diseases using the curated datasets. Weed identification is performed on an open-source dataset called DeepWeeds, for which a two-step weed detection pipeline is presented that improves the deep learning model's performance by a significant margin. For both agricultural tasks, the results show superior performance compared to existing methods and default settings. The research outcomes elaborate on the practical aspects and extended potential of DL for the selected agricultural applications. This thesis is thus a benchmark step towards cost-effective crop protection and site-specific weed management (SSWM) systems.
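    The transfer learning-based approach can be sketched in spirit with torchvision: start from a detector pre-trained on a large generic dataset and swap in a new classification head for the 48 classes. The choice of Faster R-CNN with a ResNet-50 backbone is an illustrative assumption; the thesis compares several feature extractors and meta-architectures.

    ```python
    # Sketch: adapt a COCO-pre-trained detector to 48 custom classes.
    import torch
    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    NUM_CLASSES = 48 + 1  # 48 dataset classes + background

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

    # Freeze the backbone and fine-tune only the new head first; the backbone
    # can be unfrozen later for full fine-tuning.
    for p in model.backbone.parameters():
        p.requires_grad = False
    optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                                lr=0.005, momentum=0.9)
    ```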
  • Item
    Deep learning for asteroid detection in large astronomical surveys : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, New Zealand
    (Massey University, 2022) Cowan, Preeti
    The MOA-II telescope has been operating at the Mt John Observatory since 2004 as part of a Japan/NZ collaboration looking for microlensing events. The telescope has a total field of view of 1.6 x 1.3 degrees and surveys the Galactic Bulge several times each night, which makes it particularly good for observing short-duration events. While it has been successful in discovering exoplanets, the full scientific potential of the data has not yet been realised. In particular, numerous known asteroids are hidden amongst the MOA data; these can be clearly seen upon visual inspection of selected images. There are also potentially many undiscovered asteroids captured by the telescope. As yet, no tool exists to effectively mine archival data from large astronomical surveys, such as MOA, for asteroids. The appeal of deep learning lies in its ability to learn useful representations from data without significant hand-engineering, making it an excellent tool for asteroid detection. Supervised learning, however, requires labelled datasets, which are likewise unavailable. The goal of this research is to develop datasets suitable for supervised learning and to apply several CNN-based techniques to identify asteroids in the MOA-II data. Asteroid tracklets can be clearly seen by combining all the observations on a given night, and these tracklets form the basis of the dataset. Known asteroids were identified within the composite images, forming the seed dataset for supervised learning. These images were used to train several CNNs to classify images as either containing asteroids or not. The top five networks were then configured as an ensemble that achieved a recall of 97.67%. Next, the YOLO object detector was trained to localise asteroid tracklets, achieving a mean average precision (mAP) of 90.97%. The trained networks will be applied to 16 years of MOA archival data to find both known and unknown asteroids observed by the telescope over the years. The methodologies developed can also be used by other surveys for asteroid recovery and discovery.
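    The ensemble step can be sketched as averaging the asteroid/no-asteroid probabilities of the five trained CNNs, thresholding, and scoring recall. The stand-in models, image sizes and labels below are placeholders for the thesis's trained networks and data.

    ```python
    # Average the asteroid probabilities of five CNNs, threshold, score recall.
    import torch

    def ensemble_predict(models, images, threshold=0.5):
        """Mean per-model probability of the 'asteroid' class (index 1)."""
        with torch.no_grad():
            probs = torch.stack([m(images).softmax(dim=1)[:, 1] for m in models])
        return (probs.mean(dim=0) >= threshold).long()

    def recall(preds, labels):
        """Recall = detected asteroid images / actual asteroid images."""
        positives = labels == 1
        return (preds[positives] == 1).float().mean().item()

    # Stand-ins for the five trained CNNs; real models would be loaded here.
    models = [torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 64, 2))
              for _ in range(5)]
    images = torch.randn(8, 1, 64, 64)                 # composite night images
    labels = torch.tensor([0, 1, 1, 0, 1, 0, 0, 1])    # 1 = contains a tracklet
    print("recall:", recall(ensemble_predict(models, images), labels))
    ```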
  • Item
    Deep learning for speech enhancement : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, New Zealand
    (Massey University, 2022) Qiu, Yuanhang
    Speech enhancement, which aims to improve the intelligibility and overall perceptual quality of a contaminated speech signal, is an effective way to improve speech communications. In this thesis, we propose three novel deep learning methods to improve speech enhancement performance. Firstly, we propose an adversarial latent representation learning method for latent-space exploration in generative adversarial network (GAN) based speech enhancement. Based on adversarial feature learning, this method employs an extra encoder to learn an inverse mapping from the generated data distribution to the latent space. The encoder establishes an inner connection with the generator and contributes to latent information learning. Secondly, we propose an adversarial multi-task learning method with inverse mappings for effective speech representation. This speech enhancement method focuses on enhancing the generator's capability for speech information capture and representation learning. To implement this method, two extra networks are developed to learn the inverse mappings from the generated distribution to the input data domains. Thirdly, we propose a self-supervised learning based phone-fortified method that improves the learning of specific speech characteristics for speech enhancement. This method explicitly imports phonetic characteristics into a deep complex convolutional network via a contrastive predictive coding model pre-trained with self-supervised learning. The experimental results demonstrate that the proposed methods outperform previous speech enhancement methods and achieve state-of-the-art performance in terms of speech intelligibility and overall perceptual quality.
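    The first method's inverse mapping can be sketched schematically: alongside the usual GAN objective, an extra encoder learns to recover the latent code from the generated (enhanced) speech, tying the generator to its latent space. The network sizes, conditioning scheme and loss weighting below are illustrative assumptions, not the thesis's architecture.

    ```python
    # Schematic latent-reconstruction term: encoder E inverts generator G.
    import torch
    import torch.nn as nn

    latent_dim, signal_len = 64, 1024
    G = nn.Linear(latent_dim + signal_len, signal_len)  # toy enhancer
    E = nn.Linear(signal_len, latent_dim)               # toy inverse mapping

    noisy = torch.randn(8, signal_len)          # noisy speech frames
    z = torch.randn(8, latent_dim)              # latent codes
    enhanced = G(torch.cat([z, noisy], dim=1))  # generator conditioned on input

    # E should recover z from the generated speech; this term is added to the
    # usual adversarial objective (schematically: adv_loss + lam * latent_loss).
    latent_loss = nn.functional.mse_loss(E(enhanced), z)
    latent_loss.backward()
    ```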
  • Item
    Deep learning for action recognition in videos : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, Massey University, Albany, Auckland, New Zealand
    (Massey University, 2021) Zong, Ming
    Video action recognition is a challenging task in video processing. In this thesis, we propose three novel deep learning approaches to improve the accuracy of action recognition.
    The first approach aims to learn multi-cue spatiotemporal features by performing 3D convolutions. Previous 3D CNN models mainly perform 3D convolutions on individual cues (e.g., appearance and motion cues), which lacks an effective overall integration of the appearance and motion information of videos. To address this issue, we propose a novel multi-cue 3D convolutional neural network (named the M3D model for short), which directly integrates three individual cues (i.e., an appearance cue, a direct motion cue, and a salient motion cue). The proposed M3D model performs 3D convolutions on multiple cues instead of a single cue, obtaining more discriminative and robust features by integrating the three cues as a whole. In particular, we propose a novel deep residual multi-cue 3D convolution model (named R-M3D for short) that enhances representation ability by benefitting from increased model depth, yielding more representative spatiotemporal features.
    The second approach aims to utilize motion saliency information to enhance the accuracy of action recognition. We propose novel motion saliency based multi-stream multiplier ResNets (named MSM-ResNets for short) for action recognition. The proposed MSM-ResNets model consists of three interactive streams: an appearance stream, a motion stream and a motion saliency stream. The appearance stream captures appearance information from RGB video frames; the motion stream captures motion information from optical flow frames; and the motion saliency stream captures salient motion information from motion saliency frames. In particular, to utilize the complementary information between different streams over time, the proposed MSM-ResNets model establishes multiplicative connections between streams. Two kinds of multiplicative connections are injected: the first transmits the motion cue from the motion stream to the appearance stream, and the second transmits the motion saliency cue from the motion saliency stream to the motion stream.
    The third approach aims to explore salient spatiotemporal information over time. We propose a novel spatial and temporal saliency based four-stream network with multi-task learning (named the 3M model for short) for action recognition. The proposed 3M model comprises two parts. (i) The first part is a spatial and temporal saliency based four-stream network comprising an appearance stream, a motion stream, a novel spatial saliency stream and a novel temporal saliency stream; the spatial saliency stream acquires spatial saliency information and the temporal saliency stream acquires temporal saliency information. (ii) The second part is a multi-task learning based long short-term memory (LSTM) network, which takes the feature representations obtained by the convolutional neural networks (CNNs) as input. The multi-task learning based LSTM can share complementary knowledge between streams and capture the long-term dependency relationships of consecutive frames.
    Experiments verify the effectiveness of all the proposed models and show that they outperform the state-of-the-art.
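    A multiplicative cross-stream connection of the kind used in MSM-ResNets can be sketched as an elementwise gating of one stream's residual branch by another stream's activations. The gating nonlinearity and injection point below are illustrative choices, not the exact MSM-ResNets wiring.

    ```python
    # Elementwise multiplicative gating of the appearance stream by motion.
    import torch

    def multiplicative_fusion(appearance, motion):
        """Inject the motion cue into the appearance stream via a product gate."""
        return appearance * torch.sigmoid(motion)  # gate values in (0, 1)

    appearance_feat = torch.randn(4, 256, 14, 14)  # appearance-stream maps
    motion_feat = torch.randn(4, 256, 14, 14)      # motion-stream maps
    fused = appearance_feat + multiplicative_fusion(appearance_feat, motion_feat)
    print(fused.shape)  # torch.Size([4, 256, 14, 14])
    ```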
  • Item
    Deep learning for entity analysis : a thesis submitted in partial fulfilment for the degree of Doctor of Philosophy in Computer Science at the School of Natural and Computational Sciences, Massey University, Albany, New Zealand
    (Massey University, 2021) Hou, Feng
    Our research focuses on three sub-tasks of entity analysis: fine-grained entity typing (FGET), entity linking and entity coreference resolution. We aim at improving FGET and entity linking by exploiting the document-level type constraints and improving entity linking and coreference resolution by embedding fine-grained entity type information. To extract more efficient feature representations and offset label noises in the datasets for FGET, we propose three transfer learning schemes: (i) transferring sub-word embeddings to generate more efficient out-of-vocabulary (OOV) embeddings for mentions; (ii) using a pre-trained language model to generate more efficient context features; (iii) using a pre-trained topic model to transfer the topic-type relatedness through topic anchors and select confusing fine-grained types at inference time. The pre-trained topic model can offset the label noises without retreating to coarse-grained types. To reduce the distinctiveness of existing entity embeddings and facilitate the learning of contextual commonality for entity linking, we propose a simple yet effective method, FGS2EE, to inject fine-grained semantic information into entity embeddings. FGS2EE first uses the embeddings of semantic type words to generate semantic entity embeddings, and then combines them with existing entity embeddings through linear aggregation. Based on our entity embeddings, we have achieved new state-of-the-art performance on two of the five out-domain test sets for entity linking. Further, we propose a method, DOC-AET, to exploit DOCument-level coherence of named entity mentions and anonymous entity type (AET) words/mentions. We learn embeddings of AET words from the AET words’ inter-paragraph co-occurrence matrix. Then, we build AET entity embeddings and document AET context embeddings using the AET word embeddings. The AET coherence are computed using the AET entity embeddings and document context embeddings. By incorporating such coherence scores, DOC-AET has achieved new state-of-the-art results on three of the five out-domain test sets for entity linking. We also propose LASE (Less Anisotropic Span Embeddings) schemes for coreference resolution. We investigate the effectiveness of these schemes with extensive experiments. Our ablation studies also provide valuable insights about the contextualized representations. In summary, this thesis proposes four deep learning approaches for entity analysis. Extensive experiments show that we have achieved state-of-the-art performance on the three sub-tasks of entity analysis.