Massey Documents by Type

Permanent URI for this community: https://mro.massey.ac.nz/handle/10179/294

  • Item
    Essays on finance and deep learning : a thesis presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Finance, School of Economics and Finance, Massey University
    (Massey University, 2025-07-25) Pan, Guoyao
    This thesis aims to broaden the application of deep learning techniques in financial research and comprises three essays that make meaningful contributions to the related literature. Essay One integrates deep learning into the Hub Strategy, a novel chart pattern analysis method, to develop trading strategies. Utilizing deep learning models, which analyze chart patterns alongside data such as trading volume, price volatility, and sentiment indicators, the strategy forecasts stock price movements. Tests on U.S. S&P 500 index stocks indicate that Hub Strategy trading methods, when integrated with deep learning models, achieve an annualized average return of approximately 25%, significantly outperforming the benchmark buy-and-hold strategy's 9.6% return. Risk-adjusted metrics, including Sharpe ratios and Jensen’s alpha, consistently demonstrate the superiority of these trading strategies over both the buy-and-hold approach and standalone Hub Strategy trading rules. To address data snooping concerns, multiple tests validate profitability, and an asset pricing model with 153 risk factors and Lasso-OLS (Ordinary Least Squares) regressions confirms its ability to capture positive alphas. Essay Two utilizes deep learning techniques to explore the relationships between the abnormal return and its explanatory variables, including firm-specific characteristics and realized stock returns. Trained deep learning models effectively predict the estimated abnormal return directly. We evaluate the effectiveness of detecting abnormal returns by comparing our deep learning models against three benchmark methods. When applied to a random dataset, deep learning models demonstrate a significant improvement in identifying abnormal returns within the induced range of -3% to 3%. Moreover, their performance remains consistent across non-random datasets classified by firm size and market conditions. 
In addition, a regression of abnormal return prediction errors on firm-based factors, market conditions, and periods reveals that deep learning models are less sensitive to variables like firm size, market conditions, and periods than the benchmarks. Essay Three assesses the performance of deep learning predictors in forecasting momentum turning points using the confusion matrix and comparing them to the benchmark model proposed by Goulding, Harvey, and Mazzoleni (2023). Tested on U.S. stocks from January 1990 to December 2023, deep learning predictors demonstrate higher accuracy in identifying turning points than the benchmark. Furthermore, our deep learning-based trading rules yield higher mean log returns and Sharpe ratios, along with lower volatility, compared to the benchmark. Two models achieve average monthly returns of 0.0148 and 0.0177, surpassing the benchmark’s 0.0108. These gains are both economically and statistically significant, with consistent annual results. Regression analysis also shows that our models respond more effectively to changes in stock and market return volatility than the benchmark. Overall, these essays expand the application of deep learning in finance research, demonstrating high predictive accuracy, enhanced trading profitability, and effective detection of long-term abnormal returns, all of which hold significant practical value.
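Essay Three evaluates turning-point predictors with a confusion matrix. A minimal sketch of that kind of evaluation is given below; the function names and the toy labels are hypothetical illustrations and are not taken from the thesis.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (TP, FP, FN, TN) for binary labels, where 1 marks a turning point."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    tp = counts[(1, 1)]  # turning point correctly flagged
    fp = counts[(0, 1)]  # false alarm
    fn = counts[(1, 0)]  # missed turning point
    tn = counts[(0, 0)]  # correctly ignored
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all observations that were classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Toy labels for illustration only (assumed, not thesis data).
actual = [1, 0, 0, 1, 1, 0, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_matrix(actual, predicted)
print(tp, fp, fn, tn, accuracy(tp, fp, fn, tn))
```

Accuracy alone can mislead when turning points are rare, which is one reason a full confusion matrix is usually reported alongside it.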
  • Item
    Source attribution models using random forest for whole genome sequencing data : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics, School of Mathematical and Computational Sciences, Massey University, Palmerston North, New Zealand
    (Massey University, 2025-07-14) Smith, Helen
    Foodborne diseases, such as campylobacteriosis, represent a significant risk to public health. Preventing the spread of Campylobacter species requires knowledge of sources of human infection. Current methods of source attribution are designed to be used with a small number of genes, such as the seven housekeeping genes of the original multilocus sequence typing (MLST) scheme, and encounter issues when presented with whole genome data. Higher resolution data, however, offers the potential to differentiate within source groups (i.e., between different ruminant species in addition to differentiating between ruminants and poultry), which is poorly achieved with current methods. Random forest is a tree-based machine learning algorithm which is suitable for analysing data sets with large numbers of predictor variables, such as whole genome sequencing data. A known issue with tree-based predictive models occurs when an observation to be predicted contains levels of a variable that were not present in the observations with which the model was trained. This is almost certain to occur with genomic data, which has a potentially ever-growing set of alleles for any single gene. This thesis investigates the use of ordinal encoding of categorical variables to address the ‘absent levels’ problem in random forest models. Firstly, a method of encoding is adapted, based on correspondence analysis (CA) of a class by level contingency table, to be unbiased in the presence of absent levels. Secondly, a new method of encoding is introduced which utilises a set of supplementary information on the category levels themselves (i.e., the sequence information of alleles) and encodes them, as well as any new levels, according to their similarity or dissimilarity to each other via the method of principal coordinates analysis (PCO).
Thirdly, based on the method of canonical analysis of principal coordinates (CAP), the encoding information of the levels from the CA on the contingency table is combined with the encoding information of the levels from the PCO on the dissimilarity matrix of the supplementary levels information, with a classical correspondence analysis (CCorA). Potential issues when using out-of-bag (OOB) data following variable encoding are then explored and an adaptation to the holdout variable importance method is introduced which is suitable for use with all methods of encoding. This thesis finishes by applying the CAP method of encoding to a random forest predictive model for source attribution of whole genome sequencing data from the Source Assigned Campylobacteriosis in New Zealand (SACNZ) study. The advantage of adding core genes and accessory genes as predictor variables is investigated, and the attribution results are compared to the results from a previously published study which used the asymmetric island model on the same set of isolates and the seven MLST genes.
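The ‘absent levels’ idea can be illustrated with a deliberately simplified sketch: a new allele that was never seen in training receives the ordinal score of its most similar known allele. This swaps in a nearest-neighbour rule on Hamming distance in place of the PCO-based encoding described in the abstract, and the allele names, sequences, and scores are all hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def encode_new_level(new_seq, known_seqs, known_scores):
    """Score an unseen allele by borrowing the score of its nearest known allele.

    Simplification: the thesis places levels via principal coordinates
    analysis of a dissimilarity matrix; this sketch only copies the
    nearest neighbour's score.
    """
    nearest = min(known_seqs, key=lambda name: hamming(new_seq, known_seqs[name]))
    return known_scores[nearest]


# Hypothetical training alleles and their (assumed) ordinal scores.
known_seqs = {"allele_1": "ACGTAC", "allele_2": "ACGTTT", "allele_3": "TTGTTT"}
known_scores = {"allele_1": 0.0, "allele_2": 1.0, "allele_3": 2.0}

# An allele absent from training still gets a usable numeric value.
print(encode_new_level("ACGTTA", known_seqs, known_scores))
```

Because every level, seen or unseen, maps to a number, a tree split of the form "score <= threshold" can route a new allele without special handling.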
  • Item
    Clinical epidemiology and outcomes of ventilator-associated pneumonia in critically ill adult patients: protocol for a large-scale systematic review and planned meta-analysis
    (BioMed Central Ltd, 2019-07-20) Hernandez M; Gutiérrez JM; Borromeo A; Dueño AL; Paragas, Jr ED; Ellasus RO; Abalos-Fabia R; Abriam JA; Sonido AE; Generale AJA; Sombillo RC; Lacanaria MGC; Centeno MM; Laoingco JRC; Domantay JAA
    An increasing number of studies have investigated the clinical epidemiology and outcomes of ventilator-associated pneumonia (VAP) in intensive care units. However, these findings have not been clearly defined in broad subgroups of mechanically ventilated adults. Hence, this protocol for a systematic review and meta-analysis is designed to better understand the clinical and epidemiological features of VAP in these patient populations by establishing its overall prognosis and the risk factors for morbidity and mortality, and by determining the differences in clinical and economic outcomes between VAP and non-VAP patients. This review will systematically search full-text articles indexed in the PubMed, CENTRAL, CINAHL, Web of Science, and EMBASE databases, without date or language restrictions. In addition, reference lists and citations of retrieved articles and relevant medical and nursing journals will be manually reviewed. A supplementary search of other databases covering trials, reviews, and grey literature, including conference proceedings, theses, and dissertations, will be performed. Study investigators will be contacted to clarify missing or unpublished data. All prognostic studies meeting the pre-defined eligibility criteria will be included. The study selection, risk of bias assessment, data extraction, and grading of the quality of evidence will be carried out in duplicate, involving independent evaluation by two investigators with consensus or a third-party adjudication. The degree of inter-rater agreement will be calculated using the kappa statistic. For meta-analysis, dichotomous and continuous outcome measures will be pooled using odds ratios and standardized mean differences with 95% confidence intervals, respectively. The Mantel-Haenszel or inverse variance methods with random effects model will be used as a guide for analysis. The heterogeneity of each outcome measure will be assessed using both χ² and I² statistics.
In addition, sensitivity and subgroup analyses will be performed to ensure consistency of pooled results. The review protocol described herein is in accordance with the PRISMA-P standards. Discussion: The investigation of the epidemiological profiles, prognostic factors, and outcomes associated with VAP is critical for the identification of high-risk groups of mechanically ventilated patients and evaluation of possible clinical endpoints. This may provide substantial links for improved VAP prevention practices.
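The heterogeneity assessment named in the protocol can be sketched as follows, assuming the usual definitions: Cochran's Q computed with inverse-variance weights around the fixed-effect pooled estimate, and I² = max(0, (Q − df) / Q) × 100. The function name and the example effect sizes are hypothetical and do not come from the review.

```python
def heterogeneity(effects, variances):
    """Return (pooled_effect, Q, I2_percent) using inverse-variance weights.

    effects   -- per-study effect estimates (e.g. log odds ratios)
    variances -- per-study sampling variances of those estimates
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    # Fixed-effect (inverse-variance) pooled estimate.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Made-up effect sizes for illustration only.
pooled, q, i2 = heterogeneity([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(round(pooled, 3), round(q, 3), round(i2, 1))
```

Q is referred to a χ² distribution with df = k − 1 for k studies; a random-effects analysis, as planned in the protocol, would additionally estimate the between-study variance, which this sketch does not do.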