Journal Articles

Permanent URI for this collection: https://mro.massey.ac.nz/handle/10179/7915

Search Results

Now showing 1 - 10 of 14
  • Item
    A Hormetic Approach to the Value-Loading Problem: Preventing the Paperclip Apocalypse
    (Springer Nature Singapore Pte Ltd, 2025-10-06) Henry NIN; Pedersen M; Williams M; Martin JLB; Donkin L
    The value-loading problem is a major obstacle to creating Artificial Intelligence (AI) systems that align with human values and preferences. Central to this problem is the establishment of safe limits for repeatable AI behaviors. We introduce hormetic alignment, a paradigm to regulate the behavioral patterns of AI, grounded in the concept of hormesis, where low frequencies or repetitions of a behavior have beneficial effects, while high frequencies or repetitions are harmful. By modeling behaviors as allostatic opponent processes, we can use either Behavioral Frequency Response Analysis (BFRA) or Behavioral Count Response Analysis (BCRA) to quantify the safe and optimal limits of repeatable behaviors. We demonstrate how hormetic alignment solves the ‘paperclip maximizer’ scenario, a thought experiment where an unregulated AI tasked with making paperclips could end up converting all matter in the universe into paperclips. Our approach may be used to help create an evolving database of ‘values’ based on the hedonic calculus of repeatable behaviors with decreasing marginal utility. Hormetic alignment offers a principled solution to the value-loading problem for repeatable behaviors, augmenting current techniques by adding temporal constraints that reflect the diminishing returns of repeated actions. It further supports weak-to-strong generalization – using weaker models to supervise stronger ones – by providing a scalable value system that enables AI to learn and respect safe behavioral bounds. This paradigm opens new research avenues for developing computational value systems that govern not only single actions but the frequency and count of repeatable behaviors.
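The idea of a safe and an optimal limit for a repeatable behavior can be illustrated with a toy inverted-U response. This is a minimal sketch, not the BFRA or BCRA method from the paper: the linear benefit term, the quadratic harm term, and the parameter names `gain` and `cost` are all assumptions chosen only to show how low repetition can be beneficial while high repetition becomes harmful.

```python
# Toy hormetic (inverted-U) response. The functional form and all numbers
# are illustrative assumptions, not taken from the paper.

def net_utility(frequency: float, gain: float = 1.0, cost: float = 0.1) -> float:
    """Net effect of repeating a behavior `frequency` times per unit time.

    gain * f stands in for the beneficial process; cost * f**2 stands in
    for the opposing harmful process, which grows faster with repetition.
    """
    return gain * frequency - cost * frequency ** 2


def optimal_frequency(gain: float = 1.0, cost: float = 0.1) -> float:
    """Frequency that maximizes net utility (derivative equals zero)."""
    return gain / (2 * cost)


def safe_limit(gain: float = 1.0, cost: float = 0.1) -> float:
    """Largest frequency at which net utility is still non-negative."""
    return gain / cost


if __name__ == "__main__":
    print("optimal frequency:", optimal_frequency())
    print("safe limit:", safe_limit())
```

Under these assumed forms, any frequency beyond the safe limit has negative net utility, which is the kind of bound an agent would be expected to respect.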
  • Item
    Accurate machine learning model for human embryo morphokinetic stage detection
    (Springer Science+Business Media, LLC, 2025-08-20) Misaghi H; Cree L; Knowlton N
    Purpose: The ability to detect, monitor, and precisely time the morphokinetic stages of human pre-implantation embryo development plays a critical role in assessing embryo viability and potential for successful implantation. Therefore, there is a need for accurate and accessible tools to analyse embryos. This work describes a highly accurate machine learning model designed to predict 17 morphokinetic stages of pre-implantation human development, an improvement on existing models. This model provides a robust tool for researchers and clinicians, enabling the automation of morphokinetic stage prediction, standardising the process, and reducing subjectivity between clinics. Method: A computer vision model was built on a publicly available dataset for embryo morphokinetic stage detection. The dataset contained 273,438 labelled images based on Embryoscope/+© embryo images. The dataset was split 70/10/20 into training/validation/test sets. Two different deep learning architectures were trained and tested, one using EfficientNet-V2-Large and the other using EfficientNet-V2-Large with the addition of fertilisation time as input. A new postprocessing algorithm was developed to reduce noise in the predictions of the deep learning model and detect the exact time of each morphokinetic stage change. Results: The proposed model reached an overall test F1-score of 0.881 and accuracy of 87% across 17 morphokinetic stages on an independent test set. Conclusion: The proposed model shows a 17% accuracy improvement compared to the best models on the same dataset. Therefore, our model can accurately detect morphokinetic stages in static embryo images as well as detecting the exact timings of stage changes in a complete time-lapse video.
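The 70/10/20 split mentioned in the abstract above can be sketched as follows. This is only an illustration: the function name `split_70_10_20` and the fixed seed are assumptions, and the abstract does not say whether the authors split by image or by embryo, so the sketch simply shuffles whatever items it is given.

```python
import random


def split_70_10_20(items, seed=0):
    """Shuffle items and split them 70/10/20 into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = len(shuffled) * 70 // 100    # integer arithmetic avoids float rounding
    n_val = len(shuffled) * 10 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # the remainder goes to the test set
    return train, val, test


# Hypothetical stand-in for image identifiers.
train, val, test = split_70_10_20(range(1000))
print(len(train), len(val), len(test))
```

When several images come from the same embryo, passing embryo identifiers rather than image identifiers keeps all images of one embryo in the same subset.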
  • Item
    A machine learning-guided semi-empirical model for predicting single-sided natural ventilation rates
    (Elsevier B V, 2025-10-01) Han JM; Wu W; Malkawi A
    Most state-of-the-art natural ventilation models were developed for a single mode: single-sided ventilation, cross ventilation, or buoyancy-driven ventilation. However, natural ventilation (NV) of a single zone may vary between modes across seasons, depending on the design and the operation of other building systems. This paper tailors machine learning embedded semi-empirical models to predict the natural ventilation rate in a single zone. Model development consists of two parts: 1) development of a semi-empirical model for single-sided ventilation with a local context, and 2) a machine learning driven component to accurately predict a specific lab condition. In a case study, a series of steps was taken to validate model accuracy against an estimated flow rate for given operable window areas. First, the contextual inputs, localized wind speed, and window models were investigated. Then, we developed a machine learning model to predict the localized lab environment using data from pressure sensors on the façade. A random forest model was trained and fine-tuned to predict localized pressure coefficients (Cp). Over 75% of the predicted values fall within the model's ±1 standard deviation credible interval, demonstrating not only high predictive reliability but also suitability for integration into empirical ventilation models. These results highlight the model's potential as a robust input generator for semi-empirical frameworks with locally collected weather data, particularly in applications involving window operation control and site-specific model calibration.
  • Item
    Modelling and mapping of subsurface nitrate-attenuation index in agricultural landscapes
    (Elsevier Ltd, 2025-06) Collins SB; Singh R; Mead SR; Horne DJ; Zhang L
    Environmental management of nutrient losses from agricultural lands is required to reduce their potential impacts on the quality of groundwater and eutrophication of surface waters in agricultural landscapes. However, accurate accounting and management of nitrogen losses relies on robust modelling of nitrogen leaching and its potential attenuation – specifically, the reduction of nitrate to gaseous forms of nitrogen – in subsurface flow pathways. Subsurface denitrification is a key process in potential nitrate attenuation, but the spatial and temporal dynamics of where and when it occurs remain poorly understood, especially at catchment scale. In this paper, a novel Landscape Subsurface Nitrate-Attenuation Index (LSNAI) is developed to map spatially variable subsurface nitrate attenuation potential of diverse landscape units across the Manawatū-Whanganui region of New Zealand. A large data set of groundwater quality across New Zealand was collated and analysed to assess spatial and temporal variability of groundwater redox status (based on dissolved oxygen, nitrate and dissolved manganese) across different hydrogeological settings. The Extreme Gradient Boosting algorithm was used to predict landscape unit subsurface redox status by integrating the nationwide groundwater redox status data set with various landscape characteristics. Applying hierarchical clustering analysis and unsupervised classification techniques, the LSNAI was then developed to identify and map five landscape subsurface nitrate attenuation classes, varying from very low to very high potential, based on the predicted groundwater redox status probabilities and identified soil drainage and rock type as key influencing landscape characteristics. Accuracy of the LSNAI mapping was further investigated and validated using a set of independent observations of groundwater quality and redox assessments in shallow groundwaters in the study area. This highlights the potential for further research in up-scaling mapping and modelling of the landscape subsurface nitrate attenuation index to accurately account for spatial variability in subsurface nitrate attenuation potential in modelling and assessment of water quality management measures at catchment scale in agricultural landscapes.
  • Item
    Novel machine learning-driven comparative analysis of CSP, STFT, and CSP-STFT fusion for EEG data classification across multiple meditation and non-meditation sessions in BCI pipeline.
    (BioMed Central Ltd, 2025-02-08) Liyanagedera ND; Bareham CA; Kempton H; Guesgen HW
    This study focuses on classifying multiple sessions of loving kindness meditation (LKM) and non-meditation electroencephalography (EEG) data. Its novelty lies in using multiple sessions of EEG data from a single individual to train a machine learning pipeline, and then classifying data from a new session recorded from the same individual. Here, two meditation techniques, LKM-Self and LKM-Others, were compared with non-meditation EEG data for 12 participants. Of the many BCI pipelines tested, three produced promising results, successfully detecting features in meditation/non-meditation EEG data. While testing different feature extraction algorithms, a common neural network structure was used as the classification algorithm so that the performance of the feature extraction algorithms could be compared. For two of those pipelines, Common Spatial Patterns (CSP) and Short Time Fourier Transform (STFT) were successfully used as feature extraction algorithms, both of which are relatively new to meditation EEG. As a novel concept, the third BCI pipeline used a feature extraction algorithm that fused the features of CSP and STFT, achieving the highest classification accuracies among all tested pipelines. Analyses were conducted using EEG data of 3, 4 or 5 sessions, totaling 3960 tests on the entire dataset. Across all tests, the overall classification accuracy was 67.1% using CSP alone and 67.8% using STFT alone. The algorithm combining the features of CSP and STFT achieved an overall classification accuracy of 72.9%, which is more than 5 percentage points higher than the other two pipelines. The highest mean classification accuracy for the 12 participants was also achieved using the pipeline with the combined CSP-STFT algorithm, reaching 75.5% for LKM-Self/non-meditation for the case of 5 sessions of data. Additionally, the highest individual classification accuracy of 88.9% was obtained by participant no. 14. Furthermore, the results showed that the classification accuracies for all three pipelines increased as the number of training sessions increased from 2 to 3 and then to 4. The study was successful in classifying a new session of EEG meditation/non-meditation data after training machine learning algorithms using a different set of session data, and this achievement will be beneficial in the development of algorithms that support meditation.
  • Item
    On the origin of optical rotation changes during the κ-carrageenan disorder-to-order transition
    (Elsevier Ltd., 2024-06-01) Westberry BP; Rio M; Waterland MR; Williams MAK
    It is well established that solutions of both polymeric and oligomeric κ-carrageenan exhibit a clear change in optical rotation (OR), in concert with gel-formation for polymeric samples, as the solution is cooled in the presence of certain ions. The canonical interpretation - that this OR change reflects a 'coil-to-helix transition' in single chains - has seemed unambiguous; the solution- or 'disordered'-state structure has ubiquitously been assumed to be a 'random coil', and the helical nature of carrageenan in the solid-state was settled in the 1970s. However, recent work has found that κ-carrageenan contains substantial helical secondary structure elements in the disordered-state, raising doubts over the validity of this interpretation. To investigate the origins of the OR, density-functional theory calculations were conducted using atomic models of κ-carrageenan oligomers. Changes were found to occur in the predicted OR owing purely to dimerization of chains, and - together with the additional effects of slight changes in conformation that occur when separated helical chains form double-helices - the predicted OR changes are qualitatively consistent with experimental results. These findings contribute to a growing body of evidence that the carrageenan 'disorder-to-order' transition is a cooperative process, and have further implications for the interpretation of OR changes demonstrated by macromolecules in general.
  • Item
    Nphos: Database and Predictor of Protein N-phosphorylation.
    (Oxford University Press, 2024-04-10) Zhao M-X; Ding R-F; Chen Q; Meng J; Li F; Fu S; Huang B; Liu Y; Ji Z-L; Zhao Y; Xue Y
    Protein N-phosphorylation is widely present in nature and participates in various biological processes. However, current knowledge on N-phosphorylation is extremely limited compared to that on O-phosphorylation. In this study, we collected 11,710 experimentally verified N-phosphosites of 7,344 proteins from 39 species and subsequently constructed the database Nphos to share up-to-date information on protein N-phosphorylation. Using these substantial data, we characterized the sequential and structural features of protein N-phosphorylation. Moreover, after comparing hundreds of learning models, we chose and optimized gradient boosting decision tree (GBDT) models to predict three types of human N-phosphorylation, achieving mean area under the receiver operating characteristic curve (AUC) values of 90.56%, 91.24%, and 92.01% for pHis, pLys, and pArg, respectively. Meanwhile, we discovered 488,825 distinct N-phosphosites in the human proteome. The models were also deployed in Nphos for interactive N-phosphosite prediction. In summary, this work provides new insights and points for both flexible and focused investigations of N-phosphorylation. It will also facilitate a deeper and more systematic understanding of protein N-phosphorylation modification by providing a data and technical foundation. Nphos is freely available at http://www.bio-add.org/Nphos/ and http://ppodd.org.cn/Nphos/.
  • Item
    Forecasting patient flows with pandemic induced concept drift using explainable machine learning
    (BioMed Central Ltd, 2023-04-21) Susnjak T; Maddigan P
    Accurately forecasting patient arrivals at Urgent Care Clinics (UCCs) and Emergency Departments (EDs) is important for effective resourcing and patient care. However, correctly estimating patient flows is not straightforward since it depends on many drivers. The predictability of patient arrivals has recently been further complicated by the COVID-19 pandemic conditions and the resulting lockdowns. This study investigates how a suite of novel quasi-real-time variables like Google search terms, pedestrian traffic, the prevailing incidence levels of influenza, as well as the COVID-19 Alert Level indicators can both generally improve the forecasting models of patient flows and effectively adapt the models to the unfolding disruptions of pandemic conditions. This research also uniquely contributes to the body of work in this domain by employing tools from the eXplainable AI field to investigate more deeply the internal mechanics of the models than has previously been done. The Voting ensemble-based method combining machine learning and statistical techniques was the most reliable in our experiments. Our study showed that the prevailing COVID-19 Alert Level feature together with Google search terms and pedestrian traffic were effective at producing generalisable forecasts. The implications of this study are that proxy variables can effectively augment standard autoregressive features to ensure accurate forecasting of patient flows. The experiments showed that the proposed features are potentially effective model inputs for preserving forecast accuracies in the event of future pandemic outbreaks.
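A voting ensemble for forecasting, in its simplest form, averages the forecasts of its member models at each time step. The sketch below assumes unweighted averaging; the abstract above does not state how the members were combined, and the three member forecasts are made-up numbers used only for illustration.

```python
from statistics import mean


def voting_forecast(forecasts):
    """Average the per-step forecasts of several models (unweighted voting)."""
    # zip(*forecasts) groups the members' predictions for the same time step.
    return [mean(step) for step in zip(*forecasts)]


# Hypothetical daily patient-arrival forecasts from three stand-in models.
ml_model = [100.0, 120.0, 90.0]
stat_model = [110.0, 115.0, 95.0]
naive_model = [105.0, 125.0, 100.0]

ensemble = voting_forecast([ml_model, stat_model, naive_model])
print(ensemble)
```

Averaging tends to dampen the errors of any single member, which is one common motivation for combining machine learning and statistical models.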
  • Item
    Segregation of ‘Hayward’ kiwifruit for storage potential using Vis-NIR spectroscopy
    (Elsevier BV, 2022-07) Li M; Pullanagari R; Yule I; East A
    Kiwifruit are often harvested unripe and kept in local coolstores for extended periods of time before being marketed. Many pre-harvest factors contribute to variation in fruit quality at harvest and during coolstorage, making it difficult to segregate fruit by storage potential. The ability to forecast storage potential, both within and between populations of fruit, could enable segregation systems to be implemented at harvest to assist with inventory decision making and improve profitability. Visible-near infrared (Vis-NIR) spectroscopy is one of the most commonly used non-destructive techniques for estimation of internal quality of kiwifruit. Whilst many previous attempts focused on instantaneous quantification of quality attributes, the objective of this work was to investigate the use of Vis-NIR spectroscopy at harvest to qualitatively forecast the storage potential of individual kiwifruit or batches of kiwifruit. Commercially sourced ‘Hayward’ kiwifruit capturing large variability in storability were measured non-destructively at harvest using a Vis-NIR spectrometer, and then assessed after 75, 100, 125 and 150 days of coolstorage at 0 °C. Machine learning classification models were developed using at-harvest Vis-NIR spectral data to segregate kiwifruit into two storability groups based on the export flesh firmness (FF) criterion of 9.8 N. The best prediction was obtained for fruit stored at 0 °C for 125 days: approximately 54% of the soft fruit (short storability) and 79% of the good fruit (long storability) could be predicted. Further novelty of this work lies in an independent external validation using data collected from a new season. Kiwifruit were repacked at harvest based on their potential storability predicted by the developed model, with the actual post-storage performance of the same fruit assessed to evaluate model robustness. Segregation between grower lines at harvest achieved a 30% reduction in soft fruit after storage. Should the model be applied in the industry to enable sequential marketing, significant costs could be saved because of reduced fruit loss, repacking and condition checking costs.
  • Item
    Identifying important microbial and genomic biomarkers for differentiating right- versus left-sided colorectal cancer using random forest models
    (BioMed Central Ltd, 2023-07-11) Kolisnik T; Sulit AK; Schmeier S; Frizelle F; Purcell R; Smith A; Silander O
    BACKGROUND: Colorectal cancer (CRC) is a heterogeneous disease, with subtypes that have different clinical behaviours and subsequent prognoses. There is a growing body of evidence suggesting that right-sided colorectal cancer (RCC) and left-sided colorectal cancer (LCC) also differ in treatment success and patient outcomes. Biomarkers that differentiate between RCC and LCC are not well-established. Here, we apply random forest (RF) machine learning methods to identify genomic or microbial biomarkers that differentiate RCC and LCC. METHODS: RNA-seq expression data for 58,677 coding and non-coding human genes and count data for 28,557 human unmapped reads were obtained from 308 patient CRC tumour samples. We created three RF models for datasets of human genes-only, microbes-only, and genes-and-microbes combined. We used a permutation test to identify features of significant importance. Finally, we used differential expression (DE) and paired Wilcoxon rank-sum tests to associate features with a particular side. RESULTS: RF model accuracy scores were 90%, 70%, and 87% with area under curve (AUC) of 0.9, 0.76, and 0.89 for the human genomic, microbial, and combined feature sets, respectively. Fifteen features were identified as significant in the genes-only model, 54 microbes in the microbes-only model, and 28 genes and 18 microbes in the combined genes-and-microbes model. PRAC1 expression was the most important feature for differentiating RCC and LCC in the genes-only model, with HOXB13, SPAG16, HOXC4, and RNLS also playing a role. Ruminococcus gnavus and Clostridium acetireducens were the most important in the microbes-only model. MYOM3, HOXC4, Coprococcus eutactus, PRAC1, lncRNA AC012531.25, Ruminococcus gnavus, RNLS, HOXC6, SPAG16 and Fusobacterium nucleatum were most important in the combined model. CONCLUSIONS: Many of the identified genes and microbes among all models have previously established associations with CRC. However, the ability of RF models to account for inter-feature relationships within the underlying decision trees may yield a more sensitive and biologically interconnected set of genomic and microbial biomarkers.
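One common way to judge feature importance by permutation is to shuffle a single feature across samples and measure how much model accuracy drops. The sketch below illustrates that idea only; it is not the authors' procedure, the `predict` function is a hand-written stand-in for a trained random forest, and the two-feature dataset is synthetic.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled across samples."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)  # break the link between this feature and the labels
        permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


# Synthetic data (assumption): feature 0 determines the label, feature 1 is noise.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [int(r[0] > 0.5) for r in rows]


def predict(row):
    """Stand-in for a trained model; it uses only feature 0."""
    return int(row[0] > 0.5)


informative = permutation_importance(predict, rows, labels, feature=0)
noise = permutation_importance(predict, rows, labels, feature=1)
print("feature 0 importance:", informative)
print("feature 1 importance:", noise)
```

A feature the model relies on shows a clear accuracy drop when shuffled, while a feature the model ignores shows none. Comparing observed importances against a null distribution built in a similar way is one route to a significance test.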