Journal Articles

Permanent URI for this collection: https://mro.massey.ac.nz/handle/10179/7915

Search Results

Now showing 1 - 10 of 13
  • Item
    pyRforest: a comprehensive R package for genomic data analysis featuring scikit-learn Random Forests in R.
    (Oxford University Press, 2024-10-07) Kolisnik T; Keshavarz-Rahaghi F; Purcell RV; Smith ANH; Silander OK
    Random Forest models are widely used in genomic data analysis and can offer insights into complex biological mechanisms, particularly when features influence the target in interactive, nonlinear, or nonadditive ways. Currently, some of the most efficient Random Forest methods in terms of computational speed are implemented in Python. However, many biologists use R for genomic data analysis, as R offers a unified platform for performing additional statistical analysis and visualization. Here, we present an R package, pyRforest, which integrates Python scikit-learn "RandomForestClassifier" algorithms into the R environment. pyRforest inherits the efficient memory management and parallelization of Python, and is optimized for classification tasks on large genomic datasets, such as those from RNA-seq. pyRforest offers several additional capabilities, including a novel rank-based permutation method for biomarker identification. This method can be used to estimate and visualize P-values for individual features, allowing the researcher to identify a subset of features for which there is robust statistical evidence of an effect. In addition, pyRforest includes methods for the calculation and visualization of SHapley Additive exPlanations values. Finally, pyRforest includes support for comprehensive downstream analysis for gene ontology and pathway enrichment. pyRforest thus improves the implementation and interpretability of Random Forest models for genomic data analysis by merging the strengths of Python with R. pyRforest can be downloaded at: https://www.github.com/tkolisnik/pyRforest with an associated vignette at https://github.com/tkolisnik/pyRforest/blob/main/vignettes/pyRforest-vignette.pdf.
  • Item
    Variable Selection from Image Texture Feature for Automatic Classification of Concrete Surface Voids.
    (Hindawi Limited, 2021-03-08) Zhao Z; Liu T; Zhao X; Haber RE
    Machine learning plays an important role in computational intelligence and has been widely used in many engineering fields. Surface voids, or bugholes, frequently appear on concrete surfaces after casting, and inspecting them manually is time-consuming, costly, labor-intensive, and inconsistent. To improve inspection of concrete surfaces, automatic classification of bugholes is needed. In this paper, a variable selection strategy is proposed to pursue feature interpretability, together with an automatic ensemble classifier designed to improve the accuracy of bughole classification. A texture feature derived from the Gabor filter and gray-level run lengths is extracted from concrete surface images. Interpretable variables, which are also the components of the feature, are selected according to a cumulative voting strategy. An ensemble classifier with its base classifier automatically assigned is provided to detect whether a surface void exists in an image. Experimental results on 1000 image samples indicate the effectiveness of our method, which achieves comparable prediction accuracy with an explicable model.
  • Item
    The Use of Triaxial Accelerometers and Machine Learning Algorithms for Behavioural Identification in Domestic Dogs (Canis familiaris): A Validation Study
    (MDPI (Basel, Switzerland), 2024-09-13) Redmond C; Smit M; Draganova I; Corner-Thomas R; Thomas D; Andrews C; Fullwood DT; Bowden AE
    Assessing the behaviour and physical attributes of domesticated dogs is critical for predicting the suitability of animals for companionship or specific roles such as hunting, military or service. Common methods of behavioural assessment can be time-consuming, labour-intensive, and subject to bias, making large-scale and rapid implementation challenging. Objective, practical and time-effective behaviour measures may be facilitated by remote and automated devices such as accelerometers. This study, therefore, aimed to validate the ActiGraph® accelerometer as a tool for behavioural classification. This study used a machine learning method that identified nine dog behaviours with an overall accuracy of 74% (range for each behaviour was 54 to 93%). In addition, overall body dynamic acceleration was found to be correlated with the amount of time spent exhibiting active behaviours (barking, locomotion, scratching, sniffing, and standing; R² = 0.91, p < 0.001). Machine learning was an effective method to build a model to classify behaviours such as barking, defecating, drinking, eating, locomotion, resting-asleep, resting-alert, sniffing, and standing with high overall accuracy whilst maintaining a large behavioural repertoire.
  • Item
    Development and evaluation of a predictive algorithm and telehealth intervention to reduce suicidal behavior among university students.
    (Cambridge University Press, 2024-04-01) Hasking PA; Robinson K; McEvoy P; Melvin G; Bruffaerts R; Boyes ME; Auerbach RP; Hendrie D; Nock MK; Preece DA; Rees C; Kessler RC
    BACKGROUND: Suicidal behaviors are prevalent among college students; however, students remain reluctant to seek support. We developed a predictive algorithm to identify students at risk of suicidal behavior and used telehealth to reduce subsequent risk. METHODS: Data come from several waves of a prospective cohort study (2016-2022) of college students (n = 5454). All first-year students were invited to participate as volunteers (response rate range: 16.00-19.93%). A stepped-care approach was implemented: (i) all students received a comprehensive list of services; (ii) those reporting past 12-month suicidal ideation were directed to a safety planning application; (iii) those identified as high risk of suicidal behavior by the algorithm or reporting 12-month suicide attempt were contacted via telephone within 24 h of survey completion. For this high-risk group, the intervention focused on support, safety planning, and referral to services. RESULTS: 5454 students ranging in age from 17-36 (s.d. = 5.346) participated; 65% female. The algorithm identified 77% of students reporting subsequent suicidal behavior in the top 15% of predicted probabilities (Sensitivity = 26.26 [95% CI 17.93-36.07]; Specificity = 97.46 [95% CI 96.21-98.38]; PPV = 53.06 [95% CI 40.16-65.56]; AUC range: 0.895 [95% CIs 0.872-0.917] to 0.966 [95% CIs 0.939-0.994]). High-risk students in the Intervention Cohort showed a 41.7% reduction in probability of suicidal behavior at 12-month follow-up compared to high-risk students in the Control Cohort. CONCLUSIONS: Predictive risk algorithms embedded into universal screening, coupled with telehealth intervention, offer significant potential as a suicide prevention approach for students.
  • Item
    The Use of Triaxial Accelerometers and Machine Learning Algorithms for Behavioural Identification in Domestic Cats (Felis catus): A Validation Study
    (MDPI (Basel, Switzerland), 2023-08-14) Smit M; Ikurior SJ; Corner-Thomas RA; Andrews CJ; Draganova I; Thomas DG; Vanwanseele B
    Animal behaviour can be an indicator of health and welfare. Monitoring behaviour through visual observation is labour-intensive and there is a risk of missing infrequent behaviours. Twelve healthy domestic shorthair cats were fitted with triaxial accelerometers mounted on a collar and harness. Over seven days, accelerometer data and video footage were collected simultaneously. Identifier variables (n = 32) were calculated from the accelerometer data and summarised into 1 s epochs. Twenty-four behaviours were annotated from the video recordings and aligned with the summarised accelerometer data. Models were created using random forest (RF) and supervised self-organizing map (SOM) machine learning techniques for each mounting location. Multiple modelling rounds were run to select and merge behaviours based on performance values. All models were then tested on a validation accelerometer dataset from the same twelve cats to identify behaviours. The frequency of behaviours was calculated and compared using Dirichlet regression. Despite the SOM models having higher Kappa (>95%) and overall accuracy (>95%) compared with the RF models (64-76% and 70-86%, respectively), the RF models predicted behaviours more consistently between mounting locations. These results indicate that triaxial accelerometers can identify cat-specific behaviours.
  • Item
    Mapping immunogenic epitopes of an adhesin-like protein from Methanobrevibacter ruminantium M1 and comparison of empirical data with in silico prediction methods.
    (Springer Nature Limited, 2022-06-21) Khanum S; Carbone V; Gupta SK; Yeung J; Shu D; Wilson T; Parlane NA; Altermann E; Estein SM; Janssen PH; Wedlock DN; Heiser A
    In silico prediction of epitopes is a potentially time-saving alternative to experimental epitope identification but is often subject to misidentification of epitopes and may not be useful for proteins from archaeal microorganisms. In this study, we mapped B- and T-cell epitopes of a model antigen from the methanogen Methanobrevibacter ruminantium M1, the Big_1 domain (AdLP-D1, amino acids 19-198) of an adhesin-like protein. A series of 17 overlapping 20-mer peptides was selected to cover the Big_1 domain. Peptide-specific antibodies were produced in mice and measured by ELISA, while an in vitro splenocyte re-stimulation assay determined specific T-cell responses. Overall, five of the 17 peptides were shown to be major immunogenic epitopes of AdLP-D1. These immunogenic regions were examined for their localization in a homology-based model of AdLP-D1. Validated epitopes were found in the outside region of the protein, with loop-like secondary structures reflecting their flexibility. The empirical data were compared with epitope predictions made by programmes based on a range of algorithms. In general, the epitopes identified by in silico predictions were not comparable to those determined empirically.
  • Item
    Mitigating cognitive biases in developing AI-assisted recruitment systems: A knowledge-sharing approach
    (IGI Global, 2022) Soleimani M; Intezari A; Pauleen DJ
    Artificial intelligence (AI) is increasingly embedded in business processes, including the human resource (HR) recruitment process. While AI can expedite the recruitment process, evidence from industry shows that AI-recruitment systems (AIRS) may fail to achieve unbiased decisions about applicants. There are risks of encoding biases in the datasets and algorithms of AI, which lead AIRS to replicate and amplify human biases. To develop less biased AIRS, collaboration between HR managers and AI developers for training algorithms and exploring algorithmic biases is vital. Using an exploratory research design, 35 HR managers and AI developers globally were interviewed to understand the role of knowledge sharing during their collaboration in mitigating biases in AIRS. The findings show that knowledge sharing can help to mitigate biases in AIRS by informing data labeling, understanding job functions, and improving the machine learning model. Theoretical contributions and practical implications are suggested.
  • Item
    Stochastic simulation of multiscale complex systems with PISKaS: A rule-based approach
    (Elsevier Inc, 2018-03-29) Perez-Acle T; Fuenzalida I; Martin AJM; Santibañez R; Avaria R; Bernardin A; Bustos AM; Garrido D; Dushoff J; Liu JH
    Computational simulation is a widely employed methodology to study the dynamic behavior of complex systems. Although common approaches are based either on ordinary differential equations or stochastic differential equations, these techniques make several assumptions which, when it comes to biological processes, could often lead to unrealistic models. Among others, model approaches based on differential equations entangle kinetics and causality, failing when complexity increases, separating knowledge from models, and assuming that the average behavior of the population encompasses any individual deviation. To overcome these limitations, simulations based on the Stochastic Simulation Algorithm (SSA) appear as a suitable approach to model complex biological systems. In this work, we review three different models executed in PISKaS, a rule-based framework to produce multiscale stochastic simulations of complex systems. These models span multiple time and spatial scales ranging from gene regulation up to Game Theory. In the first example, we describe a model of the core regulatory network of gene expression in Escherichia coli, highlighting the continuous model improvement capacities of PISKaS. The second example describes a hypothetical outbreak of the Ebola virus occurring in a compartmentalized environment resembling cities and highways. Finally, in the last example, we illustrate a stochastic model for the prisoner's dilemma, a common approach from social sciences describing complex interactions involving trust within human populations. As a whole, these models demonstrate the capabilities of PISKaS, providing fertile scenarios in which to explore the dynamics of complex systems.
  • Item
    A multi-objective genetic algorithm to find active modules in multiplex biological networks
    (PLOS, 2021-08-30) Novoa-Del-Toro EM; Mezura-Montes E; Vignes M; Térézol M; Magdinier F; Tichit L; Baudot A; Jensen P
    The identification of subnetworks of interest (or active modules) by integrating biological networks with molecular profiles is a key resource to inform on the processes perturbed in different cellular conditions. We here propose MOGAMUN, a Multi-Objective Genetic Algorithm to identify active modules in MUltiplex biological Networks. MOGAMUN optimizes both the density of interactions and the scores of the nodes (e.g., their differential expression). We compare MOGAMUN with state-of-the-art methods, representative of different algorithms dedicated to the identification of active modules in single networks. MOGAMUN identifies dense and high-scoring modules that are also easier to interpret. In addition, to our knowledge, MOGAMUN is the first method able to use multiplex networks. Multiplex networks are composed of different layers of physical and functional relationships between genes and proteins. Each layer is associated with its own meaning, topology, and biases; the multiplex framework allows exploiting this diversity of biological networks. We applied MOGAMUN to identify cellular processes perturbed in Facio-Scapulo-Humeral muscular Dystrophy, by integrating RNA-seq expression data with a multiplex biological network. We identified different active modules of interest, thereby providing new angles for investigating the pathomechanisms of this disease.
  • Item
    What Are Sheep Doing? Tri-Axial Accelerometer Sensor Data Identify the Diel Activity Pattern of Ewe Lambs on Pasture
    (MDPI (Basel, Switzerland), 2021-10) Ikurior SJ; Marquetoux N; Leu ST; Corner-Thomas RA; Scott I; Pomroy WE
    Monitoring activity patterns of animals offers the opportunity to assess individual health and welfare in support of precision livestock farming. The purpose of this study was to use a triaxial accelerometer sensor to determine the diel activity of sheep on pasture. Six Perendale ewe lambs, each fitted with a neck collar mounting a triaxial accelerometer, were filmed during targeted periods of sheep activities: grazing, lying, walking, and standing. The corresponding acceleration data were fitted using a Random Forest algorithm to classify activity (the classifier). This classifier was then applied to accelerometer data from an additional 10 ewe lambs to determine their activity budgets. Each of these was fitted with a neck collar mounting an accelerometer as well as two additional accelerometers placed on a head halter and a body harness over the shoulders of the animal. These were monitored continuously for three days. A classification accuracy of 89.6% was achieved for the grazing, walking and resting activities (resting being a new class combining lying and standing activity). Triaxial accelerometer data showed that sheep spent 64% (95% CI 55% to 74%) of daylight time grazing, with grazing at night reduced to 14% (95% CI 8% to 20%). Similar activity budgets were achieved from the halter-mounted sensors, but not those on a body harness. These results are consistent with previous studies directly observing daily activity of pasture-based sheep and can be applied in a variety of contexts to investigate animal health and welfare metrics, e.g., to better understand the impact that young sheep can suffer when carrying even modest burdens of parasitic nematodes.