Journal Articles

Permanent URI for this collection: https://mro.massey.ac.nz/handle/10179/7915

  • Item
    Predicting Distance and Direction from Text Locality Descriptions for Biological Specimen Collections
    (Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022-08-22) Liao R; Das PP; Jones CB; Aflaki N; Stock K; Ishikawa T; Fabrikant SI; Winter S
    A considerable proportion of records that describe biological specimens (flora, soil, invertebrates), and especially those that were collected decades ago, are not attached to corresponding geographical coordinates, but rather have their location described only through textual descriptions (e.g. North Canterbury, Selwyn River near bridge on Springston-Leeston Rd). Without geographical coordinates, millions of records stored in museum collections around the world cannot be mapped. We present a method for predicting the distance and direction associated with human language location descriptions which focuses on the interpretation of geospatial prepositions and the way in which they modify the location represented by an associated reference place name (e.g. near the Manawatu River). We study eight distance-oriented prepositions and eight direction-oriented prepositions and use machine learning regression to predict distance or direction, relative to the reference place name, from a collection of training data. The results show that, compared with a simple baseline, our model improved distance predictions by up to 60% and direction predictions by up to 31%.
  • Item
    Evaluating the Effects of Novel Enrichment Strategies on Dog Behaviour Using Collar-Based Accelerometers
    (MDPI (Basel, Switzerland), 2025-06-03) Redmond C; Draganova I; Corner-Thomas R; Thomas D; Andrews C; Gaunet F
    Environmental enrichment is crucial to improve welfare, reduce stress, and encourage natural behaviours in dogs housed in confined environments. This study aimed to use accelerometry and machine learning to evaluate the effect of different enrichment types on dog behaviour. Three enrichments (food, olfactory, and tactile) were provided to dogs for five consecutive days, with four days between each treatment. Acceleration data were collected using a collar-mounted ActiGraph®. Nine behaviours were classified using a validated machine learning model. Behaviour and activity differed significantly among the dogs. Dogs interacted most with the food enrichment, followed by the olfactory and then tactile enrichments. The dogs were least active during the olfactory enrichment, whereas activity was relatively consistent during the food and tactile enrichments. For all enrichments, dogs exhibited the most exploratory/locomotive behaviour during the first hour of each enrichment period, but this declined over the treatment period, indicating habituation. For exploratory and locomotive behaviour, food enrichment was the most stimulating for the dogs, with longer daily engagement than for both olfactory and tactile enrichments. These results illustrate that accelerometry and machine learning can be used to evaluate enrichment strategies in dogs, but it is important to consider variation among dogs and habituation.
  • Item
    pyRforest: a comprehensive R package for genomic data analysis featuring scikit-learn Random Forests in R.
    (Oxford University Press, 2024-10-07) Kolisnik T; Keshavarz-Rahaghi F; Purcell RV; Smith ANH; Silander OK
    Random Forest models are widely used in genomic data analysis and can offer insights into complex biological mechanisms, particularly when features influence the target in interactive, nonlinear, or nonadditive ways. Currently, some of the most efficient Random Forest methods in terms of computational speed are implemented in Python. However, many biologists use R for genomic data analysis, as R offers a unified platform for performing additional statistical analysis and visualization. Here, we present an R package, pyRforest, which integrates Python scikit-learn "RandomForestClassifier" algorithms into the R environment. pyRforest inherits the efficient memory management and parallelization of Python, and is optimized for classification tasks on large genomic datasets, such as those from RNA-seq. pyRforest offers several additional capabilities, including a novel rank-based permutation method for biomarker identification. This method can be used to estimate and visualize P-values for individual features, allowing the researcher to identify a subset of features for which there is robust statistical evidence of an effect. In addition, pyRforest includes methods for the calculation and visualization of SHapley Additive exPlanations values. Finally, pyRforest includes support for comprehensive downstream analysis for gene ontology and pathway enrichment. pyRforest thus improves the implementation and interpretability of Random Forest models for genomic data analysis by merging the strengths of Python with R. pyRforest can be downloaded at: https://www.github.com/tkolisnik/pyRforest with an associated vignette at https://github.com/tkolisnik/pyRforest/blob/main/vignettes/pyRforest-vignette.pdf.
  • Item
    The impact of ethnicity and intra-pancreatic fat on the postprandial metabolome response to whey protein in overweight Asian Chinese and European Caucasian women with prediabetes
    (Frontiers Media S.A., 2022-10-14) Joblin-Mills A; Wu Z; Fraser K; Jones B; Yip W; Lim JJ; Lu L; Sequeira I; Poppitt S; Li X
    The “Thin on the Outside Fat on the Inside” TOFI_Asia study found Asian Chinese to be more susceptible to Type 2 Diabetes (T2D) compared to European Caucasians matched for gender and body mass index (BMI). This was influenced by degree of visceral adipose deposition and ectopic fat accumulation in key organs, including liver and pancreas, leading to altered fasting plasma glucose, insulin resistance, and differences in plasma lipid and metabolite profiles. It remains unclear how intra-pancreatic fat deposition (IPFD) impacts TOFI phenotype-related T2D risk factors associated with Asian Chinese. Cow’s milk whey protein isolate (WPI) is an insulin secretagogue which can suppress hyperglycemia in prediabetes. In this dietary intervention, we used untargeted metabolomics to characterize the postprandial WPI response in 24 overweight women with prediabetes. Participants were classified by ethnicity (Asian Chinese, n=12; European Caucasian, n=12) and IPFD (low IPFD < 4.66%, n=10; high IPFD ≥ 4.66%, n=10). Using a cross-over design, participants were randomized to consume three WPI beverages on separate occasions: 0 g (water control), 12.5 g (low protein, LP) and 50 g (high protein, HP), consumed when fasted. An exclusion pipeline for isolating metabolites with temporal (T0-240mins) WPI responses was implemented, and a support vector machine-recursive feature elimination (SVM-RFE) algorithm was used to model relevant metabolites by ethnicity and IPFD classes. Metabolic network analysis identified glycine as a central hub in both ethnicity and IPFD WPI response networks. A depletion of glycine relative to WPI concentration was detected in Chinese and high IPFD participants independent of BMI. Urea cycle metabolites were highly represented among the ethnicity WPI metabolome model, implicating a dysregulation in ammonia and nitrogen metabolism among Chinese participants. Uric acid and purine synthesis pathways were enriched within the high IPFD cohort’s WPI metabolome response, implicating adipogenesis and insulin resistance pathways. In conclusion, the discrimination of ethnicity from WPI metabolome profiles was a stronger prediction model than IPFD in overweight women with prediabetes. Each model’s discriminatory metabolites enriched different metabolic pathways that help to further characterize prediabetes in Asian Chinese women and women with increased IPFD, independently.
  • Item
    Forecasting patient demand at urgent care clinics using explainable machine learning
    (John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology and Chongqing University of Technology., 2023-09-01) Susnjak T; Maddigan P
    Urgent care clinics and emergency departments around the world periodically suffer from extended wait times beyond patient expectations due to surges in patient flows. The delays arising from inadequate staffing levels during these periods have been linked with adverse clinical outcomes. Previous research into forecasting patient flows has mostly used statistical techniques. These studies have also predominantly focussed on short-term forecasts, which have limited practicality for the resourcing of medical personnel. This study joins an emerging body of work which seeks to explore the potential of machine learning algorithms to generate accurate forecasts of patient presentations. Our research uses datasets covering 10 years from two large urgent care clinics to develop long-term patient flow forecasts up to one quarter ahead using a range of state-of-the-art algorithms. A distinctive feature of this study is the use of eXplainable Artificial Intelligence (XAI) tools like Shapley and LIME that enable an in-depth analysis of the behaviour of the models, which would otherwise be uninterpretable. These analysis tools enabled us to explore the ability of the models to adapt to the volatility in patient demand during the COVID-19 pandemic lockdowns and to identify the most impactful variables, resulting in valuable insights into their performance. The results showed that a novel combination of advanced univariate models like Prophet as well as gradient boosting, into an ensemble, delivered the most accurate and consistent solutions on average. This approach generated improvements in the range of 16%–30% over the existing in-house methods for estimating the daily patient flows 90 days ahead.
  • Item
    How Lazy Are Pet Cats Really? Using Machine Learning and Accelerometry to Get a Glimpse into the Behaviour of Privately Owned Cats in Different Households
    (MDPI (Basel, Switzerland), 2024-04-19) Smit M; Corner-Thomas R; Draganova I; Andrews C; Thomas D; Friedrich CM
    Surprisingly little is known about how the home environment influences the behaviour of pet cats. This study aimed to determine how factors in the home environment (e.g., with or without outdoor access, urban vs. rural, presence of a child) and the season influence the daily behaviour of cats. Using accelerometer data and a validated machine learning model, behaviours including being active, eating, grooming, littering, lying, scratching, sitting, and standing were quantified for 28 pet cats. Generalized estimating equation models were used to determine the effects of different environmental conditions. Increasing cat age was negatively correlated with time spent active (p < 0.05). Cats with outdoor access (n = 18) were less active in winter than in summer (p < 0.05), but no differences were observed between seasons for indoor-only (n = 10) cats. Cats living in rural areas (n = 7) spent more time eating than cats in urban areas (n = 21; p < 0.05). Cats living in single-cat households (n = 12) spent more time lying but less time sitting than cats living in multi-cat households (n = 16; p < 0.05). Cats in households with at least one child (n = 20) spent more time standing in winter (p < 0.05), and more time lying but less time sitting in summer compared to cats in households with no children (n = 8; p < 0.05). This study clearly shows that the home environment has a major impact on cat behaviour.
  • Item
    Systematic Mapping of Global Research on Disaster Damage Estimation for Buildings: A Machine Learning-Aided Study
    (MDPI (Basel, Switzerland), 2024-06-20) Rajapaksha D; Siriwardana C; Ruparathna R; Maqsood T; Setunge S; Rajapakse L; De Silva S; Witt E; Bilau AA; Sun B
    Research on disaster damage estimation for buildings has gained extensive attention due to the increased number of disastrous events, facilitating risk assessment, the effective integration of disaster resilience measures, and policy development. A systematic mapping study has been conducted, focusing on disaster damage estimation studies to identify trends, relationships, and gaps in this large and exponentially growing subject area. A novel approach using machine learning algorithms to screen, categorise, and map the articles was adopted to mitigate the constraints of manual handling. Out of 8608 articles from major scientific databases, the most relevant 2186 were used in the analysis. These articles were classified based on the hazard, geographical location, damage function properties, and building properties. Key observations reveal an emerging trend in publications, with most studies concentrated in developed and severely disaster-affected countries in America, Europe, and Asia. A significant portion (68%) of the relevant articles focus on earthquakes. However, a notable research gap, and therefore a key research opportunity, exists in studies focusing on the African and South American continents despite the significant damage caused by disasters there. Additionally, studies on floods, hurricanes, and tsunamis are minimal compared to those on earthquakes. Further trends and relationships in current studies were analysed to convey insights from the literature, identifying research gaps in terms of hazards, geographical locations, and other relevant parameters. These insights aim to effectively guide future research in disaster damage estimation for buildings.
  • Item
    Mapping a Cloud-Free Rice Growth Stages Using the Integration of PROBA-V and Sentinel-1 and Its Temporal Correlation with Sub-District Statistics
    (MDPI (Basel, Switzerland), 2021-04-13) Ramadhani F; Pullanagari R; Kereszturi G; Procter J; Farooque AA
    Monitoring rice production is essential for ensuring food security against climate change threats, such as drought and flood events becoming more intense and frequent. The current practice of surveying an area of rice production manually and in near real-time is expensive and involves a high workload for local statisticians. Remote sensing technology with satellite-based sensors has grown in popularity in recent decades as an alternative approach, reducing the cost and time required for spatial analysis over a wide area. However, cloud-free pixels of optical imagery are required to produce accurate outputs for agriculture applications. Thus, in this study, we propose an integration of optical (PROBA-V) and radar (Sentinel-1) imagery for temporal mapping of rice growth stages, including bare land, vegetative, reproductive, and ripening stages. We have built classification models for both sensors and combined them into 12-day periodical rice growth-stage maps from January 2017 to September 2018 at the sub-district level over Java Island, the top rice production area in Indonesia. The accuracy measurement was based on the test dataset and on cross-correlation of the predictions with monthly local statistics. The overall accuracy of the rice growth-stage model of PROBA-V was 83.87%, and that of the Sentinel-1 model was 71.74%, with the Support Vector Machine classifier. The temporal maps were comparable with local statistics, with an average correlation between the vegetative area (remote sensing) and harvested area (local statistics) of 0.50 and a lag time of 89.5 days (n = 91). This result was similar to the local statistics data, in which planting and harvested area correlate at 0.61 with a lag time of 90.4 days. Moreover, the cross-correlation between the predicted rice growth stages was also consistent with rice development in the area (r > 0.52, p < 0.01). This novel method is straightforward, easy to replicate and apply to other areas, and can be scaled up to the national and regional level to be used by stakeholders to support improved agricultural policies for sustainable rice production.
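The lagged correlation reported in the abstract above (vegetative area leading harvested area by roughly 90 days) can be illustrated with a minimal sketch. This is not the authors' code: the function names `pearson` and `best_lag`, the toy series, and the assumption that both series are sampled at the same fixed step are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def best_lag(leading, lagging, max_lag):
    """Return (lag, r) maximising the correlation of leading[t] with lagging[t + lag]."""
    if len(leading) != len(lagging) or max_lag >= len(leading) - 1:
        raise ValueError("series must have equal length greater than max_lag + 1")
    best = (0, pearson(leading, lagging))
    for lag in range(1, max_lag + 1):
        r = pearson(leading[:-lag], lagging[lag:])
        if r > best[1]:
            best = (lag, r)
    return best

# Hypothetical toy series: harvested area repeats the vegetative curve 3 steps later.
vegetative = [10, 30, 50, 30, 10, 5, 5, 5, 5, 5]
harvested = [5, 5, 5, 10, 30, 50, 30, 10, 5, 5]
lag_steps, r = best_lag(vegetative, harvested, max_lag=5)
```

If each step were one 12-day map period, the lag in days would be `lag_steps * 12`; the actual sampling interval used for the published comparison is not stated here, so that conversion is only an assumption.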
  • Item
    Pregnancy status predicted using milk mid-infrared spectra from dairy cattle
    (Elsevier Inc. and Fass Inc. on behalf of the American Dairy Science Association, 2022-04) Tiplady KM; Trinh M-H; Davis SR; Sherlock RG; Spelman RJ; Garrick DJ; Harris BL
    Accurate and timely pregnancy diagnosis is an important component of effective herd management in dairy cattle. Predicting pregnancy from Fourier-transform mid-infrared (FT-MIR) spectroscopy data is of particular interest because the data are often already available from routine milk testing. The purpose of this study was to evaluate how well pregnancy status could be predicted in a large data set of 1,161,436 FT-MIR milk spectra records from 863,982 mixed-breed pasture-based New Zealand dairy cattle managed within seasonal calving systems. Three strategies were assessed for defining the nonpregnant cows when partitioning the records according to pregnancy status in the training population. Two of these used records for cows with a subsequent calving only, whereas the third also included records for cows without a subsequent calving. For each partitioning strategy, partial least squares discriminant analysis models were developed, whereby spectra from all the cows in 80% of herds were used to train the models, and predictions on cows in the remaining herds were used for validation. A separate data set was also used as a secondary validation, whereby pregnancy diagnosis had been assigned according to the presence of pregnancy-associated glycoproteins (PAG) in the milk samples. We examined different ways of accounting for stage of lactation in the prediction models, either by including it as an effect in the prediction model, or by pre-adjusting spectra before fitting the model. For a subset of strategies, we also assessed prediction accuracies from deep learning approaches, utilizing either the raw spectra or images of spectra. Across all strategies, prediction accuracies were highest for models using the unadjusted spectra as model predictors. Strategies for cows with a subsequent calving performed well in herd-independent validation with sensitivities above 0.79, specificities above 0.91 and area under the receiver operating characteristic curve (AUC) values over 0.91. However, for these strategies, the specificity to predict nonpregnant cows in the external PAG data set was poor (0.002-0.04). The best performing models were those that included records for cows without a subsequent calving, and used unadjusted spectra and days in milk as predictors, with consistent results observed across the training, herd-independent validation and PAG data sets. For the partial least squares discriminant analysis model, sensitivity was 0.71, specificity was 0.54 and the AUC value was 0.68 in the PAG data set; and for an image-based deep learning model, the sensitivity was 0.74, specificity was 0.52 and the AUC value was 0.69. Our results demonstrate that in pasture-based seasonal calving herds, confounding between pregnancy status and spectral changes associated with stage of lactation can inflate prediction accuracies. When the effect of this confounding was reduced, prediction accuracies were not sufficiently high to use as a sole indicator of pregnancy status.
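The herd-independent validation described in the abstract above (all cows from 80% of herds used for training, cows from the remaining herds held out) amounts to splitting by herd rather than by record. A minimal sketch of that idea follows; the record layout, the `herd` key, the function name `split_by_herd`, and the random seed are hypothetical and not taken from the study.

```python
import random

def split_by_herd(records, train_fraction=0.8, seed=0):
    """Split records so that every herd falls entirely in train or in validation."""
    herds = sorted({record["herd"] for record in records})
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    rng.shuffle(herds)
    n_train = int(round(train_fraction * len(herds)))
    train_herds = set(herds[:n_train])
    train = [r for r in records if r["herd"] in train_herds]
    valid = [r for r in records if r["herd"] not in train_herds]
    return train, valid

# Hypothetical toy data: 10 herds with 3 spectra records each.
records = [
    {"herd": herd_id, "cow": f"{herd_id}-{cow}", "pregnant": cow % 2 == 0}
    for herd_id in range(10)
    for cow in range(3)
]
train, valid = split_by_herd(records)
```

Splitting at the herd level means no herd contributes records to both sets, so herd-specific effects cannot leak from training into validation.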
  • Item
    Evaluating Alternatives to Locomotion Scoring for Detecting Lameness in Pasture-Based Dairy Cattle in New Zealand: In-Parlour Scoring
    (MDPI (Basel, Switzerland), 2022-03-11) Werema CW; Yang DA; Laven LJ; Mueller KR; Laven RA; Kofler J
    Earlier detection followed by efficient treatment can reduce the impact of lameness. Currently, locomotion scoring (LS) is the most widely used method of early detection but has significant limitations in pasture-based cattle and is not commonly used routinely in New Zealand. Scoring in the milking parlour may be more achievable, so this study compared an in-parlour scoring (IPS) technique with LS in pasture-based dairy cows. For nine months on two dairy farms, whole herd LS (4-point 0−3 scale) was followed 24 h later by IPS, in which cows being milked were observed for shifting weight, abnormal weight distribution, swollen heel or hock joint, and overgrown hoof. Every third cow was scored. Sensitivity and specificity of individual IPS indicators, and of one or more, two or more, or three positive indicators, for detecting cows with locomotion scores ≥ 2 were calculated. Using a threshold of two or more positive indicators was optimal (sensitivity > 92% and specificity > 98%). Utilising the IPS indicators, a decision tree machine learning procedure classified cows with locomotion score class ≥2 with a true positive rate of 75% and a false positive rate of 0.2%. IPS has the potential to be an alternative to LS on pasture-based dairy farms.
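The indicator-count rule in the abstract above (a cow is flagged when at least two in-parlour indicators are positive, and the flag is compared with a locomotion score ≥ 2) can be sketched as follows. The field names, helper functions, and the five example cows are hypothetical illustrations, not study data.

```python
INDICATORS = (
    "shifting_weight",
    "abnormal_weight_distribution",
    "swollen_heel_or_hock",
    "overgrown_hoof",
)

def ips_positive(cow, min_indicators=2):
    """Flag a cow when at least `min_indicators` in-parlour indicators are positive."""
    return sum(bool(cow[name]) for name in INDICATORS) >= min_indicators

def sensitivity_specificity(cows, min_indicators=2, lame_score=2):
    """Compare the IPS flag with locomotion score >= lame_score as the reference."""
    tp = fn = tn = fp = 0
    for cow in cows:
        lame = cow["locomotion_score"] >= lame_score
        flagged = ips_positive(cow, min_indicators)
        if lame and flagged:
            tp += 1
        elif lame:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

def make_cow(score, *flags):
    """Build a hypothetical record from a locomotion score and four 0/1 indicator flags."""
    return {"locomotion_score": score, **dict(zip(INDICATORS, flags))}

# Hypothetical example cows (not study data).
cows = [
    make_cow(2, 1, 1, 0, 0),  # lame, flagged      -> true positive
    make_cow(3, 1, 0, 0, 0),  # lame, not flagged  -> false negative
    make_cow(0, 0, 0, 0, 0),  # sound, not flagged -> true negative
    make_cow(1, 1, 1, 1, 0),  # sound, flagged     -> false positive
    make_cow(0, 1, 0, 0, 0),  # sound, not flagged -> true negative
]
sens, spec = sensitivity_specificity(cows)
```

Changing `min_indicators` to 1 or 3 gives the other thresholds mentioned in the abstract, which makes the trade-off between sensitivity and specificity easy to inspect on one's own data.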