Massey Documents by Type

Permanent URI for this community: https://mro.massey.ac.nz/handle/10179/294

Search Results

Now showing 1 - 9 of 9
  • Item
    Bayesian distributions of species abundance along environmental gradients : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Statistics at Massey University, Albany, New Zealand
    (Massey University, 2023) Rabel, Hayden Daniel
    Understanding the relationship between species abundance and environmental conditions is crucial for conservation and management efforts. This thesis presents a novel approach for predicting distributions of species abundance along environmental gradients (DAEG) by refining the parameter space of a nonlinear zero-inflated negative binomial with modskurt mean (NZM) model and utilising Bayesian prior distributions. Chapter 2 elucidates the NZM model, highlighting the challenges that the model's complexity poses for parameter estimation. A refined parameter subspace is proposed to address the multimodality of the likelihood surface, enhancing the reliability of predicted DAEGs. Chapter 3 employs Bayesian inference and proposes a prior distribution for the NZM model that increases the ecological structure used in parameter estimation and improves the reliability of realistic predictions. A step-by-step workflow for the Bayesian implementation is presented and demonstrated with a case study. The thesis includes, as a supplement, an R package and interactive resources (Appendix A; https://hdrab127.github.io/modskurt/) that enable straightforward fitting of DAEGs using Bayesian NZM models. This work contributes to more accurate predictions of DAEGs, provides a practical tool for ecological research, and promotes effective conservation and management efforts.
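To make the likelihood in this abstract concrete: a zero-inflated negative binomial mixes a point mass at zero with a count distribution whose mean varies along the gradient. The sketch below is a minimal illustration, not the thesis's NZM parameterisation — in particular, the Gaussian-shaped `gaussian_mean` curve is a hypothetical stand-in for the more flexible modskurt mean, and all parameter values are invented.

```python
import numpy as np
from scipy.stats import nbinom

def zinb_logpmf(y, mu, phi, pi):
    """Log-pmf of a zero-inflated negative binomial with mean mu,
    dispersion phi, and zero-inflation probability pi."""
    p = phi / (phi + mu)  # scipy's success-probability parameterisation
    nb = nbinom.logpmf(y, phi, p)
    # A zero can come from the inflation component or the count component.
    zero = np.log(pi + (1.0 - pi) * np.exp(nbinom.logpmf(0, phi, p)))
    return np.where(y == 0, zero, np.log1p(-pi) + nb)

def gaussian_mean(x, height, centre, width):
    """Stand-in unimodal mean curve along an environmental gradient x
    (the thesis uses the more flexible 'modskurt' mean instead)."""
    return height * np.exp(-0.5 * ((x - centre) / width) ** 2)

# Log-likelihood of counts observed at four points along a gradient.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 5, 12, 1])
mu = gaussian_mean(x, height=10.0, centre=3.0, width=1.0)
loglik = zinb_logpmf(y, mu, phi=5.0, pi=0.2).sum()
```

In a Bayesian fit, this log-likelihood would be combined with the priors over `height`, `centre`, `width`, `phi`, and `pi` that the thesis constrains to a refined subspace.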
  • Item
    Monitoring the mean with locally weighted averages for skewed processes : this dissertation is submitted for the degree of Master of Philosophy in Statistics, School of Mathematics and Computational Sciences, Massey University, New Zealand
    (Massey University, 2022) Wickramasinghe, W. M. P. M.
    Averaging functions are used in many research areas such as decision making, image processing, pattern recognition and statistics. The basic averaging function, the arithmetic mean, is the most widely used in statistical quality control to monitor a particular quality characteristic. However, other averaging functions, such as weighted averages, can be used in control charting to improve the probability of detecting process level shifts when the process distribution deviates from the normality assumption. This study focused on applying locally weighted averages as the control statistic in quality control charts to monitor the mean of a right-skewed process. Six weights were defined: Max-weight, based on the maximum distance; PDF-weight, based on the probability density function of the process; CoPDF-weight, based on the complement of the probability density function; CDF-weight, based on the cumulative distribution function; CoCDF-weight, based on the complement of the cumulative distribution function; and Haz-weight, based on the hazard function. Weighted average control charts X̃_max, X̃_pdf, X̃_1−pdf, X̃_cdf, X̃_1−cdf, and X̃_haz were proposed to monitor the process mean, using the weighted averages based on Max-weight, PDF-weight, CoPDF-weight, CDF-weight, CoCDF-weight, and Haz-weight, respectively, as the control statistic. First, the behaviour of these control statistics was explored for symmetric distributions using the standard normal distribution. Second, the performance of these control charts was compared to the Shewhart X̄ control chart for right-skewed distributions using the average run length (ARL) and the standard deviation of the run length (SDRL). The exponential distribution and three gamma distributions were used to represent positively skewed distributions. Monte Carlo simulation was used to evaluate the ARLs, SDRLs, and control limits for Phase II applications. Phase I control limits were then established for all the distributions considered using bootstrapping. When the process is symmetric, the X̄ control chart was, as expected, suitable for monitoring the process mean. On the other hand, the X̃_cdf and X̃_1−cdf control charts were able to detect changes in the variance of symmetric distributions. The importance of these results is that the weighted average control charts and the X̄ control chart can be plotted on the same graph, making it possible to monitor the mean and the variance simultaneously; this is discussed as joint monitoring in the literature. Weighted average control charts cannot monitor the process mean when the underlying distribution of the quality characteristic is exponential. However, when the quality characteristic follows a gamma distribution, the weighted averages outperformed the Shewhart X̄ control chart in a variety of situations. Therefore, the locally weighted averages proposed in this study are useful for monitoring the process mean of gamma-distributed data and the variance of symmetric distributions.
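The ARL and SDRL performance measures mentioned above are estimated by simulating run lengths — the number of subgroups sampled before a point falls outside the control limits. The sketch below is a minimal Monte Carlo ARL estimator for a plain Shewhart X̄ chart on a standard normal process, not one of the thesis's weighted-average charts; the subgroup size and replication count are arbitrary choices for illustration.

```python
import numpy as np

def run_length(rng, dist, n, lcl, ucl, max_steps=100_000):
    """Number of subgroup means sampled until one falls outside (lcl, ucl)."""
    for t in range(1, max_steps + 1):
        xbar = dist(rng, n).mean()
        if xbar < lcl or xbar > ucl:
            return t
    return max_steps

def arl_sdrl(dist, n, lcl, ucl, reps=2000, seed=1):
    """Monte Carlo estimates of the average run length and its standard deviation."""
    rng = np.random.default_rng(seed)
    rls = np.array([run_length(rng, dist, n, lcl, ucl) for _ in range(reps)])
    return rls.mean(), rls.std(ddof=1)

# In-control standard normal process, subgroup size 5, 3-sigma limits:
# the in-control ARL should be near 1 / 0.0027 ≈ 370.
n = 5
limit = 3.0 / np.sqrt(n)
arl, sdrl = arl_sdrl(lambda rng, n: rng.standard_normal(n), n, -limit, limit)
```

Replacing the subgroup mean in `run_length` with one of the locally weighted averages, and the normal generator with an exponential or gamma one, reproduces the kind of comparison the abstract describes.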
  • Item
    Estimating credibility of science claims : analysis of forecasting data from metascience projects : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Albany, New Zealand
    (Massey University, 2021) Gordon, Michael
    The veracity of scientific claims is not always certain. In fact, enough claims have been proven incorrect that many scientists believe science itself is facing a “replication crisis”. Large-scale replication projects have provided empirical evidence that only around 50% of published social and behavioral science findings are replicable. Multiple forecasting studies have shown that the outcomes of replication projects can be predicted by crowdsourced human evaluators. The research presented in this thesis builds on previous forecasting studies, deriving new findings and exploring new scope and scale. The research is centered around the DARPA SCORE (Systematizing Confidence in Open Research and Evidence) programme, a project aimed at developing measures of credibility for social and behavioral science claims. As part of my contribution to SCORE, I, together with an international collaboration, elicited forecasts from human experts via surveys and prediction markets to predict the replicability of 3000 claims. I also present research on other forecasting studies. In Chapter 2, I pool data from previous studies to analyse the performance of prediction markets and surveys with higher statistical power. I confirm that prediction markets are better at forecasting replication outcomes than surveys. This study also demonstrates the relationship between the p-values of original findings and replication outcomes. These findings were used to inform the experimental and statistical design for forecasting the replicability of 3000 claims as part of the SCORE programme. A full description of the design, including the planned statistical analyses, is given in Chapter 3. Due to COVID-19 restrictions, the forecasts could not be validated through direct replication (experiments conducted by other teams within the SCORE collaboration), so those results cannot be presented in this thesis. The completion of these replications is now scheduled for 2022, and the pre-analysis plan presented in Chapter 3 will provide the basis for analysing the resulting data. In Chapter 4, an analysis of ‘meta’ forecasts — forecasts of field-wide and year-specific replication rates — is presented. We elicited and published community expectations that replication rates will differ by field and will increase over time. These forecasts give valuable insight into the academic community’s views of the replication crisis, including for research fields in which no large-scale replication studies have yet been undertaken. Once the full results from SCORE are available, validating these community expectations will yield additional insights. I also analyse forecasters’ ability to predict replications and effect sizes in Chapters 5 (Creative Destruction in Science) and 6 (A creative destruction approach to replication: Implicit work and sex morality across cultures). In these projects a ‘creative destruction’ approach to replication was used, in which a claim is compared not only to the null hypothesis but to alternative, contradictory claims. I conclude that forecasters can predict the size and direction of effects. Chapter 7 examines the use of forecasting for scientific outcomes beyond replication. In the COVID-19 preprint forecasting project, I find that forecasters can predict whether a preprint will be published within one year, including the quality of the publishing journal. Forecasters can also predict the number of citations preprints will receive. This thesis demonstrates that information about the replicability of scientific claims is dispersed within the scientific community. I have helped to develop methodologies and tools to elicit and aggregate forecasts efficiently. Forecasts about scientific outcomes can be used as guides to credibility, to gauge community expectations, and to allocate scarce replication resources efficiently.
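Comparing prediction markets with surveys, as in Chapter 2 of this thesis, requires a proper scoring rule for probabilistic forecasts of binary replication outcomes. The Brier score is the standard choice; the numbers below are invented purely to illustrate the comparison, not results from the thesis.

```python
def brier(forecasts, outcomes):
    """Mean squared error between probability forecasts and binary outcomes;
    lower is better (0 = perfect, 0.25 = uninformative coin flip)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

# Hypothetical forecasts for five replication attempts (1 = replicated).
outcomes = [1, 0, 1, 1, 0]
market = [0.8, 0.3, 0.7, 0.6, 0.4]   # more confident, better calibrated
survey = [0.6, 0.5, 0.6, 0.5, 0.5]   # hedged toward 0.5

market_score = brier(market, outcomes)
survey_score = brier(survey, outcomes)
```

A forecasting method is judged better when its Brier score is lower across many claims; in this toy data the "market" forecasts win because they move further from 0.5 in the correct direction.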
  • Item
    Sources of bias in mobile phone surveys in developing countries : a thesis presented in partial fulfilment of the requirements for the degree of Master of Applied Statistics at Massey University, Manawatu, New Zealand
    (Massey University, 2020) Harman, Prudence Coverdale
    This study analyses three surveys carried out to measure food security in the poorest regions of Nepal: a baseline face-to-face (F2F) survey and two dual-mode surveys in which respondents received either a F2F or a mobile phone interview. The goal of the analysis was to investigate whether mobile phone surveys could replace traditional F2F surveys without compromising the accuracy of the data. Across all three surveys, households not owning mobile phones were found to be less food-secure than households owning mobile phones: they consumed less food, had poorer diets, and held lower levels of food stocks. These findings reflected the results of analyses of demographic and socio-economic indicators, which indicated that households not owning phones were poorer and less educated than households owning mobile phones. The mode of interview (mobile phone or F2F) was analysed for one survey: responses about food security did not appear to differ between F2F and mobile phone interviews. In the two dual-mode surveys, non-response was analysed for those assigned a mobile phone interview. The results were contradictory: in one survey, mobile phone respondents were found to be more food-secure (also better educated and wealthier) than non-respondents, while in the other survey they were found to be less food-secure (also poorer and less educated) than non-respondents. It is concluded that food security estimates from mobile phone surveys are biased, with systematic differences between the respondents of mobile phone surveys and the population. The overall bias comprises coverage bias and non-response bias. Coverage bias is expected to decrease over time as mobile phone ownership increases, but non-response bias will continue to affect food security estimates. Due to the contradictory results of the non-response analysis, it was not possible to consider bias correction techniques such as post-stratification.
It was therefore concluded that reliable food security estimates cannot yet be obtained from mobile phone surveys in Nepal, and the continuation of dual-mode surveys was recommended.
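The decomposition of the overall bias into coverage bias and non-response bias can be made concrete with a small numeric sketch. The group shares and food-consumption scores below are hypothetical, chosen only to mirror the pattern the study reports (phone owners better off than non-owners).

```python
def mobile_survey_bias(frame):
    """Decompose the error of a phone-only survey estimate of a population
    mean into coverage bias (phone owners vs the whole population) and
    non-response bias (responding owners vs all owners).
    `frame` maps group name -> (population share, group mean)."""
    pop_mean = sum(share * mean for share, mean in frame.values())
    owners = {k: v for k, v in frame.items() if k != "no_phone"}
    owner_share = sum(share for share, _ in owners.values())
    owner_mean = sum(share * mean for share, mean in owners.values()) / owner_share
    respondent_mean = frame["owner_respondent"][1]
    coverage_bias = owner_mean - pop_mean
    nonresponse_bias = respondent_mean - owner_mean
    return coverage_bias, nonresponse_bias

# Hypothetical food-consumption scores: non-owners are worse off, and
# responding owners differ again from owners as a whole.
frame = {
    "no_phone":            (0.30, 40.0),
    "owner_respondent":    (0.45, 55.0),
    "owner_nonrespondent": (0.25, 50.0),
}
cov_b, nr_b = mobile_survey_bias(frame)
```

The two components sum to the total error of the respondent-only estimate, which is why shrinking coverage bias (as phone ownership grows) does not by itself remove the bias the study describes.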
  • Item
    Sparse summaries of complex covariance structures : a thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics, School of Natural & Computational Sciences, Massey University, Auckland, New Zealand
    (Massey University, 2020) Bashir, Amir
    A matrix that has most of its elements equal to zero is called a sparse matrix. The zero elements in a sparse matrix reduce the number of parameters, which aids interpretability. Bayesians desiring a sparse model frequently formulate priors that enhance sparsity. However, in most settings, this leads to sparse posterior samples, not to a sparse posterior mean. Hahn and Carvalho (2015) proposed a decoupled shrinkage and selection (DSS) posterior variable-selection approach to address this problem in a regression setting, setting some elements of the regression coefficient matrix to exact zeros, and suggested extending the approach to the Gaussian graphical model setting in order to set some elements of a precision matrix (graph) to exact zeros. In this thesis, I fill this gap and propose decoupled shrinkage and selection approaches that sparsify the precision matrix and the factor loading matrix, extending Hahn and Carvalho’s (2015) approach. My decoupled shrinkage and selection approach uses samples from the posterior over the parameter, applies a penalization criterion to produce progressively sparser estimates of the desired parameter, and then applies a rule, based on the posterior distribution of fit, to pick the final estimate from those generated. The proposed approach generally produced sparser graphs than a range of existing sparsification strategies — thresholding the partial correlations, credible intervals, the adaptive graphical lasso, and ratio selection — while maintaining a good fit based on the log-likelihood. In simulation studies, my decoupled shrinkage and selection approach had better sensitivity and specificity than the other strategies as the dimension p and sample size n grew. For low-dimensional data, it was comparable with the other strategies. Further, I extended the approach from one population to two populations by modifying the ADMM (alternating direction method of multipliers) algorithm in the JGL (joint graphical lasso) R package (Danaher et al., 2013) to find sparse sets of differences between two inverse covariance matrices. Simulation studies showed that, in the sparse case, my two-population approach had better sensitivity and specificity than JGL. However, sparse sets of differences were challenging to recover in the dense case and for moderate sample sizes. The two-population approach was also applied to find sparse sets of differences between the precision matrices for cases and controls in a metabolomics dataset. Finally, decoupled shrinkage and selection is used to post-process the posterior mean covariance matrix to produce a factor model with a sparse factor loading matrix whose expected fit lies within the upper 95% of the posterior over fits. In the Gaussian setting, simulation studies showed that my proposed DSS sparse factor model approach performed better than fanc (factor analysis using non-convex penalties; Hirose and Yamamoto, 2015) in terms of sensitivity, specificity, and picking the correct number of factors. Decoupled shrinkage and selection is also easily applied to models in which a latent multivariate normal underlies non-Gaussian marginals, e.g. multivariate probit models. I illustrate my findings with moderate-dimensional data examples from the modelling of food frequency questionnaires and fish abundance.
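The core DSS idea — walk a path of progressively sparser estimates and keep the sparsest one whose fit stays acceptable — can be sketched compactly. This is a schematic only: a simple off-diagonal threshold stands in for the penalised refits of the thesis (which it explicitly outperforms), and the fit tolerance replaces the posterior distribution of fit.

```python
import numpy as np

def gauss_loglik(theta, S, n):
    """Gaussian log-likelihood (up to a constant) of precision matrix theta
    given sample covariance S and sample size n."""
    _, logdet = np.linalg.slogdet(theta)
    return 0.5 * n * (logdet - np.trace(S @ theta))

def dss_sparsify(theta_hat, S, n, tol=0.01):
    """Return the sparsest thresholded precision matrix whose fit stays
    within a relative tolerance of the dense estimate's fit."""
    base = gauss_loglik(theta_hat, S, n)
    best = theta_hat.copy()
    p = theta_hat.shape[0]
    offdiag = ~np.eye(p, dtype=bool)
    for t in np.sort(np.unique(np.abs(theta_hat[offdiag]))):
        cand = theta_hat.copy()
        cand[offdiag & (np.abs(cand) <= t)] = 0.0   # progressively sparser
        if (np.all(np.linalg.eigvalsh(cand) > 1e-10)           # still positive definite
                and gauss_loglik(cand, S, n) >= base - tol * abs(base)):
            best = cand
    return best

# Toy example: AR(1)-style sparse truth, recovered from simulated data.
rng = np.random.default_rng(0)
p, n = 5, 2000
theta_true = np.eye(p) + np.diag([0.4] * (p - 1), 1) + np.diag([0.4] * (p - 1), -1)
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(theta_true), size=n)
S = np.cov(X, rowvar=False)
theta_sparse = dss_sparsify(np.linalg.inv(S), S, n)
```

The design point this illustrates is the decoupling itself: sparsity is imposed as a post-processing step on a fitted (here maximum-likelihood, in the thesis posterior) estimate, rather than through the prior.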
  • Item
    Risk analysis of life-time acceptance sampling plans under model uncertainties : a thesis submitted for the degree of Master of Science, School of Fundamental Sciences (SFS), Massey University, New Zealand
    (Massey University, 2020) Yang, Ruizhe
    Lifetime acceptance sampling is an important branch of quality engineering because lifetime is a critical characteristic of many industrial and agricultural products. Due to budget and time constraints, lifetime acceptance sampling plans usually suffer from the curse of small sample sizes. Given a sufficiently large sample, the data can identify their parent distribution easily; for small samples, however, identifying the parent distribution is a challenge, especially when the data come from a lifetime distribution with a shape parameter. In this thesis, I propose a variables sampling plan, called the M-method plan, to resolve this distribution-identification issue in lifetime acceptance sampling with small sample sizes. Extensive Monte Carlo simulation studies were carried out to compare the Operating Characteristic (OC) curves of the M-method plans and two existing alternative plans. Furthermore, I show that the lognormal distribution, a shape-free lifetime distribution, can be used as a surrogate for the Weibull or gamma distributions when the sample size is small. In other words, model uncertainties can be ignored when designing a lifetime acceptance sampling plan under the M-method. The M-method sampling plan under the correctly specified distribution is compared with M-method plans under misspecified distributions. Even though the OC curves differ significantly from each other depending on the operating procedure, under the proposed method the OC curves can be matched when the parent distribution is misspecified as a lognormal distribution for small sample sizes.
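An OC curve plots the probability of accepting a lot against lot quality, and for variables plans it is readily estimated by Monte Carlo. The sketch below simulates a generic lognormal variables plan that accepts when the sample mean log-lifetime minus k times the sample standard deviation clears a lower specification limit; it is not the M-method of the thesis, and the plan constants are arbitrary illustrations.

```python
import numpy as np

def oc_point(mu, sigma, n, k, spec, reps=5000, seed=0):
    """Monte Carlo acceptance probability of an 'x̄ - k·s >= ln(spec)' variables
    plan for lognormal lifetimes with log-mean mu and log-sd sigma."""
    rng = np.random.default_rng(seed)
    accept = 0
    for _ in range(reps):
        logs = rng.normal(mu, sigma, size=n)  # log-lifetimes of the sample
        if logs.mean() - k * logs.std(ddof=1) >= np.log(spec):
            accept += 1
    return accept / reps

# As the true log-mean falls toward the lower specification limit,
# the acceptance probability should drop (the OC curve slopes down).
pa_good = oc_point(mu=2.0, sigma=0.5, n=10, k=1.5, spec=1.0)
pa_poor = oc_point(mu=0.8, sigma=0.5, n=10, k=1.5, spec=1.0)
```

Sweeping `mu` (or, equivalently, the fraction nonconforming) over a grid and plotting `oc_point` against it traces the OC curve that the thesis compares across correctly specified and misspecified parent distributions.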
  • Item
    Statistical modelling for zoonotic diseases : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Palmerston North, New Zealand
    (Massey University, 2020) Liao, Sih-Jing
    Preventing and controlling zoonoses through the design and implementation of public health policies requires a thorough understanding of epidemiology and transmission pathways. A pathogen may have complex transmission pathways, affected by environmental factors, different reservoirs and the food chain. One way to gain more insight into a zoonosis is to trace back the putative sources of infection. Approaches to attributing infection to sources include epidemiological observations and microbial subtyping techniques. To delineate source attribution along the pathways to human infection, this thesis proposes statistical modelling methods that integrate demographic variables with multilocus sequence typing data derived from human cases and sources. These models are framed in a Bayesian context, allowing flexible use of limited knowledge about the illness to make inferences about the potential sources contributing to human infection. The methods are applied to campylobacteriosis data collected from a sentinel surveillance site in the Manawatu region of New Zealand. A link between genotypes found in sources and in human samples is built into the modelling scheme, initially assuming that genotype distributions on sources are equal, or indirectly linked, to those in human cases. Model diagnostics show that the assumption of equal genotype prevalence between humans and sources is not tenable, with a few types potentially more prevalent in humans than in sources, or vice versa. Thus, a model that allows genotypes in humans to differ from those in sources is implemented. In addition, an approximate Bayesian model is proposed, which essentially cuts the link between the human and source genotype distributions when conducting inference. The final inference from these approaches is the probability that a human case is attributable to each source, conditional on the extent to which the case resides in a rural rather than an urban environment. Results from the effective models suggest that poultry and ruminants are important sources of human campylobacteriosis. The more rural the location of human cases, the higher the likelihood that they are ruminant-sourced; conversely, cases in more urban locations are more poultry-associated. Little rurality effect is noticeable for water and other sources, owing to their small sample sizes compared with poultry and ruminants. In addition, animal faeces are believed to be the primary cause of water contamination, via rainfall or runoff from farmland and pasture. When water is treated as a medium in transmission, rather than as an end point, water birds appear to be the most likely contributors to water contamination. These findings have implications for public health practice and food safety risk management. A risk management strategy has already been carried out in the poultry industry in New Zealand, leading to a marked decrease in urban case rates attributable to poultry. However, the findings of this thesis suggest a further step focusing on rural areas, as rural case rates are observed to be relatively higher than urban rates. Further, exploring the role that water plays in transmission deepens our knowledge of the epidemiology of waterborne campylobacteriosis and highlights the importance of water quality. This opens a potential research direction: studying the association between water quality and environmental factors, such as higher global temperatures, for this disease.
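The basic mechanics of genotype-based source attribution can be sketched with a simple proportional model: each human case is split across sources in proportion to how common its genotype is on each source. This is far simpler than the Bayesian models of the thesis (no demographic covariates, no human-source genotype differences), and all counts below are invented.

```python
import numpy as np

# Hypothetical genotype counts (genotypes g1, g2, g3) sampled from each source.
source_counts = {
    "poultry":  np.array([30, 5, 10]),
    "ruminant": np.array([5, 25, 5]),
    "water":    np.array([2, 2, 10]),
}

def attribute(case_genotypes, source_counts, alpha=1.0):
    """Proportional attribution: P(source | genotype) is proportional to the
    Dirichlet-smoothed genotype frequency on each source."""
    freqs = {s: (c + alpha) / (c + alpha).sum() for s, c in source_counts.items()}
    totals = dict.fromkeys(source_counts, 0.0)
    for g in case_genotypes:
        weights = {s: f[g] for s, f in freqs.items()}
        norm = sum(weights.values())
        for s in weights:
            totals[s] += weights[s] / norm  # each case contributes total mass 1
    n = len(case_genotypes)
    return {s: t / n for s, t in totals.items()}

# Human cases observed with genotypes (indices into g1..g3); mostly g1,
# which is common on poultry in this toy data.
shares = attribute([0, 0, 1, 2, 0], source_counts)
```

The thesis's models go further by placing priors over the genotype distributions and letting the human distribution deviate from the source distributions, but the output has the same form: a probability over sources for each case.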
  • Item
    Functional biodiversity of New Zealand's marine fishes versus depth and latitude : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Auckland, New Zealand
    (Massey University, 2020) Myers, Elisabeth
    Understanding patterns and processes governing biodiversity along broad-scale environmental gradients requires an assessment of not only taxonomic richness, but also the morphological and functional traits of organisms. The deep sea is the largest habitat on Earth and provides many important ecosystem services. Decreases in light, temperature, and trophic resources, along with increases in pressure that occur with greater depth, render the deep sea one of the most constraining environments for supporting life. However, little is known about how biodiversity, and especially functional biodiversity, changes along the depth gradient. This thesis aimed to fill this gap by using a combination of traits associated with food acquisition and locomotion to quantify and characterise patterns of functional diversity across large-scale depth and latitude gradients, and to investigate potential mechanisms driving biodiversity. First, to identify the major selective forces acting on morphology, I documented patterns of variation in the traits of fishes at broad spatial scales. I found that with increasing depth, fishes, on average, became larger and more elongate, and had a larger oral gape and eye size. With increasing depth, fish morphology shifted towards body shapes that enable energy-efficient undulatory swimming styles and an increased jaw length relative to mouth width that aids opportunistic feeding. Second, I investigated the role of environmental filters versus biotic interactions in shaping the functional space of communities along depth and latitude gradients by measuring the intra- and inter-specific richness, dispersion and regularity in functional trait space. I found that functional alpha diversity was unexpectedly high in deep-sea communities, but decreased with increasing latitude, and that competition within and among species shaped the multi-dimensional functional space for fishes at the local alpha diversity level.
Third, I described spatial patterns in functional beta diversity for New Zealand marine fishes versus depth and latitude, and delineated functional bioregions. The functional turnover in fish communities was greater across depth than latitude, and latitudinal functional turnover decreased with increasing depth. I surmise that environmental filtering may be the primary driver of broad-scale patterns of beta diversity in the deep sea. Overall, this thesis contributes new knowledge regarding broad-scale functional biodiversity patterns across depth and latitude via the morphological and functional traits of New Zealand’s marine fishes. Through the measurement of individual trait variation, and the quantification of functional alpha and beta diversity, this thesis characterised variation in the traits of fishes over large spatial scales, determined the spatial turnover of functional traits, and described the relative importance of environmental versus biotic drivers in shaping the functional space of deep-sea communities. These contributions provide foundational understanding for future research on the functional diversity of marine fishes, biodiversity patterns across the depth gradient, and the monitoring of biodiversity change across New Zealand’s latitudinal and depth gradients.
  • Item
    Some diagnostic techniques for small area estimation : with applications to poverty mapping : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Palmerston North, New Zealand
    (Massey University, 2019) Livingston, Alison
    Small area estimation (SAE) techniques borrow strength via auxiliary variables to provide reliable estimates at finer geographical levels. An important application is poverty mapping, whereby aid organisations distribute millions of dollars every year based on small area estimates of poverty measures. Diagnostics are therefore an important tool for ensuring that estimates are reliable and that funding is distributed to the most impoverished communities. Small area models can be large and complex; however, even the most complex models are of little use if they lack predictive power at the small area level. This motivated a variable importance measure for SAE that considers each auxiliary variable’s ability to explain the variation in the dependent variable, as well as its ability to distinguish between the relative levels in the small areas. A core question addressed is how candidate survey-based models might be simplified without losing accuracy or introducing bias in the small area estimates. When a small area estimate appears to be biased or unusual, it is important to investigate and, if necessary, remedy the situation. A diagnostic is proposed that quantifies the relative effect of each variable, allowing identification of any variables within an area that have a larger-than-expected influence on the small area estimate for that area. This highlights possible errors which need to be checked and, if necessary, corrected. Additionally, in SAE it is essential that the estimates reach an acceptable level of precision in order to be useful. A measure is proposed that takes the ratio of the variability in the small areas to the uncertainty of the small area estimates. This measure is then used to help determine the minimum level of precision needed to maintain meaningful estimates. The diagnostics developed cover a wide range of small area estimation methods, including those based on survey data only and those that combine survey and census data.
By way of illustration, the proposed methods are applied to SAE for poverty measures in Cambodia and Nepal.
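The precision measure described above — a ratio of between-area variability to estimation uncertainty — can be illustrated numerically. The exact definition used in the thesis is not given in this abstract, so the formula below (sample variance of the area estimates over the mean squared standard error) is an assumed, simplified interpretation, and the poverty-rate figures are hypothetical.

```python
import numpy as np

def precision_ratio(estimates, std_errors):
    """Ratio of between-area variability to average estimation uncertainty.
    Values well above 1 suggest the estimates can genuinely distinguish
    areas; near or below 1, differences are mostly noise."""
    between = np.var(estimates, ddof=1)          # variability across areas
    noise = np.mean(np.square(std_errors))       # average squared uncertainty
    return between / noise

# Hypothetical poverty-rate estimates and standard errors for six small areas.
est = np.array([0.12, 0.35, 0.22, 0.41, 0.18, 0.30])
se = np.array([0.03, 0.04, 0.03, 0.05, 0.03, 0.04])
ratio = precision_ratio(est, se)
```

Used as a diagnostic, a threshold on this ratio would translate directly into the minimum precision (maximum standard error) the estimates must achieve before a poverty map based on them is meaningful.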