Massey Documents by Type
Permanent URI for this community: https://mro.massey.ac.nz/handle/10179/294
Search Results
13 results
Item PROMETHEUS: Probability in the Mediterranean of Tephra dispersal for various grain sizes. A tool for the evaluation of the completeness of the volcanic record in medial-distal archives (Elsevier BV, 2024-03)
Billotta E; Sulpizio R; Selva J; Costa A; Bebbington M
PROMETHEUS is a statistical tool for creating maps of the probability of finding tephra deposits of different grain sizes, originating from eruptions of a specific volcanic source, at any location around the vent. It couples wind profiles at different heights in the Mediterranean area with the terminal velocities of volcanic particles. The input parameters include the height of the eruption column (which characterizes the intensity of the eruption), wind statistics (directions and intensities), and tephra deposits of a selected grain size. In particular, we used the parameterizations provided by Costa et al. (2016) and performed simulations with the HAZMAP tephra dispersal model to determine the maximum distances tephra can reach under weak, medium, and strong wind conditions (e.g. velocities of 7, 30, and 70 m/s at the tropopause) and with column heights of 10, 20, and 30 km, depositing at least the loading corresponding to 0.1 mm (cryptotephra). Three alternative configurations of the model are validated, first by analyzing the eruptive source of Somma Vesuvius and its explosive eruptions from the 22 ka Pomici di Base to the 1944 eruption. A further validation compares the probabilistic maps with the tephrostratigraphy of known marine and terrestrial cores using standard tests of proportions (binomial distributions) and binary logistic regression, statistically quantifying the effectiveness of the model against the tephrostratigraphy recorded within this time frame. Based on this validation, a preferred configuration of PROMETHEUS is selected. PROMETHEUS probability maps will guide the selection of sampling sites for specific tephra deposits and could also support the study of the completeness of overall eruption catalogs over time.
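
The validation step the abstract names lends itself to a compact illustration. The sketch below is not the paper's code: the `p_model` and `observed` arrays are invented placeholders standing in for mapped probabilities and core observations, used to show the two stated checks, a binomial test of proportions and a binary logistic regression.

```python
# Hedged sketch: checking probabilistic tephra maps against core records,
# in the spirit of the paper's tests of proportions and logistic regression.
import numpy as np
from scipy.stats import binomtest
from sklearn.linear_model import LogisticRegression

# Hypothetical inputs: mapped probability of finding the tephra at each core
# site, and whether the layer was actually observed there (1) or not (0).
p_model = np.array([0.9, 0.8, 0.75, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])
observed = np.array([1,   1,   1,    1,   0,   1,   0,   0,   0,   0])

# Test of proportions: does the hit rate at "high probability" sites
# differ from the rate the map forecasts on average?
high = p_model >= 0.5
k, n = int(observed[high].sum()), int(high.sum())
expected = p_model[high].mean()
print(binomtest(k, n, expected))  # binomial test against the mean forecast

# Binary logistic regression: a positive slope means higher mapped
# probabilities track higher odds of actually finding the tephra.
fit = LogisticRegression().fit(p_model.reshape(-1, 1), observed)
print("slope:", fit.coef_[0][0])
```
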
Item Manipulating the alpha level cannot cure significance testing – comments on "Redefine statistical significance" (PeerJ Preprints, 2017-11-14)
Trafimow D; Amrhein V; Areshenkoff CN; Barrera-Causil C; Beh EJ; Bilgiç Y; Bono R; Bradley MT; Briggs WM; Cepeda-Freyre HA; Chaigneau SE; Ciocca DR; Correa JC; Cousineau D; de Boer MR; Dhar SS; Dolgov I; Gómez-Benito J; Grendar M; Grice J; Guerrero-Gimenez ME; Gutiérrez A; Huedo-Medina TB; Jaffe K; Janyan A; Karimnezhad A; Korner-Nievergelt F; Kosugi K; Lachmair M; Ledesma R; Limongi R; Liuzza MT; Lombardo R; Marks M; Meinlschmidt G; Nalborczyk L; Nguyen HT; Ospina R; Perezgonzalez JD; Pfister R; Rahona JJ; Rodríguez-Medina DA; Romão X; Ruiz-Fernández S; Suarez I; Tegethoff M; Tejo M; van de Schoot R; Vankov I; Velasco-Forero S; Wang T; Yamada Y; Zoppino FCM; Marmolejo-Ramos F
We argue that depending on p-values to reject null hypotheses, including a recent call for changing the canonical alpha level for statistical significance from .05 to .005, is deleterious for new discoveries and the progress of science. Given that blanket and variable criterion levels are both problematic, it is sensible to dispense with significance testing altogether. There are alternatives that address study design and sample-size determination much more directly than significance testing does, but none of these statistical tools should replace significance testing as a new magic method giving clear-cut mechanical answers. Inference should not be based on single studies at all, but on cumulative evidence from multiple independent studies. When evaluating the strength of the evidence, we should consider, for example, auxiliary assumptions, the strength of the experimental design, or implications for applications. To boil all this down to a binary decision based on a p-value threshold of .05, .01, .005, or anything else, is not acceptable.
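
As a hedged illustration of the paper's central point (this simulation is mine, not the authors'), the sketch below runs many small studies of the same true effect: the per-study p < .05 verdicts flip unpredictably, while the cumulative estimate across studies settles near the truth.

```python
# Why a single-study p-value threshold gives unstable binary verdicts while
# cumulative evidence from many studies converges. All numbers illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect, n_per_study, n_studies = 0.3, 30, 20

means, decisions = [], []
for _ in range(n_studies):
    sample = rng.normal(true_effect, 1.0, n_per_study)
    t, p = stats.ttest_1samp(sample, 0.0)
    means.append(sample.mean())
    decisions.append(bool(p < 0.05))   # the mechanical binary verdict

print("per-study 'significant' verdicts:", decisions)       # a mixed bag
print("fraction significant:", np.mean(decisions))          # ~power, not truth
print("pooled estimate:", round(np.mean(means), 3), "(true:", true_effect, ")")
```
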
Item Measuring Māori identity and health : the cultural cohort approach : a thesis presented in fulfilment of the requirements for the degree of Doctor of Philosophy in Public Health at Massey University, Palmerston North, Aotearoa New Zealand (Massey University, 2023)
Stevenson, Brendan
Current statistical methods of disaggregating populations by ethnic or cultural identity wrongly assume cultural invariance within an ethnic population over time and place. Calculating risk factors between ethno-cultural populations also wrongly assumes homogeneity of risk, obscuring what may be distinct sub-populations with very different demographics, risk profiles, and health outcomes. The Cultural Cohort Approach (CCA) proposes a novel method for understanding within-ethnic population difference, whereby cultural identity is framed as the enduring membership of multiple related cultural cohorts, rather than the contextual and unstable measure of ethnic group affiliation currently used. It predicts that multiple cultural cohorts exist inside an ethno-cultural population, that these cohorts are resilient and culturally distinct, persist over generations, and can divide at pre-existing social or economic stratifications in response to powerful external forces. The CCA unites history, extant identity theories, and research to identify and describe these within-ethnic cultural cohorts. The measurement of a Māori cultural cohort joins existing Māori identity research, historical documents, and personal accounts to enumerate distinct Māori cultural cohorts, describe relationships between cultural cohorts, and exclude unrelated cultural cohorts. Across the three distinct components of this thesis, the CCA is first described and a worked example of its use in identifying Māori cultural cohorts is given. Second, these hypothesised cultural cohorts were mapped to a cross-sectional data collection wave of Māori participants (n=3287, born between 1941 and 1955) from Massey University's longitudinal Health, Work and Retirement (HWR) study in a test of the CCA's predictive accuracy using latent class analysis. Third, longitudinal HWR study data for Māori participants (n=1252, born between 1941 and 1955) were used in a second worked example to test the stability of the predicted cultural cohorts using latent transition analyses and to further refine the CCA. The Māori cultural cohorts identified using the CCA had clear narratives, shared cultural characteristics, and identifiable cultural differences that persisted across time as predicted. The CCA will allow researchers to better represent the diverse lived realities of ethno-cultural populations and support more nuanced analytical insights into how health and well-being are patterned between distinct cultural cohorts.
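
Latent class analysis is the workhorse of the thesis's second component. As a minimal sketch, assuming synthetic binary indicator items in place of the HWR survey measures (which are not reproduced here), the EM routine below recovers two latent cohorts with distinct item-endorsement profiles.

```python
# Toy latent class analysis via EM: binary items, two latent classes.
# Data are simulated stand-ins, not HWR data.
import numpy as np

rng = np.random.default_rng(0)

# Two latent cohorts with different probabilities of endorsing four items.
true_profiles = np.array([[0.9, 0.8, 0.7, 0.2],   # cohort A
                          [0.2, 0.3, 0.2, 0.8]])  # cohort B
z = rng.integers(0, 2, 500)
X = (rng.random((500, 4)) < true_profiles[z]).astype(float)

K, d = 2, X.shape[1]
pi = np.full(K, 1.0 / K)                 # class weights
theta = rng.uniform(0.3, 0.7, (K, d))    # item probabilities per class

for _ in range(200):                     # EM iterations
    # E-step: responsibility of each class for each respondent.
    log_lik = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(pi)
    log_lik -= log_lik.max(axis=1, keepdims=True)
    resp = np.exp(log_lik)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update class weights and item probabilities.
    pi = resp.mean(axis=0)
    theta = ((resp.T @ X) / resp.sum(axis=0)[:, None]).clip(1e-6, 1 - 1e-6)

print("class weights:", pi.round(2))
print("recovered item profiles:\n", theta.round(2))
```
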
Item Discrimination or diversity? A balanced score card review of perceptions of gender quotas : prepared in partial fulfilment of a Master of Business Studies, Massey University (Albany) (Massey University, 2019)
Burrell, Erin Kathleen
Creating an economy where gender equality is at the forefront could be claimed to benefit most, if not all, citizens and countries. Recent mandates of gender reporting at the Director and Officer levels have created a dichotomous environment in New Zealand. Drawing lessons from other countries' experiences with quotas, with a particular focus on Norway, adds insight into what could happen if implementation were to occur. Using qualitative interviews across a diverse group of participants, this study investigates current perspectives on, and implications of, gender quotas. Given the board's role in governing and designing organisational strategy, the Balanced Score Card (BSC) was selected as a clear instrument for analysis and recommendations. This exploration showcases the complexity of equity strategy as a component of board construction and the realisation that gender alone will not deliver a diverse board of directors. Empowered by the BSC structure, this effort delivers a recommendation for driving organisational change through diversity programming and contributes to academic discourse through a business-outcome-focused approach to qualitative research. Findings show that social policy does have a place in the boardroom, but that efforts must be measured and documented consistently over time, a process that is lacking in many NZ firms. Further, outcomes from the study show that quotas are not preferred as a tool for gender equity, with just 27.78% of participants supporting the concept. This study makes a three-fold contribution: first, it investigates a broader range of participants than existing NZ work does; second, it leverages the Balanced Score Card for analysis to support real-time application of findings by practitioners outside the academic sphere; and third, it introduces gender diversity as an element of gender quotas.

Item Tree-based models for poverty estimation : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Manawatu (Massey University, 2016)
Bilton, Penelope A
The World Food Programme utilises the technique of poverty mapping for efficient allocation of aid resources, with the objective of achieving the first two United Nations Sustainable Development Goals, the elimination of poverty and hunger. A statistical model is used to estimate levels of deprivation across small geographical domains, which are then displayed on a poverty map. Current methodology employs linear mixed modelling of household income, the predictions from which are then converted to various area-level measures of well-being. An alternative technique using tree-based methods is developed in this study. Since poverty mapping is a small area estimation technique, the proposed methodology needs to include auxiliary information to improve estimate precision at low levels, and to take account of the complex survey design of the data. Classification and regression tree models have, to date, mostly been applied to data assumed to be collected through simple random sampling, with a focus on providing predictions rather than estimating uncertainty. The standard type of prediction obtained from tree-based models, a "hard" tree estimate, is the class of interest for classification models, or the average response for regression models. A "soft" estimate equates to the posterior probability of being poor in a classification tree model, and in a regression tree model it is represented by the expectation of a function related to the poverty measure of interest. Poverty mapping requires standard errors of prediction as well as point estimates of poverty, but the complex structure of survey data means that estimation of variability must be carried out by resampling. Inherent instability in tree-based models proved a challenge to developing a suitable variance estimation technique, but bootstrap resampling in conjunction with soft tree estimation proved a viable methodology. Simulations showed that the bootstrap-based soft tree technique was a valid method for data with a simple random sampling structure. This was also the case for clustered data, where the method was extended to utilise the cluster bootstrap and to incorporate cluster effects into predictions. The methodology was further adapted to account for stratification in the data, and applied to generate predictions for a district in Nepal. Tree-based estimates of the standard error of prediction for the small areas investigated were compared with published results using the current methodology for poverty estimation. The technique of bootstrap sampling with soft tree estimation has application beyond poverty mapping, and to other types of complex survey data.
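
The core recipe of the poverty-mapping thesis, soft tree estimates with a cluster bootstrap for standard errors, can be sketched directly. The data, cluster labels, and tree settings below are synthetic assumptions, not the thesis's Nepal application.

```python
# Hedged sketch: "soft" tree estimates (posterior probability of being poor)
# plus a cluster bootstrap to attach a standard error to the poverty rate.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 600
clusters = rng.integers(0, 30, n)                 # e.g. survey PSUs
X = rng.normal(size=(n, 3))                       # auxiliary covariates
poor = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n) > 0.8).astype(int)

def soft_poverty_rate(idx):
    """Fit a tree on resampled rows; average P(poor) over the area = soft estimate."""
    tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X[idx], poor[idx])
    return tree.predict_proba(X)[:, 1].mean()

# Cluster bootstrap: resample whole clusters to respect the survey design.
ids = np.unique(clusters)
boot = []
for _ in range(200):
    chosen = rng.choice(ids, size=len(ids), replace=True)
    idx = np.concatenate([np.flatnonzero(clusters == c) for c in chosen])
    boot.append(soft_poverty_rate(idx))

print("poverty rate estimate:", round(np.mean(boot), 3),
      "bootstrap SE:", round(np.std(boot), 3))
```

Averaging probabilities rather than hard 0/1 class labels is what stabilises the bootstrap here; hard estimates jump discontinuously as tree splits move between resamples.
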
Item The Independent Newspapers Limited study: an investigation into occupational overuse syndrome within the newspaper industry : a thesis presented in partial fulfilment of the requirements for the degree of Master of Arts in Psychology at Massey University (Massey University, 1993)
Pirie, Ross
An investigation was undertaken into occupational overuse injuries. Overuse injuries are commonly associated with repetitive movements, sustained or constrained postures, and forceful movements. Other factors, such as the work environment, amount of keyboard use, and the ergonomic status of the work area, have been identified as elements in the development of overuse injuries. These perspectives were used to frame the research objectives in studying a sample of subjects working in the newspaper industry. Five hundred and seventy-five respondents completed a questionnaire, which included a measure of the incidence and severity of overuse injuries, and questions aimed at discovering the effectiveness of different types of treatment and intervention strategies. The data were analysed using a combination of descriptive and bivariate statistics. The analysis revealed low levels of reported muscular aches and pains. Of those subjects who did report some form of ache or pain, the majority answered that the level of their aches and pains had remained the same. In addition, the picture of the aetiology of overuse injuries which emerged contradicted much of the preceding research in this area. The analysis also demonstrated that the treatment and intervention strategies being employed were ineffective, despite the fact that subjects often reported that they considered a particular strategy productive in managing their overuse injury. The discussion considers the limitations of the questionnaire as a survey technique in this area of research, and the possible effects of these limitations on the present study. This point has special relevance to the application of clinical models of overuse injury. It was concluded that the results demonstrated a need for research into effectively manipulating working conditions to counteract the development, incidence, and severity of overuse injuries. Strategies such as job enlargement and job rotation were suggested.

Item Modeling the role of social structures in population genetics : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Manawatu, New Zealand (Massey University, 2015)
Guillot, Elsa Gratianne
Building on a theoretical framework, population genetics has been widely applied to diverse organisms, from bacteria to animals. In humans, this has led to the reconstruction of history, the timing of settlements, and migration between populations. Mostly based on coalescent theory, modern population genetic studies are challenged by human social structures, which are difficult to incorporate into analytical models. The implications of social structure for population genetics are mostly unknown. This work presents new modeling and inference methods for capturing the role of social structure in population genetics. The application of these new techniques yields a better understanding of the history and practices of a number of Indonesian island communities. This thesis comprises three published papers, organized as sequential chapters. The Introduction describes population genetic models and the statistical tools used to make inferences from them. The second chapter presents the first paper, which measures the change in population size through time on four Indonesian islands structured by history and geography. The third chapter presents SMARTPOP, a new simulation tool for studying social structure, including mating systems and genetic diversity. The fourth chapter focuses on Asymmetric Prescriptive Alliance, a well-known kinship system linking the migration of women between communities with cousin alliance. The fifth chapter presents conclusions and future directions. In combination, this body of work shows the importance of including social structure in population genetics and proposes new ways to reconstruct aspects of social history.
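
SMARTPOP itself is the thesis's simulator, and its mating rules are not reproduced here. As a generic stand-in, the sketch below runs the simplest structured model, a two-deme Wright-Fisher population with migration, to show how structure shapes genetic divergence; all parameter values are illustrative assumptions.

```python
# Minimal Wright-Fisher simulation with two demes and migration -- a crude
# analogue of how population structure affects genetic diversity.
import numpy as np

rng = np.random.default_rng(2)
N, gens, m = 200, 500, 0.01          # deme size, generations, migration rate

# Allele frequencies of one biallelic locus, one entry per deme.
p = np.array([0.5, 0.5])
for _ in range(gens):
    # Migration: each deme receives a fraction m of the other's gene pool.
    p = (1 - m) * p + m * p[::-1]
    # Drift: binomial resampling of 2N gene copies per deme.
    p = rng.binomial(2 * N, p) / (2 * N)

# Crude FST-like divergence: between-deme variance over total heterozygosity.
fst = p.var() / (p.mean() * (1 - p.mean()) + 1e-12)
print("final frequencies:", p, "divergence (FST-like):", round(fst, 3))
```

Lowering m or N lets drift pull the demes apart faster, which is the qualitative effect kinship rules such as Asymmetric Prescriptive Alliance modulate by constraining who migrates and mates.
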
Item Complexity measurement for dealing with class imbalance problems in classification modelling : a thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy, Massey University, 2012 (Massey University, 2012)
Anwar, Muhammad Nafees
The class imbalance problem is a challenge in the statistical, machine learning and data mining domains. Examples include fraud/intrusion detection, medical diagnosis/monitoring, bioinformatics, text categorization, insurance claims, and target marketing. The problem with imbalanced data sets is that conventional classifiers (both statistical and machine learning algorithms) aim at maximizing overall accuracy, which is often achieved by allocating all, or almost all, cases to the majority class. Thus there tends to be a bias against the minority class in class imbalance situations. Despite the numerous algorithms and re-sampling techniques proposed over the last few decades to tackle imbalanced classification problems, there is no consistently winning strategy for all data sets, in terms of either sampling or learning algorithm. Special attention needs to be paid to the data in hand. In doing so, one should take into account several factors simultaneously: the imbalance rate, the data complexity, the algorithms, and their associated parameters. As suggested in the literature, mining such datasets can only be improved by algorithms tailored to data characteristics; it is therefore important to carry out exploratory data analysis before deciding on a learning algorithm or re-sampling technique. In this study, we have developed a framework, "Complexity Measurement" (CM), to explore the connection between the imbalanced data problem and data complexity. Our study shows that CM is an ideal candidate to be recognized as a "goodness criterion" for various classifiers, re-sampling and feature selection techniques in the class imbalance framework. We have used CM as a meta-learner to choose the classifier and under-sampling strategy that best fit the situation. We design a systematic over-sampling technique, Over-sampling using Complexity Measurement (OSCM), for dealing with class overlap. Using OSCM, we do not need to search for an optimal class distribution in order to obtain favorable accuracy for the minority class, since the amount of over-sampling is determined by the complexity; ideally, using CM would detect fine structural differences (class overlap and small disjuncts) between classes. Existing feature selection techniques were never meant for class-imbalanced data. We propose Feature Selection using Complexity Measurement (FSCM), which can focus specifically on the minority class, so that those features (and multivariate interactions between predictors) can be selected which form a better model for the minority class. The methods developed have been applied to real datasets. The results from imbalanced datasets show that CM, OSCM and FSCM are effective as a systematic way of correcting class imbalance/overlap and improving classifier performance. Highly predictive models were built, discriminating patterns were discovered, and automated optimization was proposed. The methodology proposed and the knowledge discovered will benefit exploratory data analysis for imbalanced datasets. It may also serve as a judging criterion for new algorithms and re-sampling techniques. Moreover, new data sets may be evaluated using our CM criterion in order to build a sensible model.
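
The majority-class bias the abstract describes, and over-sampling as a remedy, can be demonstrated in a few lines. The sketch below uses plain random over-sampling as a baseline; the thesis's OSCM instead sizes the over-sampling by its complexity measurement, which is not reproduced here, and the simulated data are assumptions.

```python
# Accuracy-maximizing classifiers neglect a rare class; duplicating minority
# rows until classes balance is the naive baseline fix.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(3)
n_maj, n_min = 950, 50                       # ~5% minority class
X = np.vstack([rng.normal(0.0, 1, (n_maj, 2)),
               rng.normal(1.2, 1, (n_min, 2))])
y = np.array([0] * n_maj + [1] * n_min)

clf = LogisticRegression().fit(X, y)
print("minority recall, raw:", recall_score(y, clf.predict(X)))

# Naive random over-sampling of the minority class.
idx = rng.choice(np.flatnonzero(y == 1), n_maj - n_min, replace=True)
X_bal, y_bal = np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])
clf_bal = LogisticRegression().fit(X_bal, y_bal)
print("minority recall, over-sampled:", recall_score(y, clf_bal.predict(X)))
```
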
Item A comparison of tree-based and traditional classification methods : a thesis presented in partial fulfilment of the requirements for the degree of PhD in Statistics at Massey University (Massey University, 1994)
Lynn, Robert D
Tree-based discrimination methods provide a way of handling classification and discrimination problems by using decision trees to represent the classification rules. The principal aim of tree-based methods is the segmentation of a data set, in a recursive manner, such that the resulting subgroups are as homogeneous as possible with respect to the categorical response variable. Problems often arise in the real world involving cases with a number of measurements (variables) taken on them. Traditionally, in such circumstances involving two or more groups or populations, researchers have used parametric discrimination methods, namely linear and quadratic discriminant analysis, as well as the well-known non-parametric kernel density estimation and kth nearest neighbour rules. In this thesis, all the above types of methods are considered and presented from a methodological point of view. Tree-based methods are summarised in chronological order of introduction, beginning with the Automatic Interaction Detector (AID) method of Morgan and Sonquist (1963) and ending with the IND method of Buntine (1992). Given a set of data, the proportion of observations incorrectly classified by a prediction rule is known as the apparent error rate. This error rate is known to underestimate the actual or true error rate associated with the discriminant rule applied to a set of data. Various methods for estimating this actual error rate are considered. Cross-validation is one such method: each observation in turn is omitted from the data set, a classification rule is calculated from the remaining (n-1) observations, and the omitted observation is then classified. This is carried out n times, once for each observation in the data set, and the total number of misclassified observations is used as the estimate of the error rate. Simulated continuous explanatory data were used to compare the performance of two traditional discrimination methods, linear and quadratic discriminant analysis, with two tree-based methods, Classification and Regression Trees (CART) and the Fast Algorithm for Classification Trees (FACT), using cross-validation error rates. The results showed that linear and/or quadratic discriminant analysis is preferred for normal, less complex data and parallel classification problems, while CART is best suited to lognormal, highly complex data and sequential classification problems. Simulation studies using categorical explanatory data also showed linear discriminant analysis to work best for parallel problems and CART for sequential problems, while CART was also preferred for smaller sample sizes. FACT was found to perform poorly for both continuous and categorical data. Simulation studies involving the CART method alone identified certain situations where the 0.632 error rate estimate is preferred to cross-validation, and the one standard error rule over the zero standard error rule. Studies undertaken using real data sets showed that most of the conclusions drawn from the continuous and categorical simulation studies were valid. Some recommendations are made, both from the literature and from personal findings, as to which characteristics of tree-based methods are best in particular situations. Final conclusions are given and some proposals for future research on the development of tree-based methods are discussed. A question worth considering in any future research in this area is the use of non-parametric tests for determining the best splitting variable.
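
The leave-one-out cross-validation procedure described above maps directly onto current libraries. The sketch below compares linear discriminant analysis with a CART-style tree on simulated two-group data; the data-generating settings are placeholders, not the thesis's simulation designs.

```python
# Leave-one-out cross-validated error rates: LDA vs a CART-style tree.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
# Two normal groups with shifted means (a "parallel" problem favouring LDA).
X = np.vstack([rng.normal(0.0, 1, (60, 2)), rng.normal(1.5, 1, (60, 2))])
y = np.repeat([0, 1], 60)

for name, model in [("LDA", LinearDiscriminantAnalysis()),
                    ("CART", DecisionTreeClassifier(max_depth=3))]:
    # Each LOO fold omits one observation, fits on the other n-1, classifies it.
    acc = cross_val_score(model, X, y, cv=LeaveOneOut())
    print(name, "LOO error rate:", round(1 - acc.mean(), 3))
```
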
Item Quantification of individual rugby player performance through multivariate analysis and data mining : a thesis presented for the fulfilment of the requirements for the degree of Doctor of Philosophy at Massey University, Albany, New Zealand (Massey University, 2003)
Bracewell, Paul J
This doctoral thesis examines the multivariate nature of performance to develop a contextual rating system for individual rugby players on a match-by-match basis. The data, provided by Eagle Sports, are a summary of the physical tasks completed by an individual in a match, such as the number of tackles, metres run and number of kicks made. More than 130 variables were available for analysis. Assuming that the successful completion of observed tasks is an expression of ability enables the extraction of the latent dimensionality of the data, or key performance indicators (KPIs), which are the core components of an individual's skill-set. Multivariate techniques (factor analysis) and data mining techniques (self-organising maps and self-supervising feed-forward neural networks) are employed to reduce the dimensionality of the match performance data and create KPIs. For this rating system to be meaningful, the underlying model must use suitable data, and the model itself must be transparent, contextual and robust. The half-moon statistic was developed to promote transparency, understanding and interpretation of dimension-reduction neural networks. This novel non-parametric multivariate method is a tool for determining the strength of the relationship between input variables and a single output variable without requiring prior knowledge of that relationship. This resolves the issue of transparency, which is necessary to ensure the rating system is contextual. A hybrid methodology is developed to combine the most appropriate KPIs into a contextual, robust and transparent univariate measure of individual performance. The KPIs are collapsed to a single performance measure using an adaptation of quality control ideology in which observations are compared with perfection rather than the average, to suit the circumstances presented in sport. The use of this performance rating and the underlying key performance indicators is demonstrated in a coaching setting. Individual performance is monitored with control charts, enabling changes in form to be identified. This enables the detection of strengths and weaknesses in the individual's underlying skill-set (KPIs) and skills. The process is not restricted to rugby or sports data and is applicable in any field where a summary of multivariate data is required to understand performance.
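
The shape of the pipeline, match statistics reduced to a few latent KPIs, with one rating then tracked on a control chart, can be sketched as follows. The match data here are synthetic, and neither the half-moon statistic nor the Eagle Sports variables are reproduced; this is a generic stand-in under those assumptions.

```python
# Dimension reduction of match statistics to latent KPIs via factor analysis,
# then a simple Shewhart-style control chart (mean +/- 3 sigma) on one rating.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(5)
matches = rng.normal(size=(40, 8))        # 40 matches x 8 match statistics
kpis = FactorAnalysis(n_components=2, random_state=0).fit_transform(matches)

rating = kpis[:, 0]                       # first latent KPI as a crude rating
mu, sigma = rating.mean(), rating.std()
flagged = np.flatnonzero(np.abs(rating - mu) > 3 * sigma)
print("control limits:", round(mu - 3 * sigma, 2), "to", round(mu + 3 * sigma, 2))
print("matches flagging a form change:", flagged)
```

Note the thesis benchmarks observations against "perfection rather than the average"; centring the chart on an ideal target rather than mu would be the corresponding one-line change.
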

