Massey Documents by Type

Permanent URI for this community: https://mro.massey.ac.nz/handle/10179/294

Search Results

Now showing 1 - 5 of 5
  • Item
    Some diagnostic techniques for small area estimation : with applications to poverty mapping : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Palmerston North, New Zealand
    (Massey University, 2019) Livingston, Alison
    Small area estimation (SAE) techniques borrow strength via auxiliary variables to provide reliable estimates at finer geographical levels. An important application is poverty mapping, whereby aid organisations distribute millions of dollars every year based on small area estimates of poverty measures. Diagnostics therefore become an important tool to ensure estimates are reliable and funding is distributed to the most impoverished communities. Small area models can be large and complex; however, even the most complex models can be of little use if they do not have predictive power at the small area level. This motivated a variable importance measure for SAE that considers each auxiliary variable's ability to explain the variation in the dependent variable, as well as its ability to distinguish between the relative levels in the small areas. A core question addressed is how candidate survey-based models might be simplified without losing accuracy or introducing bias in the small area estimates. When a small area estimate appears to be biased or unusual, it is important to investigate and, if necessary, remedy the situation. A diagnostic is proposed that quantifies the relative effect of each variable, allowing identification of any variables within an area that have a larger than expected influence on the small area estimate for that area. This highlights possible errors which need to be checked and, if necessary, corrected. Additionally, in SAE it is essential that the estimates are at an acceptable level of precision in order to be useful. A measure is proposed that takes the ratio of the variability in the small areas to the uncertainty of the small area estimates. This measure is then used to assist in determining the minimum level of precision needed in order to maintain meaningful estimates. The diagnostics developed cover a wide range of small area estimation methods, consisting of those based on survey data only and those which combine survey and census data.
By way of illustration, the proposed methods are applied to SAE for poverty measures in Cambodia and Nepal.
  • Item
    Ratio estimators in agricultural research : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Statistics at Massey University, Palmerston North, New Zealand
    (Massey University, 2002) Qiao, Chun Gui
    This thesis addresses the problem of estimating the ratio of quantitative variables from several independent samples in agricultural research. The first part is concerned with estimating a binomial proportion, the ratio of discrete counts, from several independent samples under the assumption that there is a single underlying binomial proportion p in the population of interest. The distributions and properties of two linear estimators, a weighted average and an arithmetic average, are derived and merits of the approaches discussed. They are both unbiased estimators of the population proportion, with the weighted average having lower variability than the arithmetic average. These findings are obtained through a first principles analysis, with a geometrical interpretation presented. This variability result is also a consequence of the Rao-Blackwell theorem, a well-known result in the theory of statistical inference. Both estimators are used in the literature but we conclude that the weighted average estimate should always be used when the sample sizes are unequal. These results are illustrated by a simulation experiment and are validated using survey data in the study of lodging percentage of sunflower cultivar, Improved Peredovic, in Jilin Province, China in 1994. The second part of the research addresses the problem of estimating the ratio μ_X / μ_Y of the means of continuous variables in agricultural research. The distributional properties of the ratio X/Y of independent normal variables are examined, both theoretically and using simulation. The results show that the moments of the ratio do not exist in general. The moments exist, however, for a punctured normal distribution of the denominator variable if we only sample points for which |Y| > ε, ε being a small positive quantity.
We draw out the practical rule-of-thumb that the ratio of two independent normal variables can be used to estimate μ_X / μ_Y when the coefficient of variation of the denominator variable is sufficiently small (less than or equal to 0.2). Lastly, the thesis evaluates the relative merits of two common estimators of the ratio of the means of continuous variables in agricultural research, an arithmetic average and a weighted average, via simulation experiments using normal distributions. In the first simulation, the ratio and common coefficient of variation are changed while the sample size is kept moderately large. In the second simulation, the ratio and sample size are changed while the coefficient of variation is held constant. Results show that the weighted average always provides a better estimate of the true ratio and has lower variability than the arithmetic average. It is recommended that the weighted average be used for estimating the ratio from several pairs of observations. These results are tested using research data from rice breeding multi-environment trials in Jilin Province, China in 1995 and 1996. These data are used to demonstrate the diagnostic approach developed for assessing the 'safe' use of the arithmetic and the weighted average methods for estimating the ratio of the means of independent normal variables.
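The comparison of the two binomial-proportion estimators described in this abstract can be illustrated with a small simulation. This is only a minimal sketch, not the thesis's own experiment: the function names, the sample sizes (5, 20, 100), the proportion 0.3 and the number of replications are all hypothetical choices made for illustration.

```python
import random
import statistics


def weighted_average(successes, sizes):
    """Pooled estimate: total successes divided by total sample size."""
    return sum(successes) / sum(sizes)


def arithmetic_average(successes, sizes):
    """Unweighted mean of the per-sample proportions."""
    return sum(x / n for x, n in zip(successes, sizes)) / len(sizes)


def simulate(p, sizes, reps=2000, seed=0):
    """Draw `reps` sets of independent binomial samples with common proportion p
    and return (mean, variance) of each estimator across the replications."""
    rng = random.Random(seed)
    weighted, arithmetic = [], []
    for _ in range(reps):
        # One binomial count per sample, built from Bernoulli draws.
        successes = [sum(rng.random() < p for _ in range(n)) for n in sizes]
        weighted.append(weighted_average(successes, sizes))
        arithmetic.append(arithmetic_average(successes, sizes))
    return (
        (statistics.mean(weighted), statistics.variance(weighted)),
        (statistics.mean(arithmetic), statistics.variance(arithmetic)),
    )


if __name__ == "__main__":
    # Hypothetical setting: three samples of unequal size, true p = 0.3.
    (w_mean, w_var), (a_mean, a_var) = simulate(0.3, [5, 20, 100])
    print("weighted average:   mean %.4f, variance %.5f" % (w_mean, w_var))
    print("arithmetic average: mean %.4f, variance %.5f" % (a_mean, a_var))
```

With unequal sample sizes, both averages should centre on the true proportion, while the weighted average should show the smaller variance, in line with the conclusion stated in the abstract.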
  • Item
    Tree-based models for poverty estimation : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Manawatu
    (Massey University, 2016) Bilton, Penelope A
    The World Food Programme utilises the technique of poverty mapping for efficient allocation of aid resources, with the objective of achieving the first two United Nations Sustainable Development Goals, elimination of poverty and hunger. A statistical model is used to estimate levels of deprivation across small geographical domains, which are then displayed on a poverty map. Current methodology employs linear mixed modelling of household income, the predictions from which are then converted to various area-level measures of well-being. An alternative technique using tree-based methods is developed in this study. Since poverty mapping is a small area estimation technique, the proposed methodology needs to include auxiliary information to improve estimate precision at low levels, and to take account of complex survey design of the data. Classification and regression tree models have, to date, mostly been applied to data assumed to be collected through simple random sampling, with a focus on providing predictions, rather than estimating uncertainty. The standard type of prediction obtained from tree-based models, a "hard" tree estimate, is the class of interest for classification models, or the average response for regression models. A "soft" estimate equates to the posterior probability of being poor in a classification tree model, and in the regression tree model it is represented by the expectation of a function related to the poverty measure of interest. Poverty mapping requires standard errors of prediction as well as point estimates of poverty, but the complex structure of survey data means that estimation of variability must be carried out by resampling. Inherent instability in tree-based models proved a challenge to developing a suitable variance estimation technique, but bootstrap resampling in conjunction with soft tree estimation proved a viable methodology.
Simulations showed that the bootstrap-based soft tree technique was a valid method for data with simple random sampling structure. This was also the case for clustered data, where the method was extended to utilise the cluster bootstrap and to incorporate cluster effects into predictions. The methodology was further adapted to account for stratification in the data, and applied to generate predictions for a district in Nepal. Tree-based estimates of standard error of prediction for the small areas investigated were compared with published results using the current methodology for poverty estimation. The technique of bootstrap sampling with soft tree estimation has application beyond poverty mapping, and for other types of complex survey data.
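The distinction between "hard" and "soft" classification-tree estimates drawn in this abstract can be made concrete with a toy calculation. The sketch below is an assumption-laden illustration rather than the thesis's method: it supposes each household in a small area has already been assigned the posterior probability of being poor from its tree leaf, and the function names, the 0.5 cut-off and the probabilities themselves are hypothetical. The bootstrap variance step is not shown.

```python
def hard_estimate(leaf_probs, cutoff=0.5):
    """Area poverty rate from hard predictions: each household is classed
    as poor only if its leaf probability exceeds the cutoff."""
    return sum(p > cutoff for p in leaf_probs) / len(leaf_probs)


def soft_estimate(leaf_probs):
    """Area poverty rate from soft predictions: the mean of the leaf
    posterior probabilities of being poor."""
    return sum(leaf_probs) / len(leaf_probs)


if __name__ == "__main__":
    # Hypothetical leaf probabilities for five households in one small area.
    area = [0.1, 0.4, 0.4, 0.6, 0.3]
    print("hard:", hard_estimate(area))   # one of five households classed as poor
    print("soft:", soft_estimate(area))   # about 0.36

    # A small change in one leaf probability (0.6 -> 0.4) flips the hard
    # prediction for that household but barely moves the soft estimate.
    perturbed = [0.1, 0.4, 0.4, 0.4, 0.3]
    print("hard after perturbation:", hard_estimate(perturbed))
    print("soft after perturbation:", soft_estimate(perturbed))
```

The perturbation at the end hints at why soft estimates may be the more stable target for resampling-based variance estimation: the hard estimate jumps when a leaf probability crosses the cut-off, whereas the soft estimate changes smoothly.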
  • Item
    Efficient biased estimation and applications to linear models : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University
    (Massey University, 1981) Moore, Terry
    In recent years biased estimators have received a great deal of attention because they can often produce more accurate estimates in multiparameter problems. One sense in which biased estimators are often more accurate is that the mean square error is smaller. In this work several parametric families of estimators are examined and good values of the parameters are sought by approximate analytical arguments. These parametric values are then tested by computing and plotting graphs of the mean square error. In this way the risks of various estimators may be seen and it is possible to discard some estimators which have large risk. The risk functions are computed by numerical integration - a method faster and more accurate than the usual simulation studies. The advantage of this is that it is possible to evaluate a greater number of estimators; however, the method only copes with spherically symmetric estimators. The relationship of biased estimation to the use of prior information is made clear. This leads to discussion of partially spherically symmetric estimators and the fact that, although not uniformly better than spherically symmetric ones, they are usually better in a practical sense. It is shown how the theoretical results may be applied to the linear model. The linear model is discussed in the very general case in which it is not of full rank and there are linear restrictions on the parameter. A kind of weak prior knowledge which is often assumed for such a model makes the partially symmetric estimators attractive. Distributions of spherically symmetric estimators are briefly discussed.
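The claim that a biased estimator can have smaller mean square error in multiparameter problems can be checked numerically. The sketch below uses the positive-part James-Stein estimator as an example of a spherically symmetric shrinkage estimator; the abstract does not name the estimators studied, so this choice is an assumption. It also estimates risk by Monte Carlo simulation, not by the numerical integration the thesis describes, and the dimension, true mean and replication count are hypothetical.

```python
import random


def james_stein(x):
    """Positive-part James-Stein estimate of a mean vector from one
    observation x ~ N(theta, I), shrinking towards the origin."""
    p = len(x)
    norm_sq = sum(v * v for v in x)
    shrink = max(0.0, 1.0 - (p - 2) / norm_sq)
    return [shrink * v for v in x]


def squared_error(estimate, theta):
    """Sum of squared errors between an estimate and the true mean."""
    return sum((e - t) ** 2 for e, t in zip(estimate, theta))


def simulate_risk(theta, reps=5000, seed=1):
    """Monte Carlo estimate of the risk (expected squared error) of the
    unbiased estimator x and of the James-Stein estimator."""
    rng = random.Random(seed)
    unbiased_total = 0.0
    shrunk_total = 0.0
    for _ in range(reps):
        x = [rng.gauss(t, 1.0) for t in theta]
        unbiased_total += squared_error(x, theta)
        shrunk_total += squared_error(james_stein(x), theta)
    return unbiased_total / reps, shrunk_total / reps


if __name__ == "__main__":
    theta = [1.0] * 10  # hypothetical true mean in 10 dimensions
    unbiased_risk, shrunk_risk = simulate_risk(theta)
    print("risk of unbiased estimator:    %.3f" % unbiased_risk)
    print("risk of James-Stein estimator: %.3f" % shrunk_risk)
```

The unbiased estimator's risk should sit near the dimension (10 here), and the shrinkage estimator's risk should come out lower, which is the sense of "more accurate" used in the abstract.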
  • Item
    Multivariate estimation of variance and covariance components using restricted maximum likelihood, in dairy cattle : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Animal Science at Massey University
    (Massey University, 1992) Sosa Ferreyra, Carlos Francisco
    The multivariate estimation of sire additive and residual variances and covariances by Restricted Maximum Likelihood (REML) is addressed. Particular emphasis is given to its application to dairy cattle data when all traits are explained by the same model and no observations are missing. Special attention is given to the analysis of new traits being included in a sire evaluation programme, for which a model has to be developed and no previous estimates of the population parameters exist. Results obtained by using either the multivariate Method 3 of Henderson, multivariate REML excluding the Numerator Relationship Matrix (NRM) or by multivariate REML including the NRM were compared. When a large number of traits were fitted simultaneously the variance-covariance matrix estimated by Method 3 was negative-definite (outside the allowable parameter space). REML estimates obtained while ignoring the NRM were biased. The number and sequence of traits fitted in the analysis affected the estimates at convergence. A canonical transformation of the variance-covariance matrix was undertaken to simplify the computation by means of an Expectation Maximisation (EM) algorithm. Approaches to choosing initial values for their use in iterative methods were compared via their values at convergence and the number of iterations required to converge. To further simplify the use of multivariate REML, three transformations of the Mixed Model Equations (MME) were integrated: the absorption of proven sire effects taken as fixed, a triangular factorisation of the NRM, and the singular value decomposition of the coefficient matrix in the MME. 
One statistical algorithm (EM) and one mathematical algorithm (Scoring type) were developed to iteratively solve the REML equations on the transformed scale, such that the transformed coefficient matrix of the MME did not need to be inverted at each iteration and the required quantities to build the REML equations were obtained through vector operations. Traits other than Production (TOP) from New Zealand Holstein-Friesian dairy cows were analysed (4 management and 13 conformation characteristics), each trait scored using a linear scale from 1 to 9, with extreme values corresponding to extreme phenotypes. Mixed model methodology was used for the analysis of TOP as no significant departure from normality was observed. To model the TOP, the fixed effects of herd, inspector, age, stage of lactation (linear and quadratic) and breed of dam were tested for significance. Only the effects of inspector and herd were significant for all traits, with breed of dam significantly affecting adaptability to milking, shed temperament and stature. Estimates of phenotypic means and standard deviations, and heritabilities for TOP were: adaptability to milking 5.4 ± 1.7, 0.20; shed temperament 5.5 ± 1.6, 0.12; milking speed 5.7 ± 1.5, 0.11; farmer's overall opinion 5.7 ± 1.7, 0.14; stature 5.1 ± 1.0, 0.14; weight 4.4 ± 1.0, 0.37; capacity 5.3 ± 1.0, 0.40; rump angle 5.4 ± 0.7, 0.16; rump width 5.2 ± 0.7, 0.08; legs 5.2 ± 0.6, 0.34; udder support 5.3 ± 1.0, 0.63; fore udder 4.9 ± 1.1, 0.48; rear udder 4.9 ± 1.0, 0.33; front teat placement 4.2 ± 0.7, 0.22; rear teat placement 5.2 ± 0.8, 0.22; udder overall 4.8 ± 1.1, 0.42; and dairy conformation 5.3 ± 1.1, 0.32. Large positive phenotypic correlations among management traits were obtained, while the correlations of these traits with type were small and positive when significant. Large and positive correlations among udder traits were found. All traits related to size were positively correlated amongst themselves.
Most of the traits were positively correlated with dairy conformation. Estimated genetic correlations for stature and weight with other conformation traits were generally negative. With the exception of udder support, all udder traits were positively correlated amongst themselves. Dairy conformation was positively correlated with most traits, except with stature, rump angle, legs, rear udder and udder overall. The estimates obtained in this study should be used in the evaluation of Holstein-Friesian sires and cows for TOP in New Zealand.