Massey Documents by Type
Permanent URI for this community: https://mro.massey.ac.nz/handle/10179/294
2 results
Search Results
Item
Contributions to high-dimensional data analysis : some applications of the regularized covariance matrices : a thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Albany, New Zealand
(Massey University, 2015) Ullah, Insha
High-dimensional data sets, particularly those where the number of variables exceeds the number of observations, are now common in many subject areas, including genetics, ecology, and statistical pattern recognition. The sample covariance matrix becomes rank deficient and is not invertible when the number of variables exceeds the number of observations. This poses a serious problem for many classical multivariate techniques that rely on the inverse of a covariance matrix. Recently, regularized alternatives to the sample covariance have been proposed, which are not only guaranteed to be positive definite but also provide reliable estimates. In this thesis, we bring together some of the important recent regularized estimators of the covariance matrix and explore their performance in high-dimensional scenarios via numerical simulations. We make use of these regularized estimators and attempt to improve the performance of three classical multivariate techniques in high-dimensional settings. In multivariate random effects models, estimating the between-group covariance is a well-known problem. Its classical estimator involves the difference of two mean square matrices and often results in negative elements on the main diagonal. We use a lasso-regularized estimate of the between-group mean square and propose a new approach to estimate the between-group covariance based on the EM algorithm. Using simulation, the procedure is shown to be quite effective and the estimate obtained is always positive definite.
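The rank deficiency described above, and the way a regularized estimator restores positive definiteness, can be illustrated with a minimal sketch. This is not the thesis's method: the fixed shrinkage intensity `alpha`, the scaled-identity target, and the function names are assumptions chosen for illustration only, whereas the estimators studied in the thesis (lasso-regularized and distribution-free shrinkage) choose their regularization differently.

```python
import numpy as np

def sample_covariance(X):
    """Unbiased sample covariance of X (rows = observations, columns = variables)."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (X.shape[0] - 1)

def shrink_to_identity(S, alpha):
    """Convex combination of S and a scaled identity target.

    alpha is a hypothetical, hand-picked intensity in (0, 1];
    data-driven estimators would estimate it instead.
    """
    p = S.shape[0]
    target = (np.trace(S) / p) * np.eye(p)
    return (1.0 - alpha) * S + alpha * target

rng = np.random.default_rng(0)
n, p = 10, 25                      # more variables than observations
X = rng.standard_normal((n, p))

S = sample_covariance(X)
S_reg = shrink_to_identity(S, alpha=0.2)

rank_S = np.linalg.matrix_rank(S)              # at most n - 1, so S is singular
min_eig_reg = np.linalg.eigvalsh(S_reg).min()  # strictly positive after shrinkage

print("rank of S:", rank_S, "of", p)
print("smallest eigenvalue after shrinkage:", min_eig_reg)
```

Because the shrunken matrix is positive definite, it can be inverted and used in statistics that require the inverse of a covariance matrix, which the raw sample covariance cannot support when the number of variables exceeds the number of observations.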
Multivariate analysis of variance (MANOVA) faces serious challenges due to the undesirable properties of the sample covariance in high-dimensional problems. First, it suffers from low power and does not maintain an accurate type-I error rate when the dimension is large compared to the sample size. Second, MANOVA relies on the inverse of a covariance matrix and fails to work when the number of variables exceeds the number of observations. We use an approach based on lasso regularization and present a comparative study of the existing approaches, including our proposal. The lasso approach is shown to be an improvement in some cases, in terms of power of the test, over the existing high-dimensional methods. Another problem addressed in the thesis is how to detect unusual future observations when the dimension is large. The Hotelling T² control chart has traditionally been used for this purpose. The charting statistic in the control chart relies on the inverse of a covariance matrix and is not reliable in high-dimensional problems. To get a reliable estimate of the covariance matrix, we use a distribution-free shrinkage estimator. We make use of the available baseline set of data and propose a procedure to estimate the control limits for monitoring individual future observations. The procedure does not assume multivariate normality and seems robust to violations of multivariate normality. The simulation study shows that the new method performs better than the traditional Hotelling T² control charts.
Item
Multivariate estimation of variance and covariance components using restricted maximum likelihood, in dairy cattle : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Animal Science at Massey University
(Massey University, 1992) Sosa Ferreyra, Carlos Francisco
The multivariate estimation of sire additive and residual variances and covariances by Restricted Maximum Likelihood (REML) is addressed.
Particular emphasis is given to its application to dairy cattle data when all traits are explained by the same model and no observations are missing. Special attention is given to the analysis of new traits being included in a sire evaluation programme, for which a model has to be developed and no previous estimates of the population parameters exist. Results obtained using the multivariate Method 3 of Henderson, multivariate REML excluding the Numerator Relationship Matrix (NRM), and multivariate REML including the NRM were compared. When a large number of traits were fitted simultaneously, the variance-covariance matrix estimated by Method 3 was negative-definite (outside the allowable parameter space). REML estimates obtained while ignoring the NRM were biased. The number and sequence of traits fitted in the analysis affected the estimates at convergence. A canonical transformation of the variance-covariance matrix was undertaken to simplify the computation by means of an Expectation Maximisation (EM) algorithm. Approaches to choosing initial values for the iterative methods were compared via the estimates at convergence and the number of iterations required to converge. To further simplify the use of multivariate REML, three transformations of the Mixed Model Equations (MME) were integrated: the absorption of proven sire effects taken as fixed, a triangular factorisation of the NRM, and the singular value decomposition of the coefficient matrix in the MME. One statistical algorithm (EM) and one mathematical algorithm (Scoring type) were developed to iteratively solve the REML equations on the transformed scale, such that the transformed coefficient matrix of the MME did not need to be inverted at each iteration and the required quantities to build the REML equations were obtained through vector operations.
Traits other than Production (TOP) from New Zealand Holstein-Friesian dairy cows were analysed (4 management and 13 conformation characteristics), each trait scored using a linear scale from 1 to 9, with extreme values corresponding to extreme phenotypes. Mixed model methodology was used for the analysis of TOP as no significant departure from normality was observed. To model the TOP, the fixed effects of herd, inspector, age, stage of lactation (linear and quadratic) and breed of dam were tested for significance. Only the effects of inspector and herd were significant for all traits, with breed of dam significantly affecting adaptability to milking, shed temperament and stature. Estimates of phenotypic means and standard deviations, and heritabilities for TOP were: adaptability to milking 5.4 ± 1.7, 0.20; shed temperament 5.5 ± 1.6, 0.12; milking speed 5.7 ± 1.5, 0.11; farmer's overall opinion 5.7 ± 1.7, 0.14; stature 5.1 ± 1.0, 0.14; weight 4.4 ± 1.0, 0.37; capacity 5.3 ± 1.0, 0.40; rump angle 5.4 ± 0.7, 0.16; rump width 5.2 ± 0.7, 0.08; legs 5.2 ± 0.6, 0.34; udder support 5.3 ± 1.0, 0.63; fore udder 4.9 ± 1.1, 0.48; rear udder 4.9 ± 1.0, 0.33; front teat placement 4.2 ± 0.7, 0.22; rear teat placement 5.2 ± 0.8, 0.22; udder overall 4.8 ± 1.1, 0.42; and dairy conformation 5.3 ± 1.1, 0.32. Large positive phenotypic correlations among management traits were obtained, while the correlations of these traits with type were small and positive when significant. Large and positive correlations among udder traits were found. All traits related to size were positively correlated amongst themselves. Most of the traits were positively correlated with dairy conformation. Estimated genetic correlations for stature and weight with other conformation traits were generally negative. With the exception of udder support, all udder traits were positively correlated amongst themselves.
Dairy conformation was positively correlated with most traits, except with stature, rump angle, legs, rear udder and udder overall. The estimates obtained in this study should be used in the evaluation of Holstein-Friesian sires and cows for TOP in New Zealand.
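As a rough guide to how heritabilities such as those reported above relate to sire additive and residual variance components, the conventional sire-model expression is sketched below. It assumes a standard paternal half-sib sire model, which the abstract does not spell out; the symbols are notation introduced here for illustration and are not taken from the thesis.

```latex
% Assumed notation: \sigma_s^2 = sire variance (one quarter of the additive
% genetic variance under a half-sib sire model), \sigma_e^2 = residual variance.
h^2 = \frac{4\,\sigma_s^2}{\sigma_s^2 + \sigma_e^2}
```

Under this assumption, a heritability of 0.20 on a trait with unit phenotypic variance would correspond to a sire variance of 0.05 and a residual variance of 0.95.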
