Some aspects of covariance regularisation in discriminant analysis: a thesis presented in fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, New Zealand
Statistical discriminant analysis and classification are multivariate techniques concerned with separating distinct sets of objects, and with allocating new objects to previously defined populations or groups. The covariance matrix plays an important role in this process, and it usually has to be estimated from sample data. This thesis investigates the problem of (poor) estimation of the covariance structure and its effects in statistical discriminant analysis. The quality, or statistical properties, of these estimates usually affects the classification rules constructed from them. The (usually consistent) estimators of the covariance matrices perform poorly mainly because of the quality and/or size of the training sample relative to the number of parameters to be estimated. We are interested in this problem as it occurs in the small-sample, high-dimensional setting, and in particular in covariance estimation when the ratio of sample size to dimension is relatively small. The criterion used to judge the success of the various methods that address this problem is the estimated (overall) error rate. One way of dealing with a situation that potentially leads to poor estimation of the covariance matrix is to impose a prescribed (simple) structure on it, such as the identity matrix or a multiple of it. Another is to assume that all the groups share the same covariance matrix. Such simplifying assumptions reduce the number of parameters to be estimated, so that the (fewer) parameters are estimated with higher precision. It has been demonstrated that this may result in better statistical discriminant analysis, even when the simplifying assumptions are not entirely correct.
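As a rough illustration of why these simplifying assumptions help, the sketch below counts the free covariance parameters that must be estimated for g groups in p dimensions under each structure. The g·p mean parameters are needed in every case and are left out. The function name and model labels are illustrative assumptions, not taken from the thesis.

```python
def covariance_parameter_count(p: int, g: int, model: str) -> int:
    """Free covariance parameters to estimate for g groups in p dimensions (hypothetical helper)."""
    per_matrix = p * (p + 1) // 2      # distinct entries of a symmetric p x p matrix
    if model == "separate":            # each group has its own covariance (QDF)
        return g * per_matrix
    if model == "common":              # one pooled covariance shared by all groups (LDF)
        return per_matrix
    if model == "identity":            # covariance fixed at the identity (EDF)
        return 0
    if model == "scaled_identity":     # sigma^2 * I: a single variance parameter
        return 1
    raise ValueError(f"unknown model: {model}")
```

For example, with p = 20 and g = 2 the separate-covariance model needs 420 covariance parameters, the common-covariance model 210, and the identity structure none.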
Of the classification rules based on the normal distribution, the quadratic discriminant function (QDF) places no restrictions on the population parameters, and is therefore the most general rule of this class. However, it is also the one most affected by poor estimates of the population parameters. The two common simplifying techniques mentioned earlier (imposing an identity structure on the covariance matrix, or assuming a common covariance matrix for all populations) lead respectively to two other discriminant rules: the Euclidean distance function (EDF, based on the Euclidean distance between the group means) and the popular linear discriminant function (LDF, based on the Mahalanobis distance between the groups). The sample-based versions of these two classifiers are compared using expected error rates (conditional on a set of training data), which are obtained by deriving asymptotic expansions. The expansions are evaluated under a range of settings, defined by combinations of values of dimension, group separation and covariance structure. The simpler sample Euclidean distance function (SEDF) is shown to perform as well as, or better than, the sample linear discriminant function (SLDF) under most of the settings used. The exceptions occurred when the Mahalanobis distance between populations was much greater than the Euclidean distance. Friedman (1989) developed a flexible discrimination model, or rather a class of models, called the regularised discriminant function (RDF). Through additional "regularisation" parameters, the sample version of the RDF (the SRDF) incorporates the general sample quadratic discriminant function (SQDF), the two restricted models above (SEDF and SLDF), and a wide range of models intermediate to these.
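The three sample rules compared here can be sketched as follows, assuming equal prior probabilities and allocation to the group with the largest discriminant score. The helper names are hypothetical; the SQDF uses each group's own sample covariance, the SLDF the pooled covariance, and the SEDF the identity matrix.

```python
import numpy as np

def fit_moments(X, y):
    """Return class labels, sample means, sample covariances and the pooled covariance."""
    classes = np.unique(y)
    means = [X[y == c].mean(axis=0) for c in classes]
    covs = [np.cov(X[y == c], rowvar=False) for c in classes]   # divisor n_k - 1
    n = np.array([np.sum(y == c) for c in classes])
    # pooled covariance: weight each class by n_k - 1 and divide by n - g
    pooled = sum((nk - 1) * S for nk, S in zip(n, covs)) / (n.sum() - len(classes))
    return classes, means, covs, pooled

def _mahalanobis_sq(x, m, S):
    d = x - m
    return float(d @ np.linalg.solve(S, d))

def classify(x, classes, means, covs, pooled, rule="qdf"):
    """Allocate x to the group with the largest score (equal priors assumed)."""
    scores = []
    for m, S in zip(means, covs):
        if rule == "qdf":      # separate covariances: log-determinant term matters
            _, logdet = np.linalg.slogdet(S)
            scores.append(-0.5 * logdet - 0.5 * _mahalanobis_sq(x, m, S))
        elif rule == "ldf":    # common (pooled) covariance: Mahalanobis distance
            scores.append(-0.5 * _mahalanobis_sq(x, m, pooled))
        elif rule == "edf":    # identity covariance: Euclidean distance
            scores.append(-0.5 * float((x - m) @ (x - m)))
        else:
            raise ValueError(f"unknown rule: {rule}")
    return classes[int(np.argmax(scores))]
```

Only the covariance used in the distance changes between the three rules, which is what makes them natural end points of a regularised family.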
The method employs two types of shrinkage of the covariance estimates: towards the pooled estimate on the one hand, and towards a multiple of the identity matrix on the other, with a separate regularisation parameter controlling each. Appropriate values for the regularisation parameters are determined from the training data by cross-validation. The quality of the model selection procedure that specifies the discriminant model is crucial: if it performs well, it yields a classification rule close to the optimal one within the available class of models. Large-scale simulation studies show that the SRDF generally leads to lower overall error rates than the standard classification rules. This is found to be largely due to its facility for shrinking the covariance matrices towards sphericity, that is, eigenvalue regularisation. The SEDF is also found to perform very well relative to the SRDF in a variety of settings. Further simulation studies show that the performance of the SRDF is more sensitive to the parameter controlling shrinkage towards sphericity than to the one controlling covariance mixing, and that under some circumstances the SRDF performs better than the other classifiers even for quite large ratios of sample size to dimension. A crucial drawback of the SRDF is its lack of scale invariance, which is caused by eigenvalue regularisation. A modified, scale-invariant classification rule is therefore developed and compared with the SRDF and the other classifiers by simulation. The modified rule omits eigenvalue regularisation, but otherwise increases sensitivity to the data by allowing a different degree of shrinkage towards the pooled covariance for each group.
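A minimal sketch of the two shrinkage steps, following the form in which Friedman's (1989) regularised covariance estimate is usually stated: each class scatter matrix is first mixed with the pooled scatter matrix (parameter λ, here `lam`), and the result is then shrunk towards a multiple of the identity with the same trace (parameter γ, here `gamma`). The function name is hypothetical, and the cross-validation search over (λ, γ) is omitted.

```python
import numpy as np

def regularised_covariances(X, y, lam, gamma):
    """Regularised class covariance estimates in the form usually quoted from Friedman (1989).

    lam   in [0, 1]: shrinkage of each class covariance towards the pooled covariance
    gamma in [0, 1]: shrinkage towards a multiple of the identity (sphericity)
    Returns the sorted class labels and a list of p x p matrices in the same order.
    """
    classes = np.unique(y)
    p = X.shape[1]
    scatters, sizes = [], []
    for c in classes:
        Xc = X[y == c]
        D = Xc - Xc.mean(axis=0)
        scatters.append(D.T @ D)   # class scatter matrix S_k
        sizes.append(len(Xc))      # class sample size n_k
    S_pool = sum(scatters)         # pooled scatter matrix S = sum of S_k
    n = sum(sizes)
    regularised = []
    for S_k, n_k in zip(scatters, sizes):
        # covariance mixing: blend class and pooled scatter, normalised by blended sample size
        sigma_lam = ((1 - lam) * S_k + lam * S_pool) / ((1 - lam) * n_k + lam * n)
        # eigenvalue regularisation: shrink towards (trace / p) * I, which keeps the trace
        sigma = (1 - gamma) * sigma_lam + gamma * (np.trace(sigma_lam) / p) * np.eye(p)
        regularised.append(sigma)
    return classes, regularised
```

With `lam = 0, gamma = 0` this gives the separate (maximum-likelihood, divisor n_k) covariances of the SQDF, `lam = 1, gamma = 0` gives a common covariance as in the SLDF, and `lam = 1, gamma = 1` gives a common multiple of the identity as in the SEDF. The γ step adds the same amount to every diagonal element regardless of each variable's units, so rescaling one variable does not simply carry through the estimate; this is the source of the lack of scale invariance.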
Eigenvalue regularisation is shown to be generally beneficial for discrimination in medium- to high-dimensional problems, through a variance-reduction effect that stabilises the covariance estimates. The study therefore concludes that, in the absence of a suitable replacement for eigenvalue regularisation, scale invariance must be sacrificed in order to achieve reductions in error rate. The use of cross-validation in the model selection process of the SRDF is also examined, because it is computationally demanding, rarely leads to a unique choice of model, and often uses only a small subset of the available observations. Consequently, another method for determining the optimal regularisation parameters is investigated: whether appropriate values can be indicated by a measure of the distance between the groups. The Bhattacharyya distance is chosen for this purpose because it comprises one term pertaining primarily to the difference between the group means, and another that indicates the degree of disparity between the group covariance structures. The magnitudes of the components of the Bhattacharyya distance, considered on their own and in relation to each other, are shown to give information about appropriate values for the regularisation parameters. A new simulation study and several case studies are presented to assess a new regularised discriminant function that uses estimates of the Bhattacharyya distance between groups to select the regularisation parameters for given training data. This classifier is shown to perform as well as the SRDF, and is computationally much faster because it avoids resampling methods altogether.
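For two normal groups with means μ1, μ2 and covariance matrices Σ1, Σ2, the Bhattacharyya distance is D_B = (1/8)(μ1 − μ2)ᵀ Σ̄⁻¹ (μ1 − μ2) + (1/2) ln(|Σ̄| / √(|Σ1| |Σ2|)), with Σ̄ = (Σ1 + Σ2)/2. The sketch below returns the two components separately; in practice sample means and covariances would be plugged in. The rule by which the thesis maps these components to regularisation parameters is not stated here, so it is not reproduced, and the function name is hypothetical.

```python
import numpy as np

def bhattacharyya_components(m1, S1, m2, S2):
    """Return (mean_term, covariance_term) of the Bhattacharyya distance
    between N(m1, S1) and N(m2, S2); their sum is the full distance."""
    S_bar = 0.5 * (S1 + S2)
    d = m1 - m2
    # mean term: Mahalanobis-type separation of the means under the averaged covariance
    mean_term = 0.125 * float(d @ np.linalg.solve(S_bar, d))
    # covariance term: zero when S1 == S2, positive when the covariance structures differ
    _, logdet_bar = np.linalg.slogdet(S_bar)
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    cov_term = 0.5 * (logdet_bar - 0.5 * (logdet1 + logdet2))
    return mean_term, cov_term
```

Equal covariances give a zero covariance term, so the distance is then driven entirely by the means; a large covariance term relative to the mean term signals markedly different group covariance structures.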
Most of the investigations and assessments of the various regularised discriminant rules have to be undertaken by Monte Carlo simulation, especially to estimate error rates. This is because exact analytical expressions for the unconditional error rate of the SRDF do not exist, except in certain limited circumstances, and it has not been possible to obtain asymptotic expansions or any other form of approximation to these error rates in a general context. However, an approximation that can be used to calculate the error rate of the SQDF algebraically, assuming known population parameters and under other strict conditions, is available in the literature. This approximation is used in the thesis to examine further the effects of the covariance regularisation parameters on error rates that were observed in the earlier simulation work. This is the last piece of work in the thesis and, despite its limited scope (owing to the restrictive conditions under which the approximation holds), it largely confirms the results obtained from the simulation experiments in the earlier parts of the thesis.
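The approximation for the SQDF error rate referred to above is not reproduced here. As a simple point of comparison, the sketch below estimates by Monte Carlo simulation the overall error rate of the QDF with known parameters for two normal groups with equal priors; such a reference value could be used to check an analytical approximation. The function name, test-sample size and seed are assumptions.

```python
import numpy as np

def qdf_error_rate_mc(m1, S1, m2, S2, n_test=100_000, seed=0):
    """Monte Carlo estimate of the overall error rate of the population QDF
    (known means and covariances, equal priors) for two normal groups."""
    rng = np.random.default_rng(seed)
    params = [(m1, S1), (m2, S2)]
    inv = [np.linalg.inv(S) for _, S in params]
    logdet = [np.linalg.slogdet(S)[1] for _, S in params]

    def qdf_score(X, k):
        D = X - params[k][0]
        # row-wise quadratic form (x - m_k)^T S_k^{-1} (x - m_k)
        return -0.5 * logdet[k] - 0.5 * np.einsum("ij,jk,ik->i", D, inv[k], D)

    errors = 0
    for true_k, (m, S) in enumerate(params):
        X = rng.multivariate_normal(m, S, size=n_test)   # test sample from group true_k
        predicted = np.where(qdf_score(X, 0) >= qdf_score(X, 1), 0, 1)
        errors += np.sum(predicted != true_k)
    return errors / (2 * n_test)                          # equal priors: average of both groups
```

With equal covariance matrices the QDF reduces to the optimal linear rule, whose error rate is Φ(−Δ/2), where Δ is the Mahalanobis distance between the groups; for Δ = 2 this is about 0.159, which the estimate should approach as `n_test` grows.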
Content removed due to copyright
Koolaard, J.P. & Lawoko, C.R.O. (1993). Estimating error rates in discriminant analysis with correlated training observations: a simulation study. Journal of Statistical Computation and Simulation, 48, 81-99.