Dealing with sparsity in genotype x environment analyses : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Palmerston North, New Zealand

dc.contributor.author: Godfrey, A. Jonathan R.
dc.date.accessioned: 2010-09-06T03:53:10Z
dc.date.available: NO_RESTRICTION [en_US]
dc.date.available: 2010-09-06T03:53:10Z
dc.date.issued: 2004
dc.description.abstract: Researchers are frequently faced with the problem of analyzing incomplete and often unbalanced genotype-by-environment (GxE) matrices which arise as a trials programme progresses over seasons. The principal data for this investigation, arising from a ten-year programme of onion trials, cover fewer than 2,300 of the 49,200 possible combinations of the 400 genotypes and 123 environments. This 'sparsity' renders standard GxE methodology inapplicable. Analysis of these data to identify onion varieties that suit the shorter, hotter days of tropical and subtropical locations therefore presented a unique challenge. Removal of some data to form a complete GxE matrix wastes information and is consequently undesirable. An incomplete GxE matrix can be analyzed using the additive main effects and multiplicative interaction (AMMI) model in conjunction with the EM algorithm, but this approach proved unsatisfactory in this instance. Cluster analysis has been commonly used in GxE analyses, but current methods are inadequate when the data matrix is incomplete. If clustering is to be applied to incomplete data sets, one of two routes needs to be taken: either the clustering procedure must be modified to handle the missing data, or the missing entries must be imputed so that standard cluster analysis can be performed. A new clustering method capable of handling incomplete data has been developed. 'Two-stage clustering', as it has been named, relies on a partitioning of squared Euclidean distance into two independent components, the GxE interaction and the genotype main effect. These components are used in the first and second stages of clustering respectively. Two-stage clustering forms the basis for imputing missing values in a GxE matrix, so that a more complete data array is available for other GxE analyses. 'Two-stage imputation' estimates unobserved GxE yields using inter-genotype similarities to adjust observed yield data in the environment in which the yield is missing.
This new imputation method is transferable to any two-way data situation where all observations are measured on the same scale and the two factors are expected to have significant interaction. This simple but effective imputation method is shown to improve on an existing method that confounds the GxE interaction and the genotype main effect. Future development of two-stage imputation will use a parameterization of two-stage clustering in a multiple imputation process. Varieties recommended for use in a certain environment would normally be chosen using results from similar environments. Differing cluster analysis approaches were applied but led to inconsistent environment clusterings. A graphical summary tool, created to ease the difficulty in identifying the differences between pairs of clusterings, proved especially useful when the numbers of clusters and clustered observations were high. 'Cluster influence diagrams' were also used to investigate the effects the new imputation method had on the qualitative structure of the data. A consequence of the principal data's sparsity was that imputed values were found to be dependent on the existence of observable inter-genotype relationships, rather than the strength of these observable relationships. As a result of this investigation, practical recommendations are provided for limiting the detrimental effects of sparsity. Applying these recommendations will enhance the future ability of two-stage imputation to identify those onion varieties that suit tropical and subtropical locations. [en_US]
dc.identifier.uri: http://hdl.handle.net/10179/1616
dc.language.iso: en [en_US]
dc.publisher: Massey University [en_US]
dc.rights: The Author [en_US]
dc.subject: Statistical analysis [en_US]
dc.subject: Onion genotypes [en_US]
dc.subject.other: Fields of Research::230000 Mathematical Sciences::230200 Statistics::230204 Applied statistics [en_US]
dc.title: Dealing with sparsity in genotype x environment analyses : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Palmerston North, New Zealand [en_US]
dc.type: Thesis [en_US]
massey.contributor.author: Godfrey, A. Jonathan R.
thesis.degree.discipline: Statistics [en_US]
thesis.degree.grantor: Massey University [en_US]
thesis.degree.level: Doctoral [en_US]
thesis.degree.level: Doctoral [en]
thesis.degree.name: Doctor of Philosophy (Ph.D.) [en_US]
Files
Original bundle (showing 1 - 5 of 6)
Name: 06_FileID.pdf
Size: 41.14 KB
Format: Adobe Portable Document Format

Name: 05_SPlus code.zip
Size: 6.19 KB
Format: ZIP Archive

Name: 04_Simulation data.zip
Size: 45.87 KB
Format: ZIP Archive

Name: 03_Onion Data.zip
Size: 37.76 KB
Format: ZIP Archive

Name: 02_whole.pdf
Size: 11.08 MB
Format: Adobe Portable Document Format

License bundle (showing 1 - 1 of 1)
Name: license.txt
Size: 895 B
Format: Item-specific license agreed upon to submission