    Dealing with sparsity in genotype x environment analyses : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Palmerston North, New Zealand

    View/Open Full Text
    06_FileID.pdf (41.14Kb)
    05_SPlus code.zip (6.186Kb)
    04_Simulation data.zip (45.87Kb)
    03_Onion Data.zip (37.75Kb)
    02_whole.pdf (11.08Mb)
    01_front.pdf (1.765Mb)
    Abstract
    Researchers are frequently faced with the problem of analyzing incomplete and often unbalanced genotype-by-environment (GxE) matrices, which arise as a trials programme progresses over seasons. The principal data for this investigation, arising from a ten-year programme of onion trials, cover fewer than 2,300 of the 49,200 combinations of the 400 genotypes and 123 environments. This 'sparsity' renders standard GxE methodology inapplicable. Analysis of these data to identify onion varieties that suit the shorter, hotter days of tropical and subtropical locations therefore presented a unique challenge. Removal of some data to form a complete GxE matrix wastes information and is consequently undesirable. An incomplete GxE matrix can be analyzed using the additive main effects and multiplicative interaction (AMMI) model in conjunction with the EM algorithm, but this approach proved unsatisfactory in this instance. Cluster analysis has been commonly used in GxE analyses, but current methods are inadequate when the data matrix is incomplete. If clustering is to be applied to incomplete data sets, one of two routes needs to be taken: either the clustering procedure must be modified to handle the missing data, or the missing entries must be imputed so that standard cluster analysis can be performed. A new clustering method capable of handling incomplete data has been developed. 'Two-stage clustering', as it has been named, relies on a partitioning of squared Euclidean distance into two independent components, the GxE interaction and the genotype main effect. These components are used in the first and second stages of clustering respectively. Two-stage clustering forms the basis for imputing missing values in a GxE matrix, so that a more complete data array is available for other GxE analyses. 'Two-stage imputation' estimates unobserved GxE yields using inter-genotype similarities to adjust observed yield data in the environment in which the yield is missing.
    This new imputation method is transferable to any two-way data situation where all observations are measured on the same scale and the two factors are expected to have significant interaction. This simple but effective imputation method is shown to improve on an existing method that confounds the GxE interaction and the genotype main effect. Future development of two-stage imputation will use a parameterization of two-stage clustering in a multiple imputation process. Varieties recommended for use in a certain environment would normally be chosen using results from similar environments. Differing cluster analysis approaches were applied, but led to inconsistent environment clusterings. A graphical summary tool, created to ease the difficulty in identifying the differences between pairs of clusterings, proved especially useful when the numbers of clusters and clustered observations were high. 'Cluster influence diagrams' were also used to investigate the effects the new imputation method had on the qualitative structure of the data. A consequence of the principal data's sparsity was that imputed values were found to be dependent on the existence of observable inter-genotype relationships, rather than the strength of these observable relationships. As a result of this investigation, practical recommendations are provided for limiting the detrimental effects of sparsity. Applying these recommendations will enhance the future ability of two-stage imputation to identify those onion varieties that suit tropical and subtropical locations.
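The partitioning of squared Euclidean distance mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's S-Plus code. It assumes that missing yields are marked with `None` and that the distance between two genotypes is computed only over the environments in which both were observed; the function names `partition_distance` and `squared_distance` are hypothetical. Under those assumptions the squared distance splits exactly into a part due to the difference in genotype means and a part due to the difference in interaction pattern, because environment main effects cancel when the same environments are compared.

```python
def shared_pairs(row_i, row_j):
    """Yield pairs from environments where both genotypes were observed."""
    return [(a, b) for a, b in zip(row_i, row_j)
            if a is not None and b is not None]


def squared_distance(row_i, row_j):
    """Squared Euclidean distance over the shared environments."""
    return sum((a - b) ** 2 for a, b in shared_pairs(row_i, row_j))


def partition_distance(row_i, row_j):
    """Split the squared distance into (main_effect_part, interaction_part).

    Assumption (not taken from the thesis): genotype means are computed
    over the shared environments only. Returns None when the two
    genotypes have no environment in common.
    """
    pairs = shared_pairs(row_i, row_j)
    n = len(pairs)
    if n == 0:
        return None
    mean_i = sum(a for a, _ in pairs) / n
    mean_j = sum(b for _, b in pairs) / n
    # Genotype main-effect component: difference in mean yield.
    main = n * (mean_i - mean_j) ** 2
    # Interaction component: difference in mean-centred yield profiles.
    inter = sum(((a - mean_i) - (b - mean_j)) ** 2 for a, b in pairs)
    return main, inter


# Two hypothetical genotypes over four environments; one yield is missing.
g1 = [5.0, 7.0, None, 6.0]
g2 = [2.0, 9.0, 3.0, 4.0]
main, inter = partition_distance(g1, g2)
print(main, inter, squared_distance(g1, g2))
```

In this toy example the two components sum to the squared distance over the three shared environments. A two-stage procedure in the spirit of the abstract would cluster on the interaction component first and on the main-effect component second, although the exact way the thesis handles pairs with few or no shared environments is not stated here.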
    Date
    2004
    Author
    Godfrey, A. Jonathan R.
    Rights
    The Author
    Publisher
    Massey University
    URI
    http://hdl.handle.net/10179/1616
    Collections
    • Theses and Dissertations

    Copyright © Massey University
    Contact Us | Send Feedback | Copyright Take Down Request | Massey University Privacy Statement
    DSpace software copyright © Duraspace
    v5.7-27.11.15