Non-parametric estimation of geographical relative risk functions : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Palmerston North, New Zealand

Massey University
The Author
The geographical relative risk function is a useful tool for investigating the spatial distribution of disease based on case and control data. The most common way of estimating this function is to take the ratio of spatial kernel density estimates constructed from the locations of cases and controls respectively. This technique is known as the density ratio method. The performance of kernel density estimators depends on the choice of kernel and the smoothing parameter (bandwidth). The choice of kernel is not critical to the statistical performance of the method, but the choice of bandwidth is crucial. Different bandwidth selectors, such as least squares cross validation (LSCV) and likelihood cross validation (LCV), are used to control the degree of smoothing in the computation of the density ratio estimator. An alternative way of estimating the relative risk function is the local linear regression approach. This deserves consideration since the density ratio estimator can be less natural when the relative risk has a global trend, as one might expect to see when there is a line source of risk such as a polluted river or a road. The use of local linear regression for estimation of log relative risk functions per se has not been examined in any detail in the literature, so our work on this methodology is a novel contribution. A detailed account of the local linear approach to the estimation of the log relative risk function is provided, consisting of an analysis of asymptotic properties and a method for computing tolerance contours to emphasize regions of significantly high risk. Data-driven bandwidth selectors for the local linear method, including a novel plug-in methodology, are examined. A simulation study comparing the performance of the density ratio and local linear estimators using a range of data-driven bandwidth selectors is presented. Two specific data sets are analysed.
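As an illustration of the density ratio method, the following is a minimal sketch rather than the thesis implementation. It assumes an isotropic Gaussian kernel, a single fixed bandwidth `h` shared by cases and controls, and no edge correction; the function names `gaussian_kde_2d` and `log_relative_risk` are hypothetical.

```python
import numpy as np


def gaussian_kde_2d(points, grid, h):
    """Fixed-bandwidth isotropic Gaussian kernel density estimate.

    points : (n, 2) array of observed locations
    grid   : (m, 2) array of locations at which to evaluate the estimate
    h      : bandwidth (assumed common to both coordinates)
    """
    diff = grid[:, None, :] - points[None, :, :]      # shape (m, n, 2)
    sq_dist = np.sum(diff ** 2, axis=2)               # shape (m, n)
    kern = np.exp(-0.5 * sq_dist / h ** 2) / (2.0 * np.pi * h ** 2)
    return kern.mean(axis=1)                          # average over observations


def log_relative_risk(cases, controls, grid, h):
    """Log of the ratio of the case density to the control density."""
    f_hat = gaussian_kde_2d(cases, grid, h)
    g_hat = gaussian_kde_2d(controls, grid, h)
    return np.log(f_hat) - np.log(g_hat)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data: cases are shifted to the right of the controls.
    controls = rng.normal(loc=[0.0, 0.0], size=(200, 2))
    cases = rng.normal(loc=[1.0, 0.0], size=(200, 2))
    grid = np.array([[-2.0, 0.0], [0.0, 0.0], [2.0, 0.0]])
    print(log_relative_risk(cases, controls, grid, h=0.5))
```

In this sketch the bandwidth is supplied by hand; in practice it would come from a selector such as LSCV or LCV, and the choice strongly affects the estimate.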
The estimation of the spatial relative risk function is extended to spatio-temporal estimation through the use of suitable temporal kernel functions, since time-scale is an important consideration when estimating disease risk. An extended version of kernel density estimation is applied to estimate the unknown densities in the spatio-temporal relative risk function. Next we investigate the time derivatives of the space-time relative risk function to see how the disease risk changes with time. The novel contributions here are the introduction of time derivatives of the relative risk function and asymptotic methods for the computation of tolerance contours to highlight subregions of significantly elevated risk. LSCV and subjective bandwidths are used to compute these estimators, since LSCV performs well for the density ratio method. An application to the 1967 outbreak of foot and mouth disease (FMD) illustrates these estimators. The relative risk function is then investigated when the data include a spatially varying covariate. The novel contributions here are the definition of a generalized relative risk function in two ways and the asymptotic properties of the estimators in both cases. Generalized kernel density estimation is used to replace the unknown densities in the relative risk function. Asymptotic theory is used to compute tolerance contours that identify areas of high risk. The LSCV bandwidth selector for this estimation process is described, with implicit formulae provided. We illustrate this methodology on data from the 2001 outbreak of FMD in the UK, examining the effect of farm size as a covariate.
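The spatio-temporal extension can be sketched in the same way. The example below is an assumption-laden illustration, not the estimator developed in the thesis: it uses a product of a spatial Gaussian kernel (bandwidth `h`) and a temporal Gaussian kernel (bandwidth `lam`), treats both cases and controls as time-stamped, and applies no edge correction. The names `st_kde` and `st_log_relative_risk` are hypothetical.

```python
import numpy as np


def st_kde(xy, t, grid_xy, grid_t, h, lam):
    """Product-kernel density estimate in space and time.

    xy      : (n, 2) observed locations
    t       : (n,) observed times
    grid_xy : (m, 2) evaluation locations
    grid_t  : (m,) evaluation times, paired row by row with grid_xy
    h, lam  : spatial and temporal bandwidths (assumed fixed)
    """
    sq_dist = np.sum((grid_xy[:, None, :] - xy[None, :, :]) ** 2, axis=2)
    k_space = np.exp(-0.5 * sq_dist / h ** 2) / (2.0 * np.pi * h ** 2)
    u = (grid_t[:, None] - t[None, :]) / lam
    k_time = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * lam)
    return (k_space * k_time).mean(axis=1)


def st_log_relative_risk(case_xy, case_t, ctrl_xy, ctrl_t,
                         grid_xy, grid_t, h, lam):
    """Log ratio of the case and control space-time density estimates."""
    f_hat = st_kde(case_xy, case_t, grid_xy, grid_t, h, lam)
    g_hat = st_kde(ctrl_xy, ctrl_t, grid_xy, grid_t, h, lam)
    return np.log(f_hat) - np.log(g_hat)
```

Evaluating `st_log_relative_risk` at a fixed location over a sequence of times gives a rough picture of how the estimated risk changes with time, which is the quantity the time derivatives describe.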
Non-parametric statistics, Relative risk, Disease risk estimation, Foot and mouth disease, Statistical analysis