Non-parametric estimation of geographical relative risk functions : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Palmerston North, New Zealand
The geographical relative risk function is a useful tool for investigating the spatial
distribution of disease based on case and control data. The most common way of estimating
this function is to take the ratio of spatial kernel density estimates constructed
separately from the locations of cases and of controls. This technique is known as the
density ratio method. The performance of kernel density estimators depends on the
choice of kernel and of the smoothing parameter (bandwidth). The choice of kernel is
not critical to the statistical performance of the method, but the choice of bandwidth is crucial.
Different bandwidth selectors, such as least squares cross-validation (LSCV) and likelihood
cross-validation (LCV), are used to control the degree of smoothing when
computing the density ratio estimator.
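The density ratio method can be illustrated with a minimal sketch. It assumes an isotropic bivariate Gaussian kernel and a single fixed bandwidth h shared by the case and control estimates; the abstract fixes neither choice, and the function names are hypothetical. The estimate of the log relative risk is the log of the case density estimate minus the log of the control density estimate at each evaluation point.

```python
import numpy as np

def gaussian_kde_2d(points, grid, h):
    """Isotropic Gaussian kernel density estimate at each grid location.

    points: (n, 2) array of event locations
    grid:   (m, 2) array of evaluation locations
    h:      scalar bandwidth (assumed fixed and common to both coordinates)
    """
    diff = grid[:, None, :] - points[None, :, :]      # (m, n, 2) pairwise offsets
    sq = np.sum(diff ** 2, axis=-1) / h ** 2           # squared scaled distances
    k = np.exp(-0.5 * sq) / (2.0 * np.pi * h ** 2)     # bivariate normal kernel values
    return k.mean(axis=1)                              # average over the n events

def log_relative_risk(cases, controls, grid, h):
    """Density ratio estimate of log r(x) = log f(x) - log g(x).

    Assumption: the same bandwidth h is used for cases and controls.
    """
    f_hat = gaussian_kde_2d(cases, grid, h)
    g_hat = gaussian_kde_2d(controls, grid, h)
    return np.log(f_hat) - np.log(g_hat)
```

In practice h would be chosen by a data-driven selector such as LSCV or LCV rather than fixed by hand; a common bandwidth for both densities is one option among several.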
An alternative way of estimating the relative risk function is the local linear regression
approach. This deserves consideration because the density ratio estimator can be less
natural when the relative risk has a global trend, as one might expect to see when
there is a line source of risk such as a polluted river or a road. The use of local linear
regression for estimation of log relative risk functions per se has not been examined
in any detail in the literature, so our work on this methodology is a novel contribution.
A detailed account of the local linear approach to estimating the log relative risk
function is provided, consisting of an analysis of its asymptotic properties and a method
for computing tolerance contours to highlight regions of significantly high risk.
Data-driven bandwidth selectors for the local linear method, including a novel plug-in
methodology, are examined. A simulation study comparing the performance of density ratio and local linear estimators
under a range of data-driven bandwidth selectors is presented. Analyses
of two specific data sets are also presented.
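The abstract does not give the form of the local linear estimator. The sketch below shows one possible formulation, offered only as an illustrative assumption and not necessarily the estimator developed in the thesis. It fits a kernel-weighted local linear regression of the case indicator on location to estimate p(x), the probability that an event at x is a case. It then uses the identity logit p(x) = log(n1 f(x)) - log(n0 g(x)) to recover log r(x). The Gaussian weights, the clipping constant eps, and the function names are all hypothetical.

```python
import numpy as np

def local_linear_prob(locs, y, x0, h):
    """Local linear estimate of P(case | location x0).

    locs: (n, 2) pooled case and control locations
    y:    (n,) indicator, 1 for a case and 0 for a control
    x0:   (2,) evaluation point
    h:    scalar bandwidth of an (assumed) isotropic Gaussian kernel
    """
    d = locs - x0
    w = np.exp(-0.5 * np.sum(d ** 2, axis=1) / h ** 2)   # kernel weights
    X = np.column_stack([np.ones(len(y)), d])            # local intercept and slopes
    XtW = X.T * w                                        # X^T W without forming W
    beta = np.linalg.solve(XtW @ X, XtW @ y)             # weighted least squares
    return beta[0]                                       # intercept = fitted value at x0

def ll_log_relative_risk(cases, controls, x0, h, eps=1e-6):
    """Hypothetical local linear estimate of log r(x0) via the case probability."""
    locs = np.vstack([cases, controls])
    y = np.concatenate([np.ones(len(cases)), np.zeros(len(controls))])
    # A local linear fit can leave [0, 1], so clip before taking the logit.
    p = np.clip(local_linear_prob(locs, y, x0, h), eps, 1.0 - eps)
    n1, n0 = len(cases), len(controls)
    # logit p(x) = log(n1 f(x)) - log(n0 g(x)), so subtract log(n1 / n0).
    return np.log(p / (1.0 - p)) - np.log(n1 / n0)
```

Because the fit includes local slope terms, a linear trend in risk (for example, risk decreasing with distance from a road) is captured by the local linear fit rather than smoothed away. This is the situation the abstract identifies as awkward for the density ratio method.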
The estimation of the spatial relative risk function is extended to spatio-temporal
estimation through the use of suitable temporal kernel functions, since time-scale is
an important consideration when estimating disease risk. A spatio-temporal extension
of kernel density estimation is used to estimate the unknown densities in the
spatio-temporal relative risk function. We then investigate the time derivatives of the
space-time relative risk function to see how disease risk changes over time. The novel
contributions here are the introduction of time derivatives of the
relative risk function and asymptotic methods for computing tolerance
contours that highlight subregions of significantly elevated risk. LSCV and subjectively chosen
bandwidths are used to compute these estimators, since LSCV performs well for the density
ratio method. These estimators are illustrated with an analysis of data from the 1967
outbreak of foot-and-mouth disease (FMD).
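A minimal sketch can show a spatio-temporal density estimate and the time derivative of the log relative risk. It assumes a product kernel: a bivariate Gaussian spatial kernel with bandwidth h times a univariate Gaussian temporal kernel with bandwidth lam. It also assumes that both cases and controls carry event times. The thesis's kernels and data structure may differ, and the function names are hypothetical. For a Gaussian temporal kernel, differentiating with respect to t0 multiplies each term by -(t0 - t_i)/lam^2. The time derivative of the log relative risk is then (df/dt)/f - (dg/dt)/g.

```python
import numpy as np

def st_kde(xy, t, x0, t0, h, lam):
    """Space-time KDE with an (assumed) product Gaussian kernel, and its time derivative.

    xy: (n, 2) event locations; t: (n,) event times
    x0: (2,) evaluation location; t0: scalar evaluation time
    h: spatial bandwidth; lam: temporal bandwidth
    Returns (f_hat, df_hat_dt0) at (x0, t0).
    """
    ks = np.exp(-0.5 * np.sum((xy - x0) ** 2, axis=1) / h ** 2) / (2 * np.pi * h ** 2)
    u = t0 - t
    kt = np.exp(-0.5 * u ** 2 / lam ** 2) / (np.sqrt(2 * np.pi) * lam)
    f = np.mean(ks * kt)
    # d/dt0 of the Gaussian temporal kernel is -(t0 - t_i) / lam^2 times the kernel.
    dfdt = np.mean(ks * kt * (-u / lam ** 2))
    return f, dfdt

def st_log_rr_and_dt(cases_xy, cases_t, ctrl_xy, ctrl_t, x0, t0, h, lam):
    """Return log r(x0, t0) and its derivative with respect to time."""
    f, df = st_kde(cases_xy, cases_t, x0, t0, h, lam)
    g, dg = st_kde(ctrl_xy, ctrl_t, x0, t0, h, lam)
    return np.log(f) - np.log(g), df / f - dg / g
```

A positive time derivative at (x0, t0) indicates locally increasing relative risk. Evaluated over a spatial grid at a fixed time, it maps where the outbreak is intensifying.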
The relative risk function is also investigated when the data include a spatially varying
covariate. The novel contributions here are two formulations of a generalized relative risk
function, together with the asymptotic properties of the estimators in both cases.
Generalized kernel density estimates replace the unknown densities
in the relative risk function. Asymptotic theory is used to compute tolerance
contours that identify areas of high risk. An LSCV bandwidth selector for this
estimation process is described, and implicit formulae for it are provided. We illustrate
this methodology on data from the 2001 outbreak of FMD in the UK, examining the
effect of farm size as a covariate.