The statistical application considered here arose in epigenomics, linking the DNA methylation proportions measured at specific genomic sites to characteristics such as phenotype or birth order. It was found that the distribution of errors in the proportions of chemical modification (methylation) on DNA, measured at CpG sites, may be successfully modelled by a Laplace distribution which is perturbed by a Hermite polynomial. We use a linear model with such a response function. Hence, the response function is known, or assumed well estimated, but fails to be differentiable in the classical sense due to the modulus function. Our problem was to estimate coefficients for the linear model and the corresponding covariance matrix and to compare models with varying numbers of coefficients. The linear model coefficients may be found using the (derivative-free) simplex method, as in quantile regression. However, this theory does not yield a simple expression for the covariance matrix of the coefficients of the linear model. Assuming response functions which are 2 except where the modulus function attains zero, we derive simple formulae for the covariance matrix and a log-likelihood ratio statistic, using generalized calculus. These original formulae enable a generalized analysis of variance and further model comparisons.
ISRN Probability and Statistics, 2012, 0 pp. 1 - 32
Creative Commons CC-BY https://creativecommons.org/licenses/by/4.0/
Hindawi Publishing Corporation