Austin has strongly criticized the use of symmetric Gaussian response functions in gradient analysis (Austin 1976, 1980, 1987; Austin & Gaywood 1994). Nevertheless, the Gaussian response function has been the only model whose parameters can be estimated effectively (Austin et al. 1984; ter Braak & Looman 1986; Jongman et al. 1987). Although the evidence for the Gaussian model can be strongly criticized (Austin 1976, 1980), all the methods proposed for testing the significance of skewness in the response have had problems: visual judgement of skewness (Økland 1986) is subjective; third-degree polynomials (Austin et al. 1990) have unrealistic shapes; the beta function (Austin et al. 1994) confounds the location of the maximum with the skewness (Oksanen 1997); and smoothing by generalized additive models (Hastie & Tibshirani 1990; Yee & Mitchell 1991) must be judged subjectively. However, Huisman, Olff and Fresco (1993) have proposed a set of hierarchical models which includes both a skewed and a symmetric response. They assumed that these models cannot be fitted using maximum likelihood (Huisman et al. 1993). In fact this can be done, and so it is possible (1) to use a more realistic error distribution for the observations than the originally implied normal distribution, and (2) to test statistically whether a response is significantly skewed, or whether a symmetric or monotone response could be used instead [Fig: jpeg, 13K].
The program HOF fits the models of Huisman, Olff and Fresco to vegetation data using maximum likelihood estimation with Poisson error. The program runs under MS-DOS; it reads CEP or CANOCO formatted files and reports results both on the screen and in a log file on disk. This paper gives an overview of the theory behind the program and an introduction to its use and limitations.
A set of five hierarchic models was originally proposed (Huisman et al. 1993). Hierarchic means that a simpler model (1) has fewer parameters than the more complex model, and (2) can be derived from the more complex model by fixing one of its parameters. The most complex model proposed defines a skewed response function. It can be written in slightly modified form as:

   mu = M * [1 / (1 + exp(a + b*x))] * [1 / (1 + exp(c - d*x))]
Here mu is the expected value, which depends on the known values of the gradient (x), the maximum possible value (M), and the four estimated parameters (a, b, c, d) of the function. This model can be simplified as follows [Fig: gif, 5.6K]:
Model           Estimated / fixed parameters    No. of estimated parameters
V:  Skewed      a    b    c      d              4
IV: Symmetric   a    b    c      b              3
II: Monotone    a    b    -inf   0              2
I:  Flat        a    0    -inf   0              1
In the table, parameters shown as letters are estimated and fixed parameters are shown with their values (in model IV, d is fixed to equal b). Originally the authors proposed one more model (III), with parameters (a, b, c, 0), but since it uses the same number of parameters as model IV we did not use it: it cannot be tested statistically against model IV, and its shape is less interesting than that of model IV (model IV gives a symmetric response resembling the Gaussian response, whereas model III gives a monotone response resembling model II but converging to a plateau below M instead of at M).
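The hierarchy above can be sketched in code. The following is a minimal illustration, not the HOF program's own code, assuming the modified form mu = M / ((1 + exp(a + b*x)) * (1 + exp(c - d*x))); the simpler models follow by fixing parameters as in the table.

```python
import math

def hof_model(x, M, a, b, c, d):
    """Sketch of HOF model V (assumed form):
        mu = M / ((1 + exp(a + b*x)) * (1 + exp(c - d*x)))
    Simpler models follow by fixing parameters:
        IV (symmetric): d = b
        II (monotone):  c = -inf, d = 0   (second factor becomes 1)
        I  (flat):      b = 0, c = -inf, d = 0
    """
    return M / ((1.0 + math.exp(a + b * x)) * (1.0 + math.exp(c - d * x)))

# Model IV (d fixed equal to b) is symmetric about x = (c - a) / (2b):
mu_left = hof_model(0.0625 - 0.1, 100.0, 2.0, -8.0, 1.0, -8.0)
mu_right = hof_model(0.0625 + 0.1, 100.0, 2.0, -8.0, 1.0, -8.0)

# Model I (flat): the response does not depend on x at all
flat = hof_model(0.3, 100.0, 0.0, 0.0, float("-inf"), 0.0)  # 100 / (1 + e^0) = 50
```

Note how fixing c = -inf makes the second factor exactly 1, so model II reduces to an ordinary logistic curve.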
Originally only a least squares method was proposed, and it was said that maximum likelihood cannot be used with these models (Huisman et al. 1993). However, the use of maximum likelihood is fairly straightforward (cf. Oksanen et al. 1990), though it may be computationally demanding.
Least squares fitting can be seen as maximum likelihood estimation using a normal error. In least squares we compare fitted values (mu) with the observed values (y), trying to minimize the sum of squares:

   SS = sum_i (y_i - mu_i)^2

So the task is to find the parameter estimates giving the fitted values (mu) that provide the least squares fit to the observed values (y). The density function of the normal distribution for one indexed observation is:

   f(y_i) = 1 / (sigma * sqrt(2*pi)) * exp( -(y_i - mu_i)^2 / (2*sigma^2) )

Here sigma is a scaling parameter which is constant for all observations. To find the most likely values of mu we can write this as a log-likelihood function (LL):

   LL = sum_i [ -log(sigma * sqrt(2*pi)) - (y_i - mu_i)^2 / (2*sigma^2) ]

The maximum likelihood estimate of mu is the value that maximizes LL. The value of the function depends only on the indexed variables y and mu. To maximize LL we have to minimize the sum of squares (y_i - mu_i)^2. So least squares gives the maximum likelihood estimate with normal error.
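As a small numeric illustration (toy data, not from the paper), minimizing the sum of squares and maximizing the normal log-likelihood pick the same fitted value:

```python
import math

def ss(y, mu):
    # residual sum of squares between observed y and fitted mu
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))

def normal_ll(y, mu, sigma=1.0):
    # log-likelihood of the observations under Normal(mu_i, sigma^2)
    return sum(-math.log(sigma * math.sqrt(2 * math.pi))
               - (yi - mi) ** 2 / (2 * sigma ** 2)
               for yi, mi in zip(y, mu))

y = [3.0, 5.0, 4.0]
# candidate constant fits: the SS-minimizer and the LL-maximizer coincide
candidates = [3.0, 3.5, 4.0, 4.5, 5.0]
best_ss = min(candidates, key=lambda m: ss(y, [m] * len(y)))
best_ll = max(candidates, key=lambda m: normal_ll(y, [m] * len(y)))
```

Both criteria select the sample mean, 4.0, as expected for a constant fit with normal error.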
Other error distributions can be used analogously. For instance, the probability function of the Poisson distribution is:

   P(y_i) = mu_i^(y_i) * exp(-mu_i) / y_i!

In maximum likelihood estimation with Poisson error we have to find the parameters giving the values of mu that maximize this function.
In maximum likelihood estimation we can (1) use the log-likelihood (LL) instead of the original likelihood, and (2) omit the term y!, since it does not depend on the fitted values and so does not influence the estimation. In that case we get the quasilikelihood (qLL) estimates:

   qLL = sum_i [ y_i * log(mu_i) - mu_i ]
The quasilikelihood function gives the same estimates as the original likelihood function. However, the values of LL and qLL evaluated at these estimates of mu will differ.
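This can be checked numerically with toy data: the full Poisson log-likelihood and the quasilikelihood peak at the same mu, but take different values there:

```python
import math

def poisson_ll(y, mu):
    # full Poisson log-likelihood, including the log(y!) term (via lgamma)
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def poisson_qll(y, mu):
    # quasilikelihood: the y! term is dropped, as it does not depend on mu
    return sum(yi * math.log(mi) - mi for yi, mi in zip(y, mu))

y = [2.0, 6.0, 4.0]
candidates = [3.0, 3.5, 4.0, 4.5, 5.0]
best_ll = max(candidates, key=lambda m: poisson_ll(y, [m] * len(y)))
best_qll = max(candidates, key=lambda m: poisson_qll(y, [m] * len(y)))
# both pick the same mu (the mean), but LL and qLL differ in value there
```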
Maximum likelihood estimates maximize the likelihood function LL, or equivalently minimize -LL. The likelihood function used in the HOF model is non-linear in its parameters, so the parameters must be estimated using iterative methods. There are several alternative methods for non-linear minimization. The one used here is a quasi-Newton method known as the Broyden-Fletcher-Goldfarb-Shanno variant of the Davidon-Fletcher-Powell method (Press et al. 1992). The method is much more effective than the traditional steepest descent method. The maximum likelihood estimates of the parameters are found by minimizing -2qLL. No convergence to a global optimum can be guaranteed in iterative fitting. When convergence has been reached, the minimization is restarted at that point to see whether the fit can still be improved.
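A minimal sketch of this fitting step, assuming scipy is available; scipy's BFGS implementation stands in here for the program's own quasi-Newton routine, and the data are synthetic counts roughly following model II (a = 2, b = -6), not data from the paper:

```python
import numpy as np
from scipy.optimize import minimize

M = 100.0
x = np.linspace(0.0, 1.0, 11)            # gradient scaled to 0..1
# synthetic counts from model II with a = 2, b = -6, rounded to integers
true_mu = M / (1.0 + np.exp(2.0 - 6.0 * x))
y = np.round(true_mu)

def neg2qll(params):
    # -2 * quasilikelihood for model II: mu = M / (1 + exp(a + b*x))
    a, b = params
    mu = M / (1.0 + np.exp(a + b * x))
    return -2.0 * np.sum(y * np.log(mu) - mu)

# quasi-Newton (BFGS) minimization of -2qLL from a neutral start
fit = minimize(neg2qll, x0=[0.0, 0.0], method="BFGS")
a_hat, b_hat = fit.x
```

With the data generated from known parameters, the fitted (a_hat, b_hat) land close to (2, -6).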
If the model fits the data, -2LL follows a Chi-squared distribution (Lindgren 1976). This means that the difference in -2LL between two models follows a Chi-squared distribution as well. Consequently, if we have two models where one has one more parameter than the other, and the difference in -2LL is larger than 3.84 (the critical value of the Chi-squared distribution with one degree of freedom), the extra parameter is significant at the level P < 0.05. So the hierarchic HOF models can be compared by analysing the change in -2LL and adopting the model that cannot be simplified without a significant change in -2LL.
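The comparison rule amounts to a one-line check (3.84 is the Chi-squared critical value for one degree of freedom at P = 0.05):

```python
def extra_parameter_significant(neg2ll_simple, neg2ll_complex, crit=3.84):
    """Chi-squared test with 1 df: the extra parameter of the more
    complex model is significant at P < 0.05 if the drop in -2LL
    exceeds the critical value 3.84."""
    return (neg2ll_simple - neg2ll_complex) > crit
```

Starting from the most complex model, the hierarchy is walked downwards as long as this test stays non-significant.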
A major condition in this scheme is that -2LL follows a Chi-squared distribution only if the model fits the data. If the variance is larger than assumed in the likelihood function, the data are overdispersed, and the statistical test must be adapted to this overdispersion (McCullagh & Nelder 1989). Since the estimation is based on the quasilikelihood (-2qLL), we must (1) transform -2qLL to a statistic known as the deviance (McCullagh & Nelder 1989), so that we can analyse the degree of overdispersion, and (2) adjust the statistical testing for overdispersion.
The smallest possible -2qLL with the current data (y) is obtained when mu = y, i.e. when the fitted values are identical with the observed values. The deviance (D) is defined as the difference between the observed -2qLL and this minimum (McCullagh & Nelder 1989); for the Poisson error it is:

   D = 2 * sum_i [ y_i * log(y_i / mu_i) - (y_i - mu_i) ]
The deviance follows a Chi-squared distribution in the same way as -2LL. The expectation of a Chi-squared variate equals its degrees of freedom. If we have n observations and estimate p parameters, the residual degrees of freedom are n - p. The data are overdispersed if the deviance clearly exceeds the residual degrees of freedom. Overdispersion is the rule in community data sets (Kemp 1987; Anderson 1988; Oksanen et al. 1990, 1991).
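A minimal sketch of the deviance and the overdispersion check for Poisson error (the y*log(y/mu) term is taken as 0 when y = 0):

```python
import math

def poisson_deviance(y, mu):
    # D = 2 * sum[ y*log(y/mu) - (y - mu) ]
    d = 0.0
    for yi, mi in zip(y, mu):
        if yi > 0:
            d += yi * math.log(yi / mi)
        d -= (yi - mi)
    return 2.0 * d

def dispersion(y, mu, n_params):
    # deviance divided by residual degrees of freedom (n - p);
    # values well above 1 indicate overdispersion
    return poisson_deviance(y, mu) / (len(y) - n_params)
```

A perfect fit (mu = y) gives D = 0, and a poor constant fit to variable counts gives a dispersion well above 1.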
If the data are overdispersed, we cannot use the simple Chi-squared test to analyse the significance of the parameters, but have to use an F test instead (Anderson 1988; McCullagh & Nelder 1989). If we have a model with p estimated parameters and deviance D(p), and a simpler model with p-1 estimated parameters and deviance D(p-1), then the testing can be based on the statistic (Anderson 1988; McCullagh & Nelder 1989):

   F = [ D(p-1) - D(p) ] / [ D(p) / (n - p) ]

which is compared with the F distribution with 1 and n - p degrees of freedom.
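A minimal helper for this overdispersion-adjusted test: the numerator is the drop in deviance for the single dropped parameter, and the denominator is the dispersion estimate D(p)/(n - p) from the more complex model:

```python
def f_statistic(dev_simple, dev_complex, n, p):
    """F statistic for one dropped parameter under overdispersion:
    F = (D(p-1) - D(p)) / (D(p) / (n - p)),
    to be compared with the F distribution with 1 and n - p df."""
    return (dev_simple - dev_complex) / (dev_complex / (n - p))
```

For example, with n = 12 observations, p = 2, D(p-1) = 30 and D(p) = 20, the statistic is 10 / (20 / 10) = 5.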
The program HOF implements maximum likelihood estimation of the parameters of the HOF models (Huisman et al. 1993) as explained above: it uses Poisson error, and its statistical testing allows for overdispersed data.
The program reads in data from normal CEP files or files accepted by the program CANOCO (ter Braak 1987).
The program asks for the names of the vegetation and environmental data files, the number of the one and only environmental variable used, the minimum frequency of the species analysed, and the greatest theoretical value (M). The alternatives for M are either the sum of values for each observation as computed from the data (useful for counts), or a constant (e.g. 100 for percentage cover).
The gradient is scaled to the range 0..1, mainly to simplify setting the starting values and estimation of the parameters.
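The rescaling is a plain linear transformation; a one-line sketch:

```python
def scale_gradient(xs):
    # rescale gradient values linearly to the range 0..1
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]
```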
The output screen [Fig: gif, 14K] has the following entries:
-2qLL gives the fit at the current iteration step. When convergence has been reached, the program is restarted, and if a minimum was found (it stops at iteration 0) or -2qLL changes only a little, the iteration stops. With a Pentium the calculations are so fast that you can hardly follow this column, but with slower computers (e.g. a 486DX) it shows that the computer is indeed doing something.
The graphics screen first plots the observations against the scaled gradient, and then adds the latest significant fitted curve.
The summary screen shows the parameter estimates, fit statistics, and significance of the removed term. This information will be printed to a disk file called HOFLOG.TXT.
The model is simplified as long as the removed parameters are insignificant at P < 0.05. Estimation stops when the first significant term is found. The graphics screen will show the curve of the largest significant model.