Model fitting

Note added Dec 9, 2004

All these models are best fitted using R statistical environment. Current version of my vegan package contains function humpfit for completely nonlinear maximum likelihood fitting of the hump model.

Old document

This is a more technical chapter, interesting only to those who are actually analysing species richness data. Most information can be found in my article (Journal of Ecology 84, 293-295; 1996). Most model fitting can be done using GLIM software, and probably with other software with similar properties (e.g. Survo, S-Plus, Genstat, SAS PROC GENMOD, though I haven't tried them). For ecologists, the best introduction to GLIM is Crawley MJ (1993) GLIM for Ecologists. Blackwell, Oxford. After the basic model, there is a description of non-linear least squares fitting of the no-interaction model. The last model is a generalized additive model which is more general than GLM: The predictor is found with smoothing data values instead of using a linear function. A good, but technical introduction is Hastie TJ & Tibshirani RJ (1990) Generalized additive models. Chapman & Hall, London. Yee TW & Mitchell ND (Journal of Vegetation Science 2, 587-602; 1991) discuss GAM in plant ecology. Software for GAM include GLIM (from version 4), S-Plus, and freeware GAMFIT.

No-interaction model

The model is fitted as a piecewise regression:

Crowding biomass is a third parameter that must be estimated. It is non-linear, and it must be found iteratively.

Technical details of model fitting are given in the Journal of Ecology archives.

As a suitable small data set to illustrate the models, I used data from Al-Mufti et al. (1977), Journal of Ecology, 65, 759-791. The no-interaction model fits well these data [Fig: jpeg, 21K].

The model fitting is based on GLIM release 3.77. The use of $OWN models is changed in GLIM release 4. Ads say it is now easier, but I haven't tried it.

Non-linear least squares estimation of the no-interaction model

It seems to be possible to use non-linear least squares fitting to estimate the parameters of the no-interaction model. This is much easier than iteration in GLIM. In many cases the results are very similar to the corresponding Poisson likelihood model in GLIM.

I have written a program called humpfit.exe to fit the no-interaction model which you can download through my software pages. Least squares fitting [Fig: jpeg, 19K] gave very similar results as the GLIM model above. Even more interestingly, the program easily estimates the shape of Fisher's link function, as can be seen in bowing down of the upward slope in fitted models for simulated data [Fig: jpeg, 27K].

Polynomial model

This is the traditional method which has been used by many scientists. Polynomial model fitting can be done using most general purpose statistical programs, and so GLIM is not necessary.

I criticized polynomial models on the grounds that they produce humped response, but there are no parameters that could be directly interpreted. Naturally, you can get the location of the peak, but there is no quantification for assumed stresses at both sides of the peak.

Most scientists have used second-degree polynomials, but at least García et al. (1993), Journal of Vegetation Science, 4, 417-424 have used third-degree polynomials. Third degree polynomials are in general even more difficult to interpret directly than the second-degree polynomials. However, in that case they produced a simple peaked form which was clearly skewed. My impression is that similar response could be obtained using log-transformed biomass with simpler second-degree polynomials.

Though the second-degree polynomials finds a peak, it produces more rounded, blunter shapes than often pictured, or the previous model. In the test data set (Al-Mufti et al. 1977), second degree polynomial did not find the hump within the range studied. Better but still non-ideal responses [Fig: jpeg, 22K] were found both with 3rd degree polynomial and 2nd degree polynomial with log-transformed biomass.

Beta-response model

Beta response model can be given in two equivalent forms:

The advantage of the latter form is that after fixing parameters a and b, the other parameters can be estimated using generalized linear models in the GLIM software. Parameters a and b give the endpoints of range. When they are fixed, we can transform x into two explanatory variables log(x-a) and log(b-x), define logarithmic link function and so find estimates for parameters k (constant), alpha and gamma. These parameters can be regarded as giving direct estimates of intensity of stress at low biomass (alpha) and high biomass (gamma).

However, it should be noted that the these three parameters are conditional on the selection of endpoints. For the best fit, the endpoints should be found iteratively. The problems of beta model are discussed in my article (Journal of Vegetation Science 8, 147-152; 1997).

The data (Al-Mufti et al. 1977) have such a sharp peak that beta-response model could not follow it very well (and polynomial models had the same problem). The selection of the endpoints changes the response shape considerably; only one decent example is shown. [Fig: jpeg, 16K]

Generalized additive model

In Generalized additive models (GAM), the degree of smoothing is defined by the number of degrees of freedom. High number of degrees of freedom means that there is not much smoothing, but the response tracks closely to the data points. Low number of degrees of freedom means much smoothing, at the extreme, one degree of freedom defining a linear fit. With moderate four degrees of freedom, cubic spline smoothing fits the data very well [Fig: jpeg, 15K].

The fitting was done using GAMFIT software, available free at the web pages of Rob Tibshirani.


[ front page][ assumptions and model][ misconceptions][ simulations][ what to do?] [ case study ]

Updated 9/12/04 Jari Oksanen