Instability of ordination results under changes in input data order: explanations and remedies

by Jari Oksanen & Peter R. Minchin

An extract from an article in Journal of Vegetation Science 8, 447-454 (1997)

Rescaling bug

Every programmer knows that, no matter how carefully a program is written and how rigorously it is tested, it is impossible to guarantee that it does not contain a programming error or "bug". For such widely used programs as DECORANA and CANOCO, any bugs that caused severe errors would most likely have been observed and corrected. We thought it possible, however, that a bug with less drastic consequences may have gone undetected and may contribute to the instability reported by Tausch et al. (1995). We therefore examined the FORTRAN source code of CANOCO carefully, in an effort to find any bugs that might make the computations sensitive to the input order of species or sites.

We discovered a bug in the subroutine SMOOTH, which forms part of the non-linear rescaling procedure in DCA (Hill 1979a; Hill & Gauch 1980). This part of CANOCO was inherited from DECORANA, so DCA performed by DECORANA is also affected.

Rescaling involves the smoothing of ordination scores within segments of an axis. According to the documentation (Hill 1979a) and comments in the program code, the intention was to perform (1,2,1)-smoothing enough times so that there are no empty segments and then twice more. If there are initially no empty segments, this results in a (1,6,15,20,15,6,1)-smoothing. Subroutine SMOOTH in CANOCO and DECORANA has a correct test only for the emptiness of second segment (the first segment is, by definition, not empty, since it always contains the extreme point). For the third and subsequent segments, the test, as programmed, is always true. So the smoothing is done until the second segment is not empty, instead of being performed until all segments are occupied.

In any eigenanalysis, the sign of a given eigenvector is arbitrary: v and -v are equally correct solutions. With small changes in the data, or simply with a different input ordering of species and sites, the power algorithm may find -v rather than v for a particular axis (i.e. the scores on that axis are "reflected"). Because of the bug, rescaling results may then be different, depending on the orientation of the axis. This is because a different total number of smoothings may be performed. As a consequence, the final ordination scores on that axis may be very different.

The bug occurs on line 17 of subroutine SMOOTH. The faulty statement is:

IF(AZ3.LT.0.0) ISTOP=0

To correct the bug, this should be altered as follows:

IF(AZ3.LE.0.0) ISTOP=0

We shall demonstrate below that the bug in SMOOTH is a major contributing factor to the instability of DCA. Results from the original version of CANOCO were compared with ordinations produced by a debugged version.


Note: Where is the hiding place of rescaling bug?

Cajo ter Braak has challenged our interpretation on rescaling bug. Ter Braak is correct in saying that we have identified a wrong line as guilty, but the bug remains just the same. It only moves from line 17 to line 11 of the subroutine. More complete explanation can be found in ORDNEWS listserver message.