IS THE DECORANA/CANOCO BUG REALLY A BUG?

Reply by Peter Minchin and Jari Oksanen posted to ORDNEWS listserver 9/3/98.

Cajo ter Braak has questioned whether the bug in the rescaling procedure in DECORANA and CANOCO, which we reported in our paper in volume 8, issue 3 of the Journal of Vegetation Science (Oksanen & Minchin 1997), is "really a bug". His comments can be read on the CANOCO www pages.

Our answer is that it most certainly is a bug, applying the usual definition of that term in computer science. To explain this, we need to go into some rather technical details, which will probably make sense only to those with a programming background.

The original code of subroutine SMOOTH (part of the non-linear rescaling procedure used in DCA) does not do what its author, Mark Hill, intended it to do. The comment in the source code says "TAKES A VECTOR Z AND DOES (1,2,1)-SMOOTHING UNTIL NO BLANKS LEFT AND THEN 2 MORE ITERATIONS OF (1,2,1)-SMOOTHING". The bug concerns the way in which the program tests for "blanks" or empty segments (i.e. sections of an ordination axis that contain no points). The test is order-dependent: the program does a different number of (1,2,1)-smoothings, depending on the orientation of the vector Z. In other words, if you reverse the order of the elements in Z, the program sometimes does a different total number of smoothings. We have verified this empirically by inserting code to run SMOOTH with each orientation of Z and report the number of smoothings done. In practice, the orientation of Z depends on the sign of an extracted eigenvector, which, as Cajo correctly states, is arbitrary. Variation in the sign of the vector occurs with different orderings of the species and/or sites in the input community data matrix. This means that DCA scores on any axis, after non-linear rescaling, will exhibit variation with different orderings of the input data, resulting from the different numbers of smoothings done by SMOOTH. In our JVS paper, we present empirical results, based on a number of real data sets, that clearly show the instability in DCA results due to the bug (Table 3 vs Table 4).

The bug is therefore real, though Cajo has identified an inaccuracy in our interpretation and description of the bug in the paper. This arose from our misunderstanding of the method used by Mark Hill to indicate "blanks". We had assumed that empty segments would be zero, whereas they actually contain a small negative number. There are two statements in SMOOTH that test for blanks. The first one (on line 11) tests if the 2nd segment is empty:

       IF(AZ3.EQ.0.0) ISTOP=0

and the other, on line 17, tests the 3rd and subsequent segments:

       IF(AZ3.LT.0.0) ISTOP=0

To fix the bug, both of these should be altered to read:

       IF(AZ3.LE.0.0) ISTOP=0

Checking the "debugged" CANOCO code used in our research, we found that we had, in fact, corrected both statements, though the paper only mentions the one on line 17. In our paper, as Cajo points out, we incorrectly state, referring to the test on line 17: "the test, as programmed, is never true". It is, in fact, the test on line 11 which is the real problem. Probably, it will never be true, since empty cells are small negative numbers and not zeroes. In any case, it is bad programming practice to ever test for the equality of real numbers, many of which have no exact binary representation.
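The behaviour of the three comparisons can be checked with a few lines of Python (used here only for illustration; it is not the language of DECORANA or CANOCO). The value -1.0e-20 is an assumed stand-in for the "small negative number" that marks an empty segment; the actual constant used in the program is not stated above.

```python
# Assumed stand-in for the small negative number marking an empty segment.
blank = -1.0e-20

# The test on line 11 (.EQ.) does not fire for a small negative marker.
eq_detects_blank = (blank == 0.0)

# The test on line 17 (.LT.) does fire for it.
lt_detects_blank = (blank < 0.0)

# The proposed replacement (.LE.) fires for the marker and for an exact zero.
le_detects_blank = (blank <= 0.0)
le_detects_zero = (0.0 <= 0.0)

# A general illustration of why equality tests on reals are fragile:
# 0.1, 0.2 and 0.3 have no exact binary representation.
sum_equals_three_tenths = (0.1 + 0.2 == 0.3)

print(eq_detects_blank, lt_detects_blank, le_detects_blank,
      le_detects_zero, sum_equals_three_tenths)
# prints: False True True True False
```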

Our misinterpretation of the mechanism of the problem does not alter the fact that what we found is truly an order-dependent bug. The correct interpretation is that the original code does not correctly detect the initial emptiness of segment 2. If the order of vector Z is reversed (as will happen with some input data orderings), it will be the 2nd-last segment instead that is empty, and this will be detected. Hence the total number of smoothings will be different in the two cases (one more in the latter case). Since receiving feedback from Cajo, we have verified this interpretation by inserting code to report the functioning of SMOOTH, before and after debugging.
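The mechanism can be illustrated with a small simulation. What follows is a minimal sketch in Python, not the Fortran source of SMOOTH. Several details are assumptions made for the illustration: the blank marker value (-1.0e-20), the end weights (0.75, 0.25), the cap of 50 passes, the example vector, and the stopping rule (a counter that starts at 1, is reset to 0 whenever a blank is seen during a pass, is incremented after each pass, and stops the loop when it reaches 4). Only the two blank tests and their replacement are taken from the statements quoted above.

```python
BLANK = -1.0e-20  # assumed marker for an empty segment (small negative number)


def smooth(z, second_is_blank, later_is_blank, max_passes=50):
    """Apply (1,2,1)-smoothing passes to a copy of z.

    second_is_blank tests segment 2; later_is_blank tests segment 3 onwards.
    Returns (smoothed values, number of passes done).
    The stopping rule and end weights are assumptions of this sketch.
    """
    z = list(z)
    n = len(z)
    istop = 1
    passes = 0
    for _ in range(max_passes):
        old = z[:]  # blank tests and weights use the values before this pass
        if second_is_blank(old[1]):
            istop = 0
        z[0] = 0.75 * old[0] + 0.25 * old[1]
        for k in range(2, n):
            if later_is_blank(old[k]):
                istop = 0
            z[k - 1] = 0.25 * old[k - 2] + 0.5 * old[k - 1] + 0.25 * old[k]
        z[n - 1] = 0.25 * old[n - 2] + 0.75 * old[n - 1]
        passes += 1
        istop += 1
        if istop == 4:
            break
    return z, passes


def count_passes(z, fixed):
    """Number of passes with the original tests (fixed=False) or .LE. tests."""
    if fixed:
        second = later = lambda x: x <= 0.0
    else:
        second = lambda x: x == 0.0  # original test for segment 2
        later = lambda x: x < 0.0    # original test for later segments
    return smooth(z, second, later)[1]


# Hypothetical vector whose 2nd segment is empty, and its reversal.
forward = [5.0, BLANK, 4.0, 3.0, 6.0, 2.0]
backward = forward[::-1]

original_counts = (count_passes(forward, False), count_passes(backward, False))
fixed_counts = (count_passes(forward, True), count_passes(backward, True))
print(original_counts)  # (3, 4): one more smoothing when Z is reversed
print(fixed_counts)     # (4, 4): same number in both orientations
```

Under these assumptions the original tests give one more smoothing for the reversed vector, and replacing both tests with "less than or equal" removes the difference.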

The fact that SMOOTH, as originally programmed, performed a different number of smoothings and hence produced different ordination co-ordinates with the same input data (but with sites or species in a different order) is clearly not what the programmer, nor for that matter the users of the program, would have expected or desired. To us, this clearly qualifies as a bug. Whether or not Cajo and Mark are prepared to call what we found a bug, we are pleased that Cajo has decided to correct it in the latest release of CANOCO.

Reference:

Oksanen, J. & Minchin, P.R. (1997) Instability of ordination results under changes in input data order: explanations and remedies. Journal of Vegetation Science 8: 447-454.


Peter Minchin & Jari Oksanen