bioenv {vegan}    R Documentation

Best Subset of Environmental Variables with Maximum (Rank) Correlation with Community Dissimilarities


Description

The function finds the best subset of environmental variables, so that the Euclidean distances of scaled environmental variables have the maximum (rank) correlation with community dissimilarities.


Usage

## Default S3 method:
bioenv(comm, env, method = "spearman", index = "bray",
       upto = ncol(env), trace = FALSE, partial = NULL, ...)
## S3 method for class 'formula':
bioenv(formula, data, ...)


Arguments

comm           Community data frame.
env            Data frame of continuous environmental variables.
method         The correlation method used in cor.
index          The dissimilarity index used for community data in vegdist.
upto           Maximum number of variables in the studied subsets.
formula, data  Model formula and data.
trace          Trace the advance of calculations.
partial        Dissimilarities partialled out when inspecting variables in env.
...            Other arguments passed to cor.


Details

The function calculates a community dissimilarity matrix using vegdist. It then goes through all possible subsets of environmental variables, scales the variables, and calculates Euclidean distances for each subset using dist. For each subset it finds the correlation between community dissimilarities and environmental distances, and for each subset size it saves the best result. There are 2^p-1 subsets of p variables, so an exhaustive search may take a very long time (argument upto offers partial relief by limiting the size of the subsets studied).
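
The search described above can be sketched in base R. This is only an illustration under stated assumptions, not the code of bioenv: the data are simulated, bray() is a hypothetical helper standing in for vegdist, and all object names are made up for this example.

```r
# Illustrative sketch of the exhaustive search; not the vegan implementation.
set.seed(1)
n <- 12
comm <- matrix(rpois(n * 5, lambda = 4) + 1, nrow = n)  # simulated, strictly positive counts
env <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n), d = rnorm(n))  # simulated variables

# Hypothetical helper: Bray-Curtis dissimilarity, standing in for vegdist().
bray <- function(x) {
  x <- as.matrix(x)
  d <- matrix(0, nrow(x), nrow(x))
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(nrow(x))) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}

comdis <- bray(comm)
p <- ncol(env)
best <- vector("list", p)  # best subset for each subset size
nsub <- 0                  # number of subsets inspected

for (k in seq_len(p)) {
  sets <- combn(p, k)      # every subset of size k, one per column
  nsub <- nsub + ncol(sets)
  r <- apply(sets, 2, function(v) {
    # Euclidean distances of the scaled variables in this subset
    envdis <- dist(scale(env[, v, drop = FALSE]))
    # rank correlation between the two sets of distances
    cor(as.vector(comdis), as.vector(envdis), method = "spearman")
  })
  best[[k]] <- list(vars = names(env)[sets[, which.max(r)]], cor = max(r))
}

nsub  # equals 2^p - 1
```

With p = 4 the loop inspects 15 subsets; each added variable roughly doubles that number, which is why limiting the subset size matters for larger data sets.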

The function can be called with a model formula where the LHS is the data matrix and RHS lists the environmental variables. The formula interface is practical in selecting or transforming environmental variables.
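
As a small illustration of the two interfaces, the sketch below fits the same three variables once through a formula and once through a hand-built data frame. It assumes the vegan package and its varespec and varechem data sets are installed; the object names (sol_formula, sol_default, env) and the choice of variables are only for this example.

```r
library(vegan)
data(varespec)
data(varechem)

# Formula interface: the transformation log(N) is written directly in the formula.
sol_formula <- bioenv(wisconsin(varespec) ~ log(N) + P + K, varechem)

# Default interface: the same variables must be prepared by hand first.
env <- with(varechem, data.frame(logN = log(N), P = P, K = K))
sol_default <- bioenv(wisconsin(varespec), env)

sol_formula
sol_default
```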

With argument partial you can perform “partial” analysis. The partializing item must be a dissimilarity object of class dist. The partial item can be used with any correlation method, but it is strictly correct only for Pearson.
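
As a reminder of what partialling out means, the sketch below computes a first-order partial correlation between two sets of distances given a third, using the standard textbook formula on simulated data. That bioenv applies exactly this formula is an assumption here, and partial_cor() is a hypothetical helper, not a vegan function. For Pearson correlation the formula agrees exactly with the correlation of regression residuals; no such identity is shown here for rank correlations.

```r
# Hypothetical helper: partial correlation of x and y given z (standard formula).
partial_cor <- function(x, y, z, method = "pearson") {
  rxy <- cor(x, y, method = method)
  rxz <- cor(x, z, method = method)
  ryz <- cor(y, z, method = method)
  (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))
}

set.seed(2)
pts <- matrix(rnorm(30), ncol = 3)  # simulated sites
dx <- as.vector(dist(pts[, 1]))     # stand-in for community dissimilarities
dy <- as.vector(dist(pts[, 1:2]))   # stand-in for environmental distances
dz <- as.vector(dist(pts[, 3]))     # stand-in for the partial dissimilarities

r_partial <- partial_cor(dx, dy, dz)

# For Pearson, the same value follows from residuals of regressions on dz.
r_resid <- cor(resid(lm(dx ~ dz)), resid(lm(dy ~ dz)))
```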

Clarke & Ainsworth (1993) suggested this method for selecting the best subset of environmental variables when interpreting the results of nonmetric multidimensional scaling (NMDS). They recommended a parallel display of the NMDS of community dissimilarities and the NMDS of Euclidean distances from the best subset of scaled environmental variables. They warned against the use of Procrustes analysis, but to me this looks like a good way of comparing these two ordinations.

Clarke & Ainsworth wrote a computer program, BIO-ENV, which gave its name to the current function. Presumably BIO-ENV was later incorporated in Clarke's PRIMER software (available for Windows). In addition, Clarke & Ainsworth suggested a novel method of rank correlation, which is not available in the current function.


Value

The function returns an object of class bioenv with a summary method.


Note

If you want to study the ‘significance’ of bioenv results, you can use function mantel or mantel.partial, which use the same definition of correlation. However, bioenv standardizes environmental variables to unit standard deviation using function scale, and you must do the same in mantel for comparable results. Further, bioenv selects variables to maximize the Mantel correlation, whereas the significance tests assume that the variables were selected a priori, so the tests are biased.
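
A minimal sketch of such a follow-up test, assuming the vegan package with its varespec and varechem data sets is installed. The three variables are picked by hand for illustration rather than taken from a bioenv result, and the object names are made up for this example. If the variables had been chosen by bioenv, the reported significance would be biased, as explained above.

```r
library(vegan)
data(varespec)
data(varechem)

# Community dissimilarities with the bioenv default index (Bray-Curtis).
comdis <- vegdist(wisconsin(varespec), method = "bray")

# Scale the chosen variables to unit standard deviation, as bioenv does,
# before taking Euclidean distances.
envsub <- with(varechem, data.frame(logN = log(N), P = P, Al = Al))
envdis <- dist(scale(envsub))

# Same correlation definition as the bioenv default (method = "spearman").
mt <- mantel(comdis, envdis, method = "spearman")
mt$statistic
```

Leaving out scale() would give distances dominated by the variables with the largest variance, and the Mantel statistic would no longer be comparable with the bioenv correlation.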


Author(s)

Jari Oksanen


References

Clarke, K. R. & Ainsworth, M. 1993. A method of linking multivariate community structure to environmental variables. Marine Ecology Progress Series, 92, 205–219.

See Also

vegdist, dist, cor for underlying routines, isoMDS for ordination, procrustes for Procrustes analysis, protest for an alternative, and rankindex for studying alternatives to the default Bray-Curtis index.


Examples

# The method is very slow for a large number of possible subsets.
# Therefore only 6 variables are used in this example.
data(varespec)
data(varechem)
sol <- bioenv(wisconsin(varespec) ~ log(N) + P + K + Ca + pH + Al, varechem)
summary(sol)

[Package vegan version 1.16-32]