This document contains answers to some of the most frequently asked questions about the R package vegan. This is the version of $Date: 2008-01-29 11:07:55 +0200 (Tue, 29 Jan 2008) $.
This work is licensed under the Creative Commons Attribution 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
Copyright © 2007 Jari Oksanen
Vegan is an R package for community ecologists. It contains most of the multivariate analyses needed in analysing ecological communities, tools for diversity analysis, and other potentially useful functions. Vegan is not self-contained: it must be run under the R statistical environment, and it also depends on many other R packages. Vegan is free software and is distributed under the GPL2 license.
R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files.
R has a home page at http://www.R-project.org/. It is free software distributed under a GNU-style copyleft, and an official part of the GNU project (“GNU S”).
Both R and the latest release version of vegan can be obtained through CRAN. The unstable development version of vegan can be obtained through R-Forge.
Some vegan functions depend on other packages, including lattice. These are all recommended standard R packages that should be available in every R installation. In addition, some vegan functions require non-standard R packages. Vegan declares these packages only as suggested ones, and you can install vegan and use most of its functions without them. The non-standard packages needed by some vegan functions are:
ellipse is needed by
scatterplot3d is needed by
rgl is needed by
tcltk is needed by
CRAN Task Views include entries like Spatial that describe several useful packages and functions. If you install the R package ctv, you can inspect Task Views from your R session and automatically install sets of the most important packages.
Vegan is a fully documented R package with standard help pages. These
are the most authoritative sources of documentation. The vegan package also ships
with other documents, which can be read with
(documented in the vegan help). The documents included in the vegan
Web documents outside the package include:
Roeland Kindt has made package
BiodiversityR which provides a
GUI for vegan. The package is available at
It is not a mere GUI for vegan, but adds some new functions and
complements vegan functions in order to provide a
workbench for biodiversity analysis. You can install it with install.packages("BiodiversityR") or with the graphical package management menu in R. The GUI works on Windows, MacOS X and Linux.
Use citation("vegan") in R to see the recommended citation to be used in publications.
From version 1.10-0, vegan is developed at R-Forge and there is a general progression of version numbers mixed with stable (at CRAN) and devel versions (at R-Forge).
Up to versions 1.8-7 and 1.9-34, vegan version numbers were of type x.y-z, where number y is even for stable release versions at CRAN and odd for unstable release versions at my personal homepage. Version 1.8-8 was a backport of bug fixes from the 1.10 series.
In general, you do not need to build vegan from sources, because binary builds of release versions are available through CRAN for Windows and MacOS X. If you use some other operating system, or want to use unstable devel versions, you may have to use source packages. Vegan is a standard R package, and can be built as instructed in the R documentation. Vegan contains source files in C and FORTRAN, and you need appropriate compilers (which may need more work in Windows and MacOS X).
R-Forge runs daily tests on the devel package, and if they pass, it builds a source package and Windows binaries. You can install those packages within R with the command install.packages("vegan", repos="http://r-forge.r-project.org/"). However, MacOS X binaries are not available from R-Forge.
If you think you have found a bug in vegan, you should report it to me. The bug report should be detailed enough that I can reproduce the buggy behaviour and correct it. Preferably, you should send me an example that causes the bug. If it needs a data set that is not available in R, you should send me a minimal data set as well. You should also paste the output or error message into your message and tell me which version of vegan you used.
Bug reports are welcome: they are the only way to make vegan non-buggy.
Please note that you should not send bug reports to R mailing lists, since vegan is not a standard R package.
There also is a bug reporting tool at R-Forge, but you need to register as a site user to report bugs (this is site policy).
It is not necessarily a bug if some function gives different results than you expect: that may be a deliberate design decision. It may be useful to check the documentation of the function to see what the intended behaviour was. It may also happen that the function has an argument to switch the behaviour to match your expectation. For instance, vegdist always calculates quantitative indices (when this is possible). If you expect it to calculate a binary index, you should use argument binary = TRUE.
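As a minimal sketch of the difference, the following compares the quantitative and the binary form of the same index. It assumes that vegan is installed and uses the dune data set that ships with it; the object names (d_quant, d_bin, d_pa) are only illustrative.

```r
library(vegan)
data(dune)

# Quantitative Jaccard index (the default): uses the abundance values
d_quant <- vegdist(dune, method = "jaccard")

# Binary Jaccard index: data are treated as presence-absence
d_bin <- vegdist(dune, method = "jaccard", binary = TRUE)

# Converting the data to presence-absence by hand should give the same result
d_pa <- vegdist((dune > 0) * 1, method = "jaccard")

# Compare the first few dissimilarities of each version
head(cbind(quantitative = as.vector(d_quant),
           binary = as.vector(d_bin),
           by_hand = as.vector(d_pa)))
```

The index name is "jaccard" in all three calls; only the binary argument or the input data decide which variant is calculated.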
Vegan is dependent on user contribution. All feedback is welcome. If you have a problem with vegan, the cause may be as simple as incomplete documentation, and I'll do my best to improve the documents.
Feature requests also are welcome, but they are not necessarily fulfilled. A new feature will be added if it is easy to do and it looks useful to me or in general.
Contributed code and functions are welcome and more certain to be included than mere requests. However, not all functions will be added: I must judge them to be suitable for vegan. I also audit the code, and typically I edit the code in vegan style for easier maintenance. All included contributions will be credited. You can easily see that many vegan functions were contributed by other people, and they are listed as authors in the documentation.
Yes. Most vegan methods can handle binary data or cover abundance data. Most statistical tests are based on permutation, and do not make distributional assumptions. There are some methods (mainly in diversity analysis) that need count data. These methods check that input data are integers, but they may be fooled by cover class data.
Most commonly the reason is that other software uses presence–absence data whereas vegan uses quantitative data. Usually vegan indices are quantitative, but you can use argument binary = TRUE to make them presence–absence. However, the index name is the same in both cases, although different names usually occur in the literature. For instance, Jaccard index actually refers to the binary index, but vegan uses "jaccard" for the quantitative index, too.
Another reason may be that the indices are indeed defined differently, because people use the same names for different indices.
You can use argument zerodist = "add" in metaMDSdist to handle zero dissimilarities. With this argument, zero dissimilarities are replaced with a small positive value, so that they can be handled in isoMDS. This is a kluge, and some people do not like this. A more principled solution is to remove duplicate sites using the R command unique. However, after some standardizations or with some dissimilarity indices, originally non-identical sites can have zero dissimilarity, and you have to resort to the kluge (or work harder with your data).
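Removing duplicate sites needs only base R. The sketch below uses a small made-up community matrix (comm, with hypothetical sites and species) to show how duplicated and unique behave on rows.

```r
# Hypothetical toy community matrix: rows are sites, columns are species
comm <- rbind(
  site1 = c(sp1 = 3, sp2 = 0, sp3 = 1),
  site2 = c(sp1 = 3, sp2 = 0, sp3 = 1),  # exact duplicate of site1
  site3 = c(sp1 = 0, sp2 = 2, sp3 = 5)
)

# Which rows repeat an earlier row?
dup <- duplicated(comm)
dup

# Keep only the first copy of each distinct site
comm_unique <- unique(comm)
comm_unique
```

With real data the reduced matrix would then be passed to the ordination function. As noted above, this does not help when distinct sites become identical only after standardization or in the dissimilarity index.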
capscale regularly and normally gives warnings of negative eigenvalues. These warnings are harmless, and capscale will ignore the axes with negative eigenvalues. The warnings are generated by the underlying function cmdscale for metric multidimensional scaling (a.k.a. principal coordinates analysis). The metric MDS assumes that dissimilarities are metric, but most ecologically useful indices are semimetric. The warnings only concern the very last minor axes, and the axes with negative eigenvalues will be ignored in capscale. If the warnings are disturbing, you can use argument add = TRUE in capscale, which implements “correction method 2” of Legendre & Legendre (1998, p. 434) in cmdscale.
You can get a warning about negative eigenvalues also with metric indices if you have rank-deficient data. This happens, for instance, when the number of species (columns) is lower than the number of sites (rows), or if some sites are linear combinations of other sites. You can find the rank of the data using, for instance, vegan function rda, which is identical to capscale with Euclidean distance.
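The two points above can be sketched as follows, assuming vegan and its dune data set. The element CA$rank is taken from the structure of the fitted result object and may differ between vegan versions, so treat that part as an assumption.

```r
library(vegan)
data(dune)

# Unconstrained capscale on Bray-Curtis dissimilarities (a semimetric index)
mod <- capscale(dune ~ 1, distance = "bray")

# Same analysis with an additive constant, so that no negative
# eigenvalues remain in the underlying principal coordinates analysis
mod_add <- capscale(dune ~ 1, distance = "bray", add = TRUE)

# Rank of the data as found by a principal components analysis
pca <- rda(dune)
pca$CA$rank
```

The rank cannot exceed the smaller of the number of species and the number of sites minus one.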
In general, vegan does not directly give any statistics on the “variance explained” by ordination axes or by the constrained axes. This is a design decision: I think this information is normally useless and often misleading. In community ordination, the goal typically is not to explain the variance, but to find the “gradients” or main trends in the data. The “total variation” often is meaningless, and all proportions of meaningless values also are meaningless. Often a better solution explains a smaller part of “total variation”. For instance, in unstandardized principal components analysis most of the variance is generated by a small number of the most abundant species, and they are easy to “explain” because the data really are not very multivariate. If you standardize your data, all species are equally important. The first axes explain much less of the “total variation”, but now they explain all species equally, and results typically are much more useful for the whole community. Correspondence analysis uses another measure of variation (which is not variance), and again it typically explains a “smaller proportion” with a better result. Detrended correspondence analysis and nonmetric multidimensional scaling do not even try to “explain” the variation, but use other criteria. All methods are incommensurable, and it is impossible to compare methods using “explanation of variation”.
If you still want to get “explanation of variation” (or a deranged editor requests that from you), it is possible to get this information for some methods:
capscale give the variation of conditional (partialled), constrained (canonical) and residual components, but you must calculate the proportions by hand. The
summary gives the contributions of the axes. Function
goodness gives the same statistics for individual species or sites (species are unavailable with
capscale). In addition, there is a special function
varpart for unbiased partitioning of variance between up to four separate components in redundancy analysis.
decorana). The total amount of variation is unknown and undefined in detrended correspondence analysis, and therefore proportions from total also are unknown and undefined. DCA is not a method for decomposition of variation, and therefore these proportions would not make sense either.
stressplot displays the nonlinear fit and gives this statistic.
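Calculating the proportions by hand for a constrained eigenvector method could look like the sketch below. It assumes vegan with its dune and dune.env data sets, uses redundancy analysis (rda) as the example, and reads the components tot.chi, CCA and CA directly from the result object; the choice of constraints (A1 and Management) is arbitrary.

```r
library(vegan)
data(dune)
data(dune.env)

# Redundancy analysis with two constraining variables
mod <- rda(dune ~ A1 + Management, data = dune.env)

total <- mod$tot.chi            # total variation
constrained <- mod$CCA$tot.chi  # constrained (canonical) component
residual <- mod$CA$tot.chi      # residual component

# Proportions of the total
c(constrained = constrained / total, residual = residual / total)

# Contributions of the individual constrained axes
mod$CCA$eig / total
```

Without conditioning variables the constrained and residual components add up to the total.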
Vegan does not have a concept of passive points, that is, points that should have only little influence on the ordination results. However, you can add points to eigenvector methods using predict functions with newdata. You can first perform an ordination without some species or sites, and then find scores for all points using your complete data as newdata. The predict functions are available for basic eigenvector methods in vegan (including decorana; for an up-to-date list use methods("predict")). You can also simulate passive points in R by giving low weights to rows and columns (this is the method used in software with passive points). For instance, the following command makes row 3 practically passive: dune[3,] <- 0.001*dune[3,].
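A minimal sketch of the predict approach, assuming vegan and its dune data set: the ordination is fitted without the first three sites, and scores are then calculated for all sites. Principal components analysis (rda) is used only as an example of an eigenvector method, and the arguments type = "wa" and model = "CA" are taken from the predict method for such results.

```r
library(vegan)
data(dune)

# Ordination without the first three sites
mod <- rda(dune[-(1:3), ])

# Site scores for the complete data, including the sites left out above
sc <- predict(mod, newdata = dune, type = "wa", model = "CA")

dim(sc)
head(sc[, 1:2])
```

The left-out sites get scores but had no influence on the axes. The predicted scores are not necessarily scaled in the same way as the scores shown in default plots, so check the scaling before combining them.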
You should define a class variable as an R factor, and vegan will handle it automatically with the formula interface. You can also define constrained ordination without the formula interface, but then you must code your class variables by hand.
R (and vegan) knows both unordered and ordered factors. Unordered factors are internally coded as dummy variables, but one redundant level is removed or aliased. With default contrasts, the removed level is the first one. Ordered factors are expressed as polynomial contrasts. Both of these contrasts are explained in standard R documentation.
You should never make your own dummy variables, but you should use standard R factors. R will internally change these factors into dummies in a consistent and correct way.
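The internal coding can be inspected in base R with model.matrix. The sketch below uses a small made-up data frame (env, with hypothetical variables management and moisture) and assumes the default contrast settings of a standard R installation.

```r
# Hypothetical environmental data with one unordered and one ordered factor
env <- data.frame(
  management = factor(c("BF", "HF", "NM", "SF", "BF", "NM")),
  moisture = factor(c("low", "mid", "high", "mid", "low", "high"),
                    levels = c("low", "mid", "high"), ordered = TRUE)
)

# Unordered factor: dummy variables, with the first level (BF) removed
mm_unordered <- model.matrix(~ management, data = env)
colnames(mm_unordered)

# Ordered factor: polynomial contrasts (linear .L and quadratic .Q terms)
mm_ordered <- model.matrix(~ moisture, data = env)
colnames(mm_ordered)
```

This is the kind of coding that R applies when a factor is used in a model formula, which is why hand-made dummy variables are unnecessary.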
Vegan uses standard R utilities for defining contrasts. The default in standard installations is to use treatment contrasts, but you can change the behaviour globally by setting options or locally by using keyword contrasts. See the R help pages and user manuals for details.
An aliased variable has no information because it can be expressed with the help of other variables. Such variables are automatically removed in constrained ordination in vegan. The aliased variables can be redundant levels of factors or whole variables.
alias gives the defining equations for aliased
variables. If you only want to see the names of aliased variables or
levels in solution
You can fit vectors or class centroids for aliased variables using
envfit function. The
envfit function uses weighted
fitting, and the fitted vectors are identical to the vectors in
You can constrain your permutations within strata or levels of factors. You can use stratified permutations in all functions that use permutation.
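What a stratified permutation does can be illustrated in base R. The sketch below is only a demonstration of the idea with a made-up blocking factor; it is not the code that vegan itself uses.

```r
set.seed(42)

# Hypothetical blocking factor: six sites in two strata
strata <- factor(c("a", "a", "a", "b", "b", "b"))
index <- seq_along(strata)

# Shuffle the site indices separately within each level of the factor
perm <- ave(index, strata, FUN = function(i) i[sample.int(length(i))])
perm

# Every site is replaced by a site from its own stratum
all(strata[perm] == strata)
```

An unrestricted permutation would instead be sample(index), which can move sites between strata.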
Vegan has an alternative permutation function permuted.index2, which allows restricted permutation designs for time series, line transects, spatial grids and blocking factors. Over time, the other functions that currently use the older permuted.index will be updated to use
permuted.index2, but at the moment it is used only in one
The default ordination plot function is intended for fast plotting and it is not very configurable. To use different plotting symbols, you should first create an empty ordination plot with plot(..., type="n"), and then add points or text to the created empty frame (here ... means other arguments you want to give to your plot command). The points and text commands are fully configurable, and allow different plotting symbols.
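As a sketch of this approach, assuming vegan with its dune and dune.env data sets, the following draws sites with symbols that depend on a factor and species as text. The choice of ordination method, symbols and colours is arbitrary.

```r
library(vegan)
data(dune)
data(dune.env)

ord <- rda(dune)

# Empty frame with the axes set up for the ordination
plot(ord, type = "n")

# Sites as points, with symbol and colour chosen by management class
points(ord, display = "sites",
       pch = as.integer(dune.env$Management),
       col = as.integer(dune.env$Management))

# Species as small text labels
text(ord, display = "species", cex = 0.7, col = "grey40")
```

Any graphical parameter accepted by the base R points and text functions can be varied in the same way.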
If there is a really high number of species or sites, the graphs often are congested and many labels are overwritten. It may be impossible to have complete readable graphics with some data sets. However, here are some tricks you can use:
plot(..., type="n"), if you are not satisfied with the default graph. (Here and below
... means other arguments you want to give to your plot command.)
Use identify for ordination graphics, if you do not need to see all labels. You may need to first create an empty plot using
plot(..., type="n"), if you are not satisfied with the default graph.
Use the orditorp function, which uses labels only if these can be added to a graph without overwriting other labels, and points otherwise, if you do not need to see all labels. You must first create an empty plot using
plot(..., type="n"), and then add labels or points with orditorp.
Use the orditkplot function, which lets you drag labels of points to better positions if you need to see all labels. Only one set of points can be used.
The plot functions allow you to zoom to a part of the graph using
xlim and ylim arguments to reduce clutter in congested areas.
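Two of these tricks are sketched below, assuming vegan and its dune data set. The axis limits are arbitrary values chosen for illustration.

```r
library(vegan)
data(dune)

ord <- cca(dune)

# Empty frame, then labels where they fit and points elsewhere
plot(ord, display = "species", type = "n")
shown <- orditorp(ord, display = "species")

# How many species were labelled with text rather than drawn as points?
sum(shown)

# Zoom into the centre of the graph to reduce clutter
plot(ord, display = "species", type = "n",
     xlim = c(-1, 1), ylim = c(-1, 1))
orditorp(ord, display = "species")
```

The interactive functions identify and orditkplot need a graphical session and are therefore not shown here.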
No. It may be possible to port TWINSPAN to vegan, but it is not among my top priorities. If anybody wants to try porting, I will be happy to help. TWINSPAN has a very permissive license, and it would be completely legal to port the function into R.
Some vegan functions, such as radfit, use the base R facility of family in maximum likelihood estimation. This allows the use of several alternative error distributions, among them "gaussian". The R family also defines the
deviance. You can see the equations for deviance with commands like
In general, deviance is −2 times the log-likelihood, shifted so that models with exact fit have zero deviance.
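This relation can be checked in base R. The sketch below uses the Poisson family and a small made-up data set (y and x are hypothetical), and compares the deviance reported by glm with the value calculated from log-likelihoods.

```r
# The deviance function stored in a family object
poisson()$dev.resids

# Hypothetical counts and a simple Poisson regression
y <- c(2, 5, 1, 8, 3, 7)
x <- 1:6
fit <- glm(y ~ x, family = poisson)

# Log-likelihood of the fitted model and of a model with exact fit
loglik_model <- sum(dpois(y, fitted(fit), log = TRUE))
loglik_exact <- sum(dpois(y, y, log = TRUE))

# Deviance as -2 times the log-likelihood, shifted by the exact-fit value
dev_by_hand <- -2 * (loglik_model - loglik_exact)

all.equal(dev_by_hand, deviance(fit))
```

A model that reproduced the observed counts exactly would have loglik_model equal to loglik_exact and hence zero deviance.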