This document contains answers to some of the most frequently asked questions about the R package vegan. This version is dated 2008-01-29 11:07:55 +0200 (Tue, 29 Jan 2008).
This work is licensed under the Creative Commons Attribution 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA. Copyright © 2007 Jari Oksanen
Vegan is an R package for community ecologists. It contains most of the multivariate analyses needed in analysing ecological communities, tools for diversity analysis, and other potentially useful functions. Vegan is not self-contained: it must be run under the R statistical environment, and it also depends on many other R packages. Vegan is free software distributed under the GPL2 license.
R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files.
R has a home page at http://www.R-project.org/. It is free software distributed under a GNU-style copyleft, and an official part of the GNU project (“GNU S”).
Both R and the latest release version of vegan can be obtained through CRAN. The unstable development version of vegan can be obtained through R-Forge.
Some vegan functions depend on packages MASS, mgcv,
cluster and lattice. These all are recommended standard
R packages that should be available in every R installation. In
addition, some vegan functions require non-standard R packages.
Vegan declares these packages only as suggested ones, and you can install vegan
and use most of its functions without these packages. The
non-standard packages needed by some vegan functions are:
ellipse
is needed by ordiellipse
scatterplot3d
is needed by ordiplot3d
rgl
is needed by ordirgl
tcltk
is needed by orditkplot
CRAN Task Views include entries like Environmetrics, Multivariate
and Spatial that describe several useful packages and
functions. If you install the R package ctv, you can
inspect Task Views from your R session, and automatically install sets
of the most important packages.
Vegan is a fully documented R package with standard help pages. These
are the most authoritative sources of documentation. Vegan package ships
with other documents which can be read with vegandocs command
(documented in the vegan help). The documents included in the vegan
package are
ChangeLog.
FAQ-vegan.pdf.
intro-vegan.pdf.
diversity-vegan.pdf.
decision-vegan.pdf.
varpart (partitioning.pdf).
Web documents outside the package include:
Roeland Kindt has made the package BiodiversityR which provides a
GUI for vegan. The package is available at
CRAN.
It is not a mere GUI for vegan, but adds some new functions and
complements vegan functions in order to provide a
workbench for biodiversity analysis. You can install BiodiversityR using
install.packages("BiodiversityR") or the graphical package
management menu in R. The GUI works on Windows, MacOS X and Linux.
Use command citation("vegan") in R to see the recommended
citation to be used in publications.
From version 1.10-0, vegan is developed at R-Forge and there is a general progression of version numbers mixed with stable (at CRAN) and devel versions (at R-Forge).
Up to versions 1.8-7 and 1.9-34, vegan version numbers were of type x.y-z, where number y is even for stable release versions at CRAN and odd for unstable release versions at my personal homepage. Version 1.8-8 was a backport of bug fixes from the 1.10 series.
In general, you do not need to build vegan from sources, because binary builds of release versions are available through CRAN for Windows and MacOS X. If you use some other operating system, or want to use unstable devel versions, you may have to use source packages. Vegan is a standard R package, and can be built as instructed in the R documentation. Vegan contains source files in C and FORTRAN, and you need appropriate compilers (which may need more work in Windows and MacOS X).
R-Forge runs daily
tests on the devel package, and if they pass, it builds a source package and
Windows binaries. You can install those packages within R with the command
install.packages("vegan",
repos="http://r-forge.r-project.org/"). However, MacOS X binaries
are not available from R-Forge.
If you think you have found a bug in vegan, you should report it to me. The bug report should be detailed enough that I can correct the bug. To correct a bug, I should be able to reproduce the buggy behaviour. Preferably, you should send me an example that causes the bug. If it needs a data set that is not available in R, you should send me a minimal data set as well. You should also paste the output or error message in your message, and tell me which version of vegan you used.
Bug reports are welcome: they are the only way to make vegan non-buggy.
Please note that you should not send bug reports to R mailing lists, since vegan is not a standard R package.
There also is a bug reporting tool at R-Forge, but you need to register as a site user to report bugs (this is site policy).
It is not necessarily a bug if some function gives different
results than you expect: That may be a deliberate design decision. It
may be useful to check the documentation of the function to see what
was the intended behaviour. It may also happen that the function has an
argument to switch the behaviour to match your expectation. For
instance, function vegdist always calculates quantitative
indices (when this is possible). If you expect it to calculate a
binary index, you should use argument binary = TRUE.
Vegan is dependent on user contribution. All feedback is welcome. If you have a problem with vegan, it may be as simple as incomplete documentation, and I'll do my best to improve the documents.
Feature requests also are welcome, but they are not necessarily fulfilled. A new feature will be added if it is easy to do and it looks useful to me or in general.
Contributed code and functions are welcome and more certain to be included than mere requests. However, not all functions will be added: I must judge them to be suitable for vegan. I also audit the code, and typically I edit the code in vegan style for easier maintenance. All included contributions will be credited. You can easily see that many vegan functions were contributed by other people, and they are listed as authors in the documentation.
Yes. Most vegan methods can handle binary data or cover abundance data. Most statistical tests are based on permutation, and do not make distributional assumptions. There are some methods (mainly in diversity analysis) that need count data. These methods check that input data are integers, but they may be fooled by cover class data.
Most commonly the reason is that other software uses presence–absence
data whereas vegan uses quantitative data. Usually vegan indices are
quantitative, but you can use argument binary = TRUE to make them
presence–absence. However, the index name is the same in both cases,
although different names usually occur in literature. For instance,
Jaccard index actually refers to the binary index, but vegan uses
name "jaccard" for the quantitative index, too.
Another reason may be that the indices are indeed defined differently, because people use the same names for different indices.
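The difference between the quantitative and the presence–absence form of an index can be checked by hand. The following base R sketch assumes the definitions given in the vegdist help page (Bray-Curtis as sum|x - y| / sum(x + y), and Jaccard as 2B/(1+B), where B is the Bray-Curtis dissimilarity). The two abundance vectors and the helper functions bray and jaccard are made up for illustration.

```r
# Two hypothetical sites with abundances of four species (made-up numbers)
x <- c(5, 0, 3, 1)
y <- c(1, 2, 0, 1)

# Bray-Curtis dissimilarity and the Jaccard form derived from it
bray <- function(a, b) sum(abs(a - b)) / sum(a + b)
jaccard <- function(a, b) {
  B <- bray(a, b)
  2 * B / (1 + B)
}

jac_quant  <- jaccard(x, y)          # quantitative version
jac_binary <- jaccard(x > 0, y > 0)  # presence-absence version

print(c(quantitative = jac_quant, binary = jac_binary))
```

The two values differ (9/11 against 0.5) although both go under the name "jaccard". In vegan the corresponding calls on a community matrix would be vegdist(comm, "jaccard") and vegdist(comm, "jaccard", binary = TRUE).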
You can use argument zerodist = "add" in metaMDS or
metaMDSdist to handle zero dissimilarities. With this argument,
zero dissimilarities are replaced with a small positive value, and they
can be handled in isoMDS. This is a kluge, and some people do
not like this. A more principled solution is to remove duplicate sites
using R command unique. However, after some standardizations or
with some dissimilarity indices, originally non-unique sites can have
zero dissimilarity, and you have to resort to the kluge (or work
harder with your data).
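Removing duplicate sites with unique can be sketched as follows. The small community matrix is hypothetical, and Euclidean distance from base R dist is used only to keep the example free of other packages.

```r
# Hypothetical community matrix: sites "a" and "b" are identical
comm <- rbind(a = c(1, 0, 2),
              b = c(1, 0, 2),
              c = c(0, 3, 1))

any(dist(comm) == 0)         # TRUE: duplicated sites give a zero distance

comm_unique <- unique(comm)  # drops repeated rows, keeps the first copy
any(dist(comm_unique) == 0)  # FALSE
```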
Function capscale regularly and normally gives warnings of
negative eigenvalues. These warnings are harmless, and capscale
will ignore the axes with negative eigenvalues. The warnings are
generated by the underlying function cmdscale for metric
multidimensional scaling (a.k.a. principal coordinates analysis). The
metric MDS assumes that dissimilarities are metric, but most
ecologically useful indices are semimetric. The warnings only concern
the very last minor axes, and the axes with negative eigenvalues will
be ignored in capscale. If the warnings are
disturbing, you can use argument add = TRUE in capscale
which implements “correction method 2” of Legendre & Legendre (1998,
p. 434) in cmdscale.
You can get a warning about negative eigenvalues also with metric
indices if you have rank-deficient data. This happens, for instance,
when number of species (columns) is lower than number of sites (rows),
or if some sites are linear combinations of other sites. You can find
the rank of the data using, for instance, vegan function rda
which is identical to capscale with Euclidean distance.
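How negative eigenvalues arise, and how an additive constant removes them, can be seen directly in base R cmdscale. This is a minimal sketch with three hypothetical sites whose dissimilarities break the triangle inequality. It illustrates the behaviour of cmdscale only and is not taken from capscale itself.

```r
# Hypothetical dissimilarities among three sites that break the
# triangle inequality: d(1,3) = 3 > d(1,2) + d(2,3) = 2
d <- as.dist(matrix(c(0, 1, 3,
                      1, 0, 1,
                      3, 1, 0), nrow = 3))

pco <- cmdscale(d, k = 1, eig = TRUE)
pco$eig       # the last eigenvalue is negative

pco_add <- cmdscale(d, k = 1, eig = TRUE, add = TRUE)
pco_add$ac    # additive constant applied to the dissimilarities
pco_add$eig   # no clearly negative eigenvalues remain
```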
In general, vegan does not directly give any statistics on the “variance explained” by ordination axes or by the constrained axes. This is a design decision: I think this information is normally useless and often misleading. In community ordination, the goal typically is not to explain the variance, but to find the “gradients” or main trends in the data. The “total variation” often is meaningless, and all proportions of meaningless values also are meaningless. Often a better solution explains a smaller part of “total variation”. For instance, in unstandardized principal components analysis most of the variance is generated by a small number of most abundant species, and they are easy to “explain” because the data really are not very multivariate. If you standardize your data, all species are equally important. The first axes explain much less of the “total variation”, but now they explain all species equally, and results typically are much more useful for the whole community. Correspondence analysis uses another measure of variation (which is not variance), and again it typically explains a “smaller proportion” with a better result. Detrended correspondence analysis and nonmetric multidimensional scaling do not even try to “explain” the variation, but use other criteria. All methods are incommensurable, and it is impossible to compare methods using “explanation of variation”.
If you still want to get “explanation of variation” (or a deranged editor requests that from you), it is possible to get this information for some methods:
rda, cca and capscale give the variation
of conditional (partialled), constrained (canonical) and residual
components, but you must calculate the proportions by hand. The
summary gives the contributions of the axes.
Function goodness gives the same statistics for individual
species or sites (species are unavailable with capscale). In
addition, there is a special function
varpart for unbiased partitioning of variance between up to four
separate components in redundancy analysis.
Detrended correspondence analysis (function decorana).
The total amount of variation is unknown and undefined in detrended
correspondence analysis, and therefore proportions from total also are
unknown and undefined. DCA is not a method for
decomposition of variation, and therefore these proportions would not
make sense either.
stressplot displays the
nonlinear fit and gives this statistic.
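For rda, cca and capscale, calculating the proportions by hand amounts to dividing the component totals by the total variation. The sketch below assumes that vegan is installed and uses the component names documented in the cca.object help page (tot.chi, CCA$tot.chi, CA$tot.chi). The model with the single constraint A1 from the dune.env data is chosen only for illustration.

```r
library(vegan)
data(dune, dune.env)

# Constrained ordination of the dune meadow data on one variable (A1)
mod <- rda(dune ~ A1, data = dune.env)

total       <- mod$tot.chi      # total variation (inertia)
constrained <- mod$CCA$tot.chi  # variation of the constrained component
residual    <- mod$CA$tot.chi   # variation of the residual component

print(c(constrained = constrained / total, residual = residual / total))
```

With no conditioning variables in the model, the two proportions sum to one.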
Vegan does not have a concept of passive points, or a point that
should have only little influence on the ordination results. However, you can
add points to eigenvector methods using predict functions with
newdata. You can first perform an ordination without some
species or sites, and then you can find scores for all points
using your complete data as newdata. The predict
functions are available for basic eigenvector methods in vegan
(cca, rda, decorana, for an up-to-date list use
methods("predict")). You also can simulate the passive points in R by using
low weights for rows and columns (this is the method used in software
with passive points). For instance, the following command makes row 3
“passive”: dune[3,] <- 0.001*dune[3,].
You should define a class variable as an R factor, and vegan will
automatically handle it with the formula interface. You can also define
constrained ordination without the formula interface, but then you must
code your class variables by hand.
R (and vegan) knows both unordered and ordered factors. Unordered factors are internally coded as dummy variables, but one redundant level is removed or aliased. With default contrasts, the removed level is the first one. Ordered factors are expressed as polynomial contrasts. Both of these contrasts are explained in standard R documentation.
You should never make your own dummy variables, but you should use standard R factors. R will internally change these factors into dummies in a consistent and correct way.
vegan uses standard R utilities for defining
contrasts. The default in standard installations is to use treatment
contrasts, but you can change the behaviour globally by setting
options or locally by using keyword contrasts. Please
check the R help pages and user manuals for details.
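The coding that R applies to factors can be inspected with base R model.matrix. The sketch below uses two hypothetical factors and assumes the default contrasts of a standard installation. Note that contrasts.arg is the argument name in model.matrix. Modelling functions usually call the corresponding argument contrasts.

```r
# Hypothetical class variable with four levels
mgmt <- factor(c("BF", "HF", "NM", "SF", "HF"))

# Default treatment contrasts: the first level (BF) is dropped
colnames(model.matrix(~ mgmt))

# Ordered factor: coded with polynomial contrasts (.L, .Q)
moisture <- factor(c("low", "mid", "high", "mid"),
                   levels = c("low", "mid", "high"), ordered = TRUE)
colnames(model.matrix(~ moisture))

# Local change of contrasts for one factor
colnames(model.matrix(~ mgmt, contrasts.arg = list(mgmt = "contr.sum")))
```

A global change would use options(contrasts = c("contr.sum", "contr.poly")), where the first element applies to unordered and the second to ordered factors.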
An aliased variable has no information because it can be expressed with the help of other variables. Such variables are automatically removed in constrained ordination in vegan. The aliased variables can be redundant levels of factors or whole variables.
Vegan function alias gives the defining equations for aliased
variables. If you only want to see the names of aliased variables or
levels in solution sol, write sol$CCA$alias.
You can fit vectors or class centroids for aliased variables using
envfit function. The envfit function uses weighted
fitting, and the fitted vectors are identical to the vectors in
correspondence analysis.
You can constrain your permutations within strata or levels of
factors. You can use stratified permutations in all vegan
functions that use permutation, such as adonis, anosim,
anova.cca, mantel, mrpp, envfit and
protest.
Vegan has an alternative permutation function permuted.index2
which allows restricted permutation designs for time series, line
transects, spatial grids and blocking factors. Over time, the other
functions that currently use the older permuted.index will be updated
to use permuted.index2, but at the moment it is used only in one
pilot function.
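What permuting within strata means can be sketched in a few lines of base R. This only illustrates the idea and is not the code vegan uses internally. The strata factor and the variable names are hypothetical.

```r
set.seed(1)

# Hypothetical blocking factor for eight observations
strata <- factor(c("a", "a", "a", "b", "b", "c", "c", "c"))
idx <- seq_along(strata)

# Shuffle the observation indices separately within each level of strata
perm <- ave(idx, strata, FUN = function(i) i[sample.int(length(i))])

perm
all(strata[perm] == strata)  # TRUE: no observation leaves its stratum
```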
The default ordination plot function is intended for fast
plotting and it is not very configurable. To use different plotting
symbols, you should first create an empty ordination plot with
plot(..., type="n"), and then add points or text to
the created empty frame (here ... means other arguments you want
to give to your plot command). The points and text
commands are fully configurable, and allow different plotting symbols
and characters.
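A minimal sketch of this two-step approach, assuming vegan is installed and using the dune data that ships with it. The plotting symbols, colours and sizes are arbitrary choices.

```r
library(vegan)
data(dune)

ord <- rda(dune)       # any ordination result would do
plot(ord, type = "n")  # empty frame with axes only

# Sites as filled circles, species as small red labels
points(ord, display = "sites", pch = 21, bg = "grey")
text(ord, display = "species", cex = 0.7, col = "red")
```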
If there is a really high number of species or sites, the graphs often are congested and many labels are overwritten. It may be impossible to have complete readable graphics with some data sets. However, here are some tricks you can use:
plot(..., type="n"), if you are not satisfied with the default
graph. (Here and below ... means other arguments you want
to give to your plot command.)
Use identify
for ordination graphics, if you do not need to see all labels. You may
need to first create an empty plot using plot(..., type="n"), if
you are not satisfied with the default graph.
Use the orditorp function that uses labels only if these can be
added to a graph without overwriting other labels, and points otherwise,
if you do not need to see all labels. You must first create an empty
plot using plot(..., type="n"), and then add labels or points
with orditorp.
Use the orditkplot function that lets you drag
labels of points to better positions if you need to see all labels. Only
one set of points can be used.
plot functions allow you to zoom to a part of the
graph using xlim and ylim arguments to reduce clutter in
congested areas.
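Two of these tricks, orditorp and zooming with xlim and ylim, can be combined as in the following sketch. It assumes vegan is installed. The data set and the axis limits are arbitrary and should be adapted to your own ordination.

```r
library(vegan)
data(dune)

ord <- cca(dune)

# Labels where they fit without overlap, points elsewhere
plot(ord, display = "species", type = "n")
orditorp(ord, display = "species")

# Zoom into the central part of the same ordination
plot(ord, display = "species", type = "n", xlim = c(-1, 1), ylim = c(-1, 1))
orditorp(ord, display = "species")
```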
No. It may be possible to port TWINSPAN to vegan, but it is not among my top priorities. If anybody wants to try porting, I will be happy to help. TWINSPAN has a very permissive license, and it would be completely legal to port the function into R.
Some vegan functions, such as radfit, use the base R facility of
family in maximum likelihood estimation. This allows use of
several alternative error distributions, among them "poisson"
and "gaussian". The R family also defines the
deviance. You can see the equations for deviance with commands like
poisson()$dev or gaussian()$dev.
In general, deviance is minus 2 times the log-likelihood, shifted so that models with an exact fit have zero deviance.
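This relation can be verified numerically for the Poisson family. The sketch below uses made-up counts and fitted values. dev.resids is the full name of the family component that poisson()$dev abbreviates.

```r
# Hypothetical observed counts and fitted values
y  <- c(2, 5, 1, 7)
mu <- c(3, 4, 2, 6)
wt <- rep(1, length(y))

# Deviance from the family object: sum of the unit deviances
dev_pois <- sum(poisson()$dev.resids(y, mu, wt))

# The same value from log-likelihoods: twice the difference between
# the exact-fit model (fitted values equal to y) and the fitted model
loglik <- function(m) sum(dpois(y, lambda = m, log = TRUE))
dev_by_hand <- 2 * (loglik(y) - loglik(mu))

all.equal(dev_pois, dev_by_hand)       # TRUE
sum(poisson()$dev.resids(y, y, wt))    # 0 for an exact fit
sum(gaussian()$dev.resids(y, mu, wt))  # residual sum of squares
```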