January, 2016


Why Classification

  • Classification is an alternative to ordination: simplified description of the data
  • Problem: World is not made of distinct classes
  • Traditional tool of analysis, with several informal, subjective systems: Finnish forest and mire types, the hierarchic classes of the Braun-Blanquet approach
  • Here we only describe objective, numerical methods

Classification of Classification

  • There is a huge number of classification methods – and you can always invent a new one
  • Hierarchic methods have several levels of classification
    • Agglomerative methods combine sampling units to classes and classes to classes
    • Divisive methods split data, and then split the splits
  • Non-hierarchic methods perform classification at one level
    • Often optimized at one level iteratively
  • Probabilistic and fuzzy models: not a sharp classification, but a probability of belonging to each class
  • Model-based clustering: assumes there is a real class structure with multivariate random variability
  • Bags of tricks combine several approaches into a (practical) tool

Hierarchic Agglomerative Clustering

Cluster Analysis

  1. Start by combining the two most similar sampling units
  2. Proceed by combining the next most similar sampling units, or a sampling unit with a cluster, or a cluster with a cluster
  3. Continue until all SUs are combined into one tree

But It is a Tree, not a Classification

  • All SUs are combined into a tree, from which you can extract \(1 \ldots n\) classes
library(vegan)
data(dune)
cl <- hclust(vegdist(dune), "average")
plot(cl, hang = -1)   # rect.hclust() needs an existing tree plot
groups <- cutree(cl, 4)
rect.hclust(cl, 4)


Clustering Strategies

  • How should dissimilarity to a class be defined?
  • Nearest neighbour or Single linkage: distance between the closest points
  • Furthest neighbour or Complete linkage: distance between the most different points
  • Average linkage: average of all distances between the clusters
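The three strategies can be compared directly on the same dissimilarities. A minimal sketch, using the dune data and vegan as elsewhere in these notes:

```r
## Compare the three linkage strategies on the same dissimilarities
library(vegan)
data(dune)
d <- vegdist(dune)                     # Bray-Curtis dissimilarities
op <- par(mfrow = c(1, 3))
for (m in c("single", "complete", "average"))
    plot(hclust(d, m), hang = -1, main = m)
par(op)
```

The input is identical in all three trees; only the rule for cluster-to-cluster distance differs.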


All Sensible, but Different


Properties of Basic Strategies

  • Single Linkage chains: individual SUs are joined one by one to growing clusters
    • Finds discontinuities: one good property of classes
    • Related to the Minimum Spanning Tree: the shortest set of links that joins all points together
  • Complete Linkage makes compact clusters
    • Minimizes the diameter of a group, i.e. the longest possible distance within a cluster
    • Groups do not grow heterogeneous, but remain small and compact
    • One good property of a classification, but it cannot be satisfied simultaneously with distinctness of classes
  • Average Linkage is a compromise between these two

Minimum Spanning Tree

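A minimum spanning tree can be computed with vegan's `spantree` and drawn over an ordination of the same dissimilarities; a sketch with the dune data as above:

```r
## Minimum spanning tree: the shortest set of links joining all points
library(vegan)
data(dune)
d <- vegdist(dune)
tr <- spantree(d)
## overlay the tree on an NMDS of the same dissimilarities
ord <- metaMDS(d, trace = FALSE)
plot(ord, display = "sites", type = "p")
lines(tr, ord)
```

Long links in the spanning tree mark the discontinuities that Single Linkage can find.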

Trees and Distances

  • The height of the common root of two SUs is their distance in the tree
  • Tree distances approximate the input dissimilarities
  • In Single Linkage, tree distances are distances between the nearest points of clusters
  • In Complete Linkage, tree distances are distances between the furthest points of clusters
  • In Average Linkage, tree distances are average distances between clusters

Cophenetic Correlation
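Cophenetic correlation measures how faithfully the tree distances reproduce the input dissimilarities; a sketch comparing the three strategies on the dune data:

```r
## Correlation between input dissimilarities and tree (cophenetic) distances
library(vegan)
data(dune)
d <- vegdist(dune)
for (m in c("single", "complete", "average"))
    cat(m, ":", cor(d, cophenetic(hclust(d, m))), "\n")
```

Average Linkage usually gives the highest cophenetic correlation, as it is distorted toward neither extreme.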


Interpretation of Clusters


library(vegan)
data(mite, mite.env)
d <- vegdist(sqrt(mite))
cl <- hclust(d, "complete")
plot(cl, hang = -1, cex = 0.8)
rect.hclust(cl, 4)

gr <- cutree(cl, 4)

Basic analysis

Plot results

boxplot(WatrCont ~ gr, data=mite.env, col="hotpink", lty=1, pch=16, notch=TRUE)


Order of Tree Leaves

  • Trees have no inherent order: you can take any branch and turn it around its root
  • Similarity between points can only be interpreted topologically: through their common root
    • Being a neighbour in a tree means nothing if the common root is far away
  • Software arranges leaves arbitrarily or by simple criteria
    • Often: the more homogeneous sister (= longer branch down) to the left
  • A tree can be reordered by turning branches around their roots

Tree and a Gradient


Tree Reordered by a Gradient
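Such a reordering can be done with `reorder()` on a dendrogram, using an environmental gradient as leaf weights; a sketch with the mite data and water content as in the earlier slides:

```r
## Reorder leaves along a gradient: branches are turned on their roots so
## that the leaf order follows the weights as closely as possible
library(vegan)
data(mite, mite.env)
cl <- hclust(vegdist(sqrt(mite)), "complete")
den <- reorder(as.dendrogram(cl), mite.env$WatrCont, agglo.FUN = mean)
plot(den)
```

The tree topology and all tree distances are unchanged; only the leaf order differs.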


Numerical Analysis

anova(lm(WatrCont ~ gr, data=mite.env))
## Analysis of Variance Table
## Response: WatrCont
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## gr         1 790535  790535  88.427 6.339e-14 ***
## Residuals 68 607917    8940                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Ordination and Clustering

mod <- metaMDS(d, trace = FALSE)
plot(mod, display = "sites")
ordihull(mod, gr, col = 1:4, draw = "polygon")
ordispider(mod, gr, col = 1:4, label = TRUE)


Non-hierarchical classification

Partitioning of Data

  • Non-hierarchical partitioning classifies data into a specified number of groups
  • The number of groups must be decided in advance
    • Optimality criteria are based on ANOVA ideas
  • Commonly iterative methods:
    1. Start with some classification
    2. Move observations to the optimal group
    3. Re-evaluate group means and move again
  • Several methods and bags of tricks

\(K\)-means Clustering

  • Allocate observations to \(K\) groups
  • Iterative: reallocate observations to better groups and re-evaluate centroids
  • Hierarchic methods have a history: old decisions cannot be reconsidered to make better clusters at a certain level
  • \(K\)-means clustering can be used to optimize the classification at a given level
  • The classification criterion of \(K\)-means may not match that of the original hierarchic clustering – even though there are several variants
  • You cannot add new leaves to a tree: you must construct a new tree
  • \(K\)-means works with centroids, and you can allocate new points to fixed centroids (in principle – canned tools may be missing)
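Allocation to fixed centroids is just a nearest-centroid rule. A minimal base-R sketch; the helper `allocate` is hypothetical, not a canned tool:

```r
## Hypothetical helper: assign each row of x to its nearest centroid
allocate <- function(x, centers) {
    ## squared Euclidean distances from every observation to every centroid
    d2 <- apply(centers, 1, function(cen) rowSums(sweep(x, 2, cen)^2))
    apply(d2, 1, which.min)
}

library(vegan)
data(mite)
km <- kmeans(sqrt(mite), 4)
head(allocate(as.matrix(sqrt(mite)), km$centers))
```

The same rule allocates genuinely new observations without re-running the clustering.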

Optimize Clustering

cl <- hclust(vegdist(sqrt(mite)))
km <- kmeans(sqrt(mite), 4)
plot(cl, hang = -1, labels = fitted(km, "classes"), cex = 0.7)
rect.hclust(cl, 4)

Optimizing \(K\)

plot(cascadeKM(decostand(mite, "hellinger"), 2, 20))


Fuzzy Clustering

library(cluster)
fc <- fanny(d, k = 4, memb.exp = sqrt(2))
head(fc$membership) ## Probability of membership
##        [,1]      [,2]       [,3]       [,4]
## 1 0.3918715 0.3618037 0.12316241 0.12316241
## 2 0.3705928 0.3798974 0.12475488 0.12475488
## 3 0.3864999 0.3978146 0.10784278 0.10784278
## 4 0.3754520 0.4383846 0.09308168 0.09308168
## 5 0.3678309 0.5166475 0.05776076 0.05776076
## 6 0.3534142 0.5619900 0.04229790 0.04229790
head(fc$clustering) ## Crisp clustering
## 1 2 3 4 5 6 
## 1 2 2 2 2 2