January, 2016

Why Classification

• Classification is an alternative to ordination: simplified description of the data
• Problem: World is not made of distinct classes
• Traditional tool of analysis, with several informal and subjective systems: Finnish forest and mire types, hierarchic classes of the Braun-Blanquet approach
• Here we only describe objective, numerical methods

Classification of Classification

• There is a huge number of classification methods – and you can always invent a new one
• Hierarchic methods have several levels of classification
• Agglomerative methods combine sampling units into classes, and classes into larger classes
• Divisive methods split data, and then split the splits
• Non-hierarchic methods perform classification at one level
• Often optimized at one level iteratively
• Probabilistic and fuzzy models: not a sharp classification, but a probability of belonging to each class
• Model-based clustering: assume there is a real class structure with multivariate random variability
• Bags of tricks combine several approaches to a (practical) tool

Cluster Analysis

1. Combine the two most similar sampling units (SUs) into a cluster
2. Proceed by combining the next most similar SUs or clusters
3. Continue until all SUs are combined into one tree
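The steps above can be sketched with base R alone; the data here are made up for illustration:

```r
## Toy example of the agglomerative steps (made-up data)
x <- matrix(c(1, 2,  2, 2,  8, 9,  9, 8,  5, 5), ncol = 2, byrow = TRUE)
d <- dist(x)                        # dissimilarities between the 5 SUs
cl <- hclust(d, method = "average")
cl$merge                            # the sequence of fusions, step by step
cl$height                           # dissimilarity level of each fusion
```

With $$n$$ SUs there are always $$n-1$$ fusions, and the fusion levels grow monotonically for this strategy.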

But It is a Tree, not a Classification

• All SUs are combined into a tree, and you can extract $$1 \ldots n$$ classes
library(vegan)              # vegdist and the dune data
data(dune)
cl <- hclust(vegdist(dune), "average")
groups <- cutree(cl, 4)     # extract 4 classes from the tree
plot(cl)
rect.hclust(cl, 4)          # draw the 4-class cut on the tree


Clustering Strategies

• How to define dissimilarity to classes?
• Nearest neighbour or Single linkage: the closest points
• Furthest neighbour or Complete linkage: the most different points
• Average linkage: average dissimilarity between clusters (or, in the centroid variant, the distance of group centroids)
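The strategies can be compared on the same data: all of them start by fusing the closest pair, but Complete Linkage ends at the full diameter of the data, whereas Single Linkage never fuses above the longest edge of the minimum spanning tree (made-up data):

```r
## Compare fusion heights of Single and Complete Linkage (made-up data)
set.seed(42)
x <- matrix(rnorm(40), ncol = 2)        # 20 random points
d <- dist(x)
h_single   <- hclust(d, "single")$height
h_complete <- hclust(d, "complete")$height
h_single[1] == h_complete[1]            # both start at min(d)
max(h_complete) == max(d)               # last complete fusion = diameter
max(h_single) <= max(h_complete)        # single stays at or below complete
```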


Properties of Basic Strategies

• Single Linkage chains: individual SUs are joined to growing clusters
• Finds discontinuities: one good property of classes
• Related to Minimum Spanning Tree: shortest route that joins all points together
• Complete Linkage makes compact clusters
• Minimizes the diameter of groups, or the longest possible distance within a cluster
• Groups do not grow heterogeneous, but remain small and compact
• Compactness is another good property of a classification, but it cannot always be achieved together with distinct classes
• Average Linkage is a compromise between these two
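The diameter property of Complete Linkage can be checked directly: every group obtained by cutting the tree has its longest within-group distance bounded by the level of the last merge kept in the cut (made-up data):

```r
## Complete Linkage bounds the diameter of cut groups (made-up data)
set.seed(1)
x <- matrix(rnorm(60), ncol = 2)          # 30 random points
d <- dist(x)
cl <- hclust(d, "complete")
gr <- cutree(cl, k = 4)
dm <- as.matrix(d)
## Diameter = longest within-group distance of each of the 4 groups
diam <- sapply(1:4, function(g) max(dm[gr == g, gr == g]))
## Each group was formed at a fusion no higher than the last kept merge
all(diam <= cl$height[length(cl$height) - 3])
```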

Trees and Distances

• The height of the common root for two SUs is their distance in a tree
• Trees approximate input dissimilarities
• Single Linkage tree distances are among nearest points in clusters
• Complete Linkage tree distances are among furthest points in clusters
• Average Linkage tree distances are among cluster centroids
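These tree distances are known as cophenetic distances, and base R can extract them for any tree; their correlation with the input dissimilarities tells how faithfully the tree approximates the data (made-up data):

```r
## Cophenetic distances: height of the common root for every pair of SUs
set.seed(7)
x <- matrix(rnorm(40), ncol = 2)     # 20 made-up points
d <- dist(x)
cl <- hclust(d, "average")
coph <- cophenetic(cl)               # tree distances between all SU pairs
cor(d, coph)                         # cophenetic correlation: tree vs. data
```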


Interpretation of Clusters

library(vegan)               # vegdist and the mite data
data(mite)
data(mite.env)
d <- vegdist(sqrt(mite))
cl <- hclust(d, "complete")
plot(cl, hang=-1, cex=0.8)
rect.hclust(cl, 4)

gr <- cutree(cl, 4)          # class membership for interpretation

Basic analysis

Plot results

boxplot(WatrCont ~ gr, data=mite.env, col="hotpink", lty=1, pch=16, notch=TRUE)


Order of Tree Leaves

• Trees have no order: you can take a branch and turn it around its root
• Similarity between points can only be interpreted topologically: through their common root
• Being a neighbour in a tree means nothing if the common root is far away
• Software arranges leaves arbitrarily or by simple criteria
• Often: more homogeneous sister (= longer branch down) to the left
• Tree can be reordered by turning around branches on their root
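Branch rotation can be demonstrated with base R dendrograms: reordering changes the leaf sequence but leaves every common root, and hence all tree distances, intact (using the built-in USArrests data):

```r
## Rotating branches changes leaf order but not tree distances
hc <- hclust(dist(USArrests[1:10, ]), "average")
dend <- as.dendrogram(hc)
dend2 <- reorder(dend, 10:1)     # rotate branches by new leaf weights
labels(dend)                     # leaf order may change ...
labels(dend2)
## ... but the cophenetic (tree) distances are exactly the same
m1 <- as.matrix(cophenetic(hc))
m2 <- as.matrix(cophenetic(as.hclust(dend2)))
all.equal(m1, m2[rownames(m1), colnames(m1)])
```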


Numerical Analysis

anova(lm(WatrCont ~ gr, data=mite.env))
## Analysis of Variance Table
##
## Response: WatrCont
##           Df Sum Sq Mean Sq F value    Pr(>F)
## gr         1 790535  790535  88.427 6.339e-14 ***
## Residuals 68 607917    8940
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Ordination and Clustering

mod <- metaMDS(d, trace = FALSE)
plot(mod, dis="si")
ordihull(mod, gr, col=1:4, draw="poly")
ordispider(mod, gr, col=1:4, label=TRUE)


Partitioning of Data

• Non-hierarchical partitioning classifies data into a specified number of groups
• The number of groups must be decided in advance
• Optimality criteria based on ANOVA ideas
• Commonly iterative methods:
1. Start from an initial (often random) classification
2. Move observations to the optimal group
3. Re-evaluate group means and move again
• Several methods and bags of tricks
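The iteration can be sketched in a few lines of base R: a bare-bones Lloyd-type loop on made-up data, not a replacement for the canned kmeans() function:

```r
## Bare-bones iterative partitioning (Lloyd-type sketch, made-up data)
set.seed(2)
x <- matrix(rnorm(100), ncol = 2)      # 50 made-up observations
k <- 3
## Index of the nearest centroid for every row of x
nearest <- function(x, centers)
  max.col(-sapply(seq_len(nrow(centers)),
                  function(j) colSums((t(x) - centers[j, ])^2)))
centers <- x[sample(nrow(x), k), ]     # 1. start from random centroids
for (i in 1:20) {
  gr <- nearest(x, centers)            # 2. move to the nearest centroid
  for (j in 1:k)                       # 3. re-evaluate group means
    if (any(gr == j))                  # (keep old centroid if group empty)
      centers[j, ] <- colMeans(x[gr == j, , drop = FALSE])
}
table(gr)
```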

$$K$$-means Clustering

• Allocate observations to $$K$$ groups
• Iterative: reallocate observations to better groups and re-evaluate centroids
• Hierarchic methods have a history: early decisions cannot be reconsidered to make better clusters at a certain level
• $$K$$-means clustering can be used to optimize the classification at a given level
• The classification criterion in $$K$$-means may not match that of the original hierarchic clustering – even though there are several variants
• You cannot add new leaves to a tree: you must construct a new tree
• $$K$$-means works by centroids, and you can allocate points to fixed centroids (in principle – but canned tools may be missing)
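Allocating new points to fixed centroids can be done by hand, since kmeans() returns the centroids; the new observations here are hypothetical:

```r
## Allocate hypothetical new points to fixed K-means centroids
set.seed(3)
x <- matrix(rnorm(100), ncol = 2)
km <- kmeans(x, centers = 4, nstart = 10)
xnew <- matrix(rnorm(10), ncol = 2)     # 5 hypothetical new observations
## Squared distance of every new point to every fixed centroid
d2 <- sapply(seq_len(nrow(km$centers)),
             function(j) colSums((t(xnew) - km$centers[j, ])^2))
newgr <- max.col(-d2)                   # allocate to the nearest centroid
newgr
```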

Optimize Clustering

library(vegan)
data(mite)
cl <- hclust(vegdist(sqrt(mite)))
km <- kmeans(sqrt(mite), 4)
## label tree leaves with the K-means classes
plot(cl, hang=-1, labels = fitted(km, "classes"), cex=0.7)
rect.hclust(cl, 4)

Optimizing $$K$$

plot(cascadeKM(decostand(mite, "hellinger"), 2, 20))


Fuzzy Clustering

library(cluster)               # fanny: fuzzy clustering
fc <- fanny(d, k=4, memb.exp=sqrt(2))
head(fc$membership)            ## Probability of membership
##        [,1]      [,2]       [,3]       [,4]
## 1 0.3918715 0.3618037 0.12316241 0.12316241
## 2 0.3705928 0.3798974 0.12475488 0.12475488
## 3 0.3864999 0.3978146 0.10784278 0.10784278
## 4 0.3754520 0.4383846 0.09308168 0.09308168
## 5 0.3678309 0.5166475 0.05776076 0.05776076
## 6 0.3534142 0.5619900 0.04229790 0.04229790
head(fc$clustering)            ## Crisp clustering
## 1 2 3 4 5 6
## 1 2 2 2 2 2
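The idea behind the membership values can be sketched with the standard fuzzy weighting used in fuzzy c-means: memberships are inversely related to the (squared) distances to the classes and sum to one for each point. This is a sketch of the principle only; fanny minimizes its own criterion, so its numbers will differ:

```r
## Fuzzy-c-means-style memberships from squared distances to classes
## (illustration of the principle, not fanny's exact algorithm)
memberships <- function(d2, m = 2) {
  w <- d2^(-1 / (m - 1))        # closer classes get larger weights
  w / rowSums(w)                # each row sums to 1
}
d2 <- matrix(c(1, 4,            # point 1: close to class 1
               4, 1,            # point 2: close to class 2
               2, 2),           # point 3: exactly between
             ncol = 2, byrow = TRUE)
round(memberships(d2), 3)       # rows: points; columns: classes
```

Larger membership exponents $$m$$ make the classification fuzzier; as $$m \to 1$$ it approaches a crisp classification.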
