January, 2016

- Classification is an alternative to ordination: a simplified description of the data
- Problem: the world is not made of distinct classes
- A traditional tool of analysis, with several informal, subjective systems: Finnish forest and mire types, the hierarchic classes of the Braun-Blanquet approach
- Here we describe only objective, numerical methods

- There is a *huge* number of classification methods – and you can always invent a new one
- **Hierarchic** methods have several levels of classification
- **Agglomerative** methods combine sampling units to classes and classes to classes
- **Divisive** methods split the data, and then split the splits (an agglomerative vs. divisive sketch follows this list)

- **Non-hierarchic** methods perform classification at one level
- Often optimized iteratively at one level

- **Probabilistic** and **fuzzy** models: not a sharp classification, but a probability of belonging to a class
- **Model-based clustering**: assume there is a real class structure with multivariate random variability
- Bags of tricks combine several approaches into a (practical) tool
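To make the agglomerative/divisive distinction concrete, here is a minimal sketch using the cluster package; the dune data, Bray–Curtis dissimilarity and average linkage are assumptions chosen only for the illustration.

```r
## Agglomerative (agnes) vs. divisive (diana) hierarchic clustering
## (a sketch; data set, dissimilarity and linkage are assumptions)
library(vegan)     # dune data, vegdist()
library(cluster)   # agnes(), diana()
data(dune)
d <- vegdist(dune)                    # Bray-Curtis dissimilarities
agg <- agnes(d, method = "average")   # merge units upwards into classes
div <- diana(d)                       # split the data, then split the splits
par(mfrow = c(1, 2))
plot(agg, which.plots = 2, main = "Agglomerative (agnes)")
plot(div, which.plots = 2, main = "Divisive (diana)")
```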

- Start by combining the two most similar sampling units
- Proceed by combining the next most similar units or clusters
- Continue until all SUs are combined into a **tree**

- All SUs are combined into a tree, and you can extract \(1 \dots n\) classes

```r
library(vegan)
data(dune)
cl <- hclust(vegdist(dune), "average")
groups <- cutree(cl, 4)
plot(cl)
rect.hclust(cl, 4)
```

## Hey Joe!

- How to define dissimilarity to classes? (the sketch after this list compares the choices)
- **Nearest neighbour** or **Single linkage**: the closest points
- **Furthest neighbour** or **Complete linkage**: the most different points
- **Average linkage**: distances of group centroids
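A minimal comparison of the three linkage methods on the same dissimilarities; the dune data and the Bray–Curtis index are assumptions carried over from the earlier example.

```r
## Single, complete and average linkage on the same dissimilarity matrix
## (a sketch; data set and dissimilarity index are assumptions)
library(vegan)
data(dune)
d <- vegdist(dune)
par(mfrow = c(1, 3))
for (method in c("single", "complete", "average"))
    plot(hclust(d, method), hang = -1, main = method)
```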


- **Single Linkage** chains: individual SUs are joined to growing clusters
- Finds **discontinuities**: one good property of classes
- Related to the Minimum Spanning Tree: the shortest route that joins all points together (see the sketch below)
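The connection to the minimum spanning tree can be illustrated with vegan's `spantree`; drawing the tree over an NMDS ordination of the same dissimilarities is only a display choice assumed for this sketch.

```r
## Minimum spanning tree of the dissimilarities, overlaid on an NMDS
## ordination (a sketch; the display choice is an assumption)
library(vegan)
data(dune)
d <- vegdist(dune)
mst <- spantree(d)                     # shortest route joining all points
ord <- metaMDS(d, trace = FALSE)
plot(ord, display = "sites", type = "t")
lines(mst, ord)                        # draw the MST links between sites
```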

- **Complete Linkage** makes compact clusters
- Minimizes the **diameter** of groups, or the longest possible distance within a cluster
- Groups do not grow heterogeneous, but remain small and compact
- One good property of classification, but it cannot be satisfied simultaneously with distinct classes

- **Average Linkage** is a compromise between these two


- The height of the common root of two SUs is their distance in the tree
- Trees *approximate* input dissimilarities (the cophenetic sketch after this list compares them)
- **Single Linkage** tree distances are among the **nearest** points in clusters
- **Complete Linkage** tree distances are among the **furthest** points in clusters
- **Average Linkage** tree distances are among cluster **centroids**
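Tree distances can be extracted with `cophenetic()` and compared against the input dissimilarities; using the dune data and the Pearson correlation (the cophenetic correlation) is an assumption of this sketch.

```r
## How well do tree ("cophenetic") distances approximate the input
## dissimilarities? (a sketch; data set and correlation are assumptions)
library(vegan)
data(dune)
d <- vegdist(dune)
for (method in c("single", "complete", "average")) {
    cl <- hclust(d, method)
    cat(method, "cophenetic correlation:",
        round(cor(d, cophenetic(cl)), 3), "\n")
}
```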


```r
library(vegan)
data(mite)
d <- vegdist(sqrt(mite))
cl <- hclust(d, "complete")
plot(cl, hang = -1, cex = 0.8)
rect.hclust(cl, 4)
```

```r
gr <- cutree(cl, 4)
```

Basic analysis

```r
data(mite.env)
boxplot(WatrCont ~ gr, data = mite.env, col = "hotpink", lty = 1,
        pch = 16, notch = TRUE)
```


- Trees have no order: you can take a branch and turn it around its root
- Similarity between points can only be interpreted topologically: through their common root
- Being a neighbour in a tree means nothing if the common root is far away

- Software arranges leaves arbitrarily or by simple criteria
- Often: more homogeneous sister (= longer branch down) to the left

- Trees can be **reordered** by turning branches around their roots (a sketch below)
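One way to reorder a tree in base R is through its dendrogram representation with `reorder()`; weighting the leaves by the water content of the sites is just an assumed example criterion.

```r
## Reorder the tree by turning branches around their roots; the leaves are
## weighted by an external variable (an assumption made for the example)
library(vegan)
data(mite, mite.env)
cl <- hclust(vegdist(sqrt(mite)), "complete")
den <- as.dendrogram(cl)
den <- reorder(den, mite.env$WatrCont, agglo.FUN = mean)
plot(den)
```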


```r
anova(lm(WatrCont ~ gr, data = mite.env))
```

```
## Analysis of Variance Table
## 
## Response: WatrCont
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## gr         1 790535  790535  88.427 6.339e-14 ***
## Residuals 68 607917    8940                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
mod <- metaMDS(d, trace = FALSE)
plot(mod, dis = "si")
ordihull(mod, gr, col = 1:4, draw = "poly")
ordispider(mod, gr, col = 1:4, label = TRUE)
```


- Non-hierarchical partitioning classifies the data into a specified number of groups
- The number of groups must be decided in advance
- Optimality criteria based on ANOVA ideas

- Commonly iterative methods (a minimal \(K\)-means sketch follows below)
- Start with some classification
- Move observations to the optimal group
- Re-evaluate group means and move again

- Several methods and bags of tricks
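A minimal \(K\)-means sketch of this iterative idea; the Hellinger transformation and the number of random starts are assumptions of the example (several starts guard against poor local optima).

```r
## Non-hierarchic partitioning with K-means: iterative reallocation of sites
## to K groups (transformation and nstart are assumptions of this sketch)
library(vegan)
data(mite)
km <- kmeans(decostand(mite, "hellinger"), centers = 4, nstart = 20)
table(km$cluster)    # group sizes
km$tot.withinss      # optimality criterion: total within-group sum of squares
```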

- Allocate observations to \(K\) groups
- Iterative: reallocate observations to better groups and re-evaluate centroids
- Hierarchic methods have a history: old decisions cannot be reconsidered to make better clusters at a certain level
- \(K\)-means clustering can be used to optimize the classification at a given level
- The classification criterion in \(K\)-means may not match that of the original hierarchic clustering – even though there are several variants
- You cannot add new leaves to a tree, but must construct a new tree
- \(K\)-means works by centroids, and you can allocate points to fixed centroids (in principle – but canned tools may be missing; see the sketch below)
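A minimal sketch of allocating observations to fixed centroids: each point is assigned to the nearest existing \(K\)-means centroid in Euclidean space. The split into "old" and "new" sites is an assumption made only for the illustration.

```r
## Allocate new points to fixed K-means centroids by the nearest Euclidean
## distance (the old/new split of sites is an assumption of this sketch)
library(vegan)
data(mite)
x <- decostand(mite, "hellinger")
old <- x[1:60, ]                      # sites used to fit the centroids
new <- x[61:70, ]                     # sites allocated afterwards
km <- kmeans(old, centers = 4, nstart = 20)
alloc <- apply(new, 1, function(z)
    which.min(colSums((t(km$centers) - z)^2)))
alloc
```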

```r
cl <- hclust(vegdist(sqrt(mite)))
km <- kmeans(sqrt(mite), 4)
plot(cl, hang = -1, labels = fitted(km, "classes"), cex = 0.7)
rect.hclust(cl, 4)
```

## Optimizing \(K\)

```r
plot(cascadeKM(decostand(mite, "hellinger"), 2, 20))
```


```r
library(cluster)
fc <- fanny(d, k = 4, memb.exp = sqrt(2))
head(fc$membership)   ## Probability of membership
```

```
##        [,1]      [,2]       [,3]       [,4]
## 1 0.3918715 0.3618037 0.12316241 0.12316241
## 2 0.3705928 0.3798974 0.12475488 0.12475488
## 3 0.3864999 0.3978146 0.10784278 0.10784278
## 4 0.3754520 0.4383846 0.09308168 0.09308168
## 5 0.3678309 0.5166475 0.05776076 0.05776076
## 6 0.3534142 0.5619900 0.04229790 0.04229790
```

```r
head(fc$clustering)   ## Crisp clustering
```

```
## 1 2 3 4 5 6 
## 1 2 2 2 2 2
```
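To see how the crisp fuzzy classification relates to the earlier hierarchic groups, a simple cross-tabulation can be used; this comparison is an addition of the sketch, not part of the original analysis.

```r
## Cross-tabulate the crisp fuzzy clustering against the earlier
## complete-linkage groups `gr` (the comparison itself is an assumption)
table(Fuzzy = fc$clustering, Hierarchic = gr)
```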
