Agglomerative Hierarchical Clustering Overview

Overview

Agglomerative hierarchical clustering is a bottom-up clustering method that produces clusters containing sub-clusters, which in turn contain sub-clusters, and so on. The classic example is species taxonomy. Gene expression data might also exhibit this hierarchical quality (e.g. neurotransmitter gene families). Agglomerative hierarchical clustering starts with each object (gene or sample) in its own cluster. Then, in each successive iteration, it agglomerates (merges) the closest pair of clusters according to some similarity criterion, until all of the data is in one cluster.

The hierarchy within the final cluster has the following properties:

Clusters generated in early stages are nested in those generated in later stages.
Clusters with different sizes in the tree can be valuable for discovery.

A Matrix Tree Plot visually demonstrates the hierarchy within the final cluster as a binary tree, in which each merge joins two branches.

Process

Assign each object to a separate cluster.
Evaluate all pairwise distances between clusters (distance metrics are described in Distance Metrics Overview).
Construct a distance matrix using the distance values.
Look for the pair of clusters with the shortest distance.
Remove the pair from the matrix and merge them.
Evaluate all distances from this new cluster to all other clusters, and update the matrix.
Repeat until the distance matrix is reduced to a single element.
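
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not GeneLinker's implementation: the function names (agglomerate, single_linkage), the Euclidean distance, and the choice of single linkage for the cluster-to-cluster distance are all assumptions made for the example.

```python
import math
from itertools import combinations

def single_linkage(a, b, points):
    """Assumed cluster distance: the closest pair of members (single linkage)."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)

def agglomerate(points, linkage=single_linkage):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    # Assign each object to a separate cluster (a tuple of point indices).
    clusters = [(i,) for i in range(len(points))]
    # Evaluate all pairwise distances and store them as the distance matrix,
    # here a dict keyed by the unordered pair of clusters.
    dist = {frozenset((a, b)): linkage(a, b, points)
            for a, b in combinations(clusters, 2)}
    merges = []
    while len(clusters) > 1:
        # Look for the pair of clusters with the shortest distance.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        d = dist[pair]
        # Remove the pair from the matrix and merge them.
        clusters.remove(a)
        clusters.remove(b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        merged = tuple(sorted(a + b))
        # Evaluate distances from the new cluster to all other clusters.
        for other in clusters:
            dist[frozenset((merged, other))] = linkage(merged, other, points)
        clusters.append(merged)
        merges.append((a, b, d))
    return merges

# Four hypothetical 2-D objects: two tight pairs far from each other.
for a, b, d in agglomerate([(0, 0), (0, 1), (5, 5), (5, 7)]):
    print(a, "+", b, "at distance", round(d, 3))
```

The returned merge history lists the nearest pairs first and the final, most distant merge last, which is the information a tree plot displays. Recomputing linkage distances from the raw points keeps the sketch short; a production implementation would normally update the matrix incrementally.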

Advantages

It can produce an ordering of the objects, which may be informative for data display.
Smaller clusters are generated along the way and remain visible in the tree, which may be helpful for discovery.
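
As a small illustration of the ordering mentioned above, the sketch below reads the leaves of a merge tree from left to right. The nested-tuple representation, the function name leaf_order, and the gene labels are hypothetical and chosen only for this example.

```python
def leaf_order(tree):
    """Return the objects of a binary merge tree in left-to-right order."""
    if isinstance(tree, tuple):
        left, right = tree
        return leaf_order(left) + leaf_order(right)
    return [tree]  # a single object (leaf)

# Hypothetical hierarchy: each tuple is one merge of two sub-clusters.
tree = (("geneA", "geneB"), ("geneC", ("geneD", "geneE")))
print(leaf_order(tree))
```

Objects that were merged early end up next to each other in this ordering, which is why it can be useful for arranging rows or columns in a display.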

Disadvantages

Objects cannot be relocated once they have been merged, so an object that was 'incorrectly' grouped at an early stage stays in that group. The result should be examined closely to ensure it makes sense.
Different distance metrics for measuring distances between clusters may generate different results. Performing multiple experiments and comparing the results is recommended to support the validity of the original results.
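
To make the second point concrete, the sketch below clusters the same five objects twice, changing only how the distance between two clusters is measured. It is an illustration under stated assumptions: the function name cluster_into, the one-dimensional data, and the two between-cluster measures (closest members, often called single linkage, and farthest members, often called complete linkage) are choices made for this example.

```python
import math
from itertools import combinations

def cluster_into(points, k, reduce_fn):
    """Merge the closest pair of clusters until k clusters remain.

    reduce_fn combines the member-to-member distances of two clusters:
    min gives single linkage, max gives complete linkage.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: reduce_fn(
                math.dist(points[i], points[j])
                for i in pair[0] for j in pair[1]
            ),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)

# Hypothetical objects on a line, with steadily widening gaps.
points = [(0.0,), (1.0,), (2.2,), (3.6,), (5.2,)]
print(cluster_into(points, 2, min))  # single linkage   -> [[0, 1, 2, 3], [4]]
print(cluster_into(points, 2, max))  # complete linkage -> [[0, 1], [2, 3, 4]]
```

With the closest-member measure the objects chain together one at a time, leaving the last object on its own; with the farthest-member measure the same data splits into two compact groups. Neither answer is wrong, which is why comparing several runs is worthwhile.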

Divisive Hierarchical Clustering

Divisive hierarchical clustering is a top-down clustering method and is less commonly used. It works in a similar way to agglomerative clustering but in the opposite direction. This method starts with a single cluster containing all objects, and then successively splits the resulting clusters until only clusters of individual objects remain. GeneLinker™ does not support divisive hierarchical clustering.