Hierarchical Clustering
Introduction
Hierarchical clustering is an unsupervised learning method used to group similar objects into clusters. Unlike k-means clustering, hierarchical clustering does not require the number of clusters to be specified in advance. Instead, it builds a hierarchy of clusters that can be visualized as a dendrogram. In this lecture, we will learn how to perform hierarchical clustering in R, including model building, evaluation, and visualization.
Key Concepts
1. What is Hierarchical Clustering?
Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters. There are two main types of hierarchical clustering:
- Agglomerative (bottom-up): Each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
- Divisive (top-down): All observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
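As a sketch of the two approaches side by side: base R's hclust() is agglomerative, while the diana() function from the cluster package (shipped with standard R installations) implements the divisive approach. The data and variable names here are illustrative.

```r
# Agglomerative vs. divisive clustering on the same toy data
library(cluster)  # provides diana(); ships with standard R installations

set.seed(123)
m <- matrix(rnorm(40), ncol = 2)  # 20 points in 2 dimensions

# Agglomerative: base R's hclust() on a distance matrix
agg <- hclust(dist(m))

# Divisive: cluster::diana() accepts the data (or a distance matrix) directly
div <- diana(m)

# Both produce a hierarchy; as.hclust() converts the diana result so that
# cutree() can cut either tree into k groups for comparison
table(cutree(agg, k = 2), cutree(as.hclust(div), k = 2))
```

On small, well-separated data the two approaches often agree, but they can differ because one builds the tree from the leaves up and the other from the root down.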
2. Linkage Criteria
The way clusters are merged or split is determined by the linkage criteria. Common linkage criteria include:
- Single linkage: Minimum distance between points in two clusters.
- Complete linkage: Maximum distance between points in two clusters.
- Average linkage: Average distance between points in two clusters.
- Ward’s method: Minimizes the total within-cluster variance.
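To see how this choice matters in practice, here is a small sketch that cuts the same distance matrix into three clusters under each linkage criterion (note that in hclust(), Ward's method is spelled "ward.D2"; the data here is illustrative):

```r
# Comparing linkage criteria on the same distance matrix
set.seed(42)
pts <- matrix(rnorm(60), ncol = 2)  # 30 points in 2 dimensions
d <- dist(pts)

for (m in c("single", "complete", "average", "ward.D2")) {
  hc <- hclust(d, method = m)
  k3 <- cutree(hc, k = 3)
  cat(m, "- cluster sizes:", sort(table(k3), decreasing = TRUE), "\n")
}
```

Single linkage tends to produce "chained" clusters of very uneven size, while complete linkage and Ward's method tend to produce more compact, balanced groups.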
Performing Hierarchical Clustering in R
1. Installing Required Packages
We will use the stats and ggplot2 packages for hierarchical clustering and visualization. The stats package is part of base R, so only ggplot2 needs to be installed.
# Installing required packages
install.packages("ggplot2")
2. Data Preparation
Before clustering, we need to prepare the data, which may include scaling the features.
# Loading the required packages
library(ggplot2)
# Creating a sample dataset
set.seed(123)
data <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100)
)

# Scaling the data
scaled_data <- scale(data)
print(head(scaled_data))
3. Building the Model
You can build a hierarchical clustering model using the hclust() function.
# Computing the distance matrix
dist_matrix <- dist(scaled_data)

# Performing hierarchical clustering using complete linkage
hc_model <- hclust(dist_matrix, method = "complete")
# Plotting the dendrogram
plot(hc_model, main = "Hierarchical Clustering Dendrogram", xlab = "", sub = "", cex = 0.9)
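If helpful, base R's rect.hclust() can outline a chosen number of clusters directly on the dendrogram. A self-contained sketch (the data and variable names are illustrative):

```r
# Standalone sketch: dendrogram with 3 clusters outlined
set.seed(123)
d <- dist(scale(matrix(rnorm(60), ncol = 2)))
hc <- hclust(d, method = "complete")

# Draw the dendrogram, then overlay one box per cluster
plot(hc, main = "Dendrogram with 3 clusters outlined", xlab = "", sub = "")
rect.hclust(hc, k = 3, border = c("red", "blue", "darkgreen"))
```

This is a quick visual way to judge whether a given k corresponds to a natural cut of the tree before committing to it with cutree().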
4. Cutting the Dendrogram
To create clusters, you need to cut the dendrogram at a specified height.
# Cutting the dendrogram to create 3 clusters
clusters <- cutree(hc_model, k = 3)

# Adding cluster labels to the original data
data$cluster <- as.factor(clusters)
print(head(data))
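As noted above, the dendrogram can also be cut at a specified height rather than into a fixed number of clusters: cutree() accepts h = instead of k =. A self-contained sketch (the height value 3 is illustrative, not a recommendation):

```r
# Cutting the tree at a fixed height rather than a fixed number of clusters
set.seed(123)
scaled <- scale(data.frame(x1 = rnorm(100), x2 = rnorm(100)))
hc <- hclust(dist(scaled), method = "complete")

# Every merge that happens above height 3 is cut, so the number of
# clusters is determined by the tree, not chosen in advance
clusters_by_height <- cutree(hc, h = 3)
table(clusters_by_height)  # cluster sizes depend on the chosen height
```

Cutting by height is useful when a distance threshold is meaningful for your data and you want the number of clusters to follow from it.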
5. Visualizing the Clusters
You can visualize the clusters using a scatter plot.
# Plotting the clusters
ggplot(data, aes(x = x1, y = x2, color = cluster)) +
geom_point() +
labs(title = "Hierarchical Clustering", x = "Feature 1", y = "Feature 2")
Example: Comprehensive Hierarchical Clustering Analysis
Here’s a comprehensive example of performing hierarchical clustering analysis in R.
# Loading the required packages
library(ggplot2)
# Creating a sample dataset
set.seed(123)
data <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100)
)

# Scaling the data
scaled_data <- scale(data)

# Computing the distance matrix
dist_matrix <- dist(scaled_data)

# Performing hierarchical clustering using complete linkage
hc_model <- hclust(dist_matrix, method = "complete")
# Plotting the dendrogram
plot(hc_model, main = "Hierarchical Clustering Dendrogram", xlab = "", sub = "", cex = 0.9)
# Cutting the dendrogram to create 3 clusters
clusters <- cutree(hc_model, k = 3)

# Adding cluster labels to the original data
data$cluster <- as.factor(clusters)
# Plotting the clusters
ggplot(data, aes(x = x1, y = x2, color = cluster)) +
geom_point() +
labs(title = "Hierarchical Clustering", x = "Feature 1", y = "Feature 2")
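The introduction mentions evaluation; one common check, sketched here with silhouette() from the cluster package (shipped with standard R installations), scores how well each point sits in its assigned cluster, with average widths near 1 indicating tight, well-separated clusters:

```r
# Evaluating cluster quality with the average silhouette width
library(cluster)  # provides silhouette()

set.seed(123)
scaled <- scale(data.frame(x1 = rnorm(100), x2 = rnorm(100)))
d <- dist(scaled)
hc <- hclust(d, method = "complete")

# Compare a few candidate values of k on the same tree
for (k in 2:5) {
  sil <- silhouette(cutree(hc, k = k), d)
  cat("k =", k, " average silhouette width:",
      round(mean(sil[, "sil_width"]), 3), "\n")
}
```

Picking the k with the highest average silhouette width is one simple, widely used heuristic; on purely random data like this sample, all values will be low.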
Summary
In this lecture, we covered how to perform hierarchical clustering in R, including building the model, cutting the dendrogram, and visualizing the results. Hierarchical clustering is a powerful tool for grouping similar objects without needing to specify the number of clusters in advance.
Further Reading
For more detailed information, see R's built-in documentation for hclust(), dist(), and cutree() (e.g. ?hclust), as well as the ggplot2 documentation.
Call to Action
If you found this lecture helpful, make sure to check out the other lectures in the ML R series. Happy coding!