Workshop Week 10
COMP20008 2021S2 Clustering
Q1: Consider the 1-dimensional data set with 10 data points {1, 2, 3, ..., 10}. Show the iterations of the k-means algorithm using Euclidean distance when k = 2 and the random seeds are initialized to {1, 2}.
Iteration 1 Data points: [ 1 2 3 4 5 6 7 8 9 10]
Assignments: [0, 1, 1, 1, 1, 1, 1, 1, 1, 1] Centroids: [1.0, 6.0]
In the assignments, 0 means cluster 1 and 1 means cluster 2.
Iteration 2 Data points: [ 1 2 3 4 5 6 7 8 9 10]
Assignments: [0, 0, 0, 1, 1, 1, 1, 1, 1, 1] Centroids: [2.0, 7.0]
Iteration 3 Data points: [ 1 2 3 4 5 6 7 8 9 10]
Assignments: [0, 0, 0, 0, 1, 1, 1, 1, 1, 1] Centroids: [2.5, 7.5]
Iteration 4 Data points: [ 1 2 3 4 5 6 7 8 9 10]
Assignments: [0, 0, 0, 0, 0, 1, 1, 1, 1, 1] Centroids: [3.0, 8.0]
Iteration 5 Data points: [ 1 2 3 4 5 6 7 8 9 10]
Assignments: [0, 0, 0, 0, 0, 1, 1, 1, 1, 1] Centroids: [3.0, 8.0]
The assignments and centroids are unchanged from iteration 4, so the algorithm has converged.
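The iterations above can be reproduced with a short sketch in plain Python. The function name kmeans_1d and the stopping rule (stop when the assignments repeat) are assumptions made for illustration, not the workshop's own code. Note that in iteration 4 the point 5 is equidistant from both centroids (2.5 and 7.5); this sketch breaks ties in favour of the lower-numbered cluster, which matches the trace.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Run k-means on 1-D data and record (assignments, centroids) per iteration."""
    history = []
    previous = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by absolute difference
        # (Euclidean distance in 1-D). Ties go to the lower cluster index.
        assignments = [
            min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        # An empty cluster keeps its old centroid (an assumption of this sketch).
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        centroids = new_centroids
        history.append((assignments, centroids))
        # Stop once the assignments no longer change.
        if assignments == previous:
            break
        previous = assignments
    return history


history = kmeans_1d(list(range(1, 11)), [1.0, 2.0])
for step, (assignments, centroids) in enumerate(history, start=1):
    print(f"Iteration {step} Assignments: {assignments} Centroids: {centroids}")
```

Run on the data from Q1 with seeds 1 and 2, this prints five iterations, the last two being identical.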
Q2: Repeat Exercise 1 using agglomerative hierarchical clustering and Euclidean distance, with the single linkage (min) criterion.
Initially, how many clusters do we have? Each observation starts as its own cluster, so there are 10 clusters.
Step 1: Calculate the Euclidean distance between every pair of observations.
[Figure: inter-point distance (dissimilarity) matrix]
[Figure: dendrogram plot; x-axis: observations (1 to 10), y-axis: distances]
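As a minimal sketch of Step 1, the dissimilarity matrix for the ten observations can be built in plain Python. The helper name distance_matrix is an assumption for illustration. In one dimension the Euclidean distance between two points reduces to the absolute difference.

```python
points = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


def distance_matrix(points):
    """Return the symmetric matrix of pairwise 1-D Euclidean distances."""
    return [[abs(a - b) for b in points] for a in points]


D = distance_matrix(points)
for row in D:
    # One row per observation; the diagonal is always 0.
    print(" ".join(f"{d:2d}" for d in row))
```

The matrix is symmetric with zeros on the diagonal, so only the entries above (or below) the diagonal need to be inspected in the later steps.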
Step 2: Choose the two most similar (i.e. closest) observations to merge: the pair with the minimum distance in the dissimilarity matrix.
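Step 2 amounts to scanning the upper triangle of the dissimilarity matrix for its smallest entry. The sketch below assumes a hypothetical helper closest_pair and resolves ties by keeping the first minimum found; with evenly spaced data every adjacent pair is at distance 1, so the choice among them is arbitrary.

```python
points = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
D = [[abs(a - b) for b in points] for a in points]


def closest_pair(D):
    """Return (distance, i, j) for the smallest off-diagonal entry, with i < j."""
    best = None
    for i in range(len(D)):
        for j in range(i + 1, len(D)):
            # Strict '<' keeps the first minimum encountered when there are ties.
            if best is None or D[i][j] < best[0]:
                best = (D[i][j], i, j)
    return best


distance, i, j = closest_pair(D)
print(f"Merge observations {points[i]} and {points[j]} at distance {distance}")
```

With this tie-breaking rule the first pair selected is observations 1 and 2, at distance 1.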
Step 3: Update the dissimilarity matrix: calculate the distance between Cluster12 (the cluster formed by merging observations 1 and 2) and all other observations, using single linkage (min).
How many clusters do we have now? After the first merge, 9 clusters remain.
[Figure: updated dissimilarity (distance) matrix]
[Figure: extended dendrogram plot; x-axis: observations (1 to 10), y-axis: distances]
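The matrix update in Step 3 can be sketched as follows. Under single linkage, the distance from the merged cluster to any other cluster is the minimum of the two old distances. The function name merge_single_linkage and the tuple labels are assumptions chosen for illustration; the merged cluster is placed in the first row and column of the new matrix.

```python
def merge_single_linkage(D, labels, i, j):
    """Merge clusters i and j; return the reduced matrix and the new labels."""
    keep = [k for k in range(len(D)) if k not in (i, j)]
    # Single linkage: distance to the merged cluster is the min of the two old rows.
    merged_row = [min(D[i][k], D[j][k]) for k in keep]
    new_labels = [labels[i] + labels[j]] + [labels[k] for k in keep]
    # First row/column is the merged cluster; the rest is copied unchanged.
    new_D = [[0] + merged_row]
    for r, k in enumerate(keep):
        new_D.append([merged_row[r]] + [D[k][m] for m in keep])
    return new_D, new_labels


points = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
D = [[abs(a - b) for b in points] for a in points]
labels = [(p,) for p in points]

# Merge observations 1 and 2 (indices 0 and 1) into one cluster.
D2, labels2 = merge_single_linkage(D, labels, 0, 1)
print(labels2[0], D2[0])
```

After this call the matrix is 9 x 9, and the first row holds the distances from the merged cluster (1, 2) to each remaining observation.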
Repeat Step 2: Choose the two most similar (i.e. closest) observations to merge: the pair with the minimum distance in the dissimilarity matrix.
Repeat Step 3: Update the dissimilarity matrix: calculate the distance between Cluster12 and all other observations, using single linkage (min).
Let's see some Python code.
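The following is a minimal sketch of the whole procedure, repeating Steps 2 and 3 until a single cluster remains. It is not the workshop's own code: the function name single_linkage, the tie-breaking rule (first minimum found) and the decision to keep the merged cluster in the position of its first member are all assumptions made for illustration. It recomputes the single-linkage distance directly from the cluster members rather than maintaining a matrix, which is simpler but slower.

```python
def single_linkage(points):
    """Agglomerative clustering of 1-D points with single linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[p] for p in points]  # every observation starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: smallest distance between any two members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Keep the merged cluster at position i and drop position j.
        clusters = clusters[:i] + [merged] + clusters[i + 1:j] + clusters[j + 1:]
    return merges


for a, b, d in single_linkage(list(range(1, 11))):
    print(f"merge {a} + {b} at distance {d}")
```

For the evenly spaced data in Q1, all nine merges happen at distance 1, so the order of merges depends only on how ties are broken. In practice a library routine such as SciPy's scipy.cluster.hierarchy.linkage with method='single' would normally be used instead of a hand-written loop.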