SIT384 Cyber security analytics
Pass Task 7.1P: K-Means and Hierarchical Clustering
Task description:
In machine learning, clustering is used to analyze and group data that has no pre-labeled classes, or no class attribute at all. K-Means clustering and hierarchical clustering are both unsupervised learning algorithms.
K-Means partitions the objects into k clusters by assigning each object to the cluster with the nearest mean (centroid). Each cluster is a collection of objects that are similar to one another and dissimilar to the objects in other clusters, and each object belongs to exactly one cluster, not several.
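To illustrate the "exactly one cluster" property, here is a minimal NumPy sketch of the standard K-Means iteration (Lloyd's algorithm). It is not scikit-learn's implementation: it seeds the centroids with randomly chosen data points rather than k-means++, and the function name `kmeans` and its parameters are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-Means (Lloyd's algorithm) with random initialization."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points chosen at random
    # (scikit-learn's init="k-means++" is a smarter seeding, not used here).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each point gets exactly one label, its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of the points assigned to it
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids no longer move
        centroids = new_centroids
    # Final assignment against the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids


# Two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

On this toy data the first three points share one label and the last three share the other; the label numbers themselves are arbitrary.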
In hierarchical clustering, clusters form a tree-like (parent-child) structure. In the agglomerative (bottom-up) approach, each object starts as its own cluster; the two most similar clusters are merged, and merging continues until all objects are in the same cluster.
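The merging procedure described above can be sketched naively in plain NumPy. This version uses average linkage (the mean distance over all cross-cluster point pairs) as the similarity measure; `agglomerative_average` is a hypothetical helper, roughly O(n^3), and is not how scikit-learn or SciPy implement it.

```python
import numpy as np


def agglomerative_average(X, n_clusters=1):
    """Naive bottom-up clustering with average linkage (roughly O(n^3))."""
    # Pairwise Euclidean distances between the original points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Every point starts as its own singleton cluster.
    clusters = [[i] for i in range(len(X))]
    merges = []  # (cluster_a, cluster_b, distance) for each merge, in order
    while len(clusters) > n_clusters:
        best = None
        # Find the two most similar clusters: the pair with the smallest
        # average distance over all cross-cluster point pairs.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), float(d)))
        # Merge cluster b into cluster a (b > a, so index a is unaffected).
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
clusters, merges = agglomerative_average(X)
for left, right, dist in merges:
    print(left, "+", right, "merged at average distance", round(dist, 3))
```

With the default `n_clusters=1` the loop runs until everything is in one cluster, as the paragraph describes; stopping early at a larger `n_clusters` gives a flat partition, which corresponds to cutting the tree at the level where that many clusters remain.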
In this task, you use the K-Means and Agglomerative Hierarchical algorithms to cluster a given dataset and compare the results.
You are given:
np.random.seed(0)
make_blobs() function with inputs:
o n_samples: 200
o centers: [[2, 1], [-1, -1], [5, 3], [9, 4]]
o cluster_std: 0.9
KMeans() class with settings: init = 'k-means++', n_clusters = 4, n_init = 12
AgglomerativeClustering() class with settings: n_clusters = 4, linkage = 'average'
Other settings of your choice
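One way to set up the given data and models, assuming scikit-learn's `make_blobs`, `KMeans` and `AgglomerativeClustering`. Because `random_state` is left unset, `make_blobs` draws from NumPy's global random generator, which `np.random.seed(0)` fixes. Variable names are illustrative only.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Seed NumPy's global RNG; make_blobs (random_state=None) draws from it.
np.random.seed(0)
X, y = make_blobs(n_samples=200,
                  centers=[[2, 1], [-1, -1], [5, 3], [9, 4]],
                  cluster_std=0.9)

# K-Means with the given settings: k-means++ seeding, 4 clusters, 12 restarts.
kmeans = KMeans(init="k-means++", n_clusters=4, n_init=12)
km_labels = kmeans.fit_predict(X)

# Agglomerative clustering with the given settings: 4 clusters, average linkage.
agglo = AgglomerativeClustering(n_clusters=4, linkage="average")
agg_labels = agglo.fit_predict(X)

print("Feature matrix shape:", X.shape)
print("K-Means cluster sizes:", np.bincount(km_labels))
print("Agglomerative cluster sizes:", np.bincount(agg_labels))
```

`y` holds the generating blob of each sample; it is not used for fitting (the algorithms are unsupervised) but can help when comparing the two labelings.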
You are asked to:
plot your created dataset
plot the two clustering models for your created dataset
set the title of the K-Means plot to KMeans
set the title of the Agglomerative Hierarchical plot to Agglomerative Hierarchical
calculate the distance matrix for Agglomerative Clustering from the input feature matrix (linkage = complete)
display the dendrogram
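A sketch of the plotting and dendrogram steps, assuming SciPy's `pdist`/`squareform` for the Euclidean distance matrix and `scipy.cluster.hierarchy.linkage`/`dendrogram` for complete linkage. Figure sizes, the colormap and variable names are arbitrary choices, not requirements of the task.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Recreate the dataset with the given settings.
np.random.seed(0)
X, _ = make_blobs(n_samples=200,
                  centers=[[2, 1], [-1, -1], [5, 3], [9, 4]],
                  cluster_std=0.9)

# Fit both models with the given settings.
km_labels = KMeans(init="k-means++", n_clusters=4, n_init=12).fit_predict(X)
agg_labels = AgglomerativeClustering(n_clusters=4, linkage="average").fit_predict(X)

# 1) Plot the created dataset.
plt.figure(figsize=(5, 4))
plt.scatter(X[:, 0], X[:, 1], s=10)
plt.title("Dataset")

# 2) Plot the two clustering models side by side, colored by cluster label.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(X[:, 0], X[:, 1], c=km_labels, cmap="viridis", s=10)
ax1.set_title("KMeans")
ax2.scatter(X[:, 0], X[:, 1], c=agg_labels, cmap="viridis", s=10)
ax2.set_title("Agglomerative Hierarchical")

# 3) Distance matrix from the input feature matrix X (Euclidean).
dist_condensed = pdist(X, metric="euclidean")  # condensed form, length n*(n-1)/2
dist_matrix = squareform(dist_condensed)       # full 200 x 200 matrix
# linkage() accepts raw observations or a *condensed* distance matrix;
# the square matrix passed directly would be treated as 200 observations.
Z = linkage(dist_condensed, method="complete")

# 4) Display the dendrogram of the complete-linkage hierarchy.
plt.figure(figsize=(12, 5))
dendrogram(Z, leaf_rotation=90, leaf_font_size=6)
plt.title("Dendrogram (complete linkage)")
plt.xlabel("Sample index")
plt.ylabel("Distance")
plt.show()
```

Passing the condensed vector rather than the square matrix to `linkage` keeps the clustering based on the pairwise distances themselves; `dist_matrix` is kept only so the full matrix can be inspected or printed.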
The sample output shown in the following figure is for demonstration purposes only; your output may differ.
Submission:
Submit the following files to OnTrack:
1. Your program source code (e.g. task7_1.py)
2. A screenshot of your program running
Check the following things before submitting:
1. Add proper comments to your code