CMPE 255-02, Spring 2024
Assignment #2
Released on March 6th, 2024
Due 11:59pm on Sunday, March 17th, 2024
Notes
This assignment should be submitted in Canvas as an IPython notebook (assignment2_yourFirstName_LastName.ipynb).
No late assignments will be accepted.
You may collaborate on homework but must write independent code/solutions. Copying and other forms of cheating will not be tolerated and will result in a zero score for the homework (minimal penalty) or a failing grade for the course. Your work will be graded in terms of correctness, completeness, and clarity, not just the answer. Thus, correct answers with no or poorly written supporting steps may receive very little credit.
NOTE: Please do not use any package or library (including scikit-learn) other than NumPy, Pandas, and Matplotlib.
Please download cluster_data1.csv.
X1 | X2 | X3 | X4
6.7 | 3 | 5 | 1.7
6.3 | 2.9 | 5.6 | 1.8
5.6 | 3 | 4.5 | 1.5
7.6 | 3 | 6.6 | 2.1
6 | 3.4 | 4.5 | 1.6
6.4 | 3.2 | 5.3 | 2.3
7.7 | 2.8 | 6.7 | 2
4.8 | 3 | 1.4 | 0.3
5 | 3 | 1.6 | 0.2
5 | 3.4 | 1.6 | 0.4
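As a starting point, the rows shown above can be read into a NumPy array as in the sketch below. It assumes the real cluster_data1.csv is comma-separated with a single header row (X1 to X4); that layout is a guess, not something the assignment states. The helper name load_points is hypothetical.

```python
import io
import numpy as np

# The ten rows shown above, written as CSV text so the sketch runs on its own.
# Assumption: cluster_data1.csv has one header row (X1..X4) and comma-separated
# values; adjust `delimiter` / `skip_header` if the real file differs.
SAMPLE_CSV = """X1,X2,X3,X4
6.7,3,5,1.7
6.3,2.9,5.6,1.8
5.6,3,4.5,1.5
7.6,3,6.6,2.1
6,3.4,4.5,1.6
6.4,3.2,5.3,2.3
7.7,2.8,6.7,2
4.8,3,1.4,0.3
5,3,1.6,0.2
5,3.4,1.6,0.4
"""

def load_points(source):
    """Read a CSV (path or file-like object) into an (n_samples, 4) float array."""
    return np.genfromtxt(source, delimiter=",", skip_header=1)

X = load_points(io.StringIO(SAMPLE_CSV))
print(X.shape)  # (10, 4)
```

With the downloaded file in the working directory, load_points("cluster_data1.csv") would replace the StringIO call.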
1. (4 pts) Implementing the K-means clustering algorithm
The K-means algorithm automatically clusters similar data examples together. It is an iterative procedure that starts by guessing the initial centroids and then refines this guess by repeatedly assigning examples to their closest centroids and recomputing the centroids from those assignments until convergence.
Let’s assume K=3. Please implement the K-means clustering algorithm from scratch.
Please plot the locations of the K centroids, like the accompanying figure, for the first 5 steps, including the initial setting of the centroids.
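The assign-then-recompute loop described above might be sketched as follows. This is one possible structure, not the required solution. The function names, the toy data, the choice to leave an empty cluster's centroid where it was, and the np.allclose convergence test are all assumptions made for illustration.

```python
import numpy as np

def assign_clusters(X, centroids):
    """Return, for each row of X, the index of its nearest centroid."""
    # Squared Euclidean distances, shape (n_samples, k), via broadcasting.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def update_centroids(X, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = centroids.copy()
    for j in range(len(centroids)):
        members = X[labels == j]
        if len(members) > 0:  # assumption: an empty cluster keeps its old centroid
            new_centroids[j] = members.mean(axis=0)
    return new_centroids

def kmeans(X, init_centroids, max_iter=100):
    """Run K-means; return final centroids, labels, and the centroid history."""
    centroids = np.asarray(init_centroids, dtype=float)
    history = [centroids.copy()]  # history[0] is the initial setting
    for _ in range(max_iter):
        labels = assign_clusters(X, centroids)
        new_centroids = update_centroids(X, labels, centroids)
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        history.append(centroids.copy())
        if converged:
            break
    return centroids, assign_clusters(X, centroids), history

# Toy 2-D data (not the assignment data): three well-separated groups.
X_toy = np.array([[0, 0], [0, 1], [1, 0],
                  [10, 10], [10, 11], [11, 10],
                  [0, 10], [1, 10], [0, 11]], dtype=float)
init = X_toy[[0, 3, 6]]  # hand-picked start, one point per group
centroids, labels, history = kmeans(X_toy, init)
print(labels)  # [0 0 0 1 1 1 2 2 2]
```

Each entry of history is a (K, n_features) array, so history[:5] holds the initial setting plus the next four steps and can be passed to a Matplotlib scatter plot two columns at a time.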
2. (2 pts) Run a few trials
Note that the converged solution may not always be ideal and depends on the initial setting of the centroids. Therefore, in practice the K-means algorithm is usually run a few times with different random initializations. One way to choose among the resulting solutions is to pick the one with the lowest objective function value.
Please run K-means 5 times with different random initializations, including the previous trial, and calculate the objective function value for each trial. Then choose the trial with the lowest objective function value.
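The sketch below shows one way to compare trials. The assignment does not define the objective, so it is assumed here to be the sum of squared distances from each point to its assigned centroid. Random initialization is assumed to mean sampling K distinct data points. The names objective and best_of_trials are hypothetical, and the compact kmeans is included only so the block runs on its own.

```python
import numpy as np

def kmeans(X, init_centroids, max_iter=100):
    """Compact K-means; returns (centroids, labels)."""
    centroids = np.asarray(init_centroids, dtype=float)
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Mean of each cluster; an empty cluster keeps its old centroid.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(len(centroids))])
        if np.allclose(new, centroids):
            break
        centroids = new
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)

def objective(X, centroids, labels):
    """Assumed objective: sum of squared distances to the assigned centroids."""
    return float(((X - centroids[labels]) ** 2).sum())

def best_of_trials(X, k, n_trials=5, seed=0):
    """Run K-means from several random starts; return the best run and all scores."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_trials):
        # Assumption: a random start is k distinct rows of X.
        init = X[rng.choice(len(X), size=k, replace=False)]
        centroids, labels = kmeans(X, init)
        results.append((objective(X, centroids, labels), centroids, labels))
    best = min(results, key=lambda r: r[0])
    return best, [r[0] for r in results]

# Toy 2-D data (not the assignment data).
X_toy = np.array([[0, 0], [0, 1], [1, 0],
                  [10, 10], [10, 11], [11, 10],
                  [0, 10], [1, 10], [0, 11]], dtype=float)
best, values = best_of_trials(X_toy, k=3)
print(values)   # one objective value per trial
print(best[0])  # the lowest of them
```

To include the earlier trial in the comparison, its objective value can be computed with the same objective function and added to the list before taking the minimum.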
3. (4 pts) K-means++ algorithm
Please implement the K-means++ algorithm from scratch to initialize the centroids. Please plot each step of K-means as it proceeds until the K centroids no longer move. Please compare this result with the result from #2.
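A minimal sketch of K-means++ seeding follows. The first centroid is drawn uniformly from the data, and each later one is drawn with probability proportional to its squared distance from the nearest centroid already chosen. The function name kmeans_pp_init and the toy data are assumptions. The sketch also assumes X contains at least K distinct points, since the probabilities would otherwise divide by zero.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """Pick k initial centroids from the rows of X using K-means++ seeding."""
    n = len(X)
    centroids = [X[rng.integers(n)]]  # first centroid: uniform at random
    for _ in range(1, k):
        chosen = np.array(centroids)
        # Squared distance from every point to its nearest chosen centroid.
        d2 = ((X[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Points already chosen have d2 == 0, so they cannot be picked again.
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(n, p=probs)])
    return np.array(centroids)

# Toy 2-D data (not the assignment data).
X_toy = np.array([[0, 0], [0, 1], [1, 0],
                  [10, 10], [10, 11], [11, 10],
                  [0, 10], [1, 10], [0, 11]], dtype=float)
rng = np.random.default_rng(0)
init = kmeans_pp_init(X_toy, 3, rng)
print(init)  # three distinct rows of X_toy
```

The returned array can be used as the initial centroids for the K-means loop in place of a uniformly random start, which makes the comparison with #2 a matter of swapping the initializer.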