SIT384 Cyber security analytics
Distinction Task 8.3D: DBSCAN and its parameters
Task description:
DBSCAN is a density-based algorithm: it assumes that clusters are dense regions. It doesn't require that every point be assigned to a cluster and hence doesn't partition the data; instead, it extracts the dense clusters and leaves the sparse background classified as noise or outliers.
As a first step, DBSCAN transforms the space according to the density of the data: points in dense regions are left alone, while points in sparse regions are moved further away. A dense region is determined by a distance parameter (called epsilon or eps) and a count parameter (called min_samples in sklearn): a point lies in a dense region when at least min_samples points, itself included, fall within distance eps of it. The algorithm can be quite sensitive to the choice of these parameters. Finally, the combination of eps and min_samples amounts to a choice of density, and the clustering only finds clusters at or above that density; if your data has clusters of variable density, DBSCAN will either miss them, split them up, or lump some of them together, depending on your parameter choices.
In this task, you will use DBSCAN to cluster a given dataset under different parameter settings and compare the results.
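The effect of the density choice can be seen on a small toy example. The sketch below is only an illustration, not part of the task: the two grids, the isolated point, the eps values 1.5, 6 and 80, and the helper names grid and cluster_summary are all made up for this example. It uses sklearn's DBSCAN, which labels noise points as -1.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def grid(origin, spacing, n=5):
    """Return an n-by-n square grid of 2-D points (hypothetical helper)."""
    axis = origin + spacing * np.arange(n)
    xx, yy = np.meshgrid(axis, axis)
    return np.column_stack([xx.ravel(), yy.ravel()])


# Toy data (an assumption, not the Complex8_RN15 dataset):
dense = grid(0.0, 1.0)            # 25 points, 1 unit apart
sparse = grid(100.0, 4.0)         # 25 points, 4 units apart
lone = np.array([[50.0, 50.0]])   # a single isolated point
X = np.vstack([dense, sparse, lone])


def cluster_summary(X, eps, min_samples=5):
    """Fit DBSCAN and return (number of clusters, number of noise points)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_clusters = len(set(labels.tolist()) - {-1})  # -1 marks noise
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise


for eps in (1.5, 6, 80):
    n_clusters, n_noise = cluster_summary(X, eps)
    print(f"eps={eps}: {n_clusters} cluster(s), {n_noise} noise point(s)")
```

With the small eps only the dense grid is found and the sparse grid is treated as noise; with the middle value both grids are found; with the very large eps everything is lumped into a single cluster.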
You are given:
a 2-dimensional dataset called Complex8_RN15, which is a variation of the Complex8 dataset with 15% Gaussian noise added to the original Complex8 dataset.
The Complex8_RN15 dataset has attributes x, y, class.
The dataset is available in the task resources zip file. It can also be obtained at:
https://drive.google.com/file/d/1_geQDIMQUNHhc3d7zRxfOrOreTT3HBhU/view
plot settings:
fig, ax = plt.subplots(figsize=(7, 7), dpi=100)
ax.scatter(..., alpha=0.25, s=60, linewidths=0)
Other settings of your choice
You are asked to:
set MIN_SAMPLES = 5
use DBSCAN (eps=5, 10, 12, 15, 20) to first fit and predict clusters
create plots for eps=5, 10, 12, 15, 20, respectively with proper plot titles
create a plot using the original class labels (the class attribute) for comparison
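One possible shape for these steps is sketched below. It is a rough outline under stated assumptions, not a reference solution: the data is a randomly generated stand-in (the real Complex8_RN15 file is not loaded here), and the names X, true_class, scatter_plot and figures, the "tab10" colour map and the title wording are all choices made for this illustration. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to show windows interactively
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN

MIN_SAMPLES = 5
EPS_VALUES = [5, 10, 12, 15, 20]

# Stand-in data (assumption): two Gaussian blobs plus uniform background noise.
# Replace X and true_class with the x, y and class columns of the real dataset.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(100, 100), scale=8, size=(150, 2)),
    rng.normal(loc=(300, 250), scale=15, size=(150, 2)),
    rng.uniform(low=0, high=400, size=(45, 2)),
])
true_class = np.repeat([0, 1, 2], [150, 150, 45])


def scatter_plot(X, colours, title):
    """Draw one scatter plot with the plot settings given in the task."""
    fig, ax = plt.subplots(figsize=(7, 7), dpi=100)
    ax.scatter(X[:, 0], X[:, 1], c=colours, cmap="tab10",
               alpha=0.25, s=60, linewidths=0)
    ax.set_title(title)
    return fig


figures = {}
for eps in EPS_VALUES:
    # fit_predict returns one label per point; -1 marks noise
    labels = DBSCAN(eps=eps, min_samples=MIN_SAMPLES).fit_predict(X)
    figures[eps] = scatter_plot(
        X, labels, f"DBSCAN, eps={eps}, min_samples={MIN_SAMPLES}")

# Comparison plot coloured by the original class labels
figures["original"] = scatter_plot(X, true_class, "Original class labels")
```

Each figure can then be written to disk with fig.savefig(...) or displayed with plt.show() when an interactive backend is in use.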
The sample output shown in the following figures is for demonstration purposes only. Yours might differ from the figures provided.
Submission:
Submit the following files to OnTrack:
1. Your program source code (e.g. task8_3.py)
2. A screenshot of your program running
Check the following things before submitting:
1. Add proper comments to your code