[Solved] CS6375 Assignment 4

  1. **Support Vector Machines with Synthetic Data**.

For this problem, we will generate synthetic data for a nonlinear binary classification problem and partition it into training, validation and test sets. Our goal is to understand the behavior of SVMs with Radial-Basis Function (RBF) kernels with different values of $C$ and $\gamma$.

```python
#
# DO NOT EDIT THIS FUNCTION; IF YOU WANT TO PLAY AROUND WITH DATA GENERATION,
# MAKE A COPY OF THIS FUNCTION AND THEN EDIT
#
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap


def generate_data(n_samples, tst_frac=0.2, val_frac=0.2):
    # Generate a non-linear data set
    X, y = make_moons(n_samples=n_samples, noise=0.25, random_state=42)

    # Take a small subset of the data and make it VERY noisy; that is, generate outliers
    m = 30
    np.random.seed(30)  # Deliberately use a different seed
    ind = np.random.permutation(n_samples)[:m]
    X[ind, :] += np.random.multivariate_normal([0, 0], np.eye(2), (m, ))
    y[ind] = 1 - y[ind]  # Flip the labels of the outliers

    # Plot this data
    cmap = ListedColormap(['#b30065', '#178000'])
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap, edgecolors='k')

    # First, we use train_test_split to partition (X, y) into training and test sets
    X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, test_size=tst_frac,
                                                  random_state=42)

    # Next, we use train_test_split to further partition (X_trn, y_trn) into
    # training and validation sets
    X_trn, X_val, y_trn, y_val = train_test_split(X_trn, y_trn, test_size=val_frac,
                                                  random_state=42)

    return (X_trn, y_trn), (X_val, y_val), (X_tst, y_tst)
```
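For reference, a minimal usage sketch of the function above; the choice of n_samples = 300 is illustrative, not mandated by the assignment:

```python
# Generate the train/validation/test splits used throughout this problem;
# n_samples = 300 is an illustrative choice.
n_samples = 300
(X_trn, y_trn), (X_val, y_val), (X_tst, y_tst) = generate_data(n_samples)
```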

```python
#
# DO NOT EDIT THIS FUNCTION; IF YOU WANT TO PLAY AROUND WITH VISUALIZATION,
# MAKE A COPY OF THIS FUNCTION AND THEN EDIT
#
def visualize(models, param, X, y):
    # Initialize plotting
    if len(models) % 3 == 0:
        nrows = len(models) // 3
    else:
        nrows = len(models) // 3 + 1

    fig, axes = plt.subplots(nrows=nrows, ncols=3, figsize=(15, 5.0 * nrows))
    cmap = ListedColormap(['#b30065', '#178000'])

    # Create a mesh
    xMin, xMax = X[:, 0].min() - 1, X[:, 0].max() + 1
    yMin, yMax = X[:, 1].min() - 1, X[:, 1].max() + 1
    xMesh, yMesh = np.meshgrid(np.arange(xMin, xMax, 0.01),
                               np.arange(yMin, yMax, 0.01))

    for i, (p, clf) in enumerate(models.items()):
        r, c = np.divmod(i, 3)
        ax = axes[r, c]

        # Plot contours
        zMesh = clf.decision_function(np.c_[xMesh.ravel(), yMesh.ravel()])
        zMesh = zMesh.reshape(xMesh.shape)
        ax.contourf(xMesh, yMesh, zMesh, cmap=plt.cm.PiYG, alpha=0.6)

        # Draw the margin and the decision boundary
        if (param == 'C' and p > 0.0) or (param == 'gamma'):
            ax.contour(xMesh, yMesh, zMesh, colors='k', levels=[-1, 0, 1],
                       alpha=0.5, linestyles=['--', '-', '--'])

        # Plot data
        ax.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap, edgecolors='k')
        ax.set_title('{0} = {1}'.format(param, p))
```

a. The effect of the regularization parameter, C

Complete the Python code snippet below that takes the generated synthetic 2-d data as input and learns nonlinear SVMs. Use scikit-learn's SVC (https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) function to learn SVM models with radial-basis kernels for fixed $\gamma$ and various choices of $C \in \{10^{-3}, 10^{-2}, \dots, 1, \dots, 10^{5}\}$. The value of $\gamma$ is fixed to $\gamma = \frac{1}{d \cdot \sigma_X}$, where $d$ is the data dimension and $\sigma_X$ is the standard deviation of the data set $X$. SVC can automatically use this setting for $\gamma$ if you pass the argument gamma = 'scale' (see the documentation for more details).

Plot: For each classifier, compute both the training error and the validation error. Plot them together, making sure to label the axes and each curve clearly.

Discussion: How do the training error and the validation error change with $C$? Based on the visualization of the models and their resulting classifiers, how does changing $C$ change the models? Explain in terms of minimizing the SVM's objective function $\frac{1}{2} w^\top w + C \sum_{i=1}^{n} \ell(w; x_i, y_i)$, where $\ell$ is the hinge loss for each training example $(x_i, y_i)$.

Final Model Selection: Use the validation set to select the classifier corresponding to the best value, $C_{best}$. Report the accuracy on the test set for this selected best SVM model. Note: You should report a single number, your final test-set accuracy for the model corresponding to $C_{best}$.
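Below is a minimal sketch of one way to complete this part, assuming the imports and the train/validation/test splits produced by generate_data above; the bookkeeping dictionaries (models, trn_err, val_err) and the use of sklearn.metrics.accuracy_score are illustrative choices, not part of the provided template:

```python
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Train one RBF-kernel SVM per value of C, with gamma fixed to 'scale'
C_values = [10.0**k for k in range(-3, 6)]  # 10^-3, ..., 1, ..., 10^5
models, trn_err, val_err = dict(), dict(), dict()

for C in C_values:
    clf = SVC(C=C, kernel='rbf', gamma='scale')
    clf.fit(X_trn, y_trn)
    models[C] = clf
    trn_err[C] = 1 - accuracy_score(y_trn, clf.predict(X_trn))
    val_err[C] = 1 - accuracy_score(y_val, clf.predict(X_val))

# Visualize the decision boundaries for the different values of C
visualize(models, 'C', X_trn, y_trn)

# Plot training and validation error on a log-scaled C axis
plt.figure()
plt.semilogx(C_values, [trn_err[C] for C in C_values], marker='o', label='Training error')
plt.semilogx(C_values, [val_err[C] for C in C_values], marker='s', label='Validation error')
plt.xlabel('C')
plt.ylabel('Error')
plt.legend()

# Select C_best on the validation set and report test accuracy
C_best = min(val_err, key=val_err.get)
tst_acc = accuracy_score(y_tst, models[C_best].predict(X_tst))
print('C_best = {0}, test accuracy = {1:.4f}'.format(C_best, tst_acc))
```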


b. The effect of the RBF kernel parameter, $\gamma$

Complete the Python code snippet below that takes the generated synthetic 2-d data as input and learns various non-linear SVMs. Use scikit-learn's SVC (https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) function to learn SVM models with radial-basis kernels for fixed $C$ and various choices of $\gamma \in \{10^{-2}, 10^{-1}, 1, 10, 10^{2}, 10^{3}\}$. The value of $C$ is fixed to $C = 10$.

Plot: For each classifier, compute both the training error and the validation error. Plot them together, making sure to label the axes and each curve clearly.

Discussion: How do the training error and the validation error change with $\gamma$? Based on the visualization of the models and their resulting classifiers, how does changing $\gamma$ change the models? Explain in terms of the functional form of the RBF kernel, $\kappa(x, z) = \exp(-\gamma \, \|x - z\|^2)$.
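As a quick numerical illustration of this functional form (the points x and z below are arbitrary), note how the kernel value decays toward 0 as $\gamma$ grows, so each support vector's influence becomes increasingly local:

```python
import numpy as np

x = np.array([0.0, 0.0])
z = np.array([1.0, 1.0])
sq_dist = np.sum((x - z)**2)  # ||x - z||^2 = 2

for gamma in [0.01, 1.0, 100.0]:
    # RBF kernel: kappa(x, z) = exp(-gamma * ||x - z||^2)
    print(gamma, np.exp(-gamma * sq_dist))
```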

Final Model Selection: Use the validation set to select the classifier corresponding to the best value, $\gamma_{best}$. Report the accuracy on the test set for this selected best SVM model. Note: You should report a single number, your final test-set accuracy for the model corresponding to $\gamma_{best}$.
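A sketch mirroring part (a), with $C$ fixed at 10 and $\gamma$ varied; it reuses SVC, accuracy_score, and the splits from the part (a) sketch, and the variable names are again illustrative:

```python
# Train one RBF-kernel SVM per value of gamma, with C fixed to 10
gamma_values = [10.0**k for k in range(-2, 4)]  # 10^-2, ..., 10^3
models, trn_err, val_err = dict(), dict(), dict()

for g in gamma_values:
    clf = SVC(C=10.0, kernel='rbf', gamma=g)
    clf.fit(X_trn, y_trn)
    models[g] = clf
    trn_err[g] = 1 - accuracy_score(y_trn, clf.predict(X_trn))
    val_err[g] = 1 - accuracy_score(y_val, clf.predict(X_val))

# Visualize the decision boundaries for the different values of gamma
visualize(models, 'gamma', X_trn, y_trn)

# Select gamma_best on the validation set and report test accuracy
gamma_best = min(val_err, key=val_err.get)
tst_acc = accuracy_score(y_tst, models[gamma_best].predict(X_tst))
print('gamma_best = {0}, test accuracy = {1:.4f}'.format(gamma_best, tst_acc))
```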

  2. **Breast Cancer Diagnosis with Support Vector Machines**, 25 points.

For this problem, we will use the Wisconsin Breast Cancer (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)) data set, which has already been pre-processed and partitioned into training, validation and test sets. Numpy's loadtxt (https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.loadtxt.html) command can be used to load CSV files.

Use scikit-learn's SVC (https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) function to learn SVM models with radial-basis kernels for each combination of $C \in \{10^{-2}, 10^{-1}, 1, 10^{1}, \dots, 10^{4}\}$ and $\gamma \in \{10^{-3}, 10^{-2}, 10^{-1}, 1, 10, 10^{2}\}$. Print the tables corresponding to the training and validation errors.

Final Model Selection: Use the validation set to select the classifier corresponding to the best parameter values, $C_{best}$ and $\gamma_{best}$. Report the accuracy on the test set for this selected best SVM model. Note: You should report a single number, your final test-set accuracy for the model corresponding to $C_{best}$ and $\gamma_{best}$.
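One possible way to structure this part, assuming the pre-processed splits are CSV files named wdbc_trn.csv, wdbc_val.csv, and wdbc_tst.csv with the label in the first column; both the file names and the column layout are assumptions, so adjust them to the files actually provided:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def load_split(filename):
    # Each row: label in column 0, features in the remaining columns (assumed layout)
    data = np.loadtxt(filename, delimiter=',')
    return data[:, 1:], data[:, 0]

X_trn, y_trn = load_split('wdbc_trn.csv')
X_val, y_val = load_split('wdbc_val.csv')
X_tst, y_tst = load_split('wdbc_tst.csv')

C_values = [10.0**k for k in range(-2, 5)]      # 10^-2, ..., 10^4
gamma_values = [10.0**k for k in range(-3, 3)]  # 10^-3, ..., 10^2

# Train one model per (C, gamma) pair and record training/validation errors
models, trn_err, val_err = dict(), dict(), dict()
for C in C_values:
    for g in gamma_values:
        clf = SVC(C=C, kernel='rbf', gamma=g)
        clf.fit(X_trn, y_trn)
        models[(C, g)] = clf
        trn_err[(C, g)] = 1 - accuracy_score(y_trn, clf.predict(X_trn))
        val_err[(C, g)] = 1 - accuracy_score(y_val, clf.predict(X_val))

# Print the validation-error table: rows indexed by C, columns by gamma
print('C \\ gamma ' + ' '.join('{0:>8g}'.format(g) for g in gamma_values))
for C in C_values:
    row = ' '.join('{0:8.4f}'.format(val_err[(C, g)]) for g in gamma_values)
    print('{0:>9g} {1}'.format(C, row))

# Select (C_best, gamma_best) on the validation set and report test accuracy
C_best, gamma_best = min(val_err, key=val_err.get)
tst_acc = accuracy_score(y_tst, models[(C_best, gamma_best)].predict(X_tst))
print('C_best = {0}, gamma_best = {1}, test accuracy = {2:.4f}'.format(
    C_best, gamma_best, tst_acc))
```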

  3. **Breast Cancer Diagnosis with k-Nearest Neighbors**.

Use scikit-learn's k-nearest neighbor (https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html) classifier to learn models for Breast Cancer Diagnosis with $k \in \{1, 5, 11, 15, 21\}$, using the kd-tree algorithm.

Plot: For each classifier, compute both the training error and the validation error. Plot them together, making sure to label the axes and each curve clearly.

Final Model Selection: Use the validation set to select the classifier corresponding to the best parameter value, $k_{best}$. Report the accuracy on the test set for this selected best kNN model. Note: You should report a single number, your final test-set accuracy for the model corresponding to $k_{best}$.
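A sketch of the kNN part, reusing the Breast Cancer splits loaded in the previous sketch; the plotting details and variable names are illustrative:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# Train one kd-tree kNN classifier per value of k
k_values = [1, 5, 11, 15, 21]
models, trn_err, val_err = dict(), dict(), dict()

for k in k_values:
    clf = KNeighborsClassifier(n_neighbors=k, algorithm='kd_tree')
    clf.fit(X_trn, y_trn)
    models[k] = clf
    trn_err[k] = 1 - accuracy_score(y_trn, clf.predict(X_trn))
    val_err[k] = 1 - accuracy_score(y_val, clf.predict(X_val))

# Plot training and validation error against k
plt.figure()
plt.plot(k_values, [trn_err[k] for k in k_values], marker='o', label='Training error')
plt.plot(k_values, [val_err[k] for k in k_values], marker='s', label='Validation error')
plt.xlabel('k')
plt.ylabel('Error')
plt.legend()

# Select k_best on the validation set and report test accuracy
k_best = min(val_err, key=val_err.get)
tst_acc = accuracy_score(y_tst, models[k_best].predict(X_tst))
print('k_best = {0}, test accuracy = {1:.4f}'.format(k_best, tst_acc))
```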

Discussion: Which of these two approaches, SVMs or kNN, would you prefer for this classification task? Explain.
