[SOLVED] CS ba2.tgz

ba2.tgz

CSCC11 Introduction to Machine Learning, Winter 2021, Assignment 3
B. Chan, Z. Zhang, D. Fleet

Answer the following questions:

Visualization:
1. Do you expect logistic regression to perform well on generic_1? Why?

What if we apply the feature map defined in Equation (2) of the assignment handout?

2. Do you expect logistic regression to perform well on generic_2? Why?

What if we apply the feature map defined in Equation (2) of the assignment handout?

3. Do you expect logistic regression to perform well on generic_3? Why?

4. Why can't we directly visualize the wine dataset? What are some ways to visualize it?
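A common way to look at data with more than two features is to project it onto two dimensions first. The sketch below does this with principal component analysis (PCA) computed through numpy's SVD. The helper name pca_project_2d and the random stand-in data are hypothetical, and nothing here assumes how many features the wine dataset actually has.

```python
import numpy as np


def pca_project_2d(X):
    """Project an N x D array onto its top-2 principal components (hypothetical helper)."""
    X_centered = X - X.mean(axis=0, keepdims=True)
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T  # shape (N, 2)


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    X = rng.randn(100, 6)  # stand-in data; D = 6 is an arbitrary choice
    Z = pca_project_2d(X)
    print(Z.shape)
```

The two projected columns can then be drawn as an ordinary 2D scatter plot coloured by label. Plotting pairs of raw features, or a nonlinear embedding such as t-SNE, are alternatives; a linear projection can hide class structure that lives in the discarded directions.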

Analysis:
1. Generic Dataset 1: Run logistic regression without regularization and without the feature map.
Did you run into any numerical errors? If so, why do you think this is the case?

Now, run logistic regression with regularization. What happens?
What are the train and test accuracies?

2. Generic Dataset 2: Run logistic regression without regularization and without the feature map.
What are the train and test accuracies?

Now run it with the feature map. Did the performance improve? Why do you think that is the case?

3. Generic Dataset 3: Run logistic regression without regularization and without the feature map.
What are the train and test accuracies?

What if we run it with the feature map?

4. What are the training and validation accuracies for the wine dataset?
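For Analysis question 1, the sketch below illustrates one reason unregularized training can misbehave on linearly separable data: the mean logistic loss keeps decreasing as the weights grow, so gradient descent never settles, whereas an L2 penalty gives a finite optimum. It is a toy binary (sigmoid) model on four hand-picked 1D points, not the assignment's multinomial class or datasets. The helper train_weight_norm is hypothetical, and the penalty alpha_inverse * ||w||^2 with an unpenalized bias is an assumed reading of the handout's L2 term.

```python
import numpy as np


def train_weight_norm(X, y, alpha_inverse, num_steps, step_size=1.0):
    """Plain gradient descent on binary logistic regression; returns ||w|| (hypothetical helper).

    Assumed loss: mean binary cross-entropy + alpha_inverse * ||w||^2,
    where w excludes the bias, which is left unpenalized.
    """
    N = X.shape[0]
    X_pad = np.hstack((np.ones((N, 1)), X))  # column 0 multiplies the bias
    theta = np.zeros(X_pad.shape[1])
    for _ in range(num_steps):
        p = 1.0 / (1.0 + np.exp(-(X_pad @ theta)))  # sigmoid probabilities
        grad = X_pad.T @ (p - y) / N
        grad[1:] += 2.0 * alpha_inverse * theta[1:]  # gradient of the L2 term
        theta -= step_size * grad
    return np.linalg.norm(theta[1:])


if __name__ == "__main__":
    # Linearly separable toy data: negatives at x < 0, positives at x > 0.
    X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
    y = np.array([0.0, 0.0, 1.0, 1.0])
    for steps in (1000, 4000, 16000):
        print(steps,
              train_weight_norm(X, y, alpha_inverse=0.0, num_steps=steps),
              train_weight_norm(X, y, alpha_inverse=0.1, num_steps=steps))
```

The unregularized norm keeps creeping upward while the regularized one stops changing. In the multinomial model, ever-growing logits can push some softmax probabilities to exactly 0 in floating point, and taking their log then yields -inf; that is one plausible source of the NaN/Inf assertion in learn(), made more likely by the step-size doubling in its line search.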

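For the feature-map questions, the sketch below applies psi(x) = (x_1, x_2, x_1 * x_2)^T, as documented in feature_map, to XOR-style data: four Gaussian blobs with the diagonal pair labelled 1. The data is an assumption chosen for illustration and is not claimed to match generic_2 or generic_3. The feature_map body is only one possible way to fill that TODO, and fit_binary_logreg, accuracy and make_xor_blobs are hypothetical helpers using a sigmoid model rather than the assignment's softmax class.

```python
import numpy as np


def feature_map(X):
    """One possible psi(x) = (x_1, x_2, x_1 * x_2)^T, applied row-wise to an N x 2 array."""
    assert X.shape[1] == 2, f"This feature map only applies to 2D inputs. Got: {X.shape[1]}"
    return np.hstack((X, X[:, [0]] * X[:, [1]]))


def fit_binary_logreg(X, y, step_size=0.5, num_steps=2000):
    """Gradient descent on the mean binary cross-entropy; returns [bias, weights...]."""
    X_pad = np.hstack((np.ones((X.shape[0], 1)), X))
    theta = np.zeros(X_pad.shape[1])
    for _ in range(num_steps):
        p = 1.0 / (1.0 + np.exp(-(X_pad @ theta)))
        theta -= step_size * X_pad.T @ (p - y) / X.shape[0]
    return theta


def accuracy(theta, X, y):
    """Fraction of points whose predicted label (logit > 0) matches y."""
    X_pad = np.hstack((np.ones((X.shape[0], 1)), X))
    return np.mean((X_pad @ theta > 0) == (y == 1))


def make_xor_blobs(num_per_blob=50, noise=0.2, seed=0):
    """Hypothetical XOR-style data: blobs at (1, 1) and (-1, -1) are class 1, the others class 0."""
    rng = np.random.RandomState(seed)
    centers = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
    labels = np.array([1.0, 1.0, 0.0, 0.0])
    idx = np.repeat(np.arange(4), num_per_blob)
    X = centers[idx] + noise * rng.randn(idx.size, 2)
    return X, labels[idx]


if __name__ == "__main__":
    X, y = make_xor_blobs()
    raw = fit_binary_logreg(X, y)
    mapped = fit_binary_logreg(feature_map(X), y)
    print("without feature map:", accuracy(raw, X, y))
    print("with feature map:   ", accuracy(mapped, feature_map(X), y))
```

On this data no line in the original 2D space can classify more than roughly three of the four blobs, while a single weight on the product feature separates them. Whether the map helps on the actual generic datasets depends on their geometry.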
A2/logistic_regression.py

"""
CSCC11 - Introduction to Machine Learning, Winter 2021, Assignment 3
B. Chan, Z. Zhang, D. Fleet
"""

import numpy as np

from utils import softmax


class LogisticRegression:
    def __init__(self,
                 num_features,
                 num_classes,
                 rng=np.random):
        """ This class represents a multinomial logistic regression model.
        NOTE: We assume labels are 0 to K - 1, where K is the number of classes.

        self.parameters contains the model weights.
        NOTE: The bias term is the first term (column 0 of self.parameters).

        TODO: You will need to implement the methods of this class:
        - _compute_loss_and_gradient: ndarray, ndarray -> float, ndarray

        Implementation description will be provided under each method.

        For the following:
        - N: Number of samples.
        - D: Dimension of input features.
        - K: Number of classes.

        Args:
        - num_features (int): The number of features in the input data.
        - num_classes (int): The number of classes in the task.
        - rng (RandomState): The random number generator to initialize weights.
        """
        self.num_features = num_features
        self.num_classes = num_classes
        self.rng = rng

        # Initialize parameters
        self.parameters = np.zeros(shape=(num_classes, self.num_features + 1))

    def init_weights(self, factor=1, bias=0):
        """ This randomly initializes the model weights.

        Args:
        - factor (float): A constant factor of the randomly initialized weights.
        - bias (float): The bias value
        """
        self.parameters[:, 1:] = factor * self.rng.rand(self.num_classes, self.num_features)
        self.parameters[:, 0] = bias

    def _compute_loss_and_gradient(self, X, y, alpha_inverse=0, beta_inverse=0):
        """ This computes the negative log likelihood (NLL) or negative log posterior and its gradient.

        NOTE: When alpha_inverse != 0 or beta_inverse != 0, we have the negative log posterior (NLP) instead.
        NOTE: For the L2 term, drop all the log constant terms and the constant factor.
              For the NLL term, divide by the number of data points (i.e. we are taking the mean).
              The new loss should take the form:
                  E_new(w) = (NLL_term / N) + L2_term
        NOTE: Compute the gradient based on the modified loss E_new(w).

        Args:
        - X (ndarray (shape: (N, D))): A NxD matrix consisting of N D-dimensional inputs.
        - y (ndarray (shape: (N, 1))): An N-column vector consisting of N scalar outputs (labels).
        - alpha_inverse (float): 1 / variance for an optional isotropic Gaussian prior (for the weights) on NLP.
            NOTE: 0 <= alpha_inverse. Setting alpha_inverse to 0 means no prior on weights.
        - beta_inverse (float): 1 / variance for an optional Gaussian prior (for the bias term) on NLP.
            NOTE: 0 <= beta_inverse. Setting beta_inverse to 0 means no prior on the bias term.

        Output:
        - nll (float): The NLL (or NLP) of the given inputs and outputs.
        - grad (ndarray (shape: (K, D + 1))): A Kx(D + 1) weight matrix (including bias) consisting of the gradient of NLL (or NLP)
          (i.e. partial derivatives of NLL (or NLP) w.r.t. self.parameters).
        """
        (N, D) = X.shape
        # ====================================================
        # TODO: Implement your solution within the box
        # ====================================================
        return nll, grad

    def learn(self,
              train_X,
              train_y,
              num_epochs=1000,
              step_size=1e-3,
              check_grad=False,
              verbose=False,
              alpha_inverse=0,
              beta_inverse=0,
              eps=np.finfo(float).eps):
        """ This performs gradient descent to learn the parameters given the training data.

        NOTE: This method mutates self.parameters

        Args:
        - train_X (ndarray (shape: (N, D))): A NxD matrix consisting of N D-dimensional training inputs.
        - train_y (ndarray (shape: (N, 1))): An N-column vector consisting of N scalar training outputs (labels).
        - num_epochs (int): Number of gradient descent steps
            NOTE: 1 <= num_epochs
        - step_size (float): Gradient descent step size
        - check_grad (bool): Whether or not to check gradient using finite difference.
        - verbose (bool): Whether or not to print gradient information for every step.
        - alpha_inverse (float): 1 / variance for an optional isotropic Gaussian prior (for the weights) on NLP.
            NOTE: 0 <= alpha_inverse. Setting alpha_inverse to 0 means no prior on weights.
        - beta_inverse (float): 1 / variance for an optional Gaussian prior (for the bias term) on NLP.
            NOTE: 0 <= beta_inverse. Setting beta_inverse to 0 means no prior on the bias term.
        - eps (float): Machine epsilon

        ASIDE: The design for applying gradient descent to find a local minimum is usually different from this.
        You should think about a better way to do this! Scipy is a good reference for such design.
        """
        assert len(train_X.shape) == len(train_y.shape) == 2, f"Input/output pairs must be 2D-arrays. train_X: {train_X.shape}, train_y: {train_y.shape}"
        (N, D) = train_X.shape
        assert N == train_y.shape[0], f"Number of samples must match for input/output pairs. train_X: {N}, train_y: {train_y.shape[0]}"
        assert D == self.num_features, f"Expected {self.num_features} features. Got: {D}"
        assert train_y.shape[1] == 1, f"train_Y must be a column vector. Got: {train_y.shape}"
        assert 1 <= num_epochs, f"Must take at least 1 gradient step. Got: {num_epochs}"

        nll, grad = self._compute_loss_and_gradient(train_X, train_y, alpha_inverse, beta_inverse)

        # Check gradient using finite difference
        if check_grad:
            original_parameters = np.copy(self.parameters)
            grad_approx = np.zeros(shape=(self.num_classes, self.num_features + 1))
            h = 1e-8

            # Compute finite difference w.r.t. each weight vector component
            for ii in range(self.num_classes):
                for jj in range(self.num_features + 1):
                    self.parameters = np.copy(original_parameters)
                    self.parameters[ii][jj] += h
                    grad_approx[ii][jj] = (self._compute_loss_and_gradient(train_X, train_y, alpha_inverse, beta_inverse)[0] - nll) / h

            # Reset parameters back to original
            self.parameters = np.copy(original_parameters)

            print(f"Negative Log Probability: {nll}")
            print(f"Analytic Gradient: {grad.T}")
            print(f"Numerical Gradient: {grad_approx.T}")
            print("The gradients should be nearly identical.")

        # Perform gradient descent
        for epoch_i in range(num_epochs):
            original_parameters = np.copy(self.parameters)
            # Check gradient flow
            if np.linalg.norm(grad) < eps:
                print(f"Gradient is close to 0: {eps}. Terminating gradient descent.")
                break

            # Determine the suitable step size.
            step_size *= 2
            self.parameters = original_parameters - step_size * grad
            E_new, grad_new = self._compute_loss_and_gradient(train_X, train_y, alpha_inverse, beta_inverse)
            assert np.isfinite(E_new), "Error is NaN/Inf"

            while E_new >= nll and step_size > 0:
                step_size /= 2
                self.parameters = original_parameters - step_size * grad
                E_new, grad_new = self._compute_loss_and_gradient(train_X, train_y, alpha_inverse, beta_inverse)
                assert np.isfinite(E_new), "Error is NaN/Inf"

            if step_size <= eps:
                print(f"Infinitesimal step: {step_size}. Terminating gradient descent.")
                break

            if verbose:
                print(f"Epoch: {epoch_i}, Step size: {step_size}, Gradient Norm: {np.linalg.norm(grad)}, NLL: {nll}")

            # Update next loss and next gradient
            grad = grad_new
            nll = E_new

    def predict(self, X):
        """ This computes the probability of labels given X.

        Args:
        - X (ndarray (shape: (N, D))): A NxD matrix consisting of N D-dimensional inputs.

        Output:
        - probs (ndarray (shape: (N, K))): A NxK matrix consisting of N K-probabilities for each input.
        """
        (N, D) = X.shape
        assert D == self.num_features, f"Expected {self.num_features} features. Got: {D}"

        # Pad 1's for bias term
        X = np.hstack((np.ones(shape=(N, 1), dtype=float), X))

        # Softmax over the K class logits gives the class probabilities for each input
        probs = softmax(X @ self.parameters.T)

        return probs


A2/utils.py

"""
CSCC11 - Introduction to Machine Learning, Winter 2021, Assignment 3
B. Chan, Z. Zhang, D. Fleet
"""

import _pickle as pickle
import numpy as np


def softmax(logits):
    """ This function applies the softmax function to the logits.

    Args:
    - logits (ndarray (shape: (N, K))): A NxK matrix consisting of N K-dimensional logits.

    Output:
    - (ndarray (shape: (N, K))): A NxK matrix consisting of N K-categorical distributions.
    """
    # Subtract the row max before exponentiating to avoid overflow
    e_logits = np.exp(logits - np.max(logits, axis=1, keepdims=True))
    return e_logits / np.sum(e_logits, axis=1, keepdims=True)


def load_pickle_dataset(file_path):
    """ This function loads a pickle file given a file path.

    Args:
    - file_path (str): The path of the pickle file

    Output:
    - (dict): A dictionary consisting of the dataset content.
    """
    with open(file_path, "rb") as f:
        return pickle.load(f)


A2/train_logistic_regression.py

"""
CSCC11 - Introduction to Machine Learning, Winter 2021, Assignment 3
B. Chan, M. Ammous, Z. Zhang, D. Fleet
"""

import numpy as np

from logistic_regression import LogisticRegression
from utils import load_pickle_dataset


def train(train_X,
          train_y,
          test_X=None,
          test_y=None,
          data_preprocessing=lambda X: X,
          factor=1,
          bias=0,
          alpha_inverse=0,
          beta_inverse=0,
          num_epochs=1000,
          step_size=1e-3,
          check_grad=False,
          verbose=False):
    """ This function trains a logistic regression model given the data.

    Args:
    - train_X (ndarray (shape: (N, D))): A NxD matrix consisting of N D-dimensional training inputs.
    - train_y (ndarray (shape: (N, 1))): An N-column vector consisting of N scalar training outputs (labels).
    - test_X (ndarray (shape: (M, D))): An MxD matrix consisting of M D-dimensional test inputs.
    - test_y (ndarray (shape: (M, 1))): An M-column vector consisting of M scalar test outputs (labels).
    - data_preprocessing (ndarray -> ndarray): A data-preprocessing function that is applied on both the
      training and test inputs.

    Initialization Args:
    - factor (float): A constant factor of the randomly initialized weights.
    - bias (float): The bias value

    Learning Args:
    - num_epochs (int): Number of gradient descent steps
        NOTE: 1 <= num_epochs
    - step_size (float): Gradient descent step size
    - check_grad (bool): Whether or not to check gradient using finite difference.
    - verbose (bool): Whether or not to print gradient information for every step.
    """
    train_accuracy = 0
    # ====================================================
    # TODO: Implement your solution within the box
    # Step 0: Apply data-preprocessing (i.e. feature map) on the input data
    # Step 1: Initialize model and initialize weights
    # Step 2: Train the model
    # Step 3: Evaluate training performance
    # ====================================================

    train_preds = np.argmax(train_probs, axis=1)
    train_accuracy = 100 * np.mean(train_preds == train_y.flatten())
    print("Training Accuracy: {}%".format(train_accuracy))

    if test_X is not None and test_y is not None:
        test_accuracy = 0
        # ====================================================
        # TODO: Implement your solution within the box
        # Evaluate test performance
        # ====================================================

        test_preds = np.argmax(test_probs, axis=1)
        test_accuracy = 100 * np.mean(test_preds == test_y.flatten())
        print("Test Accuracy: {}%".format(test_accuracy))


def feature_map(X):
    """ This function applies a feature map on the given input.
    Given any 2D input vector x, the output of the feature map psi is a 3D vector, defined as:
        psi(x) = (x_1, x_2, x_1 * x_2)^T

    Args:
    - X (ndarray (shape: (N, 2))): A Nx2 matrix consisting of N 2-dimensional inputs.

    Output:
    - X_mapped (ndarray (shape: (N, 3))): A Nx3 matrix consisting of N 3-dimensional vectors corresponding to the outputs of the feature map applied on the inputs X.
    """
    assert X.shape[1] == 2, f"This feature map only applies to 2D inputs. Got: {X.shape[1]}"
    # ====================================================
    # TODO: Implement your non-linear-map here
    # ====================================================

    return X_mapped


if __name__ == "__main__":
    seed = 0
    np.random.seed(seed)

    # Support generic_1, generic_2, generic_3, wine
    dataset = "generic_3"
    assert dataset in ("generic_1", "generic_2", "generic_3", "wine"), f"Invalid dataset: {dataset}"
    dataset_path = f"./datasets/{dataset}.pkl"

    data = load_pickle_dataset(dataset_path)
    train_X = data['train_X']
    train_y = data['train_y']

    test_X = test_y = None
    if 'test_X' in data and 'test_y' in data:
        test_X = data['test_X']
        test_y = data['test_y']

    # ====================================================
    # Hyperparameters
    # NOTE: This is definitely not the best way to pass all your hyperparameters.
    #       We can usually use a configuration file to specify these.
    # ====================================================
    factor = 1
    bias = 0
    alpha_inverse = 0
    beta_inverse = 0
    num_epochs = 1000
    step_size = 1e-3
    apply_data_preprocessing = False
    check_grad = True
    verbose = False

    data_preprocessing = lambda X: X
    if apply_data_preprocessing:
        data_preprocessing = feature_map

    train(train_X=train_X,
          train_y=train_y,
          test_X=test_X,
          test_y=test_y,
          data_preprocessing=data_preprocessing,
          factor=factor,
          bias=bias,
          alpha_inverse=alpha_inverse,
          beta_inverse=beta_inverse,
          num_epochs=num_epochs,
          step_size=step_size,
          check_grad=check_grad,
          verbose=verbose)


A2/visualize_generic.py

"""
CSCC11 - Introduction to Machine Learning, Winter 2021, Assignment 3
B. Chan, Z. Zhang, D. Fleet
"""

import matplotlib.pyplot as plt
import numpy as np

from utils import load_pickle_dataset


def visualize_2d_data(X, y):
    """ This function generates a 2D scatter plot given the inputs and their corresponding labels.
    Inputs with different classes are represented with different colours.

    Args:
    - X (ndarray (shape: (N, D))): A NxD matrix consisting of N D-dimensional inputs.
    - y (ndarray (shape: (N, 1))): An N-column vector consisting of N scalar outputs (labels).
    """
    assert len(X.shape) == len(y.shape) == 2, f"Input/output pairs must be 2D-arrays. X: {X.shape}, y: {y.shape}"
    (N, D) = X.shape
    assert N == y.shape[0], f"Number of samples must match for input/output pairs. X: {N}, y: {y.shape[0]}"
    assert D == 2, f"Expected 2 features. Got: {D}"
    assert y.shape[1] == 1, f"Y must be a column vector. Got: {y.shape}"

    # ====================================================
    # TODO: Implement your solution within the box
    # ====================================================


if __name__ == "__main__":
    # Support generic_1, generic_2, generic_3
    dataset = "generic_3"
    assert dataset in ("generic_1", "generic_2", "generic_3", "wine"), f"Invalid dataset: {dataset}"
    dataset_path = f"./datasets/{dataset}.pkl"
    data = load_pickle_dataset(dataset_path)
    visualize_2d_data(data['train_X'], data['train_y'])


Archive contents:
A2/datasets/generic_3.pkl
A2/datasets/generic_1.pkl
A2/datasets/wine.pkl
A2/datasets/generic_2.pkl
A2/Questions.txt
A2/logistic_regression.py
A2/utils.py
A2/train_logistic_regression.py
A2/visualize_generic.py
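The _compute_loss_and_gradient docstring in A2/logistic_regression.py asks for E_new(w) = (NLL_term / N) + L2_term and its gradient. Below is a minimal standalone sketch of that computation. It assumes the L2 term is alpha_inverse * ||weights||^2 + beta_inverse * ||biases||^2, which is one reading of "drop the constant factor"; if your convention keeps the 1/2, halve both penalty terms and their gradients. The function name loss_and_gradient is hypothetical. Log-probabilities come from a max-shifted log-sum-exp rather than log(softmax(...)), so a probability that underflows to 0 does not turn the loss into inf.

```python
import numpy as np


def loss_and_gradient(parameters, X, y, alpha_inverse=0.0, beta_inverse=0.0):
    """Mean multinomial NLL plus an assumed L2 penalty, and its gradient (hypothetical helper).

    parameters: (K, D + 1) array; column 0 holds the per-class biases.
    X: (N, D) inputs. y: (N, 1) integer labels in {0, ..., K - 1}.
    Returns (loss, grad) with grad of shape (K, D + 1).
    """
    N = X.shape[0]
    K = parameters.shape[0]
    X_pad = np.hstack((np.ones((N, 1)), X))  # prepend a 1 for the bias
    logits = X_pad @ parameters.T            # (N, K)

    # Stable log-softmax: shift by the row max, then subtract the log-sum-exp.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    one_hot = np.eye(K)[y.flatten().astype(int)]  # (N, K)

    weights, biases = parameters[:, 1:], parameters[:, 0]
    loss = (-np.sum(one_hot * log_probs) / N
            + alpha_inverse * np.sum(weights ** 2)
            + beta_inverse * np.sum(biases ** 2))

    # d(mean NLL)/d(logits) = (probs - one_hot) / N; chain rule through logits = X_pad @ parameters^T.
    grad = (np.exp(log_probs) - one_hot).T @ X_pad / N
    grad[:, 1:] += 2.0 * alpha_inverse * weights
    grad[:, 0] += 2.0 * beta_inverse * biases
    return loss, grad


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    N, D, K = 20, 3, 4
    X = rng.randn(N, D)
    y = rng.randint(0, K, size=(N, 1))
    P = rng.randn(K, D + 1)

    loss, grad = loss_and_gradient(P, X, y, alpha_inverse=0.1, beta_inverse=0.05)
    # Central finite differences as an independent check of the analytic gradient.
    h = 1e-6
    numeric = np.zeros_like(P)
    for i in range(K):
        for j in range(D + 1):
            step = np.zeros_like(P)
            step[i, j] = h
            numeric[i, j] = (loss_and_gradient(P + step, X, y, 0.1, 0.05)[0]
                             - loss_and_gradient(P - step, X, y, 0.1, 0.05)[0]) / (2 * h)
    print("max |analytic - numeric|:", np.max(np.abs(grad - numeric)))
```

learn() checks gradients with a forward difference and h = 1e-8; the central difference used here has a smaller truncation error, but for a correct gradient both should agree closely.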
