
DS4400: Machine Learning Assignment 1


Hyperparameter Tuning and Cross Validation

DS4400: Machine Learning I
Bilal Ahmed

Introduction

Ridge Regression – Assignment 1 (Part II), Due by: 7/17/2023 11:59 PM EST

In this assignment, we will implement hyperparameter tuning using grid search. Additionally, we will use randomized k-fold cross validation to estimate the error of our ridge regression implementation. You may use any libraries you wish for plotting. You may submit your solutions as Python programs plus well-named image files (png, jpg, or pdf), or you may submit a Jupyter notebook. If using Jupyter, please also upload the notebook saved as a PDF.

Datasets

  • Concrete: The dataset is taken from the UCI machine learning repository and has nine numerical columns (eight input attributes plus the target). The aim is to predict the compressive strength of concrete from its composition and age. The target column is named ‘strength’, while all other columns should be used as input features to ridge regression.

Instructions

  1. Read the concrete dataset (eight attributes, one response variable) into a dataframe using pandas.

  2. Read the scikit-learn documentation for the StandardScaler (here), Pipeline (here), and Ridge Regression implementation (here)

  3. Create a pipeline that standardizes the input data and then trains / predicts with a ridge regressor (a sketch covering steps 1–3 follows this list).

  4. Hyperparameter Tuning: To estimate the best value of alpha (lambda in the course notes) for our ridge regression model, we will use grid search. Scikit-learn has a built-in method for grid search called GridSearchCV (doc). We also need cross validation to estimate model performance at each grid point; to this end, we will use k-fold cross validation, implemented in scikit-learn as KFold (doc).

    1. Create a KFold object with k=5 (for five fold cross validation), setting random_state=44 and shuffle=True. What do these parameters signify and what is their importance for estimating model performance? (5 points)

    2. Perform grid search using the k-fold object from the previous step, optimizing mean squared error (MSE); see the grid-search sketch after this list.

      1. Use a grid with alpha values = [0, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0, 50.0]

      2. Report the best value of the alpha parameter and the best score for the concrete dataset. (Note: scikit-learn treats a higher score as better model performance, so to minimize the error, the negative of the error should be used for scoring (see here).) (15 points)

  5. Estimating MSE for the dataset: Using the optimal value of alpha obtained from the grid search, we will now estimate the MSE of our ridge regressor on the concrete dataset.

    1. For k = [5, 10] set up a KFold object similar to the settings in part 4a. (These will be two different objects)

    2. Use scikit-learn’s cross_val_score function (doc) to estimate the model performance (mean squared error) on the concrete dataset for both k=5 and k=10; a sketch follows this list. (25 points)

  6. Using only a single run of the k-fold strategy can result in noisy estimates; a standard way of reducing the noise is to run k-fold multiple times, using a different randomly created partition in each run. Scikit-learn provides an implementation of this strategy called RepeatedKFold (doc). Re-implement 5a and 5b using repeated k-fold: keep the random_state the same and set n_repeats to 5 and 10 for each value of k in 5a (see the sketch after this list).

    1. Implementation (25 points)

    2. Why would running k-fold once produce noisy estimates? (5 points)

    3. How would the repeated k-fold strategy scale as the size of the dataset increases, both in the number of data points and in the number of input features? (5 points)
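The following is a minimal sketch of steps 1–3. The file name concrete.csv is an assumption (point it at wherever the provided CSV actually lives); the target column ‘strength’ is taken from the dataset description above.

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# Assumed file name; adjust to the actual path of the concrete CSV.
df = pd.read_csv("concrete.csv")

X = df.drop(columns="strength")  # the eight input attributes
y = df["strength"]               # compressive-strength target

# Standardize the inputs, then train / predict with a ridge regressor.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("ridge", Ridge()),
])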
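For step 4, one way to wire KFold into GridSearchCV looks like the sketch below (continuing from the pipeline above). Note the scoring string, which matches the note in 4b: scikit-learn maximizes scores, so the error is passed in as its negative.

from sklearn.model_selection import GridSearchCV, KFold

# shuffle=True randomizes which rows fall into each fold; random_state=44
# fixes that shuffle so the partition (and hence the scores) is reproducible.
cv = KFold(n_splits=5, shuffle=True, random_state=44)

# Hyperparameters inside a Pipeline are addressed as <step name>__<parameter>.
param_grid = {"ridge__alpha": [0, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0, 50.0]}

search = GridSearchCV(pipe, param_grid, cv=cv,
                      scoring="neg_mean_squared_error")
search.fit(X, y)

print("best alpha:", search.best_params_["ridge__alpha"])
print("best score (negated MSE):", search.best_score_)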
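For step 5, running cross_val_score with the two KFold objects might look like this sketch, which reuses the pipe, X, and y defined above and takes the best alpha from the grid search.

from sklearn.model_selection import KFold, cross_val_score

best_alpha = search.best_params_["ridge__alpha"]  # from the grid search above
pipe.set_params(ridge__alpha=best_alpha)

for k in [5, 10]:
    cv = KFold(n_splits=k, shuffle=True, random_state=44)
    scores = cross_val_score(pipe, X, y, cv=cv,
                             scoring="neg_mean_squared_error")
    # Negate to report MSE on its natural (positive) scale.
    print(f"k={k}: mean MSE = {-scores.mean():.3f} (std {scores.std():.3f})")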
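For step 6, RepeatedKFold replaces KFold. The pairing of k and n_repeats in the instructions can be read two ways, so this sketch conservatively runs both repeat counts for each k; drop the inner loop if only the paired settings (k=5 with 5 repeats, k=10 with 10 repeats) are intended.

from sklearn.model_selection import RepeatedKFold, cross_val_score

# RepeatedKFold re-partitions the data n_repeats times; random_state=44
# makes the whole sequence of random partitions reproducible.
for k in [5, 10]:
    for n_rep in [5, 10]:
        cv = RepeatedKFold(n_splits=k, n_repeats=n_rep, random_state=44)
        scores = cross_val_score(pipe, X, y, cv=cv,
                                 scoring="neg_mean_squared_error")
        print(f"k={k}, n_repeats={n_rep}: mean MSE = {-scores.mean():.3f}")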
