8/27/2019 regressions
HW1: Linear & Logistic Regression
In this homework, you will read the lecture notes first and then implement linear regression and logistic regression using this Jupyter notebook. Fill in all the blanks and run all the cells. Then export the notebook to a PDF file that contains all results, and turn in both the ipynb file and the PDF file. Some helpful resources are listed below:
1. First week mentor session video: https://video.gecacademy.cn/?id=b7d464e0-c32f-11e9-8a0d-15fd33044508&logo=
2. Second week lecture slides: https://github.com/noise-lab/ML-Networking-Primer/blob/master/1_Regression.ipynb
3. Sklearn logistic regression documentation: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
4. Sklearn linear regression documentation: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
Note: If you cannot export the notebook directly to a PDF file, try File > Print Preview and save that as a PDF.
Linear Regression
We will fit a simple linear regression to the provided data file. The data samples are represented as row vectors; the first column refers to the input (x-axis) and the second column refers to the output (y-axis). We assume that this dataset was generated from a linear model $y = ax + b$ plus noise, and you need to find the optimal $a$ and $b$ to fit the data. Now follow the instructions below and show the results.
1. Load the data
localhost:8888/nbconvert/html/Desktop/8.17/HW1/regressions.ipynb?download=false 1/4
In [1]:
import numpy as np
In [2]:
# Column 0 holds the input x; column 1 holds the output y.
data = np.loadtxt('data.txt', dtype=float)
x = data[:, 0].reshape(len(data), 1)
y = data[:, 1].reshape(len(data), 1)
2. Visualize the data
In [4]:
import matplotlib.pyplot as plt
In [10]:
plt.plot(x, y)
plt.grid()
plt.xlabel('x')
plt.ylabel('y')
Out[10]: Text(0, 0.5, 'y')
3. Fit the linear regression
In [6]:
from sklearn import linear_model
In [7]:
linear_regression = linear_model.LinearRegression()
linear_regression.fit(x, y)
Out[7]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [8]:
# Predict y at x = 0 and x = 1 to get two points on the fitted line.
best_line_ys = linear_regression.predict(np.array([0, 1]).reshape(-1, 1))
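To make the connection between the fitted model and the parameters a and b explicit, here is a minimal sketch on synthetic data. The real `data.txt` is not reproduced here, so the slope 2.0, intercept 0.5, noise level, seed, and the `_demo` names are all illustrative assumptions. It reads a and b from `coef_` and `intercept_` and cross-checks them against an ordinary least-squares solve with `np.linalg.lstsq`.

```python
import numpy as np
from sklearn import linear_model

# Synthetic stand-in for data.txt (assumed values, not the homework data).
rng = np.random.default_rng(0)
true_a, true_b = 2.0, 0.5
x_demo = rng.uniform(0.0, 1.0, size=(100, 1))
y_demo = true_a * x_demo + true_b + rng.normal(0.0, 0.05, size=(100, 1))

# Fit y = a*x + b with scikit-learn.
model = linear_model.LinearRegression()
model.fit(x_demo, y_demo)
a_hat = float(model.coef_[0, 0])    # slope a
b_hat = float(model.intercept_[0])  # intercept b

# Cross-check: least squares on the design matrix [x, 1].
design = np.hstack([x_demo, np.ones_like(x_demo)])
solution, *_ = np.linalg.lstsq(design, y_demo, rcond=None)
a_ls, b_ls = float(solution[0, 0]), float(solution[1, 0])

print(f"sklearn: a={a_hat:.3f}, b={b_hat:.3f}")
print(f"lstsq:   a={a_ls:.3f}, b={b_ls:.3f}")
```

Both approaches minimize the same squared error, so the two pairs of estimates should agree up to floating-point precision.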
4. Plot the results along with the points
In [11]:
plt.plot(x, y)
plt.plot([0, 1], best_line_ys)
plt.title('Linear regression line fitting')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
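The plot above draws the fitted line between x = 0 and x = 1, which works when the inputs lie in that range. As a hedged variation, the sketch below draws the points as a scatter plot and evaluates the line at the smallest and largest observed input instead. The synthetic data, seed, and `_demo` names are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model

# Synthetic data whose inputs do not lie in [0, 1] (assumed values).
rng = np.random.default_rng(1)
x_demo = rng.uniform(-3.0, 5.0, size=(50, 1))
y_demo = 1.5 * x_demo - 2.0 + rng.normal(0.0, 0.5, size=(50, 1))

model = linear_model.LinearRegression().fit(x_demo, y_demo)

# Evaluate the fitted line at the observed extremes of x.
line_x = np.array([[x_demo.min()], [x_demo.max()]])
line_y = model.predict(line_x)

fig, ax = plt.subplots()
ax.scatter(x_demo, y_demo, s=12, label='data')
ax.plot(line_x, line_y, color='red', label='fitted line')
ax.set_title('Linear regression line fitting')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend()
```

In a notebook the figure renders inline; in a script, add `plt.show()` or `fig.savefig(...)` at the end.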
Logistic Regression
We will fit a logistic regression on the provided data. This dataset is modified from the Iris dataset. It contains 2 classes of 50 instances each, where each class refers to a type of iris plant. Each instance contains 5 attributes: sepal length in cm, sepal width in cm, petal length in cm, petal width in cm, and class (Iris-setosa or Iris-versicolor). The first 4 attributes are numerical, while the last attribute is categorical. In this task, you are going to use the first 4 attributes to predict the 5th attribute by fitting a logistic regression.
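As a rough illustration of what a fitted logistic regression computes, the sketch below passes a weighted sum of the four numerical attributes through the sigmoid function to get a probability for one of the two classes. The weights, bias, and sample measurements are made-up values chosen for illustration, not parameters fitted to the homework data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and bias (assumed, not fitted values).
weights = np.array([0.4, -1.2, 2.0, 0.9])
bias = -6.0

# Columns: sepal length, sepal width, petal length, petal width (cm).
samples = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [6.4, 3.2, 4.5, 1.5],
])

# Probability of the class "Iris-versicolor" under these assumed weights.
probabilities = sigmoid(samples @ weights + bias)
predicted = np.where(probabilities >= 0.5, 'Iris-versicolor', 'Iris-setosa')

for prob, label in zip(probabilities, predicted):
    print(f"P(versicolor) = {prob:.3f} -> {label}")
```

Fitting the model amounts to choosing the weights and bias so that these probabilities match the training labels as closely as possible.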
1. Load the data
In [14]:
import numpy as np
# The first four columns are the numerical attributes.
data = np.loadtxt('iris_data.txt', usecols=(0, 1, 2, 3), delimiter=',', dtype=float)
x = np.array(data)
In [21]:
# The fifth column is the class label, read as a string.
y = np.loadtxt('iris_data.txt', delimiter=',', dtype=str)[:, 4]
2. Separate data into train and test
Training data should have 70 instances and testing data should have the rest.
In [47]:
xtrain = x[:70]
ytrain = y[:70]
xtest = x[70:]
ytest = y[70:]
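Slicing the first 70 rows gives a balanced training set only if the rows are already mixed. If the file happens to be grouped by class (an assumption; its ordering is not shown here), the training set would contain far more of one class than the other. One alternative, sketched below, is a shuffled and stratified split with `train_test_split`. Since `iris_data.txt` is not available here, the sketch uses the first two classes of scikit-learn's bundled Iris dataset as a stand-in, and the `_demo` names are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in for iris_data.txt: keep only setosa and versicolor (100 rows).
iris = load_iris()
mask = iris.target < 2
x_demo = iris.data[mask]
y_demo = iris.target_names[iris.target[mask]]

# 70 training rows, 30 test rows, with class proportions preserved.
xtrain_demo, xtest_demo, ytrain_demo, ytest_demo = train_test_split(
    x_demo, y_demo, train_size=70, stratify=y_demo, random_state=0
)

print(xtrain_demo.shape, xtest_demo.shape)
print(dict(zip(*np.unique(ytrain_demo, return_counts=True))))
```

The `random_state` argument only fixes the shuffle so that the split is reproducible; any integer works.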
2. Fit the logistic regression
Fit the regression on our training data.
In [48]:
logistic_regression = linear_model.LogisticRegression(C=50)
logistic_regression.fit(xtrain, ytrain)
C:\Users\bruce\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
Out[48]:
LogisticRegression(C=50, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)
3. Calculate the prediction accuracy
Find the accuracy of our fitted regression model on predicting the training and testing data. (Hint: sklearn might have helpful functions, so you only need one line to get the result for each case.)
In [51]:
logistic_regression.score(xtrain, ytrain)
Out[51]: 0.9857142857142858
In [49]:
logistic_regression.score(xtest, ytest)
Out[49]: 0.8
4. Try changing the number of training and testing data, write down what you find, and explain why.
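To explore how the split size affects accuracy, one possible setup is to loop over several training-set sizes and record the training and test accuracy for each. The sketch below is a starting point under stated assumptions, not a reference answer: it uses the first two classes of scikit-learn's bundled Iris dataset as a stand-in for `iris_data.txt`, a stratified shuffled split, and an explicit `solver='lbfgs'`. The list of sizes and the `_demo` names are illustrative.

```python
from sklearn import linear_model
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in for iris_data.txt: setosa and versicolor only (100 rows).
iris = load_iris()
mask = iris.target < 2
x_demo, y_demo = iris.data[mask], iris.target[mask]

results = {}
for n_train in (10, 30, 50, 70, 90):
    # Stratify so both classes appear in every training set.
    xtr, xte, ytr, yte = train_test_split(
        x_demo, y_demo, train_size=n_train, stratify=y_demo, random_state=0
    )
    model = linear_model.LogisticRegression(C=50, solver='lbfgs', max_iter=1000)
    model.fit(xtr, ytr)
    # score() returns the mean accuracy on the given data.
    results[n_train] = (model.score(xtr, ytr), model.score(xte, yte))

for n_train, (train_acc, test_acc) in results.items():
    print(f"train size {n_train:3d}: train acc {train_acc:.3f}, test acc {test_acc:.3f}")
```

Results on the modified homework dataset may differ from this stand-in, so the printed numbers are only a template for the comparison.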