Introduction

In this exercise, you will implement linear regression and get to see it work on data. Before starting on this programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.

To get started with the exercise, you will need to download the starter code and unzip its contents to the directory where you wish to complete the exercise. If needed, use the cd command in Octave/MATLAB to change to this directory before starting this exercise.

You can also find instructions for installing Octave/MATLAB in the "Environment Setup Instructions" of the course website.

Files included in this exercise

    ex1.m                       Octave/MATLAB script that steps you through the exercise
    ex1_multi.m                 Octave/MATLAB script for the later parts of the exercise
    ex1data1.txt                Dataset for linear regression with one variable
    ex1data2.txt                Dataset for linear regression with multiple variables
    submit.m                    Submission script that sends your solutions to our servers
    [*] warmUpExercise.m        Simple example function in Octave/MATLAB
    [*] plotData.m              Function to display the dataset
    [*] computeCost.m           Function to compute the cost of linear regression
    [*] gradientDescent.m       Function to run gradient descent
    [†] computeCostMulti.m      Cost function for multiple variables
    [†] gradientDescentMulti.m  Gradient descent for multiple variables
    [†] featureNormalize.m      Function to normalize features
    [†] normalEqn.m             Function to compute the normal equations

    * indicates files you will need to complete
    † indicates optional exercises

Throughout the exercise, you will be using the scripts ex1.m and ex1_multi.m. These scripts set up the dataset for the problems and make calls to functions that you will write. You do not need to modify either of them. You are only required to modify functions in other files, by following the instructions in this assignment.

For this programming exercise, you are only required to complete the first part of the exercise to implement linear regression with one variable.
The second part of the exercise, which is optional, covers linear regression with multiple variables.

Where to get help

The exercises in this course use Octave [1] or MATLAB, a high-level programming language well-suited for numerical computations. If you do not have Octave or MATLAB installed, please refer to the installation instructions in the "Environment Setup Instructions" of the course website.

At the Octave/MATLAB command line, typing help followed by a function name displays documentation for a built-in function. For example, help plot will bring up help information for plotting. Further documentation for Octave functions can be found at the Octave documentation pages. MATLAB documentation can be found at the MATLAB documentation pages.

We also strongly encourage using the online Discussions to discuss exercises with other students. However, do not look at any source code written by others or share your source code with others.

[1] Octave is a free alternative to MATLAB. For the programming exercises, you are free to use either Octave or MATLAB.

1 Simple Octave/MATLAB function

The first part of ex1.m gives you practice with Octave/MATLAB syntax and the homework submission process. In the file warmUpExercise.m, you will find the outline of an Octave/MATLAB function. Modify it to return a 5 x 5 identity matrix by filling in the following code:

    A = eye(5);

When you are finished, run ex1.m (assuming you are in the correct directory, type "ex1" at the Octave/MATLAB prompt) and you should see output similar to the following:

    ans =

    Diagonal Matrix

       1   0   0   0   0
       0   1   0   0   0
       0   0   1   0   0
       0   0   0   1   0
       0   0   0   0   1

Now ex1.m will pause until you press any key, and then will run the code for the next part of the assignment. If you wish to quit, typing ctrl-c will stop the program in the middle of its run.

1.1 Submitting Solutions

After completing a part of the exercise, you can submit your solutions for grading by typing submit at the Octave/MATLAB command line.
The submission script will prompt you for your login e-mail and submission token and ask you which files you want to submit. You can obtain a submission token from the web page for the assignment.

You should now submit your solutions.

You are allowed to submit your solutions multiple times, and we will take only the highest score into consideration.

2 Linear regression with one variable

In this part of this exercise, you will implement linear regression with one variable to predict profits for a food truck. Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities and you have data for profits and populations from the cities.

You would like to use this data to help you select which city to expand to next.

The file ex1data1.txt contains the dataset for our linear regression problem. The first column is the population of a city and the second column is the profit of a food truck in that city. A negative value for profit indicates a loss.

The ex1.m script has already been set up to load this data for you.

2.1 Plotting the Data

Before starting on any task, it is often useful to understand the data by visualizing it. For this dataset, you can use a scatter plot to visualize the data, since it has only two properties to plot (profit and population). (Many other problems that you will encounter in real life are multi-dimensional and can't be plotted on a 2-d plot.)

In ex1.m, the dataset is loaded from the data file into the variables X and y:

    data = load('ex1data1.txt');       % read comma separated data
    X = data(:, 1); y = data(:, 2);
    m = length(y);                     % number of training examples

Next, the script calls the plotData function to create a scatter plot of the data.
Your job is to complete plotData.m to draw the plot; modify the file and fill in the following code:

    plot(x, y, 'rx', 'MarkerSize', 10);        % Plot the data
    ylabel('Profit in $10,000s');              % Set the y-axis label
    xlabel('Population of City in 10,000s');   % Set the x-axis label

Now, when you continue to run ex1.m, the end result should look like Figure 1, with the same red "x" markers and axis labels.

To learn more about the plot command, you can type help plot at the Octave/MATLAB command prompt or search online for plotting documentation. (To change the markers to red "x", we used the option 'rx' together with the plot command, i.e., plot(..,[your options here],..,'rx'); )

[Figure 1: Scatter plot of training data. x-axis: Population of City in 10,000s; y-axis: Profit in $10,000s]

2.2 Gradient Descent

In this part, you will fit the linear regression parameters $\theta$ to our dataset using gradient descent.

2.2.1 Update Equations

The objective of linear regression is to minimize the cost function

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where the hypothesis $h_\theta(x)$ is given by the linear model

$$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$$

Recall that the parameters of your model are the $\theta_j$ values. These are the values you will adjust to minimize the cost $J(\theta)$. One way to do this is to use the batch gradient descent algorithm. In batch gradient descent, each iteration performs the update

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

(simultaneously update $\theta_j$ for all $j$).

With each step of gradient descent, your parameters $\theta_j$ come closer to the optimal values that will achieve the lowest cost $J(\theta)$.

Implementation Note: We store each example as a row in the X matrix in Octave/MATLAB. To take into account the intercept term ($\theta_0$), we add an additional first column to X and set it to all ones. This allows us to treat $\theta_0$ as simply another "feature".

2.2.2 Implementation

In ex1.m, we have already set up the data for linear regression. In the following lines, we add another dimension to our data to accommodate the $\theta_0$ intercept term. We also initialize the parameters to 0 and the learning rate alpha to 0.01.

    X = [ones(m, 1), data(:,1)];   % Add a column of ones to x
    theta = zeros(2, 1);           % initialize fitting parameters

    iterations = 1500;
    alpha = 0.01;

2.2.3 Computing the cost $J(\theta)$

As you perform gradient descent to minimize the cost function $J(\theta)$, it is helpful to monitor the convergence by computing the cost. In this section, you will implement a function to calculate $J(\theta)$ so you can check the convergence of your gradient descent implementation.

Your next task is to complete the code in the file computeCost.m, which is a function that computes $J(\theta)$. As you are doing this, remember that the variables X and y are not scalar values, but matrices whose rows represent the examples from the training set.

Once you have completed the function, the next step in ex1.m will run computeCost once using $\theta$ initialized to zeros, and you will see the cost printed to the screen.

You should expect to see a cost of 32.07.

You should now submit your solutions.

2.2.4 Gradient descent

Next, you will implement gradient descent in the file gradientDescent.m. The loop structure has been written for you, and you only need to supply the updates to $\theta$ within each iteration.

As you program, make sure you understand what you are trying to optimize and what is being updated.
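
The batch update rule from Section 2.2.1 can be sketched on a tiny dataset as follows. This is only an illustration under stated assumptions: the three data points, the learning rate of 0.1, the 50 iterations, and the names cost and J_history are all made up for this sketch and do not come from ex1.m or the starter code.

```octave
% Toy data (hypothetical, not from ex1data1.txt): y = 1 + 2*x exactly.
x = [1; 2; 3];
y = [3; 5; 7];
m = length(y);

X = [ones(m, 1), x];   % prepend the intercept column of ones
theta = zeros(2, 1);   % start from theta = [0; 0]
alpha = 0.1;           % learning rate picked for this toy data only

% Cost J(theta) = 1/(2m) * sum((X*theta - y).^2), written as a helper.
cost = @(t) sum((X * t - y) .^ 2) / (2 * m);

J_history = zeros(50, 1);
for iter = 1:50
    % The gradient is computed from the old theta only, so a single
    % vector assignment updates every theta_j simultaneously.
    grad = (X' * (X * theta - y)) / m;
    theta = theta - alpha * grad;
    J_history(iter) = cost(theta);
end

disp(theta');          % moves toward [1 2] as iterations increase
```

Because the whole vector theta is replaced in one assignment, no component is updated using an already-modified value of another component, which is what "simultaneously update" asks for.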
Keep in mind that the cost $J(\theta)$ is parameterized by the vector $\theta$, not X and y. That is, we minimize the value of $J(\theta)$ by changing the values of the vector $\theta$, not by changing X or y. Refer to the equations in this handout and to the video lectures if you are uncertain.

A good way to verify that gradient descent is working correctly is to look at the value of $J(\theta)$ and check that it is decreasing with each step. The starter code for gradientDescent.m calls computeCost on every iteration and prints the cost. Assuming you have implemented gradient descent and computeCost correctly, your value of $J(\theta)$ should never increase, and should converge to a steady value by the end of the algorithm.

After you are finished, ex1.m will use your final parameters to plot the linear fit. The result should look something like Figure 2:

[Figure 2: Training data with linear regression fit. x-axis: Population of City in 10,000s; y-axis: Profit in $10,000s; legend: Training data, Linear regression]

Your final values for $\theta$ will also be used to make predictions on profits in areas of 35,000 and 70,000 people. Note the way that the following lines in ex1.m use matrix multiplication, rather than explicit summation or looping, to calculate the predictions. This is an example of code vectorization in Octave/MATLAB.

    predict1 = [1, 3.5] * theta;
    predict2 = [1, 7] * theta;

You should now submit your solutions.

2.3 Debugging

Here are some things to keep in mind as you implement gradient descent:

- Octave/MATLAB array indices start from one, not zero. If you're storing $\theta_0$ and $\theta_1$ in a vector called theta, the values will be theta(1) and theta(2).
- If you are seeing many errors at runtime, inspect your matrix operations to make sure that you're adding and multiplying matrices of compatible dimensions. Printing the dimensions of variables with the size command will help you debug.
- By default, Octave/MATLAB interprets math operators to be matrix operators. This is a common source of size incompatibility errors.
  If you don't want matrix multiplication, you need to add the "dot" notation to specify this to Octave/MATLAB. For example, A*B does a matrix multiply, while A.*B does an element-wise multiplication.

2.4 Visualizing $J(\theta)$

To understand the cost function $J(\theta)$ better, you will now plot the cost over a 2-dimensional grid of $\theta_0$ and $\theta_1$ values. You will not need to code anything new for this part, but you should understand how the code you have written already is creating these images.

In the next step of ex1.m, there is code set up to calculate $J(\theta)$ over a grid of values using the computeCost function that you wrote.

    % initialize J_vals to a matrix of 0's
    J_vals = zeros(length(theta0_vals), length(theta1_vals));

    % Fill out J_vals
    for i = 1:length(theta0_vals)
        for j = 1:length(theta1_vals)
            t = [theta0_vals(i); theta1_vals(j)];
            J_vals(i,j) = computeCost(X, y, t);
        end
    end

After these lines are executed, you will have a 2-D array of $J(\theta)$ values. The script ex1.m will then use these values to produce surface and contour plots of $J(\theta)$ using the surf and contour commands. The plots should look something like Figure 3:

[Figure 3: Cost function $J(\theta)$ plotted over $\theta_0$ and $\theta_1$. (a) Surface; (b) Contour, showing minimum]

The purpose of these graphs is to show you how $J(\theta)$ varies with changes in $\theta_0$ and $\theta_1$. The cost function $J(\theta)$ is bowl-shaped and has a global minimum. (This is easier to see in the contour plot than in the 3D surface plot.) This minimum is the optimal point for $\theta_0$ and $\theta_1$, and each step of gradient descent moves closer to this point.

Optional Exercises

If you have successfully completed the material above, congratulations! You now understand linear regression and should be able to start using it on your own datasets.

For the rest of this programming exercise, we have included the following optional exercises.
These exercises will help you gain a deeper understanding of the material, and if you are able to do so, we encourage you to complete them as well.

3 Linear regression with multiple variables

In this part, you will implement linear regression with multiple variables to predict the prices of houses. Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recent houses sold and make a model of housing prices.

The file ex1data2.txt contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.

The ex1_multi.m script has been set up to help you step through this exercise.

3.1 Feature Normalization

The ex1_multi.m script will start by loading and displaying some values from this dataset. By looking at the values, note that house sizes are about 1000 times the number of bedrooms. When features differ by orders of magnitude, first performing feature scaling can make gradient descent converge much more quickly.

Your task here is to complete the code in featureNormalize.m to

- Subtract the mean value of each feature from the dataset.
- After subtracting the mean, additionally scale (divide) the feature values by their respective "standard deviations."

The standard deviation is a way of measuring how much variation there is in the range of values of a particular feature (most data points will lie within ±2 standard deviations of the mean); this is an alternative to taking the range of values (max - min). In Octave/MATLAB, you can use the "std" function to compute the standard deviation.
For example, inside featureNormalize.m, the quantity X(:,1) contains all the values of $x_1$ (house sizes) in the training set, so std(X(:,1)) computes the standard deviation of the house sizes. At the time that featureNormalize.m is called, the extra column of 1's corresponding to $x_0 = 1$ has not yet been added to X (see ex1_multi.m for details).

You will do this for all the features and your code should work with datasets of all sizes (any number of features / examples). Note that each column of the matrix X corresponds to one feature.

You should now submit your solutions.

Implementation Note: When normalizing the features, it is important to store the values used for normalization: the mean value and the standard deviation used for the computations. After learning the parameters from the model, we often want to predict the prices of houses we have not seen before. Given a new x value (living room area and number of bedrooms), we must first normalize x using the mean and standard deviation that we had previously computed from the training set.

3.2 Gradient Descent

Previously, you implemented gradient descent on a univariate regression problem. The only difference now is that there is one more feature in the matrix X. The hypothesis function and the batch gradient descent update rule remain unchanged.

You should complete the code in computeCostMulti.m and gradientDescentMulti.m to implement the cost function and gradient descent for linear regression with multiple variables.
If your code in the previous part (single variable) already supports multiple variables, you can use it here too.

Make sure your code supports any number of features and is well-vectorized. You can use size(X, 2) to find out how many features are present in the dataset.

You should now submit your solutions.

Implementation Note: In the multivariate case, the cost function can also be written in the following vectorized form:

$$J(\theta) = \frac{1}{2m} (X\theta - \vec{y})^T (X\theta - \vec{y})$$

where

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}, \qquad \vec{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}.$$

The vectorized version is efficient when you're working with numerical computing tools like Octave/MATLAB. If you are an expert with matrix operations, you can prove to yourself that the two forms are equivalent.

3.2.1 Optional (ungraded) exercise: Selecting learning rates

In this part of the exercise, you will get to try out different learning rates for the dataset and find a learning rate that converges quickly. You can change the learning rate by modifying ex1_multi.m and changing the part of the code that sets the learning rate.

The next phase in ex1_multi.m will call your gradientDescentMulti.m function and run gradient descent for about 50 iterations at the chosen learning rate. The function should also return the history of $J(\theta)$ values in a vector J. After the last iteration, the ex1_multi.m script plots the J values against the number of iterations.

If you picked a learning rate within a good range, your plot should look similar to Figure 4. If your graph looks very different, especially if your value of $J(\theta)$ increases or even blows up, adjust your learning rate and try again.
We recommend trying values of the learning rate $\alpha$ on a log-scale, at multiplicative steps of about 3 times the previous value (i.e., 0.3, 0.1, 0.03, 0.01 and so on). You may also want to adjust the number of iterations you are running if that will help you see the overall trend in the curve.

[Figure 4: Convergence of gradient descent with an appropriate learning rate]

Implementation Note: If your learning rate is too large, $J(\theta)$ can diverge and "blow up", resulting in values which are too large for computer calculations. In these situations, Octave/MATLAB will tend to return NaNs. NaN stands for "not a number" and is often caused by undefined operations that involve $-\infty$ and $+\infty$.

Octave/MATLAB Tip: To compare how different learning rates affect convergence, it's helpful to plot J for several learning rates on the same figure. In Octave/MATLAB, this can be done by performing gradient descent multiple times with a "hold on" command between plots. Concretely, if you've tried three different values of alpha (you should probably try more values than this) and stored the costs in J1, J2 and J3, you can use the following commands to plot them on the same figure:

    plot(1:50, J1(1:50), 'b');
    hold on;
    plot(1:50, J2(1:50), 'r');
    plot(1:50, J3(1:50), 'k');

The final arguments 'b', 'r', and 'k' specify different colors for the plots.

Notice the changes in the convergence curves as the learning rate changes. With a small learning rate, you should find that gradient descent takes a very long time to converge to the optimal value. Conversely, with a large learning rate, gradient descent might not converge or might even diverge!

Using the best learning rate that you found, run the ex1_multi.m script to run gradient descent until convergence to find the final values of $\theta$. Next, use this value of $\theta$ to predict the price of a house with 1650 square feet and 3 bedrooms. You will use this value later to check your implementation of the normal equations.
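
Preparing a new example for prediction can be sketched as follows. This is a minimal illustration, assuming a four-row toy training set with made-up values; the names X_norm, x_new and x_query are hypothetical and are not taken from ex1_multi.m.

```octave
% Toy training set (hypothetical values): columns are size and bedrooms.
X = [1000 2; 1500 3; 2000 3; 2500 4];

mu = mean(X);       % 1 x n row vector of per-feature means
sigma = std(X);     % 1 x n row vector of per-feature standard deviations

% Normalize the training set. bsxfun is used so the sketch does not
% rely on implicit broadcasting, which older MATLAB versions lack.
X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);

% A new example is scaled with the SAME mu and sigma that were computed
% from the training set, and only then gets the leading 1 for theta_0.
x_new = [1650 3];
x_query = [1, (x_new - mu) ./ sigma];

disp(x_query);
```

With a parameter vector theta learned on the normalized training set, the predicted price would then be x_query * theta.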
Don't forget to normalize your features when you make this prediction!

You do not need to submit any solutions for these optional (ungraded) exercises.

3.3 Normal Equations

In the lecture videos, you learned that the closed-form solution to linear regression is

$$\theta = \left( X^T X \right)^{-1} X^T \vec{y}.$$

Using this formula does not require any feature scaling, and you will get an exact solution in one calculation: there is no "loop until convergence" like in gradient descent.

Complete the code in normalEqn.m to use the formula above to calculate $\theta$. Remember that while you don't need to scale your features, we still need to add a column of 1's to the X matrix to have an intercept term ($\theta_0$). The code in ex1_multi.m will add the column of 1's to X for you.

You should now submit your solutions.

Optional (ungraded) exercise: Now, once you have found $\theta$ using this method, use it to make a price prediction for a 1650-square-foot house with 3 bedrooms. You should find that it gives the same predicted price as the value you obtained using the model fit with gradient descent (in Section 3.2.1).

Submission and Grading

After completing various parts of the assignment, be sure to use the submit function to submit your solutions to our servers. The following is a breakdown of how each part of this exercise is scored.

    Part                                Submitted File        Points
    Warm up exercise                    warmUpExercise.m      10 points
    Compute cost for one variable       computeCost.m         40 points
    Gradient descent for one variable   gradientDescent.m     50 points
    Total Points                                              100 points

Optional Exercises

    Part                                      Submitted File            Points
    Feature normalization                     featureNormalize.m        0 points
    Compute cost for multiple variables       computeCostMulti.m        0 points
    Gradient descent for multiple variables   gradientDescentMulti.m    0 points
    Normal Equations                          normalEqn.m               0 points

You are allowed to submit your solutions multiple times, and we will take only the highest score into consideration.
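
As a closing illustration of the closed-form solution from Section 3.3, the sketch below applies the normal equations to a tiny dataset. The data values and the names X_raw and x_query are assumptions made for this example only, and pinv is used in place of an explicit inverse as a design choice of this sketch, so that it still returns a result if $X^T X$ happens to be singular.

```octave
% Toy data (hypothetical): price = 10 + 2*size + 5*bedrooms exactly.
X_raw = [1 1; 2 1; 3 2; 4 3];
y = 10 + 2 * X_raw(:, 1) + 5 * X_raw(:, 2);

m = size(X_raw, 1);
X = [ones(m, 1), X_raw];   % add the intercept column; no feature scaling

% Closed-form solution theta = (X'X)^(-1) X'y, using the pseudo-inverse.
theta = pinv(X' * X) * X' * y;

% Predict for a new example; the leading 1 multiplies theta_0.
x_query = [1, 5, 2];
prediction = x_query * theta;

disp(theta');              % approximately [10 2 5]
disp(prediction);          % approximately 30
```

No learning rate or iteration count appears here, which is the practical difference from gradient descent that Section 3.3 points out.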