Name: [SOLVED] case study PowerPoint Presentation
Brand: Assignment Chef
SKU: 6387853416
Price: 25 USD
Availability: InStock
Rating: 5 (1 reviews)

5/5 - (1 vote)

PowerPoint Presentation

Lecture 2: Linear Regression
Instructor: Xiaobai Liu

Outline of This Lecture
Linear Regression (With One Feature)
Prediction Function
Cost Function
Optimization(aka: Learning or training)
Linear Regression With Multiple Features
Case Study
Regularized Regression (Ridge, Lasso and Elastic Net)
Best Practices

Outline of This Lecture
Linear Regression (With One Feature)
Prediction Function
Cost Function
Optimization(Learning/Training)
Linear Regression With Multiple Features
Case Study
Regularized Regression (Ridge, Lasso and Elastic Net)
Best Practices

Regression: single feature
Housing Price

Supervised Learning:trading history available
Regression: predict real-value price

Price (K$)
500
1000
1500
2000
2500
3000
100
300
200
Size (feet2)

Training Set (92115)
Size (feet2)Price (K$)
856399.5
1512449
865350
1044345

Notations

, input features(i.e. size)
output results (i.e. price)

Linear Regression: Prediction Functions
Goal: learn a function y=f(x) that maps from x to y
linear function in liner regression
e.g. Size of House housing price
Formally,

x
y
(, ): parameters to learn (intercept and slope)

Quiz

E.g. (=50, ),whats the price of a house of 895 square feet?

(268550$)

Illustration of linear Regression Models
Parameters: (, )
Price (K$)
500
1000
1500
2000
2500
3000
100
300
200

How to learn parameters?
Idea: choose parametersso that close to y for five training samples (x,y)
Price (K$)
500
1000
1500
2000
2500
3000
100
300
200
Size (feet2)

Which one is the best?

Linear Regression: Cost function
Choose parameters so that f(x) is close to y for all the five training samples (x, y)

Minimize
Residual Sum of Squares (RSS)
Least Square Loss

Note that:
For fixed parameters, there is a function of x

For a set of (x,y), there is a function of parameters

Residual of Cost Function
Indicate if training procedure has converged;
Used to estimate confidences of outputs
Price (K$)
500
1000
1500
2000
2500
3000
100
300
200
Size (feet2)

How to optimize cost function
Minimize functionw.r.t

Iterative method
Start with some initials
Keep changingto reduce
Stop while certain conditions satisfied

A Computer-based solution
Iterative method
Start with some initials
Randomly generate new values for, keep it ifis the minimal.
Stop while certain conditions satisfied

Pros:
Being easy to implement

Cons:
Difficult to converge
Being worse while solving a highly complicated loss function

A smart solution

Review: Quadratic Functions

Zero, one, or two real roots.
One extreme, called the vertex.
No inflection points.
Line symmetry through the vertex. (Axis of symmetry.)
Rises or falls at both ends.
Can be constructed from three non-colinear points or three pieces of information.
One fundamental shape.
Roots are solvable by radicals. (Quadratic Formula.)

Review: Quadratic Functions

Review: gradient at a point
Quiz: How to tell whether the gradient at a pointis negative or positive?

Iterative methods
To minimize we use the iterative method
Start with some initials
Keep changingto reduce
Stop while certain conditions are satisfied

minimum

Iterative methods
To minimize we use the iterative method
Start with some initials
Keep changingto reduce
Stop while certain conditions satisfied

Question:how to change ?

Iterative methods
Left:should be positive
Right: negative

Question:how to s
= -()

Gradient based methods?
20

Solution: gradient based method
To solve
Initialize
Repeat until convergence

To solve
Initialize
Repeat until convergence

Review: First-order Derivative
Y=ax+b

Y=-ax^2+b

Y=-log x

Y=exp(-ax)

Y=ax+by+c

Derivatives:
A
-2ax
-1/x
-aExp(-ax)
a or b

Method : Gradient Descent

Initialize
Repeat until convergence

Gradient Descent for RSS
Cost Function

What are the Derivatives of F() w.r.t.and ?
=
=

Gradient Descent for RSS
Repeat until Convergence{

}

At each iteration, updateand

Understanding GD

Prediction
True Label
Feature

Understanding GD
Learning Rateshould be empirically set
Too small, slow convergence
Too large,fail to converge or diverge

fixedover time
Adaptive steps

Understanding GD
Batch Gradient Descent
At each step , access all the training samples

How to improve convergence
Access more training samples

Learning rate
Smaller or bigger

Better initialization: Using Closed form solution
Normal equation

Variants of GD
Access all training samples at each iteration
Full Batch

Access a portion of training samples at each iteration
Mini Batch

Access a single training sample at each iteration
Online Learning

Recap
Prediction Function: Linear Function

Cost Function: Residual Sum of Square

Optimization: Gradient DescentMethod

Testing: Measurement Error
Once a regression model trained, apply the prediction function over each testing sample and compare its predictionto the true label
Both labels are real-valued
L2 error:
L1 error:
With multiple testing samples, report both Mean and Std
Popular metric:coefficient of determination,
A measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model

Regression Measure:
Letdenote the true label of sample i, is the mean of the labels in testing dataset,the predicted label of sample i.
We have

where

indicates that the predicted labels exactly match the true labels;
indicates that the model already predicts
indicates that 49% of the variability of the dependent variables (true labels)are accounted for

Regression: multiple feature
Example: Housing Price
SizeBedroomBathroomBuilt yearStoriesPrice (K$)
10243219781375
13293219921425
18934219802465

+
Single feature
Multiple feature

Review: Linear Algebra
Matrix/Vector
Addition and Scalar Multiplication
Matrix-vector/matrix-matrix multiplication
A times B is not equal B times A (no commutative)
A * B *C =A*(B*C)
Identity matrix
Inverse and Transpose

Notations
Let

then we have

Letrepresent the i-th training sample,semantic label.

Gradient Descent for Multiple Features

Gradient Descent
Repeat{

}

Cost Function

Gradient Descent for Multiple Features
Gradient Descent
Repeat{

}

Update simultaneously

Dealing Qualitative features
Some predictors/features are not quantitative but are qualitative, taking a discrete set of values
Categorical predictors
Factor variable
E.g. house type, short sale, gender, student, status
Consider a feature, house type:

if single family
otherwise

Conti.
Resulting Model is:

if single family
otherwise
Additional Dummy Variables

if single family

Cast Study: Linear Regression
Project 1:Synthetic DataSet

Project 2: Housing Dataset

Project 1: Synthetic Dataset
A single input feature and a continuous target variable (or output)

Feature
Output