Useful Formulas
Linear Regression
Pseudoinverse of matrix $\Phi$: $\Phi^p = (\Phi^T \Phi)^{-1} \Phi^T$
Questions
1. What is the role of basis functions in linear regression?
Basis functions allow us to represent a non-linear function of the input variables with a function which is linear in the weights.
2. Can an algorithm doing linear regression learn only linear functions of the inputs?
No, the learned function is linear in the weights but does not need to be linear in the input variables.
3. When can we solve the linear regression problem exactly (with 0 error)? Why is it not a good idea to do so?
When the number of parameters is the same as the number of points in the data set. Normally, we want far fewer parameters than data points, because a model flexible enough to fit every point exactly also fits the noise in the data (overfitting) and generalises poorly.
4. What is the error we want to minimize when doing linear regression?
The sum-of-squares error: $E(X) = \frac{1}{2}\sum_{i=1}^{N} (\mathbf{w}^T \boldsymbol{\phi}_i - t_i)^2$, where $X$ is the dataset, $\boldsymbol{\phi}_i$ is the vector of the basis functions evaluated on point $i$, $t_i$ is the desired value for point $i$ (target), and $\mathbf{w}$ is the vector of weights to optimise.
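For reference, a minimal NumPy sketch of this error; the names `Phi` (the $N \times M$ matrix whose rows are the $\boldsymbol{\phi}_i$), `w`, and `t` are illustrative, not from the notes:

```python
import numpy as np

# Sum-of-squares error E(X) = 1/2 * sum_i (w^T phi_i - t_i)^2.
def sum_of_squares_error(Phi, w, t):
    residuals = Phi @ w - t             # w^T phi_i - t_i for every point i
    return 0.5 * np.sum(residuals ** 2)
```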
5. What is the least-squares solution? How is it affected by outliers?
The least-squares solution uses the pseudo-inverse of the matrix $\Phi$ of the basis functions evaluated on the data set, and is defined as: $\mathbf{w} = (\Phi^T\Phi)^{-1}\Phi^T\mathbf{t}$, where $\Phi^p = (\Phi^T\Phi)^{-1}\Phi^T$ is the pseudo-inverse of $\Phi$ (the full derivation is in the slides). Since the least-squares solution minimises the average error over all the points, outliers affect the average strongly, pulling the solution towards their values.
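A minimal sketch of the closed-form solution, assuming a precomputed design matrix `Phi` ($N \times M$) and target vector `t`; the names are illustrative:

```python
import numpy as np

def least_squares(Phi, t):
    # Pseudo-inverse (Phi^T Phi)^-1 Phi^T, as in the formula above.
    Phi_p = np.linalg.inv(Phi.T @ Phi) @ Phi.T
    return Phi_p @ t

# np.linalg.pinv(Phi) computes the same pseudo-inverse more stably (via SVD).
```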
6. How can we find the least-squares solution when there are too many points to compute the pseudoinverse efficiently?
We can perform stochastic gradient descent on the error point by point, updating the current weight vector according to: $\mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} - \eta \nabla E_i = \mathbf{w}^{(k)} - \eta\,(\boldsymbol{\phi}_i^T \mathbf{w}^{(k)} - t_i)\,\boldsymbol{\phi}_i$ (full derivation in the slides).
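A minimal sketch of this point-by-point update; the learning rate `eta` and the epoch count are illustrative choices, not from the notes:

```python
import numpy as np

def sgd_least_squares(Phi, t, eta=0.01, epochs=100):
    rng = np.random.default_rng(0)
    w = np.zeros(Phi.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(t)):   # visit points in random order
            error = Phi[i] @ w - t[i]       # phi_i^T w - t_i
            w -= eta * error * Phi[i]       # w <- w - eta * grad E_i
    return w
```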
7. What are the bias and the variance for a supervised learning problem?
The bias is the component of the error due to the average of the learned regression (or classifier), taken over all training sets, converging towards something away from the desired value. The variance captures the dependency of the model on the data set: with different training sets we get different regressed functions (or classifiers), and the variation among these functions is the variance.
8. What is the link between the error on the validation set increasing with training, and the bias/variance decomposition?
The bias/variance decomposition shows us that the expected error has three components: the bias, the variance, and the noise in the data. The noise is an intrinsic property of the data set and training does nothing about it. On the other hand, training decreases the bias, making the average estimate increasingly correct. The total expected error, however, does not change, therefore the reduction of the bias has to happen at the expense of something else: the variance. Therefore, the model becomes increasingly dependent on the particular data points used for training (which increases the variance) and loses generalization.
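To make the trade-off concrete, here is a rough simulation (not from the slides): it fits the same simple model to many noisy training sets drawn from a known function and estimates the squared bias and the variance of the predictions; all constants are arbitrary choices.

```python
import numpy as np

# Fit a straight line to many noisy samples of sin(x); the gap between the
# average prediction and sin(x) estimates the bias, the spread across
# training sets estimates the variance.
rng = np.random.default_rng(0)
x = np.linspace(0, np.pi, 20)
true_y = np.sin(x)

preds = []
for _ in range(200):                          # 200 independent training sets
    t = true_y + rng.normal(0, 0.2, x.size)   # same inputs, fresh noise
    w = np.polyfit(x, t, deg=1)               # least-squares line fit
    preds.append(np.polyval(w, x))
preds = np.array(preds)

bias_sq = np.mean((preds.mean(axis=0) - true_y) ** 2)
variance = np.mean(preds.var(axis=0))
print(f"bias^2 ~ {bias_sq:.3f}, variance ~ {variance:.3f}")
```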
9. Given the dataset: <-1, -0.5>, <0, 1.1>, <1, 3.8>, <2, 8.8>, find the least-squares solution for the function $y(x, \mathbf{w}) = w_0 + w_1 x$.
First, we need to create the matrix $\Phi$ of the coefficients for the linear system. The first column of the matrix is the value of the first basis function on the points; the first basis function, multiplied by $w_0$, is the constant 1. The second basis function, multiplied by $w_1$, is the identity function $x$:

$\Phi = \begin{bmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}$

Then, we need to compute the pseudo-inverse of $\Phi$:

$\Phi^T\Phi = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -1 & 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 4 & 2 \\ 2 & 6 \end{bmatrix}$

$(\Phi^T\Phi)^{-1} = \begin{bmatrix} 0.3 & -0.1 \\ -0.1 & 0.2 \end{bmatrix}$

and lastly:

$\Phi^p = (\Phi^T\Phi)^{-1}\Phi^T = \begin{bmatrix} 0.4 & 0.3 & 0.2 & 0.1 \\ -0.3 & -0.1 & 0.1 & 0.3 \end{bmatrix}$.

We can now use the pseudo-inverse to compute the optimal vector of weights:

$\mathbf{w} = \Phi^p \mathbf{t} = \begin{bmatrix} 0.4 & 0.3 & 0.2 & 0.1 \\ -0.3 & -0.1 & 0.1 & 0.3 \end{bmatrix} \begin{bmatrix} -0.5 \\ 1.1 \\ 3.8 \\ 8.8 \end{bmatrix} = \begin{bmatrix} 1.77 \\ 3.06 \end{bmatrix}$,

where the vector $\mathbf{t}$ is the vector of the target values over the points in the dataset (the last element of each pair in the dataset).
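The result is easy to check numerically; a minimal NumPy sketch (variable names are illustrative):

```python
import numpy as np

# Sanity check of exercise 9.
x = np.array([-1.0, 0.0, 1.0, 2.0])
t = np.array([-0.5, 1.1, 3.8, 8.8])
Phi = np.column_stack([np.ones_like(x), x])   # basis functions: 1 and x
Phi_p = np.linalg.inv(Phi.T @ Phi) @ Phi.T    # pseudo-inverse
print(Phi_p @ t)                              # [1.77 3.06]
```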
10. Given the dataset: <-1, 0.78>, <0, 1>, <1, 1.22>, <2, 1.52>, find the least-squares solution for the function $y(x, \mathbf{w}) = w_0 + w_1 e^{\frac{(x+1)^2}{20}}$.

All the steps are illustrated above; here I will just compute the final vectors for your reference:

$\mathbf{w} = \Phi^p \mathbf{t} = \begin{bmatrix} 1.52 & 1.22 & 0.19 & -1.93 \\ -1.05 & -0.80 & 0.05 & 1.81 \end{bmatrix} \begin{bmatrix} 0.78 \\ 1 \\ 1.22 \\ 1.52 \end{bmatrix} = \begin{bmatrix} -0.30 \\ 1.18 \end{bmatrix}$
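The same NumPy check, with the exponential basis function (working in full precision, so the result differs from the hand-rounded matrices above in the last digit):

```python
import numpy as np

# Sanity check of exercise 10.
x = np.array([-1.0, 0.0, 1.0, 2.0])
t = np.array([0.78, 1.0, 1.22, 1.52])
Phi = np.column_stack([np.ones_like(x), np.exp((x + 1) ** 2 / 20)])
print(np.linalg.pinv(Phi) @ t)   # approx [-0.31 1.19]; rounding above gives [-0.30 1.18]
```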
11. Given the dataset: <-1, 1.6>, <0, 0.95>, <1, 1.2>, <2, 1.9>, find the least-squares solution for the function $y(x, \mathbf{w}) = w_0 + \frac{w_1}{1 + e^{-(x+1)}}$.

$\mathbf{w} = \Phi^p \mathbf{t} = \begin{bmatrix} 1.96 & 0.48 & -0.49 & -0.94 \\ -2.23 & -0.29 & 0.97 & 1.56 \end{bmatrix} \begin{bmatrix} 1.6 \\ 0.95 \\ 1.2 \\ 1.9 \end{bmatrix} = \begin{bmatrix} 1.22 \\ 0.28 \end{bmatrix}$
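And the corresponding check for the sigmoid basis (again, full precision shifts the last digits relative to the hand-rounded computation):

```python
import numpy as np

# Sanity check of exercise 11.
x = np.array([-1.0, 0.0, 1.0, 2.0])
t = np.array([1.6, 0.95, 1.2, 1.9])
Phi = np.column_stack([np.ones_like(x), 1 / (1 + np.exp(-(x + 1)))])
print(np.linalg.pinv(Phi) @ t)   # approx [1.20 0.27]; rounding above gives [1.22 0.28]
```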