1 Linear Regression with Heterogeneous Noise
In the standard linear regression model, we assume that the observed response variable $y$ is the prediction $x^\top \beta$ perturbed by noise, namely
$y = x^\top \beta + \epsilon$
where $\epsilon$ is a Gaussian random variable with mean 0 and variance $\sigma^2$. Notably, we are assuming that for all observations in the training data, the corresponding noise terms are independently and identically distributed. In other words, for the $n$-th observation $x_n$, the observed response is
$y_n = x_n^\top \beta + \epsilon_n$
where $\epsilon_n \sim N(0, \sigma^2)$.
This assumption is not applicable in some cases. For example, when predicting the sale prices of houses, the variance tends to be bigger for larger houses (e.g., houses with larger square footage $x_n$), as the sale prices of larger houses appear more variable. In this case, we can model the data in the following way:
$y_n = x_n^\top \beta + \epsilon_n$
where the $\epsilon_n$ are independently distributed but do not have to be identically distributed. In particular, each one could have a different variance, namely $\epsilon_n \sim N(0, \sigma_n^2)$.
- Suppose our training dataset contains $\{(x_n, y_n),\ n = 1, 2, \ldots, N\}$ such observations. Write down the log-likelihood function of the data. This should be a function of the data as well as $\beta$ and all the $\sigma_n$.
- Derive the maximum likelihood estimate of $\beta$, and express it in terms of the data as well as all the $\sigma_n$.
You should assume the $\sigma_n$ are known to you; you do not need to estimate them from the data. A numerical sketch of the resulting estimator is given below.
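The following sketch is not part of the problem statement. It assumes the standard weighted least squares result that maximizing this log-likelihood yields $\hat\beta = \left(\sum_n x_n x_n^\top / \sigma_n^2\right)^{-1} \sum_n x_n y_n / \sigma_n^2$; the helper name `weighted_mle` and the synthetic data are illustrative only.

```python
import numpy as np

def weighted_mle(X, y, sigma):
    """MLE of beta under heteroscedastic Gaussian noise, i.e. weighted
    least squares with per-observation weights 1 / sigma_n^2.

    X     : (N, p) design matrix, rows are x_n^T
    y     : (N,)   responses
    sigma : (N,)   known noise standard deviations
    """
    w = 1.0 / sigma**2                       # per-observation weights
    A = (X * w[:, None]).T @ X               # sum_n x_n x_n^T / sigma_n^2
    b = (X * w[:, None]).T @ y               # sum_n x_n y_n / sigma_n^2
    return np.linalg.solve(A, b)

# quick check on synthetic data drawn from the model
rng = np.random.default_rng(0)
N, p = 200, 3
X = rng.normal(size=(N, p))
beta_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1 + rng.uniform(size=N)            # heterogeneous noise levels
y = X @ beta_true + sigma * rng.normal(size=N)
print(weighted_mle(X, y, sigma))             # should be close to beta_true
```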
2 Linear Regression with Smooth Coefficients
Consider a dataset with $n$ data points $(x_i, y_i)$, $x_i \in \mathbb{R}^{p \times 1}$, drawn from the following linear model:
$y = x^\top \beta + \epsilon,$
where $\epsilon$ is Gaussian noise. Suppose the features $x_{i1}, \ldots, x_{ip}$ for all $i = 1, \ldots, n$ have a natural ordering. Several examples have this ordering property; for instance, in the study of the impact of proteins on certain types of cancer, the proteins are ordered sequentially on a line. Intuitively, we can encode the natural ordering information by introducing a condition requiring that the difference $(\beta_i - \beta_{i+1})^2$ cannot be large, for $i = 1, \ldots, p-1$.
- State the condition as a regularizer. Write the new optimization problem for finding $\beta$ by combining both this regularization and $\ell_2$ regularization. (10 points)
- Find the optimal $\beta$ by solving the problem in part (a). (5 points) A code sketch of the resulting closed form follows.
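A hedged sketch, not part of the original assignment: writing the smoothness condition as $\lambda_2 \sum_{i=1}^{p-1} (\beta_i - \beta_{i+1})^2 = \lambda_2 \|D\beta\|^2$ for the first-difference matrix $D$, and adding the $\ell_2$ penalty $\lambda_1 \|\beta\|^2$, the regularized least squares objective has the closed-form minimizer computed below. The helper name `smooth_ridge` is illustrative.

```python
import numpy as np

def smooth_ridge(X, y, lam_l2, lam_smooth):
    """Closed-form minimizer of
        ||y - X beta||^2 + lam_l2 * ||beta||^2 + lam_smooth * ||D beta||^2,
    where D is the (p-1) x p first-difference matrix encoding the
    condition that (beta_i - beta_{i+1})^2 stays small.
    """
    p = X.shape[1]
    D = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)   # rows: e_i - e_{i+1}
    A = X.T @ X + lam_l2 * np.eye(p) + lam_smooth * D.T @ D
    return np.linalg.solve(A, X.T @ y)
```

With `lam_smooth = 0` this reduces to ordinary ridge regression, which is one way to sanity-check the expression.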
3 Linearly Constrained Linear Regression
Consider a dataset with $n$ data points $(x_i, y_i)$, $x_i \in \mathbb{R}^{p \times 1}$, drawn from the following linear model:
$y = x^\top \beta + \epsilon,$
where $\epsilon$ is Gaussian noise. Suppose we have additional information about $\beta$ that requires $A\beta = b$, where $A \in \mathbb{R}^{q \times p}$ and $b \in \mathbb{R}^{q \times 1}$. Suppose the constraint $A\beta = b$ has a non-empty set of solutions, so the optimization has feasible solutions. Find the maximum likelihood estimate of $\beta$ under this constraint.
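As an illustrative aside rather than part of the assignment: one standard route to the constrained MLE is to solve the KKT (Lagrangian stationarity) system for $\beta$ and the multipliers jointly. A minimal sketch, assuming $X$ has full column rank and $A$ has full row rank so the system is nonsingular; the function name `constrained_ls` is illustrative only.

```python
import numpy as np

def constrained_ls(X, y, A, b):
    """Least squares (Gaussian MLE) subject to A beta = b, via the
    KKT system
        [ X^T X  A^T ] [ beta   ]   [ X^T y ]
        [ A      0   ] [ lambda ] = [ b     ].
    """
    p, q = X.shape[1], A.shape[0]
    K = np.block([[X.T @ X, A.T],
                  [A, np.zeros((q, q))]])
    rhs = np.concatenate([X.T @ y, b])
    sol = np.linalg.solve(K, rhs)
    return sol[:p]                     # discard the Lagrange multipliers
```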
4 Online Learning
The perceptron algorithm often makes harsh updates, as it is strongly biased towards the current mistakenly labeled sample. Suppose at the $i$-th step the classifier is $w_i$, and we want to make a more conservative update based on the observation $(x_i, y_i)$ to a new classifier $w_{i+1}$. Derive a new update method for the perceptron that makes the smallest difference from the previous model, that is, it minimizes $\|w_{i+1} - w_i\|^2$ while ensuring that $w_{i+1}$ classifies the current sample correctly. You need to provide a closed-form analytical equation for the update rule.
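A hedged illustration, not the assigned derivation itself: a strict constraint $y_i\, w_{i+1}^\top x_i > 0$ has no minimizer, so one common way to make the problem well-posed is to require a unit margin $y_i\, w_{i+1}^\top x_i \geq 1$. Under that assumption the minimizer is the passive-aggressive-style projection sketched below.

```python
import numpy as np

def conservative_update(w, x, y):
    """Minimal-change update: argmin ||w' - w||^2 subject to
    y * (w'^T x) >= 1 (a unit-margin reading of 'classifies correctly').
    The closed form is the projection of w onto that half-space:
        w' = w + tau * y * x,   tau = max(0, 1 - y * w^T x) / ||x||^2.
    If w already satisfies the margin, tau = 0 and w is unchanged.
    """
    tau = max(0.0, 1.0 - y * (w @ x)) / (x @ x)
    return w + tau * y * x
```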