Recall that we say that a function $f$ is convex if the domain of $f$ is a convex set and
$$f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y), \quad \text{for all } x, y \text{ in the domain of } f,\ 0 \leq \lambda \leq 1,$$
and strictly convex if
$$f(\lambda x + (1-\lambda) y) < \lambda f(x) + (1-\lambda) f(y), \quad \text{for all } x \neq y \text{ in the domain of } f,\ 0 < \lambda < 1.$$
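For instance, with $f(x) = x^2$ the defining inequality can be checked directly:
$$\lambda f(x) + (1-\lambda) f(y) - f(\lambda x + (1-\lambda) y) = \lambda x^2 + (1-\lambda) y^2 - (\lambda x + (1-\lambda) y)^2 = \lambda (1-\lambda) (x-y)^2 \geq 0,$$
with strict inequality whenever $x \neq y$ and $0 < \lambda < 1$, so $f(x) = x^2$ is strictly convex.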
Prove the following assertions.
- The affine function $f(x) = ax + b$ is convex, where $a$, $b$, and $x$ are scalars.
- If multiple functions $f_n(x)$ are convex over a fixed domain, then their sum $g(x) = \sum_n f_n(x)$ is convex over the same domain.
- Take $f, g : \mathbb{R} \to \mathbb{R}$ to be convex functions and $g$ to be increasing. Then $g \circ f$ is also convex (a numerical sanity-check sketch follows this list).
Note: A function $g$ is increasing if $a \leq b \implies g(a) \leq g(b)$. An example of a convex and increasing function is $\exp(x),\ x \in \mathbb{R}$.
- If $f : \mathbb{R} \to \mathbb{R}$ is convex, then $g : \mathbb{R}^D \to \mathbb{R}$, where $g(\mathbf{x}) := f(\mathbf{w}^\top \mathbf{x} + b)$, is also convex. Here, $\mathbf{w}$ is a constant vector in $\mathbb{R}^D$, $b$ is a constant in $\mathbb{R}$, and $\mathbf{x} \in \mathbb{R}^D$.
- Let $f : \mathbb{R}^D \to \mathbb{R}$ be strictly convex, and let $\mathbf{x}^\star \in \mathbb{R}^D$ be a global minimizer of $f$. Show that this global minimizer is unique. Hint: use a proof by contradiction.
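The assertions above call for analytical proofs, but it can help to probe them numerically first. Below is a minimal NumPy sketch (the function name `check_convexity`, the sampling scheme, and the tolerance are our own choices, not part of the exercise) that randomly samples the convexity inequality; it can expose a violation but can never certify convexity:

```python
import numpy as np

def check_convexity(f, trials=10_000, seed=0, tol=1e-9):
    """Randomly probe f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y).

    A sanity check, not a proof: it can only ever find a counterexample,
    never certify convexity.
    """
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(scale=3.0), rng.normal(scale=3.0)
        lam = rng.uniform()
        if f(lam * x + (1 - lam) * y) > lam * f(x) + (1 - lam) * f(y) + tol:
            return False  # convexity inequality violated
    return True

# Composition rule from the list: exp (convex, increasing) after an affine map.
f = lambda x: 2.0 * x + 1.0       # affine, hence convex
g = lambda x: np.exp(f(x))        # g = exp o f should be convex
print(check_convexity(g))         # expected: True
```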
1 Extension of Logistic Regression to Multi-Class Classification
Suppose we have a classification dataset with $N$ data example pairs $\{\mathbf{x}_n, y_n\}$, $n \in [1, N]$, where $y_n$ is a categorical variable over $K$ categories, $y_n \in \{1, 2, \ldots, K\}$. We wish to fit a linear model in a similar spirit to logistic regression, using the softmax function, instead of the logistic function, to link the linear inputs to the categorical output.
We will have $K$ sets of parameters $\mathbf{w}_k$, and define the probability distribution of the output as follows,
$$p(y_n = k \mid \mathbf{x}_n, \mathbf{w}) = \frac{\exp(\mathbf{w}_k^\top \mathbf{x}_n)}{\sum_{j=1}^{K} \exp(\mathbf{w}_j^\top \mathbf{x}_n)}.$$
Note that we indeed have a probability distribution, as $\sum_{k=1}^{K} p(y_n = k \mid \mathbf{x}_n, \mathbf{w}) = 1$. To make the model identifiable, we will fix $\mathbf{w}_K$ to $\mathbf{0}$, which means we have $K-1$ sets of parameters to learn. As in logistic regression, we will assume that each $y_n$ is i.i.d., i.e.,
$$p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}) = \prod_{n=1}^{N} p(y_n \mid \mathbf{x}_n, \mathbf{w}).$$
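A minimal NumPy sketch of this link function, with $\mathbf{w}_K$ fixed to $\mathbf{0}$ as stated above (the function name `softmax_probs` and the array layout are our own assumptions):

```python
import numpy as np

def softmax_probs(W, X):
    """Class probabilities p(y_n = k | x_n, w) for the model above.

    W : (K-1, D) array stacking the free parameters w_1, ..., w_{K-1};
        w_K is fixed to 0 for identifiability.
    X : (N, D) array of inputs x_n.
    Returns an (N, K) array whose rows sum to 1.
    """
    scores = np.hstack([X @ W.T, np.zeros((X.shape[0], 1))])  # w_K^T x_n = 0
    scores -= scores.max(axis=1, keepdims=True)               # stabilize exp
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)
```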
- Derive the log-likelihood for this model.
- Derive the gradient with respect to each $\mathbf{w}_k$ (a finite-difference sanity-check sketch follows this list).
- Show that the negative of the log-likelihood is convex with respect to each $\mathbf{w}_k$.
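Once you have derived the gradient analytically, a finite-difference check is a quick way to validate it. This sketch reuses the hypothetical `softmax_probs` above and assumes labels are encoded as integers $y_n \in \{0, \ldots, K-1\}$:

```python
import numpy as np

def neg_log_lik(W, X, y):
    """Negative log-likelihood of the softmax model (w_K fixed to 0)."""
    P = softmax_probs(W, X)                       # (N, K) class probabilities
    return -np.sum(np.log(P[np.arange(len(y)), y]))

def numerical_grad(W, X, y, eps=1e-6):
    """Central finite differences; compare against your derived gradient."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        G[idx] = (neg_log_lik(Wp, X, y) - neg_log_lik(Wm, X, y)) / (2 * eps)
    return G
```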
2 Mixture of Linear Regression
In Project-I, you worked on a regression dataset with two or more distinct clusters. For such datasets, a mixture of linear regression models is preferable to a single linear regression model.
Consider a regression dataset with $N$ pairs $\{y_n, \mathbf{x}_n\}$. Similar to the Gaussian mixture model (GMM), let $r_n \in \{1, 2, \ldots, K\}$ index the mixture component. The distribution of the output $y_n$ under the $k$th linear model is defined as follows:
$$p(y_n \mid \mathbf{x}_n, \mathbf{w}, r_n = k) = \mathcal{N}(y_n \mid \mathbf{w}_k^\top \mathbf{x}_n, \sigma^2).$$
Here, $\mathbf{w}_k$ is the regression parameter vector for the $k$th model, with $\mathbf{w}$ being a vector containing all $\mathbf{w}_k$, and $\mathbf{x}_n \in \mathbb{R}^D$.
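To make the generative story concrete, here is a small sampling sketch under the Gaussian form given above (the function name, the fixed noise level `sigma`, and the array layout are our own assumptions):

```python
import numpy as np

def sample_mixture(X, W, pi, sigma, seed=0):
    """Sample y_n by first drawing r_n ~ Categorical(pi), then
    y_n ~ N(w_{r_n}^T x_n, sigma^2).

    X : (N, D) inputs, W : (K, D) stacked w_k, pi : (K,) mixing weights.
    Returns (y, r) with y : (N,) outputs and r : (N,) latent assignments.
    """
    rng = np.random.default_rng(seed)
    r = rng.choice(len(pi), size=X.shape[0], p=pi)   # latent component per point
    means = np.einsum('nd,nd->n', X, W[r])           # w_{r_n}^T x_n for each n
    return means + sigma * rng.normal(size=X.shape[0]), r
```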
- Define $\mathbf{r}_n$ to be a binary vector of length $K$ such that all the entries are 0 except the $k$th entry, i.e., $r_{nk} = 1$, implying that $\mathbf{x}_n$ is assigned to the $k$th mixture component. Rewrite the likelihood $p(y_n \mid \mathbf{x}_n, \mathbf{w}, \mathbf{r}_n)$ in terms of $r_{nk}$.
- Write the expression for the joint distribution $p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, \mathbf{r})$, where $\mathbf{r}$ is the set of all $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N$.
- Assume that $r_n$ follows a multinomial distribution $p(r_n = k \mid \boldsymbol{\pi}) = \pi_k$, with $\boldsymbol{\pi} = [\pi_1, \pi_2, \ldots, \pi_K]$. Derive the marginal distribution $p(y_n \mid \mathbf{x}_n, \mathbf{w}, \boldsymbol{\pi})$ obtained after marginalizing out $r_n$.
- Write the expression for the maximum-likelihood objective $\mathcal{L}(\mathbf{w}, \boldsymbol{\pi}) := \log p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, \boldsymbol{\pi})$ in terms of the data $\mathbf{y}$ and $\mathbf{X}$ and the parameters $\mathbf{w}$ and $\boldsymbol{\pi}$ (an evaluation sketch follows this list).
- Is $\mathcal{L}$ jointly convex with respect to $\mathbf{w}$ and $\boldsymbol{\pi}$? Is the model identifiable? Prove your answers.
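For reference when checking your derivation, here is a numerically stable evaluation sketch. It presupposes the standard mixture form $p(y_n \mid \mathbf{x}_n, \mathbf{w}, \boldsymbol{\pi}) = \sum_k \pi_k \, \mathcal{N}(y_n \mid \mathbf{w}_k^\top \mathbf{x}_n, \sigma^2)$, which is what the marginalization in the third bullet should yield; the function name and fixed `sigma` are our own assumptions:

```python
import numpy as np
from scipy.special import logsumexp

def neg_marginal_log_lik(W, pi, sigma, X, y):
    """-log p(y | X, w, pi) with the latent r_n summed out:
    log p(y_n | x_n) = logsumexp_k [log pi_k + log N(y_n | w_k^T x_n, sigma^2)].
    """
    means = X @ W.T                                       # (N, K): w_k^T x_n
    log_gauss = (-0.5 * ((y[:, None] - means) / sigma) ** 2
                 - 0.5 * np.log(2 * np.pi * sigma ** 2))  # log N(y_n | ., sigma^2)
    return -np.sum(logsumexp(np.log(pi) + log_gauss, axis=1))
```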