You can write your code in any programming language so long as we are able to test it on SICE servers. We plan to run some or all or submitted code for further testing and validation.

Overview: Bayesian GLM

In this programming project we will be working with Generalized Linear Models as covered in class, including Logistic regression, Poisson regression and Ordinal regression. Your goal is to use one generic implementation for the main algorithm that works for multiple observation likelihoods.

Data

Data for this assignment is provided in a zip file pp3data.zip on Canvas. Each dataset is given in two files with the data in one and the labels in the other file. We will use the datasets A and usps for classification with logistic regression. We will use the datasets AP for count prediction with Poisson regression. We will use the datasets AO for ordinal prediction with ordinal regression.

The datasets A, AP, AO were artificially generated with labels which are not perfectly matched to any linear predictor yet they are generated to be somewhat predictable. The examples in usps represent 16×16 bitmaps of the characters 3 and 5 and are taken from the well known usps dataset (representing data originally used for zip code classification).

Implementing our variant of GLM

In this assignment we will use a Bayesian (or regularized) version of the algorithm with w ∼ ), where α = 10, to calculate the MAP solution w_MAP.

As discussed in class logistic regression (and by extension GLM) relies on a free parameter (w₀) to capture an appropriate separating hyperplane. Therefore, you will need to add a feature fixed at one (also known as an intercept) to all datasets in the assignment. To match the test case below please add this as the first column in the data matrix.
The vector of first derivatives of the log posterior is g = ^∂LogL_∂w= ^P_id_iφ(x_i)−αw = Φ^Td−αw where d is a vector whose elements are d_i.
The matrix of second derivatives of the log posterior is H = ∂w∂w^∂LogL_T= −^P_ir_iφ(x_i)φ(x_i)^T− αI = −Φ^TRΦ − αI where R is a matrix with elements r_ion the diagonal.
The GLM algorithm initializes the weight vector as w = 0 and then repeatedly applies an update with Netwon’s method w ← w − H⁻¹g until w
For this assignment we consider that w has converged if . If w has not converged in 100 iterations we stop and output the last w as our solution.
The final vector, when the algorithm stops is w_MAP. In this assignment we will use w_MAPfor prediction (i.e. we will not calculate a predictive distribution).

Likelihood models

In order to apply the algorithm for any likelihood model and to evaluate its predictions we need to specify 4 items: (1) d_i, (2) r_i, (3) how to compute our prediction tˆ for test example z, and (4) how to calculate the error when we predict tˆand the true label is t.
For the logistic likelihood we have: y_i= σ(w^Tφ(x_i)) and the first derivative term is d_i= t_i−y_ior d = t − y. The second derivative term is r_i= y_i(1 − y_i). For test example z we predict t = 1 iff p(t = 1) = σ(w_MAP^Tφ(z)) ≥ 0. The error is 1 if tˆ6= t.
Note that for the logistic model the update formula as developed in class is w_n₊₁← w_n− (−αI − Φ^TRΦ)⁻¹[Φ^T(t − y) − αw_n]. You might want to start developing your code and testing it with this special case and then generalize it to handle all likelihoods. To help you test your implementation of this algorithm we provide an additional dataset, irlstest, and solution weight vector in irlsw (for α = 0.1). The first entry in irlsw corresponds to w₀.
For the Poisson likelihood we have: y_i= e^(w^Tφ^(xⁱ⁾⁾and the first derivative term is d_i= t_i−y_ior d = t−y. The second derivative term is r_i= y_i. For test example z we have p(t) = Poisson(λ) where a = w_MAP^Tφ(z) and λ = e^a. We predict the mode t = bλ For this assignment we will use the absolute error: err = |tˆ− t|.
For the ordinal model with K levels we have parameters s and φ₀= −∞ < φ₁< … < φ_K₋₁< φ_K= ∞ where for this assignment we will use K = 5, s = 1 and φ₀= −∞ < φ₁= −2 < φ₂= −1 < φ₃= 0 < φ₄= 1 < φ₅= ∞. The model is somewhat sensitive to the setting of hyperparameters so it is important to use these settings.

Here a_i= w^Tφ(x_i) and for potential label j ∈ {1,…K} we have y_i,j= σ(s∗(φ_j−a_i)). Using this notation, for example i with label t_iwe have d_i= y_i,t_i+ y_i,_(t_i−₁₎− 1. For the second

derivative we have ri = s2[yi,ti(1 − yi,ti) + yi,(ti−1)(1 − yi,(ti−1))].

To predict for test example z we first calculate the y values: a = w_MAP^Tφ(z) and for potential label j ∈ {1,…K} we have y_j= σ(s∗(φ_j−a)). We then calculate p_j= y_j−y_j₋₁and select tˆ= argmax_jp_j. For this assignment we will use the absolute error, or the number of levels we are off in prediction, that is, err = |tˆ− t|.

While you could implement these as three separate algorithms, you are expected to provide one implementation of the main optimization which is given access to procedures calculating the 4 items above to make a concrete instance of GLM.

Evaluating the implementation

Your task is to implement the GLM algorithm and generate learning curves with error bars (i.e., ±1σ) as follows.

Repeat 30 times

Step 1) Set aside 1/3 of the total data (randomly selected) to use as a test set.

Step 2) Permute the remaining data and record the test set error rate as a function of increasing training set portion (0.1,0.2, …,1 of the total size).

Calculate the mean and standard deviation for each size and plot the result. In addition record the number of iterations and the run time untill convergence in each run and report their averages.

In your submission provide 4 plots, one for each dataset, and corresponding runtime/iterations statistics, and provide a short discussion of the results. Are the learning curves as expected? how does learning time vary across datasets for classification? and across the likelihood models? what are the main costs affecting these (time per iteration, number of iterations)?

Extra Credit

Explore some approach for model selection for α in all models and/or for s and φ in the ordinal model and report your results. You may want to generate your own data with known parameters in order to test the success of algorithms in identifying good parameters

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Whatsapp Us

[Solved] B555 Programming Project 3-Bayesian Generalized Linear Models

Likelihood models

Evaluating the implementation

Extra Credit

Reviews

Related products

[Solved] B555 Written Assignment 1

[Solved] B555 Programming Project 2- Experiments with Bayesian Linear Regression

[Solved] B555 Written Assignment 2

[Solved] B555 Programming Project1 – Sentiment analysis through Naive Bayes

[Solved] B555 Project1- Experiments with Bayesian Linear Regression

[Solved] B555 Written Assignment 3