1 Sequential Bayesian Learning
A conjugate prior ensures that the posterior distribution has the same functional form as the prior. The posterior computed from the data seen so far can therefore be used as the prior for the next parameter update.
This property plays an important role in sequential Bayesian learning.
Dataset:
The file data.csv contains two sequences, $x = \{x_1, x_2, \ldots, x_{100} \mid 0 \le x_i \le 2\}$ and $t = \{t_1, t_2, \ldots, t_{100}\}$, which represent the input sequence and the corresponding target sequence, respectively.
Basis Function:
Please apply sigmoid basis functions $\boldsymbol{\phi} = [\phi_0, \ldots, \phi_{M-1}]^\top$.
In this exercise, please take the following parameter settings for your basis functions: $M = 3$ and $s = 0.6$, with $j = 0, \ldots, M-1$. Please take the data size to be $N = 5$, 10, 30 and 80 for each of the following questions.
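A design matrix for these basis functions could be built as in the sketch below. It assumes the common textbook form $\phi_j(x) = \sigma((x - \mu_j)/s)$ with the logistic sigmoid $\sigma$. The evenly spaced centres `mu` and the helper name `design_matrix` are illustrative assumptions, not values taken from the assignment.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def design_matrix(x, mu, s=0.6):
    """Return the N x M matrix with entries sigmoid((x_n - mu_j) / s)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)    # shape (N, 1)
    mu = np.asarray(mu, dtype=float).reshape(1, -1)  # shape (1, M)
    return sigmoid((x - mu) / s)

M = 3
# ASSUMPTION: centres spread evenly over the input range [0, 2);
# replace them with the mu_j specified in the assignment.
mu = 2.0 * np.arange(M) / M
Phi = design_matrix([0.0, 1.0, 2.0], mu)
```

Each row of `Phi` is the feature vector of one input, so the same function can be reused for the training inputs and for a dense plotting grid.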
Bayesian Learning:
Please compute the mean vector $\mathbf{m}_N$ and the covariance matrix $\mathbf{S}_N$ of the posterior distribution $p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$ with the given prior. The precision of the likelihood function $p(t \mid \mathbf{w}, \beta)$ or $p(t \mid x, \mathbf{w}, \beta)$ is chosen to be $\beta = 1$.
Note: You need to train your model by fitting the data sequentially. This means that once you have calculated the result for N = 5, you can use another 5 data points to calculate the result for N = 10.
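The sequential scheme in the note can be sketched as follows, using the standard Gaussian update $\mathbf{S}_{\text{new}}^{-1} = \mathbf{S}_{\text{prev}}^{-1} + \beta \boldsymbol{\Phi}^\top \boldsymbol{\Phi}$ and $\mathbf{m}_{\text{new}} = \mathbf{S}_{\text{new}} (\mathbf{S}_{\text{prev}}^{-1} \mathbf{m}_{\text{prev}} + \beta \boldsymbol{\Phi}^\top \mathbf{t})$. The prior below (zero mean, identity covariance) and the random data are placeholders chosen only for illustration; substitute the prior given in the assignment and the design matrix built from data.csv.

```python
import numpy as np

def posterior_update(m_prev, S_prev, Phi, t, beta=1.0):
    """One Bayesian update: treat N(m_prev, S_prev) as the prior for new data."""
    S_prev_inv = np.linalg.inv(S_prev)
    S_new = np.linalg.inv(S_prev_inv + beta * Phi.T @ Phi)
    m_new = S_new @ (S_prev_inv @ m_prev + beta * Phi.T @ t)
    return m_new, S_new

rng = np.random.default_rng(0)
Phi_all = rng.normal(size=(10, 3))   # synthetic stand-in for the design matrix
t_all = rng.normal(size=10)          # synthetic stand-in for the targets

# ASSUMPTION: zero-mean isotropic prior; use the prior given in the assignment.
m0, S0 = np.zeros(3), np.eye(3)

# Sequential: first 5 points, then the next 5 with the N = 5 posterior as prior.
m5, S5 = posterior_update(m0, S0, Phi_all[:5], t_all[:5])
m10, S10 = posterior_update(m5, S5, Phi_all[5:10], t_all[5:10])

# Batch: all 10 points at once, for comparison.
m10_batch, S10_batch = posterior_update(m0, S0, Phi_all, t_all)
```

Up to floating-point error, `m10` and `S10` agree with the batch result, which is the conjugacy property stated at the start of this section.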
- Plot five curves sampled from the parameter posterior distribution, together with the N data points. (10%)
- Plot the predictive distribution of the target value t, showing the mean curve, the region within one standard deviation on either side of the mean curve, and the N data points. (10%)
- Plot the prior distributions by arbitrarily selecting two weights. (10%)
- Discuss the results for different N in 1, 2 and 3. (10%)
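The quantities needed for the first two plots can be computed as in the sketch below. It assumes the usual Gaussian predictive distribution with mean $\mathbf{m}_N^\top \boldsymbol{\phi}(x)$ and variance $1/\beta + \boldsymbol{\phi}(x)^\top \mathbf{S}_N \boldsymbol{\phi}(x)$. The function names and the numbers for `m_N`, `S_N` and `Phi_grid` are made up for illustration and are not results from data.csv.

```python
import numpy as np

def predictive(Phi_grid, m_N, S_N, beta=1.0):
    """Mean and standard deviation of the predictive distribution on a grid."""
    mean = Phi_grid @ m_N
    # var(x) = 1/beta + phi(x)^T S_N phi(x), evaluated for every grid row
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", Phi_grid, S_N, Phi_grid)
    return mean, np.sqrt(var)

def sample_curves(Phi_grid, m_N, S_N, n_curves=5, seed=0):
    """Draw weight vectors from N(m_N, S_N) and evaluate each curve on the grid."""
    rng = np.random.default_rng(seed)
    W = rng.multivariate_normal(m_N, S_N, size=n_curves)  # (n_curves, M)
    return W @ Phi_grid.T                                  # (n_curves, n_grid)

# Illustrative values only.
m_N = np.array([0.5, -1.0, 2.0])
S_N = 0.1 * np.eye(3)
Phi_grid = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])

mean, std = predictive(Phi_grid, m_N, S_N)
curves = sample_curves(Phi_grid, m_N, S_N)
```

With Matplotlib, one option is to draw the band with `plt.fill_between(x_grid, mean - std, mean + std)` and each row of `curves` with `plt.plot`, where `x_grid` is whatever grid was used to build `Phi_grid`.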
2 Logistic Regression
You are given the dataset [1] of fashion products (Fashion MNIST.zip). This dataset contains 5 classes, with 64 different images in each class. In this exercise, you need to implement batch GD (batch gradient descent), SGD (stochastic gradient descent), mini-batch SGD and Newton-Raphson algorithms to construct a multiclass logistic regression model with the softmax transformation $p(C_k \mid \boldsymbol{\phi}_n) = \exp(a_{nk}) / \sum_j \exp(a_{nj}) = y_k(\boldsymbol{\phi}_n) = y_{nk}$. The error function is formed by.
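A numerically stable softmax can be written as in the following sketch. Since the error function itself is not reproduced above, the code assumes the standard cross-entropy for one-hot targets, $E(\mathbf{w}) = -\sum_n \sum_k t_{nk} \ln y_{nk}$. The small `eps` inside the logarithm is an added safeguard, not part of that definition.

```python
import numpy as np

def softmax(A):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    A = A - A.max(axis=1, keepdims=True)
    expA = np.exp(A)
    return expA / expA.sum(axis=1, keepdims=True)

def cross_entropy(Y, T, eps=1e-12):
    """ASSUMED error function: E = -sum_n sum_k t_nk * ln(y_nk), T one-hot."""
    return -np.sum(T * np.log(Y + eps))

# Two samples, three classes; the second row would overflow a naive exp().
A = np.array([[0.0, 0.0, 0.0], [1000.0, 0.0, 0.0]])
Y = softmax(A)
T = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
E = cross_entropy(Y, T)
```

Here the first row of `Y` is uniform and the second puts essentially all of its mass on class 0, so `E` is approximately ln 3.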
Algorithms | Batch size | Iterations in one epoch |
batch GD | N | 1 |
SGD | 1 | N |
mini-batch SGD | B | N/B |
Newton-Raphson | N | 1 |
N = number of training data, B = batch size
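The first three rows of the table differ only in the batch size, so a single epoch routine can cover them, as sketched below. The sketch assumes the cross-entropy error, whose gradient with respect to the weights is $\mathbf{X}^\top(\mathbf{Y} - \mathbf{T})$. The learning rate `lr`, the averaging over the batch and the synthetic data are illustrative choices. Newton-Raphson is not covered here.

```python
import numpy as np

def softmax(A):
    A = A - A.max(axis=1, keepdims=True)   # stabilise exp
    expA = np.exp(A)
    return expA / expA.sum(axis=1, keepdims=True)

def train_epoch(W, X, T, batch_size, lr, rng):
    """Run one epoch; return the updated W and the number of updates performed."""
    N = X.shape[0]
    order = rng.permutation(N)             # reshuffle the samples every epoch
    n_updates = 0
    for start in range(0, N, batch_size):
        idx = order[start:start + batch_size]
        Y = softmax(X[idx] @ W)
        grad = X[idx].T @ (Y - T[idx])     # gradient of the cross-entropy error
        W = W - lr * grad / len(idx)       # ASSUMPTION: average over the batch
        n_updates += 1
    return W, n_updates

rng = np.random.default_rng(0)
N, F, K = 64, 4, 5
X = rng.normal(size=(N, F))                 # synthetic features, not the images
T = np.eye(K)[rng.integers(0, K, size=N)]   # one-hot targets
W0 = np.zeros((F, K))                       # zero initial weights, one column per class

_, updates_batch = train_epoch(W0, X, T, batch_size=N, lr=0.1, rng=rng)
_, updates_sgd = train_epoch(W0, X, T, batch_size=1, lr=0.1, rng=rng)
_, updates_mini = train_epoch(W0, X, T, batch_size=32, lr=0.1, rng=rng)
```

The update counts come out as 1, N and N/B respectively, matching the "Iterations in one epoch" column.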
Note: You need to normalize the data samples before training and randomly select 32 images as test data for each class.
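One possible way to carry out the normalization and the per-class split is sketched below. The image size (28 x 28), the 8-bit pixel range and the helper name `split_per_class` are assumptions for illustration; the random array only stands in for the real images.

```python
import numpy as np

def split_per_class(X, labels, n_test=32, seed=0):
    """Randomly hold out n_test samples of every class as test data."""
    rng = np.random.default_rng(seed)
    test_idx = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        test_idx.extend(rng.choice(members, size=n_test, replace=False))
    test_mask = np.zeros(len(labels), dtype=bool)
    test_mask[test_idx] = True
    return X[~test_mask], labels[~test_mask], X[test_mask], labels[test_mask]

# Synthetic stand-in: 5 classes x 64 "images" of 28 x 28 = 784 pixels each.
# ASSUMPTION: 8-bit grey-scale pixels, so dividing by 255 maps them into [0, 1].
rng = np.random.default_rng(1)
X = rng.integers(0, 256, size=(5 * 64, 784)).astype(float) / 255.0
labels = np.repeat(np.arange(5), 64)

X_train, y_train, X_test, y_test = split_per_class(X, labels)
```

With 64 images per class and 32 held out, both the training and the test set end up with 32 images per class.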
- Set the initial weight vector $\mathbf{w}_k = [w_{k1}, \ldots, w_{kF}]$ to be a zero vector, where F is the number of features and k indexes the classes. Implement batch GD, SGD, mini-batch SGD (batch size = 32) and Newton-Raphson algorithms to construct a multiclass logistic regression. (15%)
- Plot the learning curves of E(w) and the classification accuracy versus the number of epochs until convergence, for the training data as well as the test data.
- Show the classification results for the training and test data.
- Use principal component analysis (PCA) to reduce the dimension of the images to d = 2, 5, 10. (15%)
- Repeat 1 by using PCA to reduce the dimension of the images to d.
- Plot the d eigenvectors corresponding to the top d eigenvalues.
Left: Olivetti faces dataset [2]; Right: Examples of top 2 eigenvectors
- What do the decision regions and data points look like in the vector space? (15%)
- Plot the decision regions and data points of the images on the span of the top 2 eigenvectors, using PCA to reduce the dimension of the images to 2.
- Repeat 3(a) by changing the order from M = 1 to M = 2.
- Discuss the results of 1, 2 and 3. (15%)
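For the PCA questions above, the projection onto the top d eigenvectors could be computed as in this sketch. It uses the singular value decomposition of the centred data rather than an explicit covariance matrix, which is one common choice and not necessarily the intended one. The function names and the random data are illustrative.

```python
import numpy as np

def pca_fit(X, d):
    """Return the data mean, the top-d eigenvectors (as rows) and eigenvalues."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centred data: rows of Vt are eigenvectors of the covariance matrix
    _, sing, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = sing ** 2 / (X.shape[0] - 1)
    return mean, Vt[:d], eigvals[:d]

def pca_transform(X, mean, components):
    """Project samples onto the span of the chosen eigenvectors."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))     # synthetic data, not the real images
mean, components, eigvals = pca_fit(X, d=2)
Z = pca_transform(X, mean, components)
```

Each row of `components` has one entry per pixel, so it can be reshaped to the image size and displayed as an image. The same `mean` and `components` fitted on the training data should be applied to the test data.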