# [Solved] BIT project2-least squares regression and nearest neighbor classifiers

Click And Check all our Assignments

10 USD \$

Category: Tag:

task 2.1: least squares regression for missing value prediction Download the file whData.dat, remove the outliers (as in the previous project) and collect the remaining height and weight data in two vectors

x = [x1 x2 … xn]T and y,

respectively.

Use the method of least squares to fit polynomial models

d

y(x) = Xwjxj

j=0

to the data. In particular, fit models for d ∈ {1,5,10} and plot your results. Use each of your resulting models to predict weight values for the outliers (i.e. the data points in whData.dat where the weight value is −1).

note:

There are numerous ways of how this can be done. However, depending on how you implement your solution, you may run into numerical problems (you should recognize them when they occur). In this case, it is not acceptable to give up and simply claim that the method did not work. Rather, you should either double your efforts and try to come up with an implementation that works or provide an in-depth analysis as to why and where your code failed and point out possible solutions.

task 2.2: conditional expectation for missing value prediction Fit a bi-variate Gaussian to the height and weight data in x and y to model the joint density p(x,y) of heights and weights.

Given your fitted bi-variate Gaussian, use the idea of conditional expectation to predict the weight values for the outliers. That is, let xo denote the available height data of an outlier and compute

Do this either analytically as discussed in the lecture or numerically and report your results.

task 2.3: Bayesian regression for missing value prediction Use the method of Bayesian regression to fit a fifth degree polynomial

to the height and weight data in x and y. Assume a Gaussian prior

for the parameter vector w where µ0 = 0 and. Plot you resulting model and compare it to the corresponding model (d = 5) from task 2.1 task 2.4: Boolean functions and the Boolean Fourier transform In the supplementary material for lecture 07, we already briefly discussed elemental cellular automata.

A one-dimensional (or elemental) cellular automaton consists of

• an infinite sequence of cells {xi}iN in two possible states; typically these states are assumed to be 0 or 1, here, however, we consider
• a update rule for how to update a cell based on its and its two neighbors’ current states

At time t = 0, the xi are (randomly) initialized, and, at any (discrete) time step t, all cells are updated simultaneously. Here is an illustrative example where we use +1 = and −1 =

t …… t + 1 ……

This update process continues forever and it is common practice to plot subsequent (finite sized) state sequences below each other to visualize the evolution of an automaton under a certain rule. Here are some examples of possible evolutions starting from the same initial configuration

rule 32 rule 108

rule 126 rule 110

From now on, we remove notational clutter. That is, instead of

we simply write y = f(x) where x ∈ {+1,−1}3 and y ∈ {+1,−1}.

The naming convention for the rules of elemental cellular automata is due to Wolfram (1984) and illustrated in the following tables.

 X y (y + 1)/2 +1 +1 +1 −1 0 +1 +1 −1 +1 1 +1 −1 +1 +1 1 +1 −1 −1 +1 1 −1 +1 +1 −1 0 −1 +1 −1 +1 1 −1 −1 +1 +1 1 −1 −1 −1 −1 0
 X y (y + 1)/2 +1 +1 +1 −1 0 +1 +1 −1 +1 1 +1 −1 +1 +1 1 +1 −1 −1 +1 1 −1 +1 +1 +1 1 −1 +1 −1 +1 1 −1 −1 +1 +1 1 −1 −1 −1 −1 0

rule 110 rule 126

For x ∈ {+1,−1}3, there are 23 = 8 possible input patterns for each rule. For each input pattern, there are 2 possible outputs or target values. Hence, the total number of possible rules is 223 = 256.

Using the language of pattern recognition, we may collect the possible input patterns in an 8 × 3 design matrix X and the target values of each rule in an 8 dimensional target vector y. Converting a rule’s target vector from bipolar to binary yields a byte representation of a number between 0 and 255 and provides the rule’s name.

w= argmin Xw w

and then compute yˆ = Xw

Now, compare y and yˆ. What do you observe?

note:

Don’t be alarmed! What we did here hardly makes sense and was mainly intended to prepare ourselves for what comes next.

Each possible rule y = f(x) governing the behavior of a cellular automaton is an instance of a Boolean function

f : {−1,+1}m → {−1,+1} (1)

where in our case m = 3. In what follows, we let I denote the index set of the elements xi of x, that is

.

The power set 2I of I is the set of all subsets of I. In other words,

and we note that |2I| = 2m.

Every Boolean function f of the form in (1) can be uniquely expressed as a multilinear polynomial

f(x) = X wS Yxi (2)

S∈2I i∈S

which is known as the Boolean Fourier series expansion of f. In contrast to the complex exponentials which form the basis functions for conventional Fourier analysis, here the basis functions are the parity functions

ϕS(x) = Yxi

i∈S

where by definition

ϕ(x) = Yxi = 1.

i∈∅

Given these definitions, we can rewrite (2) as follows

f(x) = X wS ϕS = wTϕ

S∈2I

and recognize that any Boolean function is an inner product between two vectors w ∈ R2m and ϕ ∈ {+1,−1}2m.

Here is your second sub-task: Implement a function ϕ : {+1,−1}m → {+1,−1}2m

such that ϕ = ϕ(x). To be specific, for x = [x1,x2,x3]T ∈ {+1,−1}3, your function should produce

 1 

x1

x2 

x3  ϕ = x1 x2  

 

x1 x3 

 

x2 x3  x1 x2 x3

However, implement it such that is works for arbitrary m ∈ N and not just for m = 3. This will come in handy in the next project.

Here is your third sub-task: For rules 110 and 126, turn the design matrix X into a “feature” design matrix Φ, then determine w= argmin w

and finally compute

yˆ = Φw

Now, compare y and yˆ. What do you observe?

Implement a function that realizes an n-nearest neighbor classifier, i.e. a function that decides the class label of a test data point from the majority of the labels among the n nearest training data.

Download the files data2-train.dat and data2-test.dat of labeled 2D data and determine the recognition accuracy (percentage of correctly classified data points) of your classifier for n ∈ {1,3,5}.

Determine the overall run time for computing the 1-nearest neighbor of every data in data2-test.dat.

Implement a function that computes and plots up to four different kinds of kD-trees of the data in data2-train.dat where k = 2.

Determine overall run time for computing the 1-nearest neighbor of every data in data2-test.dat by means of your kD-tree.

note:

Since scipy provides functions for kD-tree computation, you may use those for your run time evaluations. However, just using these functions is of course too easy. Therefore, implement your own function for building and plotting a tree (you may ignore the problem of searching the tree). Your function for building the tree should allow for choosing how to determine the dimension for splitting and for choosing how to determine the split point. In particular, you should enable the following two methods for selecting the splitting dimension:

1. alternate between the x and the y dimension
2. split the data along the dimension of higher variance

With respect to computing the split point, you should enable the following to ideas:

1. split at the midpoint of the data
2. split at the median of the data

Given the data in data2-train.dat create all four possible kD-trees and plot them. what do you observe?

## Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[Solved] BIT project2-least squares regression and nearest neighbor classifiers
10 USD \$