task 2.1: least squares regression for missing value prediction Download the file whData.dat, remove the outliers (as in the previous project) and collect the remaining height and weight data in two vectors
x = [x_{1 }x_{2 }… x_{n}]^{T }and y,
respectively.
Use the method of least squares to fit polynomial models
d
y(x) = ^{X}w_{j}x^{j}
j=0
to the data. In particular, fit models for d ∈ {1,5,10} and plot your results. Use each of your resulting models to predict weight values for the outliers (i.e. the data points in whData.dat where the weight value is −1).
note:
There are numerous ways of how this can be done. However, depending on how you implement your solution, you may run into numerical problems (you should recognize them when they occur). In this case, it is not acceptable to give up and simply claim that the method did not work. Rather, you should either double your efforts and try to come up with an implementation that works or provide an indepth analysis as to why and where your code failed and point out possible solutions.
task 2.2: conditional expectation for missing value prediction Fit a bivariate Gaussian to the height and weight data in x and y to model the joint density p(x,y) of heights and weights.
Given your fitted bivariate Gaussian, use the idea of conditional expectation to predict the weight values for the outliers. That is, let x_{o }denote the available height data of an outlier and compute
Do this either analytically as discussed in the lecture or numerically and report your results.
task 2.3: Bayesian regression for missing value prediction Use the method of Bayesian regression to fit a fifth degree polynomial
to the height and weight data in x and y. Assume a Gaussian prior
for the parameter vector w where µ_{0 }= 0 and. Plot you resulting model and compare it to the corresponding model (d = 5) from task 2.1 task 2.4: Boolean functions and the Boolean Fourier transform In the supplementary material for lecture 07, we already briefly discussed elemental cellular automata.
A onedimensional (or elemental) cellular automaton consists of
 an infinite sequence of cells {x_{i}}_{i}_{∈}N in two possible states; typically these states are assumed to be 0 or 1, here, however, we consider
 a update rule for how to update a cell based on its and its two neighbors’ current states
At time t = 0, the x_{i }are (randomly) initialized, and, at any (discrete) time step t, all cells are updated simultaneously. Here is an illustrative example where we use +1 = and −1 =
t …… t + 1 ……
This update process continues forever and it is common practice to plot subsequent (finite sized) state sequences below each other to visualize the evolution of an automaton under a certain rule. Here are some examples of possible evolutions starting from the same initial configuration
rule 32 rule 108
rule 126 rule 110
From now on, we remove notational clutter. That is, instead of
we simply write y = f(x) where x ∈ {+1,−1}^{3 }and y ∈ {+1,−1}.
The naming convention for the rules of elemental cellular automata is due to Wolfram (1984) and illustrated in the following tables.


rule 110 rule 126
For x ∈ {+1,−1}^{3}, there are 2^{3 }= 8 possible input patterns for each rule. For each input pattern, there are 2 possible outputs or target values. Hence, the total number of possible rules is 2^{2}^{3 }= 256.
Using the language of pattern recognition, we may collect the possible input patterns in an 8 × 3 design matrix X and the target values of each rule in an 8 dimensional target vector y. Converting a rule’s target vector from bipolar to binary yields a byte representation of a number between 0 and 255 and provides the rule’s name.
Here is your first subtask: For rules 110 and 126, determine
w^{∗ }= argmin _{ }Xw w
and then compute yˆ = Xw^{∗}
Now, compare y and yˆ. What do you observe?
note:
Don’t be alarmed! What we did here hardly makes sense and was mainly intended to prepare ourselves for what comes next.
Each possible rule y = f(x) governing the behavior of a cellular automaton is an instance of a Boolean function
f : {−1,+1}^{m }→ {−1,+1} (1)
where in our case m = 3. In what follows, we let I denote the index set of the elements x_{i }of x, that is
.
The power set 2^{I }of I is the set of all subsets of I. In other words,
and we note that 2^{I} = 2^{m}.
Every Boolean function f of the form in (1) can be uniquely expressed as a multilinear polynomial
f(x) = ^{X }w_{S }^{Y}x_{i }(2)
S∈2^{I }i∈S
which is known as the Boolean Fourier series expansion of f. In contrast to the complex exponentials which form the basis functions for conventional Fourier analysis, here the basis functions are the parity functions
ϕ_{S}(x) = ^{Y}x_{i}
i∈S
where by definition
ϕ_{∅}(x) = ^{Y}x_{i }= 1.
i∈∅
Given these definitions, we can rewrite (2) as follows
f(x) = ^{X }w_{S }ϕ_{S }= w^{T}ϕ
S∈2^{I}
and recognize that any Boolean function is an inner product between two vectors w ∈ R^{2}^{m }and ϕ ∈ {+1,−1}^{2}^{m}.
Here is your second subtask: Implement a function ϕ : {+1,−1}^{m }→ {+1,−1}^{2}^{m}
such that ϕ = ϕ(x). To be specific, for x = [x_{1},x_{2},x_{3}]^{T }∈ {+1,−1}^{3}, your function should produce
1
x_{1 }
x_{2 }
x_{3 } ϕ = _{ }x1 x2
x1 x3
x2 x3 x1 x2 x3
However, implement it such that is works for arbitrary m ∈ N and not just for m = 3. This will come in handy in the next project.
Here is your third subtask: For rules 110 and 126, turn the design matrix X into a “feature” design matrix Φ, then determine w^{∗ }= argmin w
and finally compute
yˆ = Φw^{∗}
Now, compare y and yˆ. What do you observe?
task 2.4: nearest neighbor classifier
Implement a function that realizes an nnearest neighbor classifier, i.e. a function that decides the class label of a test data point from the majority of the labels among the n nearest training data.
Download the files data2train.dat and data2test.dat of labeled 2D data and determine the recognition accuracy (percentage of correctly classified data points) of your classifier for n ∈ {1,3,5}.
Determine the overall run time for computing the 1nearest neighbor of every data in data2test.dat.
task 2.5: computing a kDtree
Implement a function that computes and plots up to four different kinds of kDtrees of the data in data2train.dat where k = 2.
Determine overall run time for computing the 1nearest neighbor of every data in data2test.dat by means of your kDtree.
note:
Since scipy provides functions for kDtree computation, you may use those for your run time evaluations. However, just using these functions is of course too easy. Therefore, implement your own function for building and plotting a tree (you may ignore the problem of searching the tree). Your function for building the tree should allow for choosing how to determine the dimension for splitting and for choosing how to determine the split point. In particular, you should enable the following two methods for selecting the splitting dimension:
 alternate between the x and the y dimension
 split the data along the dimension of higher variance
With respect to computing the split point, you should enable the following to ideas:
 split at the midpoint of the data
 split at the median of the data
Given the data in data2train.dat create all four possible kDtrees and plot them. what do you observe?
Reviews
There are no reviews yet.