
project 1: warm up

task 1.1: acquaint yourself with python for pattern recognition

First of all, download the file whExample.py from the Google site that accompanies the lecture; second of all, retrieve the file whData.dat and run the above script. It should plot sizes vs. weights of students from an earlier course on pattern recognition^[1].

Note that two students did not want to disclose their weight. This provides us with an opportunity to deal with missing data. In the plot produced by the script, these data points appear with negative weights (they are outliers).

For starters, try to figure out a way of plotting the data without the outliers, that is, figure out how to plot only those data for which both measurements are positive.
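One possible way to approach this is a boolean mask over the rows of the data array. The sketch below assumes the measurements have already been loaded into a 2D NumPy array with one row per student; the helper name `drop_nonpositive_rows` and the sample values are made up for illustration and are not taken from whData.dat.

```python
import numpy as np

def drop_nonpositive_rows(data):
    """Keep only the rows in which every measurement is positive."""
    data = np.asarray(data, dtype=float)
    keep = np.all(data > 0, axis=1)  # boolean mask, one entry per row
    return data[keep]

# made-up (weight, size) pairs; a negative weight marks a missing value
sample = np.array([[70.0, 180.0],
                   [-1.0, 172.0],
                   [58.0, 165.0],
                   [-1.0, 190.0]])
clean = drop_nonpositive_rows(sample)
```

The filtered array can then be passed to the same plotting call the script already uses.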

task 1.2: fitting a Normal distribution to 1D data

Now consider only the data on body sizes, i.e. consider the 1D array of data containing the size information.

Compute the mean and standard deviation of this sample, plot the data and a normal distribution characterizing its density. Your result should resemble the following figure:
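A minimal sketch of the computation, assuming the body sizes are available as a 1D NumPy array. The array `sizes` below holds made-up values, and `normal_pdf` is a hypothetical helper rather than part of whExample.py.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) evaluated at x."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

sizes = np.array([165.0, 172.0, 180.0, 168.0, 175.0])  # made-up sample
mu = sizes.mean()
sigma = sizes.std()  # ddof=0, the maximum likelihood estimate; use ddof=1 for the unbiased variant

# evaluate the fitted density on a grid that covers the sample
xs = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 200)
ys = normal_pdf(xs, mu, sigma)
```

Plotting `xs` against `ys` as a curve and the data points along the horizontal axis gives a picture of the kind the task asks for.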

task 1.3: fitting a Weibull distribution to 1D data

Download the file myspace.csv which contains Google Trends data that indicate how global interest in the query term "myspace" evolved over time. Read the data in the second column of the file and remove leading zeros! Store the remaining entries in an array h = [h₁, h₂, …, h_n] and create an array x = [1, 2, …, n].
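The preprocessing step might look as follows. This is a sketch that assumes the second column has already been read into a numeric array; `strip_leading_zeros` is a hypothetical helper and `raw` contains made-up numbers, not the actual Google Trends values.

```python
import numpy as np

def strip_leading_zeros(values):
    """Drop all zeros that precede the first non-zero entry."""
    values = np.asarray(values, dtype=float)
    nonzero = np.flatnonzero(values)  # indices of the non-zero entries
    if nonzero.size == 0:
        return values[:0]             # nothing but zeros: return an empty array
    return values[nonzero[0]:]

raw = np.array([0, 0, 0, 3, 7, 0, 12, 9])  # made-up counts
h = strip_leading_zeros(raw)
x = np.arange(1, h.size + 1)               # x = [1, 2, ..., n]
```

Zeros that occur after the first non-zero entry are kept, since only the leading ones are to be removed.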

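Now, fit a Weibull distribution to the histogram h(x). The probability density function of the Weibull distribution is given by

$$f(x \mid \kappa, \alpha) = \frac{\kappa}{\alpha} \left( \frac{x}{\alpha} \right)^{\kappa - 1} e^{-(x/\alpha)^{\kappa}}$$

where κ and α are a shape and scale parameter, respectively. Unlike in the case of the Normal distribution, maximum likelihood estimation of the parameters of the Weibull distribution is more involved.

Given a data sample D = {d₁, …, d_N}, the log-likelihood for the parameters of the Weibull distribution is

$$L(\alpha, \kappa \mid D) = N \left( \log \kappa - \kappa \log \alpha \right) + (\kappa - 1) \sum_{i} \log d_i - \sum_{i} \left( \frac{d_i}{\alpha} \right)^{\kappa}$$

Setting the partial derivatives of L with respect to κ and α to zero leads to a coupled system of nonlinear equations for which there is no closed form solution. Therefore, resort to Newton's method for simultaneous equations and compute

$$\begin{bmatrix} \kappa^{new} \\ \alpha^{new} \end{bmatrix} = \begin{bmatrix} \kappa \\ \alpha \end{bmatrix} - \begin{bmatrix} \frac{\partial^2 L}{\partial \kappa^2} & \frac{\partial^2 L}{\partial \kappa \, \partial \alpha} \\ \frac{\partial^2 L}{\partial \kappa \, \partial \alpha} & \frac{\partial^2 L}{\partial \alpha^2} \end{bmatrix}^{-1} \begin{bmatrix} \frac{\partial L}{\partial \kappa} \\ \frac{\partial L}{\partial \alpha} \end{bmatrix}$$

where the entries of the gradient vector and the Hessian matrix amount to

$$\frac{\partial L}{\partial \kappa} = \frac{N}{\kappa} - N \log \alpha + \sum_{i} \log d_i - \sum_{i} \left( \frac{d_i}{\alpha} \right)^{\kappa} \log \frac{d_i}{\alpha}$$

$$\frac{\partial L}{\partial \alpha} = \frac{\kappa}{\alpha} \left( \sum_{i} \left( \frac{d_i}{\alpha} \right)^{\kappa} - N \right)$$

$$\frac{\partial^2 L}{\partial \kappa^2} = -\frac{N}{\kappa^2} - \sum_{i} \left( \frac{d_i}{\alpha} \right)^{\kappa} \left( \log \frac{d_i}{\alpha} \right)^2$$

$$\frac{\partial^2 L}{\partial \alpha^2} = \frac{\kappa}{\alpha^2} \left( N - (\kappa + 1) \sum_{i} \left( \frac{d_i}{\alpha} \right)^{\kappa} \right)$$

$$\frac{\partial^2 L}{\partial \kappa \, \partial \alpha} = \frac{1}{\alpha} \sum_{i} \left( \frac{d_i}{\alpha} \right)^{\kappa} + \frac{\kappa}{\alpha} \sum_{i} \left( \frac{d_i}{\alpha} \right)^{\kappa} \log \frac{d_i}{\alpha} - \frac{N}{\alpha}$$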

note:

You are given a histogram h(x) where h_j counts the number of observations of a value x_j. The MLE procedure outlined above assumes that you are given individual observations d_i. That is, it assumes as input h₁ times the value x₁, h₂ times the value x₂, etc. In other words, d₁ = x₁, d₂ = x₁, …, d_{h₁} = x₁, d_{h₁+1} = x₂, …, d_{h₁+h₂} = x₂, …

That is, you have to turn the histogram into a (large) set of numbers for the procedure to work. But there is also a more elegant solution; can you see and implement it?

Be ambitious! Analyze why the above approach is unnecessarily cumbersome and come up with a much faster approach.
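One way to avoid expanding the histogram is to note that every sum over the observations d_i can be rewritten as a sum over the bins x_j weighted by the counts h_j, with N equal to the sum of all h_j. The sketch below follows that idea, writing `kappa` for the shape and `alpha` for the scale parameter (these names are an assumption made here). It is plain Newton iteration without step-size control, so convergence from an arbitrary starting point is not guaranteed, and all function names are hypothetical.

```python
import numpy as np

def weibull_loglik(kappa, alpha, x, h):
    """Weibull log-likelihood of a histogram: bin values x, counts h."""
    x, h = np.asarray(x, dtype=float), np.asarray(h, dtype=float)
    n = h.sum()  # total number of observations
    return (n * (np.log(kappa) - kappa * np.log(alpha))
            + (kappa - 1.0) * np.sum(h * np.log(x))
            - np.sum(h * (x / alpha) ** kappa))

def weibull_grad_hess(kappa, alpha, x, h):
    """Gradient and Hessian of the log-likelihood w.r.t. (kappa, alpha)."""
    x, h = np.asarray(x, dtype=float), np.asarray(h, dtype=float)
    n = h.sum()
    log_r = np.log(x / alpha)
    rk = (x / alpha) ** kappa
    s0 = np.sum(h * rk)               # weighted sum of (x/alpha)^kappa
    s1 = np.sum(h * rk * log_r)       # ... times log(x/alpha)
    s2 = np.sum(h * rk * log_r ** 2)  # ... times log(x/alpha)^2
    d_k = n / kappa - n * np.log(alpha) + np.sum(h * np.log(x)) - s1
    d_a = kappa / alpha * (s0 - n)
    d_kk = -n / kappa ** 2 - s2
    d_aa = kappa / alpha ** 2 * (n - (kappa + 1.0) * s0)
    d_ka = (s0 - n) / alpha + kappa / alpha * s1
    return np.array([d_k, d_a]), np.array([[d_kk, d_ka], [d_ka, d_aa]])

def fit_weibull_newton(x, h, kappa=1.0, alpha=1.0, n_iter=20):
    """Run n_iter plain Newton steps starting from (kappa, alpha)."""
    theta = np.array([kappa, alpha], dtype=float)
    for _ in range(n_iter):
        grad, hess = weibull_grad_hess(theta[0], theta[1], x, h)
        theta = theta - np.linalg.solve(hess, grad)  # theta - H^{-1} grad
    return theta[0], theta[1]
```

With h and x prepared as described in the task, a call such as `kappa, alpha = fit_weibull_newton(x, h)` would run the 20 iterations. Each iteration costs time proportional to the number of bins rather than to the number of individual observations.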

If you initialize the shape parameter κ = 1 and the scale parameter α = 1 and run the estimation procedure for about 20 iterations, then which values do you obtain for κ and α?

When you plot the histogram and a correspondingly scaled version of the fitted distribution, your result should resemble this figure:

[Figure: histogram of the Google Trends data with the scaled Weibull fit; legend entries: Google data, Weibull fit]

note:

If you want to impress your professor, then additionally figure out how to use the function scipy.integrate.odeint to solve this task. Again, be ambitious and search the Web for tutorials . . .

task 1.4: drawing unit circles

In the lecture we discussed L_p norms for R^m and saw that, for different p, the corresponding unit spheres may look different. For instance, the following examples show unit circles in R²:

Consider the L_p norm for the given p and plot the corresponding R² unit circle.

Is this really a norm or not? Discuss why or why not!
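For plotting, one option is to parameterize the set of points with |x|^p + |y|^p = 1 directly. The sketch below does this for any p > 0; the function names are hypothetical, and the parameterization via cosine and sine is just one of several possible choices (sampling a grid and drawing a contour line would be another).

```python
import numpy as np

def lp_unit_circle(p, n_points=400):
    """Points (x, y) in R^2 that satisfy |x|^p + |y|^p = 1."""
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    c, s = np.cos(t), np.sin(t)
    # |x|^p = cos^2(t) and |y|^p = sin^2(t), so both always sum to 1
    x = np.sign(c) * np.abs(c) ** (2.0 / p)
    y = np.sign(s) * np.abs(s) ** (2.0 / p)
    return x, y

def plot_lp_unit_circle(p):
    """Draw the curve with matplotlib (imported here so the rest runs without it)."""
    import matplotlib.pyplot as plt
    x, y = lp_unit_circle(p)
    plt.plot(x, y)
    plt.gca().set_aspect("equal")
    plt.title(f"p = {p}")
    plt.show()
```

Calling `plot_lp_unit_circle(2)` should reproduce the familiar Euclidean circle, which makes for a quick sanity check before trying other values of p.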

task 1.5: estimating the dimension of fractal objects in an image

In the lecture we discussed the use of least squares for linear regression; here we consider a neat practical application of this technique. Box counting is a method to determine the fractal dimension of an object in an image. For simplicity, let us focus on square images whose width and height (in pixels) are an integer power of two, for instance

w = h = 2^L = 512, i.e. L = 9

Given such an image, the procedure involves three main steps:

apply an appropriate binarization procedure to create a binary image in which foreground pixels are set to 1 and background pixels to 0

specify a set S of scaling factors 0 < s_i < 1, for instance powers of 1/2 such as s_i = 1/2^i, so that the box side lengths stay integer,

and, for each s_i ∈ S, cover the binary image with boxes of size s_i·w × s_i·h and count the number n_i of boxes which contain at least one foreground pixel

plot log n_i against log(1/s_i) and fit a line

$$D \cdot \log \frac{1}{s_i} + b = \log n_i$$

to this data; the resulting estimate for D is the fractal dimension we are after.

In other words, the problem of estimating D is a simple linear regression problem that can of course be tackled using least squares.

Now, implement the box counting method and run it on the two test images tree-2.png and lightning-3.png.

What fractal dimensions do you obtain? Which object has the higher one, the tree or the lightning bolt?
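The counting and fitting steps could be sketched as follows, assuming a square binary image `g` whose side length is a power of two and scaling factors of the form 1/2^i, so that every box has an integer side length. The function names are hypothetical, and `np.polyfit` is used here as one convenient least squares routine among several.

```python
import numpy as np

def box_count(g, s):
    """Number of boxes of side s*w that contain at least one foreground pixel."""
    g = np.asarray(g, dtype=bool)
    w = g.shape[0]
    b = int(round(s * w))  # box side length in pixels
    n_b = w // b           # number of boxes per row and per column
    # split the image into an n_b x n_b grid of b x b blocks
    blocks = g[:n_b * b, :n_b * b].reshape(n_b, b, n_b, b)
    return int(blocks.any(axis=(1, 3)).sum())

def fractal_dimension(g, scales):
    """Least squares fit of log n_i = D * log(1/s_i) + b; returns (D, b)."""
    scales = np.asarray(scales, dtype=float)
    counts = np.array([box_count(g, s) for s in scales], dtype=float)
    D, b = np.polyfit(np.log(1.0 / scales), np.log(counts), 1)
    return D, b
```

As a plausibility check, a completely filled image should yield a dimension of 2 and a single straight line of pixels a dimension of 1. The fit assumes that every scale yields at least one occupied box, since the logarithm of a zero count is undefined.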

note:

If you use the following snippet

    import numpy as np
    import scipy.misc as msc   # note: msc.imread is absent from SciPy 1.2 and later
    import scipy.ndimage as img

    def foreground2BinImg(f):
        # difference of two Gaussian-smoothed copies of the image
        d = img.gaussian_filter(f, sigma=0.50, mode='reflect') - \
            img.gaussian_filter(f, sigma=1.00, mode='reflect')
        d = np.abs(d)
        m = d.max()
        # threshold at 10% of the maximum response
        d[d <  0.1*m] = 0
        d[d >= 0.1*m] = 1
        # close small gaps in the foreground mask
        return img.binary_closing(d)

    imgName = 'lightning-3'
    f = msc.imread(imgName+'.png', flatten=True).astype(float)
    g = foreground2BinImg(f)

to read and binarize an image, then the outcome of the box counting procedure should be deterministic. That is, every team using this snippet should obtain the same results, and these results should be the same as those your professor got. Teams getting different results can rest assured they made a mistake. Try not to be one of these teams.

[1] Depending on what version of Python you are using, the script may or may not work properly. If it does not work properly, see this as a learning opportunity and fix the errors yourself.
