[SOLVED] algorithm PDF document created by PDFfiller


Problem 1

Figure 1

Set up:

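Consider the data set NormalMix.csv and its histogram displayed in Figure 1. The histogram shows a clear bimodal shape in the distribution of X. One way to model a distribution of this type is to use a mixture of two probability distributions. Here we assume that our data set NormalMix.csv is a sample from a random variable governed by the probability density f(x), defined by

$$
\begin{aligned}
f(x) &= f(x; \mu_1, \sigma_1, \mu_2, \sigma_2, \delta) \\
     &= \delta f_1(x; \mu_1, \sigma_1) + (1 - \delta) f_2(x; \mu_2, \sigma_2) \\
     &= \delta \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left(-\frac{1}{2\sigma_1^2}(x - \mu_1)^2\right)
      + (1 - \delta) \frac{1}{\sqrt{2\pi\sigma_2^2}} \exp\left(-\frac{1}{2\sigma_2^2}(x - \mu_2)^2\right),
\end{aligned}
$$

where $-\infty < x < \infty$ and the parameter space is defined by $-\infty < \mu_1, \mu_2 < \infty$, $\sigma_1, \sigma_2 > 0$, and $0 \le \delta \le 1$. The mixture parameter $\delta$ governs how much mass is placed on the first distribution $f_1(x; \mu_1, \sigma_1)$, and its complement $1 - \delta$ governs how much mass is placed on the other distribution $f_2(x; \mu_2, \sigma_2)$.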
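To make the role of the mixture parameter concrete, here is a minimal R sketch of a density of this form, together with a negative log-likelihood and an `nlm()` fit. It rests on several assumptions: the helper names `dnormmix` and `neg_loglik` are hypothetical, the mixture parameter is called `delta`, and the data are simulated with arbitrarily chosen true values rather than read from NormalMix.csv.

```r
# Two-component normal mixture density (hypothetical helper name).
dnormmix <- function(x, mu1, sigma1, mu2, sigma2, delta) {
  delta * dnorm(x, mean = mu1, sd = sigma1) +
    (1 - delta) * dnorm(x, mean = mu2, sd = sigma2)
}

# Sanity check: a valid density integrates to 1 (range covers essentially all mass).
area <- integrate(dnormmix, lower = -20, upper = 30,
                  mu1 = 4, sigma1 = 1, mu2 = 9, sigma2 = 1.5, delta = 0.4)$value

# Simulated stand-in data (assumption: NOT the NormalMix.csv data).
set.seed(1)
n <- 2000
from_first <- rbinom(n, size = 1, prob = 0.4) == 1
x <- ifelse(from_first, rnorm(n, mean = 4, sd = 1), rnorm(n, mean = 9, sd = 1.5))

# Negative log-likelihood with theta = (mu1, sigma1, mu2, sigma2, delta).
neg_loglik <- function(theta, x) {
  -sum(log(dnormmix(x, theta[1], theta[2], theta[3], theta[4], theta[5])))
}

# nlm() is unconstrained, so the search may briefly leave the parameter space
# and produce NaN warnings; a reasonable starting point usually avoids trouble.
start <- c(3, 2, 8, 2, 0.5)
fit <- suppressWarnings(nlm(neg_loglik, p = start, x = x))

fit$estimate                  # estimates of (mu1, sigma1, mu2, sigma2, delta)
round(100 * fit$estimate[5])  # approximate percentage drawn from the first component
```

The fifth element of `fit$estimate` estimates the share of observations from the first component, and one minus that value estimates the share from the second. Starting values matter for mixtures: swapping the two starting means would simply swap the component labels.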


In our setting, we have n = 10,000 sampled observations but we do not know how many males and females were sampled. Assume that the distribution of males is governed by

$$
f_1(x; \mu_1, \sigma_1) = \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left(-\frac{1}{2\sigma_1^2}(x - \mu_1)^2\right), \quad -\infty < x < \infty,
$$

and the distribution of females is governed by

$$
f_2(x; \mu_2, \sigma_2) = \frac{1}{\sqrt{2\pi\sigma_2^2}} \exp\left(-\frac{1}{2\sigma_2^2}(x - \mu_2)^2\right), \quad -\infty < x < \infty.
$$

Our goal is to use a maximum likelihood approach to estimate the parameters $\mu_1, \mu_2, \sigma_1, \sigma_2$ and the mixture parameter $\delta$. Using these estimated parameters, we can answer questions about the individual populations and what percentage of males and females contribute to the distribution of X.

Perform the following tasks

i. Set up the log-likelihood function $\ell(\mu_1, \mu_2, \sigma_1, \sigma_2, \delta; x_1, x_2, \ldots, x_n)$. Note that this function will not simplify very much.

ii. Run the following R code.

    NormalMix <- read.csv("NormalMix.csv")[,-1]
    hist(NormalMix,breaks=20,xlab="x",probability = T)

iii. Define the negative log-likelihood function in R using the data set NormalMix.csv. Evaluate the negative log-likelihood function at the point $\mu_1 = 4$, $\sigma_1 = 2$, $\mu_2 = 8$, $\sigma_2 = 2$, $\delta = .5$.

iv. Compute the maximum likelihood estimates in R using the nlm() function.

v. Approximately what percentage of males and females contribute to the distribution of X based on our data set?

Hint: In Homework 2 when computing the MLE, you only had to optimize with respect to 1 parameter. In this exam problem, you have to optimize with respect to 5 parameters. There is another MLE example using two parameters posted on Canvas. The file is named gammaMLE.

Problem 2

Consider the kNNData.csv data set posted under the midterm module. The goal of this exercise is to apply a classification model using the basic kNN algorithm. The response variable Class is a categorical variable with three levels: Group1, Group2, Group3. We will build a kNN classification model from the training data kNNData.train and validate the trained model using the test data kNNData.test.

Perform the following tasks

2i. Run the following code so that everyone in the class has the same training data set and test data set.

    kNNData <- read.csv("kNNData.csv")[,c("X1","X2","Class")]
    set.seed(2)
    test.index <- sample(1:nrow(kNNData),100,replace=F)
    kNNData.test <- kNNData[test.index, ]
    kNNData.train <- kNNData[-test.index, ]

2ii. Run the following code to gain a visual representation of how the response behaves for different values of the features X1 and X2.

    library(ggplot2)
    ggplot(data=kNNData.train)+
      geom_point(mapping=aes(x=X1,y=X2,col=Class))+
      labs(title="kNN Classification")

2iii. Modify the KNN.decision() function from class so that it can be applied to the kNNData data frame. Using K = 5, test your function at the query points (X1test = 0, X2test = 10) and (X1test = 0, X2test = 5).

2iv. Compute the prediction error for K = 5.

2v. Compute the prediction error for K = 1, 2, 3, . . . , 200. Create a plot of the prediction error versus K. Note that you could also plot the prediction error versus 1/K so that the plot is consistent with the text, but this is not required.

2vi. Based on the plot from Part 2v, what range of values would you choose for the tuning parameter K? Why did you pick this range?

Problem 3

Recall the Weather data set from the PCA applications covered during lecture. This data set is named Daily1995.csv. Note that this is a high-dimensional PCA example.

Figure 2

Perform the following tasks

3i. How many principal components do we require to explain 95% of the variance captured by this data set? To receive full credit, validate your claim with the appropriate plot.

3ii. Construct the yearly weather for Flagstaff using the minimum number of PCs that explain 95% of the data's variation. Plot this constructed case with the actual data for Flagstaff. Make sure to label your plots appropriately.
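For the kNN classification in Problem 2, the basic algorithm can be sketched in a few lines of R. This is an illustration only: `knn_predict` and `knn_error` are hypothetical helpers, not the KNN.decision() function from class, and the three-group data are simulated rather than taken from kNNData.csv.

```r
# Majority-vote kNN for one query point (hypothetical helper).
knn_predict <- function(train_x, train_y, query, k = 5) {
  # Euclidean distance from the query to every training row
  d <- sqrt(rowSums(sweep(as.matrix(train_x), 2, query)^2))
  nearest <- order(d)[seq_len(k)]
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]  # ties go to the first level in table order
}

# Simulated stand-in data with columns named as in the exercise (assumption).
set.seed(2)
n_per <- 100
sim <- data.frame(
  X1 = c(rnorm(n_per, 0, 1), rnorm(n_per, 4, 1), rnorm(n_per, 0, 1)),
  X2 = c(rnorm(n_per, 0, 1), rnorm(n_per, 4, 1), rnorm(n_per, 8, 1)),
  Class = factor(rep(c("Group1", "Group2", "Group3"), each = n_per))
)
test_index <- sample(seq_len(nrow(sim)), 60, replace = FALSE)
test <- sim[test_index, ]
train <- sim[-test_index, ]
features <- c("X1", "X2")

# Prediction error on the held-out rows for a given k.
knn_error <- function(k) {
  pred <- apply(test[, features], 1, function(q) {
    knn_predict(train[, features], train$Class, q, k = k)
  })
  mean(pred != as.character(test$Class))
}

ks <- c(1, 5, 25, 100)
errors <- sapply(ks, knn_error)
plot(ks, errors, type = "b", xlab = "K", ylab = "Prediction error",
     main = "Test error versus K (simulated data)")
```

Very small K tends to follow noise in the training data, while very large K averages over points from other groups, so the error curve is usually read for a stable middle range rather than a single best value.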
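For the PCA tasks in Problem 3, the following R sketch shows one way to find the number of components that reach 95% of the variance and to rebuild a single case from them. It assumes a matrix with one row per case and one column per day; the data are simulated seasonal curves, not Daily1995.csv, and `reconstruct` is a hypothetical helper.

```r
# Simulated high-dimensional data: 30 cases (rows) by 365 days (columns).
set.seed(3)
n_cases <- 30
n_days <- 365
day <- seq_len(n_days)
base <- rnorm(n_cases, mean = 15, sd = 5)  # case-specific average level
amp <- rnorm(n_cases, mean = 10, sd = 3)   # case-specific seasonal amplitude
sim <- t(sapply(seq_len(n_cases), function(i) {
  base[i] - amp[i] * cos(2 * pi * day / 365) + rnorm(n_days, sd = 2)
}))

pca <- prcomp(sim, center = TRUE, scale. = FALSE)

# Cumulative proportion of variance explained and the smallest k reaching 95%.
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
k <- which(var_explained >= 0.95)[1]

plot(var_explained, type = "b", xlab = "Number of PCs",
     ylab = "Cumulative proportion of variance")
abline(h = 0.95, lty = 2)

# Rebuild the data from the first k PCs: scores times loadings, plus the column means.
reconstruct <- function(pca, k) {
  approx <- pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])
  sweep(approx, 2, pca$center, "+")
}
recon_k <- reconstruct(pca, k)
recon_full <- reconstruct(pca, length(pca$sdev))

# Compare one case (row 1 here) with its k-component reconstruction.
plot(day, sim[1, ], type = "l", xlab = "Day of year", ylab = "Simulated value",
     main = paste("Case 1: actual versus", k, "PC reconstruction"))
lines(day, recon_k[1, ], col = "red")
legend("topleft", legend = c("Actual", "Reconstructed"),
       col = c("black", "red"), lty = 1)
```

With real data, the row for the case of interest would replace row 1. If the columns were standardized in `prcomp()`, the reconstruction would also need to be multiplied back by `pca$scale` before adding the centers.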
