
Submit a paper containing the solutions to each of the exercises found below. The paper should be divided into two sections: Solutions and Appendix. Each solution should be clearly presented and accompanied by narrative when necessary. Source code should *not* be included in the Solutions section unless explicitly requested; however, all supporting source code should be included in the Appendix. Both Solutions and Appendix should be divided into sections titled EXERCISE 1, EXERCISE 2, etc. **Important:** credit for an exercise will not be awarded if there is insufficient supporting source code in the appendix.

**Exercises**

- Review and download the abalone data set at http://archive.ics.uci.edu/ml/datasets/Abalone?pagewanted=all

Use the e1071 library’s svm function on the data set with at least 20 different combinations of polynomial degree *d* and cost *C*. For each combination perform the following: i) 10-fold cross validation, and ii) training accuracy from training over the entire data set. Make a table showing the results. The table should order the combinations by increasing complexity. Note: assume (*d*_{1}, *C*_{1}) induces a more complex model than (*d*_{2}, *C*_{2}) iff either *d*_{1} > *d*_{2}, or *C*_{1} < *C*_{2}. Highlight the combination that resulted in the highest average CV accuracy. Note: the 20 different combinations for *d* and *C* should provide a good variation of svm model possibilities. For the best classifier in the table, provide the average distance of the predicted class from the true class. Provide a histogram that shows the frequency of how often a prediction is *m* rings away from the true number of rings, where *m* = 0, 1, 2, …, 29.
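A minimal sketch of the grid search, assuming the abalone data has been read into a data frame `abalone` whose ring count is stored as a factor column `rings` (both names, and the particular grid of values, are assumptions, not part of the assignment). `e1071::svm` called with `cross = 10` reports the 10-fold CV accuracy in `tot.accuracy`:

```r
# 4 degrees x 5 costs = 20 combinations (an assumed grid; choose your own).
grid <- expand.grid(degree = 1:4, cost = c(0.1, 1, 10, 100, 1000))

# Order by increasing complexity per the note above: larger degree, then
# smaller cost, is treated as more complex, so the least complex row
# (smallest degree, largest cost) comes first.
grid <- grid[order(grid$degree, -grid$cost), ]

run_one <- function(d, C, data) {
  # cross = 10 makes e1071 perform 10-fold CV and store the result.
  m <- e1071::svm(rings ~ ., data = data, type = "C-classification",
                  kernel = "polynomial", degree = d, cost = C, cross = 10)
  train_acc <- 100 * mean(predict(m, data) == data$rings)
  c(cv_acc = m$tot.accuracy, train_acc = train_acc)
}

# Guarded so the sketch degrades gracefully when e1071 / the data are absent.
if (requireNamespace("e1071", quietly = TRUE) && exists("abalone")) {
  results <- cbind(grid, t(mapply(run_one, grid$degree, grid$cost,
                                  MoreArgs = list(data = abalone))))
}
```

The highlighted row is then simply the one maximizing `cv_acc` in `results`.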


- Consider the following alternative method for classifying the ring count of an abalone. This method uses the following nine binary classifiers:

  *f*_{≤9 vs ≥10}, *f*_{≤7 vs 8−9}, *f*_{≤5 vs 6−7}, *f*_{8 vs 9}, *f*_{6 vs 7}, *f*_{10−11 vs ≥12}, *f*_{12−13 vs ≥14}, *f*_{10 vs 11}, *f*_{12 vs 13}.

  For example, *f*_{≤7 vs 8−9} classifies an abalone data point as either having 7 or fewer rings, or having either 8 or 9 rings. Thus, the training set for this classifier consists of all training points with 9 or fewer rings. As another example, *f*_{8 vs 9} classifies an abalone data point as either having 8 rings or 9 rings. Thus, the training set for this classifier consists of all training points with 8 or 9 rings. These binary classifiers are used to form a classification algorithm that behaves in a manner similar to binary search. For example, on input *x*, we first evaluate *f*_{≤9 vs ≥10}(*x*). Suppose the output is +1 (i.e. *x* is classified as having at least 10 rings). Next we evaluate *f*_{10−11 vs ≥12}(*x*). Suppose the output is −1 (i.e. *x* is classified as having 10 or 11 rings). Finally, the algorithm evaluates *f*_{10 vs 11}(*x*) and returns either 10 or 11 as the final ring classification. In the case where *x* is classified as having 5 or fewer rings, the algorithm outputs 5. Similarly, in the case where *x* is classified as having 14 or more rings, the algorithm outputs 14.
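The decision procedure above can be sketched as a small routine that takes the nine trained classifiers as a named list of functions returning +1 or −1. The list names and the convention that +1 means the higher-ring class are assumptions; the toy "oracle" below exists only to sanity-check the wiring and is not a trained model:

```r
# Binary-search classification over ring counts, following the text above.
classify_rings <- function(x, clfs) {
  if (clfs$le9_vs_ge10(x) == 1) {               # at least 10 rings
    if (clfs$r1011_vs_ge12(x) == 1) {           # at least 12 rings
      if (clfs$r1213_vs_ge14(x) == 1) return(14)  # 14 or more collapses to 14
      if (clfs$r12_vs_13(x) == 1) return(13) else return(12)
    }
    if (clfs$r10_vs_11(x) == 1) return(11) else return(10)
  }
  if (clfs$le7_vs_89(x) == 1) {                 # 8 or 9 rings
    if (clfs$r8_vs_9(x) == 1) return(9) else return(8)
  }
  if (clfs$le5_vs_67(x) == 1) {                 # 6 or 7 rings
    if (clfs$r6_vs_7(x) == 1) return(7) else return(6)
  }
  5                                             # 5 or fewer collapses to 5
}

# Hypothetical oracle that "knows" the true ring count r, for testing only.
oracle <- function(r) list(
  le9_vs_ge10   = function(x) if (r >= 10) 1 else -1,
  r1011_vs_ge12 = function(x) if (r >= 12) 1 else -1,
  r1213_vs_ge14 = function(x) if (r >= 14) 1 else -1,
  r12_vs_13     = function(x) if (r == 13) 1 else -1,
  r10_vs_11     = function(x) if (r == 11) 1 else -1,
  le7_vs_89     = function(x) if (r >= 8)  1 else -1,
  r8_vs_9       = function(x) if (r == 9)  1 else -1,
  le5_vs_67     = function(x) if (r >= 6)  1 else -1,
  r6_vs_7       = function(x) if (r == 7)  1 else -1)
```

In the real algorithm each list element would wrap `predict` on the corresponding trained svm; at most four of the nine classifiers are evaluated per input.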

For each of the nine classifiers, use a method similar to that used in Exercise 1 to find a best svm model for the classifier. Provide a table having nine rows, where each row reports on the best model found for each of the nine classifiers. Each row should include i) a description of the two classes (e.g. “10 vs 11”), ii) the size of the training set, iii) the degree value of the best learning-parameter (BLP) combination, iv) the *C* value of the BLP combination, v) the average CV accuracy for the BLP combination, and vi) the training accuracy of the final model constructed with the best learning parameters.

- Implement the binary-search learning algorithm described in the previous exercise, and apply it to the entire abalone data set. Report on the training accuracy and the average distance of the predicted class from the true class. Provide a histogram that shows the frequency of how often a prediction is *m* rings away from the true number of rings, where *m* = 0, 1, 2, …, 29.
- Use the two-dimensional data in file Exercise-4.csv to build a data frame df. Use R’s plot function to visualize the data. Verify that the relationship between *x* and *y* appears to be quadratic. Use the e1071 library’s svm function (with the options kernel = ‘‘polynomial’’, degree = 2, type = ‘‘eps-regression’’ held constant) on the data set with at least 20 different combinations of *ε* and cost *C*. For each combination perform the following: i) 10-fold cross validation of mean squared error (mse), and ii) mse over the entire data set. Make a table showing the results. The table should order the combinations by increasing complexity. Note: assume (*ε*_{1}, *C*_{1}) induces a more complex model than (*ε*_{2}, *C*_{2}) iff either *ε*_{1} < *ε*_{2}, or *C*_{1} < *C*_{2}. Highlight the combination that resulted in the lowest average CV mse. Again, the 20 different combinations for *ε* and *C* should provide a good variation of svm model possibilities.
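A generic 10-fold CV helper for mean squared error can be sketched as below. A quadratic `lm` stands in for the model here (the data is simulated, since Exercise-4.csv is not reproduced); for the exercise, the fit function would instead call `e1071::svm(y ~ x, data, kernel = "polynomial", degree = 2, type = "eps-regression", epsilon = eps, cost = C)`:

```r
# 10-fold cross-validated MSE. `fit_fn(train)` must return a model that
# works with predict(); column names x and y are assumptions.
cv_mse <- function(df, fit_fn, k = 10) {
  folds <- sample(rep(1:k, length.out = nrow(df)))  # random fold labels
  errs <- sapply(1:k, function(i) {
    m <- fit_fn(df[folds != i, ])                   # train on k-1 folds
    mean((predict(m, df[folds == i, ]) - df$y[folds == i])^2)
  })
  mean(errs)
}

# Simulated quadratic data as a stand-in for Exercise-4.csv.
set.seed(1)
toy <- data.frame(x = runif(200, 0, 10))
toy$y <- 2 * toy$x^2 - 3 * toy$x + rnorm(200)

cv_err  <- cv_mse(toy, function(d) lm(y ~ poly(x, 2), d))
full_err <- mean((predict(lm(y ~ poly(x, 2), toy), toy) - toy$y)^2)
```

Looping `cv_mse` over the (ε, C) grid and recording both `cv_err` and `full_err` per combination produces the requested table.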

- Provide a graph that shows the plotted data points against the curve provided by the best svm from the previous exercise. Plot the svm model using 1,000 data points equally spaced between 0 and 10. Make sure the plotted data points and plotted model points are clearly distinguishable.
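One way to draw the overlay is sketched below; the simulated data frame and the fitted quadratic `lm` are stand-ins for `df` (from Exercise-4.csv) and the tuned svm, so only the plotting calls carry over directly:

```r
# Stand-ins: simulated quadratic data and an lm in place of the tuned svm.
set.seed(2)
df <- data.frame(x = runif(150, 0, 10))
df$y <- (df$x - 4)^2 + rnorm(150)
best_model <- lm(y ~ poly(x, 2), df)   # replace with the best svm model

# 1,000 equally spaced evaluation points on [0, 10].
grid_x <- seq(0, 10, length.out = 1000)
preds  <- predict(best_model, data.frame(x = grid_x))

# Open circles for data, a solid red curve for the model, plus a legend,
# so the two layers are clearly distinguishable.
plot(df$x, df$y, pch = 1, xlab = "x", ylab = "y",
     main = "Data vs. fitted model curve")
lines(grid_x, preds, col = "red", lwd = 2)
legend("topleft", legend = c("data", "model"), col = c("black", "red"),
       pch = c(1, NA), lty = c(NA, 1))
```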
- Try different combinations of *d*, *C*, and *ε* to find a good support-vector regression machine for the abalone data set. Report on the average distance of the predicted class from the true class. Provide a histogram that shows the frequency of how often a prediction is *m* rings away from the true number of rings, where *m* = 0, 1, 2, …, 29.
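The "rings away" reporting asked for in Exercises 1, 3, and 6 can share one small helper; the function name and the barplot presentation are assumptions:

```r
# Average |predicted - true| ring distance plus the frequency of each
# distance m = 0..29, given integer vectors of predictions and truths.
ring_distance_report <- function(pred, truth) {
  d <- abs(pred - truth)
  list(avg_dist = mean(d),
       freq = table(factor(d, levels = 0:29)))  # zero counts kept for all m
}

# Usage with a trained model: rep <- ring_distance_report(pred, truth)
# barplot(rep$freq, xlab = "m (rings away)", ylab = "frequency")
```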
