Submit a paper containing the solutions to each of the exercises found below. The paper should be divided into two sections: Solutions and Appendix. Each solution should be clearly presented and accompanied by narrative where necessary. Source code should not be included in the Solutions section unless explicitly requested; however, all supporting source code should be included in the Appendix. Both Solutions and Appendix should be divided into sections titled EXERCISE 1, EXERCISE 2, etc. Important: credit for an exercise will not be awarded if there is insufficient supporting source code in the Appendix.
Exercises
- Review and download the abalone data set at
http://archive.ics.uci.edu/ml/datasets/Abalone?pagewanted=all
Use the e1071 library's svm function on the data set with at least 20 different combinations of polynomial degree d and cost C. For each combination, compute: i) the 10-fold cross-validation accuracy, and ii) the training accuracy obtained by training over the entire data set. Make a table showing the results, ordering the combinations by increasing complexity. Note: assume (d1, C1) induces a more complex model than (d2, C2) iff either d1 > d2, or C1 < C2. Highlight the combination that resulted in the highest average CV accuracy. Note: the 20 different combinations of d and C should provide a good variation of svm model possibilities. For the best classifier in the table, provide the average distance of the predicted class from the true class. Provide a histogram that shows how often a prediction is m rings away from the true number of rings, for m = 0, 1, 2, …, 29.
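One way the Exercise 1 grid search might be set up in R is sketched below. The file name abalone.data, the column names, and the particular (d, C) grid are assumptions for illustration, not part of the exercise; e1071's svm performs the 10-fold cross-validation itself when cross = 10 is passed.

```r
# Sketch only: assumes the UCI Abalone data has been saved as "abalone.data".
library(e1071)

cols <- c("Sex", "Length", "Diameter", "Height", "WholeWeight",
          "ShuckedWeight", "VisceraWeight", "ShellWeight", "Rings")
abalone <- read.csv("abalone.data", header = FALSE, col.names = cols)
abalone$Rings <- factor(abalone$Rings)   # classification target

# Hypothetical grid; any 20+ (d, C) pairs with good spread would do.
grid <- expand.grid(degree = 2:5, cost = c(0.01, 0.1, 1, 10, 100))

results <- do.call(rbind, lapply(seq_len(nrow(grid)), function(i) {
  d <- grid$degree[i]; C <- grid$cost[i]
  m <- svm(Rings ~ ., data = abalone, kernel = "polynomial",
           degree = d, cost = C, cross = 10)   # built-in 10-fold CV
  train_acc <- mean(predict(m, abalone) == abalone$Rings)
  data.frame(degree = d, cost = C,
             cv_acc = m$tot.accuracy,          # average CV accuracy (%)
             train_acc = 100 * train_acc)
}))

# Order by the complexity rule stated above: larger d, then smaller C.
results <- results[order(results$degree, -results$cost), ]
```

The fitted model object exposes the per-fold accuracies as m$accuracies and their average as m$tot.accuracy, so no manual fold bookkeeping is needed.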
- Consider the following alternative method for classifying the ring count of an abalone. This method uses the following nine binary classifiers: f9 vs 10, f7 vs 89, f5 vs 67, f8 vs 9, f6 vs 7, f1011 vs 12, f1213 vs 14, f10 vs 11, f12 vs 13. For example, f7 vs 89 classifies an abalone data point as either having 7 or fewer rings, or having either 8 or 9 rings. Thus, the training set for this classifier consists of all training points with 9 or fewer rings. As another example, f8 vs 9 classifies an abalone data point as either having 8 rings or 9 rings. Thus, the training set for this classifier consists of all training points with 8 or 9 rings. These binary classifiers are used to form a classification algorithm that behaves in a manner similar to binary search. For example, on input x, we first evaluate f9 vs 10(x). Suppose the output is +1 (i.e., x is classified as having at least 10 rings). Next we evaluate f1011 vs 12(x). Suppose the output is −1 (i.e., x is classified as having 10 or 11 rings). Finally, the algorithm evaluates f10 vs 11(x) and returns either 10 or 11 as the final ring classification. In the case where x is classified as having 5 or fewer rings, the algorithm outputs 5. Similarly, in the case where x is classified as having 14 or more rings, the algorithm outputs 14.
For each of the nine classifiers, use a method similar to that of Exercise 1 to find the best svm model for that classifier. Provide a table with nine rows, where each row reports the best model found for one of the nine classifiers. Each row should include: i) a description of the two classes (e.g. 10 vs 11), ii) the size of the training set, iii) the degree value of the best learning-parameter (BLP) combination, iv) the C value of the BLP combination, v) the average CV accuracy for the BLP combination, and vi) the training accuracy of the final model constructed with the best learning parameters.
- Implement the binary-search learning algorithm described in the previous exercise, and apply it to the entire abalone data set. Report the training accuracy and the average distance of the predicted class from the true class. Provide a histogram that shows how often a prediction is m rings away from the true number of rings, for m = 0, 1, 2, …
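The decision logic of the binary-search algorithm can be sketched as plain R, independent of the svm fitting. Here `clf` is assumed to be a named list of nine functions, each taking a feature row x and returning +1 for the upper class or −1 for the lower class; in the real solution these would wrap predictions from the nine fitted svm models of Exercise 2.

```r
# Sketch of the binary-search ring classifier described above.
predict_rings <- function(x, clf) {
  if (clf$f9_vs_10(x) < 0) {                  # classified as <= 9 rings
    if (clf$f7_vs_89(x) < 0) {                # classified as <= 7 rings
      if (clf$f5_vs_67(x) < 0) return(5)      # <= 5 rings: output 5
      if (clf$f6_vs_7(x) < 0) return(6) else return(7)
    }
    if (clf$f8_vs_9(x) < 0) return(8) else return(9)
  } else {                                    # classified as >= 10 rings
    if (clf$f1011_vs_12(x) < 0) {             # 10 or 11 rings
      if (clf$f10_vs_11(x) < 0) return(10) else return(11)
    }
    if (clf$f1213_vs_14(x) < 0) {             # 12 or 13 rings
      if (clf$f12_vs_13(x) < 0) return(12) else return(13)
    }
    return(14)                                # >= 14 rings: output 14
  }
}
```

At most four of the nine classifiers are evaluated per input, mirroring the binary-search behavior the exercise describes.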
- Use the two-dimensional data in file Exercise-4.csv to build a data frame df. Use R's plot function to visualize the data, and verify that the relationship between x and y appears to be quadratic. Use the e1071 library's svm function (with the options kernel = "polynomial", degree = 2, and type = "eps-regression" held constant) on the data set with at least 20 different combinations of γ and cost C. For each combination, compute: i) the 10-fold cross-validation mean squared error (mse), and ii) the mse over the entire data set. Make a table showing the results, ordering the combinations by increasing complexity. Note: assume (γ1, C1) induces a more complex model than (γ2, C2) iff either γ1 > γ2, or C1 < C2. Highlight the combination that resulted in the lowest average CV mse. Again, the 20 different combinations of γ and C should provide a good variation of svm model possibilities.
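A possible shape for this tuning loop is sketched below. The particular γ/C grid and the column names x and y in Exercise-4.csv are assumptions; for regression with cross = 10, e1071's svm reports the cross-validated error in m$tot.MSE.

```r
# Sketch only: assumes Exercise-4.csv has columns named x and y.
library(e1071)
df <- read.csv("Exercise-4.csv")
plot(df$x, df$y)   # relationship should look quadratic

# Hypothetical grid of 20 (gamma, C) combinations.
grid <- expand.grid(gamma = c(0.01, 0.1, 0.5, 1),
                    cost  = c(0.1, 1, 10, 100, 1000))

results <- do.call(rbind, lapply(seq_len(nrow(grid)), function(i) {
  m <- svm(y ~ x, data = df, kernel = "polynomial", degree = 2,
           type = "eps-regression", gamma = grid$gamma[i],
           cost = grid$cost[i], cross = 10)
  data.frame(gamma = grid$gamma[i], cost = grid$cost[i],
             cv_mse = m$tot.MSE,                       # 10-fold CV mse
             train_mse = mean((predict(m, df) - df$y)^2))
}))
```

The table for the write-up can then be produced by ordering `results` according to the complexity rule stated above.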
- Provide a graph that shows the plotted data points against the curve given by the best svm from the previous exercise. Plot the svm model using 1,000 data points equally spaced between 0 and 10. Make sure the plotted data points and plotted model points are clearly distinguishable.
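One way to produce this graph in base R is sketched below, assuming `best` holds the winning svm model from Exercise 4 and `df` is the data frame built from Exercise-4.csv.

```r
# Sketch of the overlay plot: raw data as points, model as a curve.
xs <- seq(0, 10, length.out = 1000)          # 1,000 equally spaced inputs
pred <- predict(best, data.frame(x = xs))

plot(df$x, df$y, pch = 1, col = "black",
     xlab = "x", ylab = "y")                 # data points
lines(xs, pred, col = "red", lwd = 2)        # svm model curve
legend("topleft", legend = c("data", "svm fit"),
       col = c("black", "red"), pch = c(1, NA), lty = c(NA, 1))
```

Using open circles for the data and a solid colored line for the model keeps the two clearly distinguishable, as the exercise requires.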
- Try different combinations of d, C, and γ to find a good support-vector regression machine for the abalone data set. Report the average distance of the predicted class from the true class. Provide a histogram that shows how often a prediction is m rings away from the true number of rings, for m = 0, 1, 2, …
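Once a regression model has been chosen, the distance report and histogram can be produced as sketched below. Here `model` is assumed to be the fitted eps-regression svm and `abalone` the data frame from Exercise 1, with real-valued predictions rounded to the nearest integer ring count.

```r
# Sketch: distance-from-truth summary for an SVR model on the abalone data.
pred_rings <- round(predict(model, abalone))          # round to integer rings
true_rings <- as.numeric(as.character(abalone$Rings))

m_dist <- abs(pred_rings - true_rings)                # m rings away
mean(m_dist)                                          # average distance

# Frequency of each distance m = 0, 1, 2, ...
hist(m_dist, breaks = seq(-0.5, max(m_dist) + 0.5, by = 1),
     xlab = "m (rings away from true count)", ylab = "frequency",
     main = "Prediction distance histogram")
```

Centering the histogram bins on the integers (via the half-offset breaks) gives one bar per value of m.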