Topic: Using PCA (Principal Components Analysis) and kNN (k-Nearest Neighbors) in RStudio to Read Handwritten Digits


Individual Project: PCA + kNN Approach in Supervised Handwritten Digit Recognition
Data: A file, project.RData, is available. It contains the following three R objects:

trainD : A 2000 x 784 matrix with each row storing the 28 x 28 pixels of a handwritten digit. Hereafter, we call a matrix with this structure a digit data matrix.

TDigit : A vector of length 2000 whose i-th element is the digit corresponding to the i-th row of trainD. Hereafter, we call a vector with this structure a digit vector.

printDigit(v,d=NA) : A function to print a digit image. Argument v is a vector of length 784 for a handwritten digit, and d is the digit for the vector v (default is NA). For example, printDigit(trainD[3,],TDigit[3]) displays the digit image in the 3rd row of trainD.
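To make the two structures concrete, here is a minimal sketch that builds a synthetic digit data matrix and digit vector and reshapes one row into a 28 x 28 grid. The names toyD, toyDigit and img are hypothetical, the pixel values are random stand-ins in an assumed 0 to 255 range, and the row-by-row pixel order is an assumption; the real layout in project.RData may differ.

```r
# Synthetic stand-in for trainD / TDigit (hypothetical data, not from project.RData).
set.seed(1)
n <- 5                                   # number of example "digits"
toyD <- matrix(sample(0:255, n * 784, replace = TRUE), nrow = n, ncol = 784)
toyDigit <- c(3, 1, 4, 1, 5)             # made-up labels, one per row

# Each row holds one image; reshape row 3 into a 28 x 28 grid.
# ASSUMPTION: pixels are stored row by row; the real layout may differ.
img <- matrix(toyD[3, ], nrow = 28, ncol = 28, byrow = TRUE)

dim(toyD)          # 5 784
length(toyDigit)   # 5
dim(img)           # 28 28
```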

(To submit: one PDF report and one R file; the R file contains two functions)
(a) A PDF report: no more than five A4 pages, excluding appendices. It is recommended to put tables, figures, and program listings in appendices.
(b) One R file containing two functions, "Prepare" and "Classify", as specified below (sample "Prepare" and "Classify" functions are given in Appendix C). No global variables may be used in the functions.

(i) Prepare(trainData,DigitV)
Input: (i) trainData: a digit data matrix; (ii) DigitV: the corresponding digit vector
Output: A list containing all necessary information to be used in the "Classify" function.

(ii) Classify(QueryData,OutPre)
Input: (i) QueryData: a digit data matrix to be classified; (ii) OutPre: the output of the "Prepare" function.
Output: A vector containing the estimated digits of the query data in QueryData.
Each estimated digit is one of the following digits (-1, 0, 1, 2, …, 9), with the digit "-1" meaning "unknown and to be classified manually".
Restrictions on Methods:
1. Principal components analysis: (a) Free to perform any kind of transformation before principal components analysis; (b) Must use function "prcomp" for principal components analysis; (c) Free to determine the number of principal components chosen; (d) Can use "prcomp" a number of times.

2. Classifier: (a) You can only use the simplest form of the k-nearest neighbor algorithm (kNN) to classify handwritten digits, where k can be any positive integer (the classifier used in Section 3.8.2 is a 1-nearest neighbor classifier; see Appendix A for a brief introduction to kNN); (b) The input of the kNN algorithm can be the principal components or any transformed form of the principal components; (c) You can use function "knn" in the "class" package for k-nearest neighbor classification.

3. Cross-validation: (a) You are recommended to use cross-validation to assess performance of several candidate classifiers and choose the best one as your final method; (b) You can use the simplest form of cross-validation which is used in Section 3.8.2, or use k-fold cross-validation (see Appendix B for a brief introduction; see Appendix D for a sample program).
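As an illustration of how the permitted tools fit together, the following sketch runs prcomp on hypothetical two-class data, projects query rows onto the retained components, and classifies them with knn from the class package. The data, the choice of 5 components, and k = 3 are arbitrary assumptions for the example, not recommended settings, and the sketch assumes the class package is installed.

```r
library(class)   # provides knn(); ASSUMPTION: the package is installed

set.seed(7)
# Hypothetical two-class data: two Gaussian clouds in 20 dimensions.
trainX <- rbind(matrix(rnorm(50 * 20, mean = 0), nrow = 50),
                matrix(rnorm(50 * 20, mean = 3), nrow = 50))
trainY <- rep(c(0, 1), each = 50)
queryX <- rbind(rnorm(20, mean = 0), rnorm(20, mean = 3))

pc <- prcomp(trainX)                 # centers columns by default, no scaling
p <- 5                               # number of components kept (arbitrary)
trainScores <- pc$x[, 1:p]

# Project queries: subtract the training mean, then multiply by the loadings.
queryScores <- sweep(queryX, 2, pc$center) %*% pc$rotation[, 1:p]

# The same projection via predict(), as a consistency check.
viaPredict <- predict(pc, newdata = queryX)[, 1:p, drop = FALSE]
all.equal(unname(queryScores), unname(viaPredict))

# knn() returns a factor; convert it back to numeric labels.
pred <- knn(train = trainScores, test = queryScores, cl = factor(trainY), k = 3)
est <- as.numeric(as.character(pred))
est
```

Storing only pc$center and the first p columns of pc$rotation is enough to project new data, which is why the manual projection is shown alongside predict().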

Appendix A: k-NN algorithm:
Step 1: Select a positive integer k.
Step 2: Find the k nearest neighbors of the query point.
Step 3: Find the categories of the k neighbors. Assign the query point to the category held by the majority of its k neighbors.
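The three steps can be sketched in base R as follows. This is a minimal illustration, assuming squared Euclidean distance and a simple tie-breaking rule (the first label in table order wins); the function name knnOne and the toy data are hypothetical.

```r
# Classify one query vector by majority vote among its k nearest training rows.
knnOne <- function(train, labels, query, k) {
  d2 <- rowSums(sweep(train, 2, query)^2)      # squared Euclidean distances
  nearest <- order(d2)[1:k]                    # Step 2: indices of the k nearest rows
  votes <- table(labels[nearest])              # Step 3: count categories
  # ASSUMPTION: ties go to the first label in table order (the smallest one).
  as.numeric(names(votes)[which.max(votes)])
}

train <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(6, 5))
labels <- c(0, 0, 0, 1, 1, 1)
knnOne(train, labels, query = c(0.5, 0.5), k = 3)   # 0
knnOne(train, labels, query = c(5.5, 5.5), k = 3)   # 1
```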

Appendix B: k-fold cross-validation:
Step 1: Divide the available dataset of size n randomly into k roughly equal groups, say A1, …, Ak.
Step 2: For i = 1, …, k, do {
Use Ai as test data and combine the remaining k - 1 groups to form the training data. Use the training data to build a classifier. Apply the classifier to the test data. Compute ai, the number of correct classifications. }
Step 3: The estimated correct classification rate is (a1 + a2 + … + ak)/n.
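A small sketch of these steps, using hypothetical one-dimensional data and a 1-nearest-neighbor rule as the stand-in classifier: the per-fold counts ai are summed and divided by n to give the estimated rate. All names and data here are illustrative assumptions.

```r
set.seed(3)
n <- 60; k <- 5
x <- c(rnorm(30, 0), rnorm(30, 10))      # hypothetical data, two well-separated classes
y <- rep(c(0, 1), each = 30)

# Step 1: random fold labels of roughly equal size.
fold <- sample(rep(1:k, length.out = n))

# Step 2: for each fold, train on the rest and count correct classifications.
a <- numeric(k)
for (i in 1:k) {
  trX <- x[fold != i]; trY <- y[fold != i]
  teX <- x[fold == i]; teY <- y[fold == i]
  # 1-nearest neighbor on a single feature
  est <- sapply(teX, function(q) trY[which.min(abs(trX - q))])
  a[i] <- sum(est == teY)
}

# Step 3: estimated correct classification rate.
rate <- sum(a) / n
rate
```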

Appendix C: Sample "Prepare" and "Classify" functions

Prepare <- function(trainData,DigitV) {
  # If needed, enter library command(s) here.
  d <- prcomp(trainData)
  # Keep the mean, the first 30 loadings and scores, the labels, and a distance threshold.
  list(mu=d$center,u=d$rotation[,1:30],y=d$x[,1:30],Digit=DigitV,epsilon=25e5)
}

Classify <- function(QueryData,OutPre) {
  # If needed, enter library command(s) here.
  m <- dim(QueryData)[1]
  r <- numeric(m)
  for (i in 1:m) {
    # Project the i-th query onto the retained principal components.
    w <- t(OutPre$u)%*%(QueryData[i,]-OutPre$mu)
    minD <- Inf
    for (j in 1:(dim(OutPre$y)[1])) {
      dist <- sum((w-OutPre$y[j,])^2)
      if (dist < minD) { r[i] <- OutPre$Digit[j]; minD <- dist }
    }
    # Too far from every training digit: mark as unknown.
    if (minD > OutPre$epsilon) r[i] <- -1
  }
  r
}

Appendix D: Sample k-fold cross-validation program

CValidate <- function(dataSet,TDigit,k) {
  # Perform k-fold cross-validation
  # for the provided "Prepare" and "Classify" functions.
  n <- dim(dataSet)[1]
  b <- sample(rep(1:k,length=n))
  TrueDigit <- EstDigit <- NULL
  for (i in 1:k) {
    train <- dataSet[b!=i,] # training data
    test <- dataSet[b==i,] # test data
    v <- Prepare(train,TDigit[b!=i])
    r <- Classify(test,v)
    TrueDigit <- c(TrueDigit,TDigit[b==i]); EstDigit <- c(EstDigit,r)
  }
  print(table(`True digit`=TrueDigit,`Estimated digit`=EstDigit))
}

Assessment Scheme: The performance of the "Prepare" and "Classify" functions will be evaluated using 1000 test images. The grade is determined by the following four factors:
(1) Correct classification rate for 1000 query data (40%):
Rate = [(number of correctly classified digits) + 0.5 x (number of unknown digits)]/1000.
Fraction of mark obtained is max([(r - 0.9)/(MaxR - 0.9)] x 40%, 0), where r is the rate of the provided classifier and MaxR is the best rate in the whole class.
(2) Economy in storage (30%):
Storage used is the size of the output of "Prepare". Fraction of mark is max([(120000 - s)/(120000 - MinS)] x 30%, 0), where s is the storage used by the provided classifier and MinS is the minimum storage used in the whole class.
(3) Elegance of method (20%)
(4) Report writing (10%)
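The scoring formulas in the assessment scheme can be checked with a few lines of R. This is a sketch only: the function names scoreRate, accuracyMark and storageMark are hypothetical, the example values are made up, and how the storage s is measured is not specified in the text above.

```r
# Rate = (correct + 0.5 * unknown) / total, where -1 marks "unknown".
scoreRate <- function(est, truth) {
  (sum(est == truth) + 0.5 * sum(est == -1)) / length(truth)
}

# Fraction of the 40% accuracy mark: max((r - 0.9)/(MaxR - 0.9) * 40%, 0).
accuracyMark <- function(r, maxR) max((r - 0.9) / (maxR - 0.9) * 0.40, 0)

# Fraction of the 30% storage mark: max((120000 - s)/(120000 - MinS) * 30%, 0).
storageMark <- function(s, minS) max((120000 - s) / (120000 - minS) * 0.30, 0)

truth <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)
est   <- c(3, 1, 4, 7, 5, -1, 2, 6, -1, 3)   # hypothetical output of Classify
scoreRate(est, truth)                         # (7 + 0.5 * 2) / 10 = 0.8
accuracyMark(r = 0.95, maxR = 0.98)           # 0.05 / 0.08 * 0.4 = 0.25
storageMark(s = 60000, minS = 20000)          # 0.6 * 0.3 = 0.18
```

Under this rate, returning -1 earns half credit, so it only pays off for queries where the classifier would otherwise be right less than half the time.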
