Competition
This homework is held on Kaggle as a competition so that you can see how a competition works.
- Click the link to participate.
- The competition provides a training set and a testing set.
- training set train.json
- testing set test.json
- Since it's a competition, you won't know the answers for the testing set; those are for you to predict and submit.
- The standard procedure of a competition:
- Understand the data
- Split the provided training set into a training subset and a validation set so you can validate your models.
- Preprocessing, model construction, tuning
- Retrain the best model with as much data as possible, predict the testing set, and make a submission.
- Win the competition
- If you have any questions, post them in the Discussion section or on Discord so everyone can see and understand.
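The split step in the procedure above could be sketched as follows. This is only an illustration: the helper name `holdout_split`, the `val_fraction=0.2` default, and the fixed seed are assumptions, not values given by the assignment, so substitute the ratio the assignment specifies.

```python
import random


def holdout_split(records, val_fraction=0.2, seed=0):
    """Shuffle a copy of `records` and split it into (training subset, validation set).

    `val_fraction` and `seed` are hypothetical defaults chosen for this sketch.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # seeded shuffle keeps the split reproducible
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Toy usage with placeholder records standing in for the contents of train.json.
toy_records = [{"id": i} for i in range(10)]
train_subset, val_set = holdout_split(toy_records)
print(len(train_subset), len(val_set))
```

Once a model has been chosen on the validation set, the same records can be passed to training unsplit for the final retraining step.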
Objective
- Data Input 5%
- Download the training set and testing set from Kaggle.
- Data Preprocessing 15%
- Transform the format and shape of the data so that your model can process it.
- Shuffle the data.
- Any data augmentation that can boost your final results. 10%
- Model Construction 50%
- Support Vector Machine 20%
- For the SVM model, you may want to try out different types of kernels and compare the results.
- Artificial Neural Networks 30%
- For the ANN model, you can use any neural-network-based model you want, but you must implement it yourself.
- Every framework (such as TensorFlow or PyTorch) is allowed.
- Explain the reasoning behind your model choice, data augmentation, and training process.
- Validation method
- Holdout validation with the ratio
- Confusion matrix
- Accuracy
- Sensitivity (Recall)
- Precision
- Comparison & Conclusion 10%
- Also include any feedback, or anything else you want to tell me.
- Kaggle Submission 10% (+30%)
- After the validation, now you have working SVM and ANN models.
- Retrain one of your best models with the whole train.json, predict test.json, and submit your y_test.csv to Kaggle.
- You can check sample_submission.csv for the submission format.
- Take a screenshot of the Leaderboard, highlight your name, and put it in the report.
- The top 10 on the final Private Leaderboard get 30 bonus points.
Note that you still need to submit your report and code to the newE3 system.
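The validation metrics listed under Objective (confusion matrix, accuracy, sensitivity/recall, precision) can be computed directly from true and predicted labels. The following is a minimal sketch using only the standard library; the function names and the toy labels are made up for illustration, and libraries such as scikit-learn offer equivalent, more complete utilities.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return Counter(zip(y_true, y_pred))


def accuracy(cm):
    """Fraction of all samples whose predicted label equals the true label."""
    total = sum(cm.values())
    correct = sum(n for (t, p), n in cm.items() if t == p)
    return correct / total if total else 0.0


def recall(cm, label):
    """Sensitivity: of the samples that truly are `label`, the fraction predicted as `label`."""
    actual = sum(n for (t, _), n in cm.items() if t == label)
    return cm[(label, label)] / actual if actual else 0.0


def precision(cm, label):
    """Of the samples predicted as `label`, the fraction that truly are `label`."""
    predicted = sum(n for (_, p), n in cm.items() if p == label)
    return cm[(label, label)] / predicted if predicted else 0.0


# Toy labels, invented for this example only.
y_true = ["indian", "indian", "italian", "italian", "thai"]
y_pred = ["indian", "italian", "italian", "italian", "indian"]
cm = confusion_matrix(y_true, y_pred)
for cuisine in sorted(set(y_true)):
    print(cuisine, recall(cm, cuisine), precision(cm, cuisine))
print("accuracy", accuracy(cm))
```

Because the task has more than two classes, recall and precision are reported per cuisine here; how to average them across classes is a choice left to your report.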
Data Recipe Ingredients Dataset
- The objective of the competition is to predict the category of a dish's cuisine given a list of its ingredients.
- In the dataset, we include the recipe id, the type of cuisine, and the list of ingredients of each recipe (of variable length). The data is stored in JSON format.
- An example of a recipe node in train.json:
- { "id": 24717, "cuisine": "indian", "ingredients": [ "tumeric", "vegetable stock", "tomatoes", "garam masala", "naan", "red lentils", "red chili peppers", "onions", "spinach", "sweet potatoes" ] },
- In the test file test.json, each recipe has the same format as in train.json, except that the cuisine type is removed, since it is the target variable you are going to predict.
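Since each recipe carries a variable-length ingredient list, one common way to give a model fixed-length input is a multi-hot vector over an ingredient vocabulary. The sketch below assumes the field names shown in the example above (`id`, `cuisine`, `ingredients`); the second recipe in the sample string is invented purely for illustration, and the helper names `build_vocabulary` and `encode` are hypothetical. Other encodings, such as TF-IDF, are equally valid choices.

```python
import json

# First record is the example from the handout; the second is made-up toy data.
SAMPLE_JSON = """
[
  {"id": 24717, "cuisine": "indian",
   "ingredients": ["tumeric", "vegetable stock", "tomatoes", "garam masala", "naan",
                   "red lentils", "red chili peppers", "onions", "spinach", "sweet potatoes"]},
  {"id": 0, "cuisine": "italian",
   "ingredients": ["tomatoes", "onions", "basil"]}
]
"""


def build_vocabulary(recipes):
    """Map every distinct ingredient to a column index, in sorted order."""
    ingredients = {ing for recipe in recipes for ing in recipe["ingredients"]}
    return {ing: idx for idx, ing in enumerate(sorted(ingredients))}


def encode(recipe, vocab):
    """Return a 0/1 list with a 1 in the column of each known ingredient."""
    vector = [0] * len(vocab)
    for ing in recipe["ingredients"]:
        idx = vocab.get(ing)
        if idx is not None:  # ingredients unseen during training are skipped
            vector[idx] = 1
    return vector


recipes = json.loads(SAMPLE_JSON)  # for the real file: json.load(open("train.json"))
vocab = build_vocabulary(recipes)
features = [encode(recipe, vocab) for recipe in recipes]
labels = [recipe["cuisine"] for recipe in recipes]
print(len(vocab), labels)
```

The vocabulary should be built from the training data only and reused unchanged when encoding test.json, so that both sets share the same columns.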
