EE5434 final project
The data were released on Nov. 5 (see the Kaggle website).
Report and source code due: 11:59 PM, Dec. 6
Full mark: 100 pts.
Throughout the project, you can keep trying new machine learning models to boost your classification accuracy.
You are encouraged to form groups of two with your classmates so that the team can implement multiple learning models and compare their performance. If you cannot find a partner, please post a message on the group discussion board and briefly introduce your expertise. If you prefer to do this project by yourself, you will get 5 bonus points.
Submission format: The report should be in PDF format. The source code should be submitted as a notebook file (.ipynb) and also saved as an HTML file (.html). Thus, there are three files to upload to Canvas. Remember that you must not copy anyone's code; doing so can lead to failure of this course.
Files and naming rules: If your team has two members, start the file names with G2; otherwise, start with G1. For example, if your team members are Jackie Lee and Xuantian Chan, name the files G2-Lee-Chan.xxx. 5 pts will be deducted if the naming rule is not followed. In your report, please clearly list the group members.
How do we grade your report? We will consider the following factors.
1. You will get 30% (the basic grade) if you correctly applied two learning models to our classification problem, the accuracy is much better than random guessing, and your report is written in generally correct English and is easy to follow. Your report should include a clear explanation of your implementation details and a basic analysis of the results.
2. Factors in grading:
a. Applied/implemented and compared at least 2 different models, showing good sense in choosing appropriate models (such as NLP-related models). A minimal sketch of what this looks like is given after the grading notes below.
b. For each model, a clear explanation of the feature encoding methods, model structure, etc. Carefully tuned multiple sets of parameters or feature-engineering methods, and provided evidence of multiple attempts to boost the performance.
c. Considered performance metrics beyond accuracy (such as the confusion matrix, recall, ROC, etc.), carefully compared the performance of different methods/models/parameter sets, and presented the results using the most insightful means, such as tables and figures.
d. A well-written report that is easy to read and follow.
e. Final ranking on Kaggle.
For each of the factors a to e, we assign unsatisfactory (1), acceptable (2), satisfactory (3), good (4), or excellent (5). The sum over all factors determines the grade. For example, if student A receives 4 goods and 1 acceptable for a to e, A's total score is 4*4+2=18. The full mark for a to e is 25, so A's percentage is 72%.
Note that if the final performance is very close (e.g., 0.65 vs. 0.66), the corresponding submissions are placed in the same group in the ranking.
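To make factors (a) and (c) concrete, here is a minimal sketch of comparing two models on a text-classification task and reporting metrics beyond accuracy. The file name train.csv, the columns text/label, and the choice of TF-IDF features with logistic regression and naive Bayes are illustrative assumptions, not a prescribed solution:

```python
# Minimal sketch: compare two models and report metrics beyond accuracy.
# "train.csv" and the columns "text"/"label" are hypothetical names;
# TF-IDF features and these two models are illustrative choices only.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

df = pd.read_csv("train.csv")  # hypothetical file name
X_train, X_val, y_train, y_val = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

vec = TfidfVectorizer(max_features=50000, ngram_range=(1, 2))
Xtr = vec.fit_transform(X_train)
Xva = vec.transform(X_val)

for model in (LogisticRegression(max_iter=1000), MultinomialNB()):
    model.fit(Xtr, y_train)
    pred = model.predict(Xva)
    print(type(model).__name__, "accuracy:", accuracy_score(y_val, pred))
    print(confusion_matrix(y_val, pred))       # per-class error structure
    print(classification_report(y_val, pred))  # precision/recall/F1 per class
    # For ROC curves, use model.predict_proba with sklearn.metrics.roc_curve.
```

A report that summarizes this kind of comparison in a table, rather than pasting raw output, addresses factors (c) and (d) at the same time.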
Factors that can increase your grade:
1. You used a new learning model or feature-engineering method that was not taught in class. This requires some reading and a clear explanation of why you think the model fits this problem.
2. Your model's performance is much better than others' because of a new or optimized method.
The format of the report
1. There is no page limit for the report. If you don't have much to report, keep it simple. Also, minimize language issues by proofreading.
2. To make our grading more standard, please use the following sections:
a. Abstract. Summarize the report (what you did, what methods you used, and your conclusions). (less than 300 words)
b. Data properties (exploratory data analysis). You should describe your understanding/analysis of the data properties.
c. Methods/models. In this section, describe your implemented models and provide the key parameters. For example, what are the features? If you use kNN, what is k and how is the distance computed? If you use an ANN, what is the architecture? You should separate the high-level description of the models from the tuning of hyper-parameters (a tuning sketch is given at the end of this section).
d. Experimental results. In this section, compare and summarize the results using appropriate tables/figures. Simply pasting screenshots is acceptable but will certainly lead to a low mark. Instead, you should *summarize* your results. You can also compare the performance of your model under different hyper-parameters.
e. Conclusion and discussion. Discuss why your models perform well or poorly.
f. Future work. Discuss what you could do if more time is given.
3. For each model you tried, provide the code of the version with the best performance. In your report, you can detail the performance of this model under different parameters.
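As a minimal sketch of keeping the model description separate from the tuning, the snippet below searches over k and the distance metric for kNN. The synthetic data and the parameter grid are illustrative assumptions only; report the values you actually searched:

```python
# Minimal sketch of hyper-parameter tuning for kNN, kept separate from the
# high-level model description. The synthetic dataset and the grid below
# are illustrative assumptions, not a prescribed search space.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {
    "n_neighbors": [3, 5, 11, 21],        # the "k" you must report
    "metric": ["euclidean", "cosine"],    # how the distance is computed
}
search = GridSearchCV(KNeighborsClassifier(), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X, y)
print("best parameters:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```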
The code
The code should include the following parts (a skeleton sketch follows this list):
1. Preprocessing of the data
2. Construction of the model
3. Training
4. Validation
5. Testing
6. Any other necessary code
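Below is a rough skeleton of how these parts might be laid out in the notebook. The file names, column names, feature encoder, and model are hypothetical placeholders, not the required solution:

```python
# Skeleton of the required notebook structure. File names, column names,
# the feature encoder, and the model are hypothetical placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# 1. Preprocessing of the data
train = pd.read_csv("train.csv")      # hypothetical file name
test = pd.read_csv("test.csv")
vec = TfidfVectorizer()
X = vec.fit_transform(train["text"])  # hypothetical column names
y = train["label"]

# 2. Construction of the model
model = LogisticRegression(max_iter=1000)

# 3./4. Training and validation (5-fold cross-validation)
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
model.fit(X, y)

# 5. Testing: predict on the test set and write a Kaggle submission file
test_pred = model.predict(vec.transform(test["text"]))
pd.DataFrame({"id": test["id"], "label": test_pred}) \
    .to_csv("submission.csv", index=False)
```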
Use this link to join the competition:
https://www.kaggle.com/t/79178536956041b8acb64b6268afb4de