In Part 2, students are provided with a sentiment analysis dataset (IMDb). The dataset
contains positive and negative movie reviews. Training, development and test splits are
provided. Based on this dataset, students will be asked to preprocess the data, select
features and train a machine learning model of their choice to solve this problem. Students
should include at least three different features to train their model; one of them should be
based on some form of word frequency. Students can decide the type of frequency (absolute
or relative, normalized or not) and text preprocessing for this mandatory word frequency
feature. The remaining two (or more) features can be chosen freely. Then, students are
asked to perform feature selection to reduce the dimensionality of all features.
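
As an illustration of the steps above, the following is a minimal sketch that uses only the Python standard library. It assumes a relative word frequency feature, a lexicon-based feature and a review-length feature, followed by chi-square feature selection on feature presence. The tiny lexicons, the function names and the choice of chi-square are assumptions made for this example and are not requirements of the assignment; a library such as scikit-learn could be used instead.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicons, for illustration only (not part of the assignment).
POSITIVE_WORDS = {"good", "great", "excellent", "wonderful"}
NEGATIVE_WORDS = {"bad", "awful", "boring", "terrible"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(text):
    """Return a dict mapping feature names to values for one review."""
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid division by zero on empty reviews
    counts = Counter(tokens)
    # Feature 1 (word frequency): relative frequency of each word.
    features = {f"tf={word}": count / total for word, count in counts.items()}
    # Feature 2 (assumed): share of tokens found in each lexicon.
    features["lex_pos"] = sum(counts[w] for w in POSITIVE_WORDS) / total
    features["lex_neg"] = sum(counts[w] for w in NEGATIVE_WORDS) / total
    # Feature 3 (assumed): log-scaled review length.
    features["log_len"] = math.log(1 + len(tokens))
    return features


def chi2_scores(feature_dicts, labels):
    """Chi-square score of each feature's presence against a 0/1 label."""
    n = len(labels)
    n_pos = sum(labels)
    present = Counter()      # reviews in which the feature is non-zero
    present_pos = Counter()  # positive reviews in which it is non-zero
    for feats, label in zip(feature_dicts, labels):
        for name, value in feats.items():
            if value > 0:
                present[name] += 1
                present_pos[name] += label
    scores = {}
    for name, n_present in present.items():
        a = present_pos[name]      # present, positive
        b = n_present - a          # present, negative
        c = n_pos - a              # absent, positive
        d = (n - n_pos) - b        # absent, negative
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[name] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(feature_dicts, labels, k):
    """Keep only the k features with the highest chi-square score."""
    scores = chi2_scores(feature_dicts, labels)
    keep = set(sorted(scores, key=scores.get, reverse=True)[:k])
    return [{n: v for n, v in f.items() if n in keep} for f in feature_dicts]
```

In this sketch the scores would be computed on the training split only, and the same retained feature names would then be applied to the development and test splits. Note that a presence-based chi-square assigns a score of 0 to any feature that is non-zero in every review (such as log_len here), so dense numeric features may call for a different selection criterion.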
Deliverables for this part are the Python code including all steps and an essay of up to 1200
words. The Python code should include the Python scripts and a small README file with
instructions on how to run the code in Linux. Jupyter notebooks with clear execution paths
are also accepted. The code should take the training set as input and output the results on
the test set. The code accounts for 25% of the marks for this part and the essay for the
remaining 75%. The code should contain all necessary steps described above: to get the full
marks for the code, it should work properly and clearly perform all required steps. The essay
should include:
1) Description of all steps taken in the process (preprocessing, choice of features, feature selection, and training and testing of the model). (25%. The quality of the preprocessing, features and algorithm will not be considered here.)
2) Justification of all steps. Some justifications may be numerical; for these, a development set is included to perform additional experiments. (25%. A reasonable justification is enough to get half of the marks here. Use of the development set is required to get full marks.)
3) Overall performance (precision, recall, F-measure and accuracy) of the trained model on the test set. (10%. Reporting the results, even if very low, is enough to get half of the marks here. A minimum of 65% accuracy is required to get full marks.)
4) Critical reflection on how the deliverable could be improved in the future and on possible biases that the deployed machine learning model may have. (15%. The depth and correctness of insights related to your deliverable will be assessed.)
The essay may include tables and/or figures.
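
The four measures requested in point 3 can all be derived from the confusion counts. The sketch below is a minimal illustration, assuming binary labels where 1 marks a positive review and 0 a negative one; the function name binary_metrics is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1 and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}
```

Precision and recall here refer to the positive class only; reporting them for the negative class as well (or macro-averaging) is a separate choice that would need to be stated in the essay.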
Extra credit (optional, 15% extra marks in the second part): For this second part, students
can get extra credit by writing an essay on one specific task related to Part 2 (except for
option d, see instructions below). The essay must contain a maximum of 500 words
(figures/tables are allowed and encouraged) and must deal with one of the following four
specific topics:
a. Error analysis: Check the types of errors that the system submitted for Part 2 makes
and reflect on possible solutions. Qualitative analysis with specific examples is
encouraged.
b. Literature review: Write an essay about the state of the art of the field (i.e.
automatic sentiment analysis). Retrieve relevant articles and digest them,
connecting them with your proposed solution to the problem in Part 2.
c. Model comparison: Propose and evaluate machine learning systems of a different
nature from the ones taught during the course. Present all results in a table and
analyze the strengths and limitations of the approaches.
d. Code release: Create a GitHub or Bitbucket repository with the data and Python
code used for Part 2, with very clear instructions on how to run the code from the
terminal and about its different functionalities/parameters. Include all necessary
data, provide full documentation and comment the code. Students only need to
include the link to the repository in the PDF.
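
For code that is meant to be run from the terminal, a small command-line interface is one way to document the inputs and parameters. The following is an illustrative skeleton only: the option names, the default value and the file names in the comments are hypothetical and are not prescribed by the assignment.

```python
import argparse


def build_parser():
    """Build the command-line parser (all option names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Train a sentiment classifier and evaluate it on a test set."
    )
    parser.add_argument("--train", required=True,
                        help="path to the training split")
    parser.add_argument("--dev", default=None,
                        help="optional path to the development split")
    parser.add_argument("--test", required=True,
                        help="path to the test split")
    parser.add_argument("--top-k", type=int, default=1000,
                        help="number of features kept by feature selection")
    return parser


def main(argv=None):
    """Parse the arguments; training and evaluation would be called from here."""
    args = build_parser().parse_args(argv)
    print(f"train={args.train} dev={args.dev} test={args.test} top_k={args.top_k}")
    return args
```

Calling main() under an `if __name__ == "__main__":` guard would make the script runnable directly, and argparse then generates a --help message that can be referenced from the README.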