[SOLVED] algorithm python students are provided with a sentiment analysis dataset (IMDb). The dataset contains positive and negative movie reviews. Training, development and test splits are provided. Based on this dataset, students will be asked to preprocess the data, select features and train a machine learning model of their choice to solve this problem. Students should include at least three different features to train their model, one of them should be based on some sort of word frequency. Students can decide the type of frequency (absolute or relative, normalized or not) and text preprocessing for this mandatory word frequency feature. The remaining two (or more) features can be chosen freely. Then, students are asked to perform feature selection to reduce the dimensionality of all features.


Students are provided with a sentiment analysis dataset (IMDb) containing positive and negative movie reviews. Training, development and test splits are provided. Based on this dataset, students are asked to preprocess the data, select features and train a machine learning model of their choice to classify reviews as positive or negative. Students should include at least three different features to train their model, one of which must be based on some form of word frequency. Students can decide the type of frequency (absolute or relative, normalized or not) and the text preprocessing used for this mandatory word frequency feature. The remaining two (or more) features can be chosen freely. Finally, students are asked to perform feature selection to reduce the dimensionality of all features.
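To make the feature requirements concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a reference solution: the function names (`tokenize`, `extract_features`, `chi2_scores`, `select_top_k`) are hypothetical, the two extra features (review length and exclamation-mark ratio) are arbitrary choices, and chi-square scoring on feature presence is only one possible selection criterion that the brief does not prescribe.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def extract_features(text):
    """Return a feature dict for one review (illustrative feature choices).

    'tf:<word>'     relative frequency of each word (the mandatory feature)
    'n_tokens'      review length in tokens
    'exclaim_ratio' exclamation marks per token
    """
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid division by zero on empty reviews
    features = {f"tf:{w}": c / total for w, c in Counter(tokens).items()}
    features["n_tokens"] = float(len(tokens))
    features["exclaim_ratio"] = text.count("!") / total
    return features


def chi2_scores(feature_dicts, labels):
    """Chi-square score of each feature's presence against a 0/1 label."""
    n = len(labels)
    n_pos = sum(labels)
    present = Counter()      # documents where the feature is non-zero
    present_pos = Counter()  # positive documents where it is non-zero
    for feats, y in zip(feature_dicts, labels):
        for name, value in feats.items():
            if value > 0:
                present[name] += 1
                present_pos[name] += y
    scores = {}
    for name, n_f in present.items():
        a = present_pos[name]  # present, positive
        b = n_f - a            # present, negative
        c = n_pos - a          # absent, positive
        d = n - n_f - c        # absent, negative
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # Features present in every document carry no signal here: score 0.
        scores[name] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(feature_dicts, labels, k):
    """Return the names of the k highest-scoring features."""
    scores = chi2_scores(feature_dicts, labels)
    return set(sorted(scores, key=scores.get, reverse=True)[:k])
```

A typical use would be to call `extract_features` on every training review, pass the resulting dicts and labels to `select_top_k`, and then keep only the selected names in the training, development and test feature dicts. Note that presence-based scoring ignores features that are non-zero in every review (such as `n_tokens`), so a real solution would need a criterion that also handles continuous features.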

Deliverables for this part are the Python code covering all steps and an essay of up to 1200 words. The Python code should include the Python scripts and a README file with instructions on how to run the code on Linux. Jupyter notebooks with clear execution paths are also accepted. The code should take the training set as input and output the results on the test set. The code accounts for 25% of the marks for this part and the essay for the remaining 75%. The code should contain all the steps described above: to get full marks for the code, it should work properly and clearly perform all required steps. The essay should include:

Description of all steps taken in the process (preprocessing, choice of features, feature selection, and training and testing of the model). (25%. The quality of the preprocessing, features and algorithm will not be considered here.)

Justification of all steps. Some justifications may be numerical; for that purpose, a development set is included to perform additional experiments. (25%. A reasonably argued justification is enough to get half of the marks here. Use of the development set is required to get full marks.)

Overall performance (precision, recall, F-measure and accuracy) of the trained model on the test set. (10%. Reporting the results, even if very low, is enough to get half of the marks here. A minimum of 65% accuracy is required to get full marks.)

Critical reflection on how the deliverable could be improved in the future and on possible biases that the deployed machine learning model may have. (15%. The depth and correctness of insights related to your deliverable will be assessed.)
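The numerical justification and performance items above can be sketched as follows. This is a hedged example under assumptions the brief does not state: labels are encoded as 0 (negative) and 1 (positive), the model is a small multinomial Naive Bayes on whitespace tokens chosen only for illustration, and the smoothing strength `alpha` stands in for whatever setting a student might tune on the development set. All function names are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(texts, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model; alpha (> 0) is the smoothing."""
    word_counts = {0: Counter(), 1: Counter()}
    for text, y in zip(texts, labels):
        word_counts[y].update(text.lower().split())
    return {
        "word_counts": word_counts,
        "class_counts": Counter(labels),
        "vocab": set(word_counts[0]) | set(word_counts[1]),
        "alpha": alpha,
    }


def predict(model, text):
    """Return the label (0 or 1) with the higher log-posterior."""
    n_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    scores = {}
    for y in (0, 1):
        if model["class_counts"][y] == 0:
            continue  # class never seen in training
        counts = model["word_counts"][y]
        total = sum(counts.values())
        score = math.log(model["class_counts"][y] / n_docs)
        for word in text.lower().split():
            if word in model["vocab"]:  # skip words unseen in training
                score += math.log(
                    (counts[word] + alpha) / (total + alpha * vocab_size)
                )
        scores[y] = score
    return max(scores, key=scores.get)


def evaluate(gold, predicted, positive=1):
    """Compute precision, recall, F-measure (F1) and accuracy."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


def choose_alpha_on_dev(train_texts, train_labels,
                        dev_texts, dev_labels, candidates):
    """Pick the candidate alpha with the best development-set accuracy."""
    best = None
    for alpha in candidates:
        model = train_naive_bayes(train_texts, train_labels, alpha)
        preds = [predict(model, t) for t in dev_texts]
        acc = evaluate(dev_labels, preds)["accuracy"]
        if best is None or acc > best[1]:
            best = (alpha, acc)
    return best  # (alpha, dev accuracy)
```

The intended order is to tune on the development set with `choose_alpha_on_dev`, retrain with the chosen value, and only then call `evaluate` once on the test-set predictions, so that the reported test figures are not influenced by the tuning.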
