[SOLVED] Assignment: Spark parallel algorithm for discovering association rules on the MovieLens dataset

Consider the MovieLens dataset: https://www.kaggle.com/shubhammehta21/movie-lens-small-latest-dataset/. Here we only consider the "ratings.csv" file, which has 100,836 rows (ignoring the header). We are only concerned with the first two columns: userId and movieId. Your task is to implement a Spark algorithm, assoc.py, for discovering association rules of the form I→j, where I is an itemset and j is a single item (similar to what the textbook discusses), from the dataset. Note that items here are movies and users are baskets.
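Since users are baskets, the first step is to group each user's movieIds into one set. A minimal sketch of that basket construction (the helper name `build_baskets` and the inline sample rows are illustrative, not part of the assignment; the column layout assumes the standard ratings.csv header `userId,movieId,rating,timestamp`):

```python
import csv
import io

def build_baskets(csv_text):
    """Group movieIds by userId so each user becomes one basket of movies."""
    baskets = {}
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    for row in reader:
        user_id, movie_id = row[0], row[1]  # ignore rating and timestamp
        baskets.setdefault(user_id, set()).add(movie_id)
    return baskets

# Tiny illustrative sample in the ratings.csv format.
sample = (
    "userId,movieId,rating,timestamp\n"
    "1,31,2.5,1260759144\n"
    "1,1029,3.0,1260759179\n"
    "2,31,4.0,835355493\n"
)
baskets = build_baskets(sample)  # {'1': {'31', '1029'}, '2': {'31'}}
```

In the actual Spark job this grouping would typically be expressed on an RDD (e.g., map each line to a `(userId, movieId)` pair, then group by key), but the per-row logic is the same.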
Requirements:
• Your algorithm should first discover frequent itemsets with the specified threshold for support count.
• The discovery of frequent itemsets should be done in parallel by following the SON algorithm, using mapPartitions() to process each chunk/partition of the data with an in-memory Apriori algorithm.
• Make the chunk size small enough that each chunk can be loaded entirely into memory.
• As intermediate results, your algorithm should also output the discovered frequent itemsets (i.e., movies frequently watched by many users).
• The discovery of association rules should be done in parallel, based on the discovered frequent itemsets. Note that we assume the support count of I ∪ {j} is at least the support threshold.
• The confidence of the discovered association rules should meet or exceed the specified threshold.
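The core of the requirements above is the local Apriori pass that SON runs inside each mapPartitions() call (with the support threshold scaled by the chunk's fraction of the data), followed by rule generation with a confidence filter. A sketch of those two pieces in plain Python, under the assumption that baskets are sets of movieIds; the function names and structure are illustrative, not a prescribed solution:

```python
from collections import Counter
from itertools import combinations

def apriori(baskets, support):
    """In-memory Apriori over one chunk of baskets; returns frequent frozensets.
    In SON pass 1 this would run inside rdd.mapPartitions(), with `support`
    scaled down to match the fraction of baskets in the partition."""
    counts = Counter(item for b in baskets for item in b)
    frequent = {frozenset([i]) for i, c in counts.items() if c >= support}
    all_frequent = set(frequent)
    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets to form size-k candidates.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        counts = Counter(c for b in baskets for c in candidates if c <= b)
        frequent = {c for c, n in counts.items() if n >= support}
        all_frequent |= frequent
        k += 1
    return all_frequent

def rules(baskets, frequent, min_conf):
    """Emit rules I -> j with confidence = supp(I ∪ {j}) / supp(I) >= min_conf.
    Since supp(I ∪ {j}) >= the support threshold, both itemsets are frequent."""
    supp = {s: sum(1 for b in baskets if s <= b) for s in frequent}
    out = []
    for s in frequent:
        if len(s) < 2:
            continue
        for j in s:
            left = s - {j}
            conf = supp[s] / supp[left]
            if conf >= min_conf:
                out.append((tuple(sorted(left)), j, conf))
    return out
```

In SON pass 2, the union of all per-partition candidates would be broadcast and counted exactly over the full dataset, keeping only itemsets whose global count meets the support threshold; rule generation can then be parallelized over the frequent itemsets.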
Execution format:
spark-submit assoc.py ratings.csv <support_threshold> <confidence_threshold>
where the support threshold is an integer (a support count) and the confidence threshold is a value between 0 and 1.
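The two thresholds arrive as command-line arguments to assoc.py. A small sketch of validating them; the argument order and the helper name `parse_args` are assumptions, since the assignment does not pin down the exact invocation beyond naming the two thresholds:

```python
import sys

def parse_args(argv):
    """Parse: assoc.py <ratings.csv> <support_count> <confidence>."""
    path = argv[1]
    support = int(argv[2])       # integer support count
    confidence = float(argv[3])  # must lie in [0, 1]
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence threshold must be between 0 and 1")
    return path, support, confidence

# Inside assoc.py: path, support, confidence = parse_args(sys.argv)
```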
