[SOLVED] Assignment: Spark parallel algorithm for discovering association rules on the MovieLens dataset

Consider the MovieLens dataset: https://www.kaggle.com/shubhammehta21/movie-lens-small-latest-dataset/. Here we only consider the "ratings.csv" file, which has 100,836 rows (ignoring the header). We are only concerned with the first two columns: userId and movieId. Your task is to implement a Spark algorithm, assoc.py, for discovering association rules of the form I→j, where I is an itemset and j is a single item (similar to what the textbook discusses), from the dataset. Note that items here are movies and users are baskets.
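Since users are baskets, the first step is to group each user's movieIds into one set. A minimal sketch of that basket construction (the helper name `build_baskets` and the inline sample rows are illustrative, not part of the assignment; the column layout assumes the standard ratings.csv header `userId,movieId,rating,timestamp`):

```python
import csv
import io

def build_baskets(csv_text):
    """Group movieIds by userId so each user becomes one basket of movies."""
    baskets = {}
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    for row in reader:
        user_id, movie_id = row[0], row[1]  # ignore rating and timestamp
        baskets.setdefault(user_id, set()).add(movie_id)
    return baskets

# Tiny illustrative sample in the ratings.csv format.
sample = (
    "userId,movieId,rating,timestamp\n"
    "1,31,2.5,1260759144\n"
    "1,1029,3.0,1260759179\n"
    "2,31,4.0,835355493\n"
)
baskets = build_baskets(sample)  # {'1': {'31', '1029'}, '2': {'31'}}
```

In the actual Spark job this grouping would typically be expressed on an RDD (e.g., map each line to a `(userId, movieId)` pair, then group by key), but the per-row logic is the same.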
Requirements:
• Your algorithm should first discover frequent itemsets with the specified threshold for support count.
• The discovery of frequent itemsets should be done in parallel by following the SON algorithm, using mapPartitions() to process each chunk/partition of the data with an in-memory Apriori algorithm.
• Make the chunk size small enough that each chunk can be loaded entirely into memory.
• As intermediate results, your algorithm should also output the discovered frequent itemsets (i.e., movies frequently watched by many users).
• The discovery of association rules should be done in parallel, based on the discovered frequent itemsets. Note that we assume the support count of I ∪ {j} is at least the support threshold.
• The confidence of the discovered association rules should meet or exceed the specified threshold.
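The core of the requirements above is the local Apriori pass that SON runs inside each mapPartitions() call (with the support threshold scaled by the chunk's fraction of the data), followed by rule generation with a confidence filter. A sketch of those two pieces in plain Python, under the assumption that baskets are sets of movieIds; the function names and structure are illustrative, not a prescribed solution:

```python
from collections import Counter
from itertools import combinations

def apriori(baskets, support):
    """In-memory Apriori over one chunk of baskets; returns frequent frozensets.
    In SON pass 1 this would run inside rdd.mapPartitions(), with `support`
    scaled down to match the fraction of baskets in the partition."""
    counts = Counter(item for b in baskets for item in b)
    frequent = {frozenset([i]) for i, c in counts.items() if c >= support}
    all_frequent = set(frequent)
    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets to form size-k candidates.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        counts = Counter(c for b in baskets for c in candidates if c <= b)
        frequent = {c for c, n in counts.items() if n >= support}
        all_frequent |= frequent
        k += 1
    return all_frequent

def rules(baskets, frequent, min_conf):
    """Emit rules I -> j with confidence = supp(I ∪ {j}) / supp(I) >= min_conf.
    Since supp(I ∪ {j}) >= the support threshold, both itemsets are frequent."""
    supp = {s: sum(1 for b in baskets if s <= b) for s in frequent}
    out = []
    for s in frequent:
        if len(s) < 2:
            continue
        for j in s:
            left = s - {j}
            conf = supp[s] / supp[left]
            if conf >= min_conf:
                out.append((tuple(sorted(left)), j, conf))
    return out
```

In SON pass 2, the union of all per-partition candidates would be broadcast and counted exactly over the full dataset, keeping only itemsets whose global count meets the support threshold; rule generation can then be parallelized over the frequent itemsets.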
Execution format:
spark-submit assoc.py ratings.csv <support_threshold> <confidence_threshold>
where the support threshold is an integer (a support count) and the confidence threshold is a value between 0 and 1.
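The two thresholds arrive as command-line arguments to assoc.py. A small sketch of validating them; the argument order and the helper name `parse_args` are assumptions, since the assignment does not pin down the exact invocation beyond naming the two thresholds:

```python
import sys

def parse_args(argv):
    """Parse: assoc.py <ratings.csv> <support_count> <confidence>."""
    path = argv[1]
    support = int(argv[2])       # integer support count
    confidence = float(argv[3])  # must lie in [0, 1]
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence threshold must be between 0 and 1")
    return path, support, confidence

# Inside assoc.py: path, support, confidence = parse_args(sys.argv)
```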
