[SOLVED] Java database MapReduce


1. For each problem and data set described below, state how you would set up the key-value pairs used as inputs and outputs for the mappers and reducers. Also, explain the operation performed by the map and reduce functions given their input. If the mapper or reducer must perform a filtering step before generating its key-value pairs, state exactly the type of filtering needed. Assume your Hadoop program uses TextInputFormat as its input format, where each record corresponds to a line of the input file. Since the mapper inputs are the same for all problems below, you only need to specify the mapper outputs. If the problem requires more than one MapReduce job, specify the role of each map and reduce function along with their input and output key-value pairs.
(a) Data set: Student grade record database. Each record contains the following information: StudentID, GradeLevel (freshman, sophomore, junior, senior), CourseID, Category (CSE, MTH, CMSE, OTHER), Passed (yes, no). Examples of records in the grade database are shown below:
(12311, freshman, 101, CSE, yes)
(12311, freshman, 101, CMSE, no)
(15641, junior, 301, CSE, yes)

Problem: Find the students who have always passed (i.e., Passed = yes) the courses from a given category in which she/he participated. For example, if student 12311 passed all 4 CSE courses taken, you should output the key-value pair (12311, CSE), even if there are 30 CSE courses in the dataset.
Answer:
Mapper function:
Mapper output:
Reducer input:
Reduce function:
Reducer output:
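One possible answer, sketched as a plain-Python simulation of the key-value flow (the record parsing and the tiny in-memory "shuffle" are assumptions for illustration; in the actual Hadoop Java program the framework performs the grouping). The mapper keys each record by (StudentID, Category) with the Passed flag as the value; the reducer emits the key only when every value in the group is "yes".

```python
from collections import defaultdict

def mapper(line):
    """Emit ((StudentID, Category), Passed) for each grade record."""
    student_id, _, _, category, passed = [f.strip() for f in line.split(",")]
    yield (student_id, category), passed

def reducer(key, values):
    """Emit (StudentID, Category) only if every course in that category was passed."""
    if all(v == "yes" for v in values):
        yield key

# Simulate the shuffle phase that Hadoop performs between map and reduce.
records = [
    "12311, freshman, 101, CSE, yes",
    "12311, freshman, 102, CSE, yes",
    "12311, freshman, 101, CMSE, no",
    "15641, junior, 301, CSE, yes",
]
groups = defaultdict(list)
for line in records:
    for k, v in mapper(line):
        groups[k].append(v)
output = [k for key, vals in groups.items() for k in reducer(key, vals)]
print(output)  # [('12311', 'CSE'), ('15641', 'CSE')]
```

No filtering is needed in the mapper; the reducer filters out any (StudentID, Category) group containing a "no".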
(b) Data set: Movie watching data. Each record contains a list of people (represented as userIDs) who have watched and enjoyed the same movie:
The Shawshank Redemption: 123, 456, 999, 811, 102
The Godfather: 462, 123, 181, 250, 999, 124, 812
The Dark Knight: 215, 110, 180, 211, 302, 821
For example, the records above indicate that users 123 and 999 both liked The Shawshank Redemption and The Godfather.
Problem: Find all pairs of people (users) who share more than 5 movies they like in common. If users 123 and 999 have more than 5 movies in common, the reducer should output the following key-value pair: Key = 123, Value = 999.
The key must be less than the value to avoid generating duplicate pairs.
Answer:
Mapper function:
Mapper output:
Reducer input:
Reduce function:
Reducer output:
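One possible answer, again simulated in Python (the line format "Title: id, id, …" and the driver loop are assumptions for illustration). The mapper emits one ((userA, userB), 1) pair, with userA < userB, for every pair of users on a movie's line; the reducer sums the counts and keeps pairs that co-occur on more than 5 movies.

```python
from collections import defaultdict
from itertools import combinations

MIN_SHARED = 5  # the problem's threshold: more than 5 movies in common

def mapper(line):
    """For each movie record, emit ((userA, userB), 1) for every pair of users
    who liked the movie, with userA < userB to avoid duplicate pairs."""
    _, user_list = line.split(":")
    users = sorted(u.strip() for u in user_list.split(","))
    for a, b in combinations(users, 2):
        yield (a, b), 1

def reducer(pair, counts):
    """Emit the user pair if they share more than MIN_SHARED movies."""
    if sum(counts) > MIN_SHARED:
        yield pair

# Tiny simulation: users 123 and 999 co-occur on 6 movies, everyone else on 1.
records = [f"Movie{i}: 123, 999, {400 + i}" for i in range(6)]
groups = defaultdict(list)
for line in records:
    for k, v in mapper(line):
        groups[k].append(v)
output = [p for pair, cs in groups.items() for p in reducer(pair, cs)]
print(output)  # [('123', '999')]
```

Sorting the user list before generating combinations is what guarantees key < value, matching the duplicate-avoidance requirement in the problem.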
(c) Data set: Amazon shopping logs. Each log entry contains the following information: SessionID, userID, itemName, itemCost. For example, consider the following shopping log entries:
(2333, 1234, textbook, 100)
(2333, 1234, laptop, 1500)
(1111, 1234, chair, 200)
(9999, 4545, headphones, 250)
The first entry indicates the shopping session 2333 by user 1234, who bought a textbook that cost 100. Note that a user may buy multiple items in the same session. For example, during session 2333, user 1234 bought both a textbook and a laptop.
Problem: Find pairs of itemNames that are frequently bought together by the users (frequent here means they were bought together in more than 1000 sessions). For example, if textbook and laptop are bought together in more than 1000 sessions, you should emit the key-value pair:
Key = laptop, Value = textbook, where the value is larger than the key if you use a string comparison operator. Note that you need two MapReduce (Hadoop) jobs to do this.
Answer:
For the first MapReduce (Hadoop) job:
Map function:
Mapper output:
Reducer input:
Reduce function:
Reducer output:
For the second MapReduce (Hadoop) job:
Mapper input:
Map function:
Mapper output:
Reducer input:
Reduce function:
Reducer output:
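One possible two-job pipeline, simulated in Python (the log parsing and the in-memory grouping are assumptions for illustration; the threshold is lowered to 1 in the tiny example run purely so that it produces output). Job 1 groups items by SessionID and emits one ((itemA, itemB), 1) pair per co-purchased pair, with itemA < itemB. Job 2's mapper is the identity; its reducer sums the per-session counts and keeps pairs above the threshold.

```python
from collections import defaultdict
from itertools import combinations

MIN_SESSIONS = 1000  # the problem's threshold

def job1_map(line):
    """Emit (SessionID, itemName) for each shopping log entry."""
    session, _, item, _ = [f.strip() for f in line.split(",")]
    yield session, item

def job1_reduce(session, items):
    """Emit ((itemA, itemB), 1) for every pair of items bought in this session,
    with itemA < itemB under string comparison."""
    for a, b in combinations(sorted(set(items)), 2):
        yield (a, b), 1

def job2_reduce(pair, counts):
    """Job 2's mapper is the identity; the reducer sums per-session
    co-occurrences and keeps pairs above the threshold."""
    if sum(counts) > MIN_SESSIONS:
        yield pair

logs = [
    "2333, 1234, textbook, 100",
    "2333, 1234, laptop, 1500",
    "1111, 1234, chair, 200",
    "7777, 4545, textbook, 90",
    "7777, 4545, laptop, 1400",
]
# Job 1: group items by session, emit co-purchase pairs.
by_session = defaultdict(list)
for line in logs:
    for s, item in job1_map(line):
        by_session[s].append(item)
pair_counts = defaultdict(int)
for s, items in by_session.items():
    for pair, one in job1_reduce(s, items):
        pair_counts[pair] += one
# Job 2 (threshold lowered to 1 here purely for the tiny example):
MIN_SESSIONS = 1
output = [p for pair, c in pair_counts.items() for p in job2_reduce(pair, [c])]
print(output)  # [('laptop', 'textbook')]
```

Two jobs are needed because the per-pair counts cannot be summed until all sessions have been expanded into pairs, which is exactly what the first job's reducer produces.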
HDFS and Linux Commands
Suppose you were given a Hadoop Java program named findAnswer.java that uses a main class called findAnswer, where the mapper and reducer classes are called faMapper and faReducer, respectively. Assume the Hadoop program takes three input parameters, as follows: an input directory, an output directory, and an integer variable (called qnum) that specifies the question number we wish the program to run. Assume you were given an input file named examQuestions.txt and that the source code is stored in the directory /home/exam on the local filesystem. Write down the step-by-step Linux and Hadoop commands you would use to compile and execute the program if asked to execute with the question integer qnum as 3. Make sure to include the commands for uploading the dataset from local to HDFS, compiling and archiving the Java program, executing the program, and merging the reducer output into a single output file named answer.txt that should be saved in the local directory /home/exam. The directory on HDFS to store the input dataset is called /user/exam/input and the directory to store the output results is called /user/exam/output. You can assume the environment variables for Java and Hadoop have been set correctly and that the input directory has been created on HDFS.
Answer:
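One possible command sequence (a sketch; it assumes the slash-stripped paths in the question are /home/exam on the local filesystem and /user/exam/input, /user/exam/output on HDFS, and that the class files archive cleanly into a single jar):

```shell
# Upload the dataset from the local filesystem to HDFS
hdfs dfs -put /home/exam/examQuestions.txt /user/exam/input

# Compile the Java program and archive the classes into a jar
cd /home/exam
hadoop com.sun.tools.javac.Main findAnswer.java
jar cf findAnswer.jar findAnswer*.class

# Execute the program with qnum = 3
hadoop jar findAnswer.jar findAnswer /user/exam/input /user/exam/output 3

# Merge the reducer output into a single local file
hdfs dfs -getmerge /user/exam/output /home/exam/answer.txt
```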
Pig
Consider the following dataset for books that are in the library. Each line in the data file named library.txt contains the following comma-separated values:
BookISBN, Title, Author, Year, Genre
where BookISBN is a unique identifier for each book and Genre is the book category to which it belongs.
Write the code that would be used in a Pig Latin script to load the data into a table named library, which has the following columns: BookISBN (chararray), Title (chararray), Author (chararray), Year (int), and Genre (chararray).
Answer:
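One possible loading statement (a sketch; it assumes the comma-delimited layout described above):

```pig
library = LOAD 'library.txt' USING PigStorage(',')
          AS (BookISBN:chararray, Title:chararray, Author:chararray,
              Year:int, Genre:chararray);
```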
Write the code that would be used in a Pig Latin script to return the Title, Author, and Year of all the books that have the Genre Fantasy. The result should be stored in a table named FantasyBooks.
Answer:
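One possible answer (a sketch; it assumes the library relation has been loaded as in the previous part):

```pig
fantasy = FILTER library BY Genre == 'Fantasy';
FantasyBooks = FOREACH fantasy GENERATE Title, Author, Year;
```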
Write the code that would be used in a Pig Latin script to count the number of books (i.e., BookISBNs) in the library for each Genre. For example, if the query result contains the rows (Fantasy, 300) and (Thriller, 200), this means there are 300 books in the library that are categorized in the genre of Fantasy and 200 that are Thriller.
Answer:
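One possible answer (a sketch; it assumes the library relation has been loaded as in the first part):

```pig
byGenre = GROUP library BY Genre;
genreCounts = FOREACH byGenre GENERATE group AS Genre, COUNT(library) AS n;
```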
Write a Pig Latin script to find the name of the genre with the least number of books in the library. Note that you can assume that each Genre has at least one book in the library.
Answer:
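One possible answer (a sketch; it counts per genre as in the previous part, then orders ascending and keeps the first row):

```pig
byGenre = GROUP library BY Genre;
genreCounts = FOREACH byGenre GENERATE group AS Genre, COUNT(library) AS n;
ordered = ORDER genreCounts BY n ASC;
leastGenre = LIMIT ordered 1;
DUMP leastGenre;
```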
Hive
For this question, you will use the previous library book dataset.
Write a Hive script to create the schema of the Library table in Hive.
Answer:
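One possible schema (a sketch; the comma-delimited text layout matches the library.txt format described above):

```sql
CREATE TABLE Library (
  BookISBN STRING,
  Title    STRING,
  Author   STRING,
  Year     INT,
  Genre    STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```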
Write a Hive script to return the BookISBN and Year for books in the library with the Fantasy Genre.
Answer:
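One possible query (a sketch against the Library table above):

```sql
SELECT BookISBN, Year
FROM Library
WHERE Genre = 'Fantasy';
```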
Write a Hive script to calculate the average year of books in the library for each genre. You should return the Genre and the average Year.
Answer:
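One possible query (a sketch against the Library table above):

```sql
SELECT Genre, AVG(Year) AS avg_year
FROM Library
GROUP BY Genre;
```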
Pig
Write the Pig Latin scripts to process a data set that contains data about restaurants. More specifically, assume you have the file Restaurant.csv with columns as follows: restaurantID (int), restaurant name (chararray), owner (chararray), year established (int), and max occupancy (int), where restaurantID uniquely identifies a restaurant and max occupancy specifies the maximum number of guests that can be eating at the restaurant at any given time.
Write the Pig Latin script code below that reads the restaurant data file Restaurant.csv and finds the restaurant name of all the restaurants owned by Gordon Ramsay. The query result should contain only 1 column (restaurant name). Note that you should use CSVExcelStorage in your answer, and the output can be written as if saved to the directory named GordonRamsay.
Answer:
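One possible script (a sketch; the underscored column names are assumptions, and the piggybank.jar path varies per installation):

```pig
-- CSVExcelStorage lives in piggybank; the jar path is installation-specific
REGISTER piggybank.jar;
restaurants = LOAD 'Restaurant.csv'
              USING org.apache.pig.piggybank.storage.CSVExcelStorage(',')
              AS (restaurantID:int, restaurant_name:chararray, owner:chararray,
                  year_established:int, max_occupancy:int);
gordons = FILTER restaurants BY owner == 'Gordon Ramsay';
names = FOREACH gordons GENERATE restaurant_name;
STORE names INTO 'GordonRamsay'
      USING org.apache.pig.piggybank.storage.CSVExcelStorage(',');
```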
Write a Pig Latin script that again reads the restaurant data file Restaurant.csv and counts the number of restaurants owned by each owner. The query result should return only owners that own 5 or more restaurants, as a table with 2 columns (owner and number of restaurants owned). In your answer below, you should also write the output as if it were to be saved in the directory named successfulOwners.
Answer:
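One possible script (a sketch, reusing the same assumed column names and piggybank loader as the previous part):

```pig
REGISTER piggybank.jar;
restaurants = LOAD 'Restaurant.csv'
              USING org.apache.pig.piggybank.storage.CSVExcelStorage(',')
              AS (restaurantID:int, restaurant_name:chararray, owner:chararray,
                  year_established:int, max_occupancy:int);
byOwner = GROUP restaurants BY owner;
counts = FOREACH byOwner GENERATE group AS owner, COUNT(restaurants) AS n;
successful = FILTER counts BY n >= 5;
STORE successful INTO 'successfulOwners';
```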
