1. Problem statement
A sample input file is given below. Each line corresponds to a point of interest (POI) and contains a keyword followed by the coordinate values x and y, all separated by whitespace.
park 3 5
lake 2 3
mall 1 4
park 2 4
lake 9 8
mall 2 7
We measure the distance between two points p1=(x1,y1) and p2=(x2,y2) by:
$$\mathrm{dist}(p_1, p_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
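As a quick check of the formula, here is a minimal Java helper. The class and method names are hypothetical and not part of the assignment. `Math.hypot(dx, dy)` would compute the same value while avoiding intermediate overflow, at some cost in speed.

```java
public final class PoiDistance {
    // Euclidean distance sqrt((x1 - x2)^2 + (y1 - y2)^2) between p1 = (x1, y1) and p2 = (x2, y2).
    static double dist(double x1, double y1, double x2, double y2) {
        double dx = x1 - x2;
        double dy = y1 - y2;
        return Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        // The two "lake" points from the sample input: sqrt(7^2 + 5^2) = sqrt(74).
        System.out.println(dist(2, 3, 9, 8));
    }
}
```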
Each keyword k is associated with a group G(k) of points.
[Example] The group of park contains two points: (3,5) and (2,4).
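To make the definition of G(k) concrete, the sketch below builds every group from input lines in plain Java. The names are hypothetical, and skipping blank or malformed lines is an assumption. In a MapReduce job this grouping is roughly what the shuffle does between the map and reduce phases, so a solution would not normally build it by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class PoiGroups {
    // Builds G(k) for every keyword k from lines of the form "keyword x y".
    // Keywords keep their first-seen order; points keep their input order.
    static Map<String, List<double[]>> groupByKeyword(List<String> lines) {
        Map<String, List<double[]>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 3) {
                continue; // assumption: blank or malformed lines are skipped
            }
            double x = Double.parseDouble(parts[1]);
            double y = Double.parseDouble(parts[2]);
            groups.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(new double[] {x, y});
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "park 3 5", "lake 2 3", "mall 1 4", "park 2 4", "lake 9 8", "mall 2 7");
        for (Map.Entry<String, List<double[]>> e : groupByKeyword(sample).entrySet()) {
            StringBuilder sb = new StringBuilder(e.getKey()).append(':');
            for (double[] p : e.getValue()) {
                sb.append(" (").append(p[0]).append(',').append(p[1]).append(')');
            }
            System.out.println(sb);
        }
    }
}
```

For the sample input, the first line printed is `park: (3.0,5.0) (2.0,4.0)`.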
There are 2 questions in this programming assignment. You should write a MapReduce program to solve each of them.
Question Q1: Find the centroid (i.e., the mean position of points) of each group.
[Example]
Input: the sample input above
Output:
lake 5.5 5.5
mall 1.5 5.5
park 2.5 4.5
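One way Q1 can fit into a single round is sketched below. It is plain Java that simulates the three phases locally, not a Hadoop job. The map step emits the keyword as key and "x y" as value. The shuffle is simulated with a `TreeMap`, because Hadoop presents keys to each reducer in sorted order. The reduce step averages the values. In an actual Hadoop 3 job, `map` and `reduce` would become the bodies of subclasses of `org.apache.hadoop.mapreduce.Mapper` and `Reducer` that emit results through `context.write`. All names here are hypothetical, and well-formed input is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public final class CentroidSketch {
    // Map step: "keyword x y" -> (keyword, "x y"). Assumes a well-formed line.
    static String[] map(String line) {
        String[] parts = line.trim().split("\\s+");
        return new String[] {parts[0], parts[1] + " " + parts[2]};
    }

    // Reduce step: every "x y" value of one keyword -> {mean x, mean y}.
    static double[] reduce(Iterable<String> values) {
        double sumX = 0.0;
        double sumY = 0.0;
        long n = 0;
        for (String v : values) {
            String[] xy = v.split(" ");
            sumX += Double.parseDouble(xy[0]);
            sumY += Double.parseDouble(xy[1]);
            n++;
        }
        return new double[] {sumX / n, sumY / n};
    }

    // Local stand-in for one MapReduce round: map, group by key (sorted), reduce.
    static Map<String, double[]> run(List<String> lines) {
        Map<String, List<String>> shuffled = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] kv = map(line);
            shuffled.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        Map<String, double[]> centroids = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : shuffled.entrySet()) {
            centroids.put(e.getKey(), reduce(e.getValue()));
        }
        return centroids;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "park 3 5", "lake 2 3", "mall 1 4", "park 2 4", "lake 9 8", "mall 2 7");
        for (Map.Entry<String, double[]> e : run(sample).entrySet()) {
            double[] c = e.getValue();
            System.out.println(e.getKey() + "\t" + c[0] + " " + c[1]);
        }
    }
}
```

On the sample input this prints `lake 5.5 5.5`, `mall 1.5 5.5` and `park 2.5 4.5` in that order, with a tab after the keyword, which matches the expected output. Summing in the reducer and dividing once at the end keeps the whole computation within one round.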
Question Q2: Find the diameter (i.e., the maximum distance between any two points inside a group) of each group.
[Example]
Input: the sample input above
Output:
lake 8.602
mall 3.162
park 1.414
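Unlike the centroid, the diameter cannot be built up from running sums, so the reducer needs all of a group's points at once. The brute-force sketch below is again a local simulation with hypothetical names. It buffers the points and compares every pair, which is O(m^2) for a group of m points. In a real Hadoop reducer the framework may reuse the same value object across iterations, so coordinates should be copied out (as the parsing here does) rather than kept as references to the values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public final class DiameterSketch {
    // Euclidean distance between points p = {x, y} and q = {x, y}.
    static double dist(double[] p, double[] q) {
        double dx = p[0] - q[0];
        double dy = p[1] - q[1];
        return Math.sqrt(dx * dx + dy * dy);
    }

    // Reduce step for one keyword: copy the group's points into a list, then
    // take the maximum distance over all unordered pairs (O(m^2) for m points).
    static double diameter(Iterable<String> values) {
        List<double[]> pts = new ArrayList<>();
        for (String v : values) {
            String[] xy = v.split(" ");
            pts.add(new double[] {Double.parseDouble(xy[0]), Double.parseDouble(xy[1])});
        }
        double best = 0.0; // assumption: a single-point group has diameter 0
        for (int i = 0; i < pts.size(); i++) {
            for (int j = i + 1; j < pts.size(); j++) {
                best = Math.max(best, dist(pts.get(i), pts.get(j)));
            }
        }
        return best;
    }

    // Local stand-in for one round: map each line to (keyword, "x y"),
    // group by keyword in sorted order, then reduce each group.
    static Map<String, Double> run(List<String> lines) {
        Map<String, List<String>> shuffled = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 3) {
                continue; // skip blank or malformed lines
            }
            shuffled.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(parts[1] + " " + parts[2]);
        }
        Map<String, Double> diameters = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : shuffled.entrySet()) {
            diameters.put(e.getKey(), diameter(e.getValue()));
        }
        return diameters;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "park 3 5", "lake 2 3", "mall 1 4", "park 2 4", "lake 9 8", "mall 2 7");
        for (Map.Entry<String, Double> e : run(sample).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue()); // full double precision
        }
    }
}
```

On the sample input the printed values are sqrt(74), sqrt(10) and sqrt(2) at full double precision. The example output above shows these rounded to three decimals.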
2. Requirements
Although MapReduce supports multiple languages, in this assignment you should use Java (Java 8) for the implementation.
Your submission should be organized as follows:
Q1.java  // source file for question 1
Q1.jar   // jar file for question 1, compiled and archived from Q1.java
Q2.java  // source file for question 2
Q2.jar   // jar file for question 2, compiled and archived from Q2.java
Archive the above structure as
Make sure that your source files compile and that the resulting programs run under the latest Hadoop version (i.e., Hadoop 3.2.1) in pseudo-distributed mode.
Your jar file should be directly runnable on a Linux platform with the following call:
bin/hadoop jar Q1.jar Q1
Your output should preserve double precision.
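Preserving double precision is mainly a question of how the result doubles are converted to text. The illustration below assumes values are written through `Double.toString`, which emits just enough digits for the string to parse back to exactly the same double. A format string such as `%.3f` rounds the value instead. The class and method names are hypothetical.

```java
import java.util.Locale;

public final class PrecisionDemo {
    // Full precision: Double.toString prints enough digits that parsing the
    // string back yields exactly the same double value.
    static String full(double v) {
        return Double.toString(v);
    }

    // Fixed three decimals, as in the sample output; this rounds the value.
    static String threeDecimals(double v) {
        return String.format(Locale.ROOT, "%.3f", v);
    }

    public static void main(String[] args) {
        double d = Math.sqrt(74); // diameter of the "lake" group
        System.out.println(full(d));
        System.out.println(threeDecimals(d)); // 8.602
    }
}
```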
You should only use one MapReduce round to solve each sub-question.
[Hint] You may use the Ubuntu image we provided for this assignment.
Google Drive: https://drive.google.com/file/d/1lMqmTAj2sC2gVqkVWW-MDUR24vv-a3Si/view?usp=sharing
The Y drive in the COMP Lab: Y:\Subject\COMP5434. Note: these files will expire on November 7!
3. Grading criteria
20 marks will be given if your programs compile (10 marks for each .java file).
80 marks will be given if your programs are correct. Correctness will be tested with 8 test cases (4 for each question), each worth 10 marks.
Note that this is an individual assignment. Plagiarism will result in 0 marks!