[SOLVED] C algorithm math Spark graph Final Exam 2018S Solution

$25

File Name: C_algorithm_math_Spark_graph_Final_Exam_2018S_Solution.zip
File Size: 508.68 KB

5/5 - (1 vote)

Final Exam 2018S Solution
Written Part
1. (2 points for each)
(1) A
(2) B
(3) A
(4) C
(5) D
2. (2.5 points for each)
(1) A
(2) C
(3) C
(4) B and C
3. Not good, (2 points) because it collects all data to the driver program, who then does the addition sequentially by itself. (3 points)
r1 = RDD1.zipWithIndex().map(lambda (x,y): (y,x))r2 = RDD2.zipWithIndex().map(lambda (x,y): (y,x))RDD3 = r1.union(r2).reduceByKey(lambda x,y: x+y)
(5 points)
4. (1) join will cause a shuffle (3 points), reduceByKey will not (2 points)
(2) Now both join andreduceByKey will cause shuffles. (2 points) Change map to mapValues. (3 points)
5.
(1) (3 points)
[(1, 2), (3, 3), (2, 4)]
[(1, 2), (2, 4)]
(2) (3 points)
[(1, 2), (3, 3), (2, 4)]
[(1, 2), (3, 3), (2, 4)]
(3) (3 points)
[(1, 2), (2, 4), (3, 3)]
[(1, 2), (2, 4), (3, 3)]
1 extra point if all correct.
Programming Part
Grading Schema:
Indentation:For all questions, if your program does not contain indentation, there will be 10% penalty on the final grade, at least one point.
Performance:For all questions, if your program contains any unnecessary operations, like unnecessary for-loop, there will be at most 10% penalty on the final grade.
Example:
for i in range (0,n):a[i] = a[i] + 1
will be considered as unnecessary operations, the correct way is
a.map(lambda a: a+1)
Output:For all questions, if your program has different output format from the requirements, there will be 10% penalty on each correct test point.
Question 1:
Four test points, each one contains 2 points.
P = (5771,4907), K = 1
OUTPUT: [(5771, 4907)]
P = (2000,5000), K = 1
OUTPUT: [(2095, 4488)]
P = (3476,9529), K = 5
OUTPUT:[(3765, 9373), (3577, 9962), (3137, 8984), (3338, 8776), (2698, 9476)]
P = (7895,9229), K = 20
OUTPUT: [(7163, 9536), (8849, 10082), (9170, 8944), (7791, 7746), (7065, 10516), (7677, 10749), (9486, 9204), (9441, 8181), (8017, 7277), (7814, 7189), (5884, 10132), (8674, 11475), (9239, 7187), (5840, 10737), (10474, 9888), (9595, 11489), (9078, 6641), (8560, 6307), (8663, 12156), (9201, 11962)]
The implementation of Divide function contains 7 points.
Calculating the nearest point in partition: 2 points
yield no more than K points in each partition: 5 points
if 2) is satisfied and the nearest point is in the K points, 1) is satisfied.
The implementation of main function contains 3 points.
If your program does not implement a divide-and-conquer algorithm, there will be 80% penalty on the final grade.
If your program does sorting in Divide part, there will be 60% penalty on the final grade, at most 9 points.
If your program only solve K=1 case, there will be 1 point penalty (Totally 15 points if your program can solve nearest neighbor problem).
If your program does not achieve any of above goals, you can get 2 points if you calculate the distance correctly.
Question 2:
Three test points, each one contains 2 points. For grading, we only consider the first five windows. if you can output the correct answer for the first three window, which means your program cannot handle window stream, you can get 1 point for each test point. In this case, you can get total 10 points.
Random seed = 1024
Output:45709
46126
48452
49226
50538
Random seed = 2018
Output:43386
44120
45155
48534
48469
Random seed = 5003
Output: 47214
48174
50523
53765
53900
Using any Spark Streaming function correctly: 2 points
Using any Spark Streaming function with window correctly: 2 points
Calculating the counts in a window: 1 points
Calculating the sum in a window: 2 points
Calculating the average: 2 point
Submitting unrelated code to the question will get 0 point.
Question 3:
Three test points, each one contains 2 points
userID = h
Output: +-+
| name|
+-+
|Issac|
|Jason|
|Charlie|
+-+
userID = c
Output:++
| name|
++
|Gabby|
|Bob|
|Alice|
|David|
++
userID=e
Output:++
| name|
++
|Alice|
|David|
++
Calculating the situation 1: 2 points
Calculating the situation 2: 2 points
Calculating the situation 3: 2 points
Calculating the situation 4: 2 points
Calculating the final result: 1 point
Not using GraphFrame/DataFrame API will lose all points from (2)-(6)
If the program can only output the result for the case userID = h’, there will be 1 points penalty (Solving userID = h’ can get totally 10 points) .
Sample Solution:
Question 1:
from math import sqrtdef distance(a,b): # define the function for distance between pointsreturn sqrt((a[0]-b[0])*(a[0]-b[0])+(a[1]-b[1])*(a[1]-b[1]))T = pairs.map(lambda pair: (pair, distance(pair,P))) # map (point->(point, distance))def f(iterator):result = [] # set the result list as emptymax = ((0,0),0) # set the farthest point in the result list as 0.for i in iterator: if (len(result) < K): # if the list is not full (Maximum = K)result.append(i) # insert new point into the listif (i[1] > max[1]) : # update the farthest point if possiblemax = ielse : # if the list is fullif (i[1] < max[1]) : # If the new point is closed to P than the farthest point in the list# Replace the farthest point with the new pointresult.remove(max) result.append(i)# Renew the farthest pointmax = ifor j in result:if (j[1] > max[1]):max = j# Report every point in the listfor i in result:yield iresult = T.mapPartitions(f)result = result.sortBy(lambda a: a[1]).map(lambda a:a[0]).take(K) # Sorting the K*numPartition elements and get the first K.
Question 2:
Stat = numbers.map(lambda u:(u,1)).reduceByWindow(lambda x ,y: (x[0]+y[0],x[1]+y[1]), lambda x,y: (x[0]-y[0],x[1]-y[1]), 30,10 )def printResult(rdd):result = rdd.take(1)print result[0][0]/result[0][1]Stat.foreachRDD(printResult)

Question 3:
one_hop = g.find((a)-[e]->(b))one_hop = one_hop.filter((one_hop[a.id] == userID)).select(b.name) # Calculate condition 1) and 2)two_hop = g.find((a)-[e]->(b); (b)-[f]->(c))two_hop = two_hop.filter(two_hop[a.id] == userID).filter(two_hop[e.relationship] == friend) # Calculate condition 3) and 4)two_hop = two_hop.filter(two_hop[c.id] != userID).select(c.name) # Remove userID from the resultresult = one_hop.union(two_hop).distinct() # Union two results

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] C algorithm math Spark graph Final Exam 2018S Solution
$25