[SOLVED] Spark: Parking-Violation.csv / open-violation.csv

$25

File Name: Spark_csvcsvParking-Violation.csvopen-violation.csv.zip
File Size: 480.42 KB


This assignment processes two data files, Parking-Violation.csv and open-violation.csv, with Spark. Read the input as CSV rows:

import sys
from csv import reader
from pyspark import SparkContext

sc = SparkContext()
lines = sc.textFile(sys.argv[1], 1)
lines = lines.mapPartitions(lambda x: reader(x))

Task 1: Find all parking violations that have been paid, i.e., that do not occur in open-violations.csv.
Output: A key-value* pair per line, where:
key = summons_number
values = plate_id, violation_precinct, violation_code, issue_date
(*Note: separate the key and value with a tab character (\t), and separate elements within the key/value with a comma followed by a space. This applies to all tasks below.)
Your output format should conform to the format of following examples:
1307964308 GBH2444, 74, 46, 2016-03-07
4617863450 HAM2650, 0, 36, 2016-03-24
To complete this task,

1) Write a map-reduce job. Run Hadoop using 2 reducers.
2) Write a Spark program.
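The Spark program for Task 1 can be sketched as below. The join-free idea is `subtractByKey`: keep only summonses that never appear in open-violations.csv. The column positions used here (summons_number first, followed by plate_id, violation_precinct, violation_code, issue_date) are an assumption about the file layout, not confirmed by the assignment text.

```python
import sys
from csv import reader

def fmt(summons, fields):
    """One output line: the key, a tab, then the values joined by ", "."""
    return summons + "\t" + ", ".join(fields)

def main():
    # Lazy import so the formatting helper is usable without pyspark installed.
    from pyspark import SparkContext
    sc = SparkContext()
    # Assumed (hypothetical) column order in both files:
    # summons_number, plate_id, violation_precinct, violation_code, issue_date
    violations = (sc.textFile(sys.argv[1])
                    .mapPartitions(reader)
                    .map(lambda r: (r[0], r[1:5])))
    open_v = (sc.textFile(sys.argv[2])
                .mapPartitions(reader)
                .map(lambda r: (r[0], None)))
    # Paid = violations whose summons_number never appears in open-violations
    paid = violations.subtractByKey(open_v)
    paid.map(lambda kv: fmt(kv[0], kv[1])).saveAsTextFile("task1-out")

if __name__ == "__main__" and len(sys.argv) > 2:
    main()
```

A run would look like `spark-submit task1.py parking_violations.csv open-violations.csv`; the output paths and script name here are illustrative.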

Task 2: Find the frequencies of the violation types in parking_violations.csv, i.e., for each violation code, the number of violations with that code.
Output: A key-value pair per line, where:
key = violation_code
value = number of violations
Sample output, one key-value pair per line:
1 159
46 100
To complete this task,
1) Write a map-reduce job. Run Hadoop using 2 reducers.
2) Write a Spark program.
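The Spark program for Task 2 is a standard word-count pattern: emit `(violation_code, 1)` per row and sum with `reduceByKey`. A minimal sketch, assuming (hypothetically) that violation_code sits at column index 2:

```python
import sys
from csv import reader
from operator import add

CODE_IDX = 2  # hypothetical: position of violation_code in each row

def fmt(code, count):
    """One output line: violation_code, a tab, then the count."""
    return "%s\t%d" % (code, count)

def main():
    # Lazy import so the formatting helper is usable without pyspark installed.
    from pyspark import SparkContext
    sc = SparkContext()
    counts = (sc.textFile(sys.argv[1])
                .mapPartitions(reader)
                .map(lambda r: (r[CODE_IDX], 1))  # emit (code, 1) per row
                .reduceByKey(add))                # sum the 1s per code
    counts.map(lambda kv: fmt(kv[0], kv[1])).saveAsTextFile("task2-out")

if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

For the Hadoop variant of this task, the same map and reduce logic would run as a streaming job with `-numReduceTasks 2`, per the assignment's 2-reducer requirement.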

