[SOLVED] Spark assignment: given two CSV files, read the data, filter out the records that satisfy the required conditions, and output them in the specified format. Parking-Violation.csv is the parking-violation dataset; open-violation.csv is the dataset of still-open (unpaid) violations.

Given two CSV files, read the data from each, filter out the records that satisfy the required conditions, and write them out in the specified format. Parking-Violation.csv is the dataset of parking violations; open-violation.csv is the dataset of violations that are still open, i.e. not yet paid (Task 1 below defines a paid violation as one that does not occur in this file).

The Spark program can read the input file with the following lines:

import sys
from csv import reader

# sc is the SparkContext (provided by the pyspark shell or the spark-submit driver)
lines = sc.textFile(sys.argv[1], 1)
lines = lines.mapPartitions(lambda x: reader(x))  # parse each line into a list of CSV fields

Task 1: Find all parking violations that have been paid, i.e., those that do not occur in open-violation.csv.
Output: A key-value* pair per line, where:
key = summons_number
values = plate_id, violation_precinct, violation_code, issue_date
(*Note: separate the key and the value with a tab character ('\t'); elements within the key/value should be separated by a comma and a space. This applies to all tasks below.)
Your output format should conform to the format of the following examples:
1307964308 GBH2444, 74, 46, 2016-03-07
4617863450 HAM2650, 0, 36, 2016-03-24
To complete this task,

1) Write a map-reduce job. Run Hadoop using 2 reducers.
2) Write a Spark program. (A PySpark sketch follows below.)
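
Below is a minimal PySpark sketch for the Spark half of Task 1. It assumes the parking-violation path arrives as sys.argv[1] and the open-violation path as sys.argv[2], and that summons_number, plate_id, violation_precinct, violation_code, and issue_date sit at columns 0 through 4; these positions and the output path task1.out are assumptions, so adjust them to the real schema.

import sys
from csv import reader
from pyspark import SparkContext

sc = SparkContext(appName="PaidViolations")

# Parse both CSVs into rows of fields.
parking = sc.textFile(sys.argv[1], 1).mapPartitions(lambda x: reader(x))
open_v = sc.textFile(sys.argv[2], 1).mapPartitions(lambda x: reader(x))

# Key each record by summons_number (assumed to be column 0).
parking_kv = parking.map(lambda r: (r[0], (r[1], r[2], r[3], r[4])))
open_keys = open_v.map(lambda r: (r[0], None))

# Paid = not present in open-violation.csv, i.e. a left anti join by key,
# which subtractByKey expresses directly.
paid = parking_kv.subtractByKey(open_keys)

# key TAB value; value elements joined by ", " as the note above requires.
paid.map(lambda kv: kv[0] + "\t" + ", ".join(kv[1])).saveAsTextFile("task1.out")

The script would be submitted with something like: spark-submit task1.py Parking-Violation.csv open-violation.csv (the script name here is a placeholder).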

Task 2: Find the frequencies of the violation types in Parking-Violation.csv, i.e., for each violation code, the number of violations with that code.
Output: A key-value pair per line, where:
key = violation_code
value = number of violations
Here are sample output lines, each containing one key-value pair:
1 159
46 100
To complete this task,
1) Write a map-reduce job. Run Hadoop using 2 reducers.
2) Write a Spark program. (Sketches for both the Spark and Hadoop Streaming halves follow below.)
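
The Spark half of Task 2 is the classic word-count pattern. The sketch below assumes the input path arrives as sys.argv[1] and that the violation code sits at column 2; both, along with the output path task2.out, are assumptions to adjust.

import sys
from csv import reader
from pyspark import SparkContext

sc = SparkContext(appName="ViolationFrequencies")
rows = sc.textFile(sys.argv[1], 1).mapPartitions(lambda x: reader(x))

# Emit (violation_code, 1) per record, then sum the ones per code.
counts = rows.map(lambda r: (r[2], 1)).reduceByKey(lambda a, b: a + b)
counts.map(lambda kv: kv[0] + "\t" + str(kv[1])).saveAsTextFile("task2.out")

For the MapReduce half, one way to run Hadoop with 2 reducers is a Hadoop Streaming mapper/reducer pair in Python; the streaming jar path, script names, and column index below are assumptions.

mapper.py

#!/usr/bin/env python
import sys
from csv import reader
for row in reader(sys.stdin):
    print("%s\t1" % row[2])  # violation_code column index is an assumption

reducer.py

#!/usr/bin/env python
import sys
current, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key != current:  # streaming sorts mapper output, so equal keys are adjacent
        if current is not None:
            print("%s\t%d" % (current, count))
        current, count = key, 0
    count += int(value)
if current is not None:
    print("%s\t%d" % (current, count))

A run command would look roughly like:

hadoop jar /path/to/hadoop-streaming.jar \
    -D mapreduce.job.reduces=2 \
    -input Parking-Violation.csv -output task2_output \
    -mapper mapper.py -reducer reducer.py \
    -file mapper.py -file reducer.py

With 2 reducers the output lands in two part files (part-00000 and part-00001), each sorted by key within itself.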
