[SOLVED] Assignment Writing: MapReduce Task 1


Task 1:
Following the description of the NCDC data format in the “Description of Data.txt” file, store all the data from the .op.gz files in HDFS. Then load the data from HDFS into two HBase tables, “observations” and “counts”. Set the column families of the two tables to “info” and “data” respectively. The “counts” table stores all the count information from the .op.gz files, and the “observations” table stores everything else.
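The split between the two tables can be sketched in plain Python before any HBase code is written. This is only a sketch under stated assumptions, not the required implementation: the 22 whitespace-separated token positions follow the commonly published GSOD layout and must be checked against “Description of Data.txt”; the row key format `<station>-<wban>_<yyyymmdd>`, the mapping of “info” to “observations” and “data” to “counts”, and the sample line are all made up for illustration.

```python
# ASSUMPTION: each data line has 22 whitespace-separated fields in this order:
# STN WBAN YEARMODA, then six (value, count) pairs, then seven other fields.
# Verify against "Description of Data.txt" before relying on it.
MEASURES = ["TEMP", "DEWP", "SLP", "STP", "VISIB", "WDSP"]
OTHER = ["MXSPD", "GUST", "MAX", "MIN", "PRCP", "SNDP", "FRSHTT"]


def is_header(line):
    """ASSUMPTION: the header line of an .op file starts with 'STN'."""
    return line.startswith("STN")


def split_record(line):
    """Return (row_key, observations_cells, counts_cells) for one data line."""
    tok = line.split()
    if len(tok) != 22:
        raise ValueError(f"expected 22 fields, got {len(tok)}")
    stn, wban, yearmoda = tok[0], tok[1], tok[2]
    row_key = f"{stn}-{wban}_{yearmoda}"  # hypothetical row key design
    obs, counts = {}, {}
    for i, name in enumerate(MEASURES):
        obs[f"info:{name}"] = tok[3 + 2 * i]     # measured value
        counts[f"data:{name}"] = tok[4 + 2 * i]  # number of observations
    for i, name in enumerate(OTHER):
        obs[f"info:{name}"] = tok[15 + i]        # kept as raw text, flags included
    return row_key, obs, counts


# Made-up sample line in the assumed layout.
SAMPLE = ("007026 99999  20160622    94.7  7    66.7  7  9999.9  0  9999.9  0"
          "    6.2  4    0.0  7    1.0  999.9   100.4*   87.8*  0.00I 999.9  000000")

key, obs, counts = split_record(SAMPLE)
print(key)     # 007026-99999_20160622
print(counts)
```

Each returned dictionary maps a `family:qualifier` name to a cell value, so it can be handed to whichever HBase client is used to build the actual writes.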

Task 2:
An HBase table can be the source or the target of a MapReduce job, or both at once. Read data from the “observations” and “counts” tables and use MapReduce to answer the following questions:
• Which station has the most records? (One row represents one data record, that is, one day’s data.)

Each station records only some of the days in a year. For example, station 007026-99999 has only 8 days of observations in 2016, from June 22 to June 29. You therefore need to count which station has the most total days of records over the last 100 years.

• Which year has the most records?
Similarly, find which year in the last 100 years has the most data recorded by these stations.

• In which year and at which station were the most observations (the sum of all counts) of the specified information made?
The number of observations of TEMP, DEWP, SLP, STP, VISIB and WDSP differs from day to day for each station. Calculate the year and station with the largest number of observations of these fields.
• Draw one or more conclusions from the dataset through calculation and data processing. Describe the data analytics procedure in detail.
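The three aggregation questions above share one map/reduce shape: the mapper emits a key with a number, and the reducer sums per key and picks the maximum. The following is a minimal pure-Python simulation of that logic, not Hadoop or HBase API code. It assumes a hypothetical row key of the form `<station>_<yyyymmdd>`, reads the third question as the sum of all six counts per (year, station), and uses made-up rows.

```python
from collections import Counter


def station_of(row_key):
    return row_key.split("_")[0]


def year_of(row_key):
    return row_key.split("_")[1][:4]


def map_records(row_keys, key_fn):
    """Mapper for questions 1 and 2: one record per row, so emit (key, 1)."""
    for rk in row_keys:
        yield key_fn(rk), 1


def map_counts(rows):
    """Mapper for question 3: emit ((year, station), sum of that day's counts)."""
    for rk, cells in rows:
        yield (year_of(rk), station_of(rk)), sum(int(v) for v in cells.values())


def reduce_sum(pairs):
    """Reducer: add up the values for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def top(totals):
    """Return the (key, total) pair with the largest total; ties keep the first seen."""
    return max(totals.items(), key=lambda kv: kv[1])


# Made-up rows standing in for a scan of the "counts" table (two fields only).
rows = [
    ("007026-99999_20160622", {"data:TEMP": "7", "data:DEWP": "7"}),
    ("007026-99999_20160623", {"data:TEMP": "24", "data:DEWP": "24"}),
    ("010010-99999_20160622", {"data:TEMP": "8", "data:DEWP": "8"}),
    ("010010-99999_20150101", {"data:TEMP": "24", "data:DEWP": "20"}),
    ("010010-99999_20150102", {"data:TEMP": "24", "data:DEWP": "24"}),
]
row_keys = [rk for rk, _ in rows]

best_station = top(reduce_sum(map_records(row_keys, station_of)))
best_year = top(reduce_sum(map_records(row_keys, year_of)))
best_year_station = top(reduce_sum(map_counts(rows)))

print(best_station)       # ('010010-99999', 3)
print(best_year)          # ('2016', 3)
print(best_year_station)  # (('2015', '010010-99999'), 92)
```

To restrict the results to the last 100 years, the mappers could skip any row whose year falls outside the chosen window. In a real job the same mapper and reducer bodies would sit inside the framework's mapper and reducer classes, with the table scan supplying the rows.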
