[SOLVED] R C MapReduce Descriptions and Tasks


Descriptions and Tasks
Access data: The dataset is available as ncdcdata.zip at:
https://1drv.ms/u/s!AslHHkfDcU2Ngx9FuLmjD9RSkH0A
https://pan.baidu.com/s/1IwLMQMCLGME3HGjUWcjQg (code: xkl3)

More info about the data can be found at:
https://www.ncdc.noaa.gov/data-access
ftp://ftp.ncdc.noaa.gov/pub/data/gsod
https://blog.csdn.net/MrCharles/article/details/50442367 (in Chinese)

Annual files:
e.g., gsod_2006.tar: All 2006 files compressed by station, in one tar file.
etc., for each annual volume.
Note: Each year's data are contained in subdirectories/folders by year.

Station files:
e.g., 010010-99999-2006.op.gz: Files by station-year, identified by WMO number, WBAN number (if appropriate), and year. For a cross-reference of the file names with location, see ish-history.txt.

Informational/Utility Files:
country-list.txt: A list showing the station number range for each country.
ish-history.txt: A station list to be used with the data files, showing the names and locations for each station. Note: Global summary of day contains a subset of the stations listed in this station history.
readme.txt: A description of the data and its format.

Description of Data Format:

FIELD    POSITION  TYPE   DESCRIPTION

STN      1-6       Int.   Station number (WMO/DATSAV3 number)
                          for the location.

WBAN     8-12      Int.   WBAN number where applicable; this is the
                          historical "Weather Bureau Air Force Navy"
                          number, with WBAN being the acronym.

YEAR     15-18     Int.   The year.

MODA     19-22     Int.   The month and day.

TEMP     25-30     Real   Mean temperature for the day in degrees
                          Fahrenheit to tenths. Missing = 9999.9
Count    32-33     Int.   Number of observations used in
                          calculating mean temperature.

DEWP     36-41     Real   Mean dew point for the day in degrees
                          Fahrenheit to tenths. Missing = 9999.9
Count    43-44     Int.   Number of observations used in
                          calculating mean dew point.

SLP      47-52     Real   Mean sea level pressure for the day
                          in millibars to tenths. Missing = 9999.9
Count    54-55     Int.   Number of observations used in
                          calculating mean sea level pressure.

STP      58-63     Real   Mean station pressure for the day
                          in millibars to tenths. Missing = 9999.9
Count    65-66     Int.   Number of observations used in
                          calculating mean station pressure.

VISIB    69-73     Real   Mean visibility for the day in miles
                          to tenths. Missing = 999.9
Count    75-76     Int.   Number of observations used in
                          calculating mean visibility.

WDSP     79-83     Real   Mean wind speed for the day in knots
                          to tenths. Missing = 999.9
Count    85-86     Int.   Number of observations used in
                          calculating mean wind speed.

MXSPD    89-93     Real   Maximum sustained wind speed reported
                          for the day in knots to tenths.
                          Missing = 999.9

GUST     96-100    Real   Maximum wind gust reported for the day
                          in knots to tenths. Missing = 999.9

MAX      103-108   Real   Maximum temperature reported during the
                          day in Fahrenheit to tenths. The time of
                          the max temp report varies by country and
                          region, so this will sometimes not be
                          the max for the calendar day.
                          Missing = 9999.9
Flag     109-109   Char   Blank indicates max temp was taken from the
                          explicit max temp report and not from the
                          hourly data. * indicates max temp was
                          derived from the hourly data (i.e., highest
                          hourly or synoptic-reported temperature).

MIN      111-116   Real   Minimum temperature reported during the
                          day in Fahrenheit to tenths. The time of
                          the min temp report varies by country and
                          region, so this will sometimes not be
                          the min for the calendar day.
                          Missing = 9999.9
Flag     117-117   Char   Blank indicates min temp was taken from the
                          explicit min temp report and not from the
                          hourly data. * indicates min temp was
                          derived from the hourly data (i.e., lowest
                          hourly or synoptic-reported temperature).

PRCP     119-123   Real   Total precipitation (rain and/or melted
                          snow) reported during the day in inches
                          and hundredths; will usually not end
                          with the midnight observation (i.e.,
                          may include latter part of previous day).
                          .00 indicates no measurable
                          precipitation (includes a trace).
                          Missing = 99.99
                          Note: Many stations do not report 0 on
                          days with no precipitation; therefore,
                          99.99 will often appear on these days.
                          Also, for example, a station may only
                          report a 6-hour amount for the period
                          during which rain fell.
                          See Flag field for source of data.
Flag     124-124   Char   A = 1 report of 6-hour precipitation
                              amount.
                          B = Summation of 2 reports of 6-hour
                              precipitation amount.
                          C = Summation of 3 reports of 6-hour
                              precipitation amount.
                          D = Summation of 4 reports of 6-hour
                              precipitation amount.
                          E = 1 report of 12-hour precipitation
                              amount.
                          F = Summation of 2 reports of 12-hour
                              precipitation amount.
                          G = 1 report of 24-hour precipitation
                              amount.
                          H = Station reported 0 as the amount
                              for the day (e.g., from 6-hour reports),
                              but also reported at least one
                              occurrence of precipitation in hourly
                              observations; this could indicate a
                              trace occurred, but should be considered
                              as incomplete data for the day.
                          I = Station did not report any precip data
                              for the day and did not report any
                              occurrences of precipitation in its hourly
                              observations; it's still possible that
                              precip occurred but was not reported.

SNDP     126-130   Real   Snow depth in inches to tenths (last
                          report for the day if reported more than
                          once). Missing = 999.9
                          Note: Most stations do not report 0 on
                          days with no snow on the ground; therefore,
                          999.9 will often appear on these days.

FRSHTT   133-138   Int.   Indicators (1 = yes, 0 = no/not
                          reported) for the occurrence during the
                          day of:
                          Fog (F, 1st digit).
                          Rain or Drizzle (R, 2nd digit).
                          Snow or Ice Pellets (S, 3rd digit).
                          Hail (H, 4th digit).
                          Thunder (T, 5th digit).
                          Tornado or Funnel Cloud (T, 6th digit).
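
The fixed-width layout above can be read by slicing each line at the listed positions. The following is a minimal Python sketch, not part of the assignment: it assumes the positions are 1-based inclusive column ranges (for example, 25 to 30 for TEMP), covers only a subset of the fields, and uses hypothetical names (LAYOUT, parse_record, TEMP_COUNT, MAX_FLAG, PRCP_FLAG) chosen for illustration.

```python
from typing import Dict, Optional, Tuple

# Field name -> (start, end, missing sentinel or None).
# Positions are assumed to be 1-based and inclusive, as in the format table.
# Only a subset of the fields is listed; the key names are illustrative.
LAYOUT: Dict[str, Tuple[int, int, Optional[str]]] = {
    "STN": (1, 6, None),
    "WBAN": (8, 12, None),
    "YEAR": (15, 18, None),
    "MODA": (19, 22, None),
    "TEMP": (25, 30, "9999.9"),
    "TEMP_COUNT": (32, 33, None),
    "MAX": (103, 108, "9999.9"),
    "MAX_FLAG": (109, 109, None),
    "PRCP": (119, 123, "99.99"),
    "PRCP_FLAG": (124, 124, None),
    "FRSHTT": (133, 138, None),
}


def parse_record(line: str) -> Dict[str, Optional[str]]:
    """Slice one data line into stripped string fields.

    A field whose text equals its missing sentinel is returned as None.
    """
    record: Dict[str, Optional[str]] = {}
    for name, (start, end, missing) in LAYOUT.items():
        raw = line[start - 1:end].strip()  # convert to 0-based slicing
        record[name] = None if raw == missing else raw
    return record
```

Values are kept as strings so that the caller decides how to convert them; header lines in the .op files, if present, would need to be skipped before calling parse_record.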

Task 1:
According to the description of the NCDC data format in the Description of Data.txt file, store all the data from the .op.gz files in HDFS. Then load the data from HDFS into two HBase tables, observations and counts. Set the column families of the two tables to info and data, respectively. The counts table stores all the count information from the .op.gz files, and the observations table stores everything else.
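
One way to picture the split between the two tables is a function that turns a parsed record into one row per table. This is a sketch under stated assumptions, not the required solution: the row key (station number, WBAN number, underscore, date), the count field names, and the function name to_hbase_rows are all hypothetical, and the column families follow the task text (info for observations, data for counts).

```python
from typing import Dict, Tuple

# Hypothetical names for the six "Count" columns of the data format.
COUNT_FIELDS = {
    "TEMP_COUNT", "DEWP_COUNT", "SLP_COUNT",
    "STP_COUNT", "VISIB_COUNT", "WDSP_COUNT",
}
# These fields go into the row key instead of a column (a design assumption).
KEY_FIELDS = {"STN", "WBAN", "YEAR", "MODA"}


def to_hbase_rows(
    record: Dict[str, str],
) -> Tuple[bytes, Dict[bytes, bytes], Dict[bytes, bytes]]:
    """Return (row_key, observations_columns, counts_columns) for one record."""
    row_key = (
        f'{record["STN"]}{record["WBAN"]}_{record["YEAR"]}{record["MODA"]}'
    ).encode()
    observations: Dict[bytes, bytes] = {}
    counts: Dict[bytes, bytes] = {}
    for name, value in record.items():
        if name in KEY_FIELDS:
            continue
        if name in COUNT_FIELDS:
            counts[f"data:{name}".encode()] = value.encode()
        else:
            observations[f"info:{name}".encode()] = value.encode()
    return row_key, observations, counts
```

Using the same row key in both tables keeps a day's observations and counts joinable, and putting the station first lets all rows of one station sort together.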

Task 2:
An HBase table can be the source or the target of a MapReduce job, or both. Read data from the observations and counts tables and use MapReduce to calculate the following results:
Which station has the most records? One row represents one data record (one day's data).

Since each station only records part of the days in a year (e.g., the station with ID 00702699999 observed only 8 days of data in 2016, from June 22 to 29), you need to count which station has the most total days over the last 100 years.
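
The counting logic can be prototyped outside Hadoop before writing the actual job. The sketch below simulates the map, shuffle, and reduce phases in plain Python. It assumes a hypothetical row key of the form station ID, underscore, yyyymmdd date; the function names and the min_year parameter are illustrative and not part of the assignment.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_row(row_key: str, min_year: int) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (station, 1) for each row not older than min_year."""
    station, date = row_key.split("_", 1)
    if int(date[:4]) >= min_year:
        yield station, 1


def reduce_counts(station: str, ones: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the ones emitted for a single station."""
    return station, sum(ones)


def station_with_most_records(
    row_keys: Iterable[str], min_year: int
) -> Tuple[str, int]:
    """Run map, shuffle, and reduce in memory, then pick the largest total."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key in row_keys:
        for station, one in map_row(key, min_year):
            groups[station].append(one)  # shuffle: group values by station
    totals = [reduce_counts(s, v) for s, v in groups.items()]
    # max() raises ValueError if no row passes the year filter.
    return max(totals, key=lambda pair: pair[1])
```

In a real job the final maximum would be taken either in a single reducer or in a small follow-up step over the per-station totals.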

Draw one or more conclusions from the dataset through calculation and data processing. Give the detailed procedure of the data analytics.
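
As one possible example of such an analysis (an illustration only, since the task leaves the choice open), the mean daily temperature per year could be computed while skipping the missing-value sentinel 9999.9 from the data format. The function name and the (year, temp) input shape are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MISSING_TEMP = 9999.9  # sentinel for a missing TEMP value in the data format


def mean_temp_by_year(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average TEMP (degrees Fahrenheit) per year, ignoring missing values."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for year, temp in records:
        if temp == MISSING_TEMP:
            continue  # skip days without a temperature reading
        grouped[year].append(temp)
    return {year: sum(vals) / len(vals) for year, vals in grouped.items()}
```

Years whose records are all missing simply do not appear in the result. In a MapReduce setting the same computation would emit (year, temp) pairs in the mapper and average them in the reducer.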
