COMP529336: COURSEWORK ASSIGNMENT 1 BATCH ANALYTICS


Dr. Bakhtiar Amen
Coursework Date: 16th October 2019
Due Date: 6th November 2019

INTRODUCTION
This assessed coursework assignment is worth 20% of your mark for COMP529336. Failure on this assignment can be compensated by higher marks on other assessments on the module. The assignment aims to test your understanding of batch analytics, with a focus on your ability to use Hadoop to solve Big Data analytics problems. More specifically, it aims to partially assess the following learning outcome for COMP529: understanding of the middleware that can be used to enable algorithms to scale up to analysis of large datasets.
ASSESSMENT
The report will be assessed according to the following criteria:
| Criterion | Percentage |
|---|---|
| Clarity of presentation, including succinctness of main report | 20 |
| Quality of Java code, including assessment of how easy it is to understand | 40 |
| Quality of analysis performed | 40 |
SUBMISSION
Please submit your coursework online using the COMP529336 page on VITAL by 12 noon on Wednesday 6th November 2019. Standard lateness penalties will apply to any work handed in after this time. The report and the Java program must be written by yourself using your own words (see the University guidance on academic integrity for additional information).
PROJECT BACKGROUND
Now more than ever, local governments are working to develop smart cities and create sustainable urban environments that improve quality of life. Part of their plan is to introduce a new transportation program known as a bike share program. The aim of this program is to ease the city's traffic congestion and reduce its air pollution. Today, bike sharing is very popular, since users can rent a bike from any station and return it at their final destination. Approximately 500,000 bicycles are available around the world for people to share across more than 500 different sharing programs. For this coursework, your task is to analyse the dataset of one of these programs, Capital Bikeshare (http://capitalbikeshare.com/system-data), for the city of Washington, D.C., in the USA.
The aim of this assignment is to help you analyse the Capital Bikeshare rental program's dataset and understand which rental season (e.g., spring, summer, fall, winter) is the most popular across the year.
Dataset
The bikeshare dataset has 1000 records of bike rentals between 2011 and 2013. The data is stored in a file called BikeShareData, available on VITAL in the COMP529336 Assignmentdata folder. The data fields are described in Table 1.

Table 1: data record description
| Field | Description |
|---|---|
| dteday | date |
| seasons | springer, summer, fall, winter |
| yr | year 2011 |
| mnth | month (1 to 12) |
| hr | hour (0 to 23) |
| weekday | day of the week |
| weathersit | 1: Clear, Few clouds, Partly cloudy, Partly cloudy |
| | 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist |
| | 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds |
| | 4: Heavy Rain + Ice Pallets + Thunderstorm + Mist, Snow + Fog |
| casual | count of casual users |
| stations | chinatown, capitollhill, lincoln, logan, southwest, oxford, abraham, alexandria, etc. |
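Since only the seasons and stations fields are needed, a first step is to pull those two values out of each record. The following is a minimal sketch, not the layout of the actual BikeShareData file: it assumes a comma-separated file whose header row uses the field names from Table 1 and whose values contain no quoted commas. The class name `RecordFields` and the sample row are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: extracts the seasons and stations values from one record. */
final class RecordFields {
    final String season;
    final String station;

    RecordFields(String season, String station) {
        this.season = season;
        this.station = station;
    }

    /**
     * Parses one data line using the header line to locate the two columns.
     * Assumption: comma-separated values with no quoted commas.
     */
    static RecordFields parse(String headerLine, String dataLine) {
        List<String> header = Arrays.asList(headerLine.trim().split("\\s*,\\s*"));
        String[] values = dataLine.trim().split("\\s*,\\s*", -1);
        int seasonIdx = header.indexOf("seasons");
        int stationIdx = header.indexOf("stations");
        if (seasonIdx < 0 || stationIdx < 0) {
            throw new IllegalArgumentException("header lacks a seasons or stations column");
        }
        if (values.length != header.size()) {
            throw new IllegalArgumentException("malformed record: " + dataLine);
        }
        // Normalise case so "Summer" and "summer" become the same key later.
        return new RecordFields(
                values[seasonIdx].toLowerCase(Locale.ROOT),
                values[stationIdx].toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Invented sample row; the real file's date format and column order may differ.
        String header = "dteday,seasons,yr,mnth,hr,weekday,weathersit,casual,stations";
        String line = "2011-01-01,winter,2011,1,0,6,1,3,chinatown";
        RecordFields r = parse(header, line);
        System.out.println(r.season + " " + r.station);
    }
}
```

Looking columns up by header name rather than by fixed position means the remaining fields can simply be ignored, whichever order they appear in.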
Your tasks:
1. Set up a Hadoop framework and justify your reason for deploying such a framework (e.g., standalone).
2. Use ONLY the seasons and stations data fields; the other data fields can be deleted or ignored.
3. Write a Java program for a MapReduce job that counts the number of occurrences of each season in the file (e.g., spring 3, summer 10, winter 30).
4. Use the MapReduce job to calculate the number of times that each bicycle station (e.g., chinatown) has been used in the file.
5. Use the MapReduce job to show your output result in alphabetical order (a to z).
6. Comment on how this analysis could be extended to consider larger datasets (e.g., 10 years of bicycle rentals with 1 Terabyte of data).
7. Briefly describe how to use your Hadoop MapReduce skills to solve another problem (choose your own case study and include a MapReduce data flow diagram).
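The counting and ordering tasks above follow the classic word-count data flow: map each record to a (key, 1) pair, group the pairs by key, then sum each group. The sketch below models that flow in plain Java so it runs without a Hadoop installation; it is an illustration of the pattern, not a Hadoop job, and the class name `CountSketch` is hypothetical. In a real job the map and reduce steps would live in Hadoop `Mapper` and `Reducer` subclasses, and the framework itself sorts keys before they reach a reducer, which the `TreeMap` stands in for here.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of the map, shuffle/sort, and reduce phases of a counting job. */
final class CountSketch {

    /** Map phase: emit (key, 1) for every input value. */
    static List<Map.Entry<String, Integer>> map(List<String> values) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String v : values) {
            String key = v.trim().toLowerCase(Locale.ROOT);
            pairs.add(new AbstractMap.SimpleEntry<>(key, 1));
        }
        return pairs;
    }

    /** Shuffle phase: group values by key; TreeMap keeps the keys sorted. */
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum the ones collected for each key. */
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> grouped) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int one : e.getValue()) {
                sum += one;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    /** Runs all three phases over a list of season or station names. */
    static TreeMap<String, Integer> count(List<String> values) {
        return reduce(shuffle(map(values)));
    }

    public static void main(String[] args) {
        List<String> seasons = Arrays.asList("winter", "summer", "winter", "fall");
        System.out.println(count(seasons)); // prints {fall=1, summer=1, winter=2}
    }
}
```

The same `count` call works for station names, since only the key changes. Because the grouping structure is sorted by key, the a-to-z ordering falls out of the shuffle step rather than needing a separate sort.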
Your output report:
The output from this coursework is a brief report of no more than two A4 pages (excluding any appendices), in 12-point font with margins of no less than 2 cm, that should have sections that describe:
1. Middleware configuration: How you configured the Hadoop middleware (with screen prints), including a description of your Hadoop cluster and your rationale for this choice.
2. Data Analytic Design: How you designed the MapReduce job, including your rationale for your design (briefly state and draw a MapReduce data flow model for your work).
3. Results: The results obtained (excluding any discussion).
4. Discussion of Results.
5. Conclusions and Recommendations, including discussion of how you would perform the task if it were to be undertaken at larger scale.
6. A listing of the Java program for your MapReduce jobs in the appendix.
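For the larger-scale discussion, one idea worth considering is local pre-aggregation: each input split is counted where it is read, and only the partial counts are merged at the end. This is the role a combiner plays in Hadoop (registered with `Job.setCombinerClass`). The sketch below models the idea in plain Java under the assumption that the data has already been divided into splits; the class name `CombinerSketch` and the sample splits are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of per-split pre-aggregation followed by a final merge. */
final class CombinerSketch {

    /** Counts keys within one input split, as a combiner would for one mapper's output. */
    static Map<String, Long> combine(List<String> split) {
        Map<String, Long> partial = new TreeMap<>();
        for (String key : split) {
            partial.merge(key, 1L, Long::sum);
        }
        return partial;
    }

    /** Merges the per-split partial counts into final totals, as the reducer would. */
    static TreeMap<String, Long> reduce(List<Map<String, Long>> partials) {
        TreeMap<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            for (Map.Entry<String, Long> e : partial.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> splitA = Arrays.asList("chinatown", "logan", "chinatown");
        List<String> splitB = Arrays.asList("logan", "lincoln");
        TreeMap<String, Long> totals =
                reduce(Arrays.asList(combine(splitA), combine(splitB)));
        System.out.println(totals); // prints {chinatown=2, lincoln=1, logan=2}
    }
}
```

Pre-aggregation is valid here because addition is associative and commutative, so summing partial sums gives the same totals as summing the raw ones. The amount of data moved between the phases then depends on the number of distinct keys per split rather than on the number of records.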
