[SOLVED] python Java hadoop algorithm Description


INF553 Spring 2018

Assignment 4 Community Detection

Deadline: 04/09/2018 11:59 PM PST

Assignment Overview

In this assignment you are asked to implement the Girvan-Newman algorithm using the Spark
Framework in order to detect communities in the graph. You will use only the video_small_num.csv
dataset in order to find users who have similar product tastes. The goal of this assignment is to
help you understand how to use the Girvan-Newman algorithm to detect communities in an
efficient way by programming it within a distributed environment.

Environment Requirements

Python: 2.7 Scala: 2.11 Spark: 2.2.1
IMPORTANT: We will use these versions to compile and test your code. If you use other versions,
there will be a 20% penalty since we will not be able to grade it automatically.
You can only use Spark RDD.

Write your own code!

For this assignment to be an effective learning experience, you must write your own code! I
emphasize this point because you will be able to find Python implementations of most or
perhaps even all of the required functions on the web. Please do not look for or at any such
code! Do not share code with other students in the class!!

Submission Details

For this assignment you will need to turn in a Python, Java, or Scala program depending on your
language of preference.

Your submission must be a .zip file with name: _ _hw4.zip. The structure
of your submission should be identical to the one shown below. The Firstname_Lastname_Description.pdf
file contains helpful instructions on how to run your code along with other necessary information
as described in the following sections. The OutputFiles directory contains the deliverable output
files for each problem and the Solution directory contains your source code.

Datasets

We are continuing to use Amazon Review data. This time we use a subset of the Amazon Instant Video
category. We have already converted the string ids of users and products to integers for your
convenience. You should download one file from Blackboard:
1. video_small_num.csv

Construct Graph

Each node represents a user. Each edge is generated in the following way:
In video_small_num.csv, count the number of times that two users rated the same product. If
that count is greater than or equal to 7, there is an edge between the two users.
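
The edge rule above can be illustrated with a minimal sketch in plain Python (not Spark, so it only shows the logic and is not a submission-ready solution). The function name build_graph and the input shape, an iterable of (user_id, product_id) pairs, are assumptions made for illustration; the sketch also assumes that repeated ratings of the same product by the same user count once.

```python
from collections import defaultdict
from itertools import combinations

THRESHOLD = 7  # from the assignment text: edge if the co-rated count is >= 7


def build_graph(ratings, threshold=THRESHOLD):
    """Return {user: set(neighbors)} from (user_id, product_id) pairs.

    Hypothetical helper: the real input is video_small_num.csv, whose
    column layout is not described here, so parsing is left out.
    """
    products_by_user = defaultdict(set)
    for user, product in ratings:
        products_by_user[user].add(product)

    graph = defaultdict(set)
    # Compare every unordered pair of users once.
    for u, v in combinations(sorted(products_by_user), 2):
        common = products_by_user[u] & products_by_user[v]
        if len(common) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return dict(graph)
```

Users that end up with no edge do not appear in the returned dictionary. Comparing all user pairs is quadratic in the number of users, which is acceptable for a small illustration only.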

Task 1: Betweenness (50%)

You are required to implement the Girvan-Newman Algorithm to find the betweenness of each
edge in the graph. The betweenness function should be calculated only once from the
original graph.
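
As a rough illustration of the edge-betweenness step, the sketch below runs a breadth-first search from every node, counts shortest paths on the way down, and distributes credit on the way back up. It is plain Python rather than Spark RDD code, and the name edge_betweenness and the {node: set(neighbors)} input shape are assumptions. It also assumes the usual convention of halving the totals, because each shortest path is seen from both of its endpoints.

```python
from collections import defaultdict, deque


def edge_betweenness(graph):
    """graph: {node: set(neighbors)}, undirected.

    Returns {(u, v): betweenness} with u < v (hypothetical helper).
    """
    betweenness = defaultdict(float)
    for root in graph:
        # Top-down pass: BFS levels and number of shortest paths per node.
        level = {root: 0}
        num_paths = defaultdict(float)
        num_paths[root] = 1.0
        parents = defaultdict(list)
        order = []
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in graph[node]:
                if nbr not in level:
                    level[nbr] = level[node] + 1
                    queue.append(nbr)
                if level[nbr] == level[node] + 1:
                    num_paths[nbr] += num_paths[node]
                    parents[nbr].append(node)

        # Bottom-up pass: every node starts with credit 1 and passes it
        # to its parents in proportion to their shortest-path counts.
        credit = dict((node, 1.0) for node in order)
        for node in reversed(order):
            for parent in parents[node]:
                share = credit[node] * num_paths[parent] / num_paths[node]
                edge = (min(node, parent), max(node, parent))
                betweenness[edge] += share
                credit[parent] += share

    # Each shortest path was counted once from each endpoint.
    return dict((edge, value / 2.0) for edge, value in betweenness.items())
```

For a path graph 1-2-3, both edges carry two shortest paths, so the sketch yields 2.0 for each edge.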

Execution Example

The first argument passed to your program (in the below execution) is the path of the
video_small_num.csv file (e.g. spark-2.2.1-bin-hadoop2.7/HW4/video_small_num.csv). The
second input is the output path (the output path is the directory of your output file, not including
the file name, e.g. spark-2.2.1-bin-hadoop2.7/HW4/). Below we present examples of how you can
run your program with spark-submit both when your application is a Java/Scala program and when
it is a Python script.

A. Example of running a Java/Scala application with spark-submit:
Notice that the class argument of spark-submit specifies the main class of your
application and it is followed by the jar file of the application.

You should use Betweenness as your class name for this task.

B. Example of running a Python application with spark-submit:

Result format:
Each line is a tuple in the format (userId1, userId2, betweenness value). The file is ordered by
the first element in ascending order and, if the first element is the same, by the second
element. An example is as follows: (the example just shows the format, is NOT a solution)
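
The ordering rule above can be sketched as follows. The helper name format_betweenness is hypothetical, and the exact spacing and number formatting inside each tuple are assumptions, since the example output is not reproduced in this text. The sketch also assumes each edge key is stored as (userId1, userId2) with userId1 < userId2.

```python
def format_betweenness(betweenness):
    """Turn {(user1, user2): value} into sorted output lines.

    Sorting the items orders by the first user id, then by the second,
    because the dictionary keys are (user1, user2) tuples.
    """
    lines = []
    for (user1, user2), value in sorted(betweenness.items()):
        lines.append("(%d,%d,%s)" % (user1, user2, value))
    return lines
```

Joining the returned lines with newline characters and writing them once keeps the result in a single output file.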

Runtime Requirement:
<60 sec

Task 2: Detect Community (50%)

You are required to implement betweenness and modularity in this task. You also need to divide
the graph into suitable communities, which reaches the highest modularity.
When you use the following formula to calculate the modularity of partition S of G, you should be
aware that Aij should remain the same as in the original graph (i.e. Aij does not change while you
delete any edge).

Execution Example

The first argument passed to your program (in the below execution) is the path of the
video_small_num.csv file (e.g. spark-2.2.1-bin-hadoop2.7/HW4/video_small_num.csv). The
second input is the output path (the output path is the directory of your output file, not including
the file name, e.g. spark-2.2.1-bin-hadoop2.7/HW4/). Below we present examples of how you can
run your program with spark-submit both when your application is a Java/Scala program and when
it is a Python script.

A. Example of running a Java/Scala application with spark-submit:
Notice that the class argument of spark-submit specifies the main class of your
application and it is followed by the jar file of the application.

You should use Community as your class name for the task.

B. Example of running a Python application with spark-submit:

Result format:
Each list is a community containing userIds. In each list, the userIds should be in ascending
order. All lists should be ordered by the first userId in each list in ascending order. An
example is as follows: (the example just shows the format, is NOT a solution)

Runtime Requirement:
<60 sec

Description File

Please include the following content in your description file:
1. Mention the Spark version and Python version
2. Describe how to run your program for both tasks

Submission Details

Your submission must be a .zip file with name: _ _hw4.zip
Please include all the files in the right directory as follows:
1. A description file: _ _description.pdf
2. All Scala scripts:

_ _task1_Betweenness.scala
_ _task2_Community.scala

3. A jar package for all Scala files: _ _hw4.jar
If you use Scala for all tasks, please make all *.scala files into ONLY ONE
_ _hw4.jar file and strictly follow the class names mentioned above.
And DO NOT include any data or unrelated libraries in your jar.

4. If you use Python, then all Python scripts:
_ _task1_Betweenness.py
_ _task2_Community.py

5. Required result files for tasks 1 & 2:
_ _Betweenness.txt
_ _Community.txt

Grading Criteria:

1. If your programs cannot run with the commands you provide, your submission will be graded
based on the result files you submit, and there will be an 80% penalty.

2. If the files generated are not sorted based on the specifications, there will be a 20% penalty.
3. If your program generates more than one file, there will be a 20% penalty.
4. If the runtime of your program exceeds the runtime requirement, there will be a 20% penalty.
5. If you don't provide the source code, especially the Scala scripts, there will be a 20% penalty.
6. You can use your free 5-day extension.

7. There will be a 10% bonus if you use Scala for the entire assignment.
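
For Task 2, the modularity check and the community listing can be sketched in plain Python as below. The modularity formula itself is not reproduced in this text, so the sketch assumes the standard definition Q = (1 / 2m) * sum over communities s of sum over i, j in s of (Aij - ki * kj / 2m), with Aij taken from the original graph as the task requires. Taking the degrees ki and the edge count m from the original graph as well is a further assumption, and both function names are hypothetical.

```python
def connected_components(graph):
    """Return the components of {node: set(neighbors)} as sorted lists,
    ordered by the smallest node id in each component."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(sorted(component))
    return components


def modularity(original_graph, communities):
    """Modularity of a partition, using A, k and m from the ORIGINAL graph.

    Assumes the graph has at least one edge, so that 2m is non-zero.
    """
    degree = dict((n, len(nbrs)) for n, nbrs in original_graph.items())
    two_m = float(sum(degree.values()))  # every edge is counted twice
    total = 0.0
    for community in communities:
        for i in community:
            for j in community:
                a_ij = 1.0 if j in original_graph[i] else 0.0
                total += a_ij - degree[i] * degree[j] / two_m
    return total / two_m
```

One possible use: after removing the highest-betweenness edges from a working copy of the graph, call connected_components on the copy and score the resulting partition with modularity against the untouched original, keeping the partition with the highest score. For two triangles joined by a single edge, splitting at that edge gives 5/14, while keeping everything in one community gives 0.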
