INF553 – Spring 2018

Assignment 4: Community Detection

Deadline: 04/09/2018 11:59 PM PST

Assignment Overview

In this assignment you are asked to implement the Girvan-Newman algorithm using the Spark
framework in order to detect communities in a graph. You will use only the video_small_num.csv
dataset in order to find users who have similar product tastes. The goal of this assignment is to
help you understand how to use the Girvan-Newman algorithm to detect communities in an
efficient way by programming it within a distributed environment.

Environment Requirements

Python: 2.7  Scala: 2.11  Spark: 2.2.1
IMPORTANT: We will use these versions to compile and test your code. If you use other versions,
there will be a 20% penalty since we will not be able to grade it automatically.
You can only use Spark RDD.

Write your own code!

For this assignment to be an effective learning experience, you must write your own code! I
emphasize this point because you will be able to find Python implementations of most or
perhaps even all of the required functions on the web. Please do not look for or at any such
code! Do not share code with other students in the class!!

Submission Details

For this assignment you will need to turn in a Python, Java, or Scala program depending on your
language of preference.

Your submission must be a .zip file with name: _ _hw4.zip. The structure
of your submission should be identical to the one shown below. The Firstname_Lastname_Description.pdf
file contains helpful instructions on how to run your code along with other necessary information
as described in the following sections. The OutputFiles directory contains the deliverable output
files for each problem and the Solution directory contains your source code.

Datasets

We are continuing to use Amazon Review data. This time we use a subset of the Amazon Instant Video
category. We have already converted the string ids of users and products to integers for your
convenience. You should download one file from Blackboard:
1. video_small_num.csv

Construct Graph

Each node represents a user. Each edge is generated in the following way:
In video_small_num.csv, count the number of times that two users rated the same product. If
that count is greater than or equal to 7, there is an edge between the two users.
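
The edge rule above can be sketched in plain Python as follows. This is only an illustration of the logic, not a Spark RDD solution: the function name build_graph, the (user_id, product_id) input shape, and the choice to count each co-rated product once are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(ratings, threshold=7):
    """Return an adjacency dict {user: set(neighbours)}.

    ratings   -- iterable of (user_id, product_id) pairs (assumed shape)
    threshold -- minimum number of co-rated products needed for an edge
    """
    # Collect the distinct products rated by each user
    # (assumption: repeated ratings of one product count once).
    products_by_user = defaultdict(set)
    for user, product in ratings:
        products_by_user[user].add(product)

    adjacency = defaultdict(set)
    # Compare every pair of users and keep pairs that share enough products.
    for u, v in combinations(sorted(products_by_user), 2):
        if len(products_by_user[u] & products_by_user[v]) >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return dict(adjacency)
```

Users that end up with no edge do not appear in the returned dictionary. The pairwise comparison is quadratic in the number of users, which is acceptable for a sketch but would need to be distributed in an RDD-based program.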

Task 1: Betweenness (50%)

You are required to implement the Girvan-Newman algorithm to find the betweenness of each
edge in the graph. The betweenness should be calculated only once, from the
original graph.
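
As a rough illustration of what edge betweenness means, the sketch below runs a breadth-first search from every node, counts shortest paths, and pushes credit back up toward the root, which is the usual textbook description of the Girvan-Newman computation. It is single-machine Python rather than Spark RDD code, and the function name edge_betweenness and the adjacency-dict input are assumptions for the example.

```python
from collections import defaultdict, deque


def edge_betweenness(adjacency):
    """Return {(u, v): betweenness} with u < v for an undirected graph."""
    betweenness = defaultdict(float)
    for root in adjacency:
        # Step 1: BFS from root, recording depth, shortest-path counts
        # and the parents of each node in the BFS DAG.
        depth = {root: 0}
        paths = defaultdict(float)
        paths[root] = 1.0
        parents = defaultdict(list)
        order = []
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in adjacency[node]:
                if nbr not in depth:
                    depth[nbr] = depth[node] + 1
                    queue.append(nbr)
                if depth[nbr] == depth[node] + 1:
                    paths[nbr] += paths[node]
                    parents[nbr].append(node)
        # Step 2: visit nodes deepest first; each node starts with credit 1
        # and splits its credit among parents in proportion to path counts.
        credit = {node: 1.0 for node in order}
        for node in reversed(order):
            for parent in parents[node]:
                share = credit[node] * paths[parent] / paths[node]
                edge = (min(node, parent), max(node, parent))
                betweenness[edge] += share
                credit[parent] += share
    # Every shortest path was counted once from each endpoint, so halve.
    return {edge: value / 2.0 for edge, value in betweenness.items()}
```

For the path graph 1-2-3, both edges get a betweenness of 2.0: each edge carries the path between its own endpoints plus the path between nodes 1 and 3.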

Execution Example

The first argument passed to your program (in the below execution) is the path of the
video_small_num.csv file (e.g. “spark-2.2.1-bin-hadoop2.7/HW4/video_small_num.csv”). The
second input is the output path (the output path is the directory of your output file, not including the file
name, e.g. “spark-2.2.1-bin-hadoop2.7/HW4/”). Below we present examples of how you can
run your program with spark-submit, both when your application is a Java/Scala program and when it is a
Python script.

A. Example of running a Java/Scala application with spark-submit:
Notice that the class argument of spark-submit specifies the main class of your
application and it is followed by the jar file of the application.

You should use Betweenness as your class name for this task.

B. Example of running a Python application with spark-submit:

Result format:
Each line is a tuple in the format (userId1, userId2, betweenness value). The file is ordered by
the first element in ascending order and, if the first element is the same, by the second
element. An example is as follows (the example just shows the format; it is NOT a solution):
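
The ordering rule above can be sketched in plain Python as follows. The function name format_betweenness is hypothetical, the input is assumed to be a dict keyed by (userId1, userId2) with userId1 < userId2, and the exact spacing and number formatting of each line are assumptions rather than the graders' required format.

```python
def format_betweenness(betweenness):
    """Render {(u, v): value} as one '(u,v,value)' line per edge.

    Sorting the (u, v) keys orders lines by the first id and then by the
    second id, both ascending.
    """
    lines = []
    for (u, v), value in sorted(betweenness.items()):
        lines.append("({},{},{})".format(u, v, value))
    return "\n".join(lines)
```

Because Python compares tuples element by element, sorting the dictionary items directly gives the required two-level ordering without a custom key.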

Runtime Requirement:
<60 sec

Task 2: Detect Community (50%)

You are required to implement betweenness and modularity in this task. You also need to divide
the graph into suitable communities so that the partition reaches the highest modularity.
When you use the following formula to calculate the modularity of partition S of G, you should be
aware that Aij should remain the same as in the original graph (i.e. Aij does not change while you
delete any edge).

Execution Example

The first argument passed to your program (in the below execution) is the path of the
video_small_num.csv file (e.g. “spark-2.2.1-bin-hadoop2.7/HW4/video_small_num.csv”). The
second input is the output path (the output path is the directory of your output file, not including the file
name, e.g. “spark-2.2.1-bin-hadoop2.7/HW4/”). Below we present examples of how you can
run your program with spark-submit, both when your application is a Java/Scala program and when it is a
Python script.

A. Example of running a Java/Scala application with spark-submit:
Notice that the class argument of spark-submit specifies the main class of your
application and it is followed by the jar file of the application.

You should use Community as your class name for the task.

B. Example of running a Python application with spark-submit:

Result format:
Each list is a community containing userIds. In each list, the userIds should be in ascending
order. All lists should be ordered by the first userId in each list in ascending order. An
example is as follows (the example just shows the format; it is NOT a solution):

Runtime Requirement:
<60 sec

Description File

Please include the following content in your description file:
1. Mention the Spark version and Python version
2. Describe how to run your program for both tasks

Submission Details

Your submission must be a .zip file with name: _ _hw4.zip
Please include all the files in the right directory as follows:
1. A description file: _ _description.pdf
2. All Scala scripts:

_ _task1_Betweenness.scala
_ _task2_Community.scala

3. A jar package for all Scala files: _ _hw4.jar
If you use Scala for all tasks, please package all *.scala files into ONLY ONE
_ _hw4.jar file and strictly follow the class names mentioned above.
And DO NOT include any data or unrelated libraries in your jar.

4. If you use Python, then all Python scripts:
_ _task1_Betweenness.py
_ _task2_Community.py

5. Required result files for tasks 1 & 2:
_ _Betweenness.txt
_ _Community.txt

Grading Criteria:

1. If your programs cannot run with the commands you provide, your submission will be graded
based on the result files you submit, and there will be an 80% penalty.
2. If the files generated are not sorted based on the specifications, there will be a 20% penalty.
3. If your program generates more than one file, there will be a 20% penalty.
4. If the runtime of your program exceeds the runtime requirement, there will be a 20% penalty.
5. If you don’t provide the source code, especially the Scala scripts, there will be a 20% penalty.
6. You can use your free 5-day extension.
7. There will be a 10% bonus if you use Scala for the entire assignment.
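
For the modularity step in Task 2, a minimal sketch is given here. It assumes the formula referred to in the task is the standard Newman modularity, Q = (1/2m) * sum over communities of sum over i, j in the community of (Aij - ki*kj/2m), with Aij taken from the original graph as the task requires. Taking the degrees ki and the edge count m from the original graph as well is an assumption, as are the function names modularity and connected_components.

```python
def connected_components(adjacency):
    """Return components as sorted lists, ordered by their smallest node."""
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        # Depth-first traversal collecting every node reachable from start.
        while stack:
            node = stack.pop()
            component.append(node)
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(sorted(component))
    return components


def modularity(original_adjacency, communities):
    """Modularity of a partition, using A_ij from the ORIGINAL graph.

    Assumption: degrees k_i and edge count m also come from the original
    graph; the source only states this explicitly for A_ij.
    """
    m = sum(len(nbrs) for nbrs in original_adjacency.values()) / 2.0
    if m == 0:
        return 0.0
    total = 0.0
    for community in communities:
        for i in community:
            k_i = len(original_adjacency.get(i, ()))
            for j in community:
                k_j = len(original_adjacency.get(j, ()))
                a_ij = 1.0 if j in original_adjacency.get(i, ()) else 0.0
                total += a_ij - k_i * k_j / (2.0 * m)
    return total / (2.0 * m)
```

One way to use these helpers: after removing the highest-betweenness edge from a working copy of the graph, call connected_components on the copy and score the resulting partition with modularity against the untouched original adjacency, keeping the partition with the highest score. The component lists come back already sorted in the order the community result format asks for.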
