CSE2/CSE5ALG Summer 2016/17
Supplementary Assignment 1
Assessment: This assignment is a supplementary hurdle assignment. It is marked pass or fail only. If you pass this assignment you will satisfy the hurdle requirement.
Due Date: Monday 20th Mar 2017, 10 AM
Delays caused by computer downtime cannot be accepted as a valid reason for a late submission without penalty (unless the downtime starts before and ends after the submission deadline, in which case an extension will be granted only for the portion of the outage after the submission time). Students must plan their work to allow for both scheduled and unscheduled downtime. Penalties are applied to late assignments (accepted up to 5 days after the due date only; after 5 days you will receive 0). See the departmental Student Handbook for details.
Copying, Plagiarism: Plagiarism is the submission of somebody else's work in a manner that gives the impression that the work is your own. The Department of Computer Science and Computer Engineering treats academic misconduct seriously. When it is detected, penalties are strictly imposed. Refer to the subject learning guide for further information and strategies you can use to avoid a charge of academic misconduct.
Submission Details: Submit all the Java/C++ files that are required for the tasks described in this handout. The code has to run under Unix on the cse2-2016.cs.latrobe.edu.au machine. Submit your files as an attached zip archive to [email protected] with the subject ALG sup assignment.
You can submit the same filename as many times as you like before the assignment deadline; the last submitted version of your assignment will be the only one that is assessed.
Testing Platform: While you are free to develop the code for this assignment on any operating system, your solution must run on the cse2-2016.cs.latrobe.edu.au server.
The computational demand of this assignment may result in you being banned from the latcs servers if you test on the regular servers; please use only cse2-2016.cs.latrobe.edu.au.
We should be able to compile your classes with the simple command javac *.java or using a makefile, and execute your programs with a simple command,
e.g. java supOne inputOne.txt inputTwo.txt outputOne.txt outputTwo.txt outputThree.txt.
Return of Assignment: Assignments will be returned within one week of the final submission date, so that you have your results prior to the semester one census date.
Assignment Objectives
To understand various data structures and searching and sorting techniques;
To analyze these techniques in relation to a particular problem;
To implement these techniques within a program.
Limitations
For this assignment, the only generic containers you are allowed to use are arraylists and vectors. All other data structures, sorting, and searching algorithms must be programmed by you.
Problem Description
A business has provided some basic anonymized data on friends from their social network database. You are required to read in this data and produce some statistics on the sizes of friendship circles based on some given criteria.
Assignment Requirements
You are to write a program that takes 5 command line arguments, two input file names and three output file names. The program needs to be named supOne and be able to be compiled and run using the following:
javac *.java
java supOne inputOne.txt inputTwo.txt outputOne.txt outputTwo.txt outputThree.txt
Note: the actual file names will be different
Sample input and output files are available from LMS.
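As a rough guide only, one possible skeleton for supOne that simply picks up the five arguments in the order above might look as follows (the comments describe the assumed order and are illustrative, not part of the required output):

public class supOne {
    public static void main(String[] args) throws Exception {
        if (args.length != 5) {
            System.err.println("Usage: java supOne inputOne inputTwo outputOne outputTwo outputThree");
            return;
        }
        String inputOne    = args[0]; // friendship pairs
        String inputTwo    = args[1]; // the two query lines
        String outputOne   = args[2]; // basic statistics
        String outputTwo   = args[3]; // degrees-of-separation listing
        String outputThree = args[4]; // sampled average circle size
        // ... read the data, perform the three tasks, write the three output files ...
    }
}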
Input file one contains a pair of user ids on each line separated by a comma.
ID, ID
ID, ID
ID, ID
Each line represents two users who are friends with each other. The relationship goes both ways: if user one is friends with user two, then user two is also friends with user one. The file may contain duplicate data; you only need to track one relationship between a pair of users. That is, there is no such thing as being double or triple friends, even if the pair of IDs appears multiple times in the input file.
The first task is to read in this file and output to the first output file (command line argument 3) some basic information about the input data as follows:
The total number of records in the file = ###
The minimum number of friends = ###
The average number of friends = ###.###
The maximum number of friends = ###
The total number of unique Ids = ###
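As a sketch only (the method and variable names below are illustrative, and the exact decimal formatting should be checked against the sample output files), these statistics could be computed and written along the following lines, assuming the per-user friend counts have already been derived from your own duplicate-free adjacency structure:

import java.io.*;

public class StatsSketch {
    // friendCounts holds the number of distinct friends for each unique ID;
    // totalRecords is the number of lines read from input file one.
    public static void writeStats(String outputOne, int totalRecords,
                                  int[] friendCounts) throws IOException {
        int min = Integer.MAX_VALUE, max = 0;
        long sum = 0;
        for (int c : friendCounts) {
            if (c < min) min = c;
            if (c > max) max = c;
            sum += c;
        }
        double avg = (double) sum / friendCounts.length;
        try (PrintWriter out = new PrintWriter(new FileWriter(outputOne))) {
            out.println("The total number of records in the file = " + totalRecords);
            out.println("The minimum number of friends = " + min);
            out.printf("The average number of friends = %.3f%n", avg);
            out.println("The maximum number of friends = " + max);
            out.println("The total number of unique Ids = " + friendCounts.length);
        }
    }
}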
Now we come to the second input file. This second file will contain two lines that will look as follows:
3495867, 4
10, 3
This second task involves investigating friendship circles based on degrees of separation. Degrees of separation are the number of friend relations it takes to get from one user to another: direct friends have a separation of 1, friends of friends a separation of 2, friends of friends of friends a separation of 3, and so on.
The first line of the second input file will contain a user ID and a number separated by a comma. Starting with this initial user ID, you are required to compile the complete list, and count the number, of unique user IDs to a degree of separation defined by the second number. This list is to be output to the second output file (the 4th command line argument) as follows (based on the sample above):
Initial user: 3495867
Degrees of separation: 4
Total number of unique IDs: ##
1 degrees of separation:
ID, ID, ID, ID,
2 degrees of separation:
ID, ID, ID, ID, ID, ID, ID, ID, ID, ID,
3 degrees of separation:
ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID,
N degrees of separation:
ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID, ID,
Note:
The initial user ID is included in the count,
and no user ID should appear twice; i.e., the initial user ID and any ID in one of the degree-of-separation lists should not appear in any of the other lists.
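One possible approach (a sketch only, assuming user IDs have already been mapped to array indices 0..n-1 and the friendships stored in an ArrayList-based adjacency list) is a level-by-level breadth-first traversal, which naturally produces one list per degree of separation and visits each ID at most once:

import java.util.ArrayList;

public class CircleSketch {
    // Returns one list of user indices per degree of separation, up to maxDegree.
    public static ArrayList<ArrayList<Integer>> levels(
            ArrayList<ArrayList<Integer>> adj, int start, int maxDegree) {
        boolean[] visited = new boolean[adj.size()];
        visited[start] = true;                    // the initial user is counted but not listed
        ArrayList<ArrayList<Integer>> result = new ArrayList<ArrayList<Integer>>();
        ArrayList<Integer> frontier = new ArrayList<Integer>();
        frontier.add(start);
        for (int degree = 1; degree <= maxDegree && !frontier.isEmpty(); degree++) {
            ArrayList<Integer> next = new ArrayList<Integer>();
            for (int u : frontier) {
                for (int v : adj.get(u)) {
                    if (!visited[v]) {            // each ID appears in at most one list
                        visited[v] = true;
                        next.add(v);
                    }
                }
            }
            result.add(next);                     // IDs at exactly this degree of separation
            frontier = next;
        }
        return result;                            // total unique IDs = 1 + sum of the list sizes
    }
}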
The second line of the file contains two numbers. You are required to estimate the average friend circle size (i.e., number of unique IDs/friends) based on a random sample of x individuals and a friendship circle of y degrees of separation, where x and y are the first and second numbers on this second line of the second input file. That is, you will need to calculate the total number of unique IDs up to y degrees of separation (as in the previous task), but perform this calculation a total of x times and take the average. The sample of users used to estimate the average should be random, so the estimate will change slightly each time the program is run, as long as the sample size is less than the total number of unique IDs. The result of this task should be output to the file named by the final command line argument. The output should be formatted as follows:
Average friend circle size: ###.####
Degrees of separation: ##
Number of users sampled: ###
Users in random sample:
ID, ID, ID, ID, ID, ID, ID, ID, ID,
Note: The initial user ID is included in the friend circle when calculating average friend circle sizes.
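A sketch of this sampling step, reusing the level-by-level traversal above, might look like the following (whether the x sampled users must be distinct is best confirmed against the sample files; the version below simply draws with replacement):

import java.util.ArrayList;
import java.util.Random;

public class SampleSketch {
    public static double averageCircleSize(ArrayList<ArrayList<Integer>> adj,
                                           int sampleSize, int degrees,
                                           ArrayList<Integer> sampledUsers) {
        Random rng = new Random();
        long total = 0;
        for (int i = 0; i < sampleSize; i++) {
            int start = rng.nextInt(adj.size());          // random user index
            sampledUsers.add(start);                      // remember who was sampled for the output
            int size = 1;                                 // the initial user is included
            for (ArrayList<Integer> level : CircleSketch.levels(adj, start, degrees)) {
                size += level.size();
            }
            total += size;
        }
        return (double) total / sampleSize;
    }
}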
Provided Files
The .dat files
Each of these zip folders contains a large set of sample files, both input and output.
For each data set there is only one inputOne file and only one example outputOne file. These are identified by the data size prefix only, e.g. smallExampleInputOne and smallExampleOutputOne.
There are many different examples of inputTwo files and their associated outputTwo and outputThree files. These are identified by the data size prefix and an identifying number, e.g. smallExample-26-InputTwo, smallExample-26-OutputTwo, and smallExample-26-OutputThree.
You should run your program on the example input files (writing to different output file names) and compare your output files to the ones supplied.
The .sh file
This is a bash script file that is similar to the one that I will use to evaluate your final submission. Your assignment will be run using only these commands; no user input will be entered. If your assignment fails to run using this script you will receive 0 marks.
Marking (out of 100)
50 marks are required to receive a pass for this hurdle.
Program compiles and runs using provided bash script. (5)
Program does not crash. (5)
Program accepts any legal path-file names, including relative paths, and does not append or alter these file names. (5)
Output matches assignment definition (exactly). (15)
The optimisation-based marks below are allocated by giving full marks if the operation takes less than 1 second, decreasing linearly to 0 marks if it takes more than 60 seconds.
The output is correct and the small data set is processed within 1 to 60 seconds. (15 to 0)
The output is correct and the medium data set is processed within 1 to 60 seconds. (10 to 0)
The output is correct and the large data set is processed within 1 to 60 seconds. (20 to 0)
The output is correct and the extra large data set is processed within 1 to 60 seconds. (25 to 0)