COMP2521: Assignment 2 Social Network Analysis
A notice will be posted on the class web page after each major revision. Please check the class notice board and this assignment page frequently for changes (see the Change log). The specification may change.
FAQ:
You should check the Ass2 FAQ; it may offer answers to your queries!
Change log:
No entries as yet!
Objectives
to implement graph-based data analysis functions (ADTs) to mine a given social network.
to give you further practice with C and data structures (Graph ADT).
Admin
Marks: 20 marks (scaled to 14 marks towards total course mark)
Individual Assignment: This assignment is an individual assignment.
Due: 08:00pm Friday 22 November 2019
Late Penalty: 2 marks per day off the ceiling. Last day to submit this assignment is 8pm Monday 25 November 2019, of course with late penalty.
Submit: TBA
Aim
In this assignment, your task is to implement graph-based data analysis functions (ADTs) to mine a given social network, for example to detect influencers, followers, communities, etc. in a given social network. You should start by reading the Wikipedia entries on these topics. Later I will also discuss these topics in the lecture.
Social network analysis
Centrality
The main focus of this assignment is to calculate measures that could identify, say, influencers, followers, etc., and also to discover possible communities in a given social network.
Dos and Don'ts!
Please note that,
For this assignment you can use source code that is available as part of the course material (lectures, exercises, tutes and labs). However, you must properly acknowledge it in your solution.
All the required code for each part must be in the respective .c file.
You may implement additional helper functions in your files; please declare them as static functions.
After implementing Dijkstra.h, you can use this ADT for other tasks in the assignment. However, please note that for our testing, we will use/supply our implementation of Dijkstra.h. So your programs MUST NOT use any implementation-related information that is not available in the respective header files (.h files). In other words, you can only use information available in the corresponding .h files.
Your program must not have any main function in any of the submitted files.
Do not submit any other files. For example, you do not need to submit your modified test files or .h files. If you have not implemented any part, you must still submit an empty file with the corresponding file name.

Provided Files
We are providing implementations of Graph.h and PQ.h. You can use them to implement all three parts. However, your programs MUST NOT use any implementation-related information that is not available in the respective header files (.h files). In other words, you can only use information available in the corresponding .h files.
Also note:
all edge weights will be greater than zero.
we will not be testing reflexive and/or self-loop edges.
we will not be testing the case where the same edge is inserted twice.
Download files:
Ass2files.zip
Ass2Testing.zip
Part1: Dijkstra's algorithm
In order to discover, say, influencers, we need to repeatedly find shortest paths between all pairs of nodes. In this section, you need to implement Dijkstra's algorithm to discover shortest paths from a given source to all other nodes in the graph. The function offers one important additional feature: it keeps track of multiple predecessors for a node on shortest paths from the source, if they exist. In the following example, while discovering shortest paths from source node 0, we discovered that there are two possible shortest paths from node 0 to node 1 (0 -> 1 or 0 -> 2 -> 1), so node 1 has two possible predecessors (node 0 or node 2) on possible shortest paths, as shown below.
We will discuss this point in detail in a lecture. The basic idea is that the array of lists (pred) keeps one linked list per node, and stores multiple predecessors (if they exist) for that node on shortest paths from a given source. In other words, for a given source, each linked list in pred offers possible predecessors for the corresponding node.
Node 0
  Distance: 0: X   1: 2   2: 1
  Preds:    0: NULL
            1: [0] -> [2] -> NULL
            2: [0] -> NULL
Node 1
  Distance: 0: 2   1: X   2: 3
  Preds:    0: [1] -> NULL
            1: NULL
            2: [0] -> NULL
Node 2
  Distance: 0: 3   1: 1   2: X
  Preds:    0: [1] -> NULL
            1: [2] -> NULL
            2: NULL
The function returns a ShortestPaths structure with the required information (i.e. distance array, predecessor arrays, source and number of nodes in the graph).
Your task: In this section, you need to implement the following file:
Dijkstra.c that implements all the functions defined in Dijkstra.h.
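The predecessor bookkeeping described in this part can be sketched as follows. This is only an illustration under stated assumptions, not the interface from Dijkstra.h: the names Pred, addPred, clearPreds and shortestPaths are hypothetical, the graph is a fixed-size adjacency matrix rather than the provided Graph ADT, and a simple O(V^2) scan stands in for the PQ ADT. The key point is the relaxation step: a strictly shorter path restarts the predecessor list, while an equally short path appends one more predecessor.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define NV 3 /* number of vertices in this sketch (hypothetical) */

/* Hypothetical predecessor list cell; the real type is defined in Dijkstra.h. */
typedef struct Pred {
    int v;
    struct Pred *next;
} Pred;

/* Free a whole predecessor list; returns NULL for easy reassignment. */
static Pred *clearPreds(Pred *p) {
    while (p != NULL) {
        Pred *next = p->next;
        free(p);
        p = next;
    }
    return NULL;
}

/* Push v on the front of a predecessor list. */
static Pred *addPred(Pred *head, int v) {
    Pred *cell = malloc(sizeof *cell);
    assert(cell != NULL);
    cell->v = v;
    cell->next = head;
    return cell;
}

/* Dijkstra over an adjacency matrix w, where w[u][v] == 0 means "no edge"
 * (all real weights are > 0). Fills dist[] and pred[]; unreachable
 * vertices keep dist INT_MAX (a choice of this sketch) and an empty list. */
void shortestPaths(int w[NV][NV], int src, int dist[NV], Pred *pred[NV]) {
    bool done[NV] = { false };
    for (int i = 0; i < NV; i++) {
        dist[i] = INT_MAX;
        pred[i] = NULL;
    }
    dist[src] = 0;
    for (int iter = 0; iter < NV; iter++) {
        /* pick the closest vertex that is reachable and not yet finalised */
        int u = -1;
        for (int i = 0; i < NV; i++) {
            if (done[i] || dist[i] == INT_MAX) continue;
            if (u == -1 || dist[i] < dist[u]) u = i;
        }
        if (u == -1) break; /* everything left is unreachable */
        done[u] = true;
        for (int v = 0; v < NV; v++) {
            if (w[u][v] == 0 || done[v]) continue;
            int alt = dist[u] + w[u][v];
            if (alt < dist[v]) {
                /* strictly shorter path: forget the old predecessors */
                dist[v] = alt;
                pred[v] = addPred(clearPreds(pred[v]), u);
            } else if (alt == dist[v]) {
                /* equally short path: record one more predecessor */
                pred[v] = addPred(pred[v], u);
            }
        }
    }
}
```

Run on the three-node example above (edges 0 -> 1 with weight 2, 0 -> 2 with weight 1, 2 -> 1 with weight 1, 1 -> 0 with weight 2) with source 0, the sketch records both 0 and 2 as predecessors of node 1. The order of cells within a list depends on the insertion strategy and is not meant to match any required output format.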
Part2: Centrality Measures for Social Network Analysis

Centrality measures play a very important role in analysing a social network. For example, nodes with a higher betweenness measure often correspond to influencers in the given social network. In this part you will implement two well-known centrality measures for a given directed weighted graph.
Descriptions of some of the following items are from Wikipedia at Centrality, adapted for this assignment.
Closeness Centrality
Closeness centrality (or closeness) of a node is calculated as the reciprocal of the sum of the lengths of the shortest paths between the node x and all other nodes y ($y \in V \land y \neq x$) in the graph. Generally closeness is defined as below,
$$C(x) = \frac{1}{\sum_{y} d(y, x)}$$
where $d(y, x)$ is the shortest distance between vertices x and y.
However, considering most likely we will have isolated nodes, for this assignment you need to use the Wasserman and Faust formula to calculate closeness of a node in a directed graph as described below:
$$C_{WF}(u) = \frac{n-1}{N-1} \cdot \frac{n-1}{\sum_{v=0}^{N-1} d(u, v)}$$
where $d(u, v)$ is the shortest-path distance in a directed graph from vertex u to v, n is the number of nodes that u can reach, and N denotes the number of nodes in the graph.
For further explanations, please read the following document, it may answer many of your questions!
Explanations for Part2
Based on the above, the more central a node is, the closer it is to all other nodes. For more information, see the Wikipedia entry on Closeness centrality.
Betweenness Centrality
The betweenness centrality of a node v is given by the expression:
$$g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
where $\sigma_{st}$ is the total number of shortest paths from node s to node t and $\sigma_{st}(v)$ is the number of those paths that pass through v.
For this assignment, use the following approach to calculate normalised betweenness centrality. It is easier, and also avoids zero as the denominator (for n > 2).
$$\mathrm{normal}(g(v)) = \frac{1}{(n-1)(n-2)} \, g(v)$$
where n represents the number of nodes in the graph.
For further explanations, please read the following document, it may answer many of your questions!
Explanations for Part2
For more information, see the Wikipedia entry on Betweenness centrality.
Your task: In this section, you need to implement the following file:
CentralityMeasures.c that implements all the functions defined in CentralityMeasures.h.
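The two normalisations in this part are plain arithmetic once shortest-path distances and raw betweenness values are available, as the sketch below illustrates. It rests on assumptions that are not stated in the specification: the function names closenessWF and normalisedBetweenness are hypothetical, unreachable vertices are marked with INT_MAX and contribute nothing to the sum, n is taken to count u itself, and an isolated vertex (or a graph with n <= 2) is given the value 0. Check the "Explanations for Part2" document before relying on any of these choices.

```c
#include <assert.h>
#include <limits.h>

/* Wasserman and Faust closeness of vertex u, given dist[v] = shortest-path
 * distance from u to v for every v in 0..N-1.
 * ASSUMPTIONS of this sketch: dist[v] == INT_MAX marks an unreachable vertex,
 * and n (the number of nodes u can reach) counts u itself. */
double closenessWF(const int dist[], int N, int u) {
    assert(N > 0 && u >= 0 && u < N);
    int n = 1;    /* u reaches itself (assumption) */
    long sum = 0; /* sum of distances to the reachable vertices */
    for (int v = 0; v < N; v++) {
        if (v == u || dist[v] == INT_MAX) continue;
        n++;
        sum += dist[v];
    }
    if (N == 1 || sum == 0) return 0.0; /* isolated vertex: choice of this sketch */
    return ((double)(n - 1) / (N - 1)) * ((double)(n - 1) / sum);
}

/* Normalised betweenness g / ((n-1)(n-2)), which is defined for n > 2. */
double normalisedBetweenness(double g, int n) {
    if (n <= 2) return 0.0; /* choice of this sketch: avoid dividing by zero */
    return g / ((double)(n - 1) * (n - 2));
}
```

For a vertex in a three-node graph that reaches both other nodes at distances 2 and 1, these assumptions give (2/2) * (2/3), roughly 0.667.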

Part3: Discovering Community
In this part you need to implement the Hierarchical Agglomerative Clustering (HAC) algorithm to discover communities in a given graph. In particular, you need to implement the Lance-Williams algorithm, as described below. In the lecture we will discuss how this algorithm works, and what you need to do to implement it. You may find the following documents/videos useful for this part:
Hierarchical Clustering (Wikipedia); for this assignment we are interested in only the agglomerative approach.
Brief overview of algorithms for hierarchical clustering, including the Lance-Williams approach (pdf file).
Three videos by Victor Lavrenko, watch in sequence!
Agglomerative Clustering: how it works
Hierarchical Clustering 3: single-link vs. complete-link
Hierarchical Clustering 4: the Lance-Williams algorithm
Distance measure: For this assignment, we calculate the distance between a pair of vertices as follows: let wt represent the maximum edge weight of all available weighted edges between a pair of vertices v and w. The distance d between vertices v and w is defined as d = 1/wt. If v and w are not connected, d is infinite.
For example, if there is one directed link between v and w with weight wt, the distance between them is 1/wt. If there are two links between v and w, we take the maximum of the two weights and the distance between them is 1/max(wt_vw, wt_wv). Please note that one can also consider alternative approaches, like taking the average, min, etc. However, we need to pick one approach for this assignment and we will use the above distance measure.
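The distance measure above can be sketched as a small helper. The function name vertexDistance and the convention that a weight of 0 means "no edge in that direction" are assumptions of this sketch (the specification only guarantees that real edge weights are greater than zero); it does not use the provided Graph ADT.

```c
#include <assert.h>
#include <math.h>

/* Distance between vertices v and w for clustering: 1 / max(wt_vw, wt_wv).
 * wtVW and wtWV are the weights of the edges v->w and w->v, with 0 meaning
 * "no such edge" (assumption). Unconnected pairs get an infinite distance. */
double vertexDistance(double wtVW, double wtWV) {
    assert(wtVW >= 0 && wtWV >= 0);
    double wt = wtVW > wtWV ? wtVW : wtWV;
    return wt > 0 ? 1.0 / wt : INFINITY;
}
```

For instance, a single link of weight 4 gives a distance of 0.25, and two links of weights 2 and 8 give 1/8 = 0.125, so a stronger connection means a smaller distance.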
You need to use the following (adapted) Lance-Williams HAC Algorithm to derive a dendrogram:
Calculate distances between each pair of vertices as described above.
Create clusters for every vertex i, say c_i.
Let Dist(c_i, c_j) represent the distance between clusters c_i and c_j; initially it represents the distance between vertices i and j.
For k = 1 to N-1
    Find the two closest clusters, say c_i and c_j. If there are multiple alternatives, you can select any one of the pairs of closest clusters.
    Remove clusters c_i and c_j from the collection of clusters and add a new cluster c_ij (with all vertices in c_i and c_j) to the collection of clusters.
    Update dendrogram.
    Update distances, say Dist(c_ij, c_k), between the newly added cluster c_ij and the rest of the clusters c_k in the collection using the Lance-Williams formula with the selected method (Single linkage or Complete linkage, see below).
End For
Return dendrogram
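The merge loop above can be sketched on a small distance matrix. This is a simplified illustration, not the interface from LanceWilliamsHAC.h: the names Linkage, mergedDistance and hacMerges are hypothetical, the matrix size is fixed, the merged cluster reuses the slot of c_i, and the sketch only records which pair is merged at each step instead of building a dendrogram. The update uses the simplified forms of the Lance-Williams formula that this specification gives for the two methods: the minimum of the two old distances for single link and the maximum for complete link.

```c
#include <assert.h>
#include <stdbool.h>

#define NC 4 /* number of vertices in this sketch (hypothetical) */

typedef enum { SINGLE_LINK, COMPLETE_LINK } Linkage; /* hypothetical */

/* Simplified Lance-Williams update: min for single link, max for complete. */
static double mergedDistance(double dik, double djk, Linkage method) {
    if (method == SINGLE_LINK) return dik < djk ? dik : djk;
    return dik > djk ? dik : djk;
}

/* Performs the N-1 merge steps on a symmetric distance matrix, which is
 * modified in place. At step k the closest live pair (i, j) is stored in
 * merges[k]; cluster j is folded into slot i and retired. */
void hacMerges(double dist[NC][NC], Linkage method, int merges[NC - 1][2]) {
    bool alive[NC];
    for (int i = 0; i < NC; i++) alive[i] = true;
    for (int k = 0; k < NC - 1; k++) {
        /* find the two closest clusters; ties keep the first pair found */
        int bi = -1, bj = -1;
        for (int i = 0; i < NC; i++) {
            if (!alive[i]) continue;
            for (int j = i + 1; j < NC; j++) {
                if (!alive[j]) continue;
                if (bi == -1 || dist[i][j] < dist[bi][bj]) {
                    bi = i;
                    bj = j;
                }
            }
        }
        assert(bi != -1); /* at least two clusters remain at every step */
        merges[k][0] = bi;
        merges[k][1] = bj;
        /* update distances from the merged cluster to every other cluster */
        for (int c = 0; c < NC; c++) {
            if (!alive[c] || c == bi || c == bj) continue;
            double d = mergedDistance(dist[bi][c], dist[bj][c], method);
            dist[bi][c] = d;
            dist[c][bi] = d;
        }
        alive[bj] = false;
    }
}
```

One reason to prefer the min/max form in code: when both old distances are infinite, the general formula evaluates infinity minus infinity, which is NaN in floating-point arithmetic, whereas min and max still return infinity.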
Lance-Williams formula:
$$\mathrm{Dist}(c_{ij}, c_k) = \alpha_i \, \mathrm{Dist}(c_i, c_k) + \alpha_j \, \mathrm{Dist}(c_j, c_k) + \beta \, \mathrm{Dist}(c_i, c_j) + \gamma \, \left| \mathrm{Dist}(c_i, c_k) - \mathrm{Dist}(c_j, c_k) \right|$$
where $\alpha_i$, $\alpha_j$, $\beta$ and $\gamma$ define the agglomerative criterion.
For the Single link method, these values are: $\alpha_i = 1/2$, $\alpha_j = 1/2$, $\beta = 0$, and $\gamma = -1/2$. Using these values, the formula for the Single link method is:
$$\mathrm{Dist}(c_{ij}, c_k) = \tfrac{1}{2} \mathrm{Dist}(c_i, c_k) + \tfrac{1}{2} \mathrm{Dist}(c_j, c_k) - \tfrac{1}{2} \left| \mathrm{Dist}(c_i, c_k) - \mathrm{Dist}(c_j, c_k) \right|$$
We can simplify the above and rewrite the formula for the Single link method as below:
$$\mathrm{Dist}(c_{ij}, c_k) = \min(\mathrm{Dist}(c_i, c_k), \mathrm{Dist}(c_j, c_k))$$
For the Complete link method, the values are: $\alpha_i = 1/2$, $\alpha_j = 1/2$, $\beta = 0$, and $\gamma = 1/2$. Using these values, the formula for the Complete link method is:
$$\mathrm{Dist}(c_{ij}, c_k) = \tfrac{1}{2} \mathrm{Dist}(c_i, c_k) + \tfrac{1}{2} \mathrm{Dist}(c_j, c_k) + \tfrac{1}{2} \left| \mathrm{Dist}(c_i, c_k) - \mathrm{Dist}(c_j, c_k) \right|$$
We can simplify the above and rewrite the formula for the Complete link method as below:
$$\mathrm{Dist}(c_{ij}, c_k) = \max(\mathrm{Dist}(c_i, c_k), \mathrm{Dist}(c_j, c_k))$$

Please see the following simple example, it may answer many of your questions!
Part3 Simple Example (MS Excel file)
Your task: In this section, you need to implement the following file:
LanceWilliamsHAC.c that implements all the functions defined in LanceWilliamsHAC.h.
Assessment Criteria
Part1: Dijkstra's algorithm (20 marks)
Part2:
  Closeness Centrality (22 marks)
  Betweenness Centrality (23 marks)
Part3: Discovering Community (15 marks)
Style, Comments and Complexity: 20
Testing
Please note that testing an API implementation is a very important and crucial part of designing and implementing an API. To get you started, we offer the following testing interfaces for all the APIs you need to implement; however, note that they only test basic cases. Importantly,
you need to add more advanced test cases and properly test your API implementations,
the automarking program will use more advanced test cases that are not included in the test cases provided to you.
Instructions on how to test your API implementations are available on the following page:
Testing your API Implementations
Submission
You need to submit the following files:
Dijkstra.c
CentralityMeasures.c
LanceWilliamsHAC.c
Instructions on how to submit the above files will be available later.
Plagiarism
This is an individual assignment. Each student will have to develop their own solution without help from other people. You are not permitted to exchange code or pseudocode. If you have questions about the assignment, ask your tutor. All work submitted for assessment must be entirely your own work. We regard unacknowledged copying of material, in whole or part, as an extremely serious offence. For further information, read the Course Outline.
