
SI 671/721 Homework 3: Social Network Analysis


For the first two parts of this homework, we will use the Amazon co-purchasing network
dataset (Leskovec et al., 2007) to perform social network analysis. This dataset contains
co-purchase networks for several product categories, including books, music CDs, DVDs, and VHS video tapes. It was
collected by crawling the Amazon website in March 2003, using the “Customers Who Bought
This Item Also Bought” feature. If a product A is frequently co-purchased
with product B, the graph contains a directed edge from A to B.

We recommend that you use Jupyter Notebooks and Python libraries (NumPy, scikit-learn,
pandas, and NetworkX) for this homework.
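As a minimal sketch of the edge convention described above, the snippet below builds a tiny directed graph with NetworkX from a two-column table. The column names fromNodeId and ToNodeId follow the description of amazonNetwork.csv given later in this handout; the product IDs are invented for illustration, and the real file would be read with pandas instead of being typed in.

```python
import networkx as nx
import pandas as pd

# Toy co-purchase table; the IDs are invented for illustration only.
edges = pd.DataFrame({
    "fromNodeId": [1, 1, 2],
    "ToNodeId": [2, 3, 3],
})

# Each row becomes one directed edge fromNodeId -> ToNodeId.
G = nx.from_pandas_edgelist(
    edges, source="fromNodeId", target="ToNodeId", create_using=nx.DiGraph
)

print(G.number_of_nodes(), G.number_of_edges())
# Direction matters: 1 -> 2 exists, 2 -> 1 does not.
print(G.has_edge(1, 2), G.has_edge(2, 1))
```

Using a DiGraph rather than an undirected Graph keeps the distinction between the main purchased item and the co-purchased item.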
The last part of this homework contains a novel peer-assessed exam question generation
problem!

This homework is divided into three parts.
1. Exploratory Social Network Analysis.
2. Predicting Review Rating using features derived from Network Properties.
3. Generating a peer-assessed exam question.

2.1 Part 1: Exploratory Social Network Analysis [30 Points]

This part of the homework is designed to help you familiarize yourself with the dataset and
basic concepts of network analysis. The insights from this part of the homework will help
you in building the prediction models for Part 2 of the homework.

1. Read NetworkX library documentation closely to understand the context and review
some code examples of network analyses. [0 points]
2. Read the document linked below to understand the basics of Social Network Analysis.
https://www.datacamp.com/community/tutorials/social-network-analysis-python
[0 points]
3. Perform some basic network analyses and briefly explain each of your findings [30 points]:
(a) Load the directed network graph (G) from the file amazonNetwork.csv. [2 points]
(b) How many items are present in the network and how many co-purchases happened? [7 points]
(c) Compute the average shortest distance between the nodes in graph G. Explain
your results briefly. [7 points]
(d) Compute the transitivity and the average clustering coefficient of the network
graph G. Explain your findings briefly based on the definitions of clustering coefficient and transitivity. [7 points]
(e) Apply the PageRank algorithm to network G with damping value 0.5 and find the
10 nodes with the highest PageRank. Explain your findings briefly.
NetworkX documentation for the PageRank algorithm: https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.algorithms.link_analysis.pagerank_alg.pagerank.html [7 points]

The main deliverables for this part of the homework are 1) a step-by-step exploration of the data in
your Jupyter Notebook and 2) a PDF document containing the answers to each of the questions
above. You should also describe your conclusions.

2.2 Part 2: Predicting Review-Rating using Features derived from network properties [50 Points]
For this part of the homework, you will build a machine learning model to predict the
review rating of the Amazon products on a scale of 1-5 using various network properties as
features.

We provide you with the training dataset (reviewTrain.csv), which you should use judiciously to train your models. We also provide a test dataset (reviewTest.csv) where the
review rating is missing.

You need to extract at least 4 different features based on the network properties to train
your model. The error metric that we will use for evaluating your predicted ratings on the test
dataset is the mean absolute error (MAE). Some of the features that you can consider using
include:
• Clustering Coefficient
• Page Rank
• Degree centrality
• Closeness centrality
• Betweenness centrality
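The features listed above can each be computed per node with NetworkX. The following is a minimal sketch on a toy directed graph, not the required pipeline: the edges are invented, the column names are illustrative, and the PageRank damping value of 0.85 is simply the library default rather than a value prescribed for Part 2.

```python
import networkx as nx
import pandas as pd

# Toy directed graph standing in for the co-purchase network.
G = nx.DiGraph([(1, 2), (1, 3), (2, 3), (3, 1), (4, 1)])

# Each NetworkX call returns a {node: value} dict; pandas aligns them by node.
features = pd.DataFrame({
    "clustering": nx.clustering(G),
    "pagerank": nx.pagerank(G, alpha=0.85),
    "degree_centrality": nx.degree_centrality(G),
    "closeness_centrality": nx.closeness_centrality(G),
    "betweenness_centrality": nx.betweenness_centrality(G),
})
features.index.name = "node"

print(features.round(3))
```

The resulting table has one row per product, so it can be joined to the review data on the product ID. The handout does not name the ID column in reviewTrain.csv, so that join key has to be checked against the file itself.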
Some of the models that you can consider using include:
• Logistic Regression
• Support Vector Machine (SVM)
• Multi-layer perceptron

The main deliverable for this part of the homework is a step-by-step analysis of your feature
selection, feature extraction, and model building, describing clearly how you generated
features from your dataset and why you chose a specific feature over the others. Your Jupyter
notebook should contain the reproducible code for training various models as well as text
descriptions of your conclusions after each step.

Your grade on this part of the homework will depend on the accuracy of your model on the
test dataset as well as your step-by-step description of how you arrived at your final model.
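For reference, the mean absolute error is the average of the absolute differences between true and predicted ratings. The sketch below computes it in plain Python on made-up ratings; scikit-learn's sklearn.metrics.mean_absolute_error computes the same quantity.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all rows."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up ratings on the 1-5 scale, for illustration only.
y_true = [5, 3, 4, 1]
y_pred = [4, 3, 5, 3]

mae = mean_absolute_error(y_true, y_pred)
print(mae)  # (1 + 0 + 1 + 2) / 4 = 1.0
```

Because the test ratings are withheld, this can only be computed locally on a held-out slice of reviewTrain.csv.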
We will evaluate your model using mean absolute error (MAE).

Here’s the description of files included with this homework.
1. amazonNetwork.csv: This file contains the data for Part 1 of the homework. It contains 10841 observations and 2 columns with the numbers representing product IDs.
Each node represents a product and each directed edge between two nodes represents
a co-purchase. The column fromNodeId contains the ID of the main purchased item
and ToNodeId contains the ID of the co-purchased item.
2. reviewTrain.csv: This file contains the training data for Part 2 of the homework.
It contains 1674 observations and 4 columns/features. The review column contains
ratings on a scale of 1-5.
3. reviewTest.csv: This file contains the test data for Part 2 of the homework. Please
insert your prediction results in the review column in the file.

After receiving some great feedback from the students regarding the questions on the midterm
exam, we thought of having a “tiny” competition among the students to generate potential
midterm questions! Here are the details:
• You need to generate 1 question that can be a potential exam question for SI 671/721
based on the material that we have covered till 11/1/2022 (Streaming data).
• The question should be a multiple choice with 1 or more correct answers. In other
words, questions with descriptive answers are not allowed.
• It can be a standalone question testing some course concepts, e.g., the midterm question
“Which of the following are frequent itemsets...” OR it can be a composite question
with a few sub-questions similar to the scenario-based questions on the midterm, e.g.,
“Planning the course paths for students...”
• You also need to provide the correct answer for the question.
• Your submitted questions will be evaluated anonymously by your fellow classmates!
Each student will be assigned 5 questions (from other students), and they will 1) rank
those questions from 1 to 5 in terms of quality, and 2) reply with yes/no regarding
whether the submitted question was correct in the first place.

Here is how we will grade your submitted questions:
• Submitting the ranked list (and correctness) for the 5 questions assigned to you on
time. [5 points]
• Correctness of your own submitted question. [5 points] (0 points if your submitted
question/answer was incorrect as judged by your classmates)
• The remaining 10 points will be given based on the quality of your question. 1st rank =
10 points, 2nd rank = 8 points, 3rd rank = 6 points, 4th rank = 4 points, 5th rank = 2
points. For example, if my submitted question received 1 vote each for 1st, 2nd, 3rd,
4th, 5th ranks by the students, then I’d receive (10+8+6+4+2)/5 = 6 points out of 10.
If my question got all 5th rank votes, then I’d receive (2+2+2+2+2)/5 = 2 points out
of 10, and so on.

Note that questions that involve asking arcane facts embedded in a footnote on one of the
slides might not be rated as high-quality by your peers!
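The rank-based quality score in the grading scheme above is a plain average of per-rank points. The sketch below reproduces the two worked examples. The function name is hypothetical, and dividing by the number of votes received (rather than always by 5) is an assumption for cases the handout does not cover.

```python
# Points awarded per rank, as listed in the grading scheme.
RANK_POINTS = {1: 10, 2: 8, 3: 6, 4: 4, 5: 2}


def quality_score(ranks):
    """Average the points for the ranks a question received from peers."""
    if not ranks:
        raise ValueError("at least one rank is required")
    return sum(RANK_POINTS[r] for r in ranks) / len(ranks)


print(quality_score([1, 2, 3, 4, 5]))  # 6.0
print(quality_score([5, 5, 5, 5, 5]))  # 2.0
```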
So, get set to unleash your creativity!

5 Submission
All submissions should be made electronically.
Here are the main deliverable files:
• HTML version of your Jupyter notebook. (Only one HTML file should be submitted.)
• The actual Jupyter notebook with “step-by-step analysis,” so that we could replicate
your results.
• PDF document containing your answers for Part 1.
• File reviewTest.csv with your predicted ratings on a scale of 1-5 for Part 2 of the
homework. Keep all the columns in the file reviewTest.csv which we shared with
you, as they are. Just update the file with your predictions in the correct column.
• Submission details TBD for Question 4. (Most likely, the submission will be as an
anonymous submission to Canvas).
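One way to update reviewTest.csv while keeping its columns unchanged is sketched below with pandas. The in-memory CSV stands in for the real file, the column name "id" is hypothetical, and the predictions are made up; only the review column name comes from this handout. Clipping to the 1-5 range is a precaution, not a stated requirement.

```python
import io

import pandas as pd

# Stand-in for reviewTest.csv; "id" is a hypothetical column name, while
# "review" is the column the handout asks you to fill in.
raw = io.StringIO("id,review\n101,\n102,\n103,\n")
test = pd.read_csv(raw)
original_columns = list(test.columns)

# Made-up model outputs; clip them to the 1-5 rating scale before saving.
predictions = pd.Series([4.6, 0.3, 5.8]).clip(lower=1, upper=5)
test["review"] = predictions.to_numpy()

# All original columns are kept, in the same order.
assert list(test.columns) == original_columns

# Write without the pandas index so no extra column is added.
out = io.StringIO()
test.to_csv(out, index=False)
print(out.getvalue())
```

With the real file, the two StringIO objects would be replaced by the path to reviewTest.csv for both reading and writing.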
