Setup
Create a conda environment:
conda create --name cs7280_env python=3.8 -y
Activate the environment:
conda activate cs7280_env
Dependencies
Install the necessary libraries for this assignment after activating your conda environment and navigating to the correct directory.
“` bash pip install -r requirements.txt
# Imports
python import copy import networkx as nx import numpy as np import random import scipy as sp
Please don’t add additional import statements and modify this cell
# Part 1 <font size=3> [20 pts] Getting Started with NetworkX (US cities network)
## 1.1 <font size=3> [6 pts] Creating graphs with certain properties
In the body of the function below, generate the indicated graphs, with each graph satisfying the properties given in the function's docstring.
python def graphswithcertain_props() -> (nx.Graph, nx.Graph, nx.Graph, nx.Graph, nx.DiGraph, nx.MultiGraph): “”” Behavior Generates graphs, each of a distinct type
Returns
simp networkx Graph 10 node graph, undirected, exactly 21 edges, no self loops, no multi-edges
cycle networkx Graph 10 node graph, undirected, entire graph is a single cycle
clique networkx Graph 10 node graph, undirected, in which the entire graph is a clique
star networkx Graph 10 node graph, undirected, is a star
direct networkX DiGraph 10 node graph, directed, exactly 9 edges and exactly 1 node of out-degree 0
non_simp networkX MultiGraph 10 node graph, undirected, exactly 21 edges with
exactly 10 of those edges being self loops and
exactly 11 edges between the same pair of distinct nodes a and b
"""
####IMPLEMENTATION STARTS HERE####
#These lines are placeholders
simp = nx.Graph()
cycle = nx.Graph()
clique = nx.Graph()
star = nx.Graph()
direct = nx.DiGraph()
non_simp = nx.MultiGraph()
return simp, cycle, clique, star, direct, non_simp
####IMPLEMENTATION ENDS HERE####
## 1.2 <font size=3> [4 pts] Relationship between algebraic features of a graph
In the body of the function below, compute the maximum degree among the nodes of the parameter graph `G`.
python def max_degree(G:nx.Graph) -> int: “”” Params G networkx Graph an arbitrary undirected graph
Return
max_deg int the maximum degree among all nodes of the graph G
"""
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
max_deg = 7280
return max_deg
####IMPLEMENTATION ENDS HERE####
In the body of the function below, compute the average degree among the nodes of the parameter graph `G`.
python def avg_degree(G:nx.Graph) -> float: “””
Params G networkx Graph an arbitrary undirected graph
Return
avg_deg float the average degree among all nodes of the graph G
"""
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
avg_deg = 7280.0
return avg_deg
####IMPLEMENTATION ENDS HERE####
In the body of the function below, compute the leading eigenvalue of adjacency matrix of the parameter graph `G` and then return that value.
You may recall from a course in linear algebra that the leading eigenvalue is the eigenvalue of greatest overall magnitude. For example, if a matrix $A$ has eigenvalues $2, -5, 3i,$ and $-4+5i$, these numbers have respective magnitudes of $2, 5, 3$, and $sqrt{41}approx 6.4$. Thus, the leading eigenvalue is $-4+5i$.
python def leading_eigenvalue(G:nx.Graph) -> float: “”” Params G networkx Graph an arbitrary undirected graph
Return
eig float the 'leading' eigenvalue of the adjacency matrix
"""
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
eig = 7280.00
return eig
####IMPLEMENTATION ENDS HERE####
Let $G$ be a graph and let $lambda_1$ be the leading eigenvalue of the adjacency matrix of $G$.
Using the functions you've created above, investigate for various graphs how the following values compare:
* Maximum degree of $G$
* Avgerage degree of $G$
* $|lambda_1|$
These three values should satisfy a compound inequality $a leq b leq c$, where $a,b,c$ each uniquely take on one of the values above.
In the body of the function below, uncomment the return that aligns the values to what you have found in your investigations.
python def algebraic_comparison() -> (str, str, str): “””
Return ??? str the value that takes the place of ‘a’ in the text discussion above ??? str the value that takes the place of ‘b’ in the text discussion above ??? str the value that takes the place of ‘c’ in the text discussion above “””
####IMPLEMENTATION STARTS HERE####
max_deg = "maximum degree"
avg_deg = "average degree"
eig = "leading eigenvalue"
# return max_deg, avg_deg, eig
# return max_deg, eig, avg_deg
# return avg_deg, max_deg, eig
# return avg_deg, eig, max_deg
# return eig, max_deg, avg_deg
# return eig, avg_deg, max_deg
####IMPLEMENTATION ENDS HERE####
## 1.3 <font size=3> [2 pts] Loading graph data from a graphml file and checking it
For the rest of Part 1, we will consider a network of 128 cities in North America (mostly the US, along with a few in Canada), with city names starting with some letter (inclusively) between R and Y. Some examples of the city names included in this network are "Ravenna, OH", "Selma, AL, and "Wichita, KS".
Between each pair $(u,v)$ of distinct cities $u$ and $v$ there is an edge (listed either as edge $(u,v)$ or edge $(v,u)$). For the purpose of familiarizing yourself with terminology, note that this network is an example of *clique*. Each edge is assigned a `"weight"`, which represents the geographic distance (as the crow flies) between the cities that make up the end points of that edge.
In the body of the function below, use NetworkX's [`read_graphml`](https://networkx.org/documentation/stable/reference/readwrite/generated/networkx.readwrite.graphml.read_graphml.html) function to read the `cities_data.graphml` file located within the data folder. Note that the relative path to the file should be `"data/cities_data.graphml"`.
python def loadcitiesdata() -> nx.Graph: “”” Return G networkx Graph “””
####IMPLEMENTATION STARTS HERE####
# This is a placeholder, load the city data into the variable G
G = nx.Graph()
return G
####IMPLEMENTATION ENDS HERE####
**Sanity Check**
You can print the return graph of the function below and what should be shown is text of the form:
> Graph with 128 nodes and 8128 edges
python print(loadcitiesdata())
## 1.4 <font size=3> [3 pts] Weighted graphs
The "cities_data" graph is a weighted graph. That is, each edge in the graph is assigned a number, or *weight*. In NetworkX graph objects (including [`Graph`](https://networkx.org/documentation/stable/reference/classes/graph.html), [`DiGraph`](https://networkx.org/documentation/stable/reference/classes/digraph.html), and [`MultiGraph`](https://networkx.org/documentation/stable/reference/classes/multigraph.html) objects), edges can be assigned properties under any name and data type. These attributes can be accessed using the NetworkX function [`get_edge_attributes`](https://networkx.org/documentation/stable/reference/generated/networkx.classes.function.get_edge_attributes.html). Note that this returns a dictionary, with the keys being the edges (2-tuples) of the graph, and the values being the associated value assigned to that edge.
In the body of the function below, write logic so that the function returns the 'weight' value of the particular parameter edge `e` of the parameter graph `G`. If `e` is not an edge of `G`, then raise an exception `Exception("e is not an edge of G")`.
python def getedgeweight(G:nx.Graph, e) -> int: “”” Params G networkx Graph an arbitrary undirected graph e 2-tuple a 2-tuple (a,b), where a,b may be either integers or python strings
Return
e_weight int the 'weight' value assigned to the edge e
"""
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
e_weight = 7280
return e_weight
####IMPLEMENTATION ENDS HERE####
In the body of the function below, write logic so that the function returns the 'weight' (or, in this context, *distance*) between the cities "Youngstown, OH" and "Winston-Salem, NC".
python def distfromYOHtoWSNC() -> int: “”” Return distance int returns the ‘weight’ (distance) between Youngstown, OH and Winston-Salem, NC “””
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
distance = 7280
return distance
####IMPLEMENTATION ENDS HERE####
In the body of the function below, compute the number of pairs of cities that are 50 miles or fewer apart.
python def numcitieswithin50() -> int: “”” Return numcities int number of pairs of cities that are connected by a road that is 50 or fewer miles long “””
# Do not modify the line below
G_cities = load_cities_data()
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
num_cities = 7280
return num_cities
####IMPLEMENTATION ENDS HERE####
## 1.5 <font size=3> [5 pts] Generating subgraphs according to certain conditions
Notationally, let $G$ be the graph corresponding to the `cities_data.graphml` data. Let $V = V(G)$ be the node set of $G$ and $E = E(G)$ be the edge set of $G$.
In the body of the function below:
1. determine all the cities that are (inclusively) within 100 miles of at least one of the cities in the list `city_list` (which is a list of city names corresponding to the node names is `cities_data.graphml`),
2. construct a new graph $S$ where the node set $V_S subseteq V$ is exactly the set of cities found in the step above, and the edge set $E_Ssubseteq E$ consists of those edges from $E$ that have endpoints both in $V_S$.
python def subgraphcitieswithin100of(citylist:list) -> nx.Graph: “”” Params G: NetworkX graph object citylist: list of strings (names of cities in G)
Output
S: NetworX graph object (subgraph of G that only contains edges between cities in “city_list” and directly neighboring cities that are less than 100 miles away)
"""
# Do not modify the line below
G_cities = load_cities_data()
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
S = nx.Graph()
return S
####IMPLEMENTATION ENDS HERE####
**Sanity Check**
Running the cell below should yield a graph with 7 nodes and 7 edges.
python print(subgraphcitieswithin100of([“Toledo, OH”, “Stockton, CA”, “San Francisco, CA”]))
# Part 2 <font size=3> [20 pts] Walks and Paths (Les Miserables character network)
In Part 2, we will consider the undirected, weighted network of character co-appearances in Victor Hugo’s novel "Les Miserables'' using “lesmis_data.gml”. The nodes represent characters (as indicated by the labels) and the edges connect pairs of characters that haved appeared together in the same chapter at least once. The weights on the edges represent the number of times those two characters are seen together throughout the novel in total.
## 2.1 <font size=3> [2 pts] Loading the Data into a Graph
In the body of the function below, use NetworkX's [`read_gml`](https://networkx.org/documentation/stable/reference/readwrite/generated/networkx.readwrite.gml.read_gml.html) function to read the `lesmis_data.gml` file located within the data folder. Note that the relative path to the file should be `"data/lesmis_data.gml"`.
python def loadlesmisdata() -> nx.Graph: “”” Return G networkx Graph “””
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
# load the lesmis data as a nx.Graph object into the variable G
G = nx.Graph()
return G
####IMPLEMENTATION ENDS HERE####
## 2.2 <font size=3> [2 pts] Checking Connectedness
In the body of the function below, use NetworkX's [`is_connected`](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.components.is_connected.html) function to determine if the parameter graph `G` is connected or not.
python def graphisconnected(G:nx.Graph) -> bool: “”” Params G networkx Graph
Return
connected bool whether the graph is connected or not
"""
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
connected = False
return connected
####IMPLEMENTATION ENDS HERE####
## 2.3 <font size=3> [6 pts] Analyzing a Network using Shortest Path Lengths
In the body of the function below, compute a return a dictionary where the keys are all the possible node pairs (regardless of whether there is an edge between them or not) of the parameter graph `G` and the value for each node pair is the unweighted length of the shortest path(s) between the two nodes.
python def shortestpathdict(G:nx.Graph) -> dict: “”” Params G networkx Graph
Return
shortest_path_length dict dictionary
key: node pair (a,b), where a,b are nodes of G
value: the shortest unweighted path length (int) from a to b
(fewest number of edges along any path
between a and b)
"""
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
shortest_path_length = {}
return shortest_path_length
####IMPLEMENTATION ENDS HERE####
In the body of the function below, compute the average shortest path length among all the shortest path lengths in the parameter graph `G`.
python def averageshortestpath_length(G:nx.Graph) -> float: “”” Params G networkx Graph
Return
aspl float average shortest path length
"""
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
aspl = 7280.0
return aspl
####IMPLEMENTATION ENDS HERE####
In the body of the function below, compute the maximum shortest path length among all the shortest path lengths in the parameter graph `G`.
python def maximumshortestpath_length(G:nx.Graph) -> int: “”” Params G networkx Graph
Return
mspl int maximum shortest path length
"""
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
mspl = 7280
return mspl
####IMPLEMENTATION ENDS HERE####
## 2.4 <font size=3> [4 pts] Random walks (probability)
In the body of the function below, compute a dictionary based on the parameter graph `G` where the keys of the dictionary are the node pairs $(u,v)$ of `G` and the value assigned to node pair $(u,v)$ is the probability $p(u,v)$, defined as $$p(u,v) = frac{text{weight of edge between } u text{ and } v}{sumlimits_{w text{ is a neighbor of } u}^{} text{weight of edge between } u text{ and } w}$$
Notes:
* Accessing the proper edge attribute for this graph is done via the name `"value"` (as opposed to `"weight"`, as was used in Part 1).
* If there is no edge between $u$ and $v$, then consider the weight to be 0.
* One should not expect symmetry for $p$. That is, one should expect generally that $p(u,v)
eq p(v,u)$.
python def prob_dict(G:nx.Graph) -> dict: “”” Params G networkx Graph arbitrary networkx graph
Return
probs dict dictionary where
keys are pairs (a,b) of nodes a,b in G
values are the "value" (ie weight) of
the edge between a and b OR
0 if there is no such edge
"""
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
probs = {}
return probs
####IMPLEMENTATION ENDS HERE####
**Sanity check**
Running the cell below should yield a value of about $frac{31}{158}approx 0.196$.
python print(probdict(loadlesmis_data())[(“Valjean”, “Cosette”)])
In the body of the function below, use the probability dictionary `prob_dict` to randomly choose a neighbor node from the given `source_node`.
python def getrandomnextnode(G:nx.Graph, sourcenode:str) -> str: “”” Params G networkx Graph arbitrary networkx graph source_node str node in G
Return
next_node str randomly chosen neighbor of source_node
chosen with probability related to edge strength
"""
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
next_node = ""
return next_node
####IMPLEMENTATION ENDS HERE####
In the body of the function below, compute a single random walk within parameter graph `G` from the source node `source_node` of length `walk_length`. Use the `get_random_next_node` function from above to do so. Return the walk as a *tuple* of length `walk_length + 1` (in a walk traversing $m$ edges, there will be $m+1$ nodes).
python def randomwalk(G:nx.Graph, sourcenode:str, walklength:int) -> tuple: “”” Params G networkx Graph arbitrary networkx graph sourcenode str node in G walk_length int length (in edge distance) of the walk
Return
walk tuple tuple of nodes in G of length walk_length+1
"""
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
walk = []
return tuple(walk)
####IMPLEMENTATION ENDS HERE####
## 2.5 <font size=3> [6 pts] Stationary distribution (linear algebra)
Following the portion of the Canvas lesson **L2: Random Walks** concerning eigenvalues and eigenvectors, let us consider the inherent ordering of the nodes of the lesmis network and $P$ as the transition matrix determined by the `prob_dict` function above applied to the lesmis network.
It follows that the stationary distribution $q$ of the lesmis network is a vector such that $(P^T)q = (1)q$, where $P^T$ is the transpose of $P$, $q$ is a $ntimes 1$ column vector, and $n$ is the number of nodes of the lesmis network. By the definition of eigenvector and eigenvalue, $q$ is an eigenvector of $P^T$ corresponding to the eigenvalue of 1.
python def getlesmisP() -> np.ndarray: “”” Return P numpy ndarray transition matrix corresponding to the les mis network “””
# Do not modify the following three lines
G_lesmis = load_lesmis_data()
G_lesmis_nodes = list(G_lesmis.nodes)
n = len(G_lesmis_nodes)
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
P = np.zeros(shape=(n,n))
return P
####IMPLEMENTATION ENDS HERE####
In the body of the function below, use NumPy's [`numpy.linalg.eig`](https://numpy.org/doc/2.2/reference/generated/numpy.linalg.eig.html) function to compute and return a right eigenvector of $P^T$ of the lesmis netwwork that is associated with the eigenvalue of 1. If an exact eigenvalue of 1 is not found, choose the eigenvalue closest to 1.
Make sure the returned eigenvector is normalized (that is has length 1). You may use [`numpy.linalg.norm`](https://numpy.org/doc/2.2/reference/generated/numpy.linalg.norm.html) for this. Also make sure that the majority of the entries are nonnegative.
python def getaneigenvectorof1(M:np.ndarray) -> np.array: “”” Params M numpy ndarray arbitrary 2-dimensional array
Return
evec_of_1 numpy array 1-dimensional array that is a normalized
eigenvector of M corresponding to an
eigenvalue of 1
"""
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
evec_of_1 = np.zeros(shape=(M.shape[0],))
return evec_of_1
####IMPLEMENTATION ENDS HERE####
In the body of the function below, use NumPy's [`argmax`](https://numpy.org/doc/2.2/reference/generated/numpy.argmax.html) or [`argpartition`](https://numpy.org/doc/stable/reference/generated/numpy.argpartition.html) to find the top three nodes (ie, occuring les mis characters).
python def topthreelesmis_chars() -> (str, str, str): “”” Return firstEntry str one of the top 3 occuring characters secondEntry str one of the top 3 occuring characters thirdEntry str one of the top 3 occuring characters “””
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
top_3_chars = ["Micky", "Donald", "Goofy"]
return tuple(top_3_chars)
####IMPLEMENTATION ENDS HERE####
# Part 3 <font size=3> [20 pts] Connectedness in Directed Graphs (Biological neural network)
In Part 3, we will look at a directed biological network formed by biological neurons in the optic medulla (a portion of the eye) of a drosophila (a small fruit fly). The nodes of the network are neurons in the optic medulla and the (directed) edges are chemical synapses formed between one neuron and the next. Note the directed nature of the edges due to the bio-chemical structure of biological neurons.
## 3.1 <font size=3> [2 pts] Loading graph data from a graphml file and checking it
In the body of the function below, use NetworkX's [`read_graphml`](https://networkx.org/documentation/stable/reference/readwrite/generated/networkx.readwrite.graphml.read_graphml.html) function to read the `drosophila_medulla_data.graphml` file located within the data folder. Note that the relative path to the file should be `"data/drosophila_medulla_data.graphml"`.
python def loaddrosophilamedulla_data() -> nx.MultiDiGraph: “”” Return G networkx MultiDiGraph “””
####IMPLEMENTATION STARTS HERE####
# this is a placeholder, load the drosophila_medulla data into a graph, G
G = nx.MultiDiGraph()
return G
####IMPLEMENTATION ENDS HERE####
**Sanity Check**
You can print the return graph of the function below and what should be shown is text of the form:
> MultiDiGraph with 1781 nodes and 33641 edges
python print(loaddrosophilamedulla_data())
## 3.2 <font size=3> [6 pts] Weakly connected components
Using functions from the [Components](https://networkx.org/documentation/stable/reference/algorithms/component.html) suite of NetworkX, fill in logic in the body of the function below so that it:
1. finds the *weakly* connected components of this network,
2. stores the number of these components in the variable `wccs`
3. stores the component with the highest node count as a subnetwork in variable `LWCC`,
4. finds the ratio of nodes stored in `LWCC` versus the parameter network `G` and stores that value in variable `wpct`
python def weak_connectedness(G:nx.MultiDiGraph) -> (int, float, nx.Graph): “”” Params G networkx MultiDiGraph
Return
wccs int number of weakly connected components in the graph
wpct float percent of nodes in the graph that belong to the largest weakly connected component
LWCC networkx Graph largest weakly connected component
"""
####IMPLEMENTATION STARTS HERE####
# These lines are placeholders
wccs = 7280
wpct = 0.7280
LWCC = nx.Graph()
return wccs, wpct, LWCC
####IMPLEMENTATION ENDS HERE####
## 3.3 <font size=3> [6 pts] Strongly connected components
Using functions from the [Components](https://networkx.org/documentation/stable/reference/algorithms/component.html) suite of NetworkX, fill in logic in the body of the function below so that it:
1. finds the *strongly* connected components of this network,
2. stores the number of these components in the variable `sccs`
3. stores the component with the highest node count as a subnetwork in variable `LSCC`,
4. finds the ratio of nodes stored in `LSCC` versus the parameter network `G` and stores that value in variable `spct`
python def strong_connectedness(G:nx.MultiDiGraph) -> (int, float, nx.MultiDiGraph): “”” Params G networkx MultiDiGraph
Return
sccs int number of strongly connected components in the graph
spct float percent of nodes in the graph that belong to the largest strongly connected component
LSCC networkx MultiDiGraph largest strongly connected component of G
"""
####IMPLEMENTATION STARTS HERE####
# These lines are placeholders
sccs = 7280
spct = 0.7280
LSCC = nx.MultiDiGraph()
return sccs, spct, LSCC
####IMPLEMENTATION ENDS HERE####
## 3.4 <font size=3> [6 pts] Weak connectedness vs Strong connectedness
In the body of the function below, use the `weak_connectedness` and `strong_connectedness` frunctions to compute three ratios:
1. `ratio_size`. This ratio should be the size (number of nodes) of the largest strongly conneced component divided by the size of the largest weakly connected component,
2. `ratio_mspl`. This ratio should be the maximum shortest path length of the largest strongly connected component divided by the maximum shortest path length of the largest weakly connected component,
3. `ratio_aspl`. This ratio should be the average shortest path length of the largest strongly connected component divided by the average shortest path length of the largest weakly connected component,
python def ratiosstrongover_weak(G:nx.MultiDiGraph) -> (float, float, float): “”” Params G networkx MultiDiGraph
Return
ratio_size float explained above
ratio_mspl float explained above
ratio_aspl float explained above
"""
####IMPLEMENTATION STARTS HERE####
# These lines are placeholders
ratio_size = 0.0
ratio_mspl = 0.0
ratio_aspl = 0.0
return ratio_size, ratio_mspl, ratio_aspl
####IMPLEMENTATION ENDS HERE####
# Part 4 <font size=3> [20 pts] Topological Ordering (Programming languages network)
In Part 4, we will consider a influence network among programming languages. Each node in the network is a programming language. Each directed edge in the network signifies that the source programming language influenced the target programming language in some critical way.
The data for this network is contained in the `language_data.txt` file. Each line of this text file represents one edge. If a line reads `language_A language_B` (separated by exactly one space), then this means that `language_A` influenced `language_B`. In the influence network we will be using, there should be a directed edge from source node `language_A` to target node `language_B`. To confirm that it should be this way, observe that one of the lines in the `language_data.txt` file reads `c c++`. It is commonly known that C++ was developed as an extension to the C language - hence, C influenced the development of C++.
## 4.1 <font size=3> [2 pts] Loading graph data from a txt (edgelist) file
In the body of the function below, use NetworkX's [`read_edgelist`](https://networkx.org/documentation/stable/reference/readwrite/generated/networkx.readwrite.edgelist.read_edgelist.html) function to read the `language_data.txt` file located within the data folder. Note that the relative path to the file should be `"data/language_data.txt"`.
python def loadlanguagedata() -> nx.DiGraph: “”” Return G networkx DiGraph “””
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
G = nx.DiGraph()
return G
####IMPLEMENTATION ENDS HERE####
**Sanity Check**
You can print the return graph of the function below and what should be shown is text of the form:
> DiGraph with 361 nodes and 735 edges
python print(loadlanguagedata())
## 4.2 <font size=3> [2 pts] Directed acyclic graph (DAG)
In the body of the function below, use NetworkX's [`is_directed_acyclic_graph`](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.dag.is_directed_acyclic_graph.html) function to determine whether an arbitrary (directed) parameter graph `G` is a directed acyclic graph (DAG). Return a boolean value representing if G is a DAG.
Notes:
* 'acyclic' simply means 'no cycles,'
* The function below should return a bool value based on the parameter graph `G`. The function is not explicitly tied to the language network.
python def isgraphdag(G:nx.DiGraph) -> bool: “”” Params G networkx DiGraph arbitrary directed graph
Return
is_dag bool True if parameter graph G is directed and acyclic
False otherwise
"""
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
is_dag = None
return is_dag
####IMPLEMENTATION ENDS HERE####
## 4.3 <font size=3> [6 pts] Topological generations of a DAG
In the body of the function below,
1. Check that the parameter graph `G` is a DAG. If not, return two empty dictionaries. Otherwise, continue to step 2.
2. Using the NetworkX function [`topological_generations`](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.dag.topological_generations.html) compute a dictionary, where the keys are integers representing the topological layer, and each value is the list of nodes found in that topological layer. For example, in the 0 layer should be all the so-called *start nodes* that have 0 in-degree.
python def topgendicts(G:nx.DiGraph) -> (dict, dict): “”” Params G networkx DiGraph arbitrary directed graph
Return
top_gen dict dictionary where the
keys are integers 0, 1, 2...
representing topological layers
values are lists of nodes that are
in that topological layer
node_gen dict dictionary where the
keys are nodes of G
values are the topological layer
integer the node is in
"""
####IMPLEMENTATION STARTS HERE####
#These are placeholders
top_gen = {}
node_gen = {}
return top_gen, node_gen
####IMPLEMENTATION ENDS HERE####
## 4.4 <font size=3> [4 pts] Topological ordering of nodes in a DAG
In the body of the function below,
1. Check that the parameter graph `G` is a DAG. If not, return an empty list. Otherwise, continue to step 2.
2. Using your function `top_gen_dicts` above, fill in the body of the function below with a computation that will return a list of nodes of the parameter graph `G` that are a topological ordering. In order to receive full credit for this subpart, the function below must call `top_gen_dicts` on the parameter graph and present an ordering that exactly matches the natural ordering provided by that dictionary. NOTE: You do not need to sort the lists in `top_gen` from `top_gen_dicts`. The order of nodes in each list comes directly from `topological_generations`, which already respects the DAG’s dependencies and provides a valid topological order. Sorting would violate this natural ordering.
python def atopologicalordering(G:nx.DiGraph) -> list: “”” Params G networkx DiGraph arbitrary directed graph
Return
top_order list a topological ordering of the nodes of G,
must be based on `topological_generations_dict`
"""
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
top_order = []
return top_order
####IMPLEMENTATION ENDS HERE####
## 4.5 <font size=3> [6 pts] Start nodes and highest total influence
In the body of the function below,
1. Check that the parameter graph `G` is a DAG. If not, return an empty dictionary and empty string. Otherwise, continue to steps 2 and 3.
2. Using your function `top_gen_dicts` above, for each node in the 0-layer topological layer (these nodes are called *start nodes*), find the total influence of that node in the graph and store it in a dictionary, where the key is the start node name and the value is an integer representing its total influence. The total influence of a start node is equal to the number of descendants the node has at *any distance* from it. For example:
* C influenced C++,
* C++ influenced C#, and
* C# influenced Rust,
* so we would consider C to have influenced Rust.
3. Return the name of the node with the highest influence.
python def start_nodes(G:nx.DiGraph) -> (dict, str): “”” Params G networkx DiGraph arbitrary directed graph
Return
start_node_dict dict dictionary
start_node_highest_inf str
"""
####IMPLEMENTATION STARTS HERE####
# These are placeholders
start_node_dict = {}
start_node_highest_inf = ""
return start_node_dict, start_node_highest_inf
####IMPLEMENTATION ENDS HERE####
# Part 5 <font size=3> [20 pts] Bipartite Graphs and Projections (Github network)
The final dataset is a bipartite network based on the association of github users and projects. The `language_data.txt` file contains lines that specify associations between a user on the left and a project on the right. The set of user nodes make up a collective "left side" of the network and the set of project nodes make up a collective "right side" of the network, with every edge having a user as one end point and a project as the other.
## 5.1 <font size=3> [2 pts] Checking if a graph is bipartite
In the body of the function below, use NetworkX's [`is_bipartite`](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.bipartite.basic.is_bipartite.html#networkx.algorithms.bipartite.basic.is_bipartite) function to check whether the parameter graph `G` is bipartite or not.
python def isgraphbipartite(G:nx.Graph) -> bool: “”” Params G networkx Graph arbitrary graph Return graphisbp bool boolean determined by whether parameter graph G is bipartite “””
####IMPLEMENTATION STARTS HERE####
# This is a placeholder
graph_is_bp = None
return graph_is_bp
####IMPLEMENTATION ENDS HERE####
## 5.2 <font size=3> [6 pts] Loading graph data from a txt (edgelist) file
In the body of the function below,
1. use NetworkX's [`read_edgelist`](https://networkx.org/documentation/stable/reference/readwrite/generated/networkx.readwrite.edgelist.read_edgelist.html) function to read the `github_data.txt` file located within the data folder. Note that the relative path to the file should be `"data/github_data.txt"`,
2. store all user nodes in the `user_list` variable, and
3. store all project nodes in the `project_list` variable.
python def loadgithubdata() -> (nx.Graph, list, list): “”” Return G networkx Graph arbitrary graph userlist list list of user nodes projectlist list list of project nodes “””
####IMPLEMENTATION STARTS HERE####
G = nx.Graph()
user_list = []
project_list = []
return G, user_list, project_list
####IMPLEMENTATION ENDS HERE####
**Sanity Check**
You can print the return graph of the function below and what should be shown is text of the form:
> Graph with 177386 nodes and 440237 edges
> 56519
> 120867
python print(loadgithubdata()[0]) print(len(loadgithubdata()[1])) print(len(loadgithubdata()[2]))
## 5.3 <font size=3> [2 pts] Biadjacency matrix of a bipartite graph
In the body of the function below:
1. Check that all the conditions below are met. If at least one of them is not met, return a `None` value. Otherwise continue on to step 2:
* The parameter graph `G` is bipartite,
* The `row_nodes` argument is nonempty and each element is a node of `G`,
* The `column_nodes` argument is nonempty and each element is a node of `G`.
2. Using NetworkX's [`biadjacency_matrix`](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.bipartite.matrix.biadjacency_matrix.html) compute and return the biadjacency matrix of the parameter graph `G`. Use NetworkX's [`tolil`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_array.tolil.html#scipy.sparse.csr_array.tolil) method to convert the biadjacency matrix to the `sp.sparse._arrays.lil_array` format.
python def getbiadjacencymatrix(G:nx.Graph=nx.Graph(), rownodes:list=[], columnnodes:list=[]) -> sp.sparse.arrays.lilarray: “”” Params G networkx Graph an arbitrary NetworkX graph rownodes list a list of the “left side” nodes columnnodes list a list of the “right side” nodes
Return
B sp.sparse._arrays.lil_array the biadjacency matrix of the graph
(in lil_array format)
"""
####IMPLEMENTATION STARTS HERE####
# These are placeholder lines
B_test = sp.sparse.lil_array((2,3))
for i in range(B_test.shape[0]):
for j in range(B_test.shape[1]):
B_test[i,j] = i + j
return B
####IMPLEMENTATION ENDS HERE####
## 5.4 <font size=3> [4 pts] Matrix products
For a general $mtimes n$ matrix $B$, we *cannot* matrix-multiply $B$ by itself if $m
eq n$.
However, note that we *can always* matrix-multiply $B$ by $B^T$ since $B$ is a $mtimes n$ matrix and $B^T$ is a $n times m$ matrix (since the "inside dimensions" match), giving a resultant $mtimes m$ matrix $BB^T$. Similarly, we can matrix-multiply $B^T$ by $B$ to get a $ntimes n$ matrix $B^TB$.
In the body of the function below, use the [`transpose`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.transpose.html) method to transpose a scipy sparse array and the [`@`] symbol to perform matrix multiplication between two scipy sparse arrays. More information about scipy sparse arrays can be found [here](https://docs.scipy.org/doc/scipy/reference/sparse.html).
python def sparsearrayproducts(B:sp.sparse.arrays.lilarray) -> (sp.sparse.arrays.lilarray, sp.sparse.arrays.lilarray): “”” Params B sp.sparse.arrays.lilarray
Return
prod_BBT sp.sparse._arrays.lil_array
prod_BTB sp.sparse._arrays.lil_array
"""
####IMPLEMENTATION STARTS HERE####
# These are placeholder lines
B_test = sp.sparse.lil_array((2,3))
for i in range(B_test.shape[0]):
for j in range(B_test.shape[1]):
B_test[i,j] = i + j
prod_BBT = copy.deepcopy(B_test)
prod_BTB = copy.deepcopy(B_test)
return prod_BBT, prod_BTB
####IMPLEMENTATION ENDS HERE####
## 5.5 <font size=3> [6 pts] One mode projections and greatest commonality
The biadjacency matrix $B$ of the gibhub network is a $mtimes n$ matrix, where $m$ is the number of users in the network and $n$ is the number of projects in the network. That is, the rows represent the users and the columns represent the projects. Alternatively, the matrix $B^T$ is a $ntimes m$ matrix where the rows represent the projects and the columns represent the users.
The matrix-product $BB^T$ is a $mtimes m$ matrix that is the adjacency (note, *not* *bi*-adjacency) matrix of a graph where the nodes consist only of users. This graph is called the one-mode projection on users. Alternatively, the matrix-product $B^TB$ is a $ntimes n$ matrix that is the adjacency matrix of a graph where the nodes consist only of projects. This graph is called the one-mode project on projects.
Another way to look at the product $BB^T$ is that it is the number of walks of length 2 from one of the user nodes back to a user node. In the case of the github network, which is bipartite, the only way to have a walk of length 2 from a user node back to a user node is for the walk to pass through exactly one project. Thus, the $(i,j)$-entry of $BB^T$ is the number of projects that user i shares with user j.
In the body of the function below, use the information above to compute the pair of users who share the greatest number of projects and the pair of projects that share the greatest number of users.
python def greatestgithubcommonality() -> (tuple, tuple): “”” Return userpair 2-tuple pair of users that share the most projects projectpair 2-tuple pair of projects that share the most users “””
####IMPLEMENTATION STARTS HERE####
# These are placeholders
G_github, user_list, project_list = load_github_data()
user_pair = ("Yo", "wassup")
project_pair = ("Hello", "world")
return user_pair, project_pair
####IMPLEMENTATION ENDS HERE####
“`
Reviews
There are no reviews yet.