ECE 232E Summer 2025, Project 4: Graph Algorithms

Introduction
In this project, we will explore graph theory theorems and algorithms by applying them to real data. In the first part of the project, we consider a particular graph modeling correlations between stock-price time series. In the second part, we analyse traffic data on a dataset provided by Uber.
1. Stock Market
In this part of the project, we study data from the stock market. The data is available at this Dropbox link. The goal of this part is to study correlation structures among the fluctuation patterns of stock prices using tools from graph theory. The intuition is that investors will have similar investment strategies for stocks that are affected by the same economic factors. For example, the stocks belonging to the transportation sector may have different absolute prices, but if fuel prices change or are expected to change significantly in the near future, then you would expect investors to buy or sell all of these stocks similarly to maximize their returns. Towards that goal, we construct different graphs based on similarities among the time series of returns on different stocks at different time scales (a day vs. a week). Then, we study properties of such graphs. The data was obtained from the Yahoo Finance website over 3 years. You are provided with a number of csv tables, each containing several fields: Date, Open, High, Low, Close, Volume, and Adj Close price. The files are named according to the Ticker Symbol of each stock. You may find the market sector for each company in Name sector.csv. We recommend doing this part of the project (Q1 – Q8) in R.
1. Return correlation
In this part of the project, we will compute the correlation among log-normalized stock-return time series. Before giving the expression for the correlation, we introduce the following notation:

• p_i(t) is the closing price of stock i on day t
• q_i(t) is the return of stock i over the period [t − 1, t]:

    q_i(t) = (p_i(t) − p_i(t − 1)) / p_i(t − 1)

• r_i(t) is the log-normalized return of stock i over the period [t − 1, t]:

    r_i(t) = log(1 + q_i(t))
Then, with the above notation, we define the correlation between the log-normalized stock-return time series of stocks i and j as

    ρ_ij = (⟨r_i(t) r_j(t)⟩ − ⟨r_i(t)⟩⟨r_j(t)⟩) / √[(⟨r_i(t)²⟩ − ⟨r_i(t)⟩²)(⟨r_j(t)²⟩ − ⟨r_j(t)⟩²)]

where ⟨·⟩ denotes a temporal average over the investigated time regime (for our data set, over the 3 years).
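The handout recommends R for this part, but the definitions above are language-agnostic; here is a minimal numpy sketch of them (the function names `log_returns` and `correlation` are illustrative, not part of the assignment):

```python
import numpy as np

def log_returns(prices):
    """r_i(t) = log(1 + q_i(t)), where q_i(t) = (p_i(t) - p_i(t-1)) / p_i(t-1)."""
    prices = np.asarray(prices, dtype=float)
    q = (prices[1:] - prices[:-1]) / prices[:-1]   # one-period relative returns
    return np.log1p(q)

def correlation(r_i, r_j):
    """The rho_ij above: Pearson correlation written with temporal averages."""
    num = np.mean(r_i * r_j) - np.mean(r_i) * np.mean(r_j)
    den = np.sqrt((np.mean(r_i ** 2) - np.mean(r_i) ** 2) *
                  (np.mean(r_j ** 2) - np.mean(r_j) ** 2))
    return num / den
```

Writing the averages out this way makes the Q1 bounds easy to check numerically: a series against itself gives 1, against its negation gives −1.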
QUESTION 1: What are the upper and lower bounds on ρ_ij? Provide a justification for using the log-normalized return r_i(t) instead of the regular return q_i(t).
2. Constructing correlation graphs
In this part, we construct a correlation graph using the correlation coefficients computed in the previous section. The correlation graph has the stocks as its nodes, and the edge weights are given by the following expression:

    w_ij = √(2 (1 − ρ_ij))

Compute the edge weights using the above expression and construct the correlation graph.
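As a sanity check of the correlation-to-distance mapping, a one-line numpy sketch (the name `edge_weights` is illustrative; the clip guards against tiny negative values from floating-point round-off):

```python
import numpy as np

def edge_weights(rho):
    """Map a correlation matrix to distances w_ij = sqrt(2 (1 - rho_ij)).
    rho = 1  -> w = 0 (perfectly correlated stocks sit close together);
    rho = -1 -> w = 2 (anti-correlated stocks sit far apart)."""
    rho = np.asarray(rho, dtype=float)
    return np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))
```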
QUESTION 2: Plot a histogram showing the un-normalized distribution of edge weights.
3. Minimum spanning tree (MST)
In this part of the project, we will extract the MST of the correlation graph and interpret it.
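Any MST routine will do here (igraph's `mst` in R, for instance). As a self-contained illustration of what such a routine computes, a small pure-Python Kruskal sketch on a dense weight matrix:

```python
def kruskal_mst(weights):
    """Kruskal's algorithm on a dense symmetric weight matrix.
    Returns the MST as a list of (i, j, w_ij) edges."""
    n = len(weights)
    parent = list(range(n))

    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((weights[i][j], i, j)
                   for i in range(n) for j in range(i + 1, n))
    mst = []
    for w, i, j in edges:             # cheapest edges first
        ri, rj = find(i), find(j)
        if ri != rj:                  # adding (i, j) creates no cycle
            parent[ri] = rj
            mst.append((i, j, w))
            if len(mst) == n - 1:
                break
    return mst
```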
QUESTION 3: Extract the MST of the correlation graph. Each stock can be categorized into a sector, which can be found in the Name sector.csv file. Plot the MST and color-code the nodes based on sectors. Do you see any pattern in the MST? The structures that you find in the MST are called Vine clusters. Provide a detailed explanation of the pattern you observe.
QUESTION 4: Run a community detection algorithm (for example, Walktrap) on the MST obtained above. Plot the communities formed. Compute the homogeneity and completeness of the clustering (you can use the 'clevr' library in R to compute homogeneity and completeness).
4. Sector clustering in MSTs
In this part, we want to predict the market sector of an unknown stock. We will explore two methods for performing this task. In order to evaluate the performance of the methods, we define the following metric:

    α = (1/|V|) Σ_{v_i ∈ V} P(v_i ∈ S_i)

where S_i is the sector of node i. Define

    P(v_i ∈ S_i) = |Q_i| / |N_i|

where Q_i is the set of neighbors of node i that belong to the same sector as node i, and N_i is the set of all neighbors of node i. Compare α with the case where

    P(v_i ∈ S_i) = |S_i| / |V|
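The first definition of α transcribes directly into code, assuming the MST is given as an adjacency dict and the sectors as a node-to-label map (both hypothetical representations):

```python
def sector_alpha(neighbors, sector):
    """alpha = (1/|V|) * sum over v_i of |Q_i| / |N_i|, where Q_i is the set
    of node i's neighbors sharing its sector and N_i its full neighbor set.
    `neighbors` maps node -> list of neighbors; `sector` maps node -> label."""
    total = 0.0
    for v, nbrs in neighbors.items():
        if not nbrs:
            continue                        # a node with no neighbors contributes 0
        same = sum(1 for u in nbrs if sector[u] == sector[v])
        total += same / len(nbrs)
    return total / len(neighbors)
```

For the second case, the same loop simply adds |S_i| / |V| per node instead of the neighbor fraction.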
QUESTION 5: Report the value of α for the above two cases and provide an interpretation for
the difference.
5. Correlation graphs for weekly data
In the previous parts, we constructed the correlation graph based on daily data. In this part
of the project, we will construct a correlation graph based on WEEKLY data. To create the
graph, sample the stock data weekly on Mondays and then calculate ρij using the sampled
data. If there is a holiday on a Monday, we ignore that week. Create the correlation graph
based on weekly data.
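The Monday sampling step can be as simple as a date filter; a sketch assuming the series has been parsed into (date, closing price) pairs:

```python
from datetime import date

def sample_mondays(series):
    """Weekly sampling for the correlation graph: keep only rows dated on a
    Monday. Weeks whose Monday is a holiday have no Monday row in the data,
    so they are skipped automatically."""
    return [(d, p) for d, p in series if d.weekday() == 0]
```

The weekly ρ_ij is then computed on the filtered series exactly as before.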
QUESTION 6: Repeat questions 2,3,4,5 on the WEEKLY data.
6. Correlation graphs for MONTHLY data
In this part of the project, we will construct a correlation graph based on MONTHLY data.
To create the graph, sample the stock data monthly on the 15th and then calculate ρ_ij using the sampled data. If there is a holiday on the 15th, we ignore that month. Create the correlation graph based on the MONTHLY data.
QUESTION 7: Repeat questions 2,3,4,5 on the MONTHLY data.
QUESTION 8: Compare and analyze all the results for daily vs. weekly vs. monthly data. What trends do you find? What changes? What remains similar? Give reasons for your observations. Which granularity gives the best results when predicting the sector of an unknown stock, and why?
2. Let’s Help Santa!
Companies like Google and Uber have a vast amount of statistics about transportation dynamics. Santa has decided to use network theory to facilitate his gift delivery for next Christmas. When we learned about his decision, we designed this part of the project to help him. We will send him your results for this part!
1. Download the Data
Normally we would use the most recent winter data, but this year the latest available data is Winter 2019. Go to the "Uber Movement" website and download the data of Travel Times by Month (All Days), 2019 Quarter 4, for the Los Angeles area [1]. The dataset contains pairwise travel-time statistics between most pairs of points in the Los Angeles area. Points on the map are represented by unique IDs. To understand the correspondence between map IDs and areas, download the Geo Boundaries file from the same website [2]. This file contains the latitudes and longitudes of the corners of the polygons circumscribing each area. To be specific, if an area is represented by a polygon with 5 corners, then you have a 5 × 2 matrix of latitudes and longitudes, each row of which represents the latitude and longitude of one corner. We recommend doing this part of the project (Q9 – Q18) in Python.
2. Build Your Graph
Read the dataset at hand, and build a graph in which nodes correspond to locations and undirected weighted edges correspond to the mean travel times between each pair of locations (December only). Add the centroid coordinates of each polygon region (a 2-D vector) as an attribute to the corresponding vertex.
The graph will contain some isolated nodes (extra nodes existing in the Geo Boundaries JSON file) and a few small connected components. Remove such nodes and keep only the largest connected component of the graph. In addition, merge duplicate edges by averaging their weights [3]. We will refer to this cleaned graph as G afterwards.
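One way to sketch this cleaning step in pure Python, assuming the CSV has been parsed into (source, destination, mean travel time) records (graph libraries such as networkx offer the same operations directly):

```python
from collections import defaultdict, deque

def build_graph(records):
    """Merge duplicate undirected edges by averaging their weights, then keep
    only the largest connected component. Returns {node: {neighbor: weight}}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for u, v, t in records:
        key = (min(u, v), max(u, v))      # canonical order for undirected edges
        sums[key] += t
        counts[key] += 1

    adj = defaultdict(dict)
    for (u, v), s in sums.items():
        w = s / counts[(u, v)]            # average of the duplicate weights
        adj[u][v] = adj[v][u] = w

    # BFS over each component to find the largest one
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    comp.add(y)
                    queue.append(y)
        if len(comp) > len(best):
            best = comp
    return {u: dict(nbrs) for u, nbrs in adj.items() if u in best}
```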
QUESTION 9: Report the number of nodes and edges in G.
3. Traveling Salesman Problem
QUESTION 10: Build a minimum spanning tree (MST) of graph G. Report the street addresses
near the two endpoints (the centroid locations) of a few edges. Are the results intuitive?
QUESTION 11: Determine what percentage of triangles in the graph (sets of 3 points on the map) satisfy the triangle inequality. You do not need to inspect all triangles; you can estimate by randomly sampling 1000 triangles.
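A sampling estimator in this spirit, assuming the cleaned graph is an adjacency dict {node: {neighbor: weight}} (the attempt cap is our own guard so the loop terminates on sparse graphs, not part of the question):

```python
import random

def triangle_inequality_rate(adj, samples=1000, seed=0):
    """Estimate the fraction of triangles (a, b, c) whose three edge weights
    satisfy the triangle inequality, by uniform random sampling."""
    rng = random.Random(seed)
    nodes = list(adj)
    ok = tried = 0
    for _ in range(100 * samples):        # attempt cap for sparse graphs
        if tried == samples:
            break
        a, b, c = rng.sample(nodes, 3)
        if b not in adj[a] or c not in adj[b] or c not in adj[a]:
            continue                       # the three nodes form no triangle
        tried += 1
        ab, bc, ca = adj[a][b], adj[b][c], adj[a][c]
        if ab + bc >= ca and bc + ca >= ab and ca + ab >= bc:
            ok += 1
    return ok / max(tried, 1)
```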
Now, we want to find an approximate solution to the traveling salesman problem (TSP) on G. Apply the 1-approximate algorithm described in class [4]. Inspect the sequence of street addresses visited on the map and see if the results are intuitive.
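The MST-based heuristic from Papadimitriou and Steiglitz (build an MST, then visit nodes in DFS preorder) can be sketched as follows; on metric inputs its tour costs at most twice the optimum, i.e. relative error at most 1, which is what "1-approximate" means there. The function name and dense-matrix input are our assumptions:

```python
def approx_tsp_tour(weights):
    """MST-based TSP approximation: Prim's MST, then a DFS preorder walk
    that skips repeated nodes. `weights` is a dense symmetric matrix."""
    n = len(weights)
    in_tree = [False] * n
    best_w = [float('inf')] * n
    best_to = [0] * n
    best_w[0] = 0.0
    children = {i: [] for i in range(n)}
    for _ in range(n):                    # Prim: grow the MST one node at a time
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_w[i])
        in_tree[u] = True
        if u != 0:
            children[best_to[u]].append(u)
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best_w[v]:
                best_w[v], best_to[v] = weights[u][v], u
    tour, stack = [], [0]                  # DFS preorder of the MST
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    tour.append(0)                         # close the cycle at the start node
    cost = sum(weights[tour[k]][tour[k + 1]] for k in range(n))
    return tour, cost
```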
QUESTION 12: Find an upper bound on the empirical performance of the approximate algorithm:

    ρ = (Approximate TSP Cost) / (Optimal TSP Cost)
QUESTION 13: Plot the trajectory that Santa has to travel!
[1] If you download the dataset correctly, it should be named los angeles-censustracts-2019-4-All-MonthlyAggregate.csv
[2] The file should be named los angeles censustracts.json
[3] Duplicate edges may exist when the dataset provides you with the statistic of a road in both directions. We remove duplicate edges for the sake of simplicity.
[4] You can find the algorithm in: Papadimitriou and Steiglitz, "Combinatorial Optimization: Algorithms and Complexity", Chapter 17, page 414.
4. Analysing Traffic Flow
Next December, there is going to be a large group of visitors travelling between a location near Malibu and a location near Long Beach. We would like to analyse the maximum traffic that can flow between the two locations.
5. Estimate the Roads
We want to estimate the map of roads without using actual road datasets. Educate yourself about the Delaunay triangulation algorithm and then apply it to the node coordinates [5].
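Following the footnote's suggestion, a minimal scipy sketch that turns the location centroids into a road mesh by extracting the unique edges of each Delaunay triangle (the function name `road_mesh` is illustrative):

```python
import numpy as np
from scipy.spatial import Delaunay

def road_mesh(coords):
    """Estimate roads from location centroids with a Delaunay triangulation.
    Returns the undirected edge set of G_Delta as (i, j) index pairs into
    `coords`."""
    tri = Delaunay(np.asarray(coords, dtype=float))
    edges = set()
    for a, b, c in tri.simplices:          # each simplex is one triangle
        for i, j in ((a, b), (b, c), (a, c)):
            edges.add((int(min(i, j)), int(max(i, j))))
    return edges
```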
QUESTION 14: Plot the road mesh that you obtain and explain the result. Create a graph G∆ whose nodes are the locations and whose edges are produced by the triangulation.
6. Calculate Road Traffic Flows
QUESTION 15: Using simple math, calculate the traffic flow for each road in terms of cars/hour.
Report your derivation.
Hint: Consider the following assumptions:
• Each degree of latitude and longitude ≈ 69 miles
• Car length ≈ 5 m = 0.003 mile
• Cars maintain a safety distance of 2 seconds to the next car
• Each road has 2 lanes in each direction
Assuming no traffic jam, consider the calculated traffic flow as the max capacity of each road.
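Putting the hinted assumptions together: an edge's travel time and length give its average speed, each car occupies a 2-second gap plus the time to cover its own length, and each direction has 2 lanes. A sketch of the resulting cars/hour formula (inferring speed from the edge data is our assumption about how to apply the hint):

```python
def road_capacity(dist_degrees, travel_time_sec, lanes=2):
    """Rough max flow of one road direction in cars/hour, under the stated
    assumptions: 1 degree ~ 69 miles, car length ~ 0.003 mile, a 2-second
    safety gap, and 2 lanes per direction."""
    dist_miles = dist_degrees * 69.0
    speed = dist_miles / travel_time_sec       # miles per second
    headway = 2.0 + 0.003 / speed              # seconds between car fronts
    return lanes * 3600.0 / headway            # cars passing a point per hour
```

Note the 2-second gap alone caps each lane at 1800 cars/hour; the car-length term only lowers that figure.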
7. Calculate Max Flow
Consider the following locations in terms of latitude and longitude:
• Source coordinates (in Malibu): [34.04, -118.56]
• Destination coordinates (in Long Beach): [33.77, -118.18]
QUESTION 16: Calculate the maximum number of cars that can commute per hour from Malibu
to Long Beach. Also calculate the number of edge-disjoint paths between the two spots. Does the
number of edge-disjoint paths match what you see on your road map?
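Both quantities in this question reduce to max-flow computations: with the road capacities you get cars/hour, and with every capacity set to 1 the max flow counts edge-disjoint paths. A self-contained Edmonds-Karp sketch (libraries such as networkx provide `maximum_flow` ready-made):

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max flow on `cap` = {u: {v: capacity}}. For the
    undirected road graph, insert each edge in both directions."""
    res = {u: dict(nbrs) for u, nbrs in cap.items()}   # residual capacities
    for u in cap:
        for v in cap[u]:
            res.setdefault(v, {}).setdefault(u, 0)     # reverse arcs
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:   # BFS: shortest augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t                    # walk the path back to s
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:                  # augment along the path
            res[u][v] -= push
            res[v][u] += push
        flow += push
```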
[5] You can use scipy.spatial.Delaunay in Python, or the RTriangle package in R.
8. Prune Your Graph
In G∆, there are a number of unreal roads that could be removed. For instance, you might
notice some unreal links along the concavities of the beach, as well as in the hills of Topanga.
Apply a threshold on the travel time of the roads in G∆ to remove the fake edges. Call the resulting graph G̃∆.
QUESTION 17: Plot G̃∆ on actual coordinates. Do you think the thresholding method worked?
QUESTION 18: Now, repeat question 13 for G̃∆ and report the results. Do you see any changes? Why?
Submission
Please submit your report to Gradescope. In addition, please submit a zip file containing your code and report to BruinLearn. The zip file should be named "Project4 UID1 … UIDn.zip", where UIDx are the student ID numbers of the team members. If you have any questions, you can post on Piazza.
