[SOLVED] CS crawler algorithm COMP3027: Algorithm Design

$25

File Name: CS_crawler_algorithm_COMP3027:_Algorithm_Design.zip
File Size: 442.74 KB

5/5 - (1 vote)

COMP3027: Algorithm Design
Lecture 2: Directed Graphs
William Umboh
School of Computer Science
The University of Sydney
1

Connectivity in Directed Graphs
The University of Sydney 2

Directed Graphs
Directed graph. G = (V, E)
Edge (u, v) goes from node u to node v.

Example. Web graph hyperlink points from one web page to another.
Directedness of graph is crucial.
Modern web search engines exploit hyperlink structure to rank web pages by importance.

The University of Sydney 3

Graph Search
Directed reachability. Given a node s, find all nodes reachable from s.
Directed s-t shortest path problem. Given two node s and t, what is the length of the shortest path between s and t?
Graph search. BFS and DFS extend naturally to directed graphs. Web crawler. Start from web page s. Find all web pages linked
from s, either directly or indirectly.
The University of Sydney 4

The University of Sydney 5

Strong Connectivity
Definition: Node u and v are mutually reachable if there is a
path from u to v and also a path from v to u.
Definition: A graph is strongly connected if every pair of nodes is mutually reachable.
Examples:
The University of Sydney
6
strongly connected not strongly connected

Strong Connectivity
Definition: Node u and v are mutually reachable if there is a
path from u to v and also a path from v to u.
Definition: A graph is strongly connected if every pair of nodes is mutually reachable.
Lemma: Let s be any node. G is strongly connected iff every node is reachable from s, and s is reachable from every node.
Proof: () Follows from definition.
() Path from u to v: concatenate u-s path with s-v path. Path from v to u: concatenate v-s path with s-u path.

s
u
The University of Sydney
v
7

Strong Connectivity: Algorithm
Theorem: Can determine if G is strongly connected in O(m + n)
time.
Proof:
Pick any node s.
RunBFSfromsinG.
Run BFS from s in Grev.
Return true iff all nodes reached in both BFS executions. Correctness follows immediately from previous lemma.
reverse orientation of every edge in G
The University of Sydney 8

Strong Connectivity
Consider a graph G and let S1 and S2 be two strongly connected components in G of maximal size. Are S1 and S2 disjoint?
Can we compute all the strongly connected components of a graph G efficiently?
Can be done in O(m + n) time [Kosaraju 78].
The University of Sydney 9

Strong Connectivity
Algorithm by Kosaraju 1978 (unpublished) STRONGLY-CONNECTED-COMPONENTS (G)
1. Call DFS(G) to compute finishing times f[u] for all u.
2. Compute Grev
3. Call DFS(Grev), but in the main loop, consider vertices in order
of decreasing f[u] (as computed in first DFS) 4. Output the vertices in each tree of the depth-first forest
formed in the second DFS as a separate strongly connected component.
The University of Sydney 10

Strong Connectivity
Algorithm by Kosaraju 1978 (unpublished) STRONGLY-CONNECTED-COMPONENTS (G)
1. Call DFS(G) to compute finishing times f[u] for all u.
2. Compute Grev
3. Call DFS(Grev), but in the main loop, consider vertices in order
of decreasing f[u] (as computed in first DFS) 4. Output the vertices in each tree of the depth-first forest
formed in the second DFS as a separate strongly connected component.
Running time: O(n+m)
The University of Sydney
10

Strong Connectivity
Algorithm by Kosaraju 1978 (unpublished) STRONGLY-CONNECTED-COMPONENTS (G)
1. Call DFS(G) to compute finishing times f[u] for all u.
2. Compute Grev
3. Call DFS(Grev), but in the main loop, consider vertices in order
of decreasing f[u] (as computed in first DFS) 4. Output the vertices in each tree of the depth-first forest
formed in the second DFS as a separate strongly connected component.
Running time: O(n+m) Correctness?
The University of Sydney
10

DAGs and Topological Ordering
The University of Sydney 11

Directed Acyclic Graphs (DAGs)
Definition: A DAG is a directed graph that contains no directed
cycles.
Ex. Precedence constraints: edge (vi, vj) means vi must precede vj.
Definition: A topological order of a directed graph G = (V, E) is an ordering of its nodes as v1, v2, , vn so that for every directed
edge (vi, vj) we have i < j. v2 v3v6 v5 v4v1 v2 v3 v4 v5 v6 v7 v7v1a topological ordering The University of Sydneya DAG12 Precedence ConstraintsPrecedence constraints. Edge (vi, vj) means task vi must occur before vj.Applications.Course prerequisite graph: course vi must be taken before vj. Compilation: module vi must be compiled before vj.Pipeline of computing jobs: output of job vi needed to determine input of job vj.The University of Sydney 13 Directed Acyclic GraphsTheorem: G has a topological order if and only if G is a DAG.The University of Sydney 14 Directed Acyclic GraphsTheorem: G has a topological order if and only if G is a DAG.Question: Given a DAG, how do we compute a topological order?The University of Sydney 14 Directed Acyclic GraphsLemma: If G has a topological order then G is a DAG.Proof: (by contradiction)Suppose that G has a topological order v1, …, vn and that G also has a directed cycle C. Let’s see what happens.Let vi be the lowest-indexed node in C, and let vj be the node just before vi in C; thus (vj, vi) is an edge. Byourchoiceofi,wehavei 1 nodes, find a node v with no incoming edges.
G { v } is a DAG, since deleting v cannot create cycles.
By inductive hypothesis, G { v } has a topological ordering.
Place v first in topological ordering; then append nodes of G { v } in topological order. This is valid since v has no incoming edges.
DAG v
The University of Sydney
25

Topological Sorting Algorithm: Running Time
Theorem: Algorithm finds a topological order in O(m + n) time.
Proof:
Maintain the following information:
count[w] = remaining number of incoming edges S = set of remaining nodes with no incoming edges
Initialization: O(m + n) via single scan through graph. While S is not empty:

letvbenodeinS remove v from S
decrement count[w] for all edges from v to w, and add w to S if
count[w] hits 0
this is O(1) per edge
v2 v3 v2 v3
v6 v5 v4v6 v5 v4

The University of Sydney
v7 v1 v7 26

Summary: Graphs
Connectivity in directed graphs DAGs
Topological sort
The University of Sydney 27

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] CS crawler algorithm COMP3027: Algorithm Design
$25