Hello friends! This post will be a short review of the the most common and basic algorithms for undirected graphs.
Note: This is not a post intended for folks who are not at all familiar with graphs. For beginners a better starting point would be a book like this.
We will start by discussing the basic problems and then we will have a look at some common applications.
Basic Problem 1 : Traversal
Problem Statement: graph traversal is the problem of visiting all the nodes in a graph in a particular manner, updating and/or checking their values along the way.
Solution 1 : Breadth First Traversal/Search (BFS)
TL;DR: search is limited to essentially two operations: (a) visit and inspect a node of a graph, (b) gain access to visit the nodes that neighbour the currently visited node.Points to remember:
- Layer-wise search forming layers L[0],L[1],…,L[i] of nodes
- L[i] are neighbours to L[i-1] , and are not present in any of the earlier layers.
- L[j] is exactly j distant from S. (assuming non-weighted edges , each with cost 1)
- BFS tree can be obtained after running BFS.
- Any edge can be present between nodes not differing by more than one layer
- Any BFS routine produces the connected component that contains the root.
- O(m+n) running time and space.
- Efficient implementation using a queue.
Steps:
- Explore all neighbours of root. Call them level 1.
- Then explore all neighbours of nodes of level 1. Call them level 2.
- Keep Exploring like this 🙂
Code:
Without Queue:
#define nodecount 10000 | |
bool discovered[nodecount]; | |
vector<vector<int> > L;//Layers | |
void BFS(int s, vector<vector<int, int> > & adjList)//BFS for arrays, without queue, non-recursive, by @arkanathpathak | |
{ | |
fill(discovered, discovered+nodecount, false); | |
discovered[s] = true; | |
for(int v=0;v<nodecount;v++) | |
{ | |
if(v!=s)discovered[v]=false; | |
} | |
int i=0; | |
L.push_back(*(new vector<int>)); | |
L[0].push_back(s); | |
while(L[i].size()!=0) | |
{ | |
L.push_back(*(new vector<int>)); | |
for(int j=0;j < (int)L[i].size(); j++) | |
{ | |
int u = L[i][j]; | |
for(int k=0;k<(int)adjList[u].size();k++) | |
{ | |
int v = adjList[u][k]; | |
if(discovered[v]==false) | |
{ | |
discovered[v] = true; | |
L[i+1].push_back(v); | |
} | |
} | |
} | |
i++; | |
} | |
} |
With Queue:
vector<int> adjList[N]; | |
int visited[N]; | |
void bfs(int u) | |
{ | |
visited[u] = 1;//mapping the visited nodes, by default 0 | |
queue<int> nq;//queue used | |
nq.push(u);//push initial node | |
while(!nq.empty()) | |
{ | |
int tt = nq.front(); | |
visited[tt] = 1; | |
nq.pop(); | |
for(int i=0;i<adjList[tt].size();i++) | |
{ | |
if(!visited[adjList[tt][i]]) | |
{ | |
nq.push(adjList[tt][i]);//push all non-visited neighbours | |
} | |
} | |
} | |
} |
Solution 2 : Depth First Traversal/Search (DFS)
TL;DR: Greedy search, where search retreats only when it encounters a dead-end of the branch,i.e. all neighbours.Points to remember:
- Like the BFS tree, a DFS tree is obtained.
- O(m+n) running time and space.
- If an edge (x,y) is not present in the DFS tree, then one is a descendant of the other
- Can be used to produce Topological Sorting.
- Can be used to produce SCCs(Strongly connected components) using Kosaraju’s two-pass algorithm.
- Simple recursive implementation possible that makes use of recursion stacks
- Non-recursively efficiently implemented using Stacks.
- Discovery times of every node follows parenthesis structure (Source: CLRS)
- For finding all paths, we can unmark the node as visitied when we finish the loop.
- Levels are processed in reverse order for non recursive with respect to that of recursive implementation
Steps:
- Explore first neighbour of root.
- Then explore the first unexplored neighbour of the first neighbour
- Keep exploring until you reach a node with no unexplored neighbours.
- Backtrack at that node until you reach a node with unexplored neighbours.
Code:
With Recursion:
vector<int> adjList[N]; | |
int visited[N]; | |
void dfsrecursive(int u) | |
{ | |
visited[u] = 1; | |
for(int i=0;i<adjList[u].size();i++) | |
{ | |
if(!visited[adjList[u][i]]) | |
{ | |
dfsrecursive(adjList[u][i]); | |
} | |
} | |
} | |
//See, that's it! |
Without Recursion, with Stack:
vector<int> adjList[N]; | |
int visited[N]; | |
void dfsstack(int u) | |
{ | |
stack<int> nstack;//stack used in place of recursion stack | |
visited[u] = 1; | |
nstack.push(u); | |
while(!nstack.empty()) | |
{ | |
int tp = nstack.top(); | |
nstack.pop(); | |
for(int i=0;i<adjList[tp].size();i++) | |
{ | |
if(!visited[adjList[tp][i]]) | |
{ | |
nstack.push(adjList[tp][i]); | |
visited[adjList[tp][i]] = 1; | |
//visited that node, the time has come 🙂 | |
} | |
} | |
} | |
} |
Notes on Traversal
- BFS and DFS have same running time and space efficiency
- BFS and DFS can be used to find connected components.
- BFS and DFS can be used to find connected cycles (DFS is more efficient and can be used in directed graphs also, as backward edge produces cycle).
- Classification Of Edges(directed): Tree Edge, Forward Edge(connecting to descendant), Backward Edge(connecting to ancestor), Cross Edge(all other). In DFS, only tree edge or backward edge is possible.
- Classification Of Edges(un-directed): Tree Edge, Forward Edge(discovered by an ancestor), Backward Edge(discovered by a successor), Cross Edge. In DFS, only backward or forward or tree edges are possible.In BFS, only cross edges are possible.
Basic Problem 2 : Shortest Paths
Single Source Problem Statement: Given a undirected weighted graph G and a source node u, find the shortest cost paths from u to all other nodes.
Solution: Dijkstra’s Algorithm
TL;DR: Algorithm starts at the source vertex, u, it grows a tree, T, that ultimately spans all vertices reachable from u. Vertices are added to T in order of distance i.e., first u, then the vertex closest to u, then the next closest, and so on.Steps:
- Initialise a set P and a set Q, with P initially storing u with cost 0, and all other in Q with cost infinity.
- Change the priorities of the nodes in Q having an edge with u as their edge costs.
- Dequeue the minimum key node and add it to P.
- Adjust the priorities of other nodes in Q so that the cost of addition gets minimised.
- At the end, the keys are the shortest path costs.
Cost of addition is given by
Code:
int shortestdistance[N]; | |
int explored[N]; | |
vector<pair<int, int> > adjList[N]; // stores edges with cost | |
void dijkstra(int node)// Does not use priority queue, O(M+N2) implementation | |
{ | |
shortestdistance[node] = 0; | |
explored[node] = 1; | |
int discovered = 1; | |
int lastadded = node; | |
while(discovered<N) | |
{ | |
explored[lastadded] = 1; | |
repi(i, 0, adjList[lastadded].sz) | |
{ | |
int cost = adjList[lastadded][i].ss; | |
int ii = adjList[lastadded][i].ff; | |
shortestdistance[ii] = min(shortestdistance[ii],shortestdistance[lastadded]+cost); | |
} | |
int mincost = INT_MAX; | |
int mincostindex = -1; | |
for(int i=0;i<N;i++) | |
{ | |
if(shortestdistance[i]<mincost && explored[i]==0) | |
{ | |
mincostindex = i; | |
mincost = shortestdistance[i]; | |
} | |
} | |
if(mincost==INT_MAX) break;//completes the connected component | |
lastadded = mincostindex; | |
discovered++; | |
} | |
} |
Proof of Correctness:
Proof by Induction on set P’s size: Lets assume that the paths are correct for all the previous nodes in P for some node v. Then if there exists some path with smaller cost it must cross P some time, let’s say it crosses with edge passing from x to y. Then algorithm would have selected y before v as all edge lengths are positive, which is a contradiction.
Running Time Analysis:
Total deleteMin – O(nlogn) with Priority Queue, O(n2) without Priority Queue
Total change priorities – O(mlogn) with Priority Queue, O(m) without Priority Queue
Hence total running time = O((n+m)logn) with Priority Queue, O(m+n2) without Priority Queue
Points to remember:
- Uses Adjacency List Representation of Graph
- Much Similar to Prim’s Algorithm
- Requires all edge costs to be non-negative
Okay, now let’s look at a problem where we are not required to find the shorted distances from only one source, but from each node.
All Pairs Shortest Paths Problem Statement: Given a undirected weighted graph G and, find the shortest cost paths between all pairs.
Naive Solution: Run Dijkstra’s algorithm on all nodes. O(mnlogn) = O(n3logn).
Solution 1: Floyd-Warshall Algorithm
TL;DR: Dynamic Programming approach with subproblem as shortest distance between node i and node j using only the first k vertices in between.Points to remember:
- O(n3) running time
- No negative cost cycles should be present.
- In place implementation possible as the (k+1)th row, and (k+1)th column don’t change in one iteration.
Steps:
- Maintain n 2d arrays D[n][][], such that D[k][i][j] gives the shortest distance between i and j using only the first k vertices in between.
- Final result will be D[n-1]
- Initiate D[-1][i][j] as 0 if i==j, cost of (i,j) if edge exists, infinity otherwise.
- Using dynamic programming, find D[k+1][i][j] = min(D[k][i][j] , D[k+1][i][k+1] + D[k+1][k+1][j]).
Steps for finding the shortest paths:
- Same kind of implementation, use a matrix N[][][]
- N[-1][i][j] = j if (i,j) exists, undefined otherwise.
- N[k+1][i][j] = N[k][i,k+1] if k+1 path is shorter, else N[k][i][j]
- Now for finding the path between i,j keep following the array N and printing the nodes.
Basic Problem 3 : Minimum Spanning Trees
Definition: A spanning tree of a graph is a tree that covers all the nodes of the original graph, e.g. BFS/DFS trees. A minimum spanning tree is a spanning tree with the minimum cost.
The MST Lemma (Cut Property): Let P and Q be two disjoint node sets that together give the union of the original graph nodes, then among all the edges between P and Q, the node with minimum edge cost must be present in an MST(there might be more than one MST)
Proof: Let T be an MST that doesn’t contain e{u,v} (the minimum edge cost), then adding e to it will produce a cycle. Traverse along that cycle , and whenever it crosses from P to Q, replace that edge with e. This produces an MST again.
The MST Cycle Lemma (Cycle Property): The costliest edge of a connected undirected graph must not be in an(any if edge costs are distinct in that cycle) MST.
Proof: Let T be an MST that contains the costliest edge e {u,v}. T-e gives two connected disjoint components, traverse along the cycle to get a cheaper edge that can be replaced with e.
Problem Statement: Find a minimum spanning tree for an undirected weighted graph.
Solution 1: Prim’s Algorithm
Point to remember: Nearly direct implementation of the MST lemma.
Steps:
- Start with a set P with some random root node u and a priority queue Q that contains all the remaining nodes of graph with key as infinity.
- Set priority of the nodes that share an edge with u as their edge cost.
- In the priority queue Q maintain a priority of the shortest edge cost from the node to P and the respective closest neighbour
- In a loop keep choosing elements from Q, s.t. d[u] is as small as possible. Add u to P.
- Update the priority of all the nodes in Q w.r.t u. i.e., compare their current priority with the edge cost with u and update if the edge cost is lesser.
Running Time Analysis:
- Adjacency lists are preferred.
- Make heap and other inititalizations – O(n)
- One iteration O(1) in look up and deletion – total of O(nlogn).
- Total iterations of ChangePriority O(m) and each individual costs O(log(n)), hence a total of O(mlogn).
Solution 2: Kruskal’s Algorithm
Points to remember:
- Uses MST lemma.
- Uses MFset( also called Union-Find) Data Structure
- O(mlogn) using MFSet
- Runtime dramatically reduces near to Ackermann Function after using path compression, goes to O(n * ackermann(m)), where ackermann(m)<= 5 for all practical purposes.
Steps:
- Sort the edges in increasing order of cost.
- Keep adding edges to MST, and whenever a cycle is formed, just don’t add it.
Running Time Analysis: O(mlogn) in worst case without path compression
Solution 3: Reverse- Delete Algorithm
Point to remember: Uses the cycle MST lemma
Steps:
- Sort the edges in decreasing order of costs.
- Keep deleting the costliest edges as long as the tree stays connected
Running Time Analysis: O(n2) with usual implementation.
Notes on MSTs
- All of them are Greedy Solutions
- It turns out that any greedy approach using either of the lemmas will work.
Interesting Applications
Cycle Detection in a Graph: Detect if a directed graph contains a cycle
Points to remember:
- Direct extension of DFS.
- Cycle in a graph only if there is a back edge present in the graph.
Code 🙂
vector<int> adjList[N]; | |
int visited[N]; | |
bool dfsrecursiveandcycle(int u, int parent) | |
{ | |
cout << "visited " << u << endl;// 🙂 | |
visited[u] = 1; | |
bool ans = false; | |
for(int i=0;i<adjList[u].size();i++) | |
{ | |
if(!visited[adjList[u][i]]) | |
{ | |
if(dfsrecursiveandcycle(adjList[u][i], u)) ans = true; | |
} | |
else if(adjList[u][i]!=parent) | |
{ | |
ans = true; | |
} | |
} | |
return ans; | |
} |
Running Time Analysis:
- Seriously?
Testing Bipartiteness
Bipartite Graph Definition: A graph whose vertices can be divided into two disjoint sets X and Y such that every edge has one end in X and other in Y. Consequently, the sets X and Y are independent sets.
Points to remember:
- Sufficient condition is the non existence of any odd cycles.
- Can use both BFS and DFS to test bipartiteness
- For BFS, check whether there is an odd cycle which can be checked by confirming if there exists an edge between two consecutive layers.
- For DFS, while performing DFS assign colors to the node with negative of parent, hence if you encounter a contradictory case, then not bipartite.
Extending these algorithms For Directed Graphs:
Points to remember:
- BFS and DFS remain the same.
- Strong Connectivity can be checked by running both forward and backward BFS/DFS and comparing the two components if they are the same.
- Strong Components will either be equal or disjoint because if s & u are mutually reachable, s & v are mutually reachable then u & v will also be mutually reachable.
Topological Ordering/Sorting in a Directed Graph:
Topological Ordering of nodes: If every directed edge is forward pointed then the ordering is a topological ordering.
Points to remember:
- A topological ordering of a graph exists iff the graph is a DAG(Directed Acyclic Graph)
- Useful in finding precedence relations
- For every DAG there must be a node with no incoming edges
Algortithms Possible:
- Recursively follow the node with no incoming edges, delete that and then recursively find the next such vertex,
- A variation of DFS, whenever a node is finished searching, add it to front of a linked list. This list will be a DAG. Note to make repeated invocations of DFS as the DFS might stop before traversing to each edge. For more information see this.
Analysis:
- O(n2) running time for raw algorithm1, and O(n+m) for DFS implementation
- To obtain O(m+n) time optimization, use the following technique for Algorithm-1:
- Store two Data Structures:
- for each w, no of active nodes that have incoming edge w.r.t. w.
- the set of all active nodes that have no incoming edges from other active nodes.
- Store two Data Structures:
So that would be all for this post. BFS, DFS, Dijkstra, Floyd-Warshall, MST, and some interesting applications. The notes are highly influenced from the book by Kleinberg, Tardos and the book by Cormen et.al. . If you are not satisfied have a read in the books, or just post a comment and I can care to explain 🙂 .
Make sure to point out any mistakes or suggestions in the comments.
See you in the next post!