Graph Algorithms – a revision of all the basic ones.

Hello friends! This post will be a short review of the the most common and basic algorithms for undirected graphs.

Note: This is not a post intended for folks who are not at all familiar with graphs. For beginners a better starting point would be a book like this.

We will start by discussing the basic problems and then we will have a look at some common applications.

Basic Problem 1 : Traversal

Problem Statement: graph traversal is the problem of visiting all the nodes in a graph in a particular manner, updating and/or checking their values along the way.

Solution 1 : Breadth First Traversal/Search (BFS)

bfs

TL;DR: search is limited to essentially two operations: (a) visit and inspect a node of a graph, (b) gain access to visit the nodes that neighbour the currently visited node.

Points to remember:

  • Layer-wise search forming layers L[0],L[1],…,L[i] of nodes
  • L[i] are neighbours to L[i-1] ,  and are not present in any of the earlier layers.
  • L[j] is exactly j distant from S. (assuming non-weighted edges , each with cost 1)
  • BFS tree can be obtained after running BFS.
  • Any edge can be present between nodes not differing by more than one layer
  • Any BFS routine produces the connected component that contains the root.
  • O(m+n) running time and space.
  • Efficient implementation using a queue.

Steps:

  1. Explore all neighbours of root. Call them level 1.
  2. Then explore all neighbours of nodes of level 1. Call them level 2.
  3. Keep Exploring like this 🙂

Code:

Without Queue:

With Queue:

Solution 2 : Depth First Traversal/Search (DFS)

dfs

 
TL;DR: Greedy search, where search retreats only when it encounters a dead-end of the branch,i.e. all neighbours.

Points to remember:

  • Like the BFS tree, a DFS tree is obtained.
  • O(m+n) running time and space.
  • If an edge (x,y) is not present in the DFS tree, then one is a descendant of the other
  • Can be used to produce Topological Sorting.
  • Can be used to produce SCCs(Strongly connected components) using Kosaraju’s two-pass algorithm.
  • Simple recursive implementation possible that makes use of recursion stacks
  • Non-recursively efficiently implemented using Stacks.
  • Discovery times of every node follows parenthesis structure (Source: CLRS)
  • For finding all paths, we can unmark the node  as visitied when we finish the loop.
  • Levels are processed in reverse order for non recursive with respect to that of recursive implementation

Steps:

  1. Explore first neighbour of root.
  2. Then explore the first unexplored neighbour of the first neighbour
  3. Keep exploring until you reach a node with no unexplored neighbours.
  4. Backtrack at that node until you reach a node with unexplored neighbours.

Code:

With Recursion:

Without Recursion, with Stack:

Notes on Traversal

  • BFS and DFS have same running time and space efficiency
  • BFS and DFS can be used to find connected components.
  • BFS and DFS can be used to find connected cycles (DFS is more efficient and can be used in directed graphs also, as backward edge produces cycle).
  • Classification Of Edges(directed): Tree Edge, Forward Edge(connecting to descendant), Backward Edge(connecting to ancestor), Cross Edge(all other). In DFS, only tree edge or backward edge is possible.
  • Classification Of Edges(un-directed): Tree Edge, Forward Edge(discovered by an ancestor), Backward Edge(discovered by a successor), Cross Edge. In DFS, only backward or forward or tree edges are possible.In BFS, only cross edges are possible.

Basic Problem 2 : Shortest Paths

Single Source Problem Statement:  Given a undirected weighted graph G and a source node u, find the shortest cost paths from u  to all other nodes.

Solution: Dijkstra’s Algorithm

dijkstra

TL;DR: Algorithm starts at the source vertex, u, it grows a tree, T, that ultimately spans all vertices reachable from u. Vertices are added to T in order of distance i.e., first u, then the vertex closest to u, then the next closest, and so on.

Steps:

  • Initialise a set P and a set Q, with P initially storing u with cost 0, and all other in Q with cost infinity.
  • Change the priorities of the nodes in Q having an edge with u as their edge costs.
  • Dequeue the minimum key node and add it to P.
  • Adjust the priorities of other nodes in Q so that the cost of addition gets minimised.
  • At the end, the keys are the shortest path costs.

Cost of addition is given by

Screen Shot 2013-11-19 at 9.32.39 am

Code:

Proof of Correctness:
Proof by Induction on set P’s size: Lets assume that the paths are correct for all the previous nodes in P for some node v. Then if there exists some path with smaller cost it must cross P some time, let’s say it crosses with edge passing from x to y. Then algorithm would have selected y before v as all edge lengths are positive, which is a contradiction.

Running Time Analysis:
Total deleteMin – O(nlogn) with Priority Queue, O(n2) without Priority Queue
Total change priorities – O(mlogn) with Priority Queue, O(m) without Priority Queue

Hence total running time = O((n+m)logn) with Priority Queue, O(m+n2) without Priority Queue

Points to remember:

  • Uses Adjacency List Representation of Graph
  • Much Similar to Prim’s Algorithm
  • Requires all edge costs to be non-negative

Okay, now let’s look at a problem where we are not required to find the shorted distances from only one source, but from each node.

All Pairs Shortest Paths Problem Statement:  Given a undirected weighted graph G and, find the shortest cost paths between all pairs.

Naive Solution: Run Dijkstra’s algorithm on all nodes. O(mnlogn) = O(n3logn).

Solution 1: Floyd-Warshall Algorithm

TL;DR: Dynamic Programming approach with subproblem as shortest distance between node i and node j using only the first k vertices in between.

Points to remember:

  • O(n3) running time
  • No negative cost cycles should be present.
  • In place implementation possible as the (k+1)th row, and (k+1)th column don’t change in one iteration.

Steps:

  • Maintain n 2d arrays D[n][][], such that D[k][i][j] gives the shortest distance between i and j using only the first k vertices in between.
  • Final result will be D[n-1]
  • Initiate D[-1][i][j] as 0 if i==j, cost of (i,j) if edge exists, infinity otherwise.
  • Using dynamic programming, find D[k+1][i][j] = min(D[k][i][j] , D[k+1][i][k+1] + D[k+1][k+1][j]).

Steps for finding the shortest paths:

  1. Same kind of implementation, use a matrix N[][][]
  2. N[-1][i][j] = j if (i,j) exists, undefined otherwise.
  3. N[k+1][i][j] = N[k][i,k+1] if k+1 path is shorter, else N[k][i][j]
  4. Now for finding the path between i,j keep following the array N and printing the nodes.

Basic Problem 3 : Minimum Spanning Trees

Definition: A spanning tree of a graph is a tree that covers all the nodes of the original graph, e.g. BFS/DFS trees. A minimum spanning tree is a spanning tree with the minimum cost.

The MST Lemma (Cut Property): Let P and Q be two disjoint node sets that together give the union of the original graph nodes, then among all the edges between P and Q, the node with minimum edge cost must be present in an MST(there might be more than one MST)

Proof: Let T be an MST that doesn’t contain e{u,v} (the minimum edge cost), then adding e to it will produce a cycle. Traverse along that cycle , and whenever it crosses from P to Q, replace that edge with e. This produces an MST again.

The MST Cycle Lemma (Cycle Property): The costliest edge of a connected undirected graph must not be in an(any if edge costs are distinct in that cycle) MST.

Proof: Let T be an MST that contains the costliest edge e {u,v}. T-e gives two connected disjoint components, traverse along the cycle to get a cheaper edge that can be replaced with e.

Problem Statement: Find a minimum spanning tree for an undirected weighted graph.

Solution 1: Prim’s Algorithm

Point to remember: Nearly direct implementation of the MST lemma.

Prim-algorithm-animation-2

Steps:

  • Start with a set P with some random root node u and a priority queue Q that contains all the remaining nodes of graph with key as infinity.
  • Set priority of the nodes that share an edge with u as their edge cost.
  • In the priority queue Q maintain a priority of the shortest edge cost from the node to P and the respective closest neighbour
  • In a loop keep choosing elements from Q, s.t. d[u] is as small as possible. Add u to P.
  • Update the priority of all the nodes in Q w.r.t u. i.e.,  compare their current priority with the edge cost with u and update if the edge cost is lesser.

Running Time Analysis:

  • Adjacency lists are preferred.
  • Make heap and other inititalizations – O(n)
  • One iteration O(1) in look up and deletion – total of O(nlogn).
  • Total iterations of ChangePriority O(m) and each individual costs O(log(n)), hence a total of O(mlogn).

Solution 2: Kruskal’s Algorithm

Kruskal_algorithm

Points to remember:

  • Uses MST lemma.
  • Uses MFset( also called Union-Find) Data Structure
  • O(mlogn) using MFSet
  • Runtime dramatically reduces near to Ackermann Function after using path compression, goes to O(n * ackermann(m)), where ackermann(m)<= 5 for all practical purposes.

Steps:

  1. Sort the edges in increasing order of cost.
  2. Keep adding edges to MST, and whenever a cycle is formed, just don’t add it.

Running Time Analysis: O(mlogn) in worst case without path compression

Solution 3: Reverse- Delete Algorithm

Point to remember: Uses the cycle MST lemma

Steps:

  1. Sort the edges in decreasing order of costs.
  2. Keep deleting the costliest edges as long as the tree stays connected

Running Time Analysis: O(n2) with usual implementation.

Notes on MSTs

  • All of them are Greedy Solutions
  • It turns out that any greedy approach using either of the lemmas will work.

Interesting Applications

Cycle Detection in a Graph: Detect if a directed graph contains a cycle

Points to remember:

  • Direct extension of DFS.
  • Cycle in a graph only if there is a back edge present in the graph.

Code 🙂

Running Time Analysis:

  • Seriously?

Testing Bipartiteness

Bipartite Graph Definition: A graph whose vertices can be divided into two disjoint sets X and Y such that every edge has one end in X and other in Y. Consequently, the sets X and Y are independent sets.

Points to remember:

  • Sufficient condition is the non existence of any odd cycles.
  • Can use both BFS and DFS to test bipartiteness
  • For BFS, check whether there is an odd cycle which can be checked by confirming if there exists an edge between two consecutive layers.
  • For DFS, while performing DFS assign colors to the node with negative of parent, hence if you encounter a contradictory case, then not bipartite.

Extending these algorithms For Directed Graphs:

Points to remember:

  • BFS and DFS remain the same.
  • Strong Connectivity can be checked by running both forward and backward BFS/DFS and comparing the two components if they are the same.
  • Strong Components will either be equal or disjoint because if s & u are mutually reachable, s & v are mutually reachable then u & v will also be mutually reachable.

Topological Ordering/Sorting in a Directed Graph:

Topological Ordering of nodes: If every directed edge is forward pointed then the ordering is a topological ordering.

Points to remember:

  • A topological ordering of a graph exists iff the graph is a DAG(Directed Acyclic Graph)
  • Useful in finding precedence relations
  • For every DAG there must be a node with no incoming edges

Algortithms Possible:

  • Recursively follow the node with no incoming edges, delete that and then recursively find the next such vertex,
  • A variation of DFS, whenever a node is finished searching, add it to front of a linked list. This list will be a DAG. Note to make repeated invocations of DFS as the DFS might stop before traversing to each edge. For more information see this.

Analysis:

  • O(n2) running time for raw algorithm1, and O(n+m) for DFS implementation
  • To obtain O(m+n) time optimization, use the following technique for Algorithm-1:
    • Store two Data Structures:
      • for each w, no of active nodes that have incoming edge w.r.t. w.
      • the set of all active nodes that have no incoming edges from other active nodes.

So that would be all for this post. BFS, DFS, Dijkstra, Floyd-Warshall, MST, and some interesting applications. The notes are highly influenced from the book by Kleinberg, Tardos and the book by Cormen et.al. . If you are not satisfied have a read in the books, or just post a comment and I can care to explain 🙂 .

Make sure to point out any mistakes or suggestions in the comments.

See you in the next post!


Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s