New Article: Machine Learning -> Clustering -> K-Means

In this article we will be discussing about the k-means algorithm for clustering. This is the first in a series of articles related to Machine Learning that I will be writing about in Trends Of Code.

The prerequisites for this article is almost nothing, but if you are completely new to Machine Learning I would recommend you to first build up from an even simpler machine learning algorithm like k-nearest neighbours and get an idea of how a general machine learning algorithm works. Still, I will be building up from the very basics and hence understanding this article without any pre knowledge should also not be difficult.

A very natural follow up to this article would be a more general algorithm, called the Fuzzy c-means (FCM) algorithm, also known as the soft version of k-means. That will probably be the next article that I will write in Trends of Code.

Also, as a note, I have introduced a vertical scroll view too since the previous post reported lack of support for some browsers.

Read the article here: http://trendsofcode.net/kmeans

Advertisements

Trends Of Code is shifting!

Since this blog is going to be a little more scientific in the future, I needed a better design than the suits theme. For that purpose, I have developed a new website for the blog which will follow a design similar to that used in academic literature.

The URL for the website is http://trendsofcode.net/ . I have posted an article there explaining further the need for the change.

However I will still be posting links to the posts here for you WordPress folks.

Hope you like the new design.

Graph Algorithms : Network Flow

Definition of a Flow Network: 
A directed graph (V,E) with the following properties:
  • Associated with each edge e is a capacity c(e) , c(e)>0.
  • There is a single source node s belonging to V
  • There is a single sink node t belonging to V
Nodes other than s and t are called internal nodes.
Assumptions:
  • All capacities are integers.
  • There must be at least one edge leaving source
  • There must be at least one edge entering the sink
Defining Flow:
maxFlow01
Basically carrying traffic or flow. More formally, flow is a function mapping each edge to a value f(e) satisfying the following three constraints:
  • {Capacity Constraint} For each edge e,  f(e)<=c(e)
  • {Skew Symmetry} For each pair of vertices u,v ,  f(u,v) = -f(v,u)
  • {Flow Conservation} For each node v other than s and t,  sum of all flow going out of v must be zero. In other words, total +ve flow coming into v (or fin(v)) equals total +ve flow going out of v (or fout(v)).
Hence f can be both negative or positive, also called the net flow of that edge. The value of network flow is thence summation of flow going out of source.
Note: Generally flow for an edge corresponds to the positive flow.
Defining Flow between sets:
f(X,Y) = sum of f(x,y) , x belongs to X, y belongs to Y.
Some Basic Identities:
  • f(X,X) = 0
  • f(X,Y) = -f(Y,X)
  • f(XUY,Z) =  f(X,Z) + f(Y,Z) if X and Y are disjoint.
  • f(s,V) = f(V,t)
    Proof:
    f(s,V)
    = f(V,V) – f(V-s,V)
    = f(V,V-s)
    = f(V,t) + f(V,V-s-t)
    = f(V,t) – f(V-s-t,V)
    = f(V,t)

The Maximum-Flow Problem
 
Natural Goal – Maximizing the flow from source to sink without violating capacity constraints.
Note: Problem for multiple sources and sinks can be reduced making a global source and a global sink and connecting with infinite capacities.
 
Example: Formulate maximum flow as linear programming problem.
 
Designing The Algorithm:
  • DP won’t work here.
  • If start with a greedy approach then we can counter that by compensating some flows and directing it to other edges.
  • Hence the concept of augmenting paths

Residual Graph: G’ defined w.r.t. some flow f.

7761D2BF-F984-459F-B9D4-AB5DC0CC1F5D

  • Node set of G’ same as G
  • Consists of edges that can admit more net flow.
  • For every edge having flow less than capacity (assuming forward to be the direction of the capacity)
    • A forward edge with residual capacity
    • A backward edge describing the flow for that edge
  • Mathematically, cf(u,v)=c(u,v)-f(u,v) {c(u,v) will be zero for the reverse edge in nearly all cases}
 
Note: If f is a flow in G and  f’ is a flow in Gf, then f+f’ is also a flow in G. Proof is trivial using the definition of flow network.
Augmenting Flow using the residual graph:
For any residual graph G’ , a simple path P with the bottleneck edge b as the minimum cost edge in P is called an augmenting path.
 
fp(u,v): b if (u,v) belongs to P, -b if (v,u) belongs to P, 0 otherwise. 
 
Augmenting flow is defined as f’(u,v) = f(u,v)+fp(u,v)
Proof of f’ being a flow:
 
If e is a forward edge on P ->  0 <= f(e) <=  f’(e) = f(e) + b <=  f(e) + (ce – f(e)) = ce;
If e is a backward edge on P -> ce >= f(e) >= f’(e) = f(e) – b >= f(e) – f(e) = 0;
Proof for conservation of each node:
If node not in P, no effect. If node is in P then one outgoing flow increases and one incoming flow decreases by the same amount.
 Flows and Cuts:
maxFlow07
Notion of Cut:  partition the nodes into two sets S & T, s belongs to S, t belongs to T. Then (S,T) form a cut.
Identity: Flow of a cut ( f(S,T) ) equals f(s,V).
Proof:
  • f(S,T)
  • = f(S,V-S)
  • = f(S,V)
  • = f(s,V) + f(V-s,V)
  • = f(s,V)
  • = |f|
Capacity of the Cut: c(S,T) sum of capacities of all edges from S to T.
Theorem: Max Flow = Min Cut for a flow network.
Proof:
  • Any Flow <= Capacity of any Cut
    • Trivial
  • If f is maximum,
    • Gf has no augmenting path
      • Proof:
        • |f’| = |f| + |fp| hence flow increasing with each augmentation.
    • |f| = c(S,T) for some S,T
      • Proof:
        • Take S as all nodes reachable from s in the Residual Graph.
        • As t is not reachable from s, S must not contain t.
        • Take T as V-S.
        • As there are no outgoing edges from S to T in Gf , f(S,T) must be equal to c(S,T).
Ford-Fulkerson Algorithm for finding Max-Flow:
Max-Flow
Initially f(e)=0 for all e in G
While there is an s-t path in the residual graph Gf
  • Let P be any (not necessarily simple, simple makes it the generic flow algorithm) s-t path in Gf
  • f’= augment(f, P)
  • Update f to be f’
  • Update the residual graph Gf to be Gf
Endwhile
Return
Points to remember:
  • After every augmentation, the flow values and residual capacities are integers
  • While augmenting the flow is taken along the direction of the path.
  • The total value of flow increases by the bottleneck edge of s-t path cost as all other points conserved
  • At most |F| iterations where |F| is maximum flow.
  • Hence running time at most O(m|F|) as BFS in O(m+n) and n<=2*m as all nodes have at least one incident edge.
Edmond-Karp Algorithm for finding Max-Flow:
Same as the Generic flow algorithm, with augmentation done only with shortest BFS paths (edge weights as 1).
Running Time With This Modification: O(m2n).
Proof that running time is O(m2n):
  • Let δ(s,v) be the level of v in BFS tree.
  • After an augmentation, for each edge:
    • Either a backward edge is added (if existing was forward)
    • Or a forward edge is added (if existing was backward)
  • Which means, in a BFS tree, if Li to Li+1 was removed, Li+1 to Li would be added.
  • Hence, δ(s,v) must increase at every iteration.
  • If an edge starting at u is chosen as the bottleneck edge, u’s level increases by 2.
  • Hence a total of 2m * n/2 edges can be chosen as bottleneck.
  • Total no. of iterations <= mn.
  • Hence O(m2n).
Final Points:
  • The flow returned by Ford-Fulkersen Algorithm is a maximum flow.
  • Our time analysis assumes that the capacities are integers.
  • Given a flow of maximum value, we can compute s-t cut of minimum capacity in O(m) time.
  • In every flow network, the maximum value of  s-t flow is equal to the minimum capacity of s-t cut.
  • Dinic’s algorithm computes Max Flow in O(n2m).

How to add Duplicate Line key binding for Xcode 6 (an IDE for Mac OS X)

Just a quick tip!

Steps:

  1. Go to Applications/
  2. Right Click on Xcode -> Show Package Contents
  3. Go to Contents -> Frameworks -> Resources (shortcut)
  4. Open the location in terminal
  5. chmod 777 the Resources folder and the file IDETextKeyBindingSet.plist during this process
  6. Open the file in Xcode, Something like this will show up:
    Screen Shot 2014-12-09 at 10.13.13 pm
  7. Click on root, add a new Row Customized of type Dictionary
  8. Under that add a new Row for your task with type String with value
    selectLine:, copy:, moveToEndOfLine:, insertNewline:, paste:, deleteBackward:
  9. It should now look like this:
    Screen Shot 2014-12-09 at 10.16.11 pm
  10. Save the file, Close Xcode.
  11. Open Xcode, Go to Preferences > Key Bindings
  12. Scroll all the way down and you will find your new action! Add whatever key binding you like. In my case it was like this:
    Screen Shot 2014-12-09 at 10.20.08 pm
    Notice how I had to change the existing Duplicate command (possibly for some use in Interface Builder)
  13. That’s it! You will have your new key key binding working.

Graph Algorithms – a revision of all the basic ones.

Hello friends! This post will be a short review of the the most common and basic algorithms for undirected graphs.

Note: This is not a post intended for folks who are not at all familiar with graphs. For beginners a better starting point would be a book like this.

We will start by discussing the basic problems and then we will have a look at some common applications.

Basic Problem 1 : Traversal

Problem Statement: graph traversal is the problem of visiting all the nodes in a graph in a particular manner, updating and/or checking their values along the way.

Solution 1 : Breadth First Traversal/Search (BFS)

bfs

TL;DR: search is limited to essentially two operations: (a) visit and inspect a node of a graph, (b) gain access to visit the nodes that neighbour the currently visited node.

Points to remember:

  • Layer-wise search forming layers L[0],L[1],…,L[i] of nodes
  • L[i] are neighbours to L[i-1] ,  and are not present in any of the earlier layers.
  • L[j] is exactly j distant from S. (assuming non-weighted edges , each with cost 1)
  • BFS tree can be obtained after running BFS.
  • Any edge can be present between nodes not differing by more than one layer
  • Any BFS routine produces the connected component that contains the root.
  • O(m+n) running time and space.
  • Efficient implementation using a queue.

Steps:

  1. Explore all neighbours of root. Call them level 1.
  2. Then explore all neighbours of nodes of level 1. Call them level 2.
  3. Keep Exploring like this 🙂

Code:

Without Queue:

With Queue:

Solution 2 : Depth First Traversal/Search (DFS)

dfs

 
TL;DR: Greedy search, where search retreats only when it encounters a dead-end of the branch,i.e. all neighbours.

Points to remember:

  • Like the BFS tree, a DFS tree is obtained.
  • O(m+n) running time and space.
  • If an edge (x,y) is not present in the DFS tree, then one is a descendant of the other
  • Can be used to produce Topological Sorting.
  • Can be used to produce SCCs(Strongly connected components) using Kosaraju’s two-pass algorithm.
  • Simple recursive implementation possible that makes use of recursion stacks
  • Non-recursively efficiently implemented using Stacks.
  • Discovery times of every node follows parenthesis structure (Source: CLRS)
  • For finding all paths, we can unmark the node  as visitied when we finish the loop.
  • Levels are processed in reverse order for non recursive with respect to that of recursive implementation

Steps:

  1. Explore first neighbour of root.
  2. Then explore the first unexplored neighbour of the first neighbour
  3. Keep exploring until you reach a node with no unexplored neighbours.
  4. Backtrack at that node until you reach a node with unexplored neighbours.

Code:

With Recursion:

Without Recursion, with Stack:

Notes on Traversal

  • BFS and DFS have same running time and space efficiency
  • BFS and DFS can be used to find connected components.
  • BFS and DFS can be used to find connected cycles (DFS is more efficient and can be used in directed graphs also, as backward edge produces cycle).
  • Classification Of Edges(directed): Tree Edge, Forward Edge(connecting to descendant), Backward Edge(connecting to ancestor), Cross Edge(all other). In DFS, only tree edge or backward edge is possible.
  • Classification Of Edges(un-directed): Tree Edge, Forward Edge(discovered by an ancestor), Backward Edge(discovered by a successor), Cross Edge. In DFS, only backward or forward or tree edges are possible.In BFS, only cross edges are possible.

Basic Problem 2 : Shortest Paths

Single Source Problem Statement:  Given a undirected weighted graph G and a source node u, find the shortest cost paths from u  to all other nodes.

Solution: Dijkstra’s Algorithm

dijkstra

TL;DR: Algorithm starts at the source vertex, u, it grows a tree, T, that ultimately spans all vertices reachable from u. Vertices are added to T in order of distance i.e., first u, then the vertex closest to u, then the next closest, and so on.

Steps:

  • Initialise a set P and a set Q, with P initially storing u with cost 0, and all other in Q with cost infinity.
  • Change the priorities of the nodes in Q having an edge with u as their edge costs.
  • Dequeue the minimum key node and add it to P.
  • Adjust the priorities of other nodes in Q so that the cost of addition gets minimised.
  • At the end, the keys are the shortest path costs.

Cost of addition is given by

Screen Shot 2013-11-19 at 9.32.39 am

Code:

Proof of Correctness:
Proof by Induction on set P’s size: Lets assume that the paths are correct for all the previous nodes in P for some node v. Then if there exists some path with smaller cost it must cross P some time, let’s say it crosses with edge passing from x to y. Then algorithm would have selected y before v as all edge lengths are positive, which is a contradiction.

Running Time Analysis:
Total deleteMin – O(nlogn) with Priority Queue, O(n2) without Priority Queue
Total change priorities – O(mlogn) with Priority Queue, O(m) without Priority Queue

Hence total running time = O((n+m)logn) with Priority Queue, O(m+n2) without Priority Queue

Points to remember:

  • Uses Adjacency List Representation of Graph
  • Much Similar to Prim’s Algorithm
  • Requires all edge costs to be non-negative

Okay, now let’s look at a problem where we are not required to find the shorted distances from only one source, but from each node.

All Pairs Shortest Paths Problem Statement:  Given a undirected weighted graph G and, find the shortest cost paths between all pairs.

Naive Solution: Run Dijkstra’s algorithm on all nodes. O(mnlogn) = O(n3logn).

Solution 1: Floyd-Warshall Algorithm

TL;DR: Dynamic Programming approach with subproblem as shortest distance between node i and node j using only the first k vertices in between.

Points to remember:

  • O(n3) running time
  • No negative cost cycles should be present.
  • In place implementation possible as the (k+1)th row, and (k+1)th column don’t change in one iteration.

Steps:

  • Maintain n 2d arrays D[n][][], such that D[k][i][j] gives the shortest distance between i and j using only the first k vertices in between.
  • Final result will be D[n-1]
  • Initiate D[-1][i][j] as 0 if i==j, cost of (i,j) if edge exists, infinity otherwise.
  • Using dynamic programming, find D[k+1][i][j] = min(D[k][i][j] , D[k+1][i][k+1] + D[k+1][k+1][j]).

Steps for finding the shortest paths:

  1. Same kind of implementation, use a matrix N[][][]
  2. N[-1][i][j] = j if (i,j) exists, undefined otherwise.
  3. N[k+1][i][j] = N[k][i,k+1] if k+1 path is shorter, else N[k][i][j]
  4. Now for finding the path between i,j keep following the array N and printing the nodes.

Basic Problem 3 : Minimum Spanning Trees

Definition: A spanning tree of a graph is a tree that covers all the nodes of the original graph, e.g. BFS/DFS trees. A minimum spanning tree is a spanning tree with the minimum cost.

The MST Lemma (Cut Property): Let P and Q be two disjoint node sets that together give the union of the original graph nodes, then among all the edges between P and Q, the node with minimum edge cost must be present in an MST(there might be more than one MST)

Proof: Let T be an MST that doesn’t contain e{u,v} (the minimum edge cost), then adding e to it will produce a cycle. Traverse along that cycle , and whenever it crosses from P to Q, replace that edge with e. This produces an MST again.

The MST Cycle Lemma (Cycle Property): The costliest edge of a connected undirected graph must not be in an(any if edge costs are distinct in that cycle) MST.

Proof: Let T be an MST that contains the costliest edge e {u,v}. T-e gives two connected disjoint components, traverse along the cycle to get a cheaper edge that can be replaced with e.

Problem Statement: Find a minimum spanning tree for an undirected weighted graph.

Solution 1: Prim’s Algorithm

Point to remember: Nearly direct implementation of the MST lemma.

Prim-algorithm-animation-2

Steps:

  • Start with a set P with some random root node u and a priority queue Q that contains all the remaining nodes of graph with key as infinity.
  • Set priority of the nodes that share an edge with u as their edge cost.
  • In the priority queue Q maintain a priority of the shortest edge cost from the node to P and the respective closest neighbour
  • In a loop keep choosing elements from Q, s.t. d[u] is as small as possible. Add u to P.
  • Update the priority of all the nodes in Q w.r.t u. i.e.,  compare their current priority with the edge cost with u and update if the edge cost is lesser.

Running Time Analysis:

  • Adjacency lists are preferred.
  • Make heap and other inititalizations – O(n)
  • One iteration O(1) in look up and deletion – total of O(nlogn).
  • Total iterations of ChangePriority O(m) and each individual costs O(log(n)), hence a total of O(mlogn).

Solution 2: Kruskal’s Algorithm

Kruskal_algorithm

Points to remember:

  • Uses MST lemma.
  • Uses MFset( also called Union-Find) Data Structure
  • O(mlogn) using MFSet
  • Runtime dramatically reduces near to Ackermann Function after using path compression, goes to O(n * ackermann(m)), where ackermann(m)<= 5 for all practical purposes.

Steps:

  1. Sort the edges in increasing order of cost.
  2. Keep adding edges to MST, and whenever a cycle is formed, just don’t add it.

Running Time Analysis: O(mlogn) in worst case without path compression

Solution 3: Reverse- Delete Algorithm

Point to remember: Uses the cycle MST lemma

Steps:

  1. Sort the edges in decreasing order of costs.
  2. Keep deleting the costliest edges as long as the tree stays connected

Running Time Analysis: O(n2) with usual implementation.

Notes on MSTs

  • All of them are Greedy Solutions
  • It turns out that any greedy approach using either of the lemmas will work.

Interesting Applications

Cycle Detection in a Graph: Detect if a directed graph contains a cycle

Points to remember:

  • Direct extension of DFS.
  • Cycle in a graph only if there is a back edge present in the graph.

Code 🙂

Running Time Analysis:

  • Seriously?

Testing Bipartiteness

Bipartite Graph Definition: A graph whose vertices can be divided into two disjoint sets X and Y such that every edge has one end in X and other in Y. Consequently, the sets X and Y are independent sets.

Points to remember:

  • Sufficient condition is the non existence of any odd cycles.
  • Can use both BFS and DFS to test bipartiteness
  • For BFS, check whether there is an odd cycle which can be checked by confirming if there exists an edge between two consecutive layers.
  • For DFS, while performing DFS assign colors to the node with negative of parent, hence if you encounter a contradictory case, then not bipartite.

Extending these algorithms For Directed Graphs:

Points to remember:

  • BFS and DFS remain the same.
  • Strong Connectivity can be checked by running both forward and backward BFS/DFS and comparing the two components if they are the same.
  • Strong Components will either be equal or disjoint because if s & u are mutually reachable, s & v are mutually reachable then u & v will also be mutually reachable.

Topological Ordering/Sorting in a Directed Graph:

Topological Ordering of nodes: If every directed edge is forward pointed then the ordering is a topological ordering.

Points to remember:

  • A topological ordering of a graph exists iff the graph is a DAG(Directed Acyclic Graph)
  • Useful in finding precedence relations
  • For every DAG there must be a node with no incoming edges

Algortithms Possible:

  • Recursively follow the node with no incoming edges, delete that and then recursively find the next such vertex,
  • A variation of DFS, whenever a node is finished searching, add it to front of a linked list. This list will be a DAG. Note to make repeated invocations of DFS as the DFS might stop before traversing to each edge. For more information see this.

Analysis:

  • O(n2) running time for raw algorithm1, and O(n+m) for DFS implementation
  • To obtain O(m+n) time optimization, use the following technique for Algorithm-1:
    • Store two Data Structures:
      • for each w, no of active nodes that have incoming edge w.r.t. w.
      • the set of all active nodes that have no incoming edges from other active nodes.

So that would be all for this post. BFS, DFS, Dijkstra, Floyd-Warshall, MST, and some interesting applications. The notes are highly influenced from the book by Kleinberg, Tardos and the book by Cormen et.al. . If you are not satisfied have a read in the books, or just post a comment and I can care to explain 🙂 .

Make sure to point out any mistakes or suggestions in the comments.

See you in the next post!


Object Oriented Programming (OOP) in C++… short descriptions of all you need to know.

This post is highly inspired by and can be considered as some of my class notes for the Software Engineering course (CS20006) at IIT Kharagpur taught by Prof. Partha Pratim Das.

Many of the code illustrations are snapshots of the output along with the code using the application Xcode for Mac OS X. These should be followed thoroughly as these cover extra informations.


Procedural Enhancements over C: 


HOW STUFF WORKED IN C – THE SYNTAX

Image

THE CONST Keyword

Image

THE ENTRY OF REFERENCES

Image

Call by reference (act as in-out parameters),

whereas call by value acts as strictly-in parameters.

Image

Return by reference 

not to be used with local variables inside function (return value is now not copied back!)

Image

Pointers vs. References

  • Pointers can point to NULL whereas References cannot. There is nothing defined as NULL Reference.
  • Pointers can point to different variables at different times whereas for a reference, its referent is fixed.
  • References make code faster since, unlike pointers, checks for NULL are not required.
  • Reference “refers” to an address but does not store that address. Pointer does.
  • Can not have an array of references, not allowed.
  • You should never return by reference a local variable

MACROS and Inline Functions:

  • C Pre-Processor (CPP) macros are replaced textually before compilation, in turn:
    • Size increases
    • Speed increases
    • Perfect for small code reuse
    • Some expressions might turn out to be unexpected and in some cases compiler dependent.
    • Scope of the variables confined to where it’s being implemented (Can be both useful and harmful)
  • Inline Functions
    • Act as functions, implemented as macros
    • Type Checking is still performed.
    • May take more space
    • Implemented by using inline keyword
    • are to be defined in .h files not .c/.cxx files preferably..
    • Inline Functions are not always implemented inline, this is decided by the compiler based on the performance. See here for good answers.

Default Function Arguments: 

(Preferably in header files to avoid contradictions)

Image

Function Overloading:

  • Same function name can be used by different kind of definitions.
  • Function selection based on number and types of the actual parameters at the places of invocation.
  • Function selection (Overload Resolution) is performed by the compiler in the following order:
    • Exact Match
    • Promotion
    • Standard Type Conversion (like function-to-pointer and array-to-pointer)
    • User Defined Type Conversion
  • Two functions having the same signature but different return types will result in a compilation error due to “attempt to re- declare”.
  • Allows static polymorphism …

New-Delete Operators:

  • new Operator when called can be considered as performing these two operations:
    • Allocates memory in Free Store
    • Calls the constructor (discussed later)
  • delete Operator when called can be considered as performing these two operations:
    • Calls the Destructor
    • Frees up the memory in Free Store

Image


Classes in C++:


What is a Class?:

  • C++ Class is an implementation of “type”
  • In C++, class is the only way to implement user defined data types.
  • An object (any variable of that type, also called an instance) is an aggregate(encapsulation) of data items(member variables) and methods(member functions) that can work with them.
  • A Class is a layout of it’s objects.

How to code them?

  • A Class is defined by “class” keyword.
  • Each member in a class has an access specifier.
    • public: These members are accessible everywhere.
    • private: These members are accessible inside the definition of the class
    • protected: These are accessible only to it’s class definition and it’s subclass definitions.

“this” pointer: 

  • follows the type – X * const , where X is the class name as the pointer address can not be changed.
  • Necessity of this to the programmer:
    • Distinguishing data member from non-member variable
    • Explicit Use
  • Member functions implicitly take this pointer as an argument.

Constructor Functions:

  • are member functions with the same name as the class
  • are automatically called when an object is created, just after memory allocation.
  • allow the class to initialise the state of an object
  • Constructors also allocate additional memory space from the Free store (if) required for the object.
  • must not have any return type , not even void.
  • If users do not define any constructor then the compiler provides a default constructor.
  • Default constructors are those constructors which either do not have an argument or have default arguments (ints to 0, pointers to NULL) for all parameters.  can be directly called to create anonymous objects.
  • can be overloaded.
  • X a; invalid if no default constructor (if user defined constructor is not a default kind)
  • Initializer Lists: 
    • Initializer lists appear as a comma separated list
    • Initializer lists are required for a member object that must be passed initial arguments for construction
    • Initializer lists are required for const members and reference members
  • local static specifier:
    • created first time you cross them
    • destructed in terms of global variables
    • However, scope is lexically restricted, as usual, by braces.

Destructor Functions:

  • are member functions with the name “~<class-name>
  • is automatically called when an object is destroyed, just before memory is freed.
    • For auto variables when the variable goes out of scope
    • For objects created in the Free store, destructor is called after “delete” or “delete[]” is explicitly invoked by the programmer.
  • must not have any argument
  • must not have any return value
  • If destructor is not called then there could be memory leaks for objects which calls “new” in the constructor.
  • Also a free function (provided by default if needed)
  • Provides an Automatic Life Time 
  • If destructor is private, or not provided by any other way… then creation of an object is a compile error. 

Copy Constructor Functions:

  • Copy constructor is a special constructor which is used to initialize an object with another object of the same type.
  • Functions call it for both arguments and return.
  • Needs to take a reference as its argument because if not taken as reference, an infinite loop occurs as C++ itself calls the copy constructor for argument values.

Ahh! That’s a lot, let’s see all of them working!

Image

Object Layout w.r.t. memory:

  • Simplistically, an object of a class must have enough space to store all members of that class.
  • No space is required per object for storing function pointers for the methods declared in the class.
  • Class members can also be objects.
  • All objects get allocated a contiguous chunk of memory
  • Arrays of Objects:
    • String arrayOfString[100]; // 100 objects created using the default constructor
    • String arrayOfString[2] = { String(“abc”), String(“def”) }; // Using specialized constructor while creating an array of objects.
    • String *pArrayOfString[2] = { new String(“abc”), new String(“def”) };//Using different constructor for creating array objects dynamically.

Const Member Functions:

  • Constant Member Functions are not allowed to change the state of the object on which they are invoked.
  • Type of this pointer passed to a const function is thence const X* const this
  • Const Functions are declared with keyword const following member function parameter list.
  • const must also be present at the definition.
  • for escaping const member functions’ const-ness you can use mutable keyword

Image

Friends and Static Data:

  • Friend Functions 
    • are to be declared in one or more classes
    • behave like global functions and can’t be called on an object(no this pointer), they behave like global functions.
    • have access to the private members of those classes
    • are distinguished from members with the keyword friend
    • Member of a different class may be a friend function. (e.g. friend Vector Vector::prod(Matrix *pM);)
    • A class may be declared as a friend implying all member functions of that class are friends.
    • Friend-ship is neither commutative nor transitive.
    • It is best to avoid friend in an ideal OO Design.
    • Can be used for operators in which both ways needed  (like  int+Fraction, and Fraction+int)
  • Static Data Members 
    • A static data member is shared by all the objects of a class.
    • Static data member may be public or private.
    • Static data member must be initialized in a source file.
    • It can be accessed
      • with the class-name followed by the scope resolution operator “::”
      • as a member of any object of the class
    • Static Member functions:
      • May be invoked with the class-name::
      • May be invoked  as part of any objects interface
      • this pointer is not passed to a static function
      • must not refer to non-static members of its “invoking object”  
    • Features like Singleton Class (only one instance possible) can be implemented using static variables. {Make constructor private and store a static instance pointer, which will be returned by some method getInst() only if instance is not there

Operator Overloading:


Points to remember:

  • An operator is a symbol that tells the compiler to perform specific mathematical or logical manipulations.
  • can be postfix or infix for unary
  • “a+b” calls either a.operator+(b) or operator+(a,b)
  • These functions can either be implemented as global friend functions and/or methods.
  • Overloaded operator function must exist… i.e. you can only overload.
  • Overloaded operator function must not change the arity of already present operator
  • precedence and associativity should not change
  • Fraction& operator–(); for infix… no argument for unary operator
  • Fraction operator–(int); for postfix
  • Some operators can’t be overloaded, e.g. ::,.*.?:,., sizeof() etc.
  • Conditional Operators like &&, ||, comma operator should not be overloaded.
  • Member operators will not help if the left hand side of the operator is not of the class type…. for this reason stream operators must be friend
  • Returning const from a function ensures that you can’t use operations like (a*b)=c
  • Returning const ensures that overloaded “*” is compatible with the same operator on built-in types.

Assignment Operator:

  • Poly& operator=(const Poly&);
  • Poly returned because it makes chain assignment possible
  • Some cases in which you need to free some data before copying new data, make sure that self assignment can also take place
  • Reference returned to allow (a=b)=c, kind of statements… also some operators are left associative, in that case return must be by reference… e.g. (next point)
  • System defined assignment operator function makes a shallow copy.
  • If there is a need to define a copy constructor then there must be a need to overload assignment operator and vice-versa.

Inheritance


Image

What is inheritance: We say that a class(also called the subclass or derived class) inherits another class( also called the super class or base class) if the sub class has all the data attributes of the super class. C++ provides a special syntax with which use this special relation. When we use inheritance , all data members and methods of the base class are immediately available to the derived class. The derived class, however, may extend the state and behavior of the base class by adding more attributes and methods.

Advantages:

  • Reusability, reuses an already existing code
    • Reduces cost & development time.
    • Improves quality.
    • In C, we had to reuse a code using a library function… which can not be customised
    • In C++, we have features like inheritance and composition(code reuse by containing other classes
  • Extensibility
  • Data hiding, base class can decide to keep some data private so that it cannot be altered by the derived class
  • Overriding

Behaviour of Access Specifiers for members:

  • public: public for derived classes.
  • private: private for derived classes, can not be used by derived class members.
  • protected: private for derived classes, but can be used by them.

Accessibility based on type of inheritance:

  • public: Mostly needed… and mostly used.
    • private members of the base class become private members of the derived class, but can not be accessed by the derived class members!
    • However, the private members of the base class will still be accessible by the methods inherited from the base class.
    • public members of the base class become public members of the derived class
  • private: All members of base class behave as private to the derived.
  • protected: public and protected members become protected, and private members stay private.
Public inheritance
Base access specifier Derived access specifier Derived class access? Public access?
Public Public Yes Yes
Private Private No No
Protected Protected Yes No
Private inheritance
Base access specifier Derived access specifier Derived class access? Public access?
Public Private Yes No
Private Private No No
Protected Private Yes No
Protected inheritance
Base access specifier Derived access specifier Derived class access? Public access?
Public Protected Yes No
Private Private No No
Protected Protected Yes No

Order of Constructor/Destructor Calls:

  • Constructor
    • Base then Derived, as derived object members contain members of the base class.
  • Destructor
    • Derived then Base.

Casting: A pointer of a class can be cast to it’s derived or super class pointer.

  • Upcasting – Casting derived into a base class pointer
  • Downcasting – Casting base into a derived class pointer.. CAN BE DANGEROUS
  • Only base class part of the derived object can be seen through the base pointer.
  • Hence overridden functions will also not be called by a static cast.
  • Virtual Functions: 
    • In C++, dynamic binding is made possible only for pointer & reference data types and for methods that are declared as virtual in the base class.
    • Virtual functions are called through the VFT function pointer array
    • pointer to the array is stored in each object, as shown below.
    • Once a function is made virtual it will automatically be virtual in it’s derived classes, as the derived class inherits. Due to this, if there is no overridden function for that virtual function, the pointer to the virtual function of the base class will be stored in the VFT of derived class.
    • When a virtual function is called, the only information the compiler has is the VFT of the class in which it has been cast, hence the according index will be present in the compiled code, which in execution time will point to the VFT of the actual object.
    • Any class which has even one virtual function, a VFT will be created for that, and is called a Polymorphic Class.

VTable

  • Pure Virtual Functions… is a special syntax specified in C++ with which you can specify that you don’t have any definition for that member function, and hence don’t allow the construction of an object of the classes in which it is present.

Screen Shot 2014-02-12 at 11.36.34 pmType Casting:

  • used to convert the type of an object, expression, function argument, or return value to that of another type
  • (Silent) Implicit conversions and Explicit conversions.
  • To perform a type cast, the compiler:
    • allocates temporary storage
    • Initializes temporary with value being cast
  • Automatic Conversions:
    • The standard C++ conversions and user-defined conversions
    • Can be very handy,
    • And can also be unsafe at times
    • Overuse of user defined casting may lead to ambiguities
    • User defined casting can be obtained by either making a constructor of type A(B &) or operator A () in class B.

Alright, Alright, Alright! That would be all for this post. Remember that there is no better method for understanding a language than to code in it. Any mistakes or suggestions should be posted in the comments. Thanks for reading!

Hello World! What are Greedy Algorithms?

So hello world.. This would be the first post on this blog.

The posts here would be of the kind that are easy to read, objective oriented, and food for thought.

So this one goes for the beginners… don’t worry, after reading this you will not be a one!

Greedy Algorithms

Definition: An algorithm which makes locally optimum choices with a hope to make the whole outcome optimum.

Points to remember:

  • Greedy Algorithms are most commonly proved using two different approaches
  • First proving approach is “Stays Ahead” where you prove that your algorithm stays ahead in terms of the underlying criteria w.r.t. to any other approach.
  • Second proving approach is called “Exchange Argument”, where you start with the optimal solution and exchange it with the greedy approach using some number of steps so that at each step you prove that you are not decreasing the outcome.

Some Basic Pseudocodes:

Interval Scheduling Problem:

 

Interval Partitioning Problem:

 

Minimize maximum Lateness: