Dynamic Programming – Edit Distance

Hello people..! In this post, we will see another dynamic programming based problem, finding the minimum edit distance between two strings.

Problem – Given two strings A and B, we need to find the minimum number of operations which can be applied on A to convert it to B. The operations are –

  • Edit – Change a character to another character.
  • Delete – Delete a character.
  • Insert – Insert a character.

Normally I would take a brute force solution and reduce it to a dynamic programming solution, but in this case I’m not sure how I can do that 😛. So, I will straight away explain the dynamic programming solution in the best way I can.

Example

Let’s take an example. Let the string A = “this”, and let string B = “there”. Now we will perform a very simple pen-and-paper exercise. Please draw the chart given below in your rough notes –

Edit distance matrix

The Y axis denotes the source string A and the X axis denotes the target string B.

  • Empty cell at 1st row, 1st column denotes the situation when A = ∅ and B = ∅. The edit distance would be 0.
  • Empty cell at 1st row, 2nd column denotes the situation when A = ∅, B = “t”. We would need 1 insertion to convert A to B. Here, the answer is 1.
  • Empty cell at 1st row, 3rd column denotes the situation when A = ∅, B = “th”. We would need 2 insertions to convert A to B. Here, the answer is 2.
  • Empty cell at 1st row, 6th column denotes the situation when A = ∅, B = “there”. We would need 5 insertions to convert A to B. Here, the answer is 5.

This is the trivial case when A = ∅. The other trivial case is when B = ∅, which is denoted by the empty cells in the first column.

  • Empty cell at 2nd row, 1st column denotes the situation when A = “t” and B = ∅. We would need 1 deletion to convert A to B. Here, the answer is 1.
  • Empty cell at 3rd row, 1st column denotes the situation when A = “th” and B = ∅. We would need 2 deletions to convert A to B. Here, the answer is 2.
  • Empty cell at 5th row, 1st column denotes the situation when A = “this” and B = ∅. We would need 4 deletions to convert A to B. Here, the answer is 4.

So, now after the trivial cases our matrix should look like this –

Edit distance matrix with trivial cases

Now that you know what this matrix really means, it shouldn’t be difficult to understand that what we actually want is the value of the last empty cell, which denotes the situation when A = “this” and B = “there”.

  • Empty cell at 2nd row, 2nd column denotes the situation when A = “t” and B = “t”. The answer is 0.

Now take a moment and think, when A = “t” and B = “t”, the last characters match. Wouldn’t the answer be the same as when the last characters were removed? That is, when A = ∅, B = ∅? Yes! Of course! Take another example, let A = “ste” and B = “sto”; the edit distance currently is 1. Now, if A = “step” and B = “stop”, isn’t the edit distance still 1? Yes! So, if the ending characters are the same, then the edit distance is the same as when these two are removed.

Looking at the matrix, if you are at the cell (i, j) can you tell which cell would be the situation when the last characters of both the strings were removed? It is the cell (i – 1, j – 1)! Moving forward,

  • Empty cell at 2nd row, 3rd column denotes the situation when A = “t” and B = “th”. The edit distance is 1.

We know that the answer is 1. But how do we derive this using the solutions we already have? Let us say the empty cell is (i, j). We can arrive at (i, j) using the result at (i – 1, j), (i – 1, j – 1) and (i, j – 1).

Edit distance matrix – arriving at cell (i, j)

What does each arrow mean?

  • Insert – Cell at 2nd row 2nd column denotes A = “t” and B = “t”. So we know that the edit distance to convert “t” → “t” is 0, which is the value of the cell. Now, if we insert a character at the end, we can get “th”. So, we are doing “t” → “t” → “th”. We already know the cost of “t” → “t” and we add the cost of “t” → “th” to it. So, the value is 0 + 1.
  • Delete – Cell at 1st row 3rd column denotes ∅ → “th”. But we want “t” → “th”. If you observe, we start with A = “t” for the 2nd row. So, imagine we have A = “t∅” and we used ∅ → “th” to get “t∅” → “tth”. Now, deleting a ‘t’ would give us “tth” → “th”. It might sound really confusing, but the chain of conversions is, “t∅” → “tth” → “th”. The cost of “t∅” → “tth” is given by the cell at 1st row 3rd column and “tth” → “th” is a delete operation which costs 1. So, the value is 2 + 1.
  • Edit – Cell at 1st row 2nd column denotes the cost of ∅ → “t”. As said previously, we start with A = “t” in the 2nd row. So consider A = “t∅” and use the above conversion to get “t∅” → “tt”. Now we apply the edit operation to convert “tt” → “th”. So, the chain of operations is “t∅” → “tt” → “th”. The cost of “t∅” → “tt” is given by the value of the cell at 1st row 2nd column, and “tt” → “th” costs 1 edit operation. So, the value is 1 + 1.

I know the delete and edit operations sound extremely confusing; please ignore them if you don’t understand. Most tutorials just put down the dynamic programming formula for the edit distance problem, write the code and be done with it. But I wanted to go one step deeper and explain what that matrix means and what each term in the dynamic programming formula (coming up in a few moments) will mean.

So, now we had 3 options: insert, delete and edit. We have calculated the total cost of choosing each option. Now which one will we choose? The one which has the least cost! If you notice in my explanation above, the cost is always 1 + the value of the respective cell. So, the dynamic programming formula to calculate the value of any cell is –

dp[i][j] = \left\{\begin{array}{l} dp[i - 1][j - 1] \qquad \qquad \qquad \qquad \qquad \qquad \; \text{if } A[i] = B[j] \\ 1 + \min \left( dp[i - 1][j - 1], \; dp[i - 1][j], \; dp[i][j - 1] \right) \quad \text{otherwise} \end{array}\right.

Where dp is the matrix. We already know the values for the first row and the first column, and the rest of the matrix can be generated using the above formula. If you notice, we are using the previous results to build the current result. This is a form of bottom-up dynamic programming.

If you fill all the cells of the matrix, it would look like –

Edit distance final matrix

The final edit distance is the last cell which is 3.

Code

If you have understood the above formula, then coding is pretty simple. You construct a 2-D array of size (|A| + 1) × (|B| + 1), where |A| and |B| are the lengths of the strings respectively. Initialize the first row and the first column with their column and row numbers respectively, and apply the formula above.

The code for edit distance is pretty simple. Try to code it with a clear head. It is the kind of code which people generally get right, without logical errors, in the first go. The trick is in that formula above. If you get stuck, you can use my code below as a reference.

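Here is a minimal C implementation of the formula above – a reference sketch, assuming C99 variable-length arrays for the dp matrix –

#include <stdio.h>
#include <string.h>

/* Minimum of three values */
int min3(int a, int b, int c)
{
	int m = (a < b) ? a : b;
	return (m < c) ? m : c;
}

int editDistance(const char *A, const char *B)
{
	int lenA = strlen(A), lenB = strlen(B);
	int dp[lenA + 1][lenB + 1];	/* (|A| + 1) x (|B| + 1) matrix, C99 VLA */
	int i, j;

	/* Trivial cases - converting to and from the empty string */
	for (i = 0; i <= lenA; ++i)
		dp[i][0] = i;
	for (j = 0; j <= lenB; ++j)
		dp[0][j] = j;

	for (i = 1; i <= lenA; ++i) {
		for (j = 1; j <= lenB; ++j) {
			if (A[i - 1] == B[j - 1]) {
				/* Last characters match - same as removing them */
				dp[i][j] = dp[i - 1][j - 1];
			} else {
				dp[i][j] = 1 + min3(dp[i - 1][j - 1],	/* edit */
						    dp[i - 1][j],	/* delete */
						    dp[i][j - 1]);	/* insert */
			}
		}
	}

	return dp[lenA][lenB];
}

int main(void)
{
	printf("%d\n", editDistance("this", "there"));	/* prints 3 */
	return 0;
}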

Optimized Edit Distance Algorithm

If you have noticed, we just need the (i – 1)th row to calculate the ith row. This means that we need not store the entire matrix; we could store just the previous row. This drastically reduces the memory complexity.

Try coding using only one array of auxiliary space. You can use my code below as a reference –

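Here is a sketch of the space-optimised version in C. It keeps a single row of the matrix, plus one variable that saves the diagonal value dp[i – 1][j – 1] before it gets overwritten (min3 is the same helper as before, and C99 VLAs are assumed again) –

#include <stdio.h>
#include <string.h>

int min3(int a, int b, int c)
{
	int m = (a < b) ? a : b;
	return (m < c) ? m : c;
}

int editDistance(const char *A, const char *B)
{
	int lenA = strlen(A), lenB = strlen(B);
	int dp[lenB + 1];	/* a single row of the matrix */
	int i, j, diagonal, temp;

	/* Row for A = empty string */
	for (j = 0; j <= lenB; ++j)
		dp[j] = j;

	for (i = 1; i <= lenA; ++i) {
		diagonal = dp[0];	/* dp[i - 1][j - 1], saved before overwrite */
		dp[0] = i;		/* column for B = empty string */

		for (j = 1; j <= lenB; ++j) {
			temp = dp[j];	/* becomes the next diagonal */

			if (A[i - 1] == B[j - 1])
				dp[j] = diagonal;
			else
				dp[j] = 1 + min3(diagonal,	/* edit */
						 dp[j],		/* delete */
						 dp[j - 1]);	/* insert */

			diagonal = temp;
		}
	}

	return dp[lenB];
}

int main(void)
{
	printf("%d\n", editDistance("this", "there"));	/* prints 3 */
	return 0;
}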

Edit Distance Algorithm Complexity

The time complexity of both versions of the algorithm is O(|A| × |B|). The memory complexity of the normal version is O(|A| × |B|), and the memory complexity of the optimized version is O(min(|A|, |B|)), since we can always keep the row for the shorter of the two strings.

Practice

Edit distance is very useful in competitive coding. You can solve these questions to get a hang of the algorithm –

I hope you have understood a little about the algorithm. The low level details of this algorithm sound very confusing, so try learning it on a high level ignoring the details if you don’t understand them yet. But do come back and take a look at my explanation! Keep practicing! Happy Coding! 😀

Dynamic Programming – Kadane’s Algorithm

Hello people..! In this post, we will look at a simple dynamic programming based algorithm which is the optimal solution to the Maximum Sum Subarray Problem. Try to focus on how the approach of dynamic programming is applied rather than just trying to understand the main algorithm. Now, let us define the problem.

Maximum Sum Subarray – Given an array A[1, 2, 3, …, n], we need to find the subarray A’ = A[i, i + 1, i + 2, …, j], where 1 ≤ i ≤ j ≤ n, such that the sum of all the elements in A’ is maximum.

In simpler terms, we need to find that contiguous portion of an array where the sum is maximum.

Brute force approach

The brute force approach would be to look at all the sub-arrays starting with A[i] –

maxSum ← -∞

for i ← 1 to n
        sum ← 0
        for j ← i to n
                sum += arr[j]

                maxSum ← max(maxSum, sum)

return maxSum

This algorithm is simple, but it takes O(N²) time. The problem with this algorithm is that we are not able to save computation by using the solutions to the problems we have already solved.

Let us re-write this algorithm slightly differently! Pay attention! Instead of starting from A[i] and going forward looking for maximum, this time, let us start at A[i] and go backward doing the same thing, finding the maximum. Here is the algorithm –

maxSum ← -∞

for i ← 1 to n
        sum ← 0
        for j ← i down to 1
                sum += arr[j]

                maxSum ← max(maxSum, sum)

return maxSum

What we are doing is, instead of looking at all the sub-arrays starting with A[i], we will look at all the sub-arrays ending with A[i]. This still runs in O(N²) time. So, what’s so special about this? Let us say we have an array A = {-1, 2, 3, -4}. The above algorithm would work on this array something like this –

  • for i = 1 ⇒ max( sum(-1) )
  • for i = 2 ⇒ max ( sum(2), sum(2, -1) )
  • for i = 3 ⇒ max ( sum(3), sum(3, 2), sum(3, 2, -1) )
  • for i = 4 ⇒ max ( sum(-4), sum(-4, 3), sum(-4, 3, 2), sum(-4, 3, 2, -1) )

The solution would be the maximum of all iterations. If you notice, in each iteration, we are finding the maximum sum of all the sub-arrays ending with A[i]. Let us say we store this result in an array maxSum[i].

Kadane’s Algorithm states that,

maxSum[i] = \max \left( A[i], \; A[i] + maxSum[i - 1] \right)

In simple terms, it states that the maximum sum sub-array ending with A[i] is either the element A[i] itself, or A[i] combined with the maximum sum sub-array ending with A[i – 1], whichever is greater.

Ponder upon this for a while. Using the above intuition we can re-write the above iterations as –

  • for i = 1 ⇒ max ( -1 ) ⇒ maxSum[1] = -1
  • for i = 2 ⇒ max ( 2, 2 + -1 ) ⇒ maxSum[2] = 2
  • for i = 3 ⇒ max ( 3, 3 + 2 ) ⇒ maxSum[3] = 5
  • for i = 4 ⇒ max ( -4, -4 + 5 ) ⇒ maxSum[4] = 1

And as before, the solution would be the maximum of all the elements in the maxSum array. For the sake of simplicity, we used the array maxSum[], but we only need the previous iteration’s result to compute the current iteration’s result. So, we can make it a variable, and our iterations change into –

  • for i = 1 ⇒ maxSum = -1
  • for i = 2 ⇒ max ( 2, 2 + -1 ) ⇒ maxSum = 2
  • for i = 3 ⇒ max ( 3, 3 + 2 ) ⇒ maxSum = 5
  • for i = 4 ⇒ max ( -4, -4 + 5 ) ⇒ maxSum = 1

Why was the first iteration written like that? It is the trivial case. The maximum sum sub-array ending with the first element is the first element itself. And for each iteration, we could keep a global variable, say maxGlobalSum, which keeps track of maximum of all maxSum values.

If you have understood everything up to now, you should be able to code the algorithm very easily. If you didn’t understand, please give a couple more readings and try to code again. Feel free to ask your doubts in the comments!

If you are stuck with the algorithm, you can use my code as a reference –

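Here is a minimal C implementation of the algorithm described above, with maxSum and maxGlobalSum kept as plain variables –

#include <stdio.h>

long long int maxSubarraySum(const int arr[], int n)
{
	/* Trivial case - the maximum sum sub-array ending at the
	 * first element is the first element itself */
	long long int maxSum = arr[0];
	long long int maxGlobalSum = arr[0];
	int i;

	for (i = 1; i < n; ++i) {
		/* Either extend the sub-array ending at arr[i - 1],
		 * or start a fresh sub-array at arr[i] */
		if (arr[i] > maxSum + arr[i])
			maxSum = arr[i];
		else
			maxSum = maxSum + arr[i];

		if (maxSum > maxGlobalSum)
			maxGlobalSum = maxSum;
	}

	return maxGlobalSum;
}

int main(void)
{
	int A[] = {-1, 2, 3, -4};

	printf("%lld\n", maxSubarraySum(A, 4));	/* prints 5 */
	return 0;
}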

This algorithm runs in O(N) time and uses O(1) space. I hope you have developed an idea of how to think in the dynamic programming way. To get a dynamic programming algorithm, we just have to analyse where we are computing things which we have already computed, and how we can reuse those existing solutions. Keep practising..! Happy Coding! 😀

Dynamic Programming – Introduction and Fibonacci Numbers

Hello people..! This is the first post of Dynamic Programming – Introduction and Fibonacci Numbers. In this post I will introduce you to one of the most popular optimisation techniques, Dynamic Programming. Dynamic Programming is a way of solving very complex problems by breaking them into subproblems, such that the optimal solutions of the subproblems can be used to construct the optimal solution to the main problem. Dynamic Programming was put forth by Richard Bellman, who is also well known for the Bellman-Ford Algorithm. The name Dynamic Programming is a very misleading one; Bellman reportedly chose it to hide the fact that his work involved a lot of mathematics.

Dynamic Programming is such a general concept, and is used so often in competitive programming, that you will see a dynamic programming problem every now and then. When you ask the people who solved the question, they say, “Oh! That’s a DP problem.” And when you ask, “Wow! You know Dynamic Programming..?! When did you learn that…?”, they often say, “I didn’t learn it anywhere, it just struck my head..!” I hear this from many people. Many experienced programmers solve DP problems with ease without formally learning Dynamic Programming. This is because, over the course of their programming experience, they found defects in many algorithms, and the way to correct them seemed rather obvious to them. This is what Dynamic Programming does too. It corrects the naive way of doing things so that we end up with an optimal way of doing it. We will discuss the formal way of approaching a Dynamic Programming problem.

Elements of Dynamic Programming

There are two fundamental elements of Dynamic Programming –

  • Optimal Substructure – We can apply Dynamic Programming to a problem if we are able to identify an optimal substructure for that problem. If a problem can be divided into sub problems such that the optimal solutions to the sub problems can be used to construct the optimal solution of the main problem, then the problem is said to exhibit an optimal substructure. Generally, Greedy Algorithms are also used to solve problems that exhibit optimal substructure. Then, when do we use Dynamic Programming? The next point answers that question.
  • Memoisation – When the sub problems overlap, or there is a vast repetition of the sub problems, it is better to compute the optimal solution to the repetitive sub problem once and store it, similar to writing notes on a memo table so that we can refer to them later. This is done so that whenever we need the solution again, we can simply look it up, which saves us the time of computing it again.

These two are the most fundamental elements of Dynamic Programming. If you can form the optimal substructure for the given problem and make sure that you don’t compute any sub problem again and again, then you have made your code optimal from Dynamic Programming’s point of view…! Generally, the naive approaches to various problems have high complexity because we keep computing a sub problem again-and-again. Dynamic Programming exploits this fact. This can be very easily understood if we take up the example of computing the nth Fibonacci number. The code for the naive approach would be –

long long int fibonacci(int n)
{
	if (n == 1 || n == 2) {
		return 1;
	} else {
		return fibonacci(n - 1) + fibonacci(n - 2);
	}
}

Is this approach efficient…? No..! It is a crazy approach..! Why..?! We all know that its complexity is exponential. Remember guys, for a code –

  • O(1) is ideal.
  • O(log N), logarithmic, is a fair compromise to being a constant.
  • O(N), linear, is pretty fast.
  • O(N log N) is okay.
  • O(N²) is bad.
  • And O(2^N), exponential, will get you kicked out of the house..! 😛

But if we go deep into the algorithm we wrote above, we observe the reason why it is exponential in complexity. Try drawing the recursion tree of the above recursive algorithm. It should look something like this –

Fibonacci Number Without DP

Observe closely…! The fib(4) sub-tree occurs twice and is exactly the same. The fib(3) sub-tree, which occurs thrice, is exactly the same everywhere. Clearly, we are doing the same thing again-and-again. What if we did it only once? We could compute fib(i) only once and write it down somewhere, so that it can be referred to later when we need it again. If we compute each fib(i) only once, our algorithm would have a complexity of O(N), which would be way better than exponential. This is what Memoisation does to an algorithm. It ensures that we don’t need to compute a sub problem again. If we stored the value of fib(i) once we computed it, we could save a lot of recursive calls. If we applied memoisation to the above recursion tree, it would look like –

Fibonacci Number With DP

The green tick marks indicate the recursive calls whose values we would already know beforehand. So, we would save all the recursive calls that would emerge from them. By the way, the word “Memoisation” is not a typo. It is “Memoisation” and not “Memorization”. It comes from the word “memo”, since we are writing down the values in a memo table so that we can look them up later.
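To make this concrete, here is a sketch of the memoised version in C, using a simple global array as the memo table (more general look-up tables are discussed in the next section) –

#include <stdio.h>

#define MAX_N 93	/* fib(93) overflows a long long int */

long long int memo[MAX_N];	/* global array - zero-initialised,
				 * 0 means "not computed yet" */

long long int fibonacci(int n)
{
	if (n == 1 || n == 2) {
		return 1;
	}

	if (memo[n] != 0) {
		/* A green tick mark - we already know this value */
		return memo[n];
	}

	memo[n] = fibonacci(n - 1) + fibonacci(n - 2);
	return memo[n];
}

int main(void)
{
	printf("%lld\n", fibonacci(50));	/* prints 12586269025 */
	return 0;
}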

Look up Table

For memoisation, we need a look up table. But we should take care that the insert and find operations defined on this data structure do not take too much time; otherwise, our algorithm might have the same worst case complexity as the naive one. So, what is the data structure that supports the fastest insert and search operations…? A Hash Table of course…! A hash table by chaining, with a decent hash function, is one of the best options you have. But if you don’t have the time to create a hash table by chaining, or you don’t want so much code in your algorithm, the next best thing you can think of should be a C++ STL Map, or a Multimap. Maps and Multimaps in C++’s STL are implemented using self-balancing binary search trees, which means that the insert and search operations defined on a map take O(log N) each. In most situations, O(log N) is a very good compromise for speed. Even if your ‘N’ is in the order of 10⁶, log N would be around 20, which has an almost negligible performance difference from O(1) as far as 10⁶ is concerned. So, using a map keeps you away from writing too much code, and makes your program look neat and tidy. But it is a compromise nonetheless. If you have a very large number of insert and search operations, prefer a hash table by chaining. Python programmers can use a dictionary.

Bottom Up Approach

The correct way of solving the Fibonacci numbers is the “bottom-up” approach. As the name suggests, we do things “bottom-up”. What it means is that we compute the smallest sub problem first and then work our way up to progressively build the final solution. Take a look at the code below –

long long int fib(int n)
{
	if (n == 1 || n == 2) {
		return 1;
	}

	int i;
	long long int SecondLast, ThirdLast, Last;

	SecondLast = ThirdLast = 1;

	for (i = 1; i <= n - 2; ++i) {
		Last = SecondLast + ThirdLast;
		ThirdLast = SecondLast;
		SecondLast = Last;
	}

	return Last;
}

This function computes the nth Fibonacci number in O(N) time and O(1) space. It does so by keeping track of the previous two terms, T(N – 1) and T(N – 2), to compute T(N). It works in a bottom-up fashion: we already know T(1) and T(2), so it uses them to compute T(3). Then, to compute T(4), it re-uses the solution it already has for the sub problem T(3) (T(4) = T(3) + T(2)). It goes on like this, using the solutions of the sub problems it has already solved. The function can compute only up to the 92nd Fibonacci number; anything beyond that does not fit in a long long integer. But this was the bottom-up approach. The bottom-up approach is slightly faster than the plain memoisation (or top-down) approach, because it avoids function calls and operations on the look up table.

However, the fastest way to compute the nth Fibonacci number is not this. There is an O(log N) approach which is based on the mathematical approximation formula –

f_n \approx \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n}

The value computed by f_n is very close to the nth Fibonacci number, so it can be rounded off to the nearest integer to get the nth Fibonacci number (with our convention that fib(1) = fib(2) = 1). The value \frac{1 + \sqrt{5}}{2} is called phi, or “φ”, or the Golden Ratio. Now, the question is how to calculate \phi^{n} in O(log N) time. We do this by using the Binary Exponentiation Technique, which works by Dynamic Programming…!

Binary Exponentiation Technique

Binary Exponentiation Technique, or Exponentiation by Squaring, or simply Binary Exponentiation, is used to compute positive integer powers of a number very quickly, in O(log N) time. This process divides the job into two equally small sub problems; in this case, the sub problems are identical, so we solve it only once, recursively. This formula will make things clear –

x^{n} =  \left\{\begin{array}{l}  x \times x^{n / 2} \times x^{n / 2} \qquad \, \text{n is odd} \\  x^{n / 2} \times x^{n / 2} \qquad \qquad \text{n is even}  \end{array}\right.

This can be easily implemented by a recursive approach. The code would look like –

double BinaryExponentiation(double x, int n)
{
	if (n == 0) {
		return 1;
	} else if (n == 1) {
		return x;
	} else {
		if (n % 2 == 0) {
			return BinaryExponentiation(x, n / 2) * BinaryExponentiation(x, n / 2);
		} else {
			return x * BinaryExponentiation(x, n / 2) * BinaryExponentiation(x, n / 2);
		}
	}
}

But this still isn’t an O(log N) approach. If you observe carefully, we are computing “BinaryExponentiation(x, n / 2)” twice at each step. We can improve the algorithm if we can memoise it. The Dynamic Programming version would be –

double BinaryExponentiation(double x, int n)
{
	if (n == 0) {
		return 1;
	} else if (n == 1) {
		return x;
	} else {
		double temp = BinaryExponentiation(x, n / 2);

		if (n % 2 == 0) {
			return temp * temp;
		} else {
			return x * temp * temp;
		}
	}
}

This is an O(log N) algorithm which gives x^{n}. If we substitute φ for x, divide by \sqrt{5} and round the result to the nearest integer, we get the nth Fibonacci number. So, the call from the main function would look like –

#include <math.h>	// for sqrt()

// define a macro for the Golden Ratio
#define PHI ((1.0 + sqrt(5)) / 2.0)

long long int fib = BinaryExponentiation(PHI, n) / sqrt(5) + 0.5;

The additional 0.5 is there to round the value off to the nearest integer. It is a very clever way of rounding off floating point numbers: for example, 54.7 + 0.5 = 55.2, which truncates to 55 when assigned to an integer. Give it a thought and try a few examples, I’m sure you’ll get it…! This is the O(log N) solution to calculate the nth Fibonacci number using Dynamic Programming.

Dynamic Programming VS Greedy Algorithms

A problem which can be solved by dynamic programming often looks like it can be solved by a greedy approach too. This is because both techniques rely on the problem exhibiting the optimal substructure property. So, when does Dynamic Programming come into the picture? It is when the optimal sub problems overlap or repeat themselves. Then it becomes important to memoise the solutions to the sub problems to optimise our solution. And sometimes, the greedy approach simply doesn’t give the right answer.

Dynamic Programming VS Divide-and-Conquer

There is a subtle difference between the two flavours of algorithms. Both depend on dividing the main problem into sub problems, whose local solutions can be used to construct a global solution. But the sub problems in Divide-and-Conquer are disjoint in nature, that is, one sub problem has got nothing to do with the other. Whereas in Dynamic Programming, the sub problems are inter-dependent and often overlap with one another. It’s okay if you don’t understand this difference straight away; it will be clearer when we finish our discussion about Dynamic Programming’s limitations.

Dynamic Programming as “Careful Brute Force”

Most Dynamic Programming problems actually don’t involve any stroke-of-genius logic. Well, look at the example above. We didn’t do anything super intelligent. We just looked at all the different possibilities, which is nothing more than a brute force kind of approach. But we did this “carefully”…! We ensured that we never did the same exact thing twice in our algorithm, and the algorithm ran perfectly. What really happens in a brute force approach is that we tend to compute the same things again-and-again. By using memoisation, we can turn it into an efficient algorithm. So, henceforth, whenever you think of a brute force algorithm, think of what you are doing again-and-again; then you’ll know if the given problem is actually a Dynamic Programming problem or not…!

Limitations of Dynamic Programming

Although Dynamic Programming is one of the most popular optimisation techniques, it is applicable only when –

  • Subproblems form an optimal substructure.
  • Subproblems overlap or repeat themselves.

If these conditions aren’t met, Dynamic Programming will only add overhead to the algorithm. For example, do you think applying memoisation to Merge Sort will speed up the algorithm…? No..! This is because its sub problems are disjoint in nature. Such situations draw a line in the sand for Dynamic Programming.

So, it is up to you to judge whether Dynamic Programming, Divide-and-Conquer or a Greedy approach is the right fit for the given problem. These three can get really confusing, and the best way to deal with that is practice. And this is what we will be doing in all the posts to come related to Dynamic Programming. We will take up some classical DP problems and solve them. This post was to get you started with Dynamic Programming. I know it is a very very long post, in fact this is the longest of mine so far…! Thanks for being very patient with me..! I hope my post has helped you, and if it did, let me know by commenting…! Keep practising… Happy Coding…! 🙂