Hello people..! In this post, we will see another dynamic programming based problem, finding the minimum edit distance between two strings.
Problem – Given two strings A and B, we need to find the minimum number of operations which can be applied on A to convert it to B. The operations are –
- Edit – Change a character to another character.
- Delete – Delete a character.
- Insert – Insert a character.
Normally I would take a brute force solution and reduce it to a dynamic programming solution, but in this case I’m not sure how I can do that 😛 . So, I will straight away explain the dynamic programming solution in the best way I can.
Let’s take an example. Let the string A = “this”, and let string B = “there”. Now we will perform a very simple pen-and-paper exercise. Please draw the chart given below in your rough notes –
The Y axis denotes the source string A and the X axis denotes the target string B.
- Empty cell at 1st row, 1st column denotes the situation when A = ∅ and B = ∅. The edit distance would be 0.
- Empty cell at 1st row, 2nd column denotes the situation when A = ∅, B = “t”. We would need 1 insertion to convert A to B. Here, the answer is 1.
- Empty cell at 1st row, 3rd column denotes the situation when A = ∅, B = “th”. We would need 2 insertions to convert A to B. Here, the answer is 2.
- Empty cell at 1st row, 6th column denotes the situation when A = ∅, B = “there”. We would need 5 insertions to convert A to B. Here, the answer is 5.
This is the trivial case when A = ∅. The other trivial case is when B = ∅, which is denoted by the empty cells in the first column.
- Empty cell at 2nd row, 1st column denotes the situation when A = “t” and B = ∅. We would need 1 deletion to convert A to B. Here, the answer is 1.
- Empty cell at 3rd row, 1st column denotes the situation when A = “th” and B = ∅. We would need 2 deletions to convert A to B. Here, the answer is 2.
- Empty cell at 5th row, 1st column denotes the situation when A = “this” and B = ∅. We would need 4 deletions to convert A to B. Here, the answer is 4.
So, now after the trivial cases our matrix should look like this –
Now that you know what this matrix really means, it shouldn’t be difficult to understand that what we actually want is the value of the last empty cell, which denotes the situation when A = “this” and B = “there”.
- Empty cell at 2nd row, 2nd column denotes the situation when A = “t” and B = “t”. The answer is 0.
Now take a moment and think, when A = “t” and B = “t”, the last characters match. Wouldn’t the answer be same as when the last characters were removed? That is, when A = ∅, B = ∅? Yes! Of course! Take another example, let A = “ste” and B = “sto”, the edit distance currently is 1. Now, if A = “step” and B = “stop” isn’t still the edit distance 1? Yes! So, if the ending characters are same, then the edit distance is same as when these two were removed.
Looking at the matrix, if you are at the cell (i, j) can you tell which cell would be the situation when the last characters of both the strings were removed? It is the cell (i – 1, j – 1)! Moving forward,
- Empty cell at 2nd row, 3rd column denotes the situation when A = “t” and B = “th”. The edit distance is 1.
We know that the answer is 1. But how do we derive this using the solutions we already have? Let us say the empty cell is (i, j). We can arrive at (i, j) using the result at (i – 1, j), (i – 1, j – 1) and (i, j – 1).
What does each arrow mean?
- Insert – Cell at 2nd row 2nd column denotes A = “t” and B = “t”. So we know that the edit distance to convert “t” → “t” is 0, which is the value of the cell. Now, if we insert a character at the end, we can get “th”. So, we are doing “t” → “t” → “th”. We already know the cost of “t” → “t” and we add the cost of “t” → “th” to it. So, the value is 0 + 1.
- Delete – Cell at 1st row 3rd column denotes ∅ → “th”. But we want “t” → “th”. If you observe, we start with A = “t” for the 2nd row. So, imagine we have A = “t∅” and we used ∅ → “th” to get “t∅” → “tth”. Now, deleting a ‘t’ would give us “tth” → “th”. It might sound really confusing, but the chain of conversions is, “t∅” → “tth” → “th”. The cost of “t∅” → “tth” is given by the cell at 1st row 3rd column and “tth” → “th” is a delete operation which costs 1. So, the value is 2 + 1.
- Edit – Cell at 1st row 2nd column denotes the cost of ∅ → “t”. As said previously, we start with A = “t” in the 2nd row. So consider A = “t∅” and use the above conversion to get, “t∅” → “tt”. Now we apply edit operation to convert “tt” → “th”. So, the chain of operations are “t∅” → “tt” → “th”. Cost of “t∅” → “tt” is given by the value of cell at 1st row 2nd column and “tt” → “th” costs 1 edit operation. So, the value is 1 + 1.
I know the delete and edit operations sound extremely confusing, please ignore them if you don’t understand. Most tutorials just put the dynamic programming formula for the edit distance problem, write the code and be done with it. But I wanted to go one step deep and explain what that matrix meant and what each term in the dynamic programming formula (in a few moments) will mean.
So, now we had 3 options, insert, delete and update. We have calculated the total cost of choosing each option. Now which one will we choose? The one which has the least cost! If you notice in my explanation above, the cost is = 1 + value_of_respective_cell. So, now the dynamic programming formula to calculate the value of any cell is –
Where dp is the matrix. We already know the values for the first row and the first column, and the rest of the matrix can be generated using the above formula. If you notice, we are using the previous results to build the current result. This is a form of bottom-up dynamic programming.
If you fill all the cells of the matrix, it would look like –
The final edit distance is the last cell which is 3.
If you have understood the above formula, then coding is pretty simple. You construct a 2-D array of size |A| + 1 and |B| + 1, where |A| and |B| are lengths of the strings respectively. Initialize the fist row and first column with its row and column number respectively and apply the formula above.
The code for edit distance is pretty simple. Try to code it with a clear head. It is a kind of code which people generally code without logical errors in the first go. The trick is in that formula above. If you get stuck you can use my code below as a reference.
Optimized Edit Distance Algorithm
If you have noticed, we just need the (i – 1)th row to calculate the ith row. Which means that we need not store the entire array, we could just store the previous row only. This drastically reduces the memory complexity.
Try coding using only one array of auxiliary space. You can use my code below as a reference –
Edit Distance Algorithm Complexity
The time complexity for both versions of the algorithm is O(|A| × |B|) and the memory complexity of the normal version is O(|A| × |B|) and the memory complexity of the optimized version is O(|A|) where A would be the shorter string.
Edit distance is very useful in competitive coding. You can solve these questions to get a hang of the algorithm –
I hope you have understood a little about the algorithm. The low level details of this algorithm sound very confusing, so try learning it on a high level ignoring the details if you don’t understand them yet. But do come back and take a look at my explanation! Keep practicing! Happy Coding! 😀