Problem in understanding the edit distance problem??

can anyone please explain how to solve edit distance problem, I have referred many text but couldn’t understand the solution.

Can anyone please provide detailed explanation of the solution or refer me to some text.

Given two strings str1 and str2 and below operations that can performed on str1. Find minimum number of edits (operations) required to convert ‘str1′ into ‘str2′.

Insert
Remove
Replace
All of the above operations are of equal cost.

I have copied above text from Edit Distance | DP-5 - GeeksforGeeks
Infact both the problem and the answer has been nicely discussed there.

So we basically write down a backtrack function first of all:

string s1 , s2;

edit(m,n)// m->length of s1; // a backtrack fn which returns the number of operations reqd. if the lengths of //s1 and s2 were m and n only;
{
//base cases;
if(m==0)
return n;// as we need to remove all characters of s2;

if(n==0)
return m;

// if last characters are same
if(s1[m-1]==s2[n-1])
return edit(m-1,n-1);

//otherwise we need to do 1 of the three operations and we would like to do the one which results in minimum //total operations;

return 1 + min ( edit(m, n-1), // Insert
edit( m-1, n), // Remove
edit(m-1, n-1) // Replace
);

}

Now you can just memoize the above backtrack and complete the solution using dp.

NOTE: Code written above is kinda pseudo code so it won’t run directly. :stuck_out_tongue:

1 Like

Here are some resources you can refer

link1

link2

Even if you don’t understand then here are some insight about it.

PS: Basically you have two strings {str1,str2} and you have to convert str1 into str2 such that only three operations is allowed each of equal cost

1.Insert a character

2.Delete a character

3.Replace a character

Solution:

m = length of str1
n = length of str2
A recursive approach is to start from end of both string 
rec(string str1,string str2,int m,int n){
    if(m == 0)
        return n;
    if(n == 0)
        return m;
    if (str1[m-1] == str2[n-1])
        return rec(str1,str2,m-1,n-1)
    else
        //you have to take minimum of three operations.
        //insert , append nth char of str2 into str1
        int ins = 1 + rec(str1,str2,m,n-1) 
        //Delete , nth character of str1
        int del = 1 + rec(str1,str2,m-1,n) 
        //Replace nth character of str1 with mth character of str2
        int rep = 1 + rec(str1,str2,m-1,n-1) 
        return min(ins,del,rep);
}

Note : Above algorithm will take exponential time due to many overlapping cases.

Follow the two links above to understand the dynamic programming concept involved here which can reduced the time complexity to O(mn).

I was in the same situation a while back.
I’m assuming you’ve already seen the recursion/ pseudo code, but it’s something like

f(a, b) = min( 1 + f(a, b - 1), 1 + f(a - 1, b), 1 + f(a - 1, b  - 1))   - if str1[a] != str2[b] 
else f(a, b) = f(a - 1, b - 1) 

So assuming we have str 1 = “Hello”, str2 = “World”.

We start with the last letters, ‘o’ and ‘d’
These are not equal.
Also keep in mind that str1 is a source string and str2 is the target string. So any operations we’re performing are on str1.

So we can either replace ‘o’ with ‘d’, insert ‘d’ at the end of str1, or delete ‘o’

On inserting ‘d’, we would have str1 = “Hellod”, and str2 = “World”.

Why is this resulting in f(a, b - 1)?

As ‘d’ is at position a + 1 (As a is the position of the last character in the original string, hence appending any character to it makes it at position a + 1). So f(a, b - 1) is basically f(“Hello”, “Worl”).

Similarly, on deleting the character, we only consider “Hell” and “World”
Hence deleting = f(a - 1, b).

On replacing we only assume “Hell” and “Worl” (As replacing the character would return “Helld” and “World”) .
Hence it is equal to f(a - 1, b - 1).

I won’t go further, but this is the main concept, this is how I understood it. I hope it helps.

refer This website

1 Like