You are not logged in. Please login at www.codechef.com to post your questions!

×

SEATSR-Editorial

31
6

PROBLEM LINK:

Practice
Contest

Author: Sergey Nagin
Tester: Jingbo Shang
Editorialist: Ajay K. Verma

DIFFICULTY:

Easy-Medium

PREREQUISITES:

Dynamic programming, Edit Distance

PROBLEM:

Given two strings, and and integer k, determine if the edit distance between the two strings exceeds k, and if not, then also compute the edit distance between the two strings.

EXPLANATION:

We are given a string s, which we want to transform into another string w using the following three operations:
1) add a character at any position , cost = a
2) delete a character, cost = a
3) replace a character, cost = b

The goal is to minimize the cost, and if the cost exceeds a given value k, then return -1. Note that if a = 0, then the answer will be zero, as we can remove insert/delete any character in the first string at zero cost. From now onwards, we will assume that a > 0.

This is a standard edit-distance problem, which can be solved using dynamic programming in O (N2) time, where N is the size of the string. However, in our case N is too large and the Wagner–Fischer algorithm of quadratic complexity will run out of time. However, there is a special constraint here, which is that we need to compute the edit distance only if it does not exceed k. This can be used to reduce the complexity to O (Nk).

In order to understand the modification, we first need to understand the Wagner–Fischer algorithm, which we highlight very briefly in the next section. If you are unaware of this algorithm, please take a moment to go through the above mentioned link.

Wagner–Fischer algorithm:

Suppose of the of string s is m, and the size of string w is n, then we create a two dimensional array d[0..m][0..n], where d[i][j] denotes the edit distance between the i-length prefix of s and j-length prefix of w.

The computation of array d is done using dynamic programming, which uses the following recurrence:
d[i][0] = i * a, for i <= m
d[0][j] = j * a, for j <= n

d[i][j] = d[i - 1][j - 1], if s[i] == w[j],
d[i][j] = min(d[i - 1][j] + a, d[i][j - 1] + a, d[i - 1][j - 1] + b), otherwise.

Restricted Edit Distance:

We can modify the definition of array d[][], a little bit.
d[i][j] = inf, if the edit distance between the i-length prefix of s and j-length prefix of w exceeds k,
d[i][j] = edit distance between the i-length prefix of s and j-length prefix of w, otherwise.

Note that if (i - j) > k, then we need to delete more than k characters from the first string, hence, the edit distance will be greater than k * a >= k. Similarly, if (i - j) < -k, then we need to add more than k characters in the first string, which will again have a cost greater than k * a >= k. Hence, d[i][j] = inf, if |i - j| > k

This means we only need to compute the values d[i][j], where |i - j| <= k. All other values in the array has to be infinite. There are only O(N * k) such entries, hence the complexity reduces to O (Nk).

For simplicity, we can use another array P[0..N][-k...k], such that
P[i][delta] = d[i][i + delta]
We are using negative indices only for simplicity, an offset can be used to make the indices non-negative.

The recurrence for P can be written easily using the recurrences for d.
P[i][- i] = i * a, for i <= k
P[0][i] = i * a, for i <= k

P[i][delta] = P[i - 1][delta], if s[i] == w[i + delta]
P[i][delta] = min(P[i - 1][delta + 1] + a, P[i][delta - 1] + a, P[i - 1][delta] + b), otherwise

In the recurrence, one should use the value inf, if delta goes outside the range [-k, k].

If the difference between the length of the two strings is more than k, then the answer will be -1, otherwise answer will be P[s.length()][w.length() - s.length()]. Once again, it should be checked if the final answer exceeds k, if so then we should return -1 instead.

Time Complexity:

O (Nk)

AUTHOR'S AND TESTER'S SOLUTIONS:

Author's solution will be put up soon.
Tester's solution will be put up soon.

This question is marked "community wiki".

asked 13 Oct '14, 15:01

djdolls's gravatar image

6★djdolls
2.2k508484
accept rate: 0%

edited 29 Oct '14, 20:00

varun1's gravatar image

4★varun1
280124

1

Please update author's solution

(18 Oct '14, 22:18) gm_krishna52★

Please update author's and tester's solution...

(19 Oct '14, 14:36) khooni_khopri2★

@admin When exactly you will be putting author and tester's solution?

(21 Oct '14, 04:01) rahugur4★
3

waiting for authors solution...........

(29 Oct '14, 03:08) Ordinary_Sheep4★
1

Waiting for author's, setter's and editorialist's solution, DAY 201

(05 May '15, 23:50) prashant_93_y2★

36

Could be done by Ukkonen's Algorithm: http://www.cs.helsinki.fi/u/ukkonen/InfCont85.PDF

link

answered 13 Oct '14, 17:07

kaushik_iiitd's gravatar image

2★kaushik_iiitd
1.6k1818
accept rate: 36%

4

Great to fellow programmers referring to research papers, @kaushik_iiitd and yes (Y) for @sereja :-)

(13 Oct '14, 18:15) abcdexter242★
1

higher quality PDF of the paper available here @ http://www.sciencedirect.com/science/article/pii/S0019995885800462

(13 Oct '14, 19:20) yleewei1★
2

Yeah that works , thanks @kaushik_iiitd :D

(15 Oct '14, 12:43) dhruvchandhok5★
20

Why is this problem tagged as Easy-Medium in the editorial tag? I think it should be marked at least as Medium because it requires pre-requisite knowledge of a popular dynamic programming algorithm and then we need to optimize it further.

The reason why I am pointing this out is that several users who start off with competitive programming on Codechef, start solving problems from the Easy subpart in the Practice Section. It can be demotivating for a user if he/she is unable to solve the problem even though it is in the so-called "Easy" Section.

I guess it is very common that even some rather difficult problems are marked as Medium.

Of course, difficulty is relative, it differs from person to person. However, some problems like this one are definitely not Easy.

Do correct me if I am wrong :)

link

answered 13 Oct '14, 22:54

wittyceaser's gravatar image

2★wittyceaser
3.4k194375
accept rate: 16%

edited 13 Oct '14, 23:03

1

I have to agree. I think the tags are a bit off this time. ChefSquare = easy, SeaStr = easy-med and Qstring - med (considering O(log n) per query)

(13 Oct '14, 23:02) thezodiac19946★
1

Its in the medium section though @wittyceaser. So even if anybody practices he'll find it in medium!

(14 Oct '14, 01:31) pranjalranjan4★
1

Yup, I saw that but my intention was to address this issue on an overall basis and not just this problem.

(14 Oct '14, 01:51) wittyceaser2★

If no more than k differences should be found. In this case it is necessary to calculate only the diagonal band of width 2k+1 in matrix (Ukkonen cut-off), which reduces the time complexity to O (k min (m, n))

I think this figure is best to visualize the idea.

k step right and k step left to the main diagonal will do.

alt text

My Solution which implement this

link

answered 13 Oct '14, 19:19

diptesh1ce3's gravatar image

4★diptesh1ce3
3652713
accept rate: 8%

edited 13 Oct '14, 21:41

I referenced this diagram (@ http://i.stack.imgur.com/2LtnD.png) instead for my implementation (which turns out to be WA) is there a way to reconcile the two? meaning how should I setup the for-loops to iterate the correct bounds as necessary to compute the final answer?

(13 Oct '14, 19:37) yleewei1★

I edited my ans with link of my solution, see if it helps.

(13 Oct '14, 21:41) diptesh1ce34★

alright, will take a look at it. thanks

(13 Oct '14, 23:10) yleewei1★

Please note a correction in the editorial, for the statement (i-j) < k , it should be (i-j) < -k.

link

answered 13 Oct '14, 15:50

foolan's gravatar image

4★foolan
356
accept rate: 0%

1

thanks, corrected.

(13 Oct '14, 16:01) djdolls6★

Shouldn't

P[i][delta] = P[i - 1][delta], if s[i] == w[i + delta] ?

link

answered 20 Oct '14, 10:27

murdocc007's gravatar image

3★murdocc007
4727
accept rate: 12%

What is wrong in this solution?

Solution

link

answered 13 Oct '14, 18:14

rahul_nexus's gravatar image

2★rahul_nexus
7741923
accept rate: 13%

your solution fails for this test case 1 abcde aa 0 1 0 exp ans=0 your ans=-1 try returning 0 whenever value of a=0(there are definitely some cases regarding this as i was too stuck) since there is no need for further computation as deleting the entire first string and inserting the other would take 0 min operations

(16 Oct '14, 23:54) ak294★

I seem to have gotten most of the insights mentioned in this editorial, but my solution is still WA.

Any idea why? or possibly provide a test case where my algo would fail, so I can see where I went wrong, and what I've missed out?

Thanks

link

answered 13 Oct '14, 19:24

yleewei's gravatar image

1★yleewei
8414
accept rate: 0%

I used this logic, but getting wrong answer. Can somebody plz help me finding where i was wrong?

//m is length(X)+1            
//n is length(Y)+1
//long long int arr[100005][2]
//flag is 0 or 1, initialized by 0

long long int DP()
{
        if(a==0)
           return 0;

        diff=m-n;
        if(diff<0)
          diff=(-diff);

        if(diff*a>k)
            return -1;

        if(b==0)
         {
             if(diff*a <=k)
                return diff*a;
             else
                return -1;
         }

        for(i=1;i<m;i++)
    {
        for(j=1;j<n;j++)
        {
                               //calculation for replacement    
                if(i==1 && j==1)
                    corner=0;
                else if(i==1)
                    corner=(j-1)*a;
                else if(j==1)
                    corner=(i-1)*a;
                else
                    corner=arr[1-flag][j-1];

                        if(X[i-1]!=Y[j-1])
                    corner=corner+b;

                                //calculation for addition          
                if(j==1)
                    left=i*a;
                else
                    left=arr[flag][j-1];

                left=left+a;

                               //calculation for deletion
                if(i==1)
                    top=j*a;
                else
                    top=arr[1-flag][j];

                top=top+a;

                arr[flag][j]=Minimum(left,top,corner);

        }
        flag=1-flag;
    }

            return arr[1-flag][n-1];

}
link

answered 13 Oct '14, 22:43

vikas6816's gravatar image

4★vikas6816
211
accept rate: 0%

edited 14 Oct '14, 00:58

guys do some smart work... instead of pasting your entire code .. simply provide a link to your solution.. it will be more friendly to others..

(17 Oct '14, 03:26) vkrishna173★

Could anyone point out what's wrong with my solution? I'm getting a TLE. I used Ukkonen's Algorithm for my approach. I'm using a single array to store the edit distances and a temporary variable to store the value of what would be d[i-1][j-1], if the O(N2) approach with the d[0..m][0..n] matrix is used. I seem to have missed something and I haven't been able to figure it out yet.

link

answered 14 Oct '14, 10:09

sniper6's gravatar image

2★sniper6
14814
accept rate: 50%

edited 14 Oct '14, 12:07

what is delta in edtiorial

link

answered 14 Oct '14, 22:10

nil96's gravatar image

2★nil96
18071844
accept rate: 5%

I'd like to ask:

How you generated some let say random inputs, but with known result to test your algorithm?

Because of small k I thought, that I can use this:

Both input string should be in form of

...S1...S2...S3...SN...

where S1, S2, ... SN are some common substrings and "..." is some mess.

To find longest substring I wanted to use DP, which is unfortunately O(n2) (n is the length of longer string), so I decided I can try to find those in segments of 400 characters (if there is not common string result is -1).

When I did this I simply replaced S1, ..., SN with characters that were not in initial strings and run "find longest common subsequence algorithm" also O(n2) algorithm, but now for string with length of ~200 characters...

Expected complexity something like 100400400 + 200*200 per test case, but I didn't get to TLE, because it worked on my PC, but was failing for some test cases.

link

answered 14 Oct '14, 23:46

betlista's gravatar image

3★betlista ♦♦
16.9k49115225
accept rate: 11%

I'm trying to solve this problems with above idea, but I get WA, I don't know what's wrong with my code? Here is my code : http://www.codechef.com/viewsolution/5163464 Could anyone give me a hint why I'm wrong? Thank you! ^^

link

answered 18 Oct '14, 10:04

tonthatvinh's gravatar image

4★tonthatvinh
33215
accept rate: 0%

Author's and Setter's solutions please

link

answered 19 Oct '14, 02:04

yashkumar18's gravatar image

5★yashkumar18
82661224
accept rate: 18%

Can anyone explain Restricted Edit Distance in some another manner ?

I'm not able to understand that how we are going to reduce the size of the array d that we are using to store edit distance.

link

answered 19 Oct '14, 19:39

sagar0430's gravatar image

2★sagar0430
1
accept rate: 0%

nice editorial.

link

answered 06 May '15, 02:12

sharru05's gravatar image

3★sharru05
549523
accept rate: 14%

Hey guys, I made a video editorial on this technique here:

https://youtu.be/7GCSA4u5ynw

It's been a long while, but the algorithm is still great!
Cheers!

link

answered 12 May, 17:46

gkcs's gravatar image

4★gkcs
2.5k11127
accept rate: 9%

edited 12 May, 17:48

toggle preview
Preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported
  • mathemetical formulas in Latex between $ symbol

Question tags:

×14,366
×2,396
×319
×47

question asked: 13 Oct '14, 15:01

question was seen: 8,429 times

last updated: 12 May, 17:48