DELSUB2 - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: raysh07
Tester: sushil2006
Editorialist: iceknight1093

DIFFICULTY:

TBD

PREREQUISITES:

Dynamic Programming

PROBLEM:

You have two strings A and B, of lengths N and M.
In one operation, you can delete a substring of A.
Find the minimum number of operations needed to make A = B.

N \le 2\cdot 10^5 and M \le \min(N, 100).

EXPLANATION:

Observe that repeatedly deleting substrings of A means we’ll eventually end up with some subsequence of A.
Further, every subsequence of A is reachable; since single characters are themselves substrings.
So, the set of all reachable strings is exactly the set of subsequences of A.

Thus, if B is not a subsequence of A, it’s impossible to ever reach B.
Checking this is a standard greedy idea and can be done in \mathcal{O}(N+M) time (though as we’ll see from the rest of the solution, not really explicitly needed at all.)

Let’s now try to find the minimum number of operations needed.

Suppose B is present in A as a subsequence, say at indices i_1, i_2, \ldots, i_M.
Then,

  • We need to delete all the characters at indices 1, 2, \ldots, i_1-1.
    If i_1 = 1 this needs no operations, otherwise it requires one operation since we can just delete the entire prefix at once.
  • Similarly, we need to delete all the characters at indices i_M+1, i_M+2, \ldots, N.
    If i_M = N this has no cost; otherwise it requires one operation.
  • The same applies to parts between chosen indices.
    That is, for each 1 \le j \lt M, we need to delete the characters at indices i_j+1, \ldots, i_{j+1}-1.
    If i_{j+1} = i_j + 1 then this has no cost; otherwise it has a cost of 1.

In particular, note that the cost of each part depends only on adjacent chosen indices (i.e. whether they’re adjacent or not.)

This observation, along with M being small, allows us to use dynamic programming.


Define dp[i][j][x] to be the minimum cost of obtaining the first j characters of B as a subsequence from the first i characters of A, where

  • x = 0 means index i is not part of the subsequence.
  • x = 1 means index i is a part of the subsequence.

Note that this DP has 2MN states, which is fine for the given constraints.

For transitions, we have the following.

  • Consider dp[i][j][0], meaning we don’t choose index i.
    There are two cases:
    • B_j is matched to A_{i-1}, corresponding to dp[i-1][j][1].
      Here, a new ‘empty space’ starts at index i, increasing our cost by 1.
      So, the cost is dp[i-1][j][1] + 1.
    • B_j is not matched to A_{i-1}, corresponding to dp[i-1][j][0].
      Here, we already have an ongoing ‘empty space’, so it’s been accounted for in the cost already.
      The cost is simply dp[i-1][j][0].
    • dp[i][j][0] equals the minimum of these two values.
  • Next, consider dp[i][j][1], meaning we do choose index i.
    Note that this option is available only when A_i = B_j; if it isn’t, we say dp[i][j][1] = \infty
    Here, we don’t really care about the state of the previous index (if there was an empty space, its cost was accounted for when starting it anyway).
    So, dp[i][j][1] just equals the minimum of dp[i-1][j-1][0] and dp[i-1][j-1][1].

With \mathcal{O}(NM) states and constant-time transitions from each one, the overall solution is \mathcal{O}(NM) which is fast enough.
The final answer is simply \min(dp[N][M][0], dp[N][M][1]).

Note that if the answer turns out to be \infty it means B doesn’t appear as a subsequence of A so we answer -1; which is why checking for this separately at the start is not necessary.

Note that since dp[i] depends only on dp[i-1], memory usage can be optimized to \mathcal{O}(M) if necessary for speed.
This might be helpful since 2MN = 4\cdot 10^7 is quite large in terms of memory allocation.

TIME COMPLEXITY:

\mathcal{O}(NM) per testcase.

CODE:

Editorialist's code (PyPy3)
for _ in range(int(input())):
    n, m = map(int, input().split())
    a = input()
    b = input()
    
    dp = [[n+1, n+1] for i in range(m+1)]
    dp[0] = [1, 0]
    
    for i in range(n):
        ndp = [[n+1, n+1] for i in range(m+1)]
        for j in range(m+1):
            ndp[j][0] = min(dp[j][0], dp[j][1] + 1)
            if j > 0 and b[j-1] == a[i]: ndp[j][1] = min(dp[j-1][0], dp[j-1][1])
        dp = ndp[:]
    ans = min(dp[m][0], dp[m][1])
    if ans > n: print(-1)
    else: print(ans)