PROBLEM LINK:
Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4
Author: raysh07
Tester: sushil2006
Editorialist: iceknight1093
DIFFICULTY:
TBD
PREREQUISITES:
Dynamic Programming
PROBLEM:
You have two strings A and B, of lengths N and M.
In one operation, you can delete a substring of A.
Find the minimum number of operations needed to make A = B.
N \le 2\cdot 10^5 and M \le \min(N, 100).
EXPLANATION:
Observe that repeatedly deleting substrings of A means we’ll eventually end up with some subsequence of A.
Further, every subsequence of A is reachable; since single characters are themselves substrings.
So, the set of all reachable strings is exactly the set of subsequences of A.
Thus, if B is not a subsequence of A, it’s impossible to ever reach B.
Checking this is a standard greedy idea and can be done in \mathcal{O}(N+M) time (though as we’ll see from the rest of the solution, not really explicitly needed at all.)
Let’s now try to find the minimum number of operations needed.
Suppose B is present in A as a subsequence, say at indices i_1, i_2, \ldots, i_M.
Then,
- We need to delete all the characters at indices 1, 2, \ldots, i_1-1.
If i_1 = 1 this needs no operations, otherwise it requires one operation since we can just delete the entire prefix at once. - Similarly, we need to delete all the characters at indices i_M+1, i_M+2, \ldots, N.
If i_M = N this has no cost; otherwise it requires one operation. - The same applies to parts between chosen indices.
That is, for each 1 \le j \lt M, we need to delete the characters at indices i_j+1, \ldots, i_{j+1}-1.
If i_{j+1} = i_j + 1 then this has no cost; otherwise it has a cost of 1.
In particular, note that the cost of each part depends only on adjacent chosen indices (i.e. whether they’re adjacent or not.)
This observation, along with M being small, allows us to use dynamic programming.
Define dp[i][j][x] to be the minimum cost of obtaining the first j characters of B as a subsequence from the first i characters of A, where
- x = 0 means index i is not part of the subsequence.
- x = 1 means index i is a part of the subsequence.
Note that this DP has 2MN states, which is fine for the given constraints.
For transitions, we have the following.
- Consider dp[i][j][0], meaning we don’t choose index i.
There are two cases:- B_j is matched to A_{i-1}, corresponding to dp[i-1][j][1].
Here, a new ‘empty space’ starts at index i, increasing our cost by 1.
So, the cost is dp[i-1][j][1] + 1. - B_j is not matched to A_{i-1}, corresponding to dp[i-1][j][0].
Here, we already have an ongoing ‘empty space’, so it’s been accounted for in the cost already.
The cost is simply dp[i-1][j][0]. - dp[i][j][0] equals the minimum of these two values.
- B_j is matched to A_{i-1}, corresponding to dp[i-1][j][1].
- Next, consider dp[i][j][1], meaning we do choose index i.
Note that this option is available only when A_i = B_j; if it isn’t, we say dp[i][j][1] = \infty
Here, we don’t really care about the state of the previous index (if there was an empty space, its cost was accounted for when starting it anyway).
So, dp[i][j][1] just equals the minimum of dp[i-1][j-1][0] and dp[i-1][j-1][1].
With \mathcal{O}(NM) states and constant-time transitions from each one, the overall solution is \mathcal{O}(NM) which is fast enough.
The final answer is simply \min(dp[N][M][0], dp[N][M][1]).
Note that if the answer turns out to be \infty it means B doesn’t appear as a subsequence of A so we answer -1; which is why checking for this separately at the start is not necessary.
Note that since dp[i] depends only on dp[i-1], memory usage can be optimized to \mathcal{O}(M) if necessary for speed.
This might be helpful since 2MN = 4\cdot 10^7 is quite large in terms of memory allocation.
TIME COMPLEXITY:
\mathcal{O}(NM) per testcase.
CODE:
Editorialist's code (PyPy3)
for _ in range(int(input())):
n, m = map(int, input().split())
a = input()
b = input()
dp = [[n+1, n+1] for i in range(m+1)]
dp[0] = [1, 0]
for i in range(n):
ndp = [[n+1, n+1] for i in range(m+1)]
for j in range(m+1):
ndp[j][0] = min(dp[j][0], dp[j][1] + 1)
if j > 0 and b[j-1] == a[i]: ndp[j][1] = min(dp[j-1][0], dp[j-1][1])
dp = ndp[:]
ans = min(dp[m][0], dp[m][1])
if ans > n: print(-1)
else: print(ans)