PROBLEM LINK:
Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4
Author: maximus11
Tester: kingmessi
Editorialist: iceknight1093
DIFFICULTY:
Simple
PREREQUISITES:
None
PROBLEM:
You have two strings A and B. In one move you can delete A_i from A, for a cost of i.
Find the minimum cost to convert A into B.
EXPLANATION:
Our only operation is to delete characters from A, so in the end what we’ll be left with is a subsequence of A.
This means if B is not a subsequence of A, it’s impossible to obtain B; and otherwise it always is.
Checking whether B is a subsequence of A is a well-known problem, and can be done greedily in linear time: iterate through A left-to-right, and try to match the current character of A with the leftmost unmatched character in B.
If all characters of B have been matched, it appears as a subsequence; otherwise it does not.
Now, when B does appear as a subsequence of A, we need to find the minimum cost possible.
That involves two decisions:
- We need to decide which subsequence of A will become B, i.e. which characters to delete and which to keep.
- Once we’ve decided which characters to delete, we need to decide on the order in which they are deleted.
Let’s tackle the second issue first.
Suppose we want to delete the characters at indices i_1, i_2, \ldots, i_k, where i_1 \lt i_2 \lt \cdots \lt i_k.
Then, it’s not hard to see that for minimal cost, it’s best to delete them in ascending order.
This is because deleting i_1 first reduces the cost of deleting each other index by 1; whereas deleting anything else will not affect the cost of i_1 at all - so it’s optimal to delete i_1 first. Keep applying this argument to the rest of the sequence and we see that deleting in order i_1, i_2, \ldots, i_k is best.
Now, we need to figure out which subsequence of A that equals B, to keep.
Intuitively, deleting early characters of A is cheaper, so it makes sense to keep a ‘later’ occurrence of B in A.
This is in fact true: it’s optimal to keep the “last” occurrence of B in A, and delete everything else.
What does this mean?
Recall the algorithm to check whether B is a subsequence of A.
This algorithm finds the “first” occurrence of B in A, as in, each of the indices it chooses is minimal: there’s no way to move any of them further left and still find an occurrence of B as a subsequence.
This follows naturally from that fact that the choice is made greedily: the instant it’s possible to match a character, we do so.So, note that simply running this algorithm in reverse will give us the “last” occurrence of B in A.
That is, iterate over A from right to left, and try to match the current character of A with the last unmatched character of B.
This will give us the optimal subsequence to keep.
Once the subsequence itself has been computed, as per the previous step, finding the actual cost is not that hard.
Let j_1, j_2, \ldots, j_m be the indices of the subsequence (so that all the other elements must be deleted).
Then, since we’re deleting from left to right,
- Everything before j_1 has a deletion cost of 1.
- Everything between j_1 and j_2 has a deletion cost of 2.
- Everything between j_2 and j_3 has a deletion cost of 3.
\vdots - Everything after j_m has a deletion cost of m+1.
So, once the indices j_1, j_2, \ldots, j_m are known this final cost can be computed easily enough.
TIME COMPLEXITY:
\mathcal{O}(N) per testcase.
CODE:
Author's code (C++)
// __builtin_clz(x): the number of zeros at the beginning of the number
// __builtin_ctz(x): the number of zeros at the end of the number
// __builtin_popcountll(x): the number of ones in the number
// __builtin_parity(x): the parity (even or odd) of the number of ones
// __builtin_ffs(int) finds the index of the first (most right) set bit
#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace std;
using namespace __gnu_pbds;
#define ordered_set tree<int, null_type,less<int>, rb_tree_tag,tree_order_statistics_node_update>
#define ll long long
#define nl "\n"
#define pb push_back
#define eb emplace_back
#define Sort(a) sort(a.begin(),a.end())
#define vi vector<int>
#define vll vector<long long>
#define F first
#define S second
#define yes cout<<"YES\n";
#define no cout<<"NO\n";
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL);
// freopen("input06.txt","r",stdin);
// freopen("output06.txt","w",stdout);
int T;
cin>>T;
while(T--)
{
string a,b; cin>>a>>b;
if(b.size() > a.size()){cout<<-1<<endl; continue;}
int ind = b.size()-1;
set<int>s;
for(int i = a.size()-1;i>=0;i--)
{
if(a[i] == b[ind])
{
ind--;
s.insert(i);
if(ind == -1) break;
}
}
if(ind != -1) {cout<<-1<<endl; continue;}
ll ans = 0;
ll ct = 1;
for(int i = 0;i<a.size();i++)
{
if(s.find(i) == s.end())
{
ans += ct;
}
else
{
ct++;
}
}
cout<<ans<<endl;
}
return 0;
}
Editorialist's code (PyPy3)
for _ in range(int(input())):
a = input()
b = input()
n, m = len(a), len(b)
ans, p = 0, m-1
for i in reversed(range(n)):
if p >= 0 and a[i] == b[p]:
p -= 1
else: ans += p + 2
if p >= 0: print(-1)
else: print(ans)