MKSM - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: rudro25
Tester: yash_daga
Editorialist: iceknight1093

DIFFICULTY:

TBD

PREREQUISITES:

KMP or the Z-function

PROBLEM:

You have two strings S and T. You can do the following moves:

  • Choose a proper substring of one of the strings and copy it to either the start or end of the other; or
  • Choose a proper substring of one of the strings and delete it.

Find the minimum number of moves needed to make both strings equal.

EXPLANATION:

It’s always possible to make the strings equal in 3 moves, as we can:

  • Copy the first character of S and append it to T.
  • Delete all but the first character of S.
  • Delete all but the last character of T.

So, we only need to check whether it’s possible in \lt 3 moves.

Checking for 0 moves is trivial: it’s only possible if S= T.

The discussion below assumes |S| \geq |T|, for convenience.

1 move

Our one move is to either delete a substring of S, or copy a substring of S to one end of T.
In fact, it turns out that the copy move is unnecessary as well!

Why?

Let |S| = N and |T| = M.
Suppose we copy some substring of length N-M from S and append it to T, after which the strings are equal.
This would mean the first M characters of S equal the first M characters of T — that is, T is a prefix of S. We could’ve just deleted the last N-M characters of S and the strings would still be equal.

Similarly, if a substring is copied to the front of T, we’d be able to delete some prefix of S instead and achieve the same thing; so it suffices to consider only deletions.


Now, all we need to do is check if deleting some substring of S can give us T.
This is fairly easy to do: for instance, you could compute the largest common prefix of S and T (say of length x), and their largest common suffix (say of length y); both of which are easily found with a simple loop.
Then, T is obtainable by deleting something from S if and only if x+y \geq |T|.


2 moves

There are four possibilities to check here: the first and second move can both be deletions or copies.
However, as seen in the case of 1 move, if the last move is a copy, there will be a way to make both strings equal with a deletion as well.
So, we only need to consider two cases: two deletions, or a copy followed by a deletion.

Time for some cases!

Two deletions

Even within this subcase, there are more subcases: we can either delete two substrings of S, or delete one substring of S and one of T.

The latter is easier to think about: if some prefix or suffix of S equals a prefix or suffix of T, we can delete things appropriately to leave only the matching strings.
Nothing more needs to be considered here.

Proof

Notice that since we aren’t allowed to delete the entire string, there are two possibilities for S and T after deletion:

  • Only a prefix/suffix remains; or
  • Both the first and last characters remain.

If S is only a prefix, then either T's first character remains (in which case there’s a matching prefix), or a suffix of T remains (in which case a prefix of S matches a suffix of T).
A similar argument holds for when only a suffix of S remains; or when only a prefix/suffix of T remains.

Finally, if both strings have something removed from their middles, their first characters should match (so there’s an equal prefix anyway).


What about deleting two substrings from S? We (almost) don’t need to think about it!
If the substrings we delete overlap (in terms of indices of the original string), a single deletion would’ve been enough; and we’re only here because it wasn’t.
If we delete disjoint substrings, that means the part between them lies in T — we could instead delete this part from both S and T (note that this merges the whole thing into a single large deletion in S); so we’re back to the previous case!

The only exception is if the “single large deletion” is S itself, which isn’t allowed.
However, note that this means the undeleted substring of S would be T itself; so we just check if T occurs as a substring of S.

All-in-all, we need to check prefix/suffix matches and if T occurs as a substring of S. All of this can be done quickly using one of the many string algorithms around: the KMP algorithm, the Z-function, you can even use hashing.


Copy then delete

Yay, more cases!
There are a few possibilities here: we can copy from S/T to the other one, and can delete from either string after copying.

Case 1: Copy to S, delete from S.
If the deletion is fully within the copied part, S will end up with larger length than T, so they can’t be equal.
In every other case, the copy → delete step can be replaced with an equivalent delete → copy step, and as we noted earlier, we don’t need to think about anything that ends with a copy.
This case gets ignored!

Case 2: Copy to S, delete from T.
S will have strictly larger length than T, so they’ll never be equal. Ignored!

Case 3: Copy to T, delete from T.
Just as in case 1, if the deletion isn’t fully within the copied part then an equivalent delete → copy operation exists (which we don’t need to deal with).
So, the only possibility is if we copy something over from S, then delete something from within it.
In such a case, the original T must match a prefix or suffix of S; and we already checked for that anyway.
Yet again, case ignored!

Case 4: Copy to T, delete from S.
I’m sure you see the pattern here by now.
Notice that if deletion is done from S, the entirety of the original T remains.
So, if the copy is appended to T, then either some prefix of T will match some prefix of S, or some suffix of S will itself be T (and you get similar prefix/suffix conditions for when the copy is placed at the front).
For the final time, we’ve already checked these conditions earlier, so this case gets ignored, too!

tl;dr you only need to check if some prefix/suffix of S matches some prefix/suffix of T; or if T is a substring of S itself — all of which are doable with a variety of string algorithms.

If all the above checks fail, the answer is 3.

TIME COMPLEXITY:

\mathcal{O}(N) per testcase.

CODE:

Author's code (C++)
#include <bits/stdc++.h>
using namespace std;


#define ll long long
#define pb push_back
#define all(x) x.begin(),x.end()
#define FastIO   ios::sync_with_stdio(0);cin.tie(0); cout.tie(0);


const int N= 1e5+5;


int main(){
  FastIO;


  int test; cin>>test;
  while(test--){
   // str_len >= 2
   ll n,m; cin>>n>>m;
   string s,t; cin>>s>>t;
   if(t.size()>s.size()) swap(s,t), swap(n,m);  //n>m always
   ll ans; 

   if(s==t) ans=0;  //s= ab, t=ab
   else{
       //s=akc, t= ac
       //
       bool chk=0;
       ll l=0,r=0;
       while(l<m && s[l]==t[l]) l++;
       ll p1=n-1, p2= m-1;
       while(p2>=0 && s[p1]==t[p2]) r++,p1--,p2--;
       if(l+r>=m) chk=1;
       if(chk) ans=1;  //s=akc, t= ac //s=akc, t=ak
       else{
           ll cnt=0;
           string ss= t+"#"+s;
           ll sz= ss.size();
           ll lps[sz+1], lpss[sz+1]; lps[0]=0;
           for(ll i=1,k=0;i<ss.size();i++){
               while(k>0 && ss[k]!=ss[i])
               k=lps[k-1];
               if(ss[i]==ss[k]) k++;
               lps[i]=k;
               if(lps[i]==m) ++cnt;
           }
           //
           ss= s+"#"+t;
           lpss[0]=0;
           for(ll i=1,k=0;i<ss.size();i++){
               while(k>0 && ss[k]!=ss[i])
               k=lpss[k-1];
               if(ss[i]==ss[k]) k++;
               lpss[i]=k;
               //if(lpss[i]==m) ++cnt;
           }

           // s= ac, t= ba
           //s= abde, t= bd
           //s= abx, t= yab
           //s=xab, t=aby
           if(s[0]==t[0] || s[0]==t[m-1] || s[n-1]==t[0] || s[n-1]==t[m-1] || cnt || lps[sz-1] || lpss[sz-1]) ans=2;
           else ans=3;
       }
   }
   cout<<ans<<endl;
  }
 
  return 0;
}
/*
10
2 2
ab ab
3 2
akc ac
3 2
abc bc
4 2
abcd bc
4 2
abcd bd
4 4
abcd xdab
3 3
xab aby
3 3
aby xab
4 4
abcd cdab
3 3
abd efg


ans:
0
1
1
2
2
2
2
2
2
3
*/
Tester's code (C++)
//clear adj and visited vector declared globally after each test case
//check for long long overflow   
//Mod wale question mein last mein if dalo ie. Ans<0 then ans+=mod;
//Incase of close mle change language to c++17 or c++14  
//Check ans for n=1 
#pragma GCC target ("avx2")    
#pragma GCC optimize ("O3")  
#pragma GCC optimize ("unroll-loops")
#include <bits/stdc++.h>                   
#include <ext/pb_ds/assoc_container.hpp>  
// #define int long long      
#define IOS std::ios::sync_with_stdio(false); cin.tie(NULL);cout.tie(NULL);cout.precision(dbl::max_digits10);
#define pb push_back 
#define mod 998244353ll
#define lld long double
#define mii map<int, int> 
#define pii pair<int, int>
#define ll long long 
#define ff first
#define ss second 
#define all(x) (x).begin(), (x).end()
#define rep(i,x,y) for(int i=x; i<y; i++)    
#define fill(a,b) memset(a, b, sizeof(a))
#define vi vector<int>
#define setbits(x) __builtin_popcountll(x)
#define print2d(dp,n,m) for(int i=0;i<=n;i++){for(int j=0;j<=m;j++)cout<<dp[i][j]<<" ";cout<<"\n";}
typedef std::numeric_limits< double > dbl;
using namespace __gnu_pbds;
using namespace std;
typedef tree<int, null_type, less<int>, rb_tree_tag, tree_order_statistics_node_update> indexed_set;
//member functions :
//1. order_of_key(k) : number of elements strictly lesser than k
//2. find_by_order(k) : k-th element in the set
const long long N=300005, INF=2000000000000000000;
const int inf=2e9 + 5;
lld pi=3.1415926535897932;
int lcm(int a, int b)
{
    int g=__gcd(a, b);
    return a/g*b;
}
int power(int a, int b, int p)
    {
        if(a==0)
        return 0;
        int res=1;
        a%=p;
        while(b>0)
        {
            if(b&1)
            res=(1ll*res*a)%p;
            b>>=1;
            a=(1ll*a*a)%p;
        }
        return res;
    }
mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());

int getRand(int l, int r)
{
    uniform_int_distribution<int> uid(l, r);
    return uid(rng);
}

vector<int> prefix_function(string s) {
    int n = (int)s.length();
    vector<int> pi(n);
    for (int i = 1; i < n; i++) {
        int j = pi[i-1];
        while (j > 0 && s[i] != s[j])
            j = pi[j-1];
        if (s[i] == s[j])
            j++;
        pi[i] = j;
    }
    return pi;
}
struct input_checker {
	string buffer;
	int pos;

	const string all = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
	const string number = "0123456789";
	const string upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
	const string lower = "abcdefghijklmnopqrstuvwxyz";

	input_checker() {
		pos = 0;
		while (true) {
			int c = cin.get();
			if (c == -1) {
				break;
			}
			buffer.push_back((char) c);
		}
	}

	int nextDelimiter() {
		int now = pos;
		while (now < (int) buffer.size() && buffer[now] != ' ' && buffer[now] != '\n') {
			now++;
		}
		return now;
	}

	string readOne() {
		assert(pos < (int) buffer.size());
		int nxt = nextDelimiter();
		string res;
		while (pos < nxt) {
			res += buffer[pos];
			pos++;
		}
		return res;
	}

	string readString(int minl, int maxl, const string &pattern = "") {
		assert(minl <= maxl);
		string res = readOne();
		assert(minl <= (int) res.size());
		assert((int) res.size() <= maxl);
		for (int i = 0; i < (int) res.size(); i++) {
			assert(pattern.empty() || pattern.find(res[i]) != string::npos);
		}
		return res;
	}

	int readInt(int minv, int maxv) {
		assert(minv <= maxv);
		int res = stoi(readOne());
		assert(minv <= res);
		assert(res <= maxv);
		return res;
	}

	long long readLong(long long minv, long long maxv) {
		assert(minv <= maxv);
		long long res = stoll(readOne());
		assert(minv <= res);
		assert(res <= maxv);
		return res;
	}

	auto readIntVec(int n, int minv, int maxv) {
		assert(n >= 0);
		vector<int> v(n);
		for (int i = 0; i < n; ++i) {
			v[i] = readInt(minv, maxv);
			if (i+1 < n) readSpace();
			else readEoln();
		}
		return v;
	}

	auto readLongVec(int n, long long minv, long long maxv) {
		assert(n >= 0);
		vector<long long> v(n);
		for (int i = 0; i < n; ++i) {
			v[i] = readLong(minv, maxv);
			if (i+1 < n) readSpace();
			else readEoln();
		}
		return v;
	}

	void readSpace() {
		assert((int) buffer.size() > pos);
		assert(buffer[pos] == ' ');
		pos++;
	}

	void readEoln() {
		assert((int) buffer.size() > pos);
		assert(buffer[pos] == '\n');
		pos++;
	}

	void readEof() {
		assert((int) buffer.size() == pos);
	}
};


int32_t main()
{
    IOS;
    input_checker inp;
    int t = inp.readInt(1, 1e5); inp.readEoln();
    int sum_n = 0, sum_m = 0;
    while(t--)
    {
        int n = inp.readInt(2, 1e5); inp.readSpace();
        int m = inp.readInt(2, 1e5); inp.readEoln();
        sum_n += n;
        sum_m += m;
        assert(sum_n <= 2e5 && sum_m <= 2e5);
        string a = inp.readString(n, n, inp.lower); inp.readSpace();
        string b = inp.readString(m, m, inp.lower); inp.readEoln();
        if(m>n)
        {
            swap(n, m);
            swap(a, b);
        }
        int ans=3;
        if(a == b)
        ans=0;
        else
        {
            int mx_pref=0, mx_suf=0;
            int p1=0, p2=0;
            while(p2<m && a[p1++]==b[p2++])
            mx_pref++;
            p1=n-1, p2=m-1;
            while(p2>=0 && a[p1--]==b[p2--])
            mx_suf++;
            // cout<<mx_pref<<" "<<mx_suf<<"\n";
            if(mx_pref + mx_suf >= m)
            ans=1;
            string ab=a+"#"+b;
            string ba=b+"#"+a;
            vi vab=prefix_function(ab);
            vi vba=prefix_function(ba);
            rep(i,m+1,n+m+1)
            {
                if(vba[i]>=m)
                ans=min(ans, 2);
            }
            if(a[0]==b[m-1] || a[0]==b[0] || a[n-1]==b[0] || a[n-1]==b[m-1] || vba[n+m] || vab[n+m])
            ans=min(ans, 2);
        }
        cout<<ans<<"\n";
    }
    inp.readEof();
}

I guess this part of the editorial have format issues.

Nice problem btw.

Thanks, fixed.