EQSBSTR2 - Editorial

sibasish_14 · August 16, 2020, 5:18am

PROBLEM LINK:

Practice

Contest

Author: Sibasish Ghosh

Tester: Pritish Nayak

Editorialist: Sibasish Ghosh

DIFFICULTY:

Easy-Medium

PREREQUISITES:

String Hashing, Binary Seacrh

PROBLEM:

You are given two strings A and B of lengths N and M respectively. Find the maximum length of a substring of A that is also present in B. Let this maximum length be K. Also, find all such substrings of length K that is present in both A and B and print the number of times each such substring occurs in B.

EXPLANATION:

A brute force approach won’t work here. Neither would a DP approach work (as some of you might have guessed in the first place) because it would be an O(N^2) approach, which is just too slow for the given constraints.

We need an O(NlogN) approach for the given constraints. This is where binary search will help us. We can binary search over the length of A. For each length (that we find using binary search), we find all substrings of A that are also present in B. But using the actual substrings would take linear time, and thus would increase the complexity. So we will use string hashing to store each substring as hashes. Then it would take constant time to find and compare the hashes. Remember to use double hashing to avoid collisions.

This is just the brief idea. It’s better if you try to implement it yourself. If you are not familiar with binary search or string hashing, visit these links:

Refer to the solutions below if you face any problem.

SOLUTIONS:

Setter's/Editorialist's Solution

#include<bits/stdc++.h>
#define mod1 1000000007
#define mod2 2038074743
#define p1 31
#define p2 37
#define F first
#define S second
#define pb push_back
#define all(v) v.begin(),v.end()
#define rall(v) v.rbegin(),v.rend()

using namespace std;
using namespace std::chrono; 
typedef long long int ll;

string a,b;
ll n,m;
ll pw1[50010],invpw1[50010];
ll pw2[50010],invpw2[50010];
ll hash1[50010],hash2[50010];
ll hash3[50010],hash4[50010];

void createHash()
{
    ll i;
    hash1[0]=hash2[0]=hash3[0]=hash4[0]=0;
    for(i=1;i<=n;i++)
    {
        hash1[i]=(hash1[i-1]+(a[i-1]-'a'+1)*pw1[i-1])%mod1;
        hash2[i]=(hash2[i-1]+(a[i-1]-'a'+1)*pw2[i-1])%mod2;
    }
    for(i=1;i<=m;i++)
    {
        hash3[i]=(hash3[i-1]+(b[i-1]-'a'+1)*pw1[i-1])%mod1;
        hash4[i]=(hash4[i-1]+(b[i-1]-'a'+1)*pw2[i-1])%mod2;
    }
}

bool check(ll x)
{
    ll i,j;
    map<ll,ll> mp1,mp2;
    for(i=1;i<=m-x+1;i++)
    {
        ll val=(hash3[i+x-1]-hash3[i-1]+mod1)%mod1;
        val=(val*invpw1[i-1])%mod1;
        mp1[val]++;
        val=(hash4[i+x-1]-hash4[i-1]+mod2)%mod2;
        val=(val*invpw2[i-1])%mod2;
        mp2[val]++;
    }
    for(i=1;i<=n-x+1;i++)
    {
        ll val1=(hash1[i+x-1]-hash1[i-1]+mod1)%mod1;
        val1=(val1*invpw1[i-1])%mod1;
        ll val2=(hash2[i+x-1]-hash2[i-1]+mod2)%mod2;
        val2=(val2*invpw2[i-1])%mod2;
        if(mp1[val1] && mp2[val2])
            return true;
    }
    return false;
}

ll binSearch(ll low,ll high)
{
    ll res=0;
    while(low <= high)
    {
        ll mid=(low+high)/2;
        if(check(mid))
        {
            res=mid;
            low=mid+1;
        }
        else high=mid-1;
    }
    return res;
}

int main()
{
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    // freopen("input7.txt","r",stdin);
    // freopen("output7.txt","w",stdout);
    ll t=1;
    cin>>t;
    pw1[0]=pw2[0]=1;
    for(ll i=1;i<=5*1e4;i++)
    {
        pw1[i]=(pw1[i-1]*p1)%mod1;
        pw2[i]=(pw2[i-1]*p2)%mod2;
    }
    invpw1[0]=invpw2[0]=1;
    invpw1[1]=129032259;
    invpw2[1]=330498607;
    for(ll i=2;i<=5*1e4;i++)
    {
        invpw1[i]=(invpw1[i-1]*129032259)%mod1;
        invpw2[i]=(invpw2[i-1]*330498607)%mod2;
    }
    while(t--)
    {
        ll i,j,k=0;
        cin>>n>>m;
        cin>>a>>b;
        createHash();
        k=binSearch(1,n);
        cout<<k<<"\n";
        if(!k) continue;
        set<pair<string,ll> > ans;
        map<ll,ll> mp1,mp2;
        for(i=1;i<=m-k+1;i++)
        {
            ll val=(hash3[i+k-1]-hash3[i-1]+mod1)%mod1;
            val=(val*invpw1[i-1])%mod1;
            mp1[val]++;
            val=(hash4[i+k-1]-hash4[i-1]+mod2)%mod2;
            val=(val*invpw2[i-1])%mod2;
            mp2[val]++;
        }
        for(i=1;i<=n-k+1;i++)
        {
            ll val1=(hash1[i+k-1]-hash1[i-1]+mod1)%mod1;
            val1=(val1*invpw1[i-1])%mod1;
            ll val2=(hash2[i+k-1]-hash2[i-1]+mod2)%mod2;
            val2=(val2*invpw2[i-1])%mod2;
            if(mp1[val1] && mp2[val2])
                ans.insert({a.substr(i-1,k),mp1[val1]});
        }
        for(auto it:ans)
            cout<<it.F<<" "<<it.S<<"\n";
    }
    return 0;
}

Tester's Solution

#include<bits/stdc++.h>
#define int long long
using namespace std;
 
struct StringHasher{
  const char A='a';
  long long n,P,MOD;
  vector<long long> hsh,p;
  
  StringHasher(string s,int base,int modulo){
    P=base;MOD=modulo;n=s.length();
    hsh.resize(n);p.resize(n);
    
    p[0]=1;
    for(int i=1;i<n;i++)
      p[i]=(p[i-1]*P)%MOD;
 
    hsh[0]=s[0]-A+1;
    for(int i=1;i<n;i++)
      hsh[i]=(hsh[i-1]*P+s[i]-A+1)%MOD;
  }
 
  int sub(int l,int r){
    int x=hsh[r],y=0;
    if(l!=0)y=(hsh[l-1]*p[r-l+1])%MOD;
    return ((x-y)%MOD+MOD)%MOD;
  }
};
 
int32_t main()
{
   ios_base::sync_with_stdio(false);cin.tie(NULL);
   int t;
   cin>>t;
 
   assert(t>=1&&t<=500);
 
   int sumn=0,summ=0;
 
   while(t--)
   {
      int n,m;
      cin>>n>>m;
      
      sumn+=n;summ+=m;
      assert(n>=1&&n<=50000);assert(m>=1&&m<=50000);
      assert(sumn>=1&&sumn<=100000);assert(summ>=1&&summ<=100000);
 
      string a,b;
      cin>>a>>b;
 
      //---string checking---
      assert(a.size()==n);
      assert(b.size()==m);
      for(int i=0;i<n;i++)
        assert(a[i]>='a'||a[i]<='z');
      for(int i=0;i<m;i++)
        assert(b[i]>='a'||b[i]<='z');
      //---checking done---
 
      int lo=0,hi=min(n,m),ans=0;
 
      StringHasher hA1(a,51,1000000321),hA2(a,31,2147483647),hA3(a,47,1000000007),hA4(a,101,2147483647);
      StringHasher hB1(b,51,1000000321),hB2(b,31,2147483647),hB3(b,47,1000000007),hB4(b,101,2147483647);
      typedef pair<int,int> pii;
      while(lo<=hi)
      {
        int mid=(lo+hi)/2;
 
        int ok=0;
        if(mid==0)ok=1;
        set<pair<pii,pii>> sA;
        for(int i=0;i<n;i++)
        {
          if(i+mid-1>=n)break;
          int h1=hA1.sub(i,i+mid-1);
          int h2=hA2.sub(i,i+mid-1);
          int h3=hA3.sub(i,i+mid-1);
          int h4=hA4.sub(i,i+mid-1);
          sA.insert({{h1,h2},{h3,h4}});
        }
 
        for(int i=0;i<m;i++)
        {
          if(i+mid-1>=m)break;
          int h1=hB1.sub(i,i+mid-1);
          int h2=hB2.sub(i,i+mid-1);
          int h3=hB3.sub(i,i+mid-1);
          int h4=hB4.sub(i,i+mid-1);
          if(sA.find({{h1,h2},{h3,h4}})!=sA.end()){
            ok=1;
          }
        }
 
        if(ok){
          ans=mid;
          lo=mid+1;
        }
        else hi=mid-1;
      }
 
      cout<<ans<<'\n';
      
      if(ans==0)continue;
      
      map<pair<pii,pii>,int> sB;
      for(int i=0;i<m;i++)
      {
        if(i+ans-1>=m)break;
        int h1=hB1.sub(i,i+ans-1);
        int h2=hB2.sub(i,i+ans-1);
        int h3=hB3.sub(i,i+ans-1);
        int h4=hB4.sub(i,i+ans-1);
        sB[{{h1,h2},{h3,h4}}]++;
      }
 
      map<string,int> freq;
      for(int i=0;i<n;i++)
      {
        if(i+ans-1>=n)break;
        int h1=hA1.sub(i,i+ans-1);
        int h2=hA2.sub(i,i+ans-1);
        int h3=hA3.sub(i,i+ans-1);
        int h4=hA4.sub(i,i+ans-1);
        if(sB.find({{h1,h2},{h3,h4}})!=sB.end())
          freq[a.substr(i,ans)]=sB[{{h1,h2},{h3,h4}}];
      }
 
      for(auto x:freq)
        cout<<x.first<<" "<<x.second<<"\n";
 
   }
 
   cerr<<"Tester:"<<(float)clock()/CLOCKS_PER_SEC*1000<<" ms"<<endl;
   return 0;
}

Feel free to write your approach in the comments

ssrivastava990 · August 16, 2020, 9:45am

@admin please move this problem to practice section .

sibasish_14 · August 16, 2020, 1:26pm

I guess it has been moved. You can submit your solutions now.

ssrivastava990 · August 16, 2020, 3:12pm

Yes AC with 6 penalty , good question.

sibasish_14 · August 16, 2020, 4:28pm

Thanks!

ssrivastava990 · August 16, 2020, 6:20pm

Btw If u have some good questions which uses this approach , although I solve many questions , but if u have some good problems please share .

sibasish_14 · August 16, 2020, 8:23pm

I haven’t solved many such questions, but you can find some String Hashing problems here: link

harshraj22 · August 16, 2020, 8:29pm

Not to be considered as destructive comment.

I am just expressing my views.

TBH, I hate string Hashing. They aren’t reliable at all ! And test-cases can always be made to ensure the worst case of string hashing.

ssrivastava990 · August 17, 2020, 5:13am

Nope , I disagree , I think hashing is more easier then any string matching algorithm at least concept wise they are easy.

sibasish_14 · August 17, 2020, 5:14am

Yeah, it does depend upon probability. So there’s always a possibility that a solution can be wrong. For this particular problem, the tester had to test it with a quadruple hash to make sure the solution was right (initially, my solution comprised of a single hash, which failed many test cases).

But said that, string hashing is one of the fastest way to get the job done.

ssrivastava990 · August 17, 2020, 5:18am

I also tried with single hash with some higher prime still wrong answer , then I use double hash for each string and it passes,
One more if u have implementation of rolling hash , then this question can be done in 10 minutes only .

pritishn · August 17, 2020, 7:37pm

As a tester, I did quadruple hashing to check if the answer was correct.