 # Most frequent substring problem.

Given a string, we want to know the maximum no. of occurrences of any substring that satisfies following two conditions:

1. The substring’s lengths is within in inclusive range of `minLength` to `maxLength`.
2. The total no. of unique characters in the string doesn’t exceed `maxUnique`.

For example, given a string s=abcde, minLength=2, maxLength=5, maxUnique=3, the substrings matching the criteria are (ab, bc, cd, de, abc, bcd, cde). Any shorter string fails he `minLength>=2` any longer will fail `maxUnique <= 3`. Each of the substring occurs only one time.

INPUT :
First line contains a string, second line contains minLength, third line contains maxLength, and the last line contains maxUnique.

Constraints :

• 2<=n<=105
• 2<=minLength<= maxLength <=26
• maxLength<n
• 2<=maxUnique<=26

SAMPLE INPUT :

``````ababab
2
3
4
``````

SAMPLE OUTPUT :

``````3
``````

**EXPLANATIONS : **

We want to find the no. of occurrences of the most frequently occurring substring of s= “ababab” that has the length in the inclusive range from minLength = 2 and maxLength=3 and contains maximum of maxUnique = 4 unique characters. The substring ab occurs three times aba, bab and ba occurs twice. Because we want maximum of this frequencies we return 3 as our answer.

(Part 1)

Obviously we shouldn’t care for maxLength,since we wanna maximize ocuurrence(as lets say if a substring of length x,occurs y times,then there must be a substring of length (x-1) that occurs>=y times)

So we just have to find max no.of occurrences of a substring of length minLength

(Part 2)

unique thing can easily be done using frequency arrays

(Part 3)

One simple way is to use rolling hashing see this: https://threads-iiith.quora.com/String-Hashing-for-competitive-programming

for counting max occurence of a substring of given length

(Part 1)

Proof: Lets say a substring s of length x occurs b times,then there will be a substring s’ of length < x occuring >=b times(1 such substring would be a substring of s)

like ababa

ab occurs 2 times,so obviously a occurs>=2,b>=2

So better to find minimum Length String,as it will have less unique characters and also occurs greater number of times

(Part 2)

how to check if a substring s[i:j] is valid (no.of unique characters<=maxUnique)

lets construct a prefix sum freq array

where:

freq[i][‘a’]=freq[i-1][‘a’]+(s[i]==‘a’)

freq[i][‘b’]=freq[i-1][‘b’]+(s[i]==‘b’)

So for calculating no.of unique characters in s[i:j]

``````uniqueCount=0
for(c='a';c <= 'z';c++)
if((freq[j][c]-freq[i-1][c])>0)
uniqueCount++
``````

(Part 3)

we can now iterate for all substring of length minLength

so :

``````for(i=0;i< s.length()-minLength;i++)
{
//so i to i+minLength-1 is substring starting at i
//first we'll check if its valid(unique characters ...),i have explained it in part 2
//if its unique,find hash value of substring s[i:i+minLength-1],which can be found quickly
int hashValue=hash(i,i+minLength-1)
map[hashValue]++;
}
Now iterate through the map to find highest frequency hash value``````
1 Like

@vivek_1998299

Can you please provide more detailed explanation, I am finding it hard to understand.

These bugs…

@vivek_1998299

I have got the idea from the explanation and the link, can you explain me about implementing hashing. I have encountered this kind of problem for the first time that’s why I am facing difficulty.

Can anyone help me with this problem.

Plz read from link,and do a bit of searching too!!,i don’t support spoon feeding…

@vivek_1998299 In the worst case, Time complexity of your approach is O(N^2). Is there any O(N log N) algorithm for the same thing?

@vivek_1998299 In the worst case, Time complexity of your approach is O(N^2). Is there any O(N log N) algorithm for the same thing?

`// Given a string, we want to know the maximum no. of occurrences of any substring that satisfies following two conditions:

// The substring’s lengths is within in inclusive range of minLength to maxLength.
// The total no. of unique characters in the string doesn’t exceed maxUnique.
// For example, given a string s=abcde, minLength=2, maxLength=5, maxUnique=3, the substrings matching the criteria are (ab, bc, cd, de, abc, bcd, cde). Any shorter string fails he minLength>=2 any longer will fail maxUnique <= 3. Each of the substring occurs only one time.

// INPUT :
// First line contains a string, second line contains minLength, third line contains maxLength, and the last line contains maxUnique.

// Constraints :

// 2<=n<=105
// 2<=minLength<= maxLength <=26
// maxLength<n
// 2<=maxUnique<=26

#include
using namespace std;
#include<bits/stdc++.h>
#define mod 1000000007
#define ll long long int

int computeRollingHash(string s,int l,int r)
{
s=s.substr(l,r-l+1);

``````ll n=s.size();

int p=31;
int p_power=1;

ll hashValue=0;

for(ll i=0;i<n;i++)
{
ll adder= ( (s[i]-'a'+1) * p_power)%mod;

p_power=(p_power * p)%mod;
}

return hashValue;
``````

}

ll countUniqueChars(int i,int j,int n,int freq[])
{
ll count=0;

for(int k=0;k<26;k++)
{
ll charCount=freq[j][k]-((i==0)?0:freq[i-1][k]);

``````    if(charCount)
count++;
}

return count;
``````

}

int main()
{
string s;
cin>>s;

``````int maxLength,minLength,maxUnique;

cin>>maxLength>>minLength>>maxUnique;

int n=s.size();

int freq[n];

for(int i=0;i<n;i++)
{
for(int j=0;j<26;j++)
{
freq[i][j]=0;
}
}

for(int i=0;i<n;i++)
{
freq[i][s[i]-'a']++;
}

int k=minLength;

map<ll,ll> umap;

ll maxCount=0;

for(int i=0;i<=n-k;i++)
{
int j=i+k-1;

int countChar=countUniqueChars(i,j,n,freq);

if(countChar>maxUnique)
continue;

ll ans=computeRollingHash(s,i,j);

if(umap.find(ans)==umap.end())
{
umap[ans]=1;
}
else
{
umap[ans]++;
maxCount=max(maxCount,umap[ans]);
}
}

cout<<maxCount+1<<"\n";
``````

}`