×

# Most frequent substring problem.

 0 Given a string, we want to know the maximum no. of occurrences of any substring that satisfies following two conditions: The substring's lengths is within in inclusive range of minLength to maxLength. The total no. of unique characters in the string doesn't exceed maxUnique. For example, given a string s=abcde, minLength=2, maxLength=5, maxUnique=3, the substrings matching the criteria are (ab, bc, cd, de, abc, bcd, cde). Any shorter string fails he minLength>=2 any longer will fail maxUnique <= 3. Each of the substring occurs only one time. INPUT : First line contains a string, second line contains minLength, third line contains maxLength, and the last line contains maxUnique. Constraints : 2<=n<=105 2<=minLength<= maxLength <=26 maxLength

 0 (Part 1) Obviously we shouldn't care for maxLength,since we wanna maximize ocuurrence(as lets say if a substring of length x,occurs y times,then there must be a substring of length (x-1) that occurs>=y times) So we just have to find max no.of occurrences of a substring of length minLength (Part 2) unique thing can easily be done using frequency arrays (Part 3) One simple way is to use rolling hashing see this: https://threads-iiith.quora.com/String-Hashing-for-competitive-programming for counting max occurence of a substring of given length (Part 1) Proof: Lets say a substring s of length x occurs b times,then there will be a substring s' of length < x occuring >=b times(1 such substring would be a substring of s) like ababa ab occurs 2 times,so obviously a occurs>=2,b>=2 So better to find minimum Length String,as it will have less unique characters and also occurs greater number of times (Part 2) how to check if a substring s[i:j] is valid (no.of unique characters<=maxUnique) lets construct a prefix sum freq array where: freq[i]['a']=freq[i-1]['a']+(s[i]=='a') freq[i]['b']=freq[i-1]['b']+(s[i]=='b') ...... So for calculating no.of unique characters in s[i:j] uniqueCount=0 for(c='a';c <= 'z';c++) if((freq[j][c]-freq[i-1][c])>0) uniqueCount++ (Part 3) we can now iterate for all substring of length minLength so : for(i=0;i< s.length()-minLength;i++) { //so i to i+minLength-1 is substring starting at i //first we'll check if its valid(unique characters ...),i have explained it in part 2 //if its unique,find hash value of substring s[i:i+minLength-1],which can be found quickly //if u use rolling hashing(read the link i provided) int hashValue=hash(i,i+minLength-1) map[hashValue]++; } Now iterate through the map to find highest frequency hash value  answered 01 Sep '18, 11:38 1.6k●2●9 accept rate: 23% @vivek_1998299 Can you please provide more detailed explanation, I am finding it hard to understand. (01 Sep '18, 22:20) arpit7281★ OOh sorry,for the late reply!! These bugs... (03 Sep '18, 14:10) @vivek_1998299 I have got the idea from the explanation and the link, can you explain me about implementing hashing. I have encountered this kind of problem for the first time that's why I am facing difficulty. (30 Sep '18, 16:44) arpit7281★ If u had sincerely read the link ,u wouldn't have asked this..:(. Plz read from link,and do a bit of searching too!!,i don't support spoon feeding... (04 Oct '18, 10:21) @vivek_1998299 In the worst case, Time complexity of your approach is O(N^2). Is there any O(N log N) algorithm for the same thing? (20 Dec '18, 19:25) nik_845★ @vivek_1998299 In the worst case, Time complexity of your approach is O(N^2). Is there any O(N log N) algorithm for the same thing? (20 Dec '18, 19:26) nik_845★ showing 5 of 6 show all
 toggle preview community wiki:
Preview

By Email:

Markdown Basics

• *italic* or _italic_
• **bold** or __bold__
• image?![alt text](/path/img.jpg "title")
• numbered list: 1. Foo 2. Bar
• to add a line break simply add two spaces to where you would like the new line to be.
• basic HTML tags are also supported
• mathemetical formulas in Latex between \$ symbol

Question tags:

×643

question asked: 01 Sep '18, 10:43

question was seen: 5,749 times

last updated: 20 Dec '18, 19:26