SUBQUERY - Editorial

Problem link : contest practice

Difficulty : Medium

Pre-requisites : Suffix Automaton

Problem : Given a string S and N queries. Each query is defined by two integers - Li and Pi. Count the number of strings of the length Li that occur exactly Pi times (as the consecutive substrings) in the string S.

Explanation

This problem is the hardest in the set. At least, according to my estimations. There are some simple solutions that can be coded without putting a lot of effort, though, so let’s consider them first.

How to get 13 points

The constraints are very low, so you can do a pure brute force. For example, you can store all the substrings in the STL map along with the number of their occurences and when you get a query, you can iterate through all the substrings in the map and for the each of them you can check that the length and the number of occurences is the same with what is asked in the corresponding query.

How to get 40 points

Now N is larger so we have to store the information about all the substrings in the more efficient way. For example, using a trie. It is possible to construct the trie, containing all the suffixes (=all the substrings) of the string S in O(|S|2) time. Moreover it’s possible to store the depth and the number of occurences of the associated substring for each trie’s node. Knowing that there will be no more than O(|S|2) such pairs, we can just count the amount of occurences of each pair and then answer queries in O(1) time.

How to get 100 points

Here we need some suffix structure in order to store the information about the substrings. Writer’s choice is the suffix automaton. The complete its’ description is beyond this editorial, because it requires a lot of space and theoretical results. So, from now on, we will consider that you’re familiar with the suffix automaton. If you’re not, here are some basic facts that will be crucial in out solution:

  • The suffix automaton is a finite automaton that has letters associated with its’ edges.
  • Any path from the root’s node corresponds to some substring of the string |S| (of course, assuming that we’ve built the suffix automaton for the string S). And vice versa, any substring of S can be obtained by going along some path from the root node of the suffix automaton. So we can form the following criteria: the string T is a substring of S <=> there’s a path from the root node of the suffix automaton for the string S that has the string T associated with it.
  • The suffix automaton has O(|S|) nodes. In the general case, no more than 2|S| nodes.
  • Each node has a set of substrings associated with it.
  • All the substrings associated with a single node occur the same number of times.
  • All the substrings associated with a single node are the suffixes of the largest substring, associated with this node.

Knowing this, we obtain the following solution:

  • At first, we build the suffix automaton for the given string
  • Then, for each its’ node we calculate the number of occurences of its’ associated substrings, as well as the lowest and the highest length of the string, associated with this node.
  • Now we obtain a set of segments parallel to Y-axis, where X-axis corresponds to the number of occurences and the Y-axis corresponds to the length. There are as much these segments as the nodes in the suffix automaton.
  • And now the problem is: given O(N) segments, parallel to Y-axis and queries of the form (X, Y). The answer for a query is the number of the segments that cover the point with the coordinates (X, Y).

This new problem can be solved different. One of the ways is to note that different X-coordinates’ segments are independent. So we can solve the 1D case of this problem. While solving the 1D case, it’s easy to see that the number of the segments that cover the point X equals to A-B, where A equals to the number of the segments that end later or equal than in X and B equals to the number of the segments that begin after X.

Related links

  • A paper on suffix automaton
  • The TSUBSTR problem from CodeChef April 2012 Long Challenge, that also requires you to use the suffix automaton, and its’ editorial

Solutions : setter tester

4 Likes

Can any one explain this sentence :-

“Count the number of strings of the length Li that occur exactly Pi times (as the consecutive substrings) in the string S.” And,

does the output of

abacaba

1

3 2

Same as :-

dabaecaba

1

3 2

Is this supposed to be school level? Because afaik, they dont teach vectors and suffix automation in school level, at least in India. Well this did help me to learn new stuff

Is there any way of doing it using suffix array .

I wasn’t able to think of any which would get 100 points. :frowning:

2 Likes

@urdarinda

The way I feel about these LunchTime contests is ,more or less like you, that they are targeted at very skilled school kids only… I am now finishing my Bachelor’s degree and I’ve never even implemented a trie on my own as far as I recall. And I have finished both my two algorithm subjects with grades nearing 80% so, it seems to me that these contests just show that A LOT of hard work is required on your own if you want to succeed in programming contests… That and not having some more decent ad-hoc logic… which really impares my performance…

But maybe this is the goal of the contest too: to show that these contests are not for everyone and that having good grades at uni and interest is not even enough to perform well…

Bruno

1 Like

Hey ! I submitted the following code , don’t understand why it is wrong ?

 #include<stdio.h>#include<string.h>
int main()
{
 char s[50];
 int length;
 int N,L,P,i,ctr,count;
 ctr=1;
 char *p,*q,*r;
 scanf("%s",s);
 length=strlen(s);
 scanf("%d",&N);
 for(i=0;i<N;i++)
{
 scanf("%d",&L);
 scanf("%d",&P);
 //flag=P;
 p=s;
 //r=&s[L];ntf
 count=0;
  while(p<s+length-L)
  {
   r=p+L;
   q=p;
  while(*r!='\0')
 { count=0;
  while(*q==*r && count<L)
 {
  count++;
   q++;
    r++;
} 
if (count<L)
{
 count=0;
 q=p;
}
else if(count==L)
 ctr++;
 r++;
 }//while *r!='\0'
 p++;
}//outer while -p
if(ctr==P)
printf("%d",--ctr);
else
printf("0");
}//outer for
return 0;
}

yes because it takes aba in both cases

But what does ‘consecutive substrings’ mean?

actually i think this IOI style grading is perfect, otherwise it won’t be fair to some advanced users (there can be some advanced users even at the school level).

I think the question should be find the number of distinct substring of length L that occurs exactly p times