Given an array of N integers A[1…N] you have to answer Q queries of the form L R K which requires you to find how many number inside the sub-array A[L…R] that appears no more than K time in that sub-array.
The query in this problem would be wierd to who are not familiar with Mo’s algorithm. If you don’t know about this algorithm please take a look at the mentioned blog. This will help you solve a class of query problems with O(Nsqrt(N)) when segment tree cannot handle.
The core of Mo’s algorithm is that you maintain the query value of the range after adding/removing elements. The frequencies of numbers can be easily maintained in this fashion. We just need an cnt array to keep the frequencies of each numbers. Note that while the number can be as large as 10^9 we can do some precalculation to convert them into the range [1…n]. So now let assume that A[i] ≤ N and cnt[i] is the frequency of the number i in the current range.
Recall that in this problem we need to find the number of numbers that have frequency no more than K. That value equals to the number of numbers in cnt array that is less than or equal to K. Now it comes to the tricky part. Let above array where above[i] is the number of numbers in cnt that is larger than i. We can see that when we add/remove elements in the current range in Mo’s Algorithm we will change the value of one element in cnt. More specifically we need to increase or decrease an element of cnt by one when adding/removing one element into the current range. Let’s see how this effect the above array.
- When increase cnt[i] by one: let’s call the value of cnt[i] before the increment is v then after the increment there is one more number in cnt that is larger then v so we just need to increase above[v] or basically we need to increase above[cnt[i]] before do so with cnt.
- Similary when decrease cnt[i] we need to decrease above[cnt[i]] after that because there is now one less number in cnt that is larger than cnt[i].
Now we have cnt and above all being maintained corresponding to the current range. Now one question is left - What is the answer for the current query? Some of you might see that it is already there: above - above[k]. above is the number of numbers in cnt that is larger than 0 which means the number of distinct values in the current range. above[k] is the number of distinct values that is have frequency more than k in the range.