DISTNUM2 - Editorial

This is my approach (and a lot of others used this) :

  1. Coordinate compression on the values to map array values in range : [1,N].
  2. Divide the array into blocks using sqrt decomposition. For each block maintain a bitset (you may have to custom code it). The bitset has N bits and marks whether the ith value (after coordinate compression) is present or not.
  3. For all pairs of block numbers (i,j) find the distinct elements. For any (L,R), it is simply the bitwise OR of all bitsets in [L,R].
  4. Follow normal sqrt decomposition for queries now. Generate the bitset for the query range and using builtin_popcount, you can find the answer in O(N/64) for every query.

Time complexity : O(N * sqrt(N) + N * (N/64)).

7 Likes

It can be solved using persistent segment trees. More precisely, a persistent segment tree whose nodes are themselves segment trees which make up another persistent segment tree.

AC solution: CodeChef: Practical coding for everyone

EDIT: This question is basically a combination of kth order (SPOJ MKTHNUM) and distinct count (SPOJ DQUERY)

Here is a very good explanation of MKTHNUM.

DQUERY has a similar approach but instead of an array that represents the count of elements you use an array that represents the count of prev[i] value of elements. Where, given an array A, prev[i] is the maximum index j < i such that A[i] = A[j].

Try and solve these beforehand if you haven’t yet or at least make yourself familiar with the solutions.

For this problem we implement a persistent segtree like the one in MKTHNUM but instead of storing count it stores a segtree like the one we used in DQUERY.

So when it has to decide which direction to go it doesn’t just compare count of all elements but count of all distinct elements. The key idea is that the segtree (which tracks prev[i]) of first N values over a given interval differs in logn values from the segtree over the first N - 1 values over the same interval.

Hope this clears it up. :slight_smile:

5 Likes

@grebnesieh Yes, can you expain it like how @himanshujaju explained his approach? Thanks!

I didn’t understand the line “Find the lowest value V such that the number of distinct elements of A[l…r] that are ≤V is at least k”. I think the line should be “Find the lowest value V in the range A[l…r] such that the number of distinct elements of A[l…r] that are <=V is exactly k”

Hi, can someone explain me the offline approach required to answer subtask 3.
I know Mo’s algorithm, so I can sort queries in each sqrt(N) bucket and process them in order. But how exactly can I mantain a structure to add/remove elements to the set , and give me the K-th element in the set, each in no more than O(logN)

Can someone help me out ?

1 Like

About the offline approach to solve subtask 3:

First use coordinate compression on the array elements so that all array elements are to mapped to numbers 1 to N. Then create an array of previous and next occurance for all indices in the array. Then use MO’s algorithm. For this create a segtree with N leaf nodes (i.e. 2*N-1 total nodes ) and a node representing range (i,j) will contain count of the number of array elements in range (i,j) present for current position of mo-left and mo-right. Now suppose we want to increase mo-right then we see that whether the previous occurance of the array element at mo-right+1 is less than mo-left. If yes then add 1 to all the ranges in segtree including this array element. This takes O(log n) time. Once mo-left = l and mo-right = r ( l and r represent query ) then query on the segtree takes O(log n) time.

1 Like

@grey_code Thanks! I undetstood the part where you find previous and next occurence of the number,

But I am having some trouble understanding what your seg tree is doing.
So if I understood this right, you can get number of distinct elements in range [i,j] for current mo-left and current mo-right.
How does that help you find the K-th element in the range [i,j].

Thanks again!

@s4shyam95

You understood it right that for certain mo-left and mo-right I have a segtree each of whose node stores number of distinct elements in its corresponding range [i,j]. Clearly finding kth largest element means kth largest element in range [1,N] (Remember all array elements are mapped to numbers 1 to N). This is a basic query. Now if I query giving k to the root node it will check if its left child has more than k elements. If yes it will go to the left child and repeat the procedure. If left child has less than k elements it will go the right child with parameter k-(count of left child) and then repeat the procedure. This process will continue until the leaf node is reached, hence taking O(height) = O(log n) time.

1 Like

@grey_code Thanks a lot, that was great explanation, I understood if correctly once I read this followed by seeing your code for it.

Though I believe that the prev and next can be eliminated by using a frequency map for current range i.e. mo-left to mo right.

@grebnesieh Thanks a lot for sharing your solution. Understood a whole new way to use dynamic pointers. I solved it after reading the editorial in N* log^ 3( N) and still got AC:) but I still couldn’t get how to reduce memory consumption to log( N) using persistent segment tree. Your approach taught me both the log( N) memory as well as N* log^ 2( N) approach. Keep sharing…:D.

Here’s my solution( N* log^ 3( N) ): CodeChef: Practical coding for everyone

I’d love to get any reviews on my solution to improve the approach or any explanation on how did it pass the time limit.

1 Like

There is another approach using bitset (you have to code your own bitset though) and sqrt decomposition, which is pretty easy to code!

4 Likes

Can you explain it more?

Added explanation.

I am familiar with MO’s algorithm. You can use c++ map to store both the presence and the count of an integer. So everytime you increase or decrease the range, you either decrease or increase the count, and when the count is 0, you basically remove it. The number of distinct integers is basically the size of the map. The problem is to get the kth integer in the map, unforunately, c++ map doesn’t support random access. One solution is to use an auxillary vector that works together with the map to allow random access. let me know if you want me to explain in more detail.

@mightymercado Yes please, can you explain or give me a link that explains the how to use a auxillary vector that works together with the map to allow random access.

Thanks a lot!

@mightymercado, map with logn insertion and logn order statistics doesn’t work unfortunately. You could use policy based data structure for this as in: CodeChef: Practical coding for everyone However this tle’s for 3 of the tasks. I am wondering whether my mo algorithm is correctly set up or something wrong with it
However seeing others solution for 60 points, storing a binary indexed tree works :confused:

Hmm sorry. I was wrong about auxillary array. I think you need to use an order tree Order statistic tree - Wikipedia

Wow, you got AC with O(N*N/64) :o

1 Like

The editorial also includes an approach using persistent segment trees. (“functional data structures”)

2 Likes

Using “at least” in this case is equivalent, but has the added benefit that if the statement is true for V, then the statement is true for V’ > V.